Hmm...this is not good. I think if I change the match pattern from this:
pattern: ['<div\s+class="song_body-lyrics"[^>]*?>[\s\S]*?<p>(?<lyrics>[\s\S]*?)</p>', s]
...to this...
pattern: ['<div\s+class="song_body-lyrics"[^>]*?>[\s\S]*?<div\s+class="lyrics">(?<lyrics>[\s\S]*?)</div>', s]
...or something like that, then I can probably get rid of the error. But with everything that's going on around the actual lyrics text, I don't think it'll work anyway.
Here's everything between where the match pattern hits and where the lyrics text starts:
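For what it's worth, a pattern containing `(?<div\s+class=...` can't compile: `(?<` opens a named group (or a lookbehind, depending on the engine), and `div\s+class="lyrics">` isn't a valid group name. To capture the inner `lyrics` div into a group called `lyrics`, like the original pattern does, the group has to come after the div tag: `<div\s+class="lyrics">(?<lyrics>[\s\S]*?)</div>`. Below is a rough Python sketch of that, assuming the page nests its divs the way the excerpt quoted in this post shows. Python spells named groups `(?P<name>...)`, the sample HTML is trimmed, and the lyric text in it is a placeholder.

```python
import re

# Trimmed copy of the page structure quoted in the post; lyric text is a placeholder.
SAMPLE = """<div class="song_body-lyrics">
<h2 class="text_label">Tell Your Friends Lyrics</h2>
<div initial-content-for="lyrics">
<div class="lyrics">
<!--sse-->
<p><a href="/The-weeknd-tell-your-friends-lyrics#note-7670658" class="referent">Placeholder line</a><br>
Another placeholder line</p>
<!--/sse-->
</div>
</div>
</div>"""

# Same idea as the grabber pattern, using Python's (?P<name>...) group syntax.
GOOD = (r'<div\s+class="song_body-lyrics"[^>]*?>[\s\S]*?'
        r'<div\s+class="lyrics">(?P<lyrics>[\s\S]*?)</div>')

# "(?<div..." is read as the start of a group/lookbehind, which is a syntax error.
BAD = (r'<div\s+class="song_body-lyrics"[^>]*?>[\s\S]*?'
       r'(?<div\s+class="lyrics">[\s\S]*?)</div>')


def pattern_compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def extract_lyrics_block(html):
    """Return the raw HTML inside <div class="lyrics">, or None if it isn't found."""
    match = re.search(GOOD, html)
    return match.group("lyrics") if match else None


if __name__ == "__main__":
    print(pattern_compiles(BAD))
    print(extract_lyrics_block(SAMPLE))
```

The lazy `[\s\S]*?</div>` stops at the first closing div, so this only holds up as long as Genius doesn't put another div inside the lyrics div. What comes back is still full of markup, so the post-filters would have to do the rest.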
<div class="song_body-lyrics">
<h2 class="text_label text_label--gray text_label--x_small_text_size u-top_margin">Tell Your Friends Lyrics</h2>
<div initial-content-for="lyrics">
<div class="lyrics">
<!--sse-->
<p><a href="/The-weeknd-tell-your-friends-lyrics#note-7670658" data-id="7670658" class="referent" ng-click="open()" ng-class="{
'referent--linked_to_preview': song_ctrl.referent_has_preview(fragment_id),
'referent--linked_to_preview_active': song_ctrl.highlight_preview_referent(fragment_element_id),
'referent--purple_indicator': song_ctrl.show_preview_referent_indicator(fragment_element_id)
}" prevent-default-click="" annotation-fragment="7670658" on-hover-with-no-digest="set_current_hover_and_digest(hover ? fragment_id : undefined)" classification="accepted" image="false" pending-editorial-actions-count="0">
Every line or block of text has that kind of mess in it.
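All of that mess sits inside tags and attribute values, though, even where an attribute value runs over several lines, so a real HTML parser (as opposed to a line-by-line regex) should drop it cleanly. Here's a small Python sketch using the standard library's `html.parser`, run on a shortened copy of the annotated line above with placeholder lyric text. Whether the grabber's own `strip_html` filter handles multi-line attributes the same way is an untested assumption.

```python
from html.parser import HTMLParser

# Shortened copy of one annotated line from the page; the lyric text is a placeholder.
ANNOTATED = """<p><a href="/The-weeknd-tell-your-friends-lyrics#note-7670658" data-id="7670658" class="referent" ng-click="open()" ng-class="{
'referent--linked_to_preview': song_ctrl.referent_has_preview(fragment_id),
'referent--purple_indicator': song_ctrl.show_preview_referent_indicator(fragment_element_id)
}" prevent-default-click="" annotation-fragment="7670658">Placeholder lyric line</a><br>
Second placeholder line</p>"""


class TextCollector(HTMLParser):
    """Keeps only text nodes; tags, attributes and <!-- comments --> are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the visible text of an HTML fragment."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.parts)


if __name__ == "__main__":
    print(strip_tags(ANNOTATED))
```

The newline Genius puts after each `<br>` is ordinary text, so the line breaks survive even though the `<br>` tags themselves are thrown away.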
Genius may be hosed unless anyone has any ideas. Here are the filters for it:
post-filters:
- strip_html
- clean_spaces
- utf8_encode
- [regex, 'googletag.*\);', "\n"]
- [regex, '\[.{1,75}\]', ""]
- [regex, "’", "'"]
- [replace, "\n\n", "\n"]
- trim
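To see what the text-only filters in that list do once the HTML is gone, here's a rough, hypothetical Python re-creation of them in order. `strip_html`, `clean_spaces` and `utf8_encode` are grabber built-ins and aren't reproduced; each `regex` step is assumed to be a plain search-and-replace with default flags (so `.` doesn't match a newline), and `replace` a literal replace.

```python
import re

# Hypothetical re-creation of the regex/replace/trim steps, in the post's order.
# strip_html, clean_spaces and utf8_encode are plugin built-ins and are skipped.
POST_FILTERS = [
    ("regex", r"googletag.*\);", "\n"),   # drop leftover ad-script lines
    ("regex", r"\[.{1,75}\]", ""),        # drop section headers like [Verse 1]
    ("regex", "\u2019", "'"),             # curly apostrophe -> straight
    ("replace", "\n\n", "\n"),            # one literal, non-overlapping pass
]


def apply_filters(text):
    """Run the filters in order, then trim like the final 'trim' step."""
    for kind, pattern, repl in POST_FILTERS:
        if kind == "regex":
            text = re.sub(pattern, repl, text)
        else:
            text = text.replace(pattern, repl)
    return text.strip()


if __name__ == "__main__":
    raw = "[Intro]\nPlaceholder line one\u2019s text\n\n\nPlaceholder line two\n"
    print(repr(apply_filters(raw)))
```

One thing this shows: a single pass of `"\n\n"` to `"\n"` turns three newlines into two, so a blank line can survive the chain.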
It might be possible, but if it is, figuring out the regex is beyond my abilities. It took every working cell in my brain to come up with this to strip the section headers:
- [regex, '\[.{1,75}\]', ""]
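That header filter handles a typical header sitting on its own line, like `[Verse 1]`. Because `.{1,75}` is greedy, though, a line that holds two bracketed chunks loses the text between them as well. A quick Python check, plus a tighter variant that stops at the first `]`; the variant is only a suggestion and hasn't been tried in the grabber.

```python
import re

# The section-header filter from the post.
ORIGINAL = r"\[.{1,75}\]"

# Hypothetical tighter version: bracket contents can't contain "]" or a newline.
TIGHTER = r"\[[^\]\n]{1,75}\]"


def strip_headers(text, pattern):
    """Delete every match of the header pattern from text."""
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    line = "[Chorus] placeholder [Hook]"
    print(repr(strip_headers(line, ORIGINAL)))  # greedy: the whole line goes
    print(repr(strip_headers(line, TIGHTER)))   # only the two headers go
```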