Author Topic: LyricsReloaded (Updated)  (Read 97217 times)

frankz

  • Hero Member
  • *****
  • Posts: 3088
Also, these were in the original and not this new and improved version.  I don't know if they're relevant any more.
Code
- [regex, 'googletag.*\);', "\n"]
- [regex, '<!--sse-->', ""]
- [regex, '<!--/sse-->', ""]
- [regex, "’", "'"]
Nice to see you're still keeping your hand in this, frankz. Does this part go in the filters section or the post-filters?
They were in the post-filters.  

I'm pretty sure the sse ones are irrelevant now. I put those in because that text was showing up in the lyrics, and now it isn't, so either the new version fixes, in a more elegant way, whatever I couldn't figure out the first time, or Genius no longer includes that markup.

I think the googletag one dates from way back, when the original author of this provider first posted it in the original thread.  I don't know if it is needed any more (it doesn't seem like it).

For the plugin update I'll post once I get the time (and permission from crisp and Redearth to use their great works), I plan to use the header replacement from hiccup, the double line to single line from the user suggestion, and the quote character replacement and trim from the original, but not those other old ones as they don't appear to be needed any longer.
A smile is happiness you'll find right under your nose.

phred

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7298
Thanks again. I guess for the time being I'll leave them out as Genius does seem to be working much better now.

And it sounds like you've got (at least) one more update left in you before you completely bow out. I appreciate what you've done to breathe new life into the plugin up until now and can certainly understand if it's time to pass the torch.
Download the latest MusicBee v3.4 patch from here.
Download the latest MusicBee v3.5 beta patch from here.
Unzip into your MusicBee directory and overwrite existing files.

----------
Check out the MusicBee Wiki.
How to post screenshots is here

hiccup

  • Hero Member
  • *****
  • Posts: 5579
Just throwing this out here, not even sure if it is something the plugin or a regex could address, and I also have no idea from what lyrics provider it was sourced:

Code
I fought against the bottle,
But I had to do it drunk –
Took my diamond to the pawnshop –
But that don’t make it junk.
So instead of e.g. "don't" you will get "donât".
For another song I got "itâs" instead of "it's".
A quick Google search hints that this is a character-encoding problem, related to things like PHP and UTF-8.
Maybe somebody with some coding talents has an idea about this?
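
The symptom described above is what you would expect if UTF-8 bytes get decoded with a single-byte codec somewhere along the way. Here is a minimal Python sketch of that idea, assuming Latin-1 is the wrong codec (that is a guess; nothing in this thread confirms which component or codec is responsible). The function names are made up for illustration and are not part of the plugin.

```python
def simulate_mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Encode as UTF-8, then decode the bytes with the wrong single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


def visible(text: str) -> str:
    """Drop non-printable characters, roughly what a text pane might show."""
    return "".join(ch for ch in text if ch.isprintable())


def repair(garbled: str, wrong_codec: str = "latin-1") -> str:
    """Reverse the mis-decoding; only possible if no bytes were lost on the way."""
    return garbled.encode(wrong_codec).decode("utf-8")


# The curly apostrophe U+2019 is three bytes in UTF-8 (E2 80 99).
# Read as Latin-1, the first byte becomes "â" and the other two become
# invisible control characters.
garbled = simulate_mojibake("don’t")
print(visible(garbled))
print(repair(garbled))
```

The first print shows "donât" and the second shows the original "don’t". If the wrong codec were Windows-1252 instead, the same bytes would show up as "donâ€™t", so the exact garbage you see is a hint about where the text was mis-decoded.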

A second thought:
I was assuming this was retrieved and output using the plugin.
But I now wonder about that: the plugin does not fully disable MusicBee's internal lyrics retrieval, does it?
So if you press 'search next provider', it could either present a result from the plugin, or from MB's internal engine?
If that assumption is correct, this could be completely beyond what the plugin could fix?

LazR

  • Newbie
  • *
  • Posts: 14
I also fixed Musixmatch, and added the minor tweaks to crisp's fix for Genius.
For the sake of easy copy-pasting, I'm putting them both here.

Genius:
Code
name: Genius

variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "chk-chik-chick"]
        - [replace, "&", "and"]
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
        - [regex, "'", ""]
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        - [strip_nonascii, -]
    title:
        type: title
        filters: artist

config:
    url: "https://genius.com/{artist}-{title}-lyrics"
    pattern: ['<div class="Lyrics__Container.*?">(?<lyrics>.*)<div class="Lyrics__Footer.*?">']

post-filters:
- br2nl
- strip_html
- utf8_encode
- entity_decode
- clean_spaces


Musixmatch:
Code
name: Musixmatch

variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [regex, "'", ""]
        - [regex, "/", " "]
        - [regex, '\s&(?=\s)', " "]
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        - [regex, '[^\sa-z0-9]\s*', ""]
        - [strip_nonascii, -]
    title:
        type: title
        filters:
        - strip_diacritics
        - lowercase
        - [regex, " '|' |/", " "]
        - [regex, "'", " "]
        - [regex, '\.+|,+|/+|(\W+(?=$))|(^\W+)', ""]
        - [regex, '\s&(?=\s)', " and"]
        - [strip_nonascii, -]

config:
    url: "http://www.musixmatch.com/lyrics/{artist}/{title}"
    pattern: ['<p class="mxm-lyrics__content.*?">(?<lyrics>.*?)<div [^>]*"lyrics-report".*?>', s]

post-filters:
- [regex, "<script.*?</script>", "", s]
- strip_html
- utf8_encode
- entity_decode
- clean_spaces

(I would think this would also fix Musixmatch-Asian, but I'm not a good tester for it. If anyone wants it, I can add that too.)

Redearth

  • Newbie
  • *
  • Posts: 10
Oh my gosh, thank you so much Redearth. The only issue I am noticing is where ads are placed between lyric sections (ex. [chorus] [verse] etc), it is not adding an extra space. And it only happens on some songs not all.
Here is a screenshot detailing it -- https://imgur.com/a/PFBA6eJ

Is there a way to get that fixed?

You would need to fix the source, or be okay with inconveniences like this. The plugin just pulls in whatever HTML/script is in the page source, which is not necessarily the page you see in the browser. (Always view source.)

You'd have to add a post-filter regex that adds a newline in front of lines that begin with "[" that also don't have a blank line before them.
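
A rough sketch of such a regex, tried out in Python rather than in the plugin itself. It assumes plain "\n" line endings and that the bracketed headers are still present when it runs; the function name is only for illustration.

```python
import re

# A newline that is NOT already preceded by a newline, and that is
# followed by "[", gets doubled so the header has a blank line above it.
HEADER_GAP = re.compile(r"(?<!\n)\n(?=\[)")


def add_blank_line_before_headers(lyrics: str) -> str:
    return HEADER_GAP.sub("\n\n", lyrics)


sample = "[Verse 1]\nfirst line\n[Chorus]\nla la\n\n[Verse 2]\nmore"
print(add_blank_line_before_headers(sample))
```

Only "[Chorus]" gets a new blank line here; "[Verse 2]" already had one and the first line of the text is left alone. In a provider file this might translate to something like `- [regex, '(?<!\n)\n(?=\[)', "\n\n"]`, but that line is untested in the plugin, so treat it as a starting point.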
Last Edit: November 10, 2021, 07:21:20 AM by Redearth

sveakul

  • Hero Member
  • *****
  • Posts: 1839
It pains me to say it, but I think the new Genius yml is no longer pulling lyrics--can others confirm?  If so, the Bat Beacon goes out to crisp and Redearth for a fix!

crisp

  • Newbie
  • *
  • Posts: 4
This is an odd one. The way I zeroed in on the solution before was (1) making the config pattern as permissive as possible and turning off all the post-filters to dump the whole HTML in the MusicBee lyrics pane, (2) finding the tags that surrounded the lyrics (i.e., Lyrics__Container and Lyrics__Footer) and modifying the config pattern with them, and (3) iteratively adding post-filters to get rid of HTML tags, bad formatting, etc. In this case, I'm not even getting any HTML when I use the config pattern '(?<lyrics>.*)' (which I think should just capture the whole page?). I do get a "Lyrics found" printout in the log file though.

My first instinct was to curl an example URL from the log file, which got me a tiny HTML page telling me I got redirected. If I uppercased the first character of the artist's name (so `curl https://genius.com/Artist-title-lyrics` instead of `curl https://genius.com/artist-title-lyrics`), curl returned the whole page as expected, but doing the same thing on the plugin side didn't really help. I also tried building the plugin myself to add some debug prints, but it looks like it depends on an old unavailable (?) version of YamlDotNet, so no luck there either.
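
For anyone who wants to try the three-step approach outside MusicBee, here is a small Python sketch of it on an invented page. The HTML is made up, the three helper functions are only rough stand-ins for the plugin's br2nl, strip_html and entity_decode filters (their real behaviour may differ), and Python writes named groups as `(?P<lyrics>...)` where the yml uses `(?<lyrics>...)`.

```python
import html
import re

# Toy page standing in for a real response; the markup is invented.
PAGE = (
    "<html><body><nav>menu</nav>"
    '<div class="Lyrics__Container-abc">First line<br/>Second &amp; third<br>'
    "<i>Fourth</i></div>"
    '<div class="Lyrics__Footer-xyz">footer</div></body></html>'
)

# Step 1: a fully permissive pattern simply dumps the whole page.
everything = re.search(r"(?P<lyrics>.*)", PAGE, re.DOTALL).group("lyrics")

# Step 2: narrow the pattern to the markers that surround the lyrics.
NARROW = re.compile(
    r'<div class="Lyrics__Container.*?">(?P<lyrics>.*)<div class="Lyrics__Footer',
    re.DOTALL,
)
raw = NARROW.search(PAGE).group("lyrics")


# Step 3: add clean-up steps one at a time and look at the result after each.
def br2nl(text: str) -> str:
    return re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)


def strip_html(text: str) -> str:
    return re.sub(r"<[^>]+>", "", text)


def entity_decode(text: str) -> str:
    return html.unescape(text)


lyrics = entity_decode(strip_html(br2nl(raw)))
print(lyrics)
```

Printing `raw` after step 2 is the equivalent of switching off all post-filters: you see exactly which tags and entities still need cleaning up.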

sveakul

  • Hero Member
  • *****
  • Posts: 1839
@crisp:  I'm guessing that you haven't found a way to restore this yml despite the data I PM'd you?  If not, thanks anyway for making the attempt.  I know it's bound by the limits of the dll itself.

crisp

  • Newbie
  • *
  • Posts: 4
Well this is embarrassing, I wasn't seeing lyrics because at some point during debugging I removed the 's' flag from the config pattern.
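
To show why that one flag matters: a quick Python illustration, assuming the plugin's 's' option behaves like the usual "dot matches newline" mode (`re.DOTALL` in Python). The page and markers below are invented.

```python
import re

# Two lines of "lyrics" between a start marker and an end marker.
PAGE = '<div id="start">line one\nline two\n</div><div class="end">'
PATTERN = r'<div id="start">(?P<lyrics>.*)<div class="end">'

# Without the flag, "." stops at the first newline, so the pattern can
# never reach the end marker and nothing is found.
without_flag = re.search(PATTERN, PAGE)

# With the flag, "." also matches newlines and the whole block is captured.
with_flag = re.search(PATTERN, PAGE, re.DOTALL)

print(without_flag)
print(repr(with_flag.group("lyrics")))
```

The first print gives `None`; the second gives the text between the two markers, newlines included. Since real pages put the lyrics on many lines, a pattern without the flag will usually come back empty or with a single line.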

@crisp:  I'm guessing that you haven't found a way to restore this yml despite the data I PM'd you?  If not, thanks anyway for making the attempt.  I know it's bound by the limits of the dll itself.
Thanks sveakul, your message pointed me in the right direction, here's the updated yml:
Code
name: Genius (2021-11-30)

variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "chk-chik-chick"]
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
        - [regex, "'", ""]
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        - [strip_nonascii, -]
    title:
        type: title
        filters: artist

config:
    url: "https://genius.com/{artist}-{title}-lyrics"
    pattern: ['<div id="lyrics-root-pin-spacer">(?<lyrics>.*)<div class="Lyrics__Footer-sc-', 's']

post-filters:
- br2nl
- strip_html
- utf8_encode
- entity_decode
- clean_spaces
- [regex, '[\[\{].{1,75}[\]\}]', ""]
- [replace, "\n\n", "\n"]
- trim

sveakul

  • Hero Member
  • *****
  • Posts: 1839
THANKS, crisp, this is working terrific again!  I think it may now bypass the "gotchas" that they throw in occasionally as it works more on the "root."  I did notice that it seems to strip ALL the empty lines between verses/sections, but I removed the - [replace, "\n\n", "\n"]  line from the post filters section (and for the heck of it rolled back - [regex, '[\[\{].{1,75}[\]\}]', ""] to - [regex, '\[.{1,75}\]', "") and it displays fine for me;  just personal taste, others may choose to use the originals.

You are a gentleman and a scholar, and thanks again for fixing Genius for us.

phred

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7298
For anyone who wants to use the update provided by crisp, and is too lazy to make sveakul's changes...

Code
name: Genius (2021-11-30)

variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "chk-chik-chick"]
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
        - [regex, "'", ""]
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        - [strip_nonascii, -]
    title:
        type: title
        filters: artist

config:
    url: "https://genius.com/{artist}-{title}-lyrics"
    pattern: ['<div id="lyrics-root-pin-spacer">(?<lyrics>.*)<div class="Lyrics__Footer-sc-', 's']

post-filters:
- br2nl
- strip_html
- utf8_encode
- entity_decode
- clean_spaces
- [regex, '\[.{1,75}\]', ""]
- [regex, '\n{2,}',"\n\n", 's']
- trim

EDIT: Fixed typo and added line to get rid of the double line spacing per crisp's reply immediately following.
Last Edit: December 01, 2021, 07:36:56 PM by phred

crisp

  • Newbie
  • *
  • Posts: 4
For anyone who wants to use the update provided by crisp, and is too lazy to make sveakul's changes...
Thanks phred, if this doesn't work for anyone, it's just a missing square bracket at the end of the last regex. And if you wanna get rid of the leftover double newlines, this can go right after that regex:
Code
- [regex, '\n{2,}',"\n\n", 's']
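
The difference between this regex and the earlier literal replace of "\n\n" with "\n" can be seen in a few lines of Python. This is only an illustration of the two substitutions, on the assumption that the plugin's replace and regex filters behave like a plain string replace and a regex substitution.

```python
import re

# Literal replace: a single blank line between verses disappears completely.
literal = "verse one\n\nverse two".replace("\n\n", "\n")

# Regex collapse: any run of two or more newlines becomes exactly one
# blank line, so verse breaks survive and bigger gaps are squeezed.
lyrics = "verse one\n\n\n\nverse two\n\nverse three"
collapsed = re.sub(r"\n{2,}", "\n\n", lyrics)

print(repr(literal))
print(repr(collapsed))
```

The literal version leaves no blank lines between the verses, while the regex version keeps one blank line between each pair of verses.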

sveakul

  • Hero Member
  • *****
  • Posts: 1839
Thanks phred, if this doesn't work for anyone, it's just a missing square bracket at the end of the last regex.
Sorry, my fault not phred's; that will teach me not to do copy/paste operations when I'm falling asleep.  crisp, thanks for the extra regex code that does indeed remove the DOUBLE blank lines that slipped by my edit.  For the sake of completeness, below is crisp's new full genius.yml code, including the bracket I missed and the extra regex that prevents the double blanks:

Code
name: Genius (2021-11-30)

variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "chk-chik-chick"]
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
        - [regex, "'", ""]
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        - [strip_nonascii, -]
    title:
        type: title
        filters: artist

config:
    url: "https://genius.com/{artist}-{title}-lyrics"
    pattern: ['<div id="lyrics-root-pin-spacer">(?<lyrics>.*)<div class="Lyrics__Footer-sc-', 's']

post-filters:
- br2nl
- strip_html
- utf8_encode
- entity_decode
- clean_spaces
- [regex, '\[.{1,75}\]', ""]
- [regex, '\n{2,}',"\n\n", 's']
- trim

Moshi_

  • Full Member
  • ***
  • Posts: 132
  • http://www.last.fm/user/Moshi_
The only issue I am noticing is where ads are placed between lyric sections (ex. [chorus] [verse] etc), it is not adding an extra space. And it only happens on some songs not all.
Here is a screenshot detailing it -- https://imgur.com/a/PFBA6eJ

Is there a way to get that fixed?

I've got a different problem. Bracket words are not showing at all. Any way to fix this? Using sveakul's edit.

sveakul

  • Hero Member
  • *****
  • Posts: 1839
The bracketed section headers are removed by default, via a contribution to the code from frankz.  Most of us prefer this, but if you want to keep them, remove frankz's code line below from the script:

Code
- [regex, '\[.{1,75}\]', ""]
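
To see what that line actually removes, here is the same pattern run in Python. This is an illustration only; I would expect the plugin's regex engine to treat this particular pattern the same way, but that is an assumption.

```python
import re

# "[", then 1 to 75 characters (no newlines), then "]".
HEADER = re.compile(r"\[.{1,75}\]")


def strip_headers(lyrics: str) -> str:
    return HEADER.sub("", lyrics)


print(repr(strip_headers("[Chorus]\nla la la")))
print(repr(strip_headers("[Verse 1] hello [x2]")))
print(repr(strip_headers("[" + "a" * 80 + "]")))
```

The first call leaves just the newline and the lyric line. The second returns an empty string: the match is greedy, so two bracketed bits on one line take the words between them along. The third is untouched because the bracket contents are longer than 75 characters.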