Here is how to add Genius.com as a provider with this plugin:
1. Save the following code as a text file named "genius.com.yml".
2. Place the file in <MusicBee's appdata folder>/mb_LyricsReloaded/providers/ (the folder already exists if the plugin is enabled).
3. A "Genius" entry will now appear in the lyrics providers list under Preferences > Tags (2) > lyrics settings.
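For anyone who would rather script step 2, here is a minimal Python sketch. It assumes you already know where MusicBee's appdata folder is (the location depends on how MusicBee was installed, so it is passed in rather than guessed). `install_provider` is a hypothetical helper for illustration, not part of MusicBee or the plugin.

```python
import shutil
import tempfile
from pathlib import Path


def install_provider(yml_path: Path, musicbee_appdata: Path) -> Path:
    """Copy a provider .yml into <appdata>/mb_LyricsReloaded/providers/.

    `musicbee_appdata` must point at MusicBee's appdata folder; its location
    depends on the install, so this sketch does not try to guess it.
    """
    providers = musicbee_appdata / "mb_LyricsReloaded" / "providers"
    if not providers.is_dir():
        # Per the steps above, this folder exists once the plugin is enabled.
        raise FileNotFoundError(f"{providers} not found; is the plugin enabled?")
    dest = providers / yml_path.name
    shutil.copyfile(yml_path, dest)
    return dest


if __name__ == "__main__":
    # Dry run against a throwaway folder that mimics the expected layout.
    with tempfile.TemporaryDirectory() as tmp:
        appdata = Path(tmp)
        (appdata / "mb_LyricsReloaded" / "providers").mkdir(parents=True)
        src = appdata / "genius.com.yml"
        src.write_text('name: "Genius"\n', encoding="utf-8")
        print(install_provider(src, appdata))
```

After copying, the provider still has to be picked in MusicBee's preferences as described in step 3.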
Indeed, thank you, redwing, for posting that extremely helpful Genius.com provider template. I started using it a while ago and eventually realized that it was having trouble retrieving lyrics for some tracks/artists with various punctuation patterns, so I decided to look at the log and see what was up. And then I accidentally spent waaaaaaaay too much time discovering all the annoying variations in how lyrics sites format artist+title strings and flexing my puny regex muscles, with the aid of Pythex.org.

The result of my nerdocity is a modified Genius.com provider config plus a config for Musixmatch.com. Neither is 100% reliable, of course, but I tested them with a bunch of bands and song titles full of stupid characters and, while I'm quite sure both configs could be significantly cleaned up by using some of the plugin's built-in filters and less terrible regex skillz, they seem to get the job done most of the time. They will, however, both still fail on song titles with parentheses, since the plugin strips everything from the first "(" onward before the title ever reaches the provider config's filters (plus, "feat. <artist>" patterns are all over the place, so F that).
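To make the parenthesis limitation concrete, here is a tiny Python model of it. The rule it encodes is only the behaviour described in the paragraph above (drop everything from the first "(" onward), not code taken from the plugin, and `plugin_truncate_title` is a hypothetical name.

```python
def plugin_truncate_title(title: str) -> str:
    """Assumed model of the plugin's pre-processing, not its actual code:
    everything from the first "(" onward is dropped before provider filters run."""
    head, _sep, _rest = title.partition("(")
    return head.rstrip()


for t in ["Hey Ya! (Radio Edit)", "(Don't Fear) The Reaper", "Bohemian Rhapsody"]:
    print(repr(t), "->", repr(plugin_truncate_title(t)))
```

A trailing "(Radio Edit)" is harmless, but a title that starts with "(" is reduced to an empty string, leaving nothing useful for the provider config to build a URL from.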
Anyway, here's my modified code for Genius (see redwing's original instructions for how to install):
```yaml
name: "Genius"
variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "chk-chik-chick"] # !!! (Chk Chk Chk) artist exception
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        # ^ Strip F./ft/eat/uring + everything after
        - [regex, '\s&(?=\s)', " and"] # Replace " &" with " and"
        ## ^ Currently superfluous (for ARTIST) bc plugin preemptively replaces "&" with "and"
        ### ^ But still necessary for <title> "&" replacement
        - [regex, '[’\.,'']+|(\W+(?=$))|(^\W+)', ""] # Strip "'/’" + "." + "," + beginning/end of string non-word chars
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        # ^ Replace medial non-alphanumeric char(s) with single "-" e.g., f**k > f-k
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        # ^ Clean up any remaining successive non-alphanumeric chars before strip_nonascii
        ## ^ Long & stupid bc couldn't figure out YAML-friendly non-capturing "(?:...)"
        ### ^ EZ version: [regex, '\W+(?:\W)(?<!$)', " "]
        - [strip_nonascii, -]
    title:
        type: title
        filters: artist
config:
    url: "http://genius.com/{artist}-{title}-lyrics"
    pattern: ['<div\s+class="lyrics"[^>]*?>(?<lyrics>.*?)</div>', s]
post-filters:
- strip_html
- clean_spaces
- utf8_encode
```
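To preview which URL the Genius config will request for a given artist/title, its filter chain can be approximated in Python. This is a sketch under assumptions, not the plugin's implementation: the plugin runs .NET regexes and Python's `re` differs in places (the quantified lookbehind `(?<=\W|\s)+` is written here as the equivalent `(?<=\W)`), `strip_diacritics` is approximated with Unicode NFKD decomposition, and `strip_nonascii` is modelled as collapsing every run of characters outside a-z/0-9 into the separator. The function names are hypothetical.

```python
import re
import unicodedata


def strip_diacritics(s: str) -> str:
    # Decompose accented letters and drop the combining marks ("ó" -> "o").
    return "".join(c for c in unicodedata.normalize("NFKD", s)
                   if not unicodedata.combining(c))


def strip_nonascii(s: str, sep: str) -> str:
    # ASSUMPTION: modelled as "collapse every run of non-[a-z0-9] chars into sep".
    return re.sub(r"[^a-z0-9]+", sep, s)


def genius_slug(s: str) -> str:
    """Approximate the Genius 'artist' filter chain, step by step."""
    s = strip_diacritics(s).lower()
    s = s.replace("!!!", "chk-chik-chick")
    s = re.sub(r"(?<=\W)(feat.+|ft[\W\s]+|(f\.\s)).+", "", s)  # drop feat. ...
    s = re.sub(r"\s&(?=\s)", " and", s)
    s = re.sub(r"[’.,']+|(\W+(?=$))|(^\W+)", "", s)  # apostrophes, dots, commas, edges
    s = re.sub(r"(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)", "-", s)  # medial junk
    s = re.sub(r"((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)", " ", s)
    return strip_nonascii(s, "-")


def genius_url(artist: str, title: str) -> str:
    # Same URL template as the config; the title reuses the artist filters.
    return f"http://genius.com/{genius_slug(artist)}-{genius_slug(title)}-lyrics"


print(genius_url("Sigur Rós", "Hoppípolla"))
```

Running a few awkward names through it (for example, "AC/DC" becomes `ac-dc` and "Guns N' Roses" becomes `guns-n-roses`) is a quick way to sanity-check a regex change before editing the YAML.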
For Musixmatch, copy/paste the code below into a text file named "musixmatch.com.yml", save it to MusicBee's <appdata folder>/mb_LyricsReloaded/providers/, then open MusicBee and add the new Musixmatch provider from
Preferences > Tags (2) > Auto-tagging > Lyrics [...]

```yaml
name: "Musixmatch"
variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "artist-46206"] # !!! (Chk Chk Chk) artist exception
        - [replace, "+/-", "p%m"] # +/- artist janky exception (step 1)
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        # ^ Strip F./ft/eat/uring + everything after
        - [regex, '[’'']+', "%27"] # URL encode "'/’" bc MM is cool w/ dat shit
        - [regex, '\s&\s(?=the)', " and "] # ONLY if succeeded by "the", replace " & " with " and "
        ## ^ Currently superfluous bc plugin preemptively replaces "&" with "and"
        ### ^ So MM will choke if <artist> contains "& (?!the)"
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        # ^ Replace medial non-alphanumeric char(s) with single "-" (except "'") e.g., M.I.A. > M-I-A.
        - [regex, '\W+(?=$)', ""] # Strip end-of-string non-word chars
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        # ^ Clean up any remaining successive non-alphanumeric char(s) before strip_nonascii
        ## ^ Long & stupid bc couldn't figure out YAML-friendly non-capturing "(?:...)"
        ### EZ version: [regex, '\W+(?:\W)(?<!$)', " "]
        - [strip_nonascii, -]
        - [replace, "-27", "%27"] # Fix "'/’" URL encoding after strip_nonascii
        - [regex, '(?<=^)p-m', "-"] # Janky replace for artist +/- after strip_nonascii (step 2)
    title:
        type: title
        filters: # Not using identical artist filters due to different treatment of "&" in titles vs artist names
        # Musixmatch strips "&" from <title> ALWAYS; from <artist> too, *unless* "& the", then replaced with "and"
        ## ^ see "&" treatment note above, under 'artist'
        - strip_diacritics
        - lowercase
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        # ^ Strip F./ft/eat/uring + everything after
        - [regex, '[’'']+', "%27"] # URL encode "'/’" bc MM is cool w/ dat shit
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        # ^ Replace medial non-alphanumeric char(s) with single "-" (except "'") e.g., f**k > f-k
        - [regex, '\W+(?=$)', ""] # Strip end-of-string non-word chars
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        # ^ Clean up any remaining successive non-alphanumeric char(s) before strip_nonascii
        - [strip_nonascii, -]
        - [replace, "-27", "%27"] # Fix "'/’" URL encoding after strip_nonascii
config:
    url: "http://www.musixmatch.com/lyrics/{artist}/{title}"
    pattern: ['<span\s+id="lyrics-html"[^>]*?>(?<lyrics>.*?)</span>', s]
post-filters:
- utf8_encode
```
*
I included my comments, just in case anyone wants to take a stab at updating or modifying my very silly regex and wants to know what the hell I was thinking (delete them at will; they have no effect on functionality).

So, yeah, with Genius and Musixmatch as my #1 and #2 lyrics providers, I'd say roughly 90% of my 25K songs now come with bonus words. Yay, and thanks again, redwing and, of course, quick_wango!
* Edit (2015.09.02 10:30 PST): added featured-artist stripping to both configs
* Edit (2015.09.11 13:10 PST): updated both configs to treat curly apostrophes (’) the same as straight ones (')
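The apostrophe handling in the Musixmatch config can be checked with a small Python approximation of its *title* filter chain. Assumptions, stated plainly: Python's `re` stands in for the plugin's .NET regexes (the quantified lookbehind `(?<=\W|\s)+` becomes the equivalent `(?<=\W)`), `strip_diacritics` is approximated with NFKD decomposition, and `strip_nonascii` is modelled as collapsing each run of characters outside a-z/0-9 into "-". All function names are hypothetical.

```python
import re
import unicodedata


def strip_diacritics(s: str) -> str:
    # Decompose accented letters and drop the combining marks.
    return "".join(c for c in unicodedata.normalize("NFKD", s)
                   if not unicodedata.combining(c))


def strip_nonascii(s: str, sep: str) -> str:
    # ASSUMPTION: modelled as "collapse every run of non-[a-z0-9] chars into sep".
    return re.sub(r"[^a-z0-9]+", sep, s)


def musixmatch_title(title: str) -> str:
    """Approximate the Musixmatch 'title' filter chain, in config order."""
    s = strip_diacritics(title).lower()
    s = re.sub(r"(?<=\W)(feat.+|ft[\W\s]+|(f\.\s)).+", "", s)  # drop feat. ...
    s = re.sub(r"[’']+", "%27", s)  # curly and straight apostrophes alike
    s = re.sub(r"(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)", "-", s)
    s = re.sub(r"\W+(?=$)", "", s)
    s = re.sub(r"((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)", " ", s)
    s = strip_nonascii(s, "-")  # "%27" becomes "-27" here...
    return s.replace("-27", "%27")  # ...and is restored here


def musixmatch_url(artist_slug: str, title: str) -> str:
    # The artist slug is passed in pre-built to keep the sketch short.
    return f"http://www.musixmatch.com/lyrics/{artist_slug}/{musixmatch_title(title)}"


print(musixmatch_url("queen", "Don’t Stop Me Now"))
```

If `strip_nonascii` really behaves as modelled, one side effect shows up: the `-27` fix-up is a plain replace, so a title like "Club 27" would also come out as `club%27`.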