First of all, many thanks to quick_wango for writing this plugin, and to redwing and emmaoninternet for contributing the code for fetching lyrics from Genius. It works wonders for me most of the time and saves me hours of painstakingly copying and pasting lyrics, but I've noticed an issue with the artist filters that breaks the search for me. I currently use this code from redwing:
name: "Genius"
variables:
  artist:
    type: artist
    filters:
      - strip_diacritics
      - lowercase
      - [replace, "!!!", "chk-chik-chick"]
      - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
      - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
      - [regex, "'", ""]
      - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
      - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
      - [strip_nonascii, -]
  title:
    type: title
    filters:
      - strip_diacritics
      - lowercase
      - [replace, "!!!", "chk-chik-chick"]
      - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
      - [regex, '\s&(?=\s)', " and"]
      - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
      - [regex, "'", ""]
      - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
      - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
      - [strip_nonascii, -]
config:
  url: "http://genius.com/{artist}-{title}-lyrics"
  pattern: ['<div\s+class="song_body-lyrics"[^>]*?>[\s\S]*?<p>(?<lyrics>[\s\S]*?)</p>', s]
  post-filters:
    - strip_html
    - clean_spaces
    - utf8_encode
    - [regex, 'googletag.*\);', "\n"]
    - [regex, "’", "'"]
It may be different for others, but most of my albums with multiple featured artists are tagged in the format "Album artist, featuring artist 1, featuring artist 2, etc." The problem is that Genius only uses the album artist when constructing the URL, so having additional artists in the artist tag may break the lookup. That is just my guess, since I have no coding knowledge. However, since every song with more than one artist fails to find lyrics while other songs work flawlessly, I assume the filters are missing a line that tells the plugin to ignore everything after the comma, though I'm not sure whether that's already included.
Can anyone help me write this line? It would be much appreciated.
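One possible line for this, offered as an untested sketch: it assumes the plugin applies the filters from top to bottom and that a regex filter takes a pattern and a replacement, as the existing lines suggest. The new line has to sit above the '\.+|,+|...' filter, because that filter deletes commas and there would be nothing left to cut at.

```yaml
  artist:
    type: artist
    filters:
      - strip_diacritics
      - lowercase
      # Hypothetical addition: drop the first comma and everything after it,
      # so "Album artist, featured artist 1, featured artist 2" becomes "album artist".
      # Must come before the [regex, '\.+|,+|...'] line, which deletes commas.
      - [regex, '\s*,.*', ""]
      - [replace, "!!!", "chk-chik-chick"]
      # ...the remaining artist filters stay unchanged
```

The trade-off is that artists whose real name contains a comma, such as Earth, Wind & Fire or Crosby, Stills, Nash & Young, would also be cut short, so they would need their own exception line placed above it, similar to the existing "!!!" replace. The same line could be added to the Musixmatch artist filters.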
Another thing, if it's at all possible: pretty much any lyrics I can't find on Genius are available on Musixmatch, so could anyone revise this code from emmaoninternet on page 10?
name: "Musixmatch"
variables:
  artist:
    type: artist
    filters:
      - strip_diacritics
      - lowercase
      - [replace, "!!!", "artist-46206"] # !!! (Chk Chk Chk) artist exception
      - [replace, "+/-", "p%m"] # +/- artist janky exception (step 1)
      - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
      # ^ Strip F./ft/eat/uring + everything after
      - [regex, '[’'']+', "%27"] # URL encode "'/’" bc MM is cool w/ dat shit
      - [regex, '\s&\s(?=the)', " and "] # ONLY if succeeded by "the", replace " & " with " and "
      ## ^ Currently superfluous bc plugin preemptively replaces "&" with "and"
      ### ^ So MM will choke if <artist> contains "& (?!the)"
      - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
      # ^ Replace medial non-alphanumeric char(s) with single "-" (except "'") e.g., M.I.A. > M-I-A.
      - [regex, '\W+(?=$)', ""] # Strip end-of-string non-word chars
      - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
      # ^ Clean up any remaining successive non-alphanumeric char(s) before strip_nonascii
      ## ^ Long & stupid bc couldn't figure out YAML-friendly non-capturing "(?:...)"
      ### EZ version: [regex, '\W+(?:\W)(?<!$)', " "]
      - [strip_nonascii, -]
      - [replace, "-27", "%27"] # Fix "'/’" URL encoding after strip_nonascii
      - [regex, '(?<=^)p-m', "-"] # Janky replace for artist +/- after strip_nonascii (step 2)
  title:
    type: title
    filters: # Not using identical artist filters due to different treatment of "&" in titles vs artist names
      # Musixmatch strips "&" from <title> ALWAYS; from <artist> too, *unless* "& the", then replaced with "and"
      ## ^ see "&" treatment note above, under 'artist'
      - strip_diacritics
      - lowercase
      - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
      # ^ Strip F./ft/eat/uring + everything after
      - [regex, '[’'']+', "%27"] # URL encode "'/’" bc MM is cool w/ dat shit
      - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
      # ^ Replace medial non-alphanumeric char(s) with single "-" (except "'") e.g., f**k > f-k
      - [regex, '\W+(?=$)', ""] # Strip end-of-string non-word chars
      - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
      # ^ Clean up any remaining successive non-alphanumeric char(s) before strip_nonascii
      - [strip_nonascii, -]
      - [replace, "-27", "%27"] # Fix "'/’" URL encoding after strip_nonascii
config:
  url: "http://www.musixmatch.com/lyrics/{artist}/{title}"
  pattern: ['<span\s+id="lyrics-html"[^>]*?>(?<lyrics>.*?)</span>', s]
  post-filters:
    - utf8_encode
As I understand it from the log below, the line "pattern: ['<span\s+id="lyrics-html"[^>]*?>(?<lyrics>.*?)</span>', s]" needs to be fixed for the plugin to work with Musixmatch. I'd like to fix it myself, but again I'm not well versed in coding.
13/04/2018 09:51:05 [DEBUG] Lyrics request: - Weezer - Tired Of Sex - Pinkerton - Musixmatch
13/04/2018 09:51:05 [INFO] Musixmatch tries to load the lyrics...
13/04/2018 09:51:05 [DEBUG] The constructed URL: http://www.musixmatch.com/lyrics/weezer/tired-of-sex
13/04/2018 09:51:05 [DEBUG] gzip compression detected
13/04/2018 09:51:05 [WARN] The pattern <span\s+id="lyrics-html"[^>]*?>(?<lyrics>.*?)</span> didn't match!
13/04/2018 09:51:05 [INFO] No lyrics found.
13/04/2018 09:51:05 [DEBUG] no lyrics found
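A possible replacement for the config section, as an untested sketch. The log shows the URL being built as expected (http://www.musixmatch.com/lyrics/weezer/tired-of-sex) and the WARN line points at the pattern, which suggests the page no longer contains a span with id="lyrics-html". The class name mxm-lyrics__content below is an assumption about Musixmatch's page markup, not something confirmed in this thread; open the song page's source in a browser and adjust the pattern to whatever element actually wraps the lyrics. It keeps the (?<lyrics>...) group that the existing patterns use, and borrows strip_html and clean_spaces from the Genius post-filters to remove leftover tags such as line breaks.

```yaml
config:
  url: "http://www.musixmatch.com/lyrics/{artist}/{title}"
  # ASSUMPTION: the lyrics sit inside <p class="mxm-lyrics__content ...">.
  # This class name is a guess; check the page source and adjust it.
  # [\s\S]*? matches across line breaks, lazily, up to the first closing </p>.
  pattern: ['<p\s+class="mxm-lyrics__content[^"]*"[^>]*>(?<lyrics>[\s\S]*?)</p>', s]
  post-filters:
    - strip_html   # remove <br> and any other tags inside the captured block
    - clean_spaces
    - utf8_encode
```

If the page splits the lyrics across several such elements, this pattern would only capture the first one.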