I think I'm getting close to a YAML for the actual Genius site too, but it's really complicated. Nothing else works anymore, though; this plugin needs updating.
I think I have a YAML that works for Genius (until they change their formatting again):
name: Genius
variables:
    artist:
        type: artist
        filters:
        - strip_diacritics
        - lowercase
        - [replace, "!!!", "chk-chik-chick"]
        - [regex, '(?<=\W|\s)+(feat.+|ft[\W\s]+|(f\.\s)).+', ""]
        - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
        - [regex, "'", ""]
        - [regex, '(?<=[a-z0-9%])[^\sa-z0-9%]+(?=[a-z0-9%]+)', "-"]
        - [regex, '((?<=\s)([^a-z0-9\s-])+(\s|\W)+)|((?<=\w)([^a-z0-9-])+(\s|\W)+)', " "]
        - [strip_nonascii, -]
    title:
        type: title
        filters: artist
config:
    url: "https://genius.com/{artist}-{title}-lyrics"
    pattern: ['<div class="Lyrics__Container.*?">(?<lyrics>.*)<div class="Lyrics__Footer.*?">']
post-filters:
- utf8_encode
- entity_decode
- [regex, "<br/>", "\n"]
- strip_html
- clean_spaces
I haven't tested it extensively, but it works well when the URL is generated correctly. A few test failures I found:
1) Artist "X, the Y" is logged by the plugin as "X". I suspect this happens before the plugin applies any regexes (I turned off the filters to check). Maybe MusicBee only gives the plugin the segment before the first comma.
2) "X (Y)" is passed to the plugin as "X". This is the same kind of issue as 1), but with parentheses. In my test case, Y wasn't a featured artist. Curiously, "X (Y) (ft. Z)" was correctly logged as "X (Y)". Is it only the last parenthesized phrase that's removed?
3) "Cygnus....Vismund Cygnus" is regexed to "cygnusvismund-cygnus", while the Genius URL expects "cygnus-vismund-cygnus". In other titles the ellipses are simply dropped, so this case may need special handling.
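For 3), one possible workaround (untested, just a sketch) would be an extra regex filter placed before the existing one that strips dots. It assumes that a run of two or more dots directly between two word characters should become a hyphen, while trailing ellipses and single dots are still removed by the existing filter. Whether Genius actually treats every mid-word ellipsis this way is a guess based on this one title.

```yaml
        # Hypothetical addition (untested): turn a run of 2+ dots that sits
        # directly between two word characters into a hyphen,
        # e.g. "cygnus....vismund cygnus" -> "cygnus-vismund cygnus".
        - [regex, '(?<=\w)\.{2,}(?=\w)', "-"]
        # Existing filter from the config above; it still removes any
        # remaining dots, commas, and leading/trailing non-word characters.
        - [regex, '\.+|,+|(\W+(?=$))|(^\W+)', ""]
```

Since the title variable reuses the artist filters, the extra line would apply to both.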