Author Topic: Tricky regex expression...?  (Read 523 times)

tangotonyb

Hoping someone can help with this.
I have a custom field in MusicBee where I enter multiple values separated by a semi-colon.
e.g. FGH; JKL; ABC

I have an expression that I use in a Virtual Tag to determine whether any values in the Custom Field have been duplicated.

$IsMatch(<MyCustomField>,"(?<=(\b\w+\b).*)\1")

The problem, which I've only just realised, is that this expression only works when the first value is the one repeated, i.e. it detects
ABC; FGH; JKL; ABC

but not
ABC; FGH; JKL; JKL

Would anyone be able to help me correct this expression?




karbock

A first remark:
In this case, you must include the separator ("; "), and not just rely on "\w+" to define an element.
Otherwise, you will get a false positive with "ABC; ABCD".
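
The false positive can be reproduced outside MusicBee. The sketch below uses Python's re module, which has no variable-length lookbehind, so WORD_ONLY is only a stand-in for a word-based check and not the original Virtual Tag expression. The names WORD_ONLY and has_repeat_word_only are made up for this illustration.

```python
import re

# Stand-in for a check that relies on \w+ alone (an assumption, not the
# original expression): a whole word, followed later by the same letters.
WORD_ONLY = re.compile(r"\b(\w+)\b.*\1")


def has_repeat_word_only(field: str) -> bool:
    """Return True if the word-only pattern finds a 'repeat' in the field."""
    return WORD_ONLY.search(field) is not None


# "ABC" is found again inside "ABCD", although no element is duplicated.
print(has_repeat_word_only("ABC; ABCD"))
# No word occurs twice here.
print(has_repeat_word_only("FGH; JKL; ABC"))
```

The first call reports a repeat because nothing in the pattern forces the second occurrence to stop at a separator.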

The following RegEx works:
$IsMatch(<MyCustomField>,"^(.*; )?([^;]+); ([^;]+; )*\2(; [^;]+)*$")

I have tested it with different cases, but I may have forgotten a particular configuration. Let me know if you come across a false positive/negative.
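
For anyone who wants to try further cases outside MusicBee, here is a minimal sketch in Python. It assumes that Python's re module treats this pattern the same way as MusicBee's regex engine, which I have not verified; the helper name has_duplicate is hypothetical.

```python
import re

# The pattern from the $IsMatch expression above, copied unchanged.
DUPLICATE = re.compile(r"^(.*; )?([^;]+); ([^;]+; )*\2(; [^;]+)*$")


def has_duplicate(field: str) -> bool:
    """Return True if some element of a '; '-separated field occurs twice."""
    return DUPLICATE.search(field) is not None


# The examples mentioned in this thread.
for field in ("ABC; FGH; JKL; ABC",
              "ABC; FGH; JKL; JKL",
              "ABC; ABCD",
              "FGH; JKL; ABC"):
    print(field, "->", has_duplicate(field))
```

Extra fields can be added to the tuple to hunt for false positives or negatives.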

Explanation of the RegEx parts:
Nr. | RegEx part  | Alternative RegEx part | Meaning                   | Condition
1   | ^           |                        | beginning of field        |
2   | ([^;]+; )*  | (.*; )?                | any number of elements    | The substring, if present, must end with a semicolon.
3   | ([^;]+);    |                        | a specific element        | Followed by semicolon + space. The separator (; ) is not included in the capturing group.
4   | ([^;]+; )*  | (.*; )?                | any number of elements    | The substring, if present, must end with a semicolon.
5   | \2          |                        | repeated specific element |
6   | (; [^;]+)*  | (; .*)?                | any number of elements    | The substring, if present, must start with a semicolon.
7   | $           |                        | end of field              |
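
The table can also be read as a recipe for assembling the expression. The following Python sketch glues the seven parts together for every combination of the regular and alternative spellings and checks whether they give the same answer. It assumes Python's re module behaves like MusicBee's engine for these patterns; build, VARIANTS and verdicts are names invented for the illustration.

```python
import re
from itertools import product

# Interchangeable spellings of parts 2, 4 and 6, copied from the table.
PART_2 = [r"([^;]+; )*", r"(.*; )?"]
PART_4 = [r"([^;]+; )*", r"(.*; )?"]
PART_6 = [r"(; [^;]+)*", r"(; .*)?"]


def build(part_2: str, part_4: str, part_6: str) -> re.Pattern:
    # Parts 1, 3, 5 and 7 are fixed; part 3 is always capture group 2.
    return re.compile("^" + part_2 + r"([^;]+); " + part_4 + r"\2" + part_6 + "$")


# All 2 x 2 x 2 = 8 ways of combining the spellings.
VARIANTS = [build(*combo) for combo in product(PART_2, PART_4, PART_6)]


def verdicts(field: str) -> set:
    """Return the set of answers given by all eight variants."""
    return {pattern.search(field) is not None for pattern in VARIANTS}


for field in ("ABC; FGH; JKL; ABC", "ABC; FGH; JKL; JKL", "ABC; ABCD"):
    print(field, "->", verdicts(field))
```

If the spellings are really interchangeable, each printed set contains a single value.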
Last Edit: July 08, 2023, 08:04:22 PM by karbock

tangotonyb

Thank you so much, karbock. You're a genius! I must confess, I doubted if it was even possible.

Tested and seems to work perfectly - that was the easy bit! It'll take me quite a bit longer to figure out HOW it works!


karbock

tangotonyb wrote: "Thank you so much, karbock. You're a genius! I must confess, I doubted if it was even possible."
You're very welcome! As for me: I'm not a genius, but simply used to analysing. :)

tangotonyb wrote: "Tested and seems to work perfectly - that was the easy bit! It'll take me quite a bit longer to figure out HOW it works!"
Maybe the updated presentation above will give better insight into the different RegEx parts.

hiccup

karbock wrote: "Maybe the updated presentation above will give better insight into the different RegEx parts."
Like the OP, I also thought this could not be done.
But it looks like you have indeed nailed it.

In trying to understand this very clever solution of yours, I fail miserably.
I had the notion that regexes scan a string one step (character) at a time,
but that they can also look at what immediately follows or what immediately preceded. (Adjacent is the word, I think.)

But your solution seems to be able to look both ahead and behind, irrespective of whether something is adjacent or not.

What's the magic there?

karbock

hiccup wrote: "What's the magic there?"
Hi, guy! ;)
In the comments below, the numbers refer to the table rows in post #2 and spaces are replaced by "_" for legibility purposes.
A keyword is a non-empty sequence of characters other than ";", thus a keyword = "[^;]+".

3. "([^;]+);_"
Between round brackets: the keyword that will be repeated, followed by ";_".
5. "\2"
The repeated keyword (captured group nr. 2).

1. + 2.
The keyword to look for is possibly preceded by 0, 1 or more other keywords with semi-colons in between, thus 2 possible variants: "([^;]+;_)*" or "(.*;_)?".

6. + 7.
The 2nd occurrence of the keyword is possibly followed by 0, 1 or more other keywords with semi-colons in between, thus 2 possible variants: "(;_[^;]+)*" or "(;_.*)?". Note that the semi-colon starts the RegEx part this time.

4.
Between the keyword and its repetition, you can have 0, 1 or more other keywords, with semi-colons in between.
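
To watch parts 3. and 5. at work, one can ask the engine which keyword ended up in captured group nr. 2. The sketch below does this in Python, on the assumption that its re module handles the expression like MusicBee does; repeated_keyword is a hypothetical helper name.

```python
import re
from typing import Optional

# The expression from post #2, copied unchanged.
DUPLICATE = re.compile(r"^(.*; )?([^;]+); ([^;]+; )*\2(; [^;]+)*$")


def repeated_keyword(field: str) -> Optional[str]:
    """Return the keyword held by group 2, or None if nothing is repeated."""
    match = DUPLICATE.search(field)
    return match.group(2) if match else None


print(repeated_keyword("ABC; FGH; JKL; JKL"))  # repeated keyword is last
print(repeated_keyword("ABC; FGH; JKL; ABC"))  # repeated keyword is first
print(repeated_keyword("FGH; JKL; ABC"))       # no repetition
```

The engine tries different splits of the field between parts 2, 3 and 4 until one of them lets "\2" match, so the repeated keyword does not have to sit next to its first occurrence.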

I made the RegEx as general as possible, so as to normally cover all the cases.
Moreover, it is important to use the semi-colon at each transition: between 2 and 3, 3 and 4, etc. By doing so, you know that no RegEx part matches a substring of a keyword.

I could have used non-capturing groups in 2., 4. and 6., but that would have made the RegEx even less legible.
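
For the curious, this is roughly what the non-capturing variant would look like. It is a sketch in Python and assumes the "(?:...)" syntax and the backreference work the same way in MusicBee, which I have not checked. With only one capturing group left, the backreference becomes "\1". The names CAPTURING, NON_CAPTURING and same_verdict are invented for the example.

```python
import re

# Expression as posted, with four capturing groups.
CAPTURING = re.compile(r"^(.*; )?([^;]+); ([^;]+; )*\2(; [^;]+)*$")

# Same structure, but parts 2., 4. and 6. use non-capturing groups,
# so the keyword is now group 1 and the backreference is \1.
NON_CAPTURING = re.compile(r"^(?:.*; )?([^;]+); (?:[^;]+; )*\1(?:; [^;]+)*$")


def same_verdict(field: str) -> bool:
    """Return True if both spellings agree on the given field."""
    first = CAPTURING.search(field) is not None
    second = NON_CAPTURING.search(field) is not None
    return first == second


for field in ("ABC; FGH; JKL; ABC", "ABC; FGH; JKL; JKL", "ABC; ABCD"):
    print(field, "->", same_verdict(field))
```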
Last Edit: July 08, 2023, 08:02:55 PM by karbock