Topic: regex (regular expressions) - open discussion topic (Read 28864 times)

theta_wave

  • Sr. Member
  • ****
  • Posts: 680
Since all of my classical music follows the "Work: Movement" format, I simply use regex in a virtualtag for the job: $rxreplace(<Title>,"(^.+?)(\:\s)(.*$)","$1")

I am guessing you mean $3 at the end here too?
No, that expression is used to extract "Work" from "Work: Movement".  The above regex is broken down to "(Work)(: )(Movement)".  Naturally, I use $1 to grab "Work" because the 1st capture group contains the information I want.
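To see the three groups at work outside MusicBee, here is a minimal sketch. It uses Python's re module as a stand-in for the .NET engine MusicBee uses, which is an assumption. Python writes \1 where MusicBee writes $1.

```python
import re

# (Work)(: )(Movement); the lazy .+? stops at the first ": "
PATTERN = r"(^.+?)(\:\s)(.*$)"

title = "Misa criolla: Credo"
groups = re.match(PATTERN, title).groups()

# Keep group 1 for the work, group 3 for the movement
work = re.sub(PATTERN, r"\1", title)
movement = re.sub(PATTERN, r"\3", title)

print(groups)    # ('Misa criolla', ': ', 'Credo')
print(work)      # Misa criolla
print(movement)  # Credo
```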

marlonob

  • Jr. Member
  • **
  • Posts: 43
Hmm, I don't know why you need to include <Work> in your regex when you can still easily parse <Title> without including that tag field. I don't know the significance of "⋮" in your setup, but I'll include it anyway. You seem to know your way around regex, so you can remove it if you don't need it.

What I would use for <Title->:
Code
$rxreplace(<Title>,"(^.+?)(\:\s)(.*$)","⋮$3")
Thank you for your reply.
The ⋮ is just an indicator that the title has been abbreviated.
As for your solution: that would not work as well for me. Consider the following cases.
The Hollywood Songbook no. 19: Panzerschlacht
The Hollywood Songbook no. 16: Die letzte Elegie
Concerto grosso no. 5 in G‐major (arr. of Corelli: sonate op. 5 no.5): I. Adagio
Star Wars Episode V: The Empire Strikes Back: The Imperial March

In these cases, the expected results would be:
⋮no. 19: Panzerschlacht
⋮no. 16: Die letzte Elegie
⋮I. Adagio
⋮The Imperial March

But with that expression, they would be:
⋮Panzerschlacht
⋮Die letzte Elegie
⋮sonate op. 5 no.5): I. Adagio
⋮The Empire Strikes Back: The Imperial March
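That expression's output for these four titles can be reproduced with a short Python sketch (Python's re standing in for MusicBee's .NET engine, an assumption; \3 plays the role of $3). The lazy (^.+?) always stops at the first ": ", which is why the concerto and Star Wars titles are cut too early.

```python
import re

LAZY = r"(^.+?)(\:\s)(.*$)"

titles = [
    "The Hollywood Songbook no. 19: Panzerschlacht",
    "The Hollywood Songbook no. 16: Die letzte Elegie",
    "Concerto grosso no. 5 in G‐major (arr. of Corelli: sonate op. 5 no.5): I. Adagio",
    "Star Wars Episode V: The Empire Strikes Back: The Imperial March",
]

# Equivalent of $RxReplace(<Title>,"(^.+?)(\:\s)(.*$)","⋮$3") for each title
abbreviated = [re.sub(LAZY, r"⋮\3", t) for t in titles]

for a in abbreviated:
    print(a)
```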


I do have a very similar expression in Mp3tag to auto-populate the <Work> tag, along with one that copies everything before the last colon into it and another that copies everything before "n(o|r|º)\.\s?\d" for these cases, but I have to select the right one for each case.

The expression you suggested is, of course, the most common, but not the only one I’ll need.

theta_wave

  • Sr. Member
  • ****
  • Posts: 680
From the examples you gave me, if you want the search to select everything prior to the last colon as "Work", then remove the "?" from my expression to make it greedy. I'm just going off the examples you provided.
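The lazy/greedy difference can be checked with a small Python sketch (re as a stand-in for .NET, an assumption). Removing the "?" lets (^.+) run to the end of the line and then backtrack to the last ": " instead of the first.

```python
import re

LAZY = r"(^.+?)(\:\s)(.*$)"
GREEDY = r"(^.+)(\:\s)(.*$)"  # same pattern with the "?" removed

title = "Star Wars Episode V: The Empire Strikes Back: The Imperial March"

lazy_work = re.sub(LAZY, r"\1", title)      # stops at the first ": "
greedy_work = re.sub(GREEDY, r"\1", title)  # backtracks to the last ": "

print(lazy_work)    # Star Wars Episode V
print(greedy_work)  # Star Wars Episode V: The Empire Strikes Back
```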

marlonob

  • Jr. Member
  • **
  • Posts: 43
That still wouldn’t work for my case with the first two examples,
The Hollywood Songbook no. 19: Panzerschlacht
The Hollywood Songbook no. 16: Die letzte Elegie

since the main work title ends before any colon. Nor would it work for titles such as
Saeviat tellus inter rigores, HWV 240: Recitativo: Carmelitarum ut confirmet ordinem
Mass no. 17 for Soloists, Chorus & Orchestra in C minor, K. 417a/427 (fragment) “Great”: IIa. Gloria: “Gloria in excelsis”

where the work title ends just before the first colon.

As I mentioned before, I have three substitutions in mp3tag to help me automate the tagging of <Work>:
^(.+):\s.+ → $1
:\s.+ →
\sn(o|r|º)\.\s?\d.* →

but which one will be needed each time has to be determined manually.
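The three substitutions can be sketched in Python to show what each one produces (re standing in for Mp3tag's engine, an assumption; work_candidates is a hypothetical helper). The script only lists the candidates; choosing among them remains the manual step described above.

```python
import re

# The three Mp3tag search/replace rules from the post, as (pattern, replacement) pairs
RULES = [
    (r"^(.+):\s.+", r"\1"),        # keep everything before the LAST ": "
    (r":\s.+", ""),                # drop everything from the FIRST ": " on
    (r"\sn(o|r|º)\.\s?\d.*", ""),  # drop everything from " no.", " nr." or " nº." plus a digit
]

def work_candidates(title):
    """Hypothetical helper: the result of each rule, in order."""
    return [re.sub(pattern, repl, title) for pattern, repl in RULES]

for t in [
    "The Hollywood Songbook no. 19: Panzerschlacht",
    "Saeviat tellus inter rigores, HWV 240: Recitativo: Carmelitarum ut confirmet ordinem",
    "Star Wars Episode V: The Empire Strikes Back: The Imperial March",
]:
    print(work_candidates(t))
```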

So the <Work> tag holds the result of this task, and it would be useful to be able to use it for other purposes as well.

theta_wave

  • Sr. Member
  • ****
  • Posts: 680
Then create an if-else function where you $ismatch titles with two or more colons:

Code
(.*?:){2,}
Do one $rxreplace for those and another for titles containing only one colon.
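A quick Python sketch of the colon test (re.search is assumed here to mirror $IsMatch, which tests for a match anywhere in the string; has_two_or_more_colons is a hypothetical name):

```python
import re

TWO_OR_MORE_COLONS = r"(.*?:){2,}"

def has_two_or_more_colons(title):
    # True if the pattern matches anywhere in the title
    return re.search(TWO_OR_MORE_COLONS, title) is not None

for t in [
    "The Hollywood Songbook no. 19: Panzerschlacht",
    "Saeviat tellus inter rigores, HWV 240: Recitativo: Carmelitarum ut confirmet ordinem",
    "Star Wars Episode V: The Empire Strikes Back: The Imperial March",
]:
    print(has_two_or_more_colons(t), t)
```

Note that the second and third titles land in the same branch, even though their work titles end at different colons.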

marlonob

  • Jr. Member
  • **
  • Posts: 43
That also wouldn’t work, since there are cases where the relevant part is after the first colon (Saeviat tellus inter rigores, HWV 240: Recitativo: Carmelitarum ut confirmet ordinem) and others where it is after the second (Star Wars Episode V: The Empire Strikes Back: The Imperial March).

Your post, however, gave me an idea, and I think I got a solution. For anyone interested, it’s:
Code
$If($IsMatch($Replace(<Title>,<Work>,⋮),"(.*⋮){2}"),
$RxReplace(<Work>:::$Replace(<Title>,<Work>,⋮),"^(.+):::⋮[\s:,]*(.+)⋮(.*)$","⋮$2$1$3"),
$RxReplace($Replace(<Title>,<Work>,⋮),"⋮[\s:,]*","⋮"))

This first checks whether there are two ⋮ (meaning two substitutions have been made). If so, it puts the content of <Work>, followed by a unique separator string (“:::”), in front of the result of $Replace, so the regex can capture it and put it back in place of the second ⋮. Otherwise, it just removes any spaces, colons or commas that follow the ⋮.
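For anyone who wants to trace the logic step by step, here is a rough Python translation of the expression. It is a sketch under assumptions: MusicBee's $Replace is taken to replace every occurrence (like str.replace), $IsMatch is approximated with re.search, the empty-<Work> guard is an addition for this sketch only, and the "Requiem" title is hypothetical.

```python
import re

MARK = "⋮"

def abbreviate_title(title, work):
    """Sketch of the $If/$IsMatch/$RxReplace expression, translated to Python."""
    if not work:  # guard added for this sketch: nothing to abbreviate
        return title
    replaced = title.replace(work, MARK)
    if re.search(r"(.*⋮){2}", replaced):
        # <Work> occurred twice: prepend "<Work>:::" so the regex can
        # capture it (group 1) and restore it in place of the second mark
        combined = work + ":::" + replaced
        return re.sub(r"^(.+):::⋮[\s:,]*(.+)⋮(.*)$", r"⋮\2\1\3", combined)
    # Single occurrence: strip spaces, colons and commas right after the mark
    return re.sub(r"⋮[\s:,]*", MARK, replaced)

print(abbreviate_title("The Hollywood Songbook no. 19: Panzerschlacht",
                       "The Hollywood Songbook"))
print(abbreviate_title("Requiem: Lacrimosa (from Requiem)", "Requiem"))
```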

Thank you very much for your help, @theta_wave, and for your commitment to the community.

hiccup

  • Sr. Member
  • ****
  • Posts: 7790
I have some difficulty in understanding the workings and/or syntax of RxSplit.

Suppose the title is:
Misa criolla: Credo

If I use:
$RxSplit(<Title>,"(: )",1)
to get the complete contents before the colon, that works
(displaying: Misa criolla)

But if I try the same to get the contents after the colon this won't work:
$RxSplit(<Title>,"(: )",2)
(it will only display ":")

Only if I add some arguments does it work:
$RxSplit(<Title>,"(: .*)",2)

Why doesn't the first example need more arguments, and why does the second?


Edit:
A second question came up:

Using the same track title as above (Misa criolla: Credo), and trying to isolate the work (before the colon) with this:
$RxReplace(<Title>,"(^.+?)\:\s","$1")
I would expect it to only show the work, but it will display:
MisacriollaCredo
So it doesn't stop when it runs into \:\s.
Shouldn't it?
Last Edit: February 21, 2017, 05:36:32 PM by hiccup

theta_wave

  • Sr. Member
  • ****
  • Posts: 680
No problem, I'm no expert at this kind of thing, but I try to do what I can to help.  I'm here to learn as well.  For example, I'm wondering about your use of [\s:,]*.  To me, read left to right, it looks inconsequential because of the use of the "*" rather than "+" since "*" would still select a character even in the absence of the whitespace, colon or comma at that location.


1) For the first question, I'm not familiar with $rxsplit, as I have yet to use it since Steven included it in MusicBee.  Your guess is as good as mine.  For what you are trying to achieve here, couldn't $Split do the job?

2) The reason is that your search is running twice on that line; try it in Notepad++. (^.+?)\:\s starts at the beginning and ends at ": ", but its position in the line is not at the end yet. So the expression repeats itself and starts at the beginning. "(^.+?)\:\s" matches "(Misa criolla): ", and when it restarts after ": ", "(^.+?)\:\s" matches "(Credo)" because there's no ": " stop sign, so "(^.+?)\:\s" continues all the way to the end of the line. Now you have two groups with "$1" (see the groups in the previous sentence enclosed in parentheses).

In this case, you need an expression that covers the whole line. From what you are trying to do, "(^.+?)\:\s.*$" should be good enough. Still, I have a habit of enclosing groups even if I'm not going to use them for a particular virtualtag, because then I can copy-paste the regex into another virtualtag unchanged and simply replace $1 with $2. So, in your case, my ideal regex would be "(^.+?)\:\s(.*$)".
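The three variants can be compared in a short Python sketch (re.sub used as a stand-in for $RxReplace, an assumption). It shows that a replace rewrites only the text the pattern matched, so anything the pattern does not cover stays in the output.

```python
import re

title = "Misa criolla: Credo"

# Only the matched span "Misa criolla: " is replaced; the unmatched "Credo" stays in place
partial = re.sub(r"(^.+?)\:\s", r"\1", title)
# Extending the pattern with .*$ makes the match cover the whole title
work = re.sub(r"(^.+?)\:\s.*$", r"\1", title)
# Capturing both parts lets the same pattern serve either virtual tag
movement = re.sub(r"(^.+?)\:\s(.*$)", r"\2", title)

print(partial)   # Misa criollaCredo
print(work)      # Misa criolla
print(movement)  # Credo
```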

I hope this helps.

hiccup

  • Sr. Member
  • ****
  • Posts: 7790
The only purpose of this simple example was to understand how $RxSplit works (or should work).
Not to solve this particular example in other ways.

Hopefully Steven can chip in and give some explanation on the workings (and advantages) of $RxSplit in MusicBee.


Quote from: theta_wave
2) The reason why is that your search is running twice in that line, try it in Notepad++. (^.+?)\:\s starts at the beginning and ends at ": ", but its position in the line is not at the end yet.

Thanks, that's clear.
I did run the expression through an online regex tester I found and like:
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

With the same formula and target string it ran fine there, so MB's regex engine is probably behaving slightly differently.
B.t.w., what's also nice about this tester is that it clearly shows the capture groups.

theta_wave

  • Sr. Member
  • ****
  • Posts: 680
Hah, yeah, that too. I haven't come across a situation where I would think to use it. However, I'm all ears for some use cases.


Good catch. I'll check it out too. I'm just assuming here, but maybe Notepad++ and MB make regular expressions behave as if they were delimited like this: /regex/g. The g flag simply makes the regex run repeatedly, not just once. Again, this is just a guess; I'm used to this repeating behavior by default in Notepad++ without the "/" and "/g". The /regex/g form is the kind of syntax used in programming languages or sed.

marlonob

  • Jr. Member
  • **
  • Posts: 43
In my experience, * will match zero or more [\s:,], but I'm having trouble imagining a case with zero occurrences of [\s:,], so + may be more suitable.
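A quick Python check of the * versus + point (re standing in for .NET, an assumption). [\s:,]* can match an empty run, so it never consumes a character that isn't a separator; in this single-mark replacement both quantifiers give the same result.

```python
import re

STAR = r"⋮[\s:,]*"  # zero or more separators after the mark
PLUS = r"⋮[\s:,]+"  # one or more separators after the mark

for s in ["⋮: The Imperial March", "⋮Credo"]:
    # With no separator after the mark, STAR matches just "⋮" (an empty run)
    # and PLUS does not match at all; either way the text is unchanged
    print(re.sub(STAR, "⋮", s), "|", re.sub(PLUS, "⋮", s))
```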

As for getting the contents after the colon with $RxSplit(<Title>,"(: )",2), which only displays ":":

$RxSplit(<Title>,"(: )",3) will give you what you need. It seems that, for whatever reason, MB counts the split pattern as part of the split series. This may be a bug, though.
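This may be documented behavior rather than a bug: the .NET documentation for Regex.Split states that text captured by parentheses in the pattern is included in the resulting array, and Python's re.split does the same. A minimal Python sketch, assuming $RxSplit wraps Regex.Split and counts from 1 (which would make index 2 the separator and index 3 "Credo", as reported):

```python
import re

title = "Misa criolla: Credo"

# A capturing group in the split pattern puts the separator itself into the result
with_group = re.split(r"(: )", title)
# Without a group (or with a non-capturing one) only the pieces remain
without_group = re.split(r": ", title)
non_capturing = re.split(r"(?:: )", title)

print(with_group)     # ['Misa criolla', ': ', 'Credo']
print(without_group)  # ['Misa criolla', 'Credo']
print(non_capturing)  # ['Misa criolla', 'Credo']
```

If that assumption holds, dropping the parentheses, as in $RxSplit(<Title>,": ",2), should return "Credo"; this has not been tested in MusicBee here.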

hiccup

  • Sr. Member
  • ****
  • Posts: 7790
Something is 'off' with how MusicBee's regex engine handles this.
When I test:
 
Code
(^.+?)\:\s
with several regex testers on the string: Misa criolla: Credo
All of them return "Misa criolla: "
Only MusicBee returns: "Misa criollaCredo"

And it's probably not the /g switch that's responsible for this, since many regex testers (such as http://regexr.com/) have that one active by default too.

I am not saying something is wrong (I lack the insight and understanding of regex to state such), but I think it would be good if MusicBee's regex engine would behave more like the ones from such on- and offline regex testers.
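One possible reading of the difference: a tester reports the match, while $RxReplace returns the whole input with the match replaced. A Python sketch of both views (re as a stand-in for .NET, an assumption):

```python
import re

title = "Misa criolla: Credo"
pattern = r"(^.+?)\:\s"

# What a tester highlights: the match(es). The ^ anchor allows only one, /g or not.
matches = [m.group(0) for m in re.finditer(pattern, title)]
# What a replace returns: the whole input, with only the matched span substituted
replaced = re.sub(pattern, r"\1", title)

print(matches)   # ['Misa criolla: ']
print(replaced)  # Misa criollaCredo
```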

Steven

  • Administrator
  • Sr. Member
  • *****
  • Posts: 34313
Keep in mind MB is using the regex engine from .NET.
I have read somewhere before that it's not quite standard, and you should read the documentation on the Microsoft website to determine the expected behavior.

hiccup

  • Sr. Member
  • ****
  • Posts: 7790
Yes, I am aware of differences between Java, Perl, C++ etc., and I also tried out some testers that specifically (claim to) use .NET.

One of them is http://regexstorm.net/tester, but it gives the same result.

I also tried the offline tester 'Expresso' and set it to use Visual Basic.
Same result.

Do you perhaps have a suggestion for a regex tester that behaves as MusicBee's engine currently does?

Bee-liever

  • Member
  • Sr. Member
  • *****
  • Posts: 3831
  • MB Version: 3.6.8849 P
As Steven said, it's to do with the .NET version of regex and something called implicit and explicit capture groups (whatever they are??).
Try adding (?n) to the start of your regex.
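For reference: in .NET, the inline option (?n), equivalent to RegexOptions.ExplicitCapture, makes plain (...) groups non-capturing, so only named groups such as (?<work>...) still capture. Python's re module has no (?n) flag, so this approximation writes each unnamed group as (?:...) to show how the group numbering changes. It is not a reproduction of MusicBee's behavior.

```python
import re

title = "Misa criolla: Credo"

# Plain (...) groups are numbered 1, 2, 3 from left to right
capturing = re.match(r"(^.+?)(\:\s)(.*$)", title).groups()
# (?:...) groups still match but are not numbered, so "Credo" becomes group 1
non_capturing = re.match(r"(?:^.+?)(?:\:\s)(.*$)", title).groups()

print(capturing)      # ('Misa criolla', ': ', 'Credo')
print(non_capturing)  # ('Credo',)
```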
Last Edit: February 26, 2017, 09:11:53 AM by Bee-liever
MusicBee and my library - Making bee-utiful music together