Topic: Letters Frequency in Various Languages

boroda

  • Sr. Member
  • ****
  • Posts: 4595
UPDATED May 09, 2011

I'm not sure whether anyone needs it, but here is what I found for myself, and other translators may find it useful too: an Excel spreadsheet with letter frequencies in Russian, English, French, German, Spanish, Italian, Polish, Portuguese, Turkish and Dutch. Unfortunately, the frequency of spaces is included only for Russian, Polish and Swedish.

------------------------------------

MB (MusicBee) estimates the average character width of the font used for your language from the frequency of each letter in the language and the width of each character in that font. Letter frequencies are defined at the beginning of the .lng file, in the string with the id "#Main.msg.abefopv#", by repeating each character: the more often a letter is used, the more times it must be repeated. As far as I understand, spaces are also treated as a letter, so the width of the space character in the given font is taken into account as well.
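The weighting described above can be sketched in Python. This is only an illustration of the idea, assuming MB weights each character's width by the number of times the character is repeated in the string; `average_char_width` and the width values are hypothetical, and real widths would be measured from the font itself.

```python
from __future__ import annotations

from collections import Counter


def average_char_width(freq_string: str, widths: dict[str, float]) -> float:
    """Frequency-weighted mean width of the characters in freq_string.

    Each character is weighted by how many times it is repeated, so a letter
    that appears 6 times counts six times as much as a letter that appears once.
    Spaces are treated like any other character. Every character in
    freq_string must have an entry in widths (assumption of this sketch).
    """
    counts = Counter(freq_string)
    total_chars = sum(counts.values())
    total_width = sum(widths[ch] * n for ch, n in counts.items())
    return total_width / total_chars


# Hypothetical pixel widths for one font (not real measurements).
widths = {"e": 7.0, "a": 7.0, "t": 4.0, " ": 3.0}
print(average_char_width("eeeaatt  ", widths))
```

Frequent characters pull the average toward their own width, which is why the repetition counts matter more than the exact order of characters in the string.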

Correct use of the "#Main.msg.abefopv#" line can make your translation almost independent of the fonts used and helps to size UI elements correctly for localizations.

At least I hope that all this is true.

The spreadsheet, which can be downloaded from the link below, gives you the exact "#Main.msg.abefopv#" string, which you can simply copy and paste at the beginning of your .lng file. The spreadsheet has a scale factor that sets the average length of words (the average number of letters in a text) in your language relative to the same text in English. You may want to experiment with the scale factor. You can add new languages to the spreadsheet if you have data on letter frequencies for your language.
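A rough Python sketch of the kind of string the spreadsheet produces, assuming each character is repeated in proportion to its frequency and the total length is stretched by the scale factor relative to a reference English length. `build_freq_string`, `base_len` and the sample frequencies are hypothetical; the spreadsheet's real formulas may differ.

```python
from __future__ import annotations


def build_freq_string(freqs: dict[str, float], scale: float, base_len: int = 100) -> str:
    """Repeat each character in proportion to its frequency (in percent).

    base_len is an assumed length for the English reference string; scale
    stretches it to model a language whose text is longer than English.
    Counts are rounded to whole characters (Python's round() rounds halves
    to even), and every listed character is kept at least once.
    """
    parts = []
    for ch, pct in freqs.items():
        count = max(1, round(pct / 100 * base_len * scale))
        parts.append(ch * count)
    return "".join(parts)


# Made-up frequencies for a tiny three-character "alphabet" (space included).
sample = {"e": 50.0, "a": 30.0, " ": 20.0}
print(repr(build_freq_string(sample, scale=1.0, base_len=10)))
print(repr(build_freq_string(sample, scale=1.4, base_len=10)))
```

Multiplying every count by the same factor changes the total length of the string but, apart from rounding, not the proportions between characters.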

http://www.mediafire.com/download.php?nueey9i273j687b

Last Edit: February 28, 2013, 07:00:50 PM by boroda74

VX

  • Guest
boroda74, could you provide me with the exact "#Main.msg.abefopv#" string for Polish (scaling factors of 1,4 and 1,6)? I can't generate it with the spreadsheet ???

Bolshoe spasibo (thank you very much) :)

boroda

Quote from: VX
boroda74, could you provide me with the exact "#Main.msg.abefopv#" string for Polish (scaling factors of 1,4 and 1,6)? I can't generate it with the spreadsheet ???

Sure, see this text file. But why can't you generate it?

Quote from: VX
Bolshoe spasibo :)

:)

-----------

By the way, I think you should use a scaling factor of about 1.4 (1.6 is too big). I initially used 1.6 for Russian, but after a discussion with Steven I changed it to 1.4 and asked Steven to move and resize some controls.
Last Edit: May 25, 2011, 01:07:03 PM by boroda74

VX

Thank you very much for your help, boroda74 :) I tried to write in Russian, but apparently this forum doesn't recognize Russian letters (I got '???' for Polish-specific letters too :-[ ). I'd rather write in English than in Russian with Latin letters ;)

I'm using a scaling factor of 1,4 now and I'm OK with it so far. What about the 1,45 option you recently introduced for the Russian translation ??? Is it better for you ???

boroda

Quote from: VX
I'm using a scaling factor of 1,4 now and I'm OK with it so far. What about the 1,45 option you recently introduced for the Russian translation ??? Is it better for you ???

Actually, 1.4 is perfect for Russian (I could even use a smaller value), except for the tag editor window. And using a factor of 1.45 doesn't mean that this is the real factor (I think the real factor is about 1.41-1.42), because I can't use a fractional number of characters in the #Main.msg.abefopv# string. I guess it's pointless to ask Steven for very fine adjustments of controls before most translations are finished.
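The rounding effect described above can be illustrated with a small Python sketch. It assumes each character count is its frequency times a reference length times the scale factor, rounded to a whole number; the "realised" factor (total characters divided by the reference length) then generally differs from the nominal one. `realised_scale`, `base_len` and the three-letter frequencies are hypothetical.

```python
from __future__ import annotations


def realised_scale(freqs: dict[str, float], scale: float, base_len: int) -> float:
    """Ratio of the rounded string length to the reference length base_len.

    Each character count is rounded to a whole number (at least 1),
    so the result usually differs a little from the nominal scale.
    """
    counts = [max(1, round(pct / 100 * base_len * scale)) for pct in freqs.values()]
    return sum(counts) / base_len


# Made-up frequencies; with a short reference length the rounding error is large.
sample = {"a": 34.0, "b": 33.0, "c": 33.0}
print(realised_scale(sample, 1.45, 10))
```

Here a nominal 1.45 turns into 1.5, because every count is rounded up. The longer the reference string, the closer the realised factor tends to get to the nominal one, at the cost of a longer line in the .lng file.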
Last Edit: May 26, 2011, 04:11:51 PM by boroda74

jistme

This is interesting stuff, but I must admit I don't quite understand how exactly it should or can be used.

In my case (the Dutch translation), including Steven's prompt fixes for reported 'width problems', the result is currently quite satisfactory.
I still see some minor cases that could be improved slightly, but they are too minor to raise a ticket for.


One estimate of letter frequencies in Dutch gives the following:

E    18,91%    N    10,03%    A    7,49%    T    6,79%    I    6,50%    R    6,41%    O    6,06%
D    5,93%    S    3,73%    L    3,57%    G    3,40%    V    2,85%    H    2,38%    K    2,25%
M    2,21%    U    1,99%    B    1,58%    P    1,57%    W    1,52%    J    1,46%    Z    1,39%
C    1,24%    F    0,81%    X    0,04%    Y    0,03%    Q    0,01%

Looking at the "#Main.msg.abefopv#" string in the translation file I am using, I see, for example, the letter D 4 times and the letter L 4 times.
From the list above you might conclude that there should be maybe 6 D's and 4 L's in the .lng file.
(and 1 Q and 1891 E's ;-)
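The proportions mentioned above can be checked with a short Python sketch that turns the Dutch percentages into repeat counts, scaled so that the rarest letter (Q) appears exactly once. `DUTCH_FREQ` and `counts_relative_to_rarest` are illustrative names; the percentages are copied from the estimate quoted above.

```python
# Dutch letter frequencies in percent, copied from the estimate above.
DUTCH_FREQ = {
    "E": 18.91, "N": 10.03, "A": 7.49, "T": 6.79, "I": 6.50, "R": 6.41, "O": 6.06,
    "D": 5.93, "S": 3.73, "L": 3.57, "G": 3.40, "V": 2.85, "H": 2.38, "K": 2.25,
    "M": 2.21, "U": 1.99, "B": 1.58, "P": 1.57, "W": 1.52, "J": 1.46, "Z": 1.39,
    "C": 1.24, "F": 0.81, "X": 0.04, "Y": 0.03, "Q": 0.01,
}


def counts_relative_to_rarest(freqs):
    """Repeat count per letter, scaled so the rarest letter gets exactly 1."""
    rarest = min(freqs.values())
    return {letter: round(pct / rarest) for letter, pct in freqs.items()}


counts = counts_relative_to_rarest(DUTCH_FREQ)
print(counts["E"], counts["D"], counts["L"], counts["Q"])  # 1891 593 357 1
print(sum(counts.values()))  # 10015
print(round(DUTCH_FREQ["D"] / DUTCH_FREQ["L"], 2))  # 1.66
```

A literally proportional string would therefore be about 10,000 characters long. The D:L ratio of about 1.66 is what suggests roughly 6 or 7 D's for every 4 L's.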

My questions:

- Is this something I could safely change myself in the .lng file without adverse effects on the existing layout and spacing?
- Should I suggest these 'Dutch values' to Steven?
- Is it better not to touch this at all, now that the Dutch translation is already so far along?

Do I understand correctly that the scaling factor mentioned is a single factor in MusicBee that covers all languages, and not a per-language setting?

boroda

Quote from: jistme
My questions:

- Is this something I could safely change myself in the .lng file without adverse effects on the existing layout and spacing?
- Should I suggest these 'Dutch values' to Steven?
- Is it better not to touch this at all, now that the Dutch translation is already so far along?

- I can't guarantee that changing the #Main.msg.abefopv# string won't affect the layout. Be ready to fine-tune the translated strings or ask Steven to adjust the controls.
- You don't need to suggest anything to Steven, because you have full control over your .lng file.
- I think you should leave the #Main.msg.abefopv# string as is if you are currently satisfied with the MB layout in Dutch.

The exact content of the #Main.msg.abefopv# string only matters if users change the default fonts. With the default fonts, you won't gain anything from fine-tuning this string.

Quote from: jistme
Do I understand correctly that the scaling factor mentioned is a single factor in MusicBee that covers all languages, and not a per-language setting?

Yes.


jistme

FYI: I wanted to experiment a little with the supplied .xls file, but I get an error:


boroda

Strange, these files open fine on my PC. I've re-uploaded them in .xlsx format. Try downloading them again from the link on the first page.

jistme

Quote from: boroda
Strange, these files open fine on my PC. I've re-uploaded them in .xlsx format. Try downloading them again from the link on the first page.

I still get an error message:



By the way, it's .xlsm, not .xlsx, but I suppose that's correct since it contains macros?


boroda

jistme, I prepared a Unicode text file for Dutch with scaling factors 1.0, 1.1, 1.2, 1.3 and 1.4. Just copy one of the 5 strings and use it to replace the one at the beginning of your language file. You may want to experiment with different scaling factors. Feel free to ask me for a file with other scaling factors (e.g. 1.15 or 1.25).

http://www.mediafire.com/view/?8rbtqmtz8ook637

jistme

  • Guest
Quote from: boroda
jistme, I prepared a Unicode text file for Dutch with scaling factors 1.0, 1.1, 1.2, 1.3 and 1.4. Just copy one of the 5 strings and use it to replace the one at the beginning of your language file. You may want to experiment with different scaling factors. Feel free to ask me for a file with other scaling factors (e.g. 1.15 or 1.25).

http://www.mediafire.com/view/?8rbtqmtz8ook637

Wow, great service. I'm going to try those out.

Very off-topic:
Do you have any genre suggestions for this?
http://getmusicbee.com/forum/index.php?topic=8987.msg52612#msg52612
(I have a soft spot for some Russian films and music, and I'd be interested in any input you might have on this.)

boroda

@jistme
I've read these files. Everything seems to be correct. I'd like to point out only one thing: there is a genre that is very popular in Russia and specific to Russia (along with Folk/Russia): 'Bards'. We also often call it 'Author's song'.

jistme

Quote from: boroda
@jistme
I'd like to point out only one thing: there is a genre that is very popular in Russia and specific to Russia (along with Folk/Russia): 'Bards'. We also often call it 'Author's song'.


Thanks. I already had 'Bard', but I wasn't sure about it, so I put it in the 'keywords' list.
I will move it to the 'genre' list.