
Charset problem in folder names in UMS.conf [SOLVED]

Posted: Tue Nov 24, 2020 5:32 am
by Sxilderik
Hello,
In my ~/.config/UMS/UMS.conf, I wrote:

Code:

folders = /srv/video/séries
When I launch UMS, I see this

Code:

INFO  18:19:14.794 [main] Checking shared folder: "/srv/video/séries"
WARN  18:19:14.794 [main] "/srv/video/séries" does not exist. Please remove it from your shared folders list on the "Contenu partagé" tab or in the configuration file.
The UMS.conf file is obviously encoded in UTF-8, which seems to be a problem for UMS…

And when I open UMS.conf again, I see that it has been rewritten with each UTF-8 byte of é escaped as a separate character:

Code:

folders = /srv/video/s\u00C3\u00A9ries
So I went ahead and wrote

Code:

folders = /srv/video/s\u00E9ries
instead, and that worked. (\u00E9 is the Unicode escape for é, code point U+00E9.)
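
For paths with several accented characters, writing the escapes by hand gets tedious. Below is a minimal sketch of a helper that does the conversion; the class and method names (UnicodeEscaper, escapeNonAscii) are made up for illustration and are not part of UMS. It only escapes characters outside printable ASCII and does not handle other properties-file special characters such as backslashes.

```java
class UnicodeEscaper {
    // Replaces every char outside printable ASCII (0x20..0x7E) with a
    // backslash-u escape of its UTF-16 code unit, as properties files expect.
    static String escapeNonAscii(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the ASCII-only form of the path, ready to paste after "folders = ".
        System.out.println(escapeNonAscii("/srv/video/s\u00E9ries"));
    }
}
```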

Is this a bug? Is there some setting I’m missing?

Re: Charset problem in folder names in UMS.conf [SOLVED]

Posted: Tue Nov 24, 2020 5:43 am
by Nadahar
This isn't a bug, although one could argue it should be. It comes from Java and has been like this "since the beginning of time". The standard encoding for properties files was chosen before UTF-8 became "the de facto standard", and it has never been changed for backwards-compatibility reasons. So UMS doesn't read the file as UTF-8, but as Latin-1 (ISO 8859-1).

UMS merely uses standard Java properties files, which means that these rules apply. It would obviously be possible to build a custom configuration file system in Java that handles UTF-8, but that would require a lot more work.
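
The effect described here can be reproduced with plain java.util.Properties. This is only a sketch of the underlying Java behaviour, not of UMS's own configuration code; the class name PropertiesCharsetDemo and the helper loadFolders are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesCharsetDemo {
    // Parses the given file content with Properties.load(InputStream),
    // which always decodes the bytes as ISO 8859-1 (Latin-1).
    static String loadFolders(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("folders");
    }

    public static void main(String[] args) {
        // The file as saved by a UTF-8 editor: U+00E9 becomes the two bytes C3 A9.
        byte[] utf8File = "folders = /srv/video/s\u00E9ries\n".getBytes(StandardCharsets.UTF_8);
        // The same entry written in pure ASCII with a Unicode escape.
        byte[] escapedFile = "folders = /srv/video/s\\u00E9ries\n".getBytes(StandardCharsets.US_ASCII);

        String misread = loadFolders(utf8File);
        String correct = loadFolders(escapedFile);

        // Each UTF-8 byte was turned into its own character (U+00C3, U+00A9).
        System.out.println(misread.equals("/srv/video/s\u00C3\u00A9ries")); // true
        // The escape is decoded to the intended single character.
        System.out.println(correct.equals("/srv/video/s\u00E9ries"));       // true
    }
}
```

The misread value is exactly the s\u00C3\u00A9ries that showed up in the rewritten UMS.conf.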

Re: Charset problem in folder names in UMS.conf [SOLVED]

Posted: Tue Nov 24, 2020 5:43 am
by Sxilderik
A follow-up on my other topic shed light on this.

The .conf file uses the Java properties file format and charset (Latin-1), so using a \uXXXX escape sequence (UCN) is one way of solving this.

I wonder whether editing the file directly in Latin-1 charset would be another way…
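
At the level of plain java.util.Properties, at least, a Latin-1 encoded file does load correctly, as the sketch below suggests. Whether UMS's own configuration layer behaves identically is an assumption not confirmed in this thread, and the class name Latin1FileDemo is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Latin1FileDemo {
    // Parses the given file content with Properties.load(InputStream),
    // which decodes the bytes as ISO 8859-1 (Latin-1).
    static String loadFolders(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("folders");
    }

    public static void main(String[] args) {
        // Simulates a file saved by an editor set to Latin-1:
        // U+00E9 is stored as the single byte E9.
        byte[] latin1File = "folders = /srv/video/s\u00E9ries\n".getBytes(StandardCharsets.ISO_8859_1);

        String value = loadFolders(latin1File);
        System.out.println(value.equals("/srv/video/s\u00E9ries")); // true
    }
}
```

The drawback is that Latin-1 cannot represent characters outside its 256 code points, so the escape form remains the more general option.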

Re: Charset problem in folder names in UMS.conf [SOLVED]

Posted: Tue Nov 24, 2020 5:51 am
by Sxilderik
Nadahar wrote: Tue Nov 24, 2020 5:43 am This isn't a bug, although one could argue it should be. It comes from Java and has been like this "since the beginning of time". The standard encoding for properties files was chosen before UTF-8 became "the de facto standard", and it has never been changed for backwards-compatibility reasons. So UMS doesn't read the file as UTF-8, but as Latin-1 (ISO 8859-1).

UMS merely uses standard Java properties files, which means that these rules apply. It would obviously be possible to build a custom configuration file system in Java that handles UTF-8, but that would require a lot more work.
Thanks for the clear explanation. As a Java developer myself, I came across that… glitch too. I chose to handle reading and writing of properties files myself, which is not really “a lot of work”, but it’s demoralizing to have to rewrite what should have been done right in the first place, or at least amended so that an extra Charset parameter could be provided.
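
For code that uses java.util.Properties directly, a lighter option than a hand-written parser may be Properties.load(Reader), available since Java 6, where the caller chooses the charset through the Reader. The sketch below assumes direct use of Properties and says nothing about how UMS loads its configuration; the names Utf8PropertiesDemo and loadUtf8 are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesDemo {
    // Decodes the bytes as UTF-8 first, then lets Properties parse the characters.
    static Properties loadUtf8(byte[] fileBytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Simulates a UMS.conf-style file saved as UTF-8.
        byte[] utf8File = "folders = /srv/video/s\u00E9ries\n".getBytes(StandardCharsets.UTF_8);

        String value = loadUtf8(utf8File).getProperty("folders");
        System.out.println(value.equals("/srv/video/s\u00E9ries")); // true
    }
}
```

Backslash-u escapes are still decoded when loading through a Reader, so files written with the escape workaround keep working.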

Re: Charset problem in folder names in UMS.conf [SOLVED]

Posted: Tue Nov 24, 2020 6:34 am
by Nadahar
"A lot" isn't very specific, but in UMS' case we're talking about changing how Apache Commons Configuration works. I've actually done this to make it accept UTF-8 (and to make it understand double quotes, so that what's enclosed in double quotes doesn't need to be escaped and isn't trimmed, etc.). I consider this to be a relatively big job.