Charset problem in folder names in UMS.conf [SOLVED]

Sxilderik
Posts: 7
Joined: Mon Nov 23, 2020 11:58 pm

Post by Sxilderik »

Hello,
In my ~/.config/UMS/UMS.conf, I wrote

Code:

folders = /srv/video/séries

When I launch UMS, I see this

Code:

INFO  18:19:14.794 [main] Checking shared folder: "/srv/video/séries"
WARN  18:19:14.794 [main] "/srv/video/séries" does not exist. Please remove it from your shared folders list on the "Contenu partagé" tab or in the configuration file.

The UMS.conf file is obviously encoded in UTF-8, which seems to be a problem for UMS…

And when I open the UMS.conf file again, I see the line has been rewritten with the two UTF-8 bytes of é (C3 A9) escaped as two separate characters

Code:

folders = /srv/video/s\u00C3\u00A9ries

So I went ahead and wrote

Code:

folders = /srv/video/s\u00E9ries

instead, and that worked. (\u00E9 is the Unicode escape for é)

Is this a bug? Is there some setting I’m missing?
Nadahar
Posts: 1990
Joined: Tue Jun 09, 2015 5:57 pm

Post by Nadahar »

This isn't a bug, although one could argue it should be. It comes with Java and has been like this "since the beginning of time". They chose the standard encoding for the properties files before UTF-8 was "the de facto standard", and because of backwards compatibility they have never changed it. So, UMS doesn't read the file as UTF-8, but as Latin-1.

UMS merely uses standard Java properties files, so these rules apply. It would obviously be possible to build a custom configuration file system in Java that also handles UTF-8, but that would require a lot more work.
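
To illustrate the point about Latin-1, here is a minimal sketch using plain java.util.Properties. It is not UMS's actual loading code, and the class and method names (PropertiesCharsetDemo, loadFolders, escapeNonAscii) are made up for the example. It loads the same "folders" line twice: once as bytes saved by a UTF-8 editor, and once as a pure-ASCII line containing the escape sequence.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesCharsetDemo {

    // Parses raw file bytes with Properties.load(InputStream),
    // which always decodes the bytes as ISO 8859-1 (Latin-1).
    static String loadFolders(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("folders");
    }

    // Renders non-ASCII characters as escape sequences so the printed
    // result does not depend on the console charset.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The line as a UTF-8 editor saves it: the accented e becomes bytes C3 A9.
        byte[] utf8File = "folders = /srv/video/s\u00E9ries\n"
                .getBytes(StandardCharsets.UTF_8);
        // The workaround: a pure-ASCII file holding the six-character escape.
        byte[] escapedFile = "folders = /srv/video/s\\u00E9ries\n"
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println(escapeNonAscii(loadFolders(utf8File)));
        System.out.println(escapeNonAscii(loadFolders(escapedFile)));
    }
}
```

The first line printed is /srv/video/s\u00C3\u00A9ries (two wrong characters, matching what ended up in the rewritten UMS.conf), and the second is /srv/video/s\u00E9ries (the intended single character).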
Sxilderik
Post by Sxilderik »

A follow-up on my other topic shed light on this.

The .conf file uses the Java properties file format and charset (Latin-1, i.e. ISO 8859-1), so writing the character as a \uXXXX escape sequence is one way of solving this.

I wonder whether editing the file directly in Latin-1 charset would be another way…
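
A quick way to check that idea against plain java.util.Properties is sketched below. This only shows what the standard Java loader does with a Latin-1 encoded line; whether UMS's own configuration layer behaves identically, or keeps the file in that form when it saves it again, is an assumption not verified here. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class LatinOneFileDemo {

    // Loads the "folders" key from raw file bytes using the
    // InputStream overload, which decodes as ISO 8859-1.
    static String loadFolders(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("folders");
    }

    public static void main(String[] args) {
        // The line as an editor set to Latin-1 would save it:
        // the accented e is stored as the single byte E9.
        byte[] latin1File = "folders = /srv/video/s\u00E9ries\n"
                .getBytes(StandardCharsets.ISO_8859_1);

        String folder = loadFolders(latin1File);
        // true: the single byte is decoded back to the intended character
        System.out.println(folder.equals("/srv/video/s\u00E9ries"));
    }
}
```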
Sxilderik
Post by Sxilderik »

Nadahar wrote: Tue Nov 24, 2020 5:43 am This isn't a bug, although one could argue it should be. It comes with Java and has been like this "since the beginning of time". They chose the standard encoding for the properties files before UTF-8 was "the de facto standard", and because of backwards compatibility they have never changed it. So, UMS doesn't read the file as UTF-8, but as Latin-1.

UMS merely uses standard Java properties files, so these rules apply. It would obviously be possible to build a custom configuration file system in Java that also handles UTF-8, but that would require a lot more work.

Thanks for the clear explanation. As a Java developer myself, I came across that… glitch too. I chose to handle reading and writing of property files myself, which is not really “a lot of work”, but it is demoralizing to have to rewrite what should have been written right in the first place, or amended so that an extra Charset parameter could be provided.
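
For reference, one way to get charset-aware reading and writing with only the standard library is the Reader/Writer overloads of java.util.Properties (available since Java 6), where the caller picks the charset. The sketch below is an illustration under that assumption, not the code mentioned in this post and not what UMS does; the class and method names (Utf8PropertiesSketch, loadUtf8, storeUtf8) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8PropertiesSketch {

    // Reads properties from bytes, decoding them as UTF-8 via a Reader.
    static Properties loadUtf8(byte[] fileBytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Writes properties as UTF-8 bytes. With a Writer, non-ASCII
    // characters are written as-is instead of being escaped.
    static byte[] storeUtf8(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            props.store(writer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8File = "folders = /srv/video/s\u00E9ries\n"
                .getBytes(StandardCharsets.UTF_8);

        Properties props = loadUtf8(utf8File);
        // true: the folder name is read with the intended character
        System.out.println(
                props.getProperty("folders").equals("/srv/video/s\u00E9ries"));

        // true: storing as UTF-8 and loading again gives the same properties
        System.out.println(loadUtf8(storeUtf8(props)).equals(props));
    }
}
```

In a real program the byte arrays would be replaced by file streams; they are used here only to keep the example self-contained.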
Nadahar
Post by Nadahar »

"A lot" isn't very specific, but in UMS' case we're talking about changing how Apache Commons Configuration works. I've actually done this to make it accept UTF-8 (and to make it understand double quotes, so that whatever is enclosed in double quotes doesn't need to be escaped and isn't trimmed, etc.). I consider this to be a relatively big job.