Gibberish with hebrew subtitles

ori300
Posts: 2
Joined: Thu May 07, 2020 6:52 am

Post by ori300 »

Hi,

I get gibberish when I try to watch movies with Hebrew subtitles on a Samsung QE65Q65R TV.
I've uploaded the logs.
I get these warnings:

WARN 2020-05-06 21:12:47.639 [HTTPv2 Request Worker 6] Detected charset "ISO-8859-8-I" in file "E:\Movies and TV Shows\Movies\Us.2019.1080p.BluRay.x264-GECKOS\us.2019.1080p.bluray.x264-geckos\us.2019.1080p.bluray.x264-geckos.srt", but cannot use it because it's not supported by the Java Virual Machine

WARN 2020-05-06 21:12:47.639 [HTTPv2 Request Worker 6] Could not detect character encoding for file "E:\Movies and TV Shows\Movies\Us.2019.1080p.BluRay.x264-GECKOS\us.2019.1080p.bluray.x264-geckos\us.2019.1080p.bluray.x264-geckos.srt"; using the default charset "UTF-8"

Thanks.
Attachments: ums_dbg2.zip (48.79 KiB)
Nadahar
Posts: 1990
Joined: Tue Jun 09, 2015 5:57 pm

Re: Gibberish with hebrew subtitles

Post by Nadahar »

I doubt that you will ever get this to work. There is no proper standard for SubRip subtitles; the format has just evolved into a kind of de facto standard. Since it's not a proper standard, there is no character set handling involved at all, so what UMS tries to do is an "analysis" of the file content to come up with the most probable encoding. It generally doesn't work very well, but there's no other way to handle it automatically. That said, it doesn't matter much whether UMS detects it correctly or not if the subtitles are sent to the TV, because UMS will just send the file as-is and it's up to the TV to handle all this.

Since SubRip files carry no information about their encoding, the TV faces the same problem. It probably doesn't bother to try to "guesstimate" the encoding like UMS does, so my guess is that it pretty much assumes UTF-8. So, you could try to convert the subtitle files to UTF-8 using some tool that does that.

The next problem is the font the TV uses. These often aren't "full Unicode" fonts, so the font might not have the Hebrew glyphs at all. If that's the case, UTF-8 won't solve your problem, and in fact nothing will, because the TV isn't able to print the glyphs.

I would try to encode them as UTF-8 with BOM. That way, there should be little room for doubt for any software coming across the file, as it is clearly marked. Some software can't handle the BOM and will get confused by it, though, so if the version with BOM fails you should probably try without BOM too. But, without BOM, there is no information in the file itself that identifies the encoding, so there's always a risk that something will misinterpret it as something else.
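To illustrate the "with BOM" versus "without BOM" choice, here is a minimal Python sketch. It assumes the source bytes really are windows-1255 (Python's `cp1255` codec); the function names `reencode_subtitle_bytes` and `has_utf8_bom` are made up for this example and are not part of UMS or any tool mentioned in this thread.

```python
import codecs


def reencode_subtitle_bytes(data: bytes,
                            source_encoding: str = "cp1255",
                            with_bom: bool = True) -> bytes:
    """Re-encode subtitle bytes as UTF-8, optionally prefixed with a BOM.

    Assumption: `data` really is in `source_encoding`. If it is not,
    decode() may raise UnicodeDecodeError or silently produce wrong text.
    """
    text = data.decode(source_encoding)
    # Python's "utf-8-sig" codec writes the 3-byte BOM (EF BB BF) first;
    # plain "utf-8" writes no marker at all.
    target = "utf-8-sig" if with_bom else "utf-8"
    return text.encode(target)


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)
```

If the TV chokes on the output produced with `with_bom=True`, the same call with `with_bom=False` gives the unmarked variant to try next.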
ori300

Re: Gibberish with hebrew subtitles

Post by ori300 »

Thanks.
I've converted the encoding to UTF-8, and everything works.
I see the subtitles in Hebrew, and in the "Options" menu on the TV I even see "Subtitles settings", where I can see that Hebrew encoding is supported and can actually change the encoding.
When I play the movie with subtitles encoded as windows-1255, there is no "Subtitles settings" option on the TV, so it seems that UMS is not transferring the subtitles to the TV; maybe it is transcoding them or something.
So instead of converting all SRT files to UTF-8, is there maybe a way to let UMS or the "Java Virtual Machine" recognize the windows-1255 encoding?
Nadahar

Re: Gibberish with hebrew subtitles

Post by Nadahar »

The log isn't made in "trace mode", so it's missing all the important information. As a consequence, I can't tell what really happens and if UMS ends up sending the subtitles or not, but I see a lot of disconnects from the TV's side, which is what you'd expect to see if the TV rejects something that UMS offers.

I would say that UTF-8 is the only sensible thing to use anyway; using "code pages" is a thing of the '90s. I really can't understand the motivation of whoever makes these subtitles to make them in Windows-1255. I'm sure you can find some kind of bulk conversion utility if you look online. A program to do that would only need a few lines of code, depending on the circumstances (language, platform etc). The main work would be to read and write the files, not the conversion ;)
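As a rough idea of what such a bulk converter could look like, here is a Python sketch. It assumes every `.srt` file that isn't already valid UTF-8 is windows-1255, which is only a guess for a mixed collection, and it overwrites files in place, so try it on a copy first. The name `convert_srt_tree` is made up for this example.

```python
from pathlib import Path


def convert_srt_tree(root, source_encoding="cp1255"):
    """Convert every .srt file under `root` to UTF-8 (without BOM), in place.

    Files that already decode cleanly as UTF-8 are left untouched.
    Returns the list of paths that were rewritten.
    """
    converted = []
    for path in sorted(Path(root).rglob("*.srt")):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (or plain ASCII), skip it
        except UnicodeDecodeError:
            pass
        # Assumption: anything that is not UTF-8 is in `source_encoding`.
        # decode() raises UnicodeDecodeError on bytes the codec doesn't define.
        text = raw.decode(source_encoding)
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

Calling `convert_srt_tree` on the movie folder returns the files it rewrote, which makes it easy to spot anything that was skipped. As said, most of the code is reading and writing; the conversion itself is two calls.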

When it comes to character encoding support in the JVM, it all depends on the installed Java. But the later versions of UMS "force" you to use a bundled JVM, so there's not much you can do about it. There's not really anything one can do from the Java code side of things either when the encoding isn't supported by the JVM. You could go back some UMS versions to before they dropped the "non-bundled version", and install your own Java. Just remember to install Java 8. By doing that you can make sure that you install a version that supports the encoding you need, although I don't know if you will find such a Java installation. I can't really remember having seen different versions with different charset support, but it could be that I just never noticed.