UMS and Special Characters in Subtitles

Posted: Thu Aug 23, 2018 2:56 pm
by Madoka
This is not a bug per se, more like UMS chooses poorly(?)

In the past I have left the non-Unicode subtitle encoding detection set to Auto-detect. I don't know which encoding it picks, but with auto-detection, curly quotes, en and em dashes, and ellipses are not displayed correctly. For example, it’s is rendered as it’s. Manually setting the encoding to Windows-1252 solves these problems. I'm not sure whether UMS needs to improve the detection or not; I'm just reporting what I have noticed.

Re: UMS and Special Characters in Subtitles

Posted: Fri Aug 24, 2018 6:37 am
by Nadahar
Character set detection is very tricky, especially for short texts. UMS uses ICU4J for this. I don't think we would be able to "improve" the detection; some of the character sets are very similar, and some texts simply don't contain words that reveal the difference.
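
To illustrate why similar character sets are hard to tell apart, here is a small Python sketch. It is not UMS or ICU4J code, only a demonstration using Python's standard codecs: the same bytes decode without error as both Windows-1252 and ISO-8859-1, but only one of them yields the curly apostrophe.

```python
# Illustration only: this is not how UMS or ICU4J detects encodings.
# "it's" with a curly apostrophe (U+2019), stored as Windows-1252 bytes.
raw = "it\u2019s".encode("cp1252")   # the apostrophe becomes the single byte 0x92

# Both decodes succeed, so the bytes alone do not say which one is right.
as_cp1252 = raw.decode("cp1252")     # 0x92 -> U+2019, the intended apostrophe
as_latin1 = raw.decode("latin-1")    # 0x92 -> U+0092, an invisible C1 control character

print(repr(as_cp1252))
print(repr(as_latin1))
```

A detector has to guess between such candidates from statistics of the surrounding text, which is unreliable when a subtitle file is short or mostly plain ASCII.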

What you should do instead is use UTF-8. Converting the subtitle files to UTF-8 is easy, for example with Notepad++.
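
If there are many files, the conversion can also be scripted. The following is a minimal Python sketch, assuming the subtitles really are Windows-1252; the function names and the file names in the usage note are made up for illustration and have nothing to do with UMS itself.

```python
from pathlib import Path


def reencode(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the data contains bytes that are undefined
    in the source encoding, which hints that the assumption was wrong.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of a subtitle file.

    Works on raw bytes so that line endings are preserved as they are.
    """
    dst.write_bytes(reencode(src.read_bytes(), source_encoding))
```

Usage would look like `convert_file(Path("movie.srt"), Path("movie.utf8.srt"))`. Writing to a new file rather than overwriting keeps the original around in case the assumed source encoding turns out to be wrong.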

Re: UMS and Special Characters in Subtitles

Posted: Fri Aug 24, 2018 8:44 am
by Madoka
Thank you for your explanation! Much appreciated.