.srt character encoding troubles

Nadahar
Posts: 1990
Joined: Tue Jun 09, 2015 5:57 pm

Re: .srt character encoding troubles

Post by Nadahar »

Thanks. Regarding the language, I noticed that the attached files have names ending in "_rus". UMS won't pick that up as the language; you must use a dot so that the names end with ".rus.srt".
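A minimal sketch of the renaming described above, assuming the files follow the "_rus.srt" pattern from this thread. The helper name `dotted_language_name` is hypothetical and not part of UMS; it only computes the new name and leaves the actual renaming to you.

```python
from pathlib import Path


def dotted_language_name(filename: str, lang: str = "rus") -> str:
    """Turn 'movie_rus.srt' into 'movie.rus.srt'; leave other names unchanged.

    Hypothetical helper for illustration only, not a UMS function.
    """
    path = Path(filename)
    suffix = f"_{lang}"
    if path.suffix.lower() == ".srt" and path.stem.endswith(suffix):
        base = path.stem[: -len(suffix)]
        return str(path.with_name(f"{base}.{lang}.srt"))
    return filename


print(dotted_language_name("tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt"))
# tvs-got-dd51-dl-18p-azhd-avc-803.rus.srt
```

To rename a file on disk, the result could be passed to `Path(old_name).rename(new_name)`.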

I'll continue to look into the encoding problems.
Last edited by Nadahar on Sat May 25, 2019 4:00 am, edited 1 time in total.
Nadahar

Re: .srt character encoding troubles

Post by Nadahar »

All the SRT files have a UTF-8 BOM, even though none of them seems to actually be UTF-8. Have you "converted" these in Notepad++ or similar? If so, do you have the original files, as you downloaded them? The only one whose actual charset I've been able to find is the "working" one, which is encoded as Windows codepage 1251. Here is the same file, properly converted to UTF-8.
Attachments
working_tvs-got-dd51-dl-18p-azhd-avc-801_UTF-8.rus.srt.zip
(14.36 KiB) Downloaded 482 times
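The mismatch described in this post (a UTF-8 BOM in front of bytes that are not UTF-8) can be checked with a few lines of Python. This is only a sketch: the function name `bom_and_utf8_validity` is made up for illustration, and the two sample byte strings are constructed here, not taken from the attached files.

```python
import codecs
from typing import Tuple


def bom_and_utf8_validity(data: bytes) -> Tuple[bool, bool]:
    """Return (has_utf8_bom, rest_is_valid_utf8) for raw file contents."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        body.decode("utf-8", errors="strict")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return has_bom, valid


# Constructed samples: a real UTF-8 file with a BOM, and Windows-1251
# bytes that merely had a UTF-8 BOM put in front of them.
good = codecs.BOM_UTF8 + "Привет".encode("utf-8")
bad = codecs.BOM_UTF8 + "Привет".encode("cp1251")

print(bom_and_utf8_validity(good))  # (True, True)
print(bom_and_utf8_validity(bad))   # (True, False)
```

For a real file, the bytes would come from `Path(name).read_bytes()`.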
Nadahar

Re: .srt character encoding troubles

Post by Nadahar »

Your log shows that the 803 file is detected as UTF-8 (because of the UTF-8 BOM, although it probably would be anyway):


TRACE 2019-05-24 15:26:21.273 [HTTPv2 Request Worker 6] Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: und, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt, external file character set: UTF-8: {}
The problem is that it's not valid UTF-8. The same happens for 804:


Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: und, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-804_rus.srt, external file character set: UTF-8: {}
801 on the other hand is correctly detected:


TRACE 2019-05-24 15:26:21.074 [HTTPv2 Request Worker 6] Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: ru, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-801_rus.srt, external file character set: WINDOWS-1251: {}
This explains why this file is working properly.

Here is what FFmpeg reports when starting the transcoding of 803:


DEBUG 2019-05-24 15:27:04.292 [ffmpeg64.exe-10-2] [Parsed_subtitles_0 @ 00000000022a0f00] Unable to open Z:/Video/TV/Series/Game of Thrones/S08/TVS/tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt
DEBUG 2019-05-24 15:27:04.292 [ffmpeg64.exe-10-2] [AVFilterGraph @ 00000000023cf260] Error initializing filter 'subtitles' with args 'Z\:/Video/TV/Series/Game of Thrones/S08/TVS/tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt'
DEBUG 2019-05-24 15:27:04.293 [ffmpeg64.exe-10-2] Error initializing complex filters.
DEBUG 2019-05-24 15:27:04.293 [ffmpeg64.exe-10-2] Invalid data found when processing input
This is without any doubt caused by the fact that it's identified as UTF-8 but doesn't contain valid UTF-8 characters. My guess is that FFmpeg simply disregards the subtitles file as a result, and transcodes without subtitles.

If you attach the original files, as you downloaded them, I can try to see if I can figure out the original encoding.
mikeaoller
Posts: 31
Joined: Sat Sep 22, 2012 9:09 am

Re: .srt character encoding troubles

Post by mikeaoller »

Nadahar wrote: Sat May 25, 2019 3:51 am If you attach the original files, as you downloaded them, I can try to see if I can figure out the original encoding.
Hi Nadahar,

thank you very much for your help! All the files I uploaded are the original downloads, I didn't change anything.
Nadahar

Re: .srt character encoding troubles

Post by Nadahar »

Even if you only open them in Notepad++ and change the encoding, Notepad++ will (in some cases) save the file. If you didn't do this, I have no other explanation than that the files are already bad at the download location.

Are you able to get the files to show the correct Russian characters in any text editor, word processor or similar on your PC? If not, we can't expect UMS/FFmpeg to figure them out either. I was only able to get the "working" subtitles file to display properly as a text file, and when I look at the "bad" ones with a hex editor, they don't look like they can be valid to me. It seems to me that each character takes 16 bits, which would mean UTF-8 or UTF-16, but neither gives meaningful output. There might be another 16-bit encoding that I don't know about, but since I've also run them through several "online charset detectors", I doubt that's the case.
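One way to do the "does it display properly" check outside an editor is to decode the raw bytes with a handful of candidate charsets and look at the results yourself. The sketch below assumes a candidate list chosen for Russian text; both the list and the function name `preview_decodings` are illustrative assumptions. It doesn't try to pick a winner automatically, because 8-bit Cyrillic codepages usually decode each other's bytes without errors and just produce different (wrong) letters.

```python
from typing import Dict, Optional, Sequence

# Assumed candidate list for Russian subtitles; extend as needed.
CANDIDATES = ("utf-8", "utf-16", "cp1251", "koi8-r", "cp866", "iso-8859-5")


def preview_decodings(
    data: bytes,
    candidates: Sequence[str] = CANDIDATES,
    length: int = 60,
) -> Dict[str, Optional[str]]:
    """Decode `data` with each candidate charset.

    Maps each charset name to the first `length` decoded characters,
    or to None if the bytes are not valid in that charset.
    """
    previews: Dict[str, Optional[str]] = {}
    for enc in candidates:
        try:
            previews[enc] = data.decode(enc)[:length]
        except UnicodeDecodeError:
            previews[enc] = None
    return previews


# Constructed sample, not one of the attached files.
sample = "Зима близко".encode("cp1251")
for enc, text in preview_decodings(sample).items():
    print(f"{enc:>10}: {text!r}")
```

If none of the previews reads as Russian, that supports the suspicion that the file itself is damaged.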
Nadahar

Re: .srt character encoding troubles

Post by Nadahar »

Could you post a URL for one of the non-working subtitles downloads? If you can't post it publicly, please send me a PM.
mikeaoller

Re: .srt character encoding troubles

Post by mikeaoller »

https://www.opensubtitles.org

Game of Thrones > season 8 > Russian only > Select episode > Download the most downloaded srt.
Madoka
Posts: 328
Joined: Fri Jun 01, 2012 12:51 pm

Re: .srt character encoding troubles

Post by Madoka »

OpenSubtitles just tried to force install an extension on FF. Nice.


OpenSubtitles Ads by Subtitle Pro

OpenSubtitles Ads removes Google Ads and inserts our Ads instead.
Yeah, that's useful.
Nadahar

Re: .srt character encoding troubles

Post by Nadahar »

@Madoka I really hate it when they do things like this, but I guess everybody craves money :( Anyway, it happened to me too, and it doesn't seem like they actually tried to install it; they simply tried to open the page where you could install the plugin manually. However, the plugin isn't available, for me at least, so it seems like Firefox might have removed it. It's still a terrible move by opensubtitles.org.

@mikeaoller When trying to follow your instructions, I ended up with this file: https://www.opensubtitles.org/en/subtit ... g-night-ru

I downloaded it and checked and it's encoded as Windows 1251. The encoding is correctly identified and it works when I test it. I also used Notepad++ and converted it to UTF-8. This file was also correctly identified and played. I'll attach the two files I used in my test. It seems to me like this can't be the same file that you used.
Attachments
srt.zip
(9.01 KiB) Downloaded 475 times
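For reference, the Windows-1251 to UTF-8 conversion done in Notepad++ above can also be sketched in Python. This is not how the attached files were produced; the function name `cp1251_to_utf8` is hypothetical, and the sketch assumes the input really is Windows-1251 (decoding raises an error on the one byte value, 0x98, that the codepage leaves undefined).

```python
def cp1251_to_utf8(data: bytes, add_bom: bool = False) -> bytes:
    """Re-encode Windows-1251 bytes as UTF-8, optionally with a BOM."""
    text = data.decode("cp1251")  # assumption: the source is Windows-1251
    # "utf-8-sig" is Python's codec name for UTF-8 with a leading BOM.
    return text.encode("utf-8-sig" if add_bom else "utf-8")


# Constructed sample, not one of the attached files.
original = "Зима близко".encode("cp1251")
converted = cp1251_to_utf8(original)
print(converted.decode("utf-8"))  # Зима близко
```

Unlike just putting a BOM in front of the old bytes, this changes the bytes of every Cyrillic character, so the result is genuinely valid UTF-8.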
Nadahar

Re: .srt character encoding troubles

Post by Nadahar »

Hi spambot, it's not funny anymore now..