.srt character encoding troubles
Re: .srt character encoding troubles
Thanks. Regarding the language, I noticed that the attached files are named "_rus". That won't be picked up by UMS, you must use a dot so that they end with ".rus.srt".
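The rename described above can be sketched in a few lines of Python. This is only an illustration: the helper name `dotted_language_name` and the pattern (an underscore followed by a two- or three-letter code right before `.srt`) are assumptions made for the example, not something taken from UMS itself.

```python
import re

# Assumed pattern: "_<2-3 letter language code>.srt" at the end of the name.
_LANG_SUFFIX = re.compile(r"_([A-Za-z]{2,3})\.srt$")


def dotted_language_name(filename: str) -> str:
    """Return the name with "_rus.srt" style endings turned into ".rus.srt".

    Names that do not match the assumed pattern are returned unchanged.
    """
    return _LANG_SUFFIX.sub(r".\1.srt", filename)


if __name__ == "__main__":
    print(dotted_language_name("tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt"))
```

The function only computes the new name; actually renaming the file on disk (for example with `pathlib.Path.rename`) is left out so nothing is changed by accident.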
I'll continue to look into the encoding problems.
Last edited by Nadahar on Sat May 25, 2019 4:00 am, edited 1 time in total.
Re: .srt character encoding troubles
All the SRT files have a UTF-8 BOM, even though none seems to be UTF-8. Have you "converted" these in Notepad++ or similar? If so, do you have the original files, as you downloaded them? The only one I've been able to find the actual charset of is the "working" one, which is actually encoded as Windows codepage 1251. Here is the same file actually converted to UTF-8.
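For reference, a real conversion from Windows-1251 to UTF-8 could look roughly like the sketch below. It assumes the text really is Windows-1251 (as was the case for the "working" file) and that any UTF-8 BOM in front of it is bogus and should be dropped; the function names are made up for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the three UTF-8 BOM bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def reencode_cp1251_as_utf8(data: bytes) -> bytes:
    """Decode bytes as Windows-1251 and return them encoded as UTF-8 (no BOM).

    Assumption: the text itself is Windows-1251. A leading UTF-8 BOM is
    treated as a leftover from a bad "conversion" and stripped first.
    """
    if has_utf8_bom(data):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("cp1251").encode("utf-8")


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "Привет".encode("cp1251")
    print(has_utf8_bom(sample))
    print(reencode_cp1251_as_utf8(sample).decode("utf-8"))
```

If the source text is not actually Windows-1251, this produces readable-looking but wrong characters rather than an error, so the result still has to be checked by eye.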
- Attachments
-
- working_tvs-got-dd51-dl-18p-azhd-avc-801_UTF-8.rus.srt.zip
- (14.36 KiB) Downloaded 900 times
Re: .srt character encoding troubles
When reading your log, the 803 file is detected as UTF-8 (because of the UTF-8 BOM, but it probably would be anyway):
Code: Select all
TRACE 2019-05-24 15:26:21.273 [HTTPv2 Request Worker 6] Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: und, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt, external file character set: UTF-8: {}
The problem is that it's not valid UTF-8. The same happens for 804:
Code: Select all
Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: und, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-804_rus.srt, external file character set: UTF-8: {}
801, on the other hand, is correctly detected:
Code: Select all
TRACE 2019-05-24 15:26:21.074 [HTTPv2 Request Worker 6] Returning subtitles with priority External, type: SUBRIP, subtitles track title from metadata: _rus, lang: ru, externalFile: Z:\Video\TV\Series\Game of Thrones\S08\TVS\tvs-got-dd51-dl-18p-azhd-avc-801_rus.srt, external file character set: WINDOWS-1251: {}
This explains why this file is working properly.
Here is what FFmpeg reports when starting the transcoding of 803:
Code: Select all
DEBUG 2019-05-24 15:27:04.292 [ffmpeg64.exe-10-2] [Parsed_subtitles_0 @ 00000000022a0f00] Unable to open Z:/Video/TV/Series/Game of Thrones/S08/TVS/tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt
DEBUG 2019-05-24 15:27:04.292 [ffmpeg64.exe-10-2] [AVFilterGraph @ 00000000023cf260] Error initializing filter 'subtitles' with args 'Z\:/Video/TV/Series/Game of Thrones/S08/TVS/tvs-got-dd51-dl-18p-azhd-avc-803_rus.srt'
DEBUG 2019-05-24 15:27:04.293 [ffmpeg64.exe-10-2] Error initializing complex filters.
DEBUG 2019-05-24 15:27:04.293 [ffmpeg64.exe-10-2] Invalid data found when processing input
This is without any doubt caused by the fact that it's identified as UTF-8 but doesn't contain valid UTF-8 characters. My guess is that FFmpeg simply disregards the subtitles file as a result, and transcodes without subtitles.
If you attach the original files, as you downloaded them, I can try to see if I can figure out the original encoding.
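A quick way to confirm that a file carrying a UTF-8 BOM is not actually valid UTF-8 is to try a strict decode and look at where it fails. The sketch below uses only Python's built-in codec; `first_invalid_utf8_offset` is a name invented for this example, and it says nothing about how UMS or FFmpeg do their own checks.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    A leading UTF-8 BOM is itself valid UTF-8, so it does not count as an
    error; offsets are measured from the start of the data, BOM included.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad byte
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    # Read the file in binary mode so no decoding happens before the check.
    # The path is a placeholder for whichever subtitles file is being tested.
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            print(first_invalid_utf8_offset(handle.read()))
```

An offset only a few bytes into the first subtitle line suggests the whole text is in some other encoding, rather than a single corrupted character.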
Re: .srt character encoding troubles
If you only open them in Notepad++ and change the encoding, Notepad++ will (in some cases) save the file. If you didn't do this, then I have no other explanation than that these files are bad at the download location.
Are you able to get the files to show the correct Russian characters in any text editor, word processor or similar on your PC? If not, we can't expect UMS/FFmpeg to figure them out either. I was only able to get the "working" subtitles file to display properly as a text file, and when I look at the "bad" ones with a hex editor, they don't look like they can be valid to me. It seems to me like the characters are 16 bits each, which would mean UTF-8 or UTF-16, but neither of them gives meaningful output. There might be other 16-bit encodings that I don't know about, but as I've also run the files through several "online charset detectors", I doubt that's the case.
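The same kind of trial and error can be done in a small script: decode the bytes with a handful of candidate encodings and see how much of the result is Russian letters. This is a rough heuristic sketch, not a real charset detector. The candidate list and the function names are assumptions made for the example, and single-byte Cyrillic encodings such as Windows-1251 and KOI8-R can score the same, so the decoded text still needs to be read by a person.

```python
from typing import List, Tuple

# Candidate encodings to try; this list is an assumption for the example.
CANDIDATES = ["utf-8", "cp1251", "koi8_r", "cp866", "iso8859_5",
              "utf-16-le", "utf-16-be"]


def russian_letter_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Russian letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    russian = sum(1 for c in chars if "\u0410" <= c <= "\u044f" or c in "Ёё")
    return russian / len(chars)


def rank_encodings(data: bytes) -> List[Tuple[str, float]]:
    """Try each candidate and return (encoding, ratio) pairs, best first.

    Encodings that cannot decode the data at all are left out.
    """
    results = []
    for name in CANDIDATES:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue
        results.append((name, russian_letter_ratio(text)))
    # sorted() is stable, so ties keep the order of CANDIDATES.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_encodings("Привет".encode("utf-8")):
        print(name, round(ratio, 2))
```

If no candidate gives a high ratio, that supports the conclusion that the file is damaged rather than merely mislabelled.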
Re: .srt character encoding troubles
Could you post a URL for one of the non-working subtitles downloads? If you can't post it publicly, please send me a PM.
Re: .srt character encoding troubles
https://www.opensubtitles.org
Game of Thrones > season 8 > Russians only > Select episode > Download the most downloaded srt.
Re: .srt character encoding troubles
OpenSubtitles just tried to force install an extension on FF. Nice.
Code: Select all
OpenSubtitles Ads by Subtitle Pro
OpenSubtitles Ads removes Google Ads and inserts our Ads instead.
Yeah, that's useful.
Re: .srt character encoding troubles
@Makoka I really hate it when they do things like this, but I guess everybody craves money.
Anyway, it happened to me too, and it doesn't seem like they actually tried to install it; they simply tried to open the page where you could manually install the plugin. However, the plugin isn't available, for me at least, so it seems like Firefox might have removed it. It's still a terrible move by opensubtitles.org.
@mikeaoller When trying to follow your instructions, I ended up with this file: https://www.opensubtitles.org/en/subtit ... g-night-ru
I downloaded it and checked and it's encoded as Windows 1251. The encoding is correctly identified and it works when I test it. I also used Notepad++ and converted it to UTF-8. This file was also correctly identified and played. I'll attach the two files I used in my test. It seems to me like this can't be the same file that you used.

- Attachments
-
- srt.zip
- (9.01 KiB) Downloaded 932 times
Re: .srt character encoding troubles
Hi spambot, it's not funny anymore now..