How to Enable DVB and Image Subtitle Extraction

RBoy · June 30, 2017, 7:03pm

MCEBuddy can automatically convert image-based subtitles (e.g. DVB) into text subtitles (SRT) using OCR (Optical Character Recognition). However, this feature requires additional files to be downloaded, because OCR processing relies on very large data files (~1 GB in size) that cannot be bundled within the MCEBuddy setup file.

MCEBuddy Version 2.6.2 and newer

During installation, the option to automatically download and install the required OCR add-on files is enabled by default. You can disable it if you do not want the OCR add-on files downloaded and installed automatically.

If you did not select the automatic installation during setup, you can trigger an install or re-install of the OCR add-on files at any time by clicking the Install OCR add-on link on the Conversion task page. The color of the link shows the add-on status:

Install OCR add-on in grey means that the add-on files have been installed. Click the text to re-download and reinstall them (~500MB download).
Install OCR add-on in red means that the add-on files have not been installed yet. Click the text to download and install them (~500MB download).

Make sure that the Save subtitles or Embed subtitles option is enabled in Conversion task → Advanced settings so that subtitles are extracted and processed using OCR.

MCEBuddy Versions 2.4.7 to 2.6.1

You will need to manually download and install the OCR files as described below.

Follow the procedure below to enable OCR (image-to-text subtitle conversion) so that DVB and other image-based subtitles can be extracted from recordings.

Download the OCR files from https://github.com/tesseract-ocr/tessdata/archive/3.04.00.zip
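If you prefer to script this step, here is a minimal Python 3 sketch using only the standard library (`urllib.request`, `zipfile`). The helper names `download_tessdata` and `is_usable_zip` are hypothetical and not part of MCEBuddy. Besides downloading, it checks that the saved file is a non-empty, readable zip, since an empty or truncated download will not contain the traineddata files.

```python
import urllib.request
import zipfile
from pathlib import Path

# Archive URL taken from the step above (tessdata 3.04.00)
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata/archive/3.04.00.zip"


def is_usable_zip(path) -> bool:
    """Return True if path is a non-empty zip archive whose members pass a CRC check."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the first corrupt member's name, or None if all are intact
        return zf.testzip() is None


def download_tessdata(dest_dir, url: str = TESSDATA_URL) -> Path:
    """Hypothetical helper: download the archive into dest_dir and validate it."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / "tessdata-3.04.00.zip"
    urllib.request.urlretrieve(url, str(target))
    if not is_usable_zip(target):
        raise RuntimeError(f"{target} is empty or corrupt; try downloading it again")
    return target
```

For example, `download_tessdata(Path.home() / "Downloads")` would save the archive as `tessdata-3.04.00.zip` in your Downloads folder.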

Extract the contents of the zip file (tessdata-3.04.00.zip). This should create a folder called tessdata-3.04.00 containing 100+ traineddata files. Make sure there are no sub-folders; it should look like tessdata-3.04.00\<100+ traineddata files>
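The extraction and the "no sub-folders" check can also be done with Python's built-in `zipfile` module, which may be an alternative if Windows Explorer will not open the archive. This is a minimal sketch; `extract_tessdata` and `count_traineddata` are hypothetical helper names, and it assumes the archive unpacks to a top-level tessdata-3.04.00 folder as described above.

```python
import zipfile
from pathlib import Path


def extract_tessdata(zip_path, out_dir) -> Path:
    """Extract the archive into out_dir and return the tessdata-3.04.00 folder."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    folder = out_dir / "tessdata-3.04.00"
    if not folder.is_dir():
        raise FileNotFoundError(f"expected {folder} after extraction")
    # The folder must be flat: traineddata files directly inside, no sub-folders
    subdirs = sorted(p.name for p in folder.iterdir() if p.is_dir())
    if subdirs:
        raise ValueError(f"unexpected sub-folders in {folder}: {subdirs}")
    return folder


def count_traineddata(folder) -> int:
    """Count the *.traineddata files directly inside folder."""
    return sum(1 for p in Path(folder).glob("*.traineddata") if p.is_file())
```

A nested extraction (for example tessdata-3.04.00\tessdata-3.04.00\...) is reported as an error instead of being silently accepted.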

Move the folder tessdata-3.04.00 into the ccextractor directory where MCEBuddy is installed: <MCEBuddy installation directory>\ccextractor\
e.g. move tessdata-3.04.00 into C:\Program Files\MCEBuddy2x\ccextractor\

Rename the tessdata-3.04.00 directory to tessdata. Important: Do not miss this step or OCR won’t work.
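The move and the rename can be combined into one `shutil.move` call by giving the destination its final name. This is a minimal Python sketch, assuming the install path from the example above; `install_tessdata` is a hypothetical helper. Writing under C:\Program Files typically requires running the script from an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def install_tessdata(extracted, mcebuddy_dir) -> Path:
    """Move the extracted folder into <MCEBuddy dir>/ccextractor under the name 'tessdata'."""
    ccextractor_dir = Path(mcebuddy_dir) / "ccextractor"
    if not ccextractor_dir.is_dir():
        raise FileNotFoundError(f"{ccextractor_dir} not found; check the install path")
    target = ccextractor_dir / "tessdata"
    # If target already existed, shutil.move would nest the folder inside it
    # (tessdata/tessdata-3.04.00), which is not the layout described above.
    if target.exists():
        raise FileExistsError(f"{target} already exists; back it up or remove it first")
    # Moving to a destination named "tessdata" does the move and the rename together
    shutil.move(str(extracted), str(target))
    return target
```

For example: `install_tessdata(Path.home() / "Downloads" / "tessdata-3.04.00", r"C:\Program Files\MCEBuddy2x")`, where the Downloads location is only an assumption about where you extracted the files.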

So your final setup should look like: <MCEBuddy installation directory>\ccextractor\tessdata\<100+ traineddata files>
e.g. C:\Program Files\MCEBuddy2x\ccextractor\tessdata\<100+ traineddata files>
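To double-check the final layout before running a conversion, a short Python sketch like the one below can inspect the folder. `check_ocr_setup` is a hypothetical helper, and its default threshold of 100 files only mirrors the "100+ traineddata files" mentioned above; it is not an exact count that MCEBuddy is known to require.

```python
from pathlib import Path


def check_ocr_setup(mcebuddy_dir, minimum: int = 100) -> list:
    """Return a list of problems with <MCEBuddy dir>/ccextractor/tessdata (empty = OK)."""
    ccextractor_dir = Path(mcebuddy_dir) / "ccextractor"
    tessdata = ccextractor_dir / "tessdata"
    if not tessdata.is_dir():
        # Most common mistake: the folder was moved but never renamed
        if (ccextractor_dir / "tessdata-3.04.00").is_dir():
            return ["tessdata-3.04.00 was not renamed to tessdata"]
        return [f"{tessdata} does not exist"]
    problems = []
    subdirs = sorted(p.name for p in tessdata.iterdir() if p.is_dir())
    if subdirs:
        problems.append(f"unexpected sub-folders in tessdata: {subdirs}")
    count = sum(1 for p in tessdata.glob("*.traineddata") if p.is_file())
    if count < minimum:
        problems.append(f"only {count} .traineddata files found (expected {minimum}+)")
    return problems
```

For example, `print(check_ocr_setup(r"C:\Program Files\MCEBuddy2x") or "OCR files look OK")` prints either the list of problems or the OK message.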

Make sure you’ve enabled the Extract subtitles and closed captions option in your Conversion Task advanced settings and you’re all set! It will extract and convert image based subtitles into a text SRT file. Enjoy!

Versions older than 2.4.7 do not support OCR

John_Freiman · November 16, 2020, 1:43am

Is Tessdata 4.x or 5.x (alpha) supported by MCEBuddy and can it be installed in the same way/location?

There are additional subfolders in both the 4 and 5 versions of the updates.

Goose · November 16, 2020, 5:49pm

Tessdata is used by CCExtractor, which doesn’t support 4.x or 5.x yet.

John_Freiman · November 17, 2020, 2:10am

Good to know, thank you. That saved me a lot of time.

erinsfun · January 30, 2023, 4:12pm

I downloaded the zip file and it is empty. I’ll try again.

erinsfun · January 30, 2023, 5:04pm

Had to get 7-Zip. Windows would not unzip it.
