How to Enable DVB and Image Subtitle Extraction

RBoy · June 30, 2017, 7:03pm

MCEBuddy can automatically convert image-based subtitles (e.g. DVB or burned-in subtitles) into text subtitles (SRT) using OCR (Optical Character Recognition). This feature requires additional files to be downloaded because OCR processing needs very large data files (~1 GB in size) which cannot be bundled within the MCEBuddy setup file.

MCEBuddy Version 2.7.1 and newer

The OCR files are automatically downloaded and installed if an internet connection is available after MCEBuddy setup completes.

MCEBuddy Version 2.6.2 to 2.6.6

During installation, the option to automatically download and install the required OCR add-on files is enabled by default. You can disable it if you do not want the OCR add-on files downloaded and installed automatically.

If the download failed due to lack of an internet connection, or you did not select the automatic installation during setup, you can at any time trigger an automatic re-install of the OCR add-on files by clicking on the Install OCR add-on link in the Conversion task page as shown below:

Install OCR add-on in grey means that the add-on files have been installed. Click on the text to redownload and reinstall them (~500MB download).
Install OCR add-on in red means that the add-on files have not yet been installed. Click on the text to download and install them (~500MB download).

Make sure that the Save subtitles or Embed subtitles option is enabled in the Conversion task → Advanced settings to extract and process subtitles using OCR.

MCEBuddy Versions 2.4.7 to 2.6.1

You will need to manually download and install the OCR files as described below.

Follow the procedure below to enable OCR and image-to-text subtitle conversion, which extracts DVB and other image-based subtitles from recordings.

Download the OCR files from https://github.com/tesseract-ocr/tessdata/archive/3.04.00.zip

Extract the contents of the zip file (tessdata-3.04.00.zip). It should create a folder called tessdata-3.04.00 inside which will be 100+ traineddata files. Make sure there are no sub-folders. It should look like tessdata-3.04.00\<100+ traineddata files>

Move the folder tessdata-3.04.00 to inside the ccextractor directory where MCEBuddy is installed: <MCEBuddy installation directory>\ccextractor\
e.g. Move tessdata-3.04.00 to inside C:\Program Files\MCEBuddy2x\ccextractor\

Rename the tessdata-3.04.00 directory to tessdata. Important: Do not miss this step or OCR won’t work.

So your final setup should look like: <MCEBuddy installation directory>\ccextractor\tessdata\<100+ traineddata files>
e.g. C:\Program Files\MCEBuddy2x\ccextractor\tessdata\<100+ traineddata files>
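For anyone who prefers to script the extract, move and rename steps above (or whose Windows built-in unzip refuses the archive), here is a minimal Python sketch. It assumes the zip has already been downloaded and contains a single top-level folder, as described above. The function names install_tessdata and check_tessdata are made up for this illustration and are not part of MCEBuddy.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_tessdata(zip_path, ccextractor_dir):
    """Extract the tessdata zip and place its folder at <ccextractor_dir>/tessdata.

    Hypothetical helper: mirrors the manual steps (extract, move, rename).
    Returns the number of .traineddata files found in the final folder.
    """
    target = Path(ccextractor_dir) / "tessdata"
    if target.exists():
        # Refuse to overwrite an existing install; remove it by hand first.
        raise FileExistsError(f"{target} already exists")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # The archive is expected to hold one folder, e.g. tessdata-3.04.00.
        top_level = [p for p in Path(tmp).iterdir() if p.is_dir()]
        if len(top_level) != 1:
            raise ValueError("expected a single top-level folder in the archive")
        # Moving to the new name covers both the move and the rename step.
        shutil.move(str(top_level[0]), str(target))

    return sum(1 for p in target.iterdir() if p.suffix == ".traineddata")


def check_tessdata(ccextractor_dir):
    """Return (count of .traineddata files, names of unexpected sub-folders)."""
    target = Path(ccextractor_dir) / "tessdata"
    if not target.is_dir():
        return 0, []
    entries = list(target.iterdir())
    files = [p for p in entries if p.suffix == ".traineddata"]
    subfolders = sorted(p.name for p in entries if p.is_dir())
    return len(files), subfolders
```

A call would look something like install_tessdata(r"C:\Downloads\tessdata-3.04.00.zip", r"C:\Program Files\MCEBuddy2x\ccextractor") where the download path is only an example. Writing under Program Files normally needs an elevated (administrator) prompt. check_tessdata should then report 100+ files and an empty list of sub-folders.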

Make sure you’ve enabled the Extract subtitles and closed captions option in your Conversion Task advanced settings and you’re all set! It will extract and convert image based subtitles into a text SRT file. Enjoy!

Versions older than 2.4.7 do not support OCR

John_Freiman · November 16, 2020, 1:43am

Is Tessdata 4.x or 5.x (alpha) supported by MCEBuddy and can it be installed in the same way/location?

There are additional subfolders in both the 4 and 5 versions of the updates.

Goose · November 16, 2020, 5:49pm

Tessdata is used by ccExtractor and it doesn’t support 4.x or 5.x as yet

John_Freiman · November 17, 2020, 2:10am

Good to know, thank you. That saved me a lot of time.

erinsfun · January 30, 2023, 4:12pm

I downloaded the zip file and it is empty. I’ll try again.

erinsfun · January 30, 2023, 5:04pm

had to get 7-zip. windows would not unzip it

Goose · September 30, 2025, 6:33pm

Update: starting with version 2.7.1, MCEBuddy supports tessdata for tesseract 4.x and 5.x. The default tessdata files downloaded will still be for 3.04 because in our testing we found those to be the best for OCR of burnt-in video and image subtitles, but you can experiment with your own tessdata trained files if interested.

Points to note:

The default is always PSM mode 3, but the OEM mode will switch to LSTM when using tessdata for 4.x and 5.x and to LEGACY when using tessdata for 3.x.
The directory structure remains the same for all versions; all the trained data files should be placed in a folder called tessdata as described in the instructions above.
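To make the mode rule above concrete, here is a small Python sketch of the selection logic. The numeric values are the standard Tesseract ones (PSM 3 is fully automatic page segmentation without orientation detection, OEM 0 is the legacy engine, OEM 1 is the LSTM engine). The function name ocr_modes is made up for this illustration, and how MCEBuddy actually passes these values on to ccextractor is not covered in this post.

```python
# Standard Tesseract mode numbers (not MCEBuddy-specific).
PSM_AUTO = 3    # fully automatic page segmentation, no orientation detection
OEM_LEGACY = 0  # legacy engine, used with 3.x tessdata
OEM_LSTM = 1    # LSTM engine, used with 4.x and 5.x tessdata


def ocr_modes(tessdata_major_version):
    """Return the PSM/OEM pair described above for a tessdata major version.

    Hypothetical helper for illustration only.
    """
    if tessdata_major_version < 3:
        raise ValueError("tessdata versions older than 3.x are not covered here")
    oem = OEM_LEGACY if tessdata_major_version == 3 else OEM_LSTM
    return {"psm": PSM_AUTO, "oem": oem}
```

So 3.x trained data maps to PSM 3 with the legacy engine, while 4.x and 5.x trained data map to PSM 3 with the LSTM engine.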

If you find that a different PSM or OEM mode works better, let us know the details with samples and we will look into allowing users to customize the modes.

Here are some links to the different tessdata trained files compatible with tesseract 4.x and 5.x that have been tested with MCEBuddy
