How to Enable DVB and Image Subtitle Extraction

MCEBuddy can automatically convert image based subtitles (e.g. DVB) into a text subtitle (SRT) using OCR (Optical Character Recognition). However this feature requires additional files to be downloaded. This is because OCR processing requires very large data files (~1 GB in size) which cannot be bundled within the MCEBuddy setup file.


MCEBuddy Version 2.6.2 and newer

During installation the option to automatically download and install the required OCR add-on files is enabled by default. You can disable it if do not want the OCR add-on files downloaded and installed automatically.

If you did not select the automatic installation during setup, you can at anytime trigger an install/re-install of the OCR add-on files by clicking on the Install OCR add-on link in the Conversion task page as shown below:

InstallOCR add-on in grey means that the add-on files have been installed. Click on the text to redownload and reinstall them (~500MB download)
Install OCR add-on in red means that the add-on files have not yet been installed. Click on the text to download & install them (~500MB download)

Make sure that Save subtitles or Embed subtitles options are enabled in the Conversion task → Advanced settings to extract and process subtitles using OCR.


MCEBuddy Versions 2.4.7 to 2.6.1

You will need to manually download and install the OCR files as described below.

Follow the below procedure to enable OCR and image to text subtitle conversion to extract DVB and other image based subtitles from recordings.

  1. Download the OCR files from https://github.com/tesseract-ocr/tessdata/archive/3.04.00.zip
  2. Extract the contents of the zip file (tessdata-3.04.00.zip). It should create a folder called tessdata-3.04.00 inside which will be 100+ traineddata files. Make sure there are no sub-folders. It should look like tessdata-3.04.00\<100+ traineddata files>
  3. Move the folder tessdata-3.04.00 to inside the ccextractor directory where MCEBuddy is installed: <MCEBuddy installation directory>\ccextractor\
    e.g. Move tessdata-3.04.00 to inside C:\Program Files\MCEBuddy2x\ccextractor\
  4. Rename the tessdata-3.04.00 directory to tessdata. Important: Do not miss this step or OCR wont’ work

So your final setup should look like: <MCEBuddy installation directory>\ccextractor\tessdata\<100+ traineddata files>
e.g. C:\Program Files\MCEBuddy2x\ccextractor\tessdata\<100+ traineddata files>

Make sure you’ve enabled the Extract subtitles and closed captions option in your Conversion Task advanced settings and you’re all set! It will extract and convert image based subtitles into a text SRT file. Enjoy!


Versions older than 2.4.7 do not support OCR

Is Tessdata 4.x or 5.x (alpha) supported by MCEBuddy and can it be installed in the same way/location?

There are additional subfolders in both the 4 and 5 versions of the updates.

Tessdata is used by ccExtractor and it doesn’t support 4.x or 5.x as yet

Good to know, thank you. That saved me a lot of time. :slight_smile:

I downloaded the zip file and it is empty. I’ll try again.

had to get 7-zip. windows would not unzip it

1 Like