How to Enable DVB and Image Subtitle Extraction

(RBoy) #1

MCEBuddy 2.4.7 supports OCR (Optical Character Recognition) extraction of DVB and other image-based subtitles, converting them into a text (SRT) subtitle format.

This capability is turned on automatically when the Extract subtitles and closed captions option is enabled in the conversion task advanced settings, but it will not function until you download some additional files.

This is because OCR processing requires very large data files which cannot be bundled with the MCEBuddy setup file. You will need to download and install these additional files (~1 GB in size) to utilize the OCR capability.

Follow the procedure below to enable OCR so that DVB and other image-based subtitles in your recordings can be extracted and converted to text.

  1. Download the OCR files from https://github.com/tesseract-ocr/tessdata/archive/master.zip
  2. Extract the contents of the zip file (tessdata-master.zip). This should create a folder called tessdata-master that directly contains 100+ traineddata files. Make sure the files are not nested inside another sub-folder. It should look like tessdata-master\<100+ traineddata files>
  3. Move the tessdata-master folder into the ccextractor directory where MCEBuddy is installed: <MCEBuddy installation directory>\ccextractor\
    e.g. Move tessdata-master into C:\Program Files\MCEBuddy2x\ccextractor\
  4. Rename the tessdata-master directory to tessdata. Important: Do not miss this step or OCR won't work
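
If you'd rather script steps 2 to 4, here is a rough Python sketch of the same procedure. It is not part of MCEBuddy; the function names (download_tessdata, install_tessdata) are made up for illustration. It assumes the zip unpacks to a top-level tessdata-master folder as described above, and you will probably need to run it from an elevated (administrator) prompt to write under Program Files.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# URL from step 1 of the procedure
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata/archive/master.zip"


def download_tessdata(zip_path):
    """Step 1: download the OCR data archive (~1 GB) to zip_path."""
    urllib.request.urlretrieve(TESSDATA_URL, str(zip_path))
    return Path(zip_path)


def install_tessdata(zip_path, ccextractor_dir):
    """Steps 2-4: extract the zip and place it at <ccextractor_dir>/tessdata."""
    ccextractor_dir = Path(ccextractor_dir)
    if not ccextractor_dir.is_dir():
        raise FileNotFoundError(f"{ccextractor_dir} does not exist")

    target = ccextractor_dir / "tessdata"
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")

    with tempfile.TemporaryDirectory() as tmp:
        # Step 2: extract the archive into a scratch folder
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)

        extracted = Path(tmp) / "tessdata-master"
        if not extracted.is_dir():
            raise FileNotFoundError("tessdata-master folder not found in the zip")

        # Steps 3 and 4: move into ccextractor and rename to tessdata in one go
        shutil.move(str(extracted), str(target))

    return target
```

To use it, call download_tessdata with a path for the zip, then install_tessdata with that zip and your ccextractor directory, e.g. r"C:\Program Files\MCEBuddy2x\ccextractor" (adjust this if MCEBuddy is installed elsewhere).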

So your final setup should look like: <MCEBuddy installation directory>\ccextractor\tessdata\<100+ traineddata files>
e.g. C:\Program Files\MCEBuddy2x\ccextractor\tessdata\<100+ traineddata files>
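
If OCR still isn't working, a quick way to double-check the layout is a small script like the one below. Again, this is only a sketch: check_tessdata is a hypothetical helper, not an MCEBuddy tool. It only looks at the folder structure described in this post (a tessdata folder directly under ccextractor, with the traineddata files directly inside it).

```python
from pathlib import Path


def check_tessdata(ccextractor_dir):
    """Return a list of layout problems (an empty list means it looks right)."""
    ccextractor_dir = Path(ccextractor_dir)
    tessdata = ccextractor_dir / "tessdata"
    problems = []

    if not tessdata.is_dir():
        # The most common mistake: step 4 (the rename) was skipped
        if (ccextractor_dir / "tessdata-master").is_dir():
            problems.append("tessdata-master has not been renamed to tessdata")
        else:
            problems.append(f"{tessdata} does not exist")
        return problems

    # The traineddata files must sit directly inside tessdata
    if not any(tessdata.glob("*.traineddata")):
        if any(tessdata.glob("*/*.traineddata")):
            problems.append("traineddata files are nested in a sub-folder of tessdata")
        else:
            problems.append("no traineddata files found in tessdata")

    return problems
```

Call it with your ccextractor path, e.g. check_tessdata(r"C:\Program Files\MCEBuddy2x\ccextractor"), and print whatever it returns.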

Make sure you’ve enabled the Extract subtitles and closed captions option in your Conversion Task advanced settings and you’re all set! MCEBuddy will extract image-based subtitles and convert them into a text SRT file. Enjoy!

