Starting with version 2.4.7, MCEBuddy supports OCR (Optical Character Recognition) extraction and conversion of DVB and other image-based subtitles into a text (SRT) subtitle format.
While this capability is enabled automatically when the Extract subtitles and closed captions option is enabled in the conversion task advanced settings, it will not function until you download some additional files.
This is because OCR processing requires very large data files which cannot be bundled with the MCEBuddy setup file. You will need to download and install these additional files (~1 GB in size) to utilize the OCR capability.
Follow the procedure below to enable OCR and image-to-text subtitle conversion, so that DVB and other image-based subtitles can be extracted from recordings:
- Download the OCR files from https://github.com/tesseract-ocr/tessdata/archive/3.04.00.zip
- Extract the contents of the zip file (tessdata-3.04.00.zip). This should create a folder called tessdata-3.04.00 containing 100+ traineddata files. Make sure there are no sub-folders. It should look like: tessdata-3.04.00\<100+ traineddata files>
- Move the folder tessdata-3.04.00 into the ccextractor directory where MCEBuddy is installed: <MCEBuddy installation directory>\ccextractor\
- Rename the folder tessdata-3.04.00 to tessdata. Important: Do not miss this step or OCR won’t work.
Your final setup should look like this: <MCEBuddy installation directory>\ccextractor\tessdata\<100+ traineddata files>
For example: C:\Program Files\MCEBuddy2x\ccextractor\tessdata\<100+ traineddata files>
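
If you prefer to script the extract, move and rename steps above, the following is a minimal Python sketch of one way to do it. It is not part of MCEBuddy: the function names install_tessdata and check_tessdata are hypothetical, and the script assumes that the zip file has already been downloaded and that it unpacks into a single top-level folder, as described in the steps above. Writing into a directory under C:\Program Files will usually require running the script with administrator rights.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_tessdata(zip_path: Path, ccextractor_dir: Path) -> Path:
    """Unpack the tessdata archive and place it at <ccextractor_dir>/tessdata.

    Hypothetical helper: covers the extract, move and rename steps in one go.
    """
    target = ccextractor_dir / "tessdata"
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)

        # Assumption: the archive unpacks into exactly one top-level folder
        # (for example tessdata-3.04.00).
        folders = [p for p in Path(tmp).iterdir() if p.is_dir()]
        if len(folders) != 1:
            raise RuntimeError(
                f"expected one top-level folder, found {len(folders)}"
            )

        # Moving the folder to the name "tessdata" performs the rename step.
        shutil.move(str(folders[0]), str(target))

    return target


def check_tessdata(ccextractor_dir: Path) -> int:
    """Return the number of .traineddata files directly inside tessdata."""
    target = ccextractor_dir / "tessdata"
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist")

    count = len(list(target.glob("*.traineddata")))
    if count == 0:
        # Usually means the files ended up one level too deep in a sub-folder.
        raise RuntimeError(f"no .traineddata files directly inside {target}")
    return count
```

To use it, call install_tessdata with the path to the downloaded tessdata-3.04.00.zip and the path to your <MCEBuddy installation directory>\ccextractor folder, then call check_tessdata on the same ccextractor folder and confirm that it reports 100+ files.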
Make sure you’ve enabled the Extract subtitles and closed captions option in your Conversion Task advanced settings and you’re all set! MCEBuddy will then extract and convert image-based subtitles into a text SRT file. Enjoy!