Scan/compare performance issue

Hi! I’ve been using MCEBuddy for a few years. I am currently using MCEBuddy Premium 2.6.6.

I use MCEBuddy to keep sets of source directories (which I maintain on the local HDD of a PC running MCEBuddy) and destination directories (which are on a file server and accessed via SMB) synchronized, using the “Skip reconversion” and “Check history” options. The destination directories on the file server are used to serve converted media to clients. MCEBuddy is configured to convert files from certain source directories, and simply replicate files from others without converting (using the “Rename without converting” option). The main takeaway here is that the source directories are never emptied; files “live” in the source directories all the time, and changes are replicated to the “visible” directories on the file server as the source directories change. I currently have 9,589 files in the source directories.

All of this works. My problem is that scanning performance is abysmal and getting worse as I add files. The initial traversal of the directory trees and building the list of files is rather quick, finding all files in about 22 seconds and processing up to 1,000 files per second. (During this stage, I see the files enumerated in mcebuddy.log as “xxx is being monitored for syncing with output file”.) The parsing of files and comparing to what is in the history file, however, continues for another 49 minutes, processing only 3 to 4 files per second. (At this point, I see the files enumerated in mcebuddy.log as “xxx already converted with status Converted”.) This process seems mainly CPU-bound, as MCEBuddy.Service.exe averages 24-25% CPU usage on a 4-core system (obviously not a beast, with an Intel Core i5-7600 @ 3.5 GHz and 16 GB RAM). If I assign CPU affinity to 1 CPU core for that process, it uses 100% of that core. Of course I don’t know the internals of MCEBuddy, but I have to imagine that all of this CPU overhead is MCEBuddy comparing the files in the source directory tree against the records in the history file, which is 7.36 MB at this point. I know enough to realize that an INI file of this size is very expensive to parse. So I tried disabling the “Check history” option in all my conversion tasks, thinking that would cause MCEBuddy to ignore the history file and just check for the existence of the destination file name on the server for each source file name on the local drive. But it did not change the performance characteristics at all; it actually added 12 seconds to the scan (though that deviation is so minor in context that the setting change could be irrelevant).
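To illustrate why I suspect the INI file: here's a minimal sketch (purely hypothetical, not MCEBuddy's actual code) of how checking every source file against a flat history turns quadratic if each lookup re-scans the whole file, versus an indexed lookup that parses once:

```python
# Hypothetical sketch: why per-file scans of a flat INI-style history
# get slow as it grows. If each lookup re-scans all m history entries
# for each of n source files, total work is O(n * m) -- roughly
# quadratic when the history holds one entry per file.

def linear_lookup(history_lines, key):
    """Scan every line until the section header matches -- O(m) per call."""
    target = f"[{key}]"
    for line in history_lines:
        if line == target:
            return True
    return False

def indexed_lookup(index, key):
    """Lookup against a hash index built once -- O(1) per call."""
    return key in index

# Simulate a history file with one [section] per converted file.
files = [f"C:\\media\\show{i}.ts" for i in range(5000)]
history_lines = [f"[{f}]" for f in files]
index = set(files)  # built in one pass over the history

# Linear: 5000 lookups x up to 5000 line comparisons each.
# Indexed: 5000 lookups, each a constant-time hash probe.
assert all(linear_lookup(history_lines, f) for f in files[:10])
assert all(indexed_lookup(index, f) for f in files)
```

Of course I don't know whether MCEBuddy works this way internally; this is just the shape of behavior that would match what Process Monitor shows.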

I have used Process Monitor to examine what is going on, and even with “Check history” disabled, I see MCEBuddy.Service.exe checking each source file once, and doing literally hundreds of reads on the history file for each one, but not touching the file server at all. I guess it makes sense that it would need to read the history file regardless to find out what the destination file path is, but I expected some change in behavior here. Regardless, this does not appear to be my silver bullet.

Can you suggest any other setting changes that might improve performance, or am I just pushing the practical limits of MCEBuddy’s architecture? I’d imagine this sort of use case would work much better with a real database (e.g. SQLite) and I think the history file is the bottleneck here, but I am open to suggestions.

I have already tried recreating the history file (Show history → Clear history, made sure “C:\Program Files\MCEBuddy2x\config\history” is gone, restarted the MCEBuddy2x service, started conversion and allowed it to rebuild). That also made no difference.

I have uploaded MCEBuddy logs to the FTP server, in two subdirectories: “scan - check history” and “scan - no check history”. To create each, I updated settings, stopped the MCEBuddy2x service, deleted mcebuddy.log, started the service, started conversions, waited until scanning was complete, and stopped the MCEBuddy2x service, so each log should be very clean.

Check history enabled: 14:18:20 - 15:08:06 = 49m:46s
Check history disabled: 13:09:50 - 13:59:48 = 49m:58s

Thanks for reading!

Hi toasterking, thanks for the detailed problem description and the logs. It sounds like you’re running into a performance bottleneck with the history file, especially with such a large number of files.

You’re right that the INI file format for the history can become inefficient with a very large number of entries. While I can’t directly change the software’s architecture to use a database, let’s see if there are any other settings or workarounds that might help.

Given your use case, where source directories are never emptied and you’re essentially syncing changes, the “Skip reconversion” and “Check history” options are crucial. However, the performance impact you’re seeing is significant.

Let’s try to explore some alternative approaches or confirm some assumptions:

  1. “Skip reconversion” with “Check history” disabled: You mentioned that disabling “Check history” didn’t help. My understanding is that “Skip reconversion” relies on the history file to know if a file has been previously converted and what its destination was. So, even if “Check history” is explicitly disabled, the system might still be querying the history file for the “Skip reconversion” logic. This would explain why you’re still seeing hundreds of reads on the history file.

  2. Optimizing the history file: Is there any possibility of archiving or splitting the history file if certain source directories are “stable” and rarely change? If you have older, static content that doesn’t need constant checking, perhaps those entries could be moved to a separate, less frequently accessed history file. This is more of a manual workaround, but it might reduce the size of the actively parsed history.

  3. Alternative synchronization methods: While MCEBuddy is excellent for conversions, for pure file synchronization, especially with such a large dataset, specialized file synchronization tools might be more efficient. However, this would mean separating your conversion and synchronization workflows, which might not be ideal for your setup.

Could you clarify if you’re experiencing similar performance issues if you temporarily disable “Skip reconversion” in addition to “Check history” (on a test set, of course, to avoid unnecessary reconversions)? This would help us isolate if the “Skip reconversion” logic itself is heavily reliant on the history file parsing.

Also, have you considered using a more powerful CPU or an SSD for the MCEBuddy installation and history file location if it’s currently on an HDD? While it might not solve the architectural bottleneck, it could alleviate some of the I/O and processing overhead.

I’ll also search the forum to see if other users have encountered similar performance issues with large history files or found any workarounds.

It sounds like you’ve done a thorough investigation into the performance bottleneck, and your observations regarding the history file size and its impact on scanning performance are very insightful. It’s indeed possible that with nearly 10,000 files and a 7.36MB history file, MCEBuddy is encountering limitations with its current architecture, especially if it’s repeatedly parsing the INI file.

Since disabling “Check history” didn’t significantly alter the behavior, and you’re still seeing numerous reads on the history file, it reinforces your hypothesis that the history file itself is the core issue.

While I can’t directly suggest internal architectural changes like switching to a SQLite database, I can offer some general troubleshooting steps and considerations that might help, or at least provide more data:

  1. Examine the mcebuddy.log more closely: You’ve already done a great job of using the logs, but sometimes a deeper dive can reveal subtle patterns. Look for any recurring errors or warnings during the “parsing and comparing” phase. Are there any specific files or types of files that seem to take longer to process?
  2. Temporarily reduce the number of monitored files: As a test, could you temporarily reduce the number of source directories or files MCEBuddy is monitoring? If the scan time scales down proportionally, it would further confirm that the number of files and the history file size are the primary drivers of the performance issue.
  3. Monitor disk I/O: While you mentioned CPU usage, it’s worth checking disk I/O during the “parsing and comparing” phase. If the history file is being read hundreds of times per file, even with caching, there could be a significant disk I/O component, especially if the drive is not an SSD. Process Monitor can show you read/write operations on the history file.
  4. Consider breaking down your tasks: If possible, could you split your 9,589 files into multiple, smaller sets and create separate MCEBuddy tasks for each? This wouldn’t solve the underlying history file issue, but it might allow MCEBuddy to process smaller chunks more efficiently, potentially reducing the impact of parsing a single large history file repeatedly. You could then schedule these tasks to run sequentially.
  5. Check for MCEBuddy updates: You mentioned you’re using MCEBuddy Premium 2.6.6. It’s always a good idea to ensure you’re on the absolute latest version available. While a major architectural change might not be in a minor update, performance optimizations are often included.
  6. Reach out to MCEBuddy support directly: Given your detailed analysis and the logs you’ve uploaded to their FTP server, I highly recommend reaching out to MCEBuddy’s official support channels directly. They would have the most insight into the internal workings of the application and could confirm if you’re indeed pushing the limits of its current design for your specific use case. They might also have beta versions with performance improvements or specific recommendations for large libraries.

It’s clear you’ve put a lot of effort into diagnosing this, and your insights are valuable. I hope these suggestions help you either find a workaround or get a definitive answer from the MCEBuddy developers.

Wow, big load of help this bot is. It’s essentially suggested these options:

  1. Get a faster CPU (which is only a temporary fix at best, since the CPU requirement keeps growing faster than linearly as the collection and the INI file grow).
  2. Change entire media serving approach so that it better suits MCEBuddy.
  3. Re-architect MCEBuddy. (Nope, the source is not open anyway.)
  4. Make changes that would cause MCEBuddy not to work, necessitating #3.
  5. Investigate things I’ve already investigated.

I would sooner write my own application than do any of these. I may end up doing that, but I’m hoping I can make MCEBuddy continue to serve my workflow!

Interesting setup. We’ve tested MCEBuddy here in our labs with up to 500,000 files in a directory structure and haven’t seen any particular performance degradation while scanning files.

I will try to address some of your questions.

  • Limiting CPU affinity or changing the CPU priority from the status page only affects the actual conversions, not the main engine that scans files, monitors the overall status of conversions, handles the GUI, etc. So if you limit the CPU, it will slow down the conversions, but the main engine should still be operating at full speed.
  • The INI files database is very highly optimized and tweaked for performance and scalability and we’ve seen no issues with it with over a million entries in it (so the history file shouldn’t add any overhead either)

If I were to take a guess based on my experience I would bet on your disk being the probable bottleneck. Since you’re only doing renaming, it’s basically very little CPU but a LOT of hard disk activity. So while the conversion task is busy copying and moving files from one disk to another, the main engine monitoring for new files is also trying to use the hard disk to look for new files and that clash between the two is probably overloading your disk and slowing everything down.

Are you using a SSD or a regular spindle hard disk?

Thanks very much for your reply, Goose!

  • I am not normally assigning CPU affinity. I only did so as a test to see if MCEBuddy.Service.exe would saturate the entire CPU core if I assigned it one core during scanning. And it did.
  • There were no conversions occurring during my testing, as MCEBuddy was only scanning files which had already been converted/renamed, already existed in the history file, and already had converted/renamed files existing in the destination.
  • Maybe you misunderstood, but I am not only doing renaming. Some conversion tasks only rename, but others convert. Each task is bound to a single monitor location.
  • I checked disk usage during scanning with Windows Task Manager. After the 22-second initial file enumeration, MCEBuddy used 0-1% disk time for the remaining 49 minutes while using 25% CPU. Again, during this test there was no conversion and no renaming/copying, only scanning of files that had already been processed. Windows and MCEBuddy are installed on an SSD, but the media files being processed are on a spinning rust disk. If the disk were the bottleneck, I would expect to see high disk usage and low CPU usage in Task Manager, but I see the opposite. It was only processing 3 to 4 files per second, and nothing entered the conversion queue.
  • I forgot to mention before, but I also tried disabling antivirus on the PC and the file server and saw no difference in performance.

If you think the INI file parsing is definitely not the bottleneck, I’ll stop focusing on that. But what is MCEBuddy.Service.exe doing with all that CPU time? How can I find out?

That could be waiting for Windows to enumerate all the files in the directory. The service should not take much CPU for any extended period of time; the CPU should mostly be taken up by the other processes it uses (like ffmpeg, HandBrake, etc.). If the service itself is taking CPU for an extended period, it’s basically invoking something in Windows and Windows is spinning its wheels on it. An example would be MCEBuddy asking Windows to enumerate all the files in a directory and Windows taking a very long time to do it (typically that happens when the disk is very busy). Since in your case the disk isn’t busy, we would need to see where it’s spending all its CPU cycles.

Try this: start MCEBuddy and the conversions, but hit the pause button. Then the only CPU that should be taken up is by the scanning. How it behaves then will tell me something new: whether the conversions themselves are having an impact on the scanning. From your logs I can see that you only have one active conversion.

Thanks, Goose.

I tried this again and clicked Start, then Pause as soon as the MCEBuddy2x service started. Windows Defender is active but the source folders are set as an exclusion and I also disabled Real Time Protection before starting the MCEBuddy2x service. Unfortunately, I got the same result. Scanning still took about 48 minutes.

01:33:02 - 02:20:54 = 47m:52s

mcebuddy.log is in the “scan - check history on, conversion paused” folder under my username on the FTP server.

Is there a way we can enable additional debug logging to see exactly what part of the file queries the MCEBuddy service is waiting on the longest?

Okay, we were able to look into this further and replicate the issue you’re facing. It turned out that the history file was causing the bottleneck in some cases while re-scanning the folders after the initial scan, once the history file grows beyond a certain size. We’ve re-worked it, and in our tests we are now seeing a 50x-100x increase in performance for subsequent scans. The initial scan after clicking the start button will still take time, since it extracts metadata from each file in addition to scanning the directory, which is a slow process. Thanks for reporting this.
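To give a feel for the kind of change involved (a simplified sketch, not our actual code, and the class and method names here are made up for illustration): cache the parsed history in memory and re-parse only when the file’s timestamp changes, so each lookup on a re-scan is a dictionary hit instead of a file read.

```python
# Hypothetical sketch: parse the INI-style history once, key the cache
# on the file's modification time, and serve subsequent lookups from
# the in-memory index instead of re-reading the file per source file.
import os

class HistoryCache:
    def __init__(self, path):
        self.path = path
        self.mtime = None   # mtime of the last parse; None = never parsed
        self.index = {}     # section name -> {key: value}

    def _refresh(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:  # re-parse only when the file has changed
            index = {}
            section = None
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line.startswith("[") and line.endswith("]"):
                        section = line[1:-1]
                        index[section] = {}
                    elif section and "=" in line:
                        k, _, v = line.partition("=")
                        index[section][k.strip()] = v.strip()
            self.index, self.mtime = index, mtime

    def status(self, source_file):
        """Return the recorded Status for a source file, or None."""
        self._refresh()
        entry = self.index.get(source_file)
        return entry.get("Status") if entry else None
```

With a cache like this, a re-scan of 10,000 files touches the history file on disk once rather than thousands of times.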

Try out today’s 2.7.1 beta build and let me know how it goes.

You fixed it, @Goose!! :clap: :partying_face: You guys are heroes!! :smiling_face_with_three_hearts:

The scan now takes 26 seconds, down from 49 minutes. I can definitely live with that! This is the fastest it has ever been since I set this up 3 years ago!

22:21:44 - 22:22:10 = 0m:26s

I suppose my setup is an edge case, or at least the first time someone has set it up this way with a large collection and complained about the performance. Anyway, I am glad you were able to improve the product and I’m very happy with my setup again. Cheers to you and the other dev(s) who made this happen! :clinking_beer_mugs:


I have removed my logs from the FTP server.