Krakatoa loads the particles in batches (of around 50,000 particles) and culls each batch in parallel, then loads the next batch and culls it in parallel, and so on until it is done. Since the I/O portion is by definition a single-threaded operation, you generally won't see 100% processor utilization; instead the CPU usage will spike up and then down rapidly. It will also probably be more erratic if the culling geometry is fairly light, since the parallel portion of the pipeline becomes smaller relative to the serial portions (such as I/O and modifiers).
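For what it's worth, here's a rough sketch of that pattern (not the actual Krakatoa source; load_next_batch() and is_culled() are stand-ins for the real loader and the culling test):

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Particle { float x, y, z; };

// Hypothetical stand-in for the serial loader: emits synthetic particles until a
// fixed total is reached. Real code would read and decompress a particle stream.
static bool load_next_batch(std::vector<Particle>& batch, std::size_t maxCount) {
    static std::size_t emitted = 0;
    const std::size_t total = 1000000;
    batch.clear();
    while (emitted < total && batch.size() < maxCount) {
        float f = static_cast<float>(emitted++);
        batch.push_back({f * 0.001f, 0.0f, 0.0f});
    }
    return !batch.empty();
}

// Hypothetical cull test: discard anything outside a unit box.
static bool is_culled(const Particle& p) {
    return p.x < 0.f || p.x > 1.f || p.y < 0.f || p.y > 1.f || p.z < 0.f || p.z > 1.f;
}

// Serial load of ~50,000 particles, parallel cull, repeat until the stream runs dry.
std::vector<Particle> load_and_cull() {
    const std::size_t kBatchSize = 50000;
    std::vector<Particle> kept, batch;
    while (load_next_batch(batch, kBatchSize)) {               // single-threaded I/O
        const unsigned nThreads = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::vector<Particle>> partial(nThreads);
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < nThreads; ++t) {
            workers.emplace_back([&, t] {
                // Each worker culls its own contiguous slice of the batch.
                const std::size_t begin = batch.size() * t / nThreads;
                const std::size_t end   = batch.size() * (t + 1) / nThreads;
                for (std::size_t i = begin; i < end; ++i)
                    if (!is_culled(batch[i]))
                        partial[t].push_back(batch[i]);
            });
        }
        for (auto& w : workers) w.join();
        for (auto& p : partial) kept.insert(kept.end(), p.begin(), p.end());
    }
    return kept;
}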
As far as I can tell there isn't such a thing as multi-threaded disk I/O. It's not really a processor operation, so having more cores doesn't make your disk or network transfer data into RAM any faster. I have Krakatoa doing constant disk reads in a background thread, filling one buffer while the rest of Krakatoa processes the previous one, but you still run into standard bandwidth limitations: you'll see slowdowns if you can cull 50,000 particles faster than you can load and decompress 50,000 particles from a networked drive. I could definitely spend time on variable-sized buffers to hide the limited bandwidth, more or less like buffering in streaming video. It's kind of a dumb system right now, since it doesn't adapt the read-ahead buffer sizes based on how much computation you are doing to the incoming particles. So, long story short, it is sort of multi-threaded. Do you guys have any sneaky tricks that you use?
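For reference, the read-ahead overlap is conceptually something like this (again just a sketch, not our actual code; read_chunk() and process_chunk() are hypothetical placeholders, and there is no adaptive buffer sizing here):

#include <cstddef>
#include <future>
#include <vector>

struct Particle { float x, y, z; };

// Hypothetical serial reader stub: yields synthetic particles until a fixed total.
// Real code would read and decompress a particle file from disk or the network.
// Only one read is ever in flight at a time, so the static counter is safe here.
static std::vector<Particle> read_chunk(std::size_t maxCount) {
    static std::size_t emitted = 0;
    const std::size_t total = 1000000;
    std::vector<Particle> buf;
    while (emitted < total && buf.size() < maxCount) {
        float f = static_cast<float>(emitted++);
        buf.push_back({f, f, f});
    }
    return buf;
}

// Hypothetical per-chunk work (culling, modifiers, etc.).
static float process_chunk(const std::vector<Particle>& buf) {
    float sum = 0.f;
    for (const Particle& p : buf) sum += p.x;   // stand-in for real per-particle work
    return sum;
}

// Overlap disk reads with processing: while the current buffer is being worked on,
// the next read is already running on a background thread.
void stream_particles() {
    const std::size_t kChunk = 50000;
    std::vector<Particle> current = read_chunk(kChunk);
    while (!current.empty()) {
        // Kick off the next read before touching the current buffer.
        std::future<std::vector<Particle>> next =
            std::async(std::launch::async, read_chunk, kChunk);
        process_chunk(current);   // runs in parallel with the background read
        current = next.get();     // wait for the read-ahead to finish
    }
}

If processing a chunk takes longer than reading one, the read is effectively free; if reading is slower (e.g. over a network drive), you end up waiting in next.get(), which is exactly the bandwidth limitation described above.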