That sounds like lyrics for the Beastie Boys (or similar), LOL.
Anyway, I ran some numbers today, which gave some surprising results.
I created a standard scene, then test-rendered it on 1 core, 2 cores, and 4 cores (the multi-core runs via crops with 12% overlap). I've found that 12% overlap is required because of the GI inconsistencies between crops.
Here are the results:
1 Core:
core 1 time: 161 seconds
total time: 161 seconds
absolute time: 161 seconds
2 Core:
core 1 time: 92 seconds
core 2 time: 94 seconds
total time: 186 seconds
absolute time: 94 seconds (longest of the 2)
4 Core:
core 1 time: 55 seconds
core 2 time: 58 seconds
core 3 time: 54 seconds
core 4 time: 54 seconds
total time: 221 seconds
absolute time: 58 seconds (longest of the 4)
So 4 cores still wins on absolute time at 58 seconds, and 1 core loses at 161 seconds.
But the interesting part is this: 1 core beats them all when it comes to total time! It was 161 seconds vs 186 seconds for 2 cores and 221 seconds for 4 cores (about 16% and 37% more). I think this has to do with the inefficiency of using multiple instances, as well as having to overlap the images (which creates 6% and 24% of redundant render space). So rendering with 4 cores via the workaround is just a novelty at this point, and is very inefficient.
So I can't wait until true multicore functionality arrives. It should scale nearly linearly. So 161 seconds would scale to just over 40 seconds. And we won't have to deal with the GI inconsistencies.
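For anyone who wants to check the arithmetic, here is a rough Python sketch that recomputes the figures from the timings posted above. The names (`runs`, `summarize`) are just my own for illustration, and the "ideal" column assumes perfectly linear scaling from the 1-core time, which is a best case rather than a prediction.

```python
# Per-instance render times in seconds, copied from the results above.
runs = {
    1: [161],
    2: [92, 94],
    4: [55, 58, 54, 54],
}


def summarize(runs, baseline_cores=1):
    """Return one dict per core count with total/absolute time and scaling stats."""
    # Baseline is the wall-clock time of the single-core render.
    baseline = max(runs[baseline_cores])
    rows = []
    for cores, times in sorted(runs.items()):
        total = sum(times)        # CPU time summed over all instances
        absolute = max(times)     # wall-clock time: the slowest instance
        speedup = baseline / absolute
        rows.append({
            "cores": cores,
            "total": total,
            "absolute": absolute,
            "speedup": speedup,
            "efficiency": speedup / cores,    # 1.0 would be perfect scaling
            "overhead": total / baseline - 1,  # extra total work vs 1 core
            "ideal": baseline / cores,         # assumes linear scaling
        })
    return rows


if __name__ == "__main__":
    for r in summarize(runs):
        print(
            f"{r['cores']} core(s): total {r['total']}s, "
            f"absolute {r['absolute']}s, speedup {r['speedup']:.2f}x, "
            f"efficiency {r['efficiency']:.0%}, "
            f"overhead {r['overhead']:.0%}, ideal {r['ideal']:.2f}s"
        )
```

By this reckoning the workaround gets roughly a 1.71x speedup on 2 cores and 2.78x on 4 cores, while linear scaling on 4 cores would land at 40.25 seconds.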