Hi,
I have a 16 core Mac as well. Well, 8 physical cores with Hyperthreading, which TG2 sees as 16 cores, so it would be the same as your Mac.
I tried your files and see the same CPU usage behaviour. I had a quick look and couldn't find what in particular was allowing test5b to better utilise the CPU. I'm interested though because it's not very often I see a scene which gets such high CPU usage.
Generally speaking you won't be able to make TG2 utilise 100% CPU on a 16 core machine. For one thing, TG2 doesn't scale as well as the number of render threads gets higher. The Mac version also scales a little worse than the Windows version, which is down to some issues with the OS. OS X 10.6 has the best performance. For most scenes I've found 8 threads to be the sweet spot on the Mac. Sometimes up to 12 threads is a bit quicker, but usually only by a smaller percentage than you might expect.
In the case of your scenes I tried v2_01 with both 16 threads as you'd set it and 8 threads. 16 threads took:
6 mins 25s
8 threads took:
3 mins 30s
You can see that using 8 threads is dramatically faster: 210 seconds against 385, so close to twice as fast.
If you will be rendering your scene a lot, for example test renders or an animation, it's worth experimenting with the number of threads until you find the fastest setting.
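If you want to keep track of your experiments, here is a rough Python sketch of the comparison. It only does the arithmetic on times you have measured yourself; it doesn't launch TG2. The function names are just ones I made up for illustration, and the two timings are the ones from my v2_01 test above.

```python
def to_seconds(minutes, seconds):
    """Convert a render time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


def fastest_setting(times):
    """Return the thread count with the lowest measured render time."""
    return min(times, key=times.get)


# Measured render times for v2_01, keyed by number of render threads.
render_times = {
    16: to_seconds(6, 25),
    8: to_seconds(3, 30),
}

best = fastest_setting(render_times)
speedup = render_times[16] / render_times[best]
print(f"{best} threads is fastest, {speedup:.2f}x quicker than 16 threads")
```

Add an entry to render_times for each thread count you try (12, for example) and it will pick out the quickest one.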
As to setting the subdiv cache, if you do a search for that you should find plenty of information. I will say that setting it to 3200 MB is too much though. You only have a theoretical memory limit of 4 GB with the 32 bit version. Telling TG2 it can use 3.2 GB of that is too much, and it would be worse if you'd told it to preallocate the cache, because that would have left only up to 800 MB available for everything else. In your relatively simple scenes this might not be an issue, but as your scenes get more complex it could be problematic. When using 8 threads and up I usually set it to 100 MB multiplied by the number of threads, i.e. 8 threads x 100 MB = 800 MB.
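The same rule of thumb as a small Python sketch, in case it helps to see the numbers side by side. This is just my own arithmetic, not anything TG2 provides: the function names are made up, and I'm treating the 4 GB limit as a round 4000 MB, which is an approximation.

```python
MB_PER_THREAD = 100          # rule of thumb for 8 threads and up
ADDRESS_LIMIT_MB = 4000      # assumption: 32 bit limit rounded to 4000 MB


def suggested_subdiv_cache_mb(threads, mb_per_thread=MB_PER_THREAD):
    """Suggested subdiv cache size in MB for a given render thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * mb_per_thread


def headroom_mb(cache_mb, limit_mb=ADDRESS_LIMIT_MB):
    """Memory left for everything else if the whole cache were in use."""
    return limit_mb - cache_mb


cache = suggested_subdiv_cache_mb(8)
print(cache, headroom_mb(cache))    # suggested setting and what it leaves free
print(3200, headroom_mb(3200))      # the 3200 MB setting for comparison
```

With 8 threads that suggests 800 MB, leaving roughly 3200 MB for the rest of the scene, whereas a 3200 MB cache leaves only about 800 MB.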
Regards,
Jo