Depending on the cache size, it's designed more around the instruction word length the processor was built for, e.g. on a 64-bit hyperthreaded processor the word would simply be divided in two. That cuts out most of the overhead. At the moment TG2 has to double up on instruction words to fill a 64-bit processor, and that has huge costs. If you have to do that for every processor, it just gets bogged down in "paperwork". On a hyperthreaded processor the extra cores are only virtual (logical cores): in reality you still have one physical core doing one thing at a time, but it takes advantage of cycles that would otherwise go unused, e.g. while one thread is stalled waiting on a cache miss.
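To make the "one core, two threads, borrowed idle cycles" point concrete, here's a toy model in Python. It is only a sketch under made-up assumptions, not how TG2 or any real CPU schedules work: the core issues at most one op per cycle, a "miss" op stalls its thread for a hypothetical `MISS_PENALTY` of 3 cycles, and the core picks between ready threads round-robin. `HwThread`, `run` and `MISS_PENALTY` are illustrative names.

```python
from dataclasses import dataclass

MISS_PENALTY = 3  # hypothetical stall length (cycles) after a cache miss


@dataclass
class HwThread:
    ops: list[str]          # "alu" = plain one-cycle op, "miss" = op that then stalls
    pc: int = 0             # index of the next op to issue
    stalled_until: int = 0  # first cycle at which this thread may issue again

    def ready(self, cycle: int) -> bool:
        return self.pc < len(self.ops) and self.stalled_until <= cycle

    def busy(self, cycle: int) -> bool:
        # Still has ops left, or is still waiting on an outstanding miss.
        return self.pc < len(self.ops) or self.stalled_until > cycle


def run(threads: list[HwThread]) -> int:
    """Cycles one physical core needs to finish all threads (one op per cycle)."""
    cycle = 0
    last = -1  # thread that issued most recently, for round-robin selection
    n = len(threads)
    while any(t.busy(cycle) for t in threads):
        for offset in range(1, n + 1):
            i = (last + offset) % n
            t = threads[i]
            if t.ready(cycle):
                op = t.ops[t.pc]
                t.pc += 1
                if op == "miss":
                    t.stalled_until = cycle + 1 + MISS_PENALTY
                last = i
                break
        # If no thread was ready, this cycle is simply wasted.
        cycle += 1
    return cycle


if __name__ == "__main__":
    work = ["alu", "miss", "alu", "alu"]
    serial = run([HwThread(work)]) + run([HwThread(work)])  # one job after the other
    smt = run([HwThread(work), HwThread(work)])              # two logical cores, one core
    print(serial, smt)  # prints "14 10"
```

In this model the two jobs take 14 cycles back to back but 10 when interleaved: the second thread fills some of the cycles the first would have spent stalled, yet the core never issues more than one op per cycle, so you don't get anywhere near a 2x speedup from the "extra" core.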