Almost ten years ago, you could buy a Solaris server with 106 CPUs. Solaris scaled well even then.
A hardware thread presents itself to the OS as a CPU. Look, for instance, at the Windows CPU window: if you have a hyper-threaded CPU, there are two hardware threads per core, so you will see two CPUs for each core.
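You can see this from any program that asks the OS how many CPUs it has. A minimal sketch in Python (the exact count printed obviously depends on your machine):

```python
import os

# os.cpu_count() reports the number of logical CPUs, i.e. hardware
# threads, not physical cores. On a hyper-threaded machine this is
# typically twice the number of physical cores.
logical_cpus = os.cpu_count()
print("OS sees", logical_cpus, "CPUs")
```

On a 256-thread machine like the T5440, this would report 256, because the OS genuinely schedules onto each hardware thread as if it were a CPU.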
Today you can buy a 256-thread Solaris server, the Sun T5440, which means the Solaris OS sees 256 CPUs and uses them very well. You need three IBM POWER p570 servers, with 14 POWER6 CPUs at 5 GHz, to get 7,000 Siebel benchmark points. The single Sun T5440, which has 4 Niagara CPUs at 1.4 GHz, gets 14,000 points. That is double the performance. Solaris really does use the CPUs well; it scales very well. One IBM p570 server cost 413,000 USD and one Sun T5440 cost 76,000 USD. Since three p570s only reach half the T5440's score, you need six IBM p570 servers to match one Sun T5440 server.
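The arithmetic behind that comparison, using only the figures quoted above, works out like this:

```python
# Cost/performance comparison from the quoted benchmark figures.
p570_cluster_points = 7000        # three p570 servers together
p570_points_each = p570_cluster_points / 3
t5440_points = 14000              # one T5440

# How many p570s are needed to match one T5440?
p570_needed = t5440_points / p570_points_each   # 6.0

p570_price = 413_000              # USD per p570
t5440_price = 76_000              # USD per T5440
cost_ratio = (p570_needed * p570_price) / t5440_price

print("p570 servers needed:", p570_needed)
print("cost ratio (IBM/Sun): %.1f" % cost_ratio)
```

So matching one T5440 on this benchmark would take six p570s at roughly 2.5 million USD, about 33 times the price of the single Sun box.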
(Here is some less credible talk about scalability, less credible because it is a Sun guy who has written it.)
Of course you could compile OpenSolaris for a computer with 1,000,000,000 CPUs. Does that mean OpenSolaris can use all those CPUs well? Does it mean OpenSolaris scales well? If you compile Linux for such a machine, does that mean Linux scales well? No. Mere existence does not prove anything. You cannot say "Linux is available on a machine with many CPUs, so Linux must scale well". No.
If we talk about the 1024-CPU Linux machine from SGI, it behaves exactly like a cluster, that is, a network of PCs. I have posted links explaining this. It is no coincidence that SGI posts benchmarks where the workload is embarrassingly parallel, so SGI can partition the workload into 128 independent parts and run each part on its own node in the SGI machine. (Not by coincidence, there are 128 nodes in the SGI machine. Just look at the benchmarks.)
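An "embarrassingly parallel" workload is one you can split into chunks that never communicate with each other, so each chunk can run on a separate node with no shared memory at all. A toy sketch of that partitioning (the workload and chunk count are purely illustrative, not from any SGI benchmark):

```python
from multiprocessing import Pool

def work(chunk):
    # Each chunk is fully independent: no shared state, no messages
    # to other chunks. This is what lets a cluster of separate nodes
    # handle it just as well as a real shared-memory machine.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1000))
    # Split into 128 independent parts, one per "node".
    chunks = [data[i::128] for i in range(128)]
    with Pool(4) as pool:
        results = pool.map(work, chunks)
    print(sum(results))  # same answer as the serial computation
```

A workload with real cross-node communication (shared data structures, frequent synchronization) is exactly what this partitioning cannot handle, which is why benchmarks of this shape say little about shared-memory scalability.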