A Look At Linux Application Scaling Up To 128 Threads


  • A Look At Linux Application Scaling Up To 128 Threads

    Phoronix: A Look At Linux Application Scaling Up To 128 Threads

    Arriving last week in our Linux benchmarking lab was a dual EPYC server -- this Dell PowerEdge R7425 is a beast of a system with two AMD EPYC 7601 processors yielding a combined 64 cores / 128 threads, 512GB of RAM (16 x 32GB DDR4), and 20 x 500GB Samsung 860 EVO SSDs. There will be many interesting benchmarks from this server in the days and weeks ahead. For some initial measurements during the first few days of stress testing this 2U rack server, here is a look at how well various benchmarks/applications are scaling from two to 128 threads.

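    For readers who want to try a similar sweep on their own hardware, here is a minimal sketch of the idea (the ./my_benchmark binary is a placeholder and OMP_NUM_THREADS only applies to OpenMP-based workloads; the article's own numbers come from the Phoronix Test Suite):

    # Hypothetical thread-scaling sweep: time a workload at increasing thread
    # counts and report speedup relative to the 2-thread baseline.
    import os
    import subprocess
    import time

    THREAD_COUNTS = [2, 4, 8, 16, 32, 64, 128]
    results = {}

    for n in THREAD_COUNTS:
        env = dict(os.environ, OMP_NUM_THREADS=str(n))  # placeholder workload below
        start = time.perf_counter()
        subprocess.run(["./my_benchmark"], env=env, check=True)
        results[n] = time.perf_counter() - start

    baseline = results[THREAD_COUNTS[0]]
    for n in THREAD_COUNTS:
        speedup = baseline / results[n]
        print(f"{n:>3} threads: {results[n]:8.2f}s  speedup {speedup:5.2f}x")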

  • #2
    +1 for AMD being back in the HPC game! ;-)

    • #3
      GraphicsMagick seems to scale with log(threads) rather than with the number of threads, unless I'm missing something.

      Many results show a speedup of well over 2x when going from 32 to 64 threads; probably the threads are not being split evenly between the two packages.
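
      For what it's worth, a quick way to see how such a flattening curve arises: a minimal Amdahl's-law sketch, printed next to log2(threads) for comparison (the 0.9 parallel fraction is just an illustrative guess, not a fit to the GraphicsMagick numbers):

      import math

      # Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), with p the parallel fraction.
      # p = 0.9 is an illustrative guess, not fitted to the article's results.
      p = 0.9
      for n in (2, 4, 8, 16, 32, 64, 128):
          amdahl = 1.0 / ((1.0 - p) + p / n)
          print(f"{n:>3} threads: Amdahl {amdahl:5.2f}x   log2(n) {math.log2(n):5.2f}")

      Even a 90% parallel workload tops out well below linear, which can look a lot like log(threads) over this range.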

      • #4
        How much would a system like this cost?

        • #5
          what magic do stockfish and vgr do that going from 32 to 64 threads more than doubles performance?

          • #6
            Did you buy this system or is it on loan from Dell for review purposes? Configuring this server on Dell's online store gets it into the 5 figures very easily just from the CPU and memory options.

            • #7
              Originally posted by Mr.Radar:
              Did you buy this system or is it on loan from Dell for review purposes? Configuring this server on Dell's online store gets it into the 5 figures very easily just from the CPU and memory options.
              Review sample to be used for future Linux server benchmarking and other interesting performance tests.
              Michael Larabel
              https://www.michaellarabel.com/

              • #8
                Awesome machine!
                ## VGA ##
                AMD: X1950XTX, HD3870, HD5870
                Intel: GMA45, HD3000 (Core i5 2500K)

                • #9
                  I might have found the reason for the greater-than-expected scaling between 32 and 64 threads.

                  The specification table shows that each configuration up to 32 threads runs at ~2.7GHz, while the 64- and 128-thread configurations run at 3.1GHz. The lower-thread-count configurations might not have CPU turbo working, which would hinder their performance.
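
                  A rough sanity check of that theory, assuming the clocks quoted above and a perfectly parallel workload:

                  # Hypothetical upper bound on the 32 -> 64 thread speedup if the
                  # clock also rises from 2.7GHz to 3.1GHz (perfectly parallel case).
                  threads_ratio = 64 / 32   # 2.0x more threads
                  clock_ratio = 3.1 / 2.7   # ~1.15x higher clock
                  print(f"max expected speedup: {threads_ratio * clock_ratio:.2f}x")  # ~2.30x

                  That alone would explain speedups of up to roughly 2.3x instead of 2x.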

                  • #10
                    Originally posted by varikonniemi:
                    what magic does stockfish and vgr do when 32->64 threads more than doubles performance?
                    It may depend on several factors; threads (or cores) are only part of the problem. If your data set is big enough, it can choke the caches or even the RAM bandwidth of certain cores on certain NUMA nodes, cause L3 victim cache starvation, etc.

                    When you see cases where the speedup is more than double, it usually means there is enough parallelization to remove the bandwidth or cache bottleneck, because the hardware can handle the smaller per-thread chunks of data more efficiently.

                    Of course there are other factors, related and unrelated to bandwidth or cache, but usually those contribute on a smaller scale.

                    There is also the possibility of a runtime algorithm selector in the application. Sometimes you find a way to make an algorithm really neat and fast, only to realize later that it hits a ceiling at some point and stops scaling, but up to that point it is the fastest implementation you can reach. Then, after months of breaking your head, you realize that the other "slow" algorithm you didn't want to use (because it was slower below that ceiling) turns out to be a scalability Chuck Norris and ends up being a lot faster once it can scale enough. Hence you end up switching algorithms at runtime depending on the size (or any other reasonable parameter) of the dataset, to use the most effective tool for the job.
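
                    To illustrate that last point (not something any benchmark here is confirmed to do), a minimal sketch of such a selector; the threshold and both algorithms are purely hypothetical placeholders:

                    # Hypothetical runtime algorithm selector: pick the implementation
                    # based on the size of the dataset. Names and threshold are illustrative.
                    SMALL_INPUT_THRESHOLD = 10_000  # would be tuned empirically

                    def fast_small_algorithm(data):
                        # great constants, but hits a scaling ceiling on large inputs
                        return sorted(data)  # placeholder

                    def scalable_large_algorithm(data):
                        # more overhead up front, but keeps scaling with more data/threads
                        return sorted(data)  # placeholder

                    def solve(data):
                        if len(data) < SMALL_INPUT_THRESHOLD:
                            return fast_small_algorithm(data)
                        return scalable_large_algorithm(data)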
