A Look At The LLVMpipe OpenGL Performance On Mesa 19.0 With A 64C/128T Server


  • A Look At The LLVMpipe OpenGL Performance On Mesa 19.0 With A 64C/128T Server

    Phoronix: A Look At The LLVMpipe OpenGL Performance On Mesa 19.0 With A 64C/128T Server

    Given the proposed Libre RISC-V SoC that could function as a Vulkan accelerator by running the Kazan Vulkan implementation on it, I decided to have a fresh look at how the LLVMpipe performance is for running OpenGL on the CPU. Here are those tests done on a dual socket AMD EPYC server...


  • #2
    How well does it scale with cores? IIRC, it didn't scale well beyond 8 cores a while back.

    • #3
      This seems strange to me. The raw performance and bandwidth of GPUs from when ET was new are nowhere near those of a dual-socket EPYC setup. Is it really that hard to scale graphics across many x86 cores?

      • #4
        I bet it would run faster on a similar Intel CPU; AFAIK llvmpipe makes really good use of AVX.

        • #5
          I played Bugs Bunny: Lost in Time, a GL game from 1999, on a Kaveri APU (which was enough for Full HD resolution), under WINE with OpenGL going through llvmpipe. Just drop in Mesa's DLL and it runs fine on the CPU:



          With that CPU you could probably run the bunny at 8K resolution, maybe even 16K. It only uses plain GL with no extensions, and it doesn't need more than 30 fps anyway.

          I would guess each newer GL version multiplies the requirements for software rendering several times over, say 3x per version bump, let alone extensions, etc...

          edit: Bugs Bunny: Lost in Time really is lost in time; it only has some shadow bugs on GL. Very ancient GL, I guess, or just nobody has dared to fix it.
          Last edited by dungeon; 16 December 2018, 08:46 AM.
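
          For reference, on a native Linux Mesa stack (rather than the WINE DLL swap above), the usual way to force llvmpipe is via Mesa's environment variables. Here's a rough sketch of a tiny launcher, just to illustrate the idea; LIBGL_ALWAYS_SOFTWARE and GALLIUM_DRIVER are real Mesa variables, the rest is only an example:

          Code:
          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>

          int main(int argc, char **argv)
          {
              if (argc < 2) {
                  fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
                  return 1;
              }
              /* Mesa env vars: force software rendering and pick llvmpipe. */
              setenv("LIBGL_ALWAYS_SOFTWARE", "1", 1);
              setenv("GALLIUM_DRIVER", "llvmpipe", 1);
              execvp(argv[1], &argv[1]);   /* replace ourselves with the game */
              perror("execvp");
              return 1;
          }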

          • #6


            It says there LLVM 128 bits; maybe that matters. Somewhere else it says 256 bits.

            • #7
              Originally posted by GruenSein View Post
              How well does it scale with cores? IIRC, it didn't scale well beyond 8 cores a while back.
              Yes, scaling is bad, and for that reason it does not even try to use more than 16 threads (+ the main thread).

              Originally posted by dungeon View Post
              It says there LLVM 128 bits; maybe that matters. Somewhere else it says 256 bits.
              It's a result of disabling AVX on AMD. I suspect I should probably rework that logic now, but it would need some testing. The reasoning behind it is that running with 256-bit wide vectors is nowhere near twice as fast as with 128-bit wide vectors even on Intel, and the Bulldozer family of chips in particular has pretty bad performance with 256-bit vectors: not only does it split everything into 2x128-bit halves, but the decoder also has quite reduced throughput. Zen doesn't suffer from the latter problem, but I'd still expect 256-bit vectors to be a loss (as it still splits things up into 128-bit pieces). It should be a win with Zen 2, however.
              Also, the logic is a bit flawed, since there is actually no need to disable AVX entirely; it should only disable 256-bit operation. But I don't think this really makes a difference (and with newer LLVM versions it will actually still use AVX anyway).
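
              For anyone wondering what the 128-bit vs. 256-bit distinction means in practice, here's a rough sketch with plain intrinsics (not llvmpipe code): the 8-wide AVX loop does half the iterations of the 4-wide SSE loop, but on Bulldozer/Zen 1 each 256-bit op is internally split into two 128-bit halves, so the wider code isn't automatically ~2x faster.

              Code:
              /* Build with: gcc -O2 -mavx -c vecwidth.c (assumes n is a multiple of 8) */
              #include <immintrin.h>

              /* 4-wide (128-bit) add, roughly what you get with AVX disabled */
              static void add_sse(const float *a, const float *b, float *out, int n)
              {
                  for (int i = 0; i < n; i += 4) {
                      __m128 va = _mm_loadu_ps(a + i);
                      __m128 vb = _mm_loadu_ps(b + i);
                      _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
                  }
              }

              /* 8-wide (256-bit) add: half the iterations, but Bulldozer/Zen 1
               * crack each 256-bit op into two 128-bit micro-ops internally */
              static void add_avx(const float *a, const float *b, float *out, int n)
              {
                  for (int i = 0; i < n; i += 8) {
                      __m256 va = _mm256_loadu_ps(a + i);
                      __m256 vb = _mm256_loadu_ps(b + i);
                      _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
                  }
              }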

              • #8
                Scaling is indeed quite bad with llvmpipe. Back when I bought my Vega 64, but before support for it landed, I played Quake 1 on llvmpipe for a while. I could get about 30 fps at 1280x720, IIRC, and utilization of my AMD 1950X wasn't very good. I suspect that renderer doesn't get a lot of love or attention from the developers, since it isn't particularly important for general high-performance graphics. It's just a stand-in for when hardware 3D isn't available; it's capable of rendering your desktop environment if that needs acceleration, etc.

                • #9
                  I believe llvmpipe is artificially limited to 16 threads; see src/gallium/drivers/llvmpipe/lp_limits.h line 64: #define LP_MAX_THREADS 16. You should be able to just change that 16 to 256 and then use the LP_NUM_THREADS env var to take it all the way up to 256 threads. I'd also like to see a comparison with SWR; that'd be great (control threading with the KNOB_MAX_WORKER_THREADS env var). It would certainly be interesting, given that the two really target different workloads: fragment shaders are the primary workload in scientific visualization, which llvmpipe is still single-threaded for, while SWR can parallelize all of the supported shader operations.
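
                  To make that interaction concrete, here's a minimal sketch of the pattern the quoted header implies (not Mesa's actual code; the helper and the clamping are just illustrative): the LP_NUM_THREADS environment variable can lower the thread count, but the compile-time LP_MAX_THREADS define is the ceiling, so raising the env var past 16 does nothing until the header is edited and llvmpipe rebuilt.

                  Code:
                  #include <stdio.h>
                  #include <stdlib.h>

                  #define LP_MAX_THREADS 16   /* compile-time cap quoted above */

                  /* illustrative helper: integer env var with a default */
                  static int env_int(const char *name, int def)
                  {
                      const char *s = getenv(name);
                      return s ? atoi(s) : def;
                  }

                  int main(void)
                  {
                      int threads = env_int("LP_NUM_THREADS", LP_MAX_THREADS);
                      if (threads < 0)
                          threads = 0;
                      if (threads > LP_MAX_THREADS)
                          threads = LP_MAX_THREADS;   /* env var cannot exceed the #define */
                      printf("rasterizer threads: %d\n", threads);
                      return 0;
                  }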

                  • #10
                    Originally posted by chuckatkins View Post
                    I believe llvmpipe is artificially limited to 16 threads; see src/gallium/drivers/llvmpipe/lp_limits.h line 64: #define LP_MAX_THREADS 16. You should be able to just change that 16 to 256 and then use the LP_NUM_THREADS env var to take it all the way up to 256 threads.
                    Yes, but that's what I'm saying: it is intentionally limited to 16 threads because scaling is bad. If you lift the limit, it can use more threads, but they will just be idle most of the time (even with 16 threads, I bet you see quite lengthy idle periods). There are some design shortcomings, and probably the biggest is that anything pre-rasterization can't run in parallel; worse, it can't run concurrently with fragment shading, so the threads go idle.
                    I'd also like to see a comparison with SWR; that'd be great (control threading with the KNOB_MAX_WORKER_THREADS env var). It would certainly be interesting, given that the two really target different workloads: fragment shaders are the primary workload in scientific visualization, which llvmpipe is still single-threaded for, while SWR can parallelize all of the supported shader operations.
                    It's the opposite: scientific visualization deals with huge data sets, while fragment shading is usually simple. llvmpipe only uses threads for fragment shading (whereas SWR indeed can scale everything, and should be able to use more threads meaningfully).
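
                    A quick Amdahl's-law style back-of-the-envelope (made-up fractions, not measured llvmpipe numbers) shows why lifting LP_MAX_THREADS alone wouldn't let a 128-thread EPYC shine: if the pre-rasterization work stays serial, that serial fraction caps the speedup no matter how many fragment-shading threads you add.

                    Code:
                    #include <stdio.h>

                    /* Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N),
                     * with s the serial fraction of the frame. */
                    static double speedup(double s, int n)
                    {
                        return 1.0 / (s + (1.0 - s) / n);
                    }

                    int main(void)
                    {
                        const double s = 0.20;  /* assumed: 20% of the frame is serial setup */
                        const int counts[] = { 1, 4, 8, 16, 32, 64, 128 };

                        for (int i = 0; i < 7; i++)
                            printf("%3d threads -> %.2fx\n", counts[i], speedup(s, counts[i]));
                        /* with s = 0.20 this flattens out just under 5x */
                        return 0;
                    }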
