
Thread: How Valve Made L4D2 Faster On Linux Than Windows

  1. #91
    Join Date
    Apr 2011
    Posts
    236

    Default

    Quote Originally Posted by elanthis View Post
    You are right, "server OS" is much clearer. My apologies to you for my snarky reply.



    ... No. You are still massively confused by some basic terminology, outright wrong on several points, and misunderstanding how certain things work. You're not making any new points, so I'm not going to repeat myself any more.

    You must first learn the basics. For example, a GLSL compiler does not produce byte-code lower than assembly level. There is no "to the metal" access for GPUs. If I remember correctly, only Kepler exposes some atomic operations. For example, MADD is a dual-operand unit; at the same time it is part of the instruction set, and at the same time it is assembly.

  2. #92
    Join Date
    Oct 2007
    Posts
    12

    Default

    Hi all,

    I'd like to begin by saying I am no senior expert at graphics programming and abstraction; however, I have spent numerous hours porting Horde3D to OpenGL ES 2.0 in my spare time and have learned a lot about abstraction constructs, so maybe I can explain a few things.

    Someone mentioned applying good OOP design via C++ using subclassing or virtual functions. These are indeed nice, clean and readable design approaches in an ideal world. If you're a hardcore 'to-the-metal' coder, you'll find that the way Ogre3D (and I assume Irrlicht) does this doesn't really fit the target hardware. On x86, where you have an out-of-order CPU with massive caches, it isn't a huge issue speed-wise. On in-order CPU platforms like the PS3 or Xbox 360, though, virtual function calls can perform horrendously: the vtable lookup goes out to memory, can miss the cache, and the CPU stalls for many clock cycles while it waits. In code that makes a huge number of such calls per frame, with a budget of 16 or 33 ms, this is a massive bottleneck. For more info look here (on Unity's abstraction layer, from Aras P):
    http://www.altdevblogaday.com/2011/0...nd-no-virtual/

    The #ifdef approach is not nice to read (for an example, look at Panda3D's OpenGL codepath/abstraction for the various GLs) but can be advantageous speed-wise if done correctly. You may need to build an executable per backend, since you're compiling for a specific target at build time (which makes sense on platforms with only one API to target). I think id Tech 2 or 3 use the approach outlined in the above link: they define the abstraction as a high-level struct with function pointers inside, then build a library (.dll/.so) for each backend they want to target, whether it's a particular OpenGL version or even a software renderer. The game engine chooses the best one using the target system's library loader calls and grabs the function pointers that way; the engine then just calls into those while the backend handles the rest, without the engine knowing the specifics. On this subject, Fabien has made excellent breakdowns of how id Tech engines are written: http://fabiensanglard.net/ Doom 3 is particularly interesting: it supports quite a few code paths for OpenGL alone, to take advantage of extensions that some hardware vendors support, like Nvidia's UltraShadow or depth bounds test (unfortunately there was no real support for OpenGL 2.x, so it's mostly OpenGL 1.x fixed function with a few ARB shader programs on some hardware).
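
    To make the struct-of-function-pointers idea concrete, here is a minimal sketch. It is not id Tech's actual code: every name in it (RenderBackend, selectBackend, the stub functions and the handle values) is made up for illustration, and the library-loading step is replaced by a plain lookup so the example stays self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical renderer interface: a plain struct of function pointers that
// each backend fills in. The engine only ever sees this table.
struct RenderBackend {
    const char* name;
    int  (*createVertexBuffer)(int sizeBytes);  // returns a made-up handle
    void (*drawBuffer)(int handle);
};

// "OpenGL" backend stub. In the scheme described above this would live in
// its own .so/.dll.
static int  glStubCreateVB(int sizeBytes) { return sizeBytes > 0 ? 100 : -1; }
static void glStubDraw(int /*handle*/) {}
static const RenderBackend kGLBackend = {"gl", glStubCreateVB, glStubDraw};

// Software renderer stub.
static int  swStubCreateVB(int sizeBytes) { return sizeBytes > 0 ? 200 : -1; }
static void swStubDraw(int /*handle*/) {}
static const RenderBackend kSoftBackend = {"soft", swStubCreateVB, swStubDraw};

// Stand-in for the loader step. A real engine would open the chosen library
// with the platform loader and fetch one exported function that returns the
// table; here we just search a static list.
const RenderBackend* selectBackend(const std::string& wanted) {
    static const RenderBackend* const all[] = {&kGLBackend, &kSoftBackend};
    for (const RenderBackend* b : all) {
        if (wanted == b->name) return b;
    }
    return &kSoftBackend;  // fall back to the software path
}

// Engine-side code: calls through the table without knowing the backend.
int engineCreateMesh(const RenderBackend* rb, int sizeBytes) {
    assert(rb != nullptr && rb->createVertexBuffer != nullptr);
    return rb->createVertexBuffer(sizeBytes);
}
```

    The cost per call is one indirect jump through a pointer the engine already holds, and the backend is picked once at startup rather than per object.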

    I can't really comment on how Valve does their DX9-to-OpenGL translation, but I guess they have a frontend which matches DX9 more closely, and then behind the scenes they have a 'smart' way to deal with OpenGL's terms and state machine. Horde3D has a similar approach, where the 'RendererDeviceInterface' has more DX10-like names but uses OpenGL's functions, e.g. a 'RenderBuffer' is really a 'FramebufferObject' bound to a texture. Some things are unavoidable, such as .dds textures needing to be flipped (D3D and GL differ here): you either remake the assets, or reuse the Windows ones and flip them in the shader or when loading them, which might cause some performance penalty.
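
    As a rough illustration of the load-time flip, the sketch below swaps rows of an uncompressed image in place. The function name flipRows and its parameters are my own, and it assumes tightly packed pixels. Block-compressed .dds data (S3TC) cannot be flipped this way, because the rows inside each 4x4 block would also need reordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flip an uncompressed, tightly packed image top-to-bottom in place.
// Hypothetical helper; assumes no row padding.
void flipRows(std::vector<std::uint8_t>& pixels,
              std::size_t width, std::size_t height,
              std::size_t bytesPerPixel) {
    const std::size_t stride = width * bytesPerPixel;
    assert(pixels.size() == stride * height);
    // Swap row y with its mirror row; the middle row (odd height) stays put.
    for (std::size_t y = 0; y < height / 2; ++y) {
        std::uint8_t* top = pixels.data() + y * stride;
        std::uint8_t* bottom = pixels.data() + (height - 1 - y) * stride;
        std::swap_ranges(top, top + stride, bottom);
    }
}
```

    The shader-side alternative is to sample with a flipped vertical coordinate (1.0 - v), which costs nothing at load time but adds a little work to every lookup.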

    I hope this explains a bit...

  3. #93
    Join Date
    Nov 2007
    Posts
    968

    Default

    Quote Originally Posted by artivision View Post
    You must learn first the basics.
    ... That's my line.

    For example a GLSL compiler does not produce byte-code lower than assembly-level. There is not "to metal" access for GPUs.
    I'm not sure what you're trying to say (there's clearly a language barrier issue here), but it still sounds like you're possibly confused on how this all works. A GLSL compiler generally produces some kind of intermediary format which is then fed to a codegen pass that does generate real "to the metal" machine code for the GPU execution cores (has to happen somewhere, after all). Some drivers share this intermediary format with their HLSL compiler (which explicitly generates a Microsoft-defined cross-IHV intermediary format, unlike GLSL), others do not. In either case, all shader languages at some point are compiled to raw machine code, but that code is not specified by any API or language standard, because it varies not only per-vendor but even per-product-cycle, and the APIs are intended to work on all hardware of the appropriate generation. Hence why OpenGL mandates GLSL source code as the lowest level in the standard (NVIDIA defines their assembly program extension, but that itself is just another intermediary format) and why D3D mandates its intermediary format as the lowest level in the standard (basically the same general concept as NVIDIA's GL assembly extension, but part of the API specification rather than as a vendor add-on).

    Quote Originally Posted by MistaED
    Someone mentioned applying good OOP-design via C++ using subclassing or virtual functions.
    Most game developers certainly know that you can have good OOP design _without_ excessive subclassing or virtual functions. The wonderful thing about C++ is that it makes static polymorphism almost as easy as dynamic polymorphism, so you can write a compile-time abstraction layer (without nasty #ifdefs) that is still good OOP. Even at the C level, you can write a single API with multiple backends by simply compiling in different translation units that implement the API in different fashions.

    For example, in a graphics API abstraction layer I am using now, there is a header with non-virtualized class definitions and nothing is inlined. There are then multiple sets of .cpp files, e.g. GfxWin32D3D11*.cpp, GfxDarwinGL*.cpp, GfxiOSGLES*.cpp, etc. The compiler inlines the smaller functions in release builds thanks to LTO and everything else is just a regular function call. Sure, there are platforms that support multiple APIs, but that is very rarely worth even caring about. Each platform has a primary well-supported API which most users' hardware is compatible with, so just use that. And if you're a small-time indie developer, just write for GL ES and ifdef the few bits that need to change to run on regular GL. You probably don't have the time and money to write a high-end fully-featured D3D11 renderer, a D3D9 renderer, a GL3/4 Core renderer, a GL 2.1 renderer, a GLES1 renderer, and a GLES2 renderer... it's not just the API differences, but all the shaders, art content enhancements, testing and performance work, etc. As an indie dev, you'll be making stylized but simplistic graphics, so a single least-common-denominator API is preferable. If you're writing a big engine like Unity or Unreal or whatnot, well... you're going to have a LOT of problems to solve besides the easy stuff like abstracting the "create vertex buffer" API call efficiently.
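
    A single-file sketch of the static polymorphism idea follows. It is not the layer described above (that one uses one header plus a set of .cpp files per platform); it uses a template parameter instead so it fits in one block. GLBackend, D3D11Backend, GfxDevice and the returned handle values are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Two hypothetical backends exposing the same static interface.
struct GLBackend {
    static int createVertexBuffer(std::size_t bytes) { return bytes ? 1 : 0; }
};
struct D3D11Backend {
    static int createVertexBuffer(std::size_t bytes) { return bytes ? 2 : 0; }
};

// Non-virtual device class. The backend is fixed at compile time, so the
// call below is an ordinary (inlinable) function call, not a vtable lookup.
template <class Backend>
class GfxDevice {
public:
    int createVertexBuffer(std::size_t bytes) {
        assert(bytes > 0);
        ++bufferCount_;
        return Backend::createVertexBuffer(bytes);
    }
    int bufferCount() const { return bufferCount_; }

private:
    int bufferCount_ = 0;
};

// The build selects exactly one backend. In a real project this choice
// would come from the build system, not from editing this line.
using Device = GfxDevice<GLBackend>;
```

    Game code written against Device never names a backend, so switching platforms means changing the alias (or, in the multi-.cpp version, which files get compiled).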

  4. #94
    Join Date
    Oct 2007
    Posts
    12

    Default

    Quote Originally Posted by elanthis View Post
    For example, in a graphics API abstraction layer I am using now, there is a header with non-virtualized class definitions and nothing is inlined. There are then multiple sets of .cpp files, e.g. GfxWin32D3D11*.cpp, GfxDarwinGL*.cpp, GfxiOSGLES*.cpp, etc. The compiler inlines the smaller functions in release builds thanks to LTO and everything else is just a regular function call. Sure, there are platforms that support multiple APIs, but that is very rarely worth even caring about. Each platform has a primary well-supported API which most users' hardware is compatible with, so just use that. And if you're a small-time indie developer, just write for GL ES and ifdef the few bits that need to change to run on regular GL. You probably don't have the time and money to write a high-end fully-featured D3D11 renderer, a D3D9 renderer, an GL3/4 Core renderer, a GL 2.1 renderer, a GLES1 renderer, and a GLES2 renderer... it's not just the API differences, but all the shaders, art content enhancements, testing and performance work, etc. As an indie dev, you'll be making stylized but simplistic graphics, so a single least-common-denominator API is preferable. If you're writing a big engine like Unity or Unreal or whatnot, well... you're going to have a LOT of problems to solve besides the easy stuff like abstracting the "create vertex buffer" API call efficiently.
    I'm doing a somewhat similar thing to support ES 2.0 in Horde: I just have different .cpp files, and Horde has a nice global static class 'gRDI' which is generic enough if someone wanted to add DX9 or DX10 support. There's only GL 2.1+extensions and GL ES 2.0+extensions support so far, but by the looks of it all the pieces missing from ES 2.0 that the 2.1+extensions backend supports are now in ES 3.0, so it's an easy addition which I'll get around to eventually. I'm probably considered an 'indie' developer, so I don't really need any more backends to support. On the asset side I have Maya setting up per-platform profiles which just have tweaked GLSL shaders and instructions to convert to certain compressed formats in .ktx where possible (e.g. Android+Tegra2/3 would output S3TC in .ktx, iOS PVRTC in .ktx, etc.).

    Getting back on topic, I think converting gigabytes of assets in L4D2 and other Source games sounds way too excessive. I'd bet they just do a small conversion in the shader or at load time, provided flipping textures before they reach OpenGL doesn't lengthen load times too much... I guess those shaders would just use MojoShader to convert HLSL to GLSL, or they just went with Cg *shrugs*.

  5. #95
    Join Date
    Apr 2011
    Posts
    236

    Default

    Quote Originally Posted by elanthis View Post
    ... That's my line.



    I'm not sure what you're trying to say (there's clearly a language barrier issue here), but it still sounds like you're possibly confused on how this all works. A GLSL compiler generally produces some kind of intermediary format which is then fed to a codegen pass that does generate real "to the metal" machine code for the GPU execution cores (has to happen somewhere, after all). Some drivers share this intermediary format with their HLSL compiler (which explicitly generates a Microsoft-defined cross-IHV intermediary format, unlike GLSL), others do not. In either case, all shader languages at some point are compiled to raw machine code, but that code is not specified by any API or language standard, because it varies not only per-vendor but even per-product-cycle, and the APIs are intended to work on all hardware of the appropriate generation. Hence why OpenGL mandates GLSL source code as the lowest level in the standard (NVIDIA defines their assembly program extension, but that itself is just another intermediary format) and why D3D mandates its intermediary format as the lowest level in the standard (basically the same general concept as NVIDIA's GL assembly extension, but part of the API specification rather than as a vendor add-on).



    Most game developers certainly know that you can have good OOP design _without_ excessive subclassing or virtual functions. The wonderful thing about C++ is that it makes static polymorphism almost as easy as dynamic polymorphism, so you can write a compile-time abstraction layer (without nasty #ifdefs) that is still good OOP. Even at the C level, you can write a single API with multiple backends by simply compiling in different translation units that implement the API in different fashions.

    For example, in a graphics API abstraction layer I am using now, there is a header with non-virtualized class definitions and nothing is inlined. There are then multiple sets of .cpp files, e.g. GfxWin32D3D11*.cpp, GfxDarwinGL*.cpp, GfxiOSGLES*.cpp, etc. The compiler inlines the smaller functions in release builds thanks to LTO and everything else is just a regular function call. Sure, there are platforms that support multiple APIs, but that is very rarely worth even caring about. Each platform has a primary well-supported API which most users' hardware is compatible with, so just use that. And if you're a small-time indie developer, just write for GL ES and ifdef the few bits that need to change to run on regular GL. You probably don't have the time and money to write a high-end fully-featured D3D11 renderer, a D3D9 renderer, an GL3/4 Core renderer, a GL 2.1 renderer, a GLES1 renderer, and a GLES2 renderer... it's not just the API differences, but all the shaders, art content enhancements, testing and performance work, etc. As an indie dev, you'll be making stylized but simplistic graphics, so a single least-common-denominator API is preferable. If you're writing a big engine like Unity or Unreal or whatnot, well... you're going to have a LOT of problems to solve besides the easy stuff like abstracting the "create vertex buffer" API call efficiently.


    Wrong, wrong, wrong and wrong.

    1. Compilers don't have to-the-metal access on the GPU.

    2. There is no need at any point for code to be compiled to-the-metal. The GPU hardware (not the driver) understands an assembly-level instruction (like MAD, or LOG, or MUL, or TEX, or anything) and executes it at a lower level with smaller instructions (atomic operations and others).

    3. The byte-code of GLSL, HLSL and Cg varies per product, but the VM-typed protocol (that all GLSL games are written against) is the same. The same goes for OpenCL too.

  6. #96
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    6,924

    Default

    Part of the confusion here may come from the fact that with GPUs you end up running into a lot of compilers -- the one used to compile the application code from C/C++ or whatever to CPU hardware instructions (while still containing high level graphics operations) vs the one in the driver stack used to compile from the high level graphics operations (GLSL, HLSL, etc.) to GPU instructions.

    In this case I suspect you may be talking about different compilers. The second one *definitely* goes to-the-metal.

  7. #97
    Join Date
    Apr 2011
    Posts
    236

    Default

    Quote Originally Posted by bridgman View Post
    Part of the confusion here may come from the fact that with GPUs you end up running into a lot of compilers -- the one used to compile the application code from C/C++ or whatever to CPU hardware instructions (while still containing high level graphics operations) vs the one in the driver stack used to compile from the high level graphics operations (GLSL, HLSL etc..) to GPU instructions.

    In this case I suspect you may be talking about different compilers. The second one *definitely* goes to-the-metal.

    The second one, like any other, doesn't go to-the-metal. There is no to-the-metal access for GPUs, not even by the driver itself. Also, while a GPU has an instruction set, it doesn't have a native machine language. As we say x86 for a CPU, we cannot say nv64 for a GPU. That's because a GPU is missing various control and execution units that only a CPU has (CISC or RISC). A GPU offloads more things to software. You never have to-the-metal access; you can't compile something only for the GPU; you always need a CPU and a driver, and you always write and compile to a VM (like OpenCL).

  8. #98
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    6,924

    Default

    Um... no. Take a look at the EXA code for pre-SI radeon chips -- it uses to-the-metal GPU programs (aka shaders). They happen to be hard-coded rather than stored compiler output for the simple reason that we historically got 2D driver support running before 3D (and the radeon shader compiler happens to be in the 3D driver) but that sequence is changing with SI anyways.

    A number of GL and CL implementations allow offline storage of compiled shader programs at the GPU binary level. There is a need for the driver to set appropriate state info and issue appropriate draw/compute commands but that is conceptually no different from an OS (CPU) scheduler passing control from the kernel to a user process.

    What GPUs don't generally have & use today is support for GPU programs which "run forever" but which can be pre-empted (pulled off the hardware) to allow other programs to run for a while, but that is quite different from what you are talking about.

    The bigger issue here (which you are correctly identifying but IMO not describing correctly) is that GPU instruction sets are allowed to change more quickly than CPU instruction sets, as a consequence of having standards at a higher level than for CPUs (e.g. OpenGL / DirectX vs x86 ISA). There is a strong convenience aspect associated with having applications program GPUs via a higher-level API instead of programming to-the-metal either directly (by having application code include GPU hardware instructions) or indirectly (by having the toolchain compile high level GPU operations in the application to GPU machine instructions in the binary), but that is an "easier for users if you don't" constraint rather than a "you can't do it" one.
    Last edited by bridgman; 08-20-2012 at 09:30 AM.

  9. #99
    Join Date
    Apr 2011
    Posts
    236

    Default

    Quote Originally Posted by bridgman View Post
    Um... no. Take a look at the EXA code for pre-SI radeon chips -- it uses to-the-metal GPU programs (aka shaders). They happen to be hard-coded rather than stored compiler output for the simple reason that we historically got 2D driver support running before 3D (and the radeon shader compiler happens to be in the 3D driver) but that sequence is changing with SI anyways.

    A number of GL and CL implementations allow offline storage of compiled shader programs at the GPU binary level. There is a need for the driver to set appropriate state info and issue appropriate draw/compute commands but that is conceptually no different from an OS (CPU) scheduler passing control from the kernel to a user process.

    What GPUs don't generally have & use today is support for GPU programs which "run forever" but which can be pre-empted (pulled off the hardware) to allow other programs to run for a while, but that is quite different from what you are talking about.

    The bigger issue here (which you are correctly identifying but IMO not describing correctly) is that GPU instruction sets are allowed to change more quickly than CPU instruction sets, as a consequence of having standards at a higher level than for CPUs (eg OpenGL / DirectX vs x86 ISA). There is a strong convenience aspect associated with having applications program GPUs via higher level API instead of programming to-the-metal either directly (by having application code include GPU hardware instructions) or indirectly (by having the toolchain compile high level GPU operations in the application to GPU machine instructions in the binary), but that is a "easier for users if you don't" constraint rather than a "you can't do it" one.

    Shaders are not compiled to-the-metal. Shaders are pre-semi-compiled to a VM (that's the known GLSL), unlike the sources of a general program. Then come the "targeter" and the "optimizer", also known as the compiler. The GLSL compiler compiles to assembly level (MAD, MUL, TEX, LOG, FRC, LIT, and other assembly-level commands). Then the GPU hardware executes them internally with smaller instructions, atomic operations, possibly micro-instructions (if the GPU has microcode), and others. The compiler doesn't have access to the entire instruction set like you have on a CPU with a software rasterizer.

  10. #100
    Join Date
    Apr 2011
    Posts
    236

    Default

    Even if Nvidia gives you access to the entire instruction set, you still can't use it with any compiler or anything else. That's because GPUs are missing various control and execution units that only a CPU has. For a GPU, programs must be pre-controlled, and that's a higher-level subset.
