LLVM Looking To Better Collaborate Around Common AI/GPU/FPGA Offloading

  • LLVM Looking To Better Collaborate Around Common AI/GPU/FPGA Offloading

    Phoronix: LLVM Looking To Better Collaborate Around Common AI/GPU/FPGA Offloading

    While most hardware vendors are relying on LLVM when it comes to offloading compute work to GPUs, AI accelerators, FPGAs, and similar heterogeneous compute environments, right now each vendor is basically creating its own LLVM offloading run-time, along with a lot of other duplicated -- and often downstream-only -- code. The new "llvm/offload" project hopes to lead to better collaboration in this area...
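
    Not from the article, but as a rough illustration of what "offloading compute work" looks like at the source level, here is a C++ vector add marked up with an OpenMP target region, which clang lowers onto an offload runtime (libomptarget, in LLVM's case). The flags and target triple are only an example; a typical invocation is roughly "clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda vecadd.cpp", but the details depend on how the toolchain was built and which vendor runtime is installed.

      // Minimal sketch: offload a vector add through an OpenMP target region.
      // If no device is available, the OpenMP runtime runs the loop on the host.
      #include <cstdio>
      #include <vector>

      int main() {
          const int n = 1 << 20;
          std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
          float *pa = a.data(), *pb = b.data(), *pc = c.data();

          // Map the inputs to the device and bring the result back.
          #pragma omp target teams distribute parallel for \
              map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
          for (int i = 0; i < n; ++i)
              pc[i] = pa[i] + pb[i];

          std::printf("c[0] = %f\n", pc[0]);
          return 0;
      }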

  • #2
    I hope it works out. It would also be cool if LLVM could add support for VHDL/Verilog, like SystemC.

    • #3
      LLVM needs to improve compile times, that's all I ask tbh lol. Its compile times are physically painful.

      • #4
        Originally posted by vextium
        LLVM needs to improve compile times, that's all I ask tbh lol. Its compile times are physically painful.
        On Windows we have two MinGW-w64 toolchains: one based on GNU (GCC + Binutils + libstdc++) and the other on LLVM (clang + lld + libc++). We set up CI jobs to build packages with both of them, and the LLVM toolchain is about 10-25% faster than the GNU one.
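
        A quick way to sanity-check which of the two toolchains produced a given build is to print the compilers' predefined macros. The macro names below are the standard ones for clang/GCC and for libc++/libstdc++; the snippet itself is only an illustration, not taken from the CI setup described above.

          // Report which compiler and C++ standard library built this binary.
          // Any standard header (here <cstdio>) pulls in the library-identity macros.
          #include <cstdio>

          int main() {
          // clang also defines __GNUC__, so test __clang__ first.
          #if defined(__clang__)
              std::printf("compiler: clang %d.%d\n", __clang_major__, __clang_minor__);
          #elif defined(__GNUC__)
              std::printf("compiler: gcc %d.%d\n", __GNUC__, __GNUC_MINOR__);
          #endif
          #if defined(_LIBCPP_VERSION)
              std::printf("stdlib: libc++ (%d)\n", (int)_LIBCPP_VERSION);
          #elif defined(__GLIBCXX__)
              std::printf("stdlib: libstdc++ (%ld)\n", (long)__GLIBCXX__);
          #endif
              return 0;
          }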

        • #5
          Originally posted by bumblebritches57
          I hope it works out. It would also be cool if LLVM could add support for VHDL/Verilog, like SystemC.
          True, though lots of really cool things have been achieved, or could be achieved, with even higher-level tools: HLS (as you mentioned, SystemC), the work being done around SYCL / oneAPI / etc. that Intel aims at all kinds of targets, Scala (q.v. Chisel), MyHDL, whatever.

          As the tools get more advanced, with lower and lower level "IR" layers and more architecture-savvy optimizers that can extract logic / data flow / algorithm / architecture insight from patterns in the IR, auto-vectorization or mapping to parallel SIMD / RISC / logic-block / whatever targets can be done by looking at the "hot" / big loops and algorithms represented and mapping them at higher and higher levels to a HW architecture / ISA / system execution model. Then there will be less dependence on the ASM layer, or on the "operating on bits" VHDL / Verilog type layers, just to map, say, an FFT or a matrix multiply to a given serial / parallel target.

          I'm sad there hasn't been more work to integrate (in a unified way) correct behavioral models of targets (CPUs, MCUs, FPGAs, GPUs) wrt. the execution environment's resources and usage semantics (caching, atomics, latency, registers, pre/post conditions & invariants for all ASM instructions, etc.) so that an optimizer could actually look at the ISA and realize that one particular CPU has a fast FP32 MAC while another has to emulate it with INT16 operations or whatever.
          Optimizers can only be as intelligent / capable as the target compute / resource model descriptions they process, and right now we've just got lots of special-case code for -march=armv8-something or whatever, versus a unified "here's a machine-readable semantic description / model of NVIDIA PTX, AMD RDNA3 ISA, x86-64, RISC-V, Xilinx LUT6, whatever" (a toy sketch of that idea is at the end of this post).

          LLVM / QEMU / Verilator / etc. could all use ways to actually model and reason automatically about a given platform's "design patterns": how code executes on it and how it should be mapped to it, down at the lowest levels of instructions / registers.
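
          Purely a toy sketch of the kind of unified, machine-readable target model described above; every name and field is made up and nothing like this exists in LLVM today. The point is only that an optimizer could query one description instead of accumulating per -march special cases:

            // Toy sketch: a made-up capability/semantics record per target.
            #include <cstdio>
            #include <map>
            #include <string>

            struct InstrInfo {
                bool native;         // implemented in hardware, or emulated?
                int  latencyCycles;  // rough cost-model input
            };

            struct TargetModel {
                std::string name;
                std::map<std::string, InstrInfo> instrs;  // e.g. "fp32.mac"
            };

            // Decide how to lower an FP32 multiply-accumulate on a given target.
            const char* lowerFp32Mac(const TargetModel& t) {
                auto it = t.instrs.find("fp32.mac");
                if (it != t.instrs.end() && it->second.native)
                    return "use the native FP32 MAC";
                return "emulate it (e.g. with INT16 ops, per the model's costs)";
            }

            int main() {
                TargetModel fastCpu{"some-cpu", {{"fp32.mac", {true, 4}}}};
                TargetModel tinyMcu{"some-mcu", {{"int16.mul", {true, 2}}}};
                std::printf("%s: %s\n", fastCpu.name.c_str(), lowerFp32Mac(fastCpu));
                std::printf("%s: %s\n", tinyMcu.name.c_str(), lowerFp32Mac(tinyMcu));
                return 0;
            }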
