Llamafile 0.8.1 GPU LLM Offloading Works Now With More AMD GPUs

  • Llamafile 0.8.1 GPU LLM Offloading Works Now With More AMD GPUs

    Phoronix: Llamafile 0.8.1 GPU LLM Offloading Works Now With More AMD GPUs

    It was just a few days ago that Llamafile 0.8 released with LLaMA 3 and Grok support along with faster F16 performance. Now this project out of Mozilla for self-contained, easily re-distributable large language model (LLM) deployments is out with a new release...


  • #2
    Wow. So is the list of compatible hardware above like, 2 generations? Give it a decade and things might really start cooking.

    • #3
      Originally posted by geerge
      Wow. So is the list of compatible hardware above like, 2 generations? Give it a decade and things might really start cooking.
      Nvidia needs to make ZLUDA a thing. (People would probably pay for it too).

      But for real… AMD needs to enhance ROCm by keeping all supported GPUs included, albeit eventually at lower feature levels, similar to CUDA in that regard.

      • #4
        Originally posted by Eirikr1848
        Nvidia needs to make ZLUDA a thing. (People would probably pay for it too).
        The licensing is probably why both Intel and AMD dropped this on the floor and went "I'm not touching this". From Nvidia's perspective, if they start licensing CUDA, that will mean they sell fewer GPUs. They like their vendor lock-in and I don't see how they'd willingly part with it unless they're forced to.

        • #5
          I am always happy to hear more news on Llamafile development! I am using it and it's great. I haven't achieved GPU acceleration yet (Debian Stable, which I use, doesn't come with ROCm, and I don't know how to work around that), but even on CPU it works fine.
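
          For the record, the CPU-only run needs nothing beyond the downloaded file itself; a rough sketch (the model filename is only a placeholder, any llamafile works the same way):
          Code:
          # make the downloaded llamafile executable and start it; with no GPU flags it stays on the CPU
          chmod +x mistral-7b-instruct.Q4_0.llamafile
          ./mistral-7b-instruct.Q4_0.llamafile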

          • #6
            Originally posted by geerge
            Wow. So is the list of compatible hardware above like, 2 generations? Give it a decade and things might really start cooking.
            This is based on llama-cpp, and the GPU-enabled hardware list depends on what rocBLAS has been built for. On Debian 13, Ubuntu 23.10/24.04, NixOS 24.05, or SolusOS, that is at least all discrete Vega 10, Vega 20, RDNA 1, RDNA 2, RDNA 3, CDNA 1, and CDNA 2 GPUs. So, they have four or five generations enabled.

            The AMD official packages have far fewer architectures enabled, but AMD's packages are not the only option for installing ROCm these days.
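
            If you want to see exactly what your installed rocBLAS was built for, the Tensile code objects it ships are named after the gfx targets; a rough sketch (the two paths are my guesses at the usual AMD and Debian/Ubuntu install locations, adjust for your system):
            Code:
            # list the gfx architectures the installed rocBLAS carries kernels for
            ls /opt/rocm/lib/rocblas/library/ 2>/dev/null | grep -o 'gfx[0-9a-f]*' | sort -u
            ls /usr/lib/x86_64-linux-gnu/rocblas/library/ 2>/dev/null | grep -o 'gfx[0-9a-f]*' | sort -u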

            • #7
              Originally posted by piorunz
              I am always happy to hear more news on Llamafile development! I am using it and it's great. I haven't achieved GPU acceleration yet (Debian Stable, which I use, doesn't come with ROCm, and I don't know how to work around that), but even on CPU it works fine.
              Everything needed for llama-cpp GPU acceleration is available in Debian Unstable. With the update to ROCm 5.7.1 and the Ubuntu 24.04 release complete, I think creating Bookworm backports is probably the next order of business.
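
              For anyone who wants to try it on sid right now, this is roughly what it looks like (package names are from memory, so double-check them with apt search; LLAMA_HIPBLAS is the llama.cpp build flag as of this release):
              Code:
              # pull the HIP compiler and ROCm BLAS libraries from the Debian archive
              sudo apt install hipcc librocblas-dev libhipblas-dev
              # build llama.cpp with HIP offload enabled
              git clone https://github.com/ggerganov/llama.cpp
              cd llama.cpp && make LLAMA_HIPBLAS=1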

              • #8
                Originally posted by cgmb

                Everything needed for llama-cpp GPU acceleration is available in Debian Unstable. With the update to ROCm 5.7.1 and the Ubuntu 24.04 release complete, I think creating Bookworm backports is probably the next order of business.
                Can't wait! Hopefully fully working GPU acceleration via OpenCL and ROCm will be ready for the next Debian Stable. I will upgrade then.

                • #9
                  Originally posted by cgmb

                  This is based on llama-cpp, and the GPU-enabled hardware list depends on what rocBLAS has been built for. On Debian 13, Ubuntu 23.10/24.04, NixOS 24.05, or SolusOS, that is at least all discrete Vega 10, Vega 20, RDNA 1, RDNA 2, RDNA 3, CDNA 1, and CDNA 2 GPUs. So, they have four or five generations enabled.

                  The AMD official packages have far fewer architectures enabled, but AMD's packages are not the only option for installing ROCm these days.
                  On my Phoenix laptop (7840U) I always pretend that I have a discrete card with a trick like this:
                  Code:
                  export HSA_OVERRIDE_GFX_VERSION=11.0.0
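
                  Phoenix reports itself as gfx1103, which rocBLAS usually has no kernels for, so the override makes ROCm treat it as gfx1100 instead. With that set, offloading is just a matter of asking llamafile for GPU layers; roughly (the model filename is only a placeholder):
                  Code:
                  # spoof the ISA so the gfx1100 rocBLAS kernels are used, then push all layers to the iGPU
                  export HSA_OVERRIDE_GFX_VERSION=11.0.0
                  ./mistral-7b-instruct.Q4_0.llamafile -ngl 999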
