GCC To Begin Implementing MMX Intrinsics With SSE Instructions

    Phoronix: GCC To Begin Implementing MMX Intrinsics With SSE Instructions

    While current-generation Intel/AMD CPUs are still supporting the MMX SIMD instruction set from two decades ago, a set of GCC compiler patches are pending to begin implementing MMX intrinsics using SSE instructions...
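
For readers wondering what "implementing MMX intrinsics using SSE instructions" means in practice, here is a minimal sketch. It is not the actual GCC patch set. It assumes a GCC or Clang toolchain, and the function names `paddw_scalar` and `paddw_via_sse2` are made up for illustration. The idea: a packed 16-bit add on a 64-bit MMX value (what `_mm_add_pi16` does) gives the same answer when the 64 bits sit in the low half of a 128-bit XMM register and the SSE2 intrinsic `_mm_add_epi16` is used instead. Where SSE2 is not available, the sketch falls back to the scalar version.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics operating on 128-bit XMM registers */
#endif

/* Reference semantics of the MMX PADDW operation (_mm_add_pi16):
 * four independent 16-bit lanes packed into 64 bits, wrapping on overflow. */
uint64_t paddw_scalar(uint64_t a, uint64_t b) {
    uint64_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);           /* wraps modulo 2^16 */
        out |= (uint64_t)sum << (16 * lane);
    }
    return out;
}

/* Same operation carried out in the low 64 bits of an XMM register,
 * so no MMX register is touched. Illustrative helper, not a GCC API. */
uint64_t paddw_via_sse2(uint64_t a, uint64_t b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);  /* upper half zeroed */
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);
    __m128i vsum = _mm_add_epi16(va, vb);               /* eight 16-bit adds */
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, vsum);            /* keep the low four lanes */
    return out;
#else
    return paddw_scalar(a, b);                          /* fallback without SSE2 */
#endif
}
```

For example, adding the lanes of `0x0001FFFF7FFF0004` and `0x0001000100010003` gives `0x0002000080000007` on both paths: the `0xFFFF + 1` lane wraps to zero without carrying into its neighbour.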

  • #2
    So it seems that Intel is taking steps to retire the MMX instruction set from their hardware? Humph.

    Of course we might be able to recompile our Linux / BSD programs and libraries. We might see updated versions of them that will not use any MMX instructions at all.

    But what about binary software? One day we will need a lot of emulators if we want to run e.g. some older games that make use of MMX. It was around for a long time (Pentium MMX from 1993?; OMG they had this horrible ad on TV with these guys in clean-room suits suddenly dancing around and now the colour comes to the internet or something) and it is really widespread; even smaller CPUs like the VIA C3 and Geode GX2 / LX support it.
    Retiring such an important and widely used instruction set sounds like creating a lot of problems for years.
    Stop TCPA, stupid software patents and corrupt politicians!

    • #3
      There was chatter some time ago that Intel would remove some legacy functionality, e.g. MMX or x87 FPU but also 16-bit real mode for classic BIOS (also scrapping CSM along the way) etc. - see: https://arstechnica.com/gadgets/2017...-bios-by-2020/

      I'd say: Good riddance!

      • #4
        My guess is that they might intend to switch to an implementation which emulates MMX in microcode, which would be slower than before, but fast enough for any old binaries designed for much slower CPUs overall.

        • #5
          Originally posted by atomsymbol

          In my opinion, it is improbable for MMX or x87 FPU to be retired from CPUs.
          Why is that? The instructions will just generate a trap and be software emulated instead. That will likely be plenty fast for older software anyway.

          You know, that was how the x87 instructions were run in the first place, on your 386 without the 387 FPU. It is also why the x87 instructions are such an inefficient bolt-on to the native instructions.
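
The trap-and-emulate idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the two-instruction ISA, the `struct cpu` layout and all function names are hypothetical, and a real kernel would instead handle a hardware exception and decode the faulting x86 instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ISA: the "hardware" only implements OP_ADD natively. */
enum opcode { OP_ADD = 0, OP_LEGACY_MUL = 1 };

struct insn {
    enum opcode op;
    int32_t operand;
};

struct cpu {
    int32_t acc;      /* single accumulator register */
    unsigned traps;   /* number of instructions that needed emulation */
};

/* "Hardware" path: returns 0 on success, -1 to signal an invalid-opcode trap. */
static int execute_native(struct cpu *c, struct insn i) {
    if (i.op == OP_ADD) {
        c->acc += i.operand;
        return 0;
    }
    return -1;
}

/* "Trap handler" path: emulates the retired instruction in software. */
static int emulate_in_trap_handler(struct cpu *c, struct insn i) {
    if (i.op == OP_LEGACY_MUL) {
        c->acc *= i.operand;
        c->traps++;
        return 0;
    }
    return -1;  /* genuinely unknown instruction */
}

/* Runs a program; returns 0 on success, -1 if an instruction is unknown
 * to both the native path and the trap handler. */
int run(struct cpu *c, const struct insn *prog, size_t n) {
    for (size_t pc = 0; pc < n; pc++) {
        if (execute_native(c, prog[pc]) == 0)
            continue;                                   /* fast path */
        if (emulate_in_trap_handler(c, prog[pc]) != 0)  /* slow path */
            return -1;
    }
    return 0;
}
```

Old binaries keep producing the same results; only the legacy instruction pays the cost of the detour, which `traps` counts here.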

          • #6
            It's even simpler: modern Intel CPUs have 2 ALUs ("ports" in Intel speak) for SSE and AVX but only 1 for MMX/x87, so you can run 2 SSE or AVX instructions per cycle but only one MMX/x87 instruction. So half the performance.

            • #7
              Seems like a good idea to convert a lot of these ops to traps, I'm surprised more of them aren't traps yet. AAA, AAD, AAM, AAS, DEC... Nobody generates this code anymore, many of these instructions occur nowhere in any software packaged for Debian.

              • #8
                Originally posted by Adarion
                ...Pentium MMX from 1993?
                The first Pentium MMX CPUs were released in 1997.

                • #9
                  IIRC, SSE2 support is mandatory for AMD64 compliance. So any 64-bit system is guaranteed to have SSE support.
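
The AMD64 baseline point can be checked from the compiler's side. A small sketch, assuming GCC or Clang, which predefine `__x86_64__` when targeting x86-64 and `__SSE2__` when SSE2 code generation is enabled; the helper names are illustrative only.

```c
#include <stdbool.h>

/* True when this translation unit is being compiled for x86-64. */
bool targets_x86_64(void) {
#if defined(__x86_64__)
    return true;
#else
    return false;
#endif
}

/* True when the compiler is allowed to emit SSE2 instructions. */
bool sse2_enabled_at_compile_time(void) {
#if defined(__SSE2__)
    return true;
#else
    return false;
#endif
}
```

On a default x86-64 build both helpers are expected to return true, which is what lets a compiler rely on SSE2 being present there.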

                  One small nitpick about this bit from the article:

                  Of course, in modern code-bases hopefully you are utilizing modern versions of AVX.
                  It is actually surprisingly hard to get better performance with AVX than with SSE if your code is not a textbook use case for wide vectorization. On current Intel CPUs, AVX mode takes a long time to turn on and to turn back off, and it causes significant downclocking, which leads to an effective slowdown in surrounding scalar code and in vector code that does not leverage the full vector width.
                  Last edited by HadrienG; 03 February 2019, 05:53 AM.

                  • #10
                    Originally posted by microcode
                    Seems like a good idea to convert a lot of these ops to traps, I'm surprised more of them aren't traps yet. AAA, AAD, AAM, AAS, DEC... Nobody generates this code anymore, many of these instructions occur nowhere in any software packaged for Debian.
                    Which is what microcode implementations in processors were for, right from the very beginning.
