OpenPOWER Summit 2020 Was This Week With Many Interesting Hardware/Software Talks


  • OpenPOWER Summit 2020 Was This Week With Many Interesting Hardware/Software Talks

    Phoronix: OpenPOWER Summit 2020 Was This Week With Many Interesting Hardware/Software Talks

    In addition to XDC2020 this past week, the Linux Foundation hosted the virtual OpenPOWER Summit North America 2020 event as well with a mix of interesting hardware and software presentations...


  • #2
    Just curious if people thought RISC-V is putting pressure on OpenPOWER these days? I would assume yes, but in a good way. Just thinking about that as I was reading the A20 slides.



    • #3
      SiFive is seeding workstations with their high-performance RISC-V CPU to developers these days, and considering all the buzz about that architecture, they have attracted quite a lot of interest during the past five years. Besides the efforts of Raptor, I haven't heard of any push from the openPOWER camp to cater more to developers for desktop/workstation usage scenarios and eventually attract more people to their ecosystem. They need something more interesting to that crowd if they want to stay relevant.

      tl;dr: RISC-V appears to be cool, new and different [albeit still immature technologically], openPOWER looks old and slow [but solid technologically] in comparison.



      • #4
        OpenPOWER may look slow compared to RISC-V only to people who have no experience in assembly programming and instruction sets. Such people are not really qualified to have an opinion on whether an ISA is slower or faster than another.

        Any POWER program will have significantly fewer instructions than a RISC-V program doing the same work, especially in loops, which are the most important for determining the speed of the program.

        Therefore, at similar hardware implementation complexities, an OpenPOWER CPU will always be faster than a RISC-V CPU.

        The performance of a RISC-V processor can be increased by heroic efforts, i.e. by decoding a very large number of instructions in parallel and by doing instruction fusion for many kinds of instruction pairs, as a workaround for the excessive number of instructions.

        However, if the same amount of additional hardware resources were added to the OpenPOWER processor, its performance would become even higher.


        Unfortunately I agree with you that most people have heard that RISC-V is fashionable now, so they imagine that it must be a modern architecture, better than older architectures, like ARM and POWER.

        Nevertheless, RISC-V is in fact a primitive ISA, which does not implement features that were proved to be useful and necessary more than half a century ago, like indexed addressing.

        The RISC-V creators have the huge merit of popularizing the idea of an open and free ISA and without their work it is likely that alternatives like OpenPOWER would have never appeared.

        Unfortunately, their ideas about how an instruction set should be encoded do not seem to have been based on any serious experience with hardware or software implementations. The commercial ISAs like POWER or the 64-bit ARMv8 are much superior.

        The only good part of RISC-V is not in the base ISA, but in the vector extension, and they also have the merit of reminding everybody that the ancient vector ISAs, like Cray, were in fact better than the SIMD extensions implemented by everybody after 1995.

        Nowadays however, others have also added similar vector extensions.



        • #5
          Originally posted by ms178
          Besides the efforts of Raptor, I haven't heard of any push from the openPOWER camp to cater more to developers for desktop/workstation usage scenarios and eventually attract more people to their ecosystem.
          That's because you haven't read the article.
          There is a PowerPC64 laptop in the works.


          Originally posted by ms178
          tl;dr: RISC-V appears to be cool, new and different [albeit still immature technologically], openPOWER looks old and slow [but solid technologically] in comparison.
          Actually, PowerPC was one of the last architectures to appear in the '90s.

          RISC-V, for now, is still vapourware.
          Alibaba does have its own implementation of it, with respectable performance (but with more than 50 proprietary instructions in it).
          SiFive has a mediocre implementation compared to Alibaba's, and neither compares to POWER performance.



          • #6
            Originally posted by tuxd3v
            That laptop is dead even before it could be realized. They are still clinging to the old PPC64 and not PPC64LE, just because of AltiVec. Other than Debian, no other enterprise-grade distribution has any support for it. PPC64LE is where practically all the PPC64 development is, and even then the effort is so minuscule that most libraries and applications cannot be built as-is for PPC64LE without a ridiculous chunk of out-of-tree patches.

            Even then, just compare the sizes of the x64 repositories in Debian and Fedora to the sizes of their corresponding PPC64LE repositories: the x64 repository is much, much larger than the PPC64LE one.



            • #7
              Originally posted by AdrianBc
              Any POWER program will have significantly fewer instructions than a RISC-V program doing the same work, especially in loops, which are the most important for determining the speed of the program.
              Instruction count is not a 100 percent true measure of performance. The RISC-V instruction count changes a lot depending on which features the RISC-V design includes.

              Yes, both POWER and RISC-V have a default instruction size of 4 bytes (32 bits). RISC-V has a compressed instruction set with 2-byte (16-bit) instructions; POWER does not have this, and it is an important difference.

              POWER does have an 8-byte-wide instruction format that RISC-V does not have.

              Originally posted by AdrianBc
              The performance of a RISC-V processor can be increased by heroic efforts, i.e. by decoding a very large number of instructions in parallel and by doing instruction fusion for many kinds of instruction pairs, as a workaround for the excessive number of instructions.
              Except this is not as straightforward. Let's take note of the RISC-V instruction length. Decoding 2 RISC-V compressed instructions at a time is in fact equal to decoding a complex 32-bit-wide instruction with POWER.

              Originally posted by AdrianBc
              However, if the same amount of additional hardware resources would be added to the OpenPOWER processor, its performance will become even higher.
              This is not 100 percent true. You have ignored the difference in instruction width. With the RISC-V compressed instruction set, the instruction-cache part of L1 can store up to twice as many instructions as with POWER, due to 16 bits vs. 32 bits.
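
              As a rough illustration of the cache-capacity arithmetic, the sketch below estimates how many instructions fit in an instruction cache for a given mix of 2-byte and 4-byte encodings. The 32 KiB cache size used in the example and the `compressed_fraction` parameter are assumptions picked for illustration, not figures from any particular core.

```python
def instructions_in_cache(cache_bytes, compressed_fraction=0.0):
    """Estimate how many instructions fit in an instruction cache.

    compressed_fraction is the assumed share of instructions encoded in
    2 bytes; the remaining instructions are assumed to take 4 bytes.
    """
    if not 0.0 <= compressed_fraction <= 1.0:
        raise ValueError("compressed_fraction must be between 0 and 1")
    average_bytes = 2 * compressed_fraction + 4 * (1 - compressed_fraction)
    return int(cache_bytes // average_bytes)


cache = 32 * 1024  # hypothetical 32 KiB L1 instruction cache
fixed_width = instructions_in_cache(cache)            # all 4-byte instructions
fully_compressed = instructions_in_cache(cache, 1.0)  # all 2-byte instructions
half_compressed = instructions_in_cache(cache, 0.5)   # an even mix of both
```

              Under these assumptions a fully compressed stream is the best case; real code mixes both widths, so the gain lands somewhere between 1x and 2x.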

              You have missed that a lot of specialty functions in the POWER ISA are duplicated in RISC-V by pairs of generic compressed instructions processed 2 at a time. And it gets more complex than this.

              Let's say we have two cores, one POWER and one RISC-V, both with the same micro-operations that the instruction sets are converted to, and both performing fusion. Something interesting will happen: the POWER instruction stream will have fewer fusion events than the RISC-V one. The reason is simple. The more complex your ISA's instructions, the more micro-operations are required to perform each instruction, so since the POWER ISA has more complex instructions, fusion happens less often.

              For RISC-V there is a big advantage in decoding more instructions and performing fusion, because the instructions on average use fewer micro-operations individually. This also makes designing the instruction set simpler.

              RISC-V was designed with the idea of fusion in mind. POWER was not designed with the idea of fusion in mind. Yes, an ideal instruction set for a CPU core performing fusion is different from what the POWER instruction set is.

              Please note that this fusion point also applies to ARMv8 vs. POWER. At times both ARMv8 and RISC-V have more instructions than POWER, but due to compression in the instruction set, 2 instructions equal 32 bits, so the binary size is not larger, and conversion to micro-operations is not correspondingly slower than with POWER either. Remember, both the ARMv8 and the RISC-V instruction sets were designed with the idea that fusion would be in the CPU design. The result of designing an instruction set for fusion is that particular specialty instructions make absolutely no sense.

              Fusion makes your metrics harder. Say you have RISC-V/ARMv8 with 16 compressed instructions vs. 8 POWER instructions performing the same task. With effective fusion, both can complete in the same processing time, because the 16 RISC-V/ARMv8 instructions and the 8 POWER instructions generate the same number of micro-operations to be performed.
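
              The 16-versus-8 argument can be sketched as a toy fusion pass. This is a deliberately simplified model: the instruction names are hypothetical, and a real decoder would also check register dependencies and other constraints before fusing a pair.

```python
def fuse_pairs(stream, fusible_pairs):
    """Greedily fuse adjacent instruction pairs into single micro-ops.

    stream is a list of instruction names; fusible_pairs is a set of
    (first, second) name pairs that the modelled decoder can fuse.
    """
    micro_ops = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and (stream[i], stream[i + 1]) in fusible_pairs:
            micro_ops.append(stream[i] + "+" + stream[i + 1])
            i += 2  # both instructions were consumed by one micro-op
        else:
            micro_ops.append(stream[i])
            i += 1
    return micro_ops


# Hypothetical streams for the same task: 16 simple vs. 8 complex instructions.
simple_stream = ["shift_add", "load"] * 8
complex_stream = ["indexed_load"] * 8
fusible = {("shift_add", "load")}

simple_micro_ops = fuse_pairs(simple_stream, fusible)
complex_micro_ops = fuse_pairs(complex_stream, fusible)
```

              In this model both streams end up as 8 micro-ops, but only if every pair in the simple stream is actually fusible; any pair the decoder cannot fuse costs an extra micro-op.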

              Originally posted by AdrianBc
              Nevertheless, RISC-V is in fact a primitive ISA, which does not implement features that were proved to be useful and necessary more than half a century ago, like indexed addressing.
              This gets complex. RISC-V is targeted as an instruction set that scales down to deep embedded.


              For deep embedded, the fact that the RISC-V default instruction set can add and subtract is kind of excessive. There are a lot of things from the last half century, like indexed addressing, that are not useful in all cases. Yes, it would be good if RISC-V had an option in the instruction set to provide indexed addressing for the cases where it is useful. The big thing is that features like indexed addressing are not absolutely necessary everywhere, and at times not useful at all. The problem you run into when attempting to scale down the POWER ISA is that you find bits, like indexed addressing, that you wish to remove because the embedded use case does not need them, but you cannot, because the ISA was not designed for those parts to be optional.

              Yes, I will give you that for some current use cases RISC-V is a primitive ISA lacking features, but POWER for particular use cases is overly complex/advanced.

              In a lot of ways it is simpler for RISC-V to mature over time into a more advanced ISA than for POWER to move back to a more primitive one for deep embedded.

              Originally posted by tuxd3v
              RISC-V, for now, is still vapourware.
              Alibaba does have its own implementation of it, with respectable performance (but with more than 50 proprietary instructions in it).
              SiFive has a mediocre implementation compared to Alibaba's, and neither compares to POWER performance.
              I would not say vapourware, as there are real RISC-V chips; they are hard to get, but you can get them. Alibaba has said they are looking to have those extra instructions added to the RISC-V specifications.



              • #8
                Originally posted by oiaohm

                Instruction count is not a 100 percent true measure of performance. The RISC-V instruction count changes a lot depending on which features the RISC-V design includes.

                Yes, both POWER and RISC-V have a default instruction size of 4 bytes (32 bits). RISC-V has a compressed instruction set with 2-byte (16-bit) instructions; POWER does not have this, and it is an important difference.

                POWER does have an 8-byte-wide instruction format that RISC-V does not have.
                When comparing 2 ISAs, the comparison should never be made between a compressed encoding for one of them and the base encoding for the other ISA.
                The reason is that a compressed encoding variant can be added to any ISA, so this is an orthogonal feature to the structure of the base instruction encoding.

                The latest POWER version has indeed added an 8-byte instruction format, to allow the embedding of large immediate constants inside the instructions.
                The ability of having large immediate constants was a significant advantage of x86 vs. traditional RISCs, which are forced to either construct a single constant from a sequence of instructions or have a separate constant pool, which increases the program size and consumes extra cache lines, slowing the execution.
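
                To make the "construct a single constant from a sequence of instructions" case concrete, here is a minimal sketch of how a 32-bit constant is typically split into an upper 20-bit part and a sign-extended lower 12-bit part, the split used by a load-upper-immediate plus add-immediate pair on 32-bit RISC-V. The function names are hypothetical and the code only models the arithmetic, not any assembler.

```python
def split_constant(value):
    """Split a 32-bit constant into (upper20, lower12).

    lower12 is treated as a signed 12-bit immediate because the second
    instruction of the pair sign-extends it before adding, so the upper
    part has to be adjusted when bit 11 of the constant is set.
    """
    value &= 0xFFFFFFFF
    lower12 = value & 0xFFF
    if lower12 >= 0x800:
        lower12 -= 0x1000  # interpret the low 12 bits as a signed value
    upper20 = ((value - lower12) >> 12) & 0xFFFFF
    return upper20, lower12


def rebuild_constant(upper20, lower12):
    """Recombine the two parts the way the two-instruction sequence would."""
    return ((upper20 << 12) + lower12) & 0xFFFFFFFF


upper, lower = split_constant(0x12345FFF)
```

                An ISA with a long enough immediate field could carry the same constant in a single instruction, which is the advantage described above.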

                However I was not comparing the latest improved POWER version with RISC-V, but the classic version, with only 32-bit instructions.

                Moreover, even if this is not widely known, because it was implemented in only a few processors, the POWER ISA also has a compressed encoding variant, with variable-length encoding, which results in program sizes similar to those of the other compressed RISC encodings, e.g. compressed RISC-V, ARM Thumb or nanoMIPS.

                Even if we compared the speed of compressed RISC-V with uncompressed POWER, the compressed encoding has only an indirect effect on speed, mostly in the cases when the working set of a program happens to fit inside the instruction cache only when compressed. Compression is not magic: sequences of more frequently used instructions are shorter, but there are also sequences of instructions that cannot be encoded in the compressed form, so they must be substituted by longer sequences of compressed instructions, which are slower.

                I do not have experience in implementing a project with RISC-V, but I have a lot of experience with ARM, POWER and x86. For ARM CPUs supporting both the compressed Thumb encoding and the base ARM encoding, the compressed encoding is always used to reduce the program size, but extremely seldom to increase the speed. Even on a system with a narrow 16-bit memory interface, where the compressed encoding had a large advantage in the reduced fetching time for the instructions, most performance-critical parts of the program had to be encoded using the non-compressed ARM encoding to reach a decent speed.

                I expect a similar behavior for RISC-V.



                Originally posted by oiaohm

                Except this is not as straightforward. Let's take note of the RISC-V instruction length. Decoding 2 RISC-V compressed instructions at a time is in fact equal to decoding a complex 32-bit-wide instruction with POWER.

                This is not 100 percent true. You have ignored the difference in instruction width. With the RISC-V compressed instruction set, the instruction-cache part of L1 can store up to twice as many instructions as with POWER, due to 16 bits vs. 32 bits.

                There are many programs which spend much of their time waiting for I/O events, and for the speed of such programs the ISA matters very little. For such programs POWER and RISC-V are equally fast.


                However, there are also programs whose speed is bound by the execution of various computations, and there the ISA matters. The speed of such a program is determined by how fast its loops are executed, and on modern CPUs the speed of each loop is determined in various ways by the number of instructions executed inside the loop.

                When there are sufficient execution resources, the total number of instructions determines the speed, which is limited by the number of instructions decoded per cycle. When the loop contains many instructions that must use the same kind of execution resource, then the speed may be limited by the number of a certain kind of instructions, not by the total number of instructions. For example, if there are fewer adders than the number of instruction decoders, the speed of a loop might be determined by the number of addition instructions, not by the total number of instructions.
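
                The two limits described above can be written as a small throughput bound. This is a simplified sketch that ignores latencies, dependencies and memory stalls; the function name and the example core parameters are assumptions for illustration only.

```python
def cycles_per_iteration(total_instructions, add_instructions, decoders, adders):
    """Lower bound on steady-state cycles per loop iteration.

    The loop cannot run faster than the decoders can supply instructions,
    nor faster than the adders can execute the addition instructions.
    """
    decode_bound = total_instructions / decoders
    adder_bound = add_instructions / adders
    return max(decode_bound, adder_bound)


# Hypothetical 4-wide core with 4 adders: the decode width is the limit.
decode_limited = cycles_per_iteration(8, 4, decoders=4, adders=4)

# Same decode width, but only 2 adders and 6 additions: the adders are the limit.
adder_limited = cycles_per_iteration(8, 6, decoders=4, adders=2)
```

                In the first case the bound is 8 / 4 = 2 cycles; in the second, 6 / 2 = 3 cycles, even though the total instruction count is unchanged.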


                The compressed encoding increases the speed of fetching the instructions from the main memory, but it does not have any relationship with the instruction decoding speed, which is determined by the number of instruction decoders that are implemented in hardware and that can run in parallel on the instruction stream.

                So NO, "decoding 2 risc-v compressed instructions at a time with risc-v is in fact" *NOT* "equal to decoding a complex 32 bit wide instruction with power".

                Decoding RISC-V instructions and decoding POWER instructions are tasks of similar complexity, so at the same cost you will include the same number of instruction decoders in a chip, resulting in the same maximum number of instructions executed per clock cycle. A POWER decoder is somewhat more complex, maybe by 20% to 40%, but it is definitely much less complex than a RISC-V instruction decoder that is able to do instruction pair fusion.

                There are many loops where RISC-V may have up to twice as many instructions, mainly due to the lack of indexed addressing, which forces you to insert address computation instructions almost equal in number to the data computation instructions.

                Because of that, either the number of instruction decoders or the number of adders will limit the RISC-V loop execution time to a value much worse than for ARM, POWER, x86 or almost any other ISA.
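
                A toy count for one simple loop, c[i] = a[i] + b[i], shows where the extra address computation instructions come from. The loop bodies below use hypothetical pseudo-ops, not real POWER or RISC-V mnemonics, and assume one pointer per array when indexed addressing is unavailable; an actual compiler may pick a different sequence.

```python
# Loop body when loads and stores can use base + index addressing.
INDEXED_BODY = [
    ("load a[base_a + i]", "data"),
    ("load b[base_b + i]", "data"),
    ("add", "data"),
    ("store c[base_c + i]", "data"),
    ("increment i", "address"),
    ("branch if not done", "control"),
]

# Loop body when only base + constant offset addressing is available,
# so each array needs its own pointer that is advanced every iteration.
POINTER_BODY = [
    ("load [ptr_a]", "data"),
    ("load [ptr_b]", "data"),
    ("add", "data"),
    ("store [ptr_c]", "data"),
    ("advance ptr_a", "address"),
    ("advance ptr_b", "address"),
    ("advance ptr_c", "address"),
    ("branch if not done", "control"),
]


def count_by_kind(body):
    """Count the pseudo-ops in a loop body, grouped by their kind."""
    counts = {}
    for _, kind in body:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


indexed_counts = count_by_kind(INDEXED_BODY)
pointer_counts = count_by_kind(POINTER_BODY)
```

                In this particular sketch the pointer version needs 8 pseudo-ops against 6, with 3 address updates next to 4 data operations; loops with more memory accesses per computation widen the gap.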


                Originally posted by oiaohm
                RISC-V was designed with the idea of fusion in mind. POWER was not designed with the idea of fusion in mind. Yes, an ideal instruction set for a CPU core performing fusion is different from what the POWER instruction set is.
                You are correct, but exactly that is the problem in my opinion.

                ARM or POWER have very little need for instruction fusion, because most of the frequently-used operations can be expressed as a single instruction.

                RISC-V absolutely needs instruction fusion to reach the same speed, because most of the frequently-used operations must be expressed by an instruction pair, i.e. by a data-computation instruction plus an address-computation instruction.

                The error is that an instruction decoder able to do instruction fusion is more complex and expensive than an instruction decoder for a well-designed instruction encoding, like ARM or POWER. This is proven by the fact that even vacuum-tube computers or discrete-transistor computers built before 1960, with only a few thousand components, included more complete sets of addressing modes in their ISAs.

                Another proof is that the fastest RISC-V design made so far, which was presented by Alibaba one month ago at Hot Chips, had to include a custom instruction set extension with some indexed addressing modes.

                This is an ugly workaround. The set of addressing modes belongs in the base ISA specification, not in extensions. Even if RISC-V has reserved code space for extensions, it is not possible to achieve an optimal instruction encoding when you are forced to split the addressing mode encodings into several instruction word subformats.








                • #9
                  Originally posted by AdrianBc
                  OpenPOWER may look slow compared to RISC-V only to people who have no experience in assembly programming and instruction sets. Such people are not really qualified to have an opinion on whether an ISA is slower or faster than another.
                  Thanks for your input on the technical details. I am in no way qualified to make statements about the architecture and its technical merits, but I have been following the news cycle and major presentations of both camps for a while, and you can easily get a sense of where the mindshare and hotness are at the moment, as you mentioned yourself. To be clear, I wasn't talking about performance when I mentioned "slow"; I was referring to the vibrancy of the community and developments around both camps [e.g., if IBM had opened up sooner, openPOWER would have had better chances to be the ISA of choice for the European Processor Initiative, which went with ARM and RISC-V instead. That could have given them more momentum to attract some interest, and there are several other decisions which IBM could have made years ago to get more traction with POWER].

                  ... and I am in no way affiliated with one camp or the other. May the better technology come out on top to the benefit of us all. It is just that IBM and its consortium are half-hearted about their platform, limiting their scope and impact too much by sticking to the HPC/server side. RISC-V is way more aggressive here.
                  Last edited by ms178; 21 September 2020, 06:45 AM.



                  • #10
                    Originally posted by AdrianBc
                    The error is that an instruction decoder able to do instruction fusion is more complex and expensive than an instruction decoder for a well-designed instruction encoding, like ARM or POWER.
                    https://en.wikichip.org/wiki/macro-operation_fusion The hard reality is that ARM and x86 also use macro-operation fusion. The reason POWER does not have macro-operation fusion could be the Intel patent, not that it is a bad idea.

                    Originally posted by AdrianBc
                    This is proven by the fact that even vacuum-tube computers or discrete-transistor computers built before 1960, with only a few thousand components, included more complete sets of addressing modes in their ISAs.
                    This is absolutely false. https://en.wikipedia.org/wiki/CSIRAC (1949) does not include a complete set of addressing modes. RISC-V has more addressing modes than CSIRAC. This is not the only example from before 1960 that has fewer addressing modes than RISC-V. Maybe you were aiming for after 1960, but there are many embedded controllers from after 1960 right up to the current day with very limited addressing modes.

                    The Nanoprocessor is a mostly-forgotten processor developed by Hewlett-Packard in 1974 as a microcontroller for their products. Stra...


                    This here is 1974: a processor for embedded use that cannot do any form of advanced addressing. Heck, it cannot even add or subtract two numbers. RISC-V is in this wacky area of the market.

                    Originally posted by AdrianBc
                    Another proof is that the fastest RISC-V design made so far, which was presented by Alibaba one month ago at Hot Chips, had to include a custom instruction set extension with some indexed addressing modes.
                    No, that is proof that it is a useful feature for that use case, not that it is always a useful feature for every use case RISC-V has. When you get into embedded usage, things get all kinds of wacky.

                    Originally posted by AdrianBc
                    This is an ugly workaround. The set of addressing modes belongs in the base ISA specification, not in extensions. Even if RISC-V has reserved code space for extensions, it is not possible to achieve an optimal instruction encoding when you are forced to split the addressing mode encodings into several instruction word subformats.
                    I disagree. The RISC-V target includes embedded systems, where at times you need to be able to cut the instruction set down to the bare bones, so addressing modes should be extensions that you can drop when a use case does not need them.

                    Remember, for some embedded solutions every square mm of silicon counts more than CPU features.

                    AdrianBc, basically you are thinking about what will make a good server/desktop CPU, which in most cases is POWER's problem. With RISC-V you also need to ask what will make a good light-on-silicon embedded processor. Light on silicon means a lot more things need to be in extensions so you can drop them from the design to save space.
