More OpenACC 2.5 Code Lands In GCC


  • More OpenACC 2.5 Code Lands In GCC

    Phoronix: More OpenACC 2.5 Code Lands In GCC

    More code for supporting the OpenACC 2.5 specification has been landing in mainline GCC...


  • #2
    Having used OpenACC myself, I've never really seen that much of a point in it beyond allowing developers to offload pre-existing OpenMP code onto Nvidia GPUs with rather minor modifications (which it actually does pretty brilliantly; there's a rough sketch of what I mean at the end of this post). It also allows developers without prior GPGPU experience to easily write GPGPU code that isn't performance critical, but I've never seen much of a point in that, as GPGPU code tends to be performance critical almost by definition.

    They may have improved the optimizations made by the compiler, but last time I used it (late 2015) the performance difference between a well written OpenACC application and a well written CUDA application could be as high as 10x (in favor of CUDA).
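    To give a rough idea of what those minor modifications look like, here's a minimal, hypothetical saxpy sketch (made up for illustration, not from any real code base): the only real change from the OpenMP version is the directive, plus data clauses if you want the transfers spelled out.

    [CODE]
    #include <stddef.h>

    /* OpenMP version: the loop runs across host CPU threads.
       Build with e.g. gcc -fopenmp.                           */
    void saxpy_omp(size_t n, float a, const float *x, float *y)
    {
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /* OpenACC version: same loop body, different directive; the
       compiler generates the device kernel and the host/device
       copies. Build with e.g. gcc -fopenacc.                    */
    void saxpy_acc(size_t n, float a, const float *x, float *y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
    [/CODE]

    It's obviously a toy, but it's representative of how far a straight directive swap gets you before you have to start thinking about data locality and where the transfers actually happen.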



    • #3
      Originally posted by L_A_G View Post
      ... last time I used it (late 2015) the performance difference between a well written OpenACC application and a well written CUDA application could be as high as 10x (in favor of CUDA).
      OpenACC, OpenCL ... it seems like none of these cross-platform standards can gain much traction as long as the GPGPU field is dominated by NVidia. Which, I think, only happens because there isn't much mass-market demand for GPGPU--it's still very much a niche area.



      • #4
        Originally posted by ldo17 View Post
        OpenACC, OpenCL ... it seems like none of these cross-platform standards can gain much traction as long as the GPGPU field is dominated by NVidia. Which, I think, only happens because there isn't much mass-market demand for GPGPU--it's still very much a niche area.
        The main reason GPGPU is a niche area is that effective use of GPGPU APIs still requires a rather significant rewrite of existing applications and algorithms, along with learning some pretty big APIs. OpenACC is supposed to go after exactly that, and it at least has the new-API burden covered pretty well: the modifications needed for a halfway decent implementation on top of well written OpenMP code can take as little as a few minutes once you've got the hang of the directives. GPGPU APIs have for quite some time been trying to become easier to learn and use with features like unified memory (which spares the programmer from having to worry about the separate device and host address spaces; there's a sketch of what the explicit alternative looks like at the end of this post), but they're never going to become as easy to use as OpenACC.

        The extra development effort is the same reason other non-CPU accelerators like Intel's Xeon Phi and various FPGA-based solutions haven't taken off all that well either. However, with machine learning becoming a bigger and bigger thing as time goes on, I have a feeling GPUs may find a stable, and rather lucrative, place in the field of general purpose computing.
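        Since I brought up the address-space issue, here's a hedged sketch of what the explicit data management looks like in OpenACC when you're not relying on unified/managed memory (hypothetical example, names made up): the data region states what gets copied to the device and when it comes back, and the kernel inside it just asserts that the data is already there.

        [CODE]
        #include <stddef.h>

        /* Host and device address spaces are separate, so the data region
           spells out the transfers instead of leaving them to the runtime. */
        void scale_in_place(size_t n, float a, float *buf)
        {
            #pragma acc data copy(buf[0:n])   /* copy in at entry, copy back at exit */
            {
                #pragma acc parallel loop present(buf[0:n])
                for (size_t i = 0; i < n; ++i)
                    buf[i] *= a;
            }
            /* With unified/managed memory the data region can usually be dropped
               and the runtime pages data back and forth on demand.               */
        }
        [/CODE]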



        • #5
          Originally posted by L_A_G View Post

          The extra development effort is the same reason other non-CPU accelerators like Intel's Xeon Phi and various FPGA-based solutions haven't taken off all that well either. However, with machine learning becoming a bigger and bigger thing as time goes on, I have a feeling GPUs may find a stable, and rather lucrative, place in the field of general purpose computing.
          But I don’t think such an application area could ever account for anything approaching 300 million new machine sales per year.

          In other words, it will always be a niche market: low-volume, high-margin. Something that sells to corporates rather than end-users.

          (Kind of like how the whole computer market worked before PCs came along...)



          • #6
            Originally posted by ldo17 View Post
            But I don’t think such an application area could ever account for anything approaching 300 million new machine sales per year.

            In other words, it will always be a niche market: low-volume, high-margin. Something that sells to corporates rather than end-users.
            Compared to the total sales of general purpose CPUs, everything looks marginal. Even the sales of general purpose CPUs for compute look pretty damn marginal when compared to the total sales of general purpose CPUs.

            You really don't need to sell over a hundred million units every year to not be a niche product, especially when you're in a market that isn't anywhere near that size. All that really matters is your market share in the market/markets you're in.



            • #7
              Originally posted by L_A_G View Post

              Compared to the total sales of general purpose CPUs, everything looks marginal.
              Not really. Consider the embedded market, and that more ARM chips ship each year than the entire population of the Earth. Even MIPS chips outsell Intel x86 by something like 3:1.

              Even the sales of general purpose CPUs for compute look pretty damn marginal when compared to the total sales of general purpose CPUs.
              Look at it this way: why are all these servers and data centers and even supercomputers nowadays so heavily Intel x86-based? It’s because they’ve been benefiting from that 300-million-units-a-year volume of the PC market.

              Only that market is no longer shipping 300 million units per year; it’s currently down to 280 million, and still declining. Hence you see growing interest in, say, ARM chips for these server and supercomputer applications.

              You really don't need to sell over a hundred million units every year to not be a niche product, especially when you're in a market that isn't anywhere near that size. All that really matters is your market share in the market/markets you're in.
              It’s called “economies of scale”. It means you can leverage work already done for that mass market, rather than having to custom-design and manufacture everything for your niche market.

              Remember that VLSI chips have very low unit cost of manufacture: most of the cost is in setting up the fab (which is currently more than a billion dollars). Hence the price becomes very dependent on the volume you can ship.



              • #8
                Originally posted by ldo17 View Post
                Not really. Consider the embedded market, and that more ARM chips ship each year than the entire population of the Earth. Even MIPS chips outsell Intel x86 by something like 3:1.
                Embedded is a bit beside the point... An embedded system running a wristwatch or a thermostat generally doesn't really need any of the advances that have been made in computer hardware since the 1980s. There's a reason why so many embedded systems still run on old MOS 6502 and Zilog Z80 derived CPUs, and with the extremely low margins it's not like these two 1970s classics are going anywhere any time soon.

                Look at it this way: why are all these servers and data centers and even supercomputers nowadays so heavily Intel x86-based? It’s because they’ve been benefiting from that 300-million-units-a-year volume of the PC market.

                Only that market is no longer shipping 300 million units per year; it’s currently down to 280 million, and still declining. Hence you see growing interest in, say, ARM chips for these server and supercomputer applications.
                Growth in interest in ARM beyond the embedded market mostly stems from just about all modern implementations being built for energy efficiency. Most of the cost savings from virtualization have already been used up, so hardware vendors are looking at what are essentially still embedded CPUs to take care of lighter computational loads like those in data centers (where it's mostly disk access).

                As for scientific compute applications, ARM cores are still way too slow to compete with x86 in low thread count compute applications, and in very high thread count applications GPUs with their highly specialized compute units already fill any gap highly parallel ARM parts would fill. Intel's Xeon Phi boards showed that there isn't a market for chips with dozens upon dozens of small general purpose cores when their job is done so much better by GPUs and other more specialized hardware.

                It’s called “economies of scale”. It means you can leverage work already done for that mass market, rather than having to custom-design and manufacture everything for your niche market.

                Companies who make GPUs for scientific compute also make consumer GPUs based on the same architectures, so I don't know how the hell you could have gotten the picture that they somehow don't take advantage of the economies of scale. Both companies have for quite a while shipped millions of consumer GPUs every year and I think Nvidia ships something close to 10 million GPUs every quarter.

                Remember that VLSI chips have very low unit cost of manufacture: most of the cost is in setting up the fab (which is currently more than a billion dollars). Hence the price becomes very dependent on the volume you can ship.
                In the 1980s one of the founders of ARM infamously said something to the effect that in the future there would be two kinds of semiconductor companies: those with their own fabrication plants and those that have gone bankrupt.

                The reason it's called infamous is that he turned out to be badly wrong: not having your own fabrication plants is now the norm for semiconductor companies. Apart from companies specialized in making memory (SK Hynix, etc.) or embedded hardware and nothing else, those who own fabrication plants are generally either dedicated foundries purely focused on fabricating parts for other companies (TSMC, GlobalFoundries, etc.) or companies that make their own parts but also offer manufacturing as a service to others (Intel, Samsung, etc.).

                Literally the only company that makes anything as complex as a modern high performance CPU and fabricates it itself is Intel, and as I mentioned they also do contract fabrication for other companies. A decade ago AMD and IBM used to do the same thing, but AMD realized that the capital costs were way too high to keep it up and spun its fabs off as a separate company called GlobalFoundries, and a couple of years ago IBM sold its semiconductor manufacturing arm to GlobalFoundries as well.



                • #9
                  Originally posted by L_A_G View Post

                  Embedded is a bit beside the point... An embedded system running a wristwatch or a thermostat generally doesn't really need any of the advances that have been made in computer hardware since the 1980s. There's a reason why so many embedded systems still run on old MOS 6502 and Zilog Z80 derived CPUs, and with the extremely low margins it's not like these two 1970s classics are going anywhere any time soon.
                  Which is not the market that ARM has so thoroughly dominated, to the point where Intel’s Atom efforts have completely failed to gain any traction at all.

                  As for scientific compute applications, ARM cores are still way too slow to compete with x86 in low thread count compute applications, and in very high thread count applications GPUs with their highly specialized compute units already fill any gap highly parallel ARM parts would fill.
                  What makes the difference between a massively parallel cluster and an actual supercomputer? It’s all down to the interconnect. The former works fine for your common-or-garden “embarrassingly parallel” problems, but for the more difficult stuff, you need the latter. And here the communication between CPUs becomes the bottleneck, so it becomes possible to use more energy-efficient ones and still blow away the x86 competition.

                  Companies who make GPUs for scientific compute also make consumer GPUs based on the same architectures...
                  Cards optimized for game-playing are not necessarily optimized for compute-intensive tasks. You yourself mentioned the Xeon Phi, did you not? Is that based on a “consumer” GPU?



                  • #10
                    Originally posted by ldo17 View Post
                    Which is not the market that ARM has so thoroughly dominated, to the point where Intel’s Atom efforts have completely failed to gain any traction at all.
                    You brought up the embedded market as a whole, not the higher end in which ARM has competed with Atom, and you mentioned MIPS, which is generally used towards the lower end of the market these days. When you limit yourself to the tablet and smartphone CPU segment of the embedded market, you don't end up far enough ahead of the x86 market in terms of sales to make up for the vastly lower margins found in the embedded market.

                    What makes the difference between a massively parallel cluster and an actual supercomputer? It’s all down to the interconnect. The former works fine for your common-or-garden “embarrassingly parallel” problems, but for the more difficult stuff, you need the latter. And here the communication between CPUs becomes the bottleneck, so it becomes possible to use more energy-efficient ones and still blow away the x86 competition.
                    When it comes to die-to-die interconnects it really doesn't matter all that much what's on the actual die; it's the interconnect itself that matters. If something is possible with one CPU architecture, it's possible on any other equally advanced architecture. Even with GPUs you can have very high speed and low latency interconnects now that Nvidia has made its NVLink tech available to the public. In case you didn't know, the floating point performance figures used in the Top500 rankings are not the performance actually available to applications running on the machine; they come from the LINPACK benchmark, which is compute-heavy and doesn't stress the interconnect or memory system anywhere near the way real applications do. Because of that the #1 spot holder is just the organization that has been able to assemble the largest amount of high performance hardware in one server room, not the one whose machine delivers the most performance to a real world HPC application running on all of it.

                    As for the TaihuLight, you should take the claims about its performance with more than just a grain of salt, as the performance figures for it, and its precursors, have never been independently verified. Being owned by the Chinese army, civilian access to it is much more limited than to its western counterparts, and branches of the Chinese government have for a very long time had a tendency to fudge their numbers to make it look like they've done a really good job.

                    Cards optimized for game-playing are not necessarily optimized for compute-intensive tasks. You yourself mentioned the Xeon Phi, did you not? Is that based on a “consumer” GPU?
                    The Xeon Phi, while housed like a GPU, is not a GPU. It's essentially a bunch of ARM-class x86 cores with big vector units, 4-way SMT and a fast ring bus to connect them all. This is very similar to high performance many-core packages like Cavium's ThunderX offerings and the chips used in the TaihuLight.

                    GPUs may not be perfect for all compute uses, but general purpose CPUs are actually pretty badly suited for the job, as they're built for running code that's badly optimized, branches all over the place and leans heavily on single-thread performance. Most well written high performance compute applications are massively parallel, contain little if any branching and have highly predictable data access. Because of this, general purpose CPUs spend a significant portion of their silicon area on features simply not needed by well written high performance compute applications. GPUs on the other hand, especially ones from AMD, have more or less exactly what these applications need and not anywhere near as much else as general purpose CPUs.
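                    To make that concrete, this is the kind of loop I mean (purely illustrative, made-up example): every iteration is independent, the memory access is unit-stride and perfectly predictable, and the one data-dependent decision is done with min/max instead of a branch. A big out-of-order core's branch predictors and reordering machinery do very little for you here, while a GPU chews through it with thousands of threads.

                    [CODE]
                    #include <math.h>
                    #include <stddef.h>

                    /* Scale and clamp a large array: massively parallel, branch-free,
                       unit-stride access. Exactly the shape GPUs are built for.       */
                    void scale_clamp(size_t n, float a, float lo, float hi, float *v)
                    {
                        #pragma acc parallel loop copy(v[0:n])
                        for (size_t i = 0; i < n; ++i)
                            v[i] = fminf(fmaxf(v[i] * a, lo), hi);  /* no if/else */
                    }
                    [/CODE]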

