Thread: Improving The Linux Kernel's Memory Performance

  1. #1
    Join Date
    Jan 2007
    Posts
    14,597

    Improving The Linux Kernel's Memory Performance

    Phoronix: Improving The Linux Kernel's Memory Performance

    Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the memory copy (the memcpy function to copy blocks of memory) performance within the kernel. In particular, an application vendor claims to have boosted their application (a video recorder) performance by 12% when implementing an "optimized" memory copy function that takes advantage of SSE3...

    http://www.phoronix.com/vr.php?view=OTgwMQ

  2. #2
    Join Date
    Dec 2009
    Posts
    269

    Very interesting

    Very interesting!
    Thank you!

  3. #3
    Join Date
    Aug 2011
    Posts
    2

    It's indeed pretty interesting.
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?

  4. #4
    Join Date
    Dec 2009
    Posts
    269

    Quote Originally Posted by Dylar
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?
    Most likely, it's already doing it. Your kernel most likely already has support for SSE3, etc., and programs that are designed to take advantage of SSE3 will do so.
    Before, the memcpy() function did its magic of copying stuff; however, if I understood the article correctly, they want to use SSE3 for copying something big, which will give a rather nice boost.

    But I am no programmer, unfortunately.

  5. #5
    Join Date
    Dec 2009
    Posts
    269

    To check it:

    Quote Originally Posted by Dylar
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?
    cat /proc/cpuinfo | grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and is just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?

  6. #6
    Join Date
    May 2010
    Posts
    20

    Quote Originally Posted by dimko
    cat /proc/cpuinfo | grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and is just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?
    It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
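
    If you want to script the check, something like the following should work. It is only a sketch: cpu_has_flag is a name made up for this example, and it assumes an x86 Linux box where /proc/cpuinfo has a "flags" line. The -w option makes grep match whole words, so asking for sse3 does not accidentally match ssse3.

```shell
# Hypothetical helper: succeed if the named flag appears as a whole word
# on the first "flags" line of a cpuinfo-style file.
# The file defaults to /proc/cpuinfo (assumed: x86 Linux layout).
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -m1 '^flags' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# SSE3 is listed under its old code name "pni".
if cpu_has_flag pni; then
    echo "SSE3 (pni) reported"
else
    echo "SSE3 (pni) not reported"
fi
```

    The second argument is optional and only there so the helper can be pointed at a saved copy of cpuinfo from another machine.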

  7. #7
    Join Date
    Dec 2009
    Posts
    269

    Quote Originally Posted by signals
    It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
    pni - checked!

    Is it still called PNI on Intel CPUs?

  8. #8

    Quote Originally Posted by Dylar
    It's indeed pretty interesting.
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?
    You seem to be asking two separate questions at the same time.

    1. Yes, there is a way for the kernel to check for SSE3 support. The flags line in /proc/cpuinfo demonstrates this capability.

    2. There is a distinction between -march style optimizations and -mtune style optimizations. With -mtune the compiler will generate multiple versions of different code paths so that every CPU gets its own "optimized" version. With -march, the compiler assumes that these instructions are always available. To the best of my knowledge, the kernel uses -march. I have not checked the code to verify this, but I am fairly certain that if you configure your kernel compilation to build a kernel for hardware newer than what you have, it breaks when you try to run it on the older hardware, which is consistent with -march. Therefore, the kernel must be explicitly compiled for it.

    Quote Originally Posted by dimko
    cat /proc/cpuinfo | grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and is just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?
    SSE4A is AMD's variant of Intel's SSE4 extensions.

    Quote Originally Posted by dimko
    pni - checked!

    Is it still called PNI on Intel CPUs?
    They never renamed it. If they had, older binaries that look for the old flag name would end up using code paths meant for much older processors on newer hardware.

    They call it pni on AMD cpus for the same reason.

    Quote Originally Posted by Smorg
    memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
    If you write a kernel, you need a way of copying data back and forth between real memory and virtual memory. You will have a problem if you hit a page boundary and the rest of what you are copying does not continue on the next page.

    Also, when writing a kernel, you need to write your own library routines, because libraries specified in the ANSI C specification are meant for userland, not kernels.

    Quote Originally Posted by liam
    Do we really want to add more x86 specific code to the kernel?
    Other than that, sounds cool. I had no idea hitting the SSE was so costly. I suppose it makes sense that they were intended for rather larger data sets, but still, hadn't occurred to me.
    It is not necessarily x86 specific. It is a technique that applies to any CPU that has SSE3-like vector instructions and if they implement it properly, every CPU with such instructions should see a boost.

    Quote Originally Posted by movieman
    BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.
    AMD produced the K8 architecture first and then Intel produced Prescott in response. SSE3 did not exist when the K8 architecture was made, so it was not part of the original x86_64 instruction set.
    Last edited by Shining Arcanine; 08-16-2011 at 10:57 PM.

  9. #9
    Join Date
    Jul 2008
    Location
    Greece
    Posts
    3,788

    Quote Originally Posted by Shining Arcanine
    There is a distinction between -march style optimizations and -mtune style optimizations. With -mtune the compiler will generate multiple versions of different code paths so that every CPU gets its own "optimized" version. With -march, the compiler assumes that these instructions are always available.
    No, that's not what -mtune is doing. It does not generate any instructions that would not run on other CPUs. It just applies changes that will work everywhere, but are known to result in faster execution. From the docs:

    "-mtune=cpu-type - Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions."

    So no multiple code paths or anything. That's what the Intel compiler does. GCC doesn't provide that functionality.
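
    For anyone who wants to see the difference on their own machine, here is a rough sketch. It assumes an x86 GCC is installed as gcc, and has_sse3_macro is a helper name made up for this example. It dumps the predefined macros for each option and looks for __SSE3__. With a stock x86 GCC, only the -march line should say yes, since -mtune leaves the instruction set alone, though a distro that configures its compiler with a newer default architecture could report yes for both.

```shell
# Hypothetical helper: given a `gcc -dM -E` macro dump as its first
# argument, print "yes" if __SSE3__ is defined and "no" otherwise.
has_sse3_macro() {
    case "$1" in
        *'#define __SSE3__ '*) echo yes ;;
        *) echo no ;;
    esac
}

# Assumes an x86 GCC named "gcc". If gcc is missing or rejects the
# option, the dump is empty and the line simply prints "no".
for opt in -mtune=core2 -march=core2; do
    dump=$(gcc "$opt" -dM -E - </dev/null 2>/dev/null || true)
    printf '%s: %s\n' "$opt" "$(has_sse3_macro "$dump")"
done
```

    core2 is just a convenient example of a CPU type that includes SSE3; any other cpu-type that GCC accepts can be substituted.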

  10. #10
    Join Date
    Feb 2011
    Location
    France
    Posts
    195

    Quote Originally Posted by Shining Arcanine
    SSE4A is AMD's variant of Intel's SSE4 extensions.
    Not at all.
    http://en.wikipedia.org/wiki/SSE4
