
Thread: Quad-Core ODROID-X Battles NVIDIA Tegra 3

  1. #31
    Join Date
    Oct 2008
    Posts
    110

    Default

    Quote Originally Posted by ssvb View Post
    Unfortunately integer ld/st instructions can't dual-issue with NEON instructions for Cortex-A9 anymore.
    I think you are wrong: the L/S instructions have their own pipe, and issue can send instructions both to that pipe and to NEON pipes. Did you try it?

  2. #32
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by ldesnogu View Post
    I think you are wrong: the L/S instructions have their own pipe, and issue can send instructions both to that pipe and to NEON pipes. Did you try it?
    Yes, of course. I learned long ago that nobody can be trusted (neither random dudes on the Internet nor the people I actually consider to be quite knowledgeable). Documentation can't be trusted without verification either (not to mention that it is often incomplete or vague). It goes without saying that I can't be trusted either.

    I ran into the sad fact that Cortex-A9 is unable to dual-issue NEON instructions with any L/S instructions (both ARM and NEON) in practice long ago. The Cortex-A9 NEON Media Processing Engine Technical Reference Manual says "with the exception of simultaneous loads and stores, the processor can execute VFP and Advanced SIMD instructions in parallel with ARM or Thumb instructions", which is admittedly not very clear. But there is no need to guess or misinterpret, because we can easily run a simple benchmark program:

    Code:
    .text
    .arch armv7-a
    .fpu neon
    .global main
    
    #ifndef CPU_CLOCK_FREQUENCY
    #error CPU_CLOCK_FREQUENCY must be defined
    #endif
    
    #define LOOP_UNROLL_FACTOR   20
    
    .func main
    main:
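            push        {r4-r12, lr}
            /* Loop count: the body below is unrolled LOOP_UNROLL_FACTOR times, */
            /* so CPU_CLOCK_FREQUENCY groups of 8 NEON instructions run in total */
            ldr         ip, =(CPU_CLOCK_FREQUENCY / LOOP_UNROLL_FACTOR)
            b           1f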
        .balign 64
    1:
        .rept LOOP_UNROLL_FACTOR
            vorr        d30, d30, d30
            vorr        d31, d31, d31
            vorr        d30, d30, d30
            vorr        d31, d31, d31
    #ifdef DO_ARM_LDR
            ldr         r0, [sp]
    #endif
            vorr        d30, d30, d30
            vorr        d31, d31, d31
            vorr        d30, d30, d30
            vorr        d31, d31, d31
    2:
        .endr
            subs        ip, ip, #1
            bne         1b
    
            mov         r0, #0
            pop         {r4-r12, pc}
    .endfunc
    Cortex-A9:
    Code:
    $ gcc -DCPU_CLOCK_FREQUENCY=1200000000 bench_mixed_ldr_neon.S && time ./a.out
    real	0m8.093s
    user	0m8.080s
    sys	0m0.000s
    
    $ gcc -DCPU_CLOCK_FREQUENCY=1200000000 -DDO_ARM_LDR=1 bench_mixed_ldr_neon.S && time ./a.out
    real	0m9.048s
    user	0m9.035s
    sys	0m0.000s
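    Adding the LDR instruction costs one extra cycle per group of 8 NEON instructions on Cortex-A9 (about 9 seconds instead of 8), so the load is not dual-issued.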

    Cortex-A8:
    Code:
    $ gcc -DCPU_CLOCK_FREQUENCY=1000000000 bench_mixed_ldr_neon.S && time ./a.out
    real	0m8.018s
    user	0m8.016s
    sys	0m0.000s
    
    $ gcc -DCPU_CLOCK_FREQUENCY=1000000000 -DDO_ARM_LDR=1 bench_mixed_ldr_neon.S && time ./a.out
    real	0m8.019s
    user	0m8.000s
    sys	0m0.008s
    Cortex-A8 can dual-issue L/S instructions with NEON arithmetic perfectly fine: the run time stays at about 8 seconds with or without the LDR.
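
    The timings above can be read directly as cycle counts. The loop counter is CPU_CLOCK_FREQUENCY / LOOP_UNROLL_FACTOR and each iteration contains LOOP_UNROLL_FACTOR groups of 8 vorr instructions, so CPU_CLOCK_FREQUENCY groups are executed in total and the run time in seconds is roughly the number of cycles per group. A small Python sketch of that arithmetic follows. It assumes the core really runs at the frequency passed on the command line and that the subs/bne loop overhead is negligible; the function name cycles_per_group is made up for this illustration.

```python
# Sketch of the arithmetic behind the benchmark timings quoted in the post.
# Assumptions: the CPU runs at the nominal clock, loop overhead is negligible.
LOOP_UNROLL_FACTOR = 20  # same value as in the benchmark source


def cycles_per_group(seconds, clock_hz):
    """Average cycles spent on one unrolled group (8 vorr, optional ldr)."""
    iterations = clock_hz // LOOP_UNROLL_FACTOR   # value loaded into ip
    groups = iterations * LOOP_UNROLL_FACTOR      # groups executed in total
    return seconds * clock_hz / groups


# "user" times copied from the benchmark runs in the post
runs = {
    "Cortex-A9, NEON only": (8.080, 1_200_000_000),
    "Cortex-A9, NEON + LDR": (9.035, 1_200_000_000),
    "Cortex-A8, NEON only": (8.016, 1_000_000_000),
    "Cortex-A8, NEON + LDR": (8.000, 1_000_000_000),
}

for name, (secs, hz) in runs.items():
    print(f"{name}: {cycles_per_group(secs, hz):.2f} cycles per group")
```

    On Cortex-A9 the difference between the two runs comes out at about one cycle per group, which is what you would expect if the LDR takes an issue slot of its own. On Cortex-A8 the difference is about zero.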

  3. #33
    Join Date
    Oct 2008
    Posts
    110

    Default

    Thanks for clearing that up

  4. #34
    Join Date
    Feb 2012
    Posts
    18

    Default

    SS, were you running your OMAP4430 off of USB / OTG power on your original cpuburn test? I have heard that you can run current a bit above spec on the original Pandaboard (but not the 4460 ES). Still, I would be surprised if you can run it at ~300% above spec... I have had quite a bit of success with OTG power so I will give this a try on a Panda A2 generation. The power peaks I am seeing on ODROID-X seem to be during the c-ray PTS tests, which peak at 7W. How are you getting your direct current measurements? I will instrument my boards if I find a good way to measure that...

  5. #35
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by SolarNet View Post
    SS, were you running your OMAP4430 off of USB / OTG power on your original cpuburn test? I have heard that you can run current a bit above spec on the original Pandaboard (but not the 4460 ES). Still, I would be surprised if you can run it at ~300% above spec... I have had quite a bit of success with OTG power so I will give this a try on a Panda A2 generation.
    No, I'm using a 5V power supply rated at 3A. OTG just can't provide enough current without violating USB spec. Even the idle system had ~550 mA current draw, which is already too much for OTG.

    The power peaks I am seeing on ODROID-X seem to be during the c-ray PTS tests which peaks at 7W.
    I bet you can run it a lot hotter with a Cortex-A9 tuned cpuburn. Just do the following, and maybe run htop in another terminal or ssh session to verify that all 4 cores are fully loaded. I would not be surprised if the power consumption goes up to 10W or more, which should be easily measurable even with your apparently poor precision power meter:
    Code:
    $ wget https://raw.github.com/ssvb/ssvb.github.com/master/files/2012-04-10/ssvb-cpuburn-a9.S
    $ gcc ssvb-cpuburn-a9.S
    $ ./a.out
    How are you getting your direct current measurements? I will instrument my boards if I find a good way to measure that...
    Just a multimeter connected in series between the power supply and the 5V barrel jack on the board. Something similar to what is shown in the picture here.
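
    To compare a multimeter reading in amps with a power meter reading in watts, multiply or divide by the supply voltage. The Python sketch below does that for the figures mentioned in this thread. It assumes the board input stays at exactly 5 V under load and that the relevant USB limit is the 500 mA of a USB 2.0 high-power port; both are assumptions for illustration, and the helper names are made up.

```python
# Sketch: converting between current and power at the board input.
# Assumptions: the supply holds exactly 5 V under load, and the USB limit
# of interest is 500 mA (USB 2.0 high-power port).
SUPPLY_VOLTS = 5.0
USB2_LIMIT_AMPS = 0.5


def watts_from_amps(amps, volts=SUPPLY_VOLTS):
    """Power drawn at the barrel jack for a measured current."""
    return amps * volts


def amps_from_watts(watts, volts=SUPPLY_VOLTS):
    """Current a multimeter would show for a given power draw."""
    return watts / volts


idle_amps = 0.55  # idle current draw quoted in the post (~550 mA)
print(f"idle: {watts_from_amps(idle_amps):.2f} W")
print(f"idle exceeds the USB limit: {idle_amps > USB2_LIMIT_AMPS}")

# Power figures mentioned in the thread: c-ray peak and the 10 W guess
for watts in (7.0, 10.0):
    print(f"{watts:.0f} W is about {amps_from_watts(watts):.1f} A at 5 V")
```

    Under these assumptions a 3A supply still has some headroom at 10W, while OTG power is already out of spec at idle.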

  6. #36
    Join Date
    Feb 2012
    Posts
    18

    Default

    I will try this out... see if I can find the true peak power. I have some more benchmarks at
    http://openbenchmarking.org/result/1...AR-1208150AR20
    comparing a dual-core Exynos (Soft-float) to the quad-core Exynos (Hard-float)... the numbers aren't quite twice as good as I thought they would be. Worse PE / RAM ratio might be in play there... thanks...

  7. #37
    Join Date
    Feb 2012
    Posts
    18

    Default

    Sweet. 11W normal with spikes to 12W... and that is in my hypercooled mineral oil bath to be safe (seriously, I'll take a picture). Next test will be to fully engage the rest of the board and measure. Might need to break the nitrogen out for that one...

    Actually, I'm surprised at how stable it has been. I have only just now dropped it into the oil bath.

  8. #38
    Join Date
    Jan 2012
    Posts
    113

    Default

    Quote Originally Posted by SolarNet View Post
    Next test will be to fully engage the rest of the board and measure.
    It would be interesting (and probably scary) if somebody could implement something like gpuburn-mali400 and run it on ODROID-X together with cpuburn. Kicking off DMA to repeatedly copy something in the background could additionally stress the memory controller. But in any case, it is just a stress test for the cooling system. No real application is ever going to consume as much power.

  9. #39
    Join Date
    Feb 2012
    Posts
    18

    Default

    I have some more benchmarks here... the suite keeps crashing at compile bench for some reason... will have to look at that...

    openbenchmarking.org/result/1208245-AR-1208223AR23

  10. #40

    Default

    Quote Originally Posted by SolarNet View Post
    I have some more benchmarks here... the suite keeps crashing at compile bench for some reason... will have to look at that...
    It's probably best not to hit compilebench on the ARM hardware with SD cards since compilebench is rather write-intensive on the storage.
