DragonFlyBSD Lands Another NUMA Optimization Helping AMD Threadripper 2 CPUs


  • DragonFlyBSD Lands Another NUMA Optimization Helping AMD Threadripper 2 CPUs

    Phoronix: DragonFlyBSD Lands Another NUMA Optimization Helping AMD Threadripper 2 CPUs

    DragonFlyBSD lead developer Matthew Dillon has been quite impressed with AMD's Threadripper 2 processors, particularly the Threadripper 2990WX with 32 cores / 64 threads. Dillon has made various optimizations to DragonFly to help this processor over the past months, and overnight he made another significant improvement...

  • #2
    Do we already have those kinds of optimizations in Linux?


    • #3
      We certainly don't have those kinds of optimizations in Windows.


      • #4
        Man, low-level C sometimes makes me want to gouge my eyeballs. From his patch:
        Code:
        while (vpq->lcnt < lcnt_lo) {
            struct vpgqueues *vptmp;
        
            iter = (iter + 1) & PQ_L2_MASK;
            vptmp = &vm_page_queues[PQ_FREE + iter];
            if (vptmp->lcnt < lcnt_hi)
                continue;
            m = TAILQ_FIRST(&vptmp->pl);
            KKASSERT(m->queue == PQ_FREE + iter);
            TAILQ_REMOVE(&vptmp->pl, m, pageq);
            --vptmp->lcnt;
            /* queue doesn't change, no need to adj cnt */
            m->queue -= m->pc;
            m->pc = i;
            m->queue += m->pc;
            TAILQ_INSERT_HEAD(&vpq->pl, m, pageq);
            ++vpq->lcnt;
        }


        • #5
          Is there any way to measure how much progress a thread is making in its scheduled time? If so, a heuristic algorithm might also be very effective. It could permute the thread layout and determine which one runs best on the "remote" cores without direct memory access. My department is using a similar approach to optimize load balancing problems on HPC machines with nodes of different performance levels. The nice feature of this approach is that it doesn't actually need to know anything about the nodes. It simply tracks the process and redistributes the workload if one node isn't performing in sync with the others. A similar approach could potentially be used for the memory domains in NUMA systems.
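
To make the feedback idea concrete, here is a minimal sketch, assuming progress can be reported as completed work units per measurement interval. The names `node_stat` and `rebalance` are hypothetical and exist only for illustration; they are not taken from DragonFly, from any scheduler, or from the HPC code mentioned above. The function knows nothing about the nodes except what they finished last time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node record; the names are illustrative only. */
struct node_stat {
    long assigned;   /* work units to hand out for the next interval */
    long completed;  /* work units finished during the last interval */
};

/*
 * Split `total` work units across `n` nodes in proportion to what each
 * node completed in the last interval. Assumes `completed` values are
 * non-negative and that total * completed fits in a long.
 */
void rebalance(struct node_stat *nodes, size_t n, long total)
{
    long done = 0;    /* sum of completed work over all nodes */
    long given = 0;   /* work units handed out so far */
    size_t best = 0;  /* index of the node that completed the most */
    size_t i;

    assert(nodes != NULL && n > 0 && total >= 0);

    for (i = 0; i < n; i++) {
        done += nodes[i].completed;
        if (nodes[i].completed > nodes[best].completed)
            best = i;
    }

    for (i = 0; i < n; i++) {
        if (done > 0)
            nodes[i].assigned = total * nodes[i].completed / done;
        else
            nodes[i].assigned = total / (long)n;  /* no data yet: split evenly */
        given += nodes[i].assigned;
    }

    /* Integer division may leave a remainder; give it to the fastest node. */
    nodes[best].assigned += total - given;
}
```

A caller would presumably invoke something like this once per interval after collecting the `completed` counters. A real balancer would likely also smooth the measurements over several intervals to avoid oscillating between layouts, which this sketch does not attempt.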
