DragonFlyBSD Is Getting Much Better Network/TCP Performance
A per-CPU LPORT cache for listen sockets in DragonFlyBSD's TCP code may not sound like an exciting change, but it's a huge performance win.
The commit by Sepherosa Ziehau explains, "In order to guard against reincarnation of an accepted connection after the listen socket is closed, the accepted socket is linked on to the same global lport hash list as the listen socket. However, on a busy TCP server, this could cause a lot of contention on this global lport hash list. But think about it again: as long as the listen socket is not closed, reincarnation of an accepted connection is _impossible_, since the listen socket itself is on the global lport hash list."
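To make the idea in the commit message concrete, here is a minimal toy model in Python. It is not DragonFlyBSD's code: the class `LportTable`, its method names, and the `global_touches` counter are all hypothetical, and the step that moves cached connections onto the global list when the listen socket closes is an assumption drawn from the quoted reasoning rather than from the actual kernel source. The sketch only shows why accepted connections no longer need to touch the shared list while the listen socket is still open.

```python
class LportTable:
    """Toy model of a global lport list plus a per-CPU cache (hypothetical)."""

    def __init__(self, ncpus):
        self.ncpus = ncpus
        self.global_list = {}     # lport -> sockets; shared by every CPU
        self.percpu_cache = {}    # lport -> one private list per CPU
        self.global_touches = 0   # how often the shared list was modified

    def listen(self, lport):
        # The listen socket itself always sits on the global list.
        self.global_touches += 1
        self.global_list.setdefault(lport, []).append(("listen", lport))
        self.percpu_cache[lport] = [[] for _ in range(self.ncpus)]

    def accept(self, lport, cpu, conn):
        # While the listen socket is open, an accepted connection is only
        # recorded in the accepting CPU's private list: no shared state.
        self.percpu_cache[lport][cpu].append(conn)

    def close_listen(self, lport):
        # Assumption: on close, cached connections move to the global list
        # so the port stays guarded against reincarnation.
        self.global_touches += 1
        self.global_list[lport].remove(("listen", lport))
        for cached in self.percpu_cache.pop(lport):
            self.global_list[lport].extend(cached)


table = LportTable(ncpus=4)
table.listen(80)
for i in range(1000):
    table.accept(80, cpu=i % 4, conn=("conn", i))
touches_while_open = table.global_touches
table.close_listen(80)
print(touches_while_open, table.global_touches, len(table.global_list[80]))
```

In this model, a thousand accepted connections modify the shared list zero times while the listen socket is open; only the listen and close operations touch it.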
The performance ramifications are huge:
This greatly reduces the total contention rate on a busy TCP server:
- From 50K/s to 18K/s, if the # of NIC rings does not match the # of cpus. And it gives ~7% performance improvement (420Kconn/s to 450Kconn/s).
- From 30K/s to 800/s, if the # of NIC rings matches the # of cpus. Though this does not give more performance improvement, idle cpu time is increased a bit.
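As a quick sanity check on the figures above, the relative changes can be worked out directly. The helper name `pct_change` is just for illustration; the inputs are the numbers quoted from the commit message.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Contention rate when NIC rings do not match CPUs: 50K/s -> 18K/s
mismatched = pct_change(50_000, 18_000)
# Contention rate when NIC rings match CPUs: 30K/s -> 800/s
matched = pct_change(30_000, 800)
# Connection rate in the mismatched case: 420Kconn/s -> 450Kconn/s
conn_rate = pct_change(420_000, 450_000)

print(f"{mismatched:.1f}% {matched:.1f}% {conn_rate:.1f}%")
# prints "-64.0% -97.3% 7.1%"
```

So contention drops by roughly 64% in the mismatched case and by about 97% in the matched case, and the connection-rate gain works out to the roughly 7% quoted.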
This change does, however, require a complete rebuild of DragonFlyBSD's world. These TCP performance improvements are to be found in DragonFlyBSD 4.5.