**Raka555** · 16 March 2020, 01:17 AM

I assume it is hand-optimized assembler?

**boxie** · 16 March 2020, 01:19 AM

Well, that's some... Blazing... Performance

**tiennou** · 16 March 2020, 01:23 AM

Originally posted by Raka555:

> I assume it is hand-optimized assembler?

A look at the commit (https://git.kernel.org/pub/scm/linux...94d765c8eecbe1) points in that direction.

**cl333r** · 16 March 2020, 01:33 AM

To me as a desktop user this is just gibberish, like nearly all of the "news" on LXer and LinuxToday.

**cbxbiker61** · 16 March 2020, 02:07 AM

It's always welcome to see performance improvements. Optimizing for ARM would no doubt hit a larger user base, given all of the ARM-based OpenWrt and DD-WRT routers (I've got 5 routers running OpenWrt and one of them would benefit from netfilter performance improvements).

**Setif** · 16 March 2020, 03:22 AM

I hope it's not about some operation that occurs once or twice a day and went from taking 4.2 ms to 1.0 ms (420%).

**r08z** · 16 March 2020, 04:52 AM

This is what Clear Linux does on a regular basis for all libraries and programs, with a few simple AVX2 intrinsics patches to help the program make better use of the -march=haswell compiler flag.

**milkylainen** · 16 March 2020, 05:12 AM

Originally posted by r08z:

> This is what Clear Linux does on a regular basis for all libraries and programs, with a few simple AVX2 intrinsics patches to help the program make better use of the -march=haswell compiler flag.

Vectorization and architecture support are not the same as structured, optimized assembly for complex data.
But if the code structure were different, maybe more standard vectorization would have helped.
Replacing a function or call with a hand-optimized intrinsic is not the same either.
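To make the distinction concrete, here is a minimal sketch of the kind of loop that flag-driven vectorization can handle on its own. It is not taken from the kernel patch: `and_bitmaps` and its parameters are hypothetical names chosen for illustration. With something like `gcc -O3 -march=haswell`, a compiler may turn this loop into AVX2 instructions without any intrinsics, because the data layout is flat and the iterations are independent. Restructuring a lookup over more complex data so that it fits vector registers is the part a compiler flag will not do for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the netfilter commit): AND two bitmaps
 * word by word into dst and report whether any bit is still set.
 * The restrict qualifiers promise the compiler that the buffers do not
 * overlap, which makes auto-vectorization easier to prove safe. */
static int and_bitmaps(uint64_t *restrict dst,
                       const uint64_t *restrict a,
                       const uint64_t *restrict b,
                       size_t nwords)
{
    uint64_t any = 0;

    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < nwords; i++) {
        dst[i] = a[i] & b[i];   /* independent per-word work */
        any |= dst[i];          /* simple OR reduction */
    }
    return any != 0;
}
```

Whether the compiler actually vectorizes this depends on the compiler version and flags, so checking the generated assembly is the only way to be sure.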

**ldesnogu** · 16 March 2020, 05:48 AM

Originally posted by cbxbiker61:

> It's always welcome to see performance improvements. Optimizing for ARM would no doubt hit a larger user base, given all of the ARM-based OpenWrt and DD-WRT routers (I've got 5 routers running OpenWrt and one of them would benefit from netfilter performance improvements).

It seems the author agrees:

nft_set_pipapo: Introduce AVX2-based lookup implementation - kernel/git/pablo/nf-next.git - Netfilter's -next tree

https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/commit/?id=7400b063969bdca4a06cd97f1294d765c8eecbe1

> A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time.

Linux 5.7 Netfilter To See AVX2 Optimizations For Big Performance Boost - Can Be Up To ~420%