
Thread: Ubuntu Plans For Linux x32 ABI Support

  1. #11
    Join Date
    Oct 2011
    Location
    Toruń, Poland
    Posts
    160

    Default

    Quote Originally Posted by xir_ View Post
    It can allow more to fit into the L2.
    I am just curious: do -O3 optimizations make binaries "eat" more L2 cache than -O2 optimizations, assuming everything else stays the same?

  2. #12
    Join Date
    Nov 2008
    Posts
    762

    Default

    Quote Originally Posted by Hirager View Post
    I am just curious: do -O3 optimizations make binaries "eat" more L2 cache than -O2 optimizations, assuming everything else stays the same?
    IIRC Firefox is by default compiled with -Os because the smaller cache footprint outweighs all the other optimizations. But that's something you'll have to test for each project separately.


    The linked Ubuntu docs seem to be hidden behind a login. Is there a solution for the library redundancy? Having to load x32 kdelibs+Qt AND x86_64 kdelibs+Qt for that one KDE app that benefits from >4GB memory would probably outweigh any memory savings to be had.

  3. #13
    Join Date
    Oct 2011
    Location
    Toruń, Poland
    Posts
    160

    Default

    Quote Originally Posted by rohcQaH View Post
    IIRC Firefox is by default compiled with -Os because the smaller cache footprint outweighs all the other optimizations. But that's something you'll have to test for each project separately.


    The linked Ubuntu docs seem to be hidden behind a login. Is there a solution for the library redundancy? Having to load x32 kdelibs+Qt AND x86_64 kdelibs+Qt for that one KDE app that benefits from >4GB memory would probably outweigh any memory savings to be had.
    No offence meant, but I would rather hear the answer from someone who specializes in this sort of thing.

    As to your question: you forget just how big multimedia projects can be. It is not about memory savings for big programs; it is about savings in workflows which do not require 64-bit software. 64-bit programs are treated here as additions and nothing more. So this is a back-to-the-past situation, because it turns out that the drawbacks of 64-bit software can be nullified.

  4. #14
    Join Date
    Jul 2009
    Posts
    241

    Default

    Will there be a benefit for WINE?

  5. #15
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by Hirager View Post
    I am just curious: do -O3 optimizations make binaries "eat" more L2 cache than -O2 optimizations, assuming everything else stays the same?
    Well, since -O3 favours speed over code size, the resulting binary is likely to be bigger than with -O2 and thus fill up the CPU cache faster. However, since the optimizer aims for the fastest speed, it should only make code larger (through inlining etc.) when the added cache footprint will not make performance worse.

    In reality, though, the heuristics governing this are very difficult to get right, which is why the same code compiled with -O2 will sometimes beat -O3. I've never encountered this with PGO (profile-guided optimization), though, which suggests that the runtime data it uses when optimizing allows it to accurately judge the impact that code size and cache misses will have on performance.
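
    A rough way to see this for yourself (just a sketch, assuming GCC and binutils are installed; foo.c stands in for whatever source you want to test) is to compare the text segment at each level and try a basic PGO build:

    Code:
    # compare generated code size at -O2 vs -O3 (the "text" column is the machine code)
    gcc -O2 -c foo.c -o foo_O2.o
    gcc -O3 -c foo.c -o foo_O3.o
    size foo_O2.o foo_O3.o

    # basic profile-guided optimization with GCC
    gcc -O3 -fprofile-generate foo.c -o foo   # instrumented build
    ./foo                                     # run a representative workload to collect profile data
    gcc -O3 -fprofile-use foo.c -o foo        # rebuild using the collected profile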

  6. #16
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by jakubo View Post
    Will there be a benefit for WINE?
    I don't think so. Obviously the actual Windows programs won't be faster, and I also think that the parts of Windows which Wine reimplements, which could potentially be faster, need to run as standard 32-bit code as well and thus won't benefit either. But I'm not sure about this; I don't have much insight into how Wine works.

  7. #17
    Join Date
    Oct 2008
    Posts
    3,036

    Default

    Quote Originally Posted by rohcQaH View Post
    IIRC Firefox is by default compiled with -Os because the smaller cache footprint outweighs all the other optimizations. But that's something you'll have to test for each project separately.
    I believe that was changed when they updated to a more recent GCC version and started supporting PGO. I believe they switched to -O3, along with using an option to limit the amount of inlining that -O3 normally enables.


    x32 support is unlikely to decrease memory or disk size requirements. In fact, it will almost certainly increase them, because you are just adding new libraries that need to be duplicated in both architectures for compatibility. And the amount of size it will save in a particular executable is really very small. We're talking about reducing a 1024KB program to 1000KB maybe.

    The benefit comes from reducing L1, L2, and L3 cache pressure, which can lead to significant speed boosts. It depends heavily on the application in question, though - and even the hardware it's running on. x32 might bring a big boost on hardware with smaller caches, while giving no boost at all on cpus with a large cache size.
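
    If you want to see how big the caches on your own machine actually are, something like this works on most Linux systems (just a sketch; the exact labels vary between lscpu versions):

    Code:
    lscpu | grep -i cache
    # or read the sizes straight from sysfs:
    cat /sys/devices/system/cpu/cpu0/cache/index*/size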

  8. #18
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by smitty3268 View Post
    x32 support is unlikely to decrease memory or disk size requirements. In fact, it will almost certainly increase them, because you are just adding new libraries that need to be duplicated in both architectures for compatibility.
    That is assuming you will keep/need to run applications as x64. In particular, if you have a 64-bit CPU and 4 GB or less of RAM, an x32-only system would be the perfect fit.

    Quote Originally Posted by smitty3268 View Post
    And the amount of size it will save in a particular executable is really very small. We're talking about reducing a 1024KB program to 1000KB maybe.
    I believe you are wrong here; typically a full 32-bit system will use ~20% less RAM than an equivalent 64-bit system, due to libraries and applications being smaller (as binaries) and using less RAM when running (due to pointer size). Also, x32 code could potentially be even smaller than 32-bit code: even though both 32-bit and x32 have 32-bit pointers, 32-bit x86 still suffers from having very few registers, which means it has to waste more code pushing to and popping from the stack in order to reuse registers. x32 also has 32-bit pointers but TWICE the number of registers, which means it can keep much more data in registers and needs much less stack push/pop code, thus making the code smaller.
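
    A minimal sketch that makes the pointer-size half of this concrete (sizes.c is just a made-up name, and the -mx32 build obviously needs a toolchain with x32 support):

    Code:
    /* sizes.c - compile with:
     *   gcc -m32  sizes.c -o sizes32    (classic 32-bit x86)
     *   gcc -mx32 sizes.c -o sizesx32   (x32 ABI)
     *   gcc -m64  sizes.c -o sizes64    (regular x86-64)
     */
    #include <stdio.h>

    int main(void)
    {
        /* -m32 and -mx32 should both report 4 for pointers and longs,
           -m64 should report 8 for both. */
        printf("sizeof(void*) = %zu\n", sizeof(void *));
        printf("sizeof(long)  = %zu\n", sizeof(long));
        return 0;
    }
    The extra registers don't show up at the C level, only in the generated code, which is why the assembly comparison further down in the thread is the more interesting part.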

  9. #19
    Join Date
    Oct 2008
    Posts
    3,036

    Default

    Quote Originally Posted by XorEaxEax View Post
    That is assuming you will keep/need to run applications as x64. In particular, if you have a 64-bit CPU and 4 GB or less of RAM, an x32-only system would be the perfect fit.
    You're assuming distros are going to create pure x32 distros, which I find unlikely. They already have to use the x64 kernel, so I find it hard to believe they wouldn't include x64 userland libs as well.

    I could be wrong about that, but I just don't see it happening. Every new architecture they have to support just means that much more work for their limited staff - it will be much easier to just combine the x32 and x64 architectures.

    If you are talking about custom building your own distro (on Gentoo? or LFS?) then maybe you have a point.

    Quote Originally Posted by XorEaxEax View Post
    I believe you are wrong here; typically a full 32-bit system will use ~20% less RAM than an equivalent 64-bit system, due to libraries and applications being smaller (as binaries) and using less RAM when running (due to pointer size). Also, x32 code could potentially be even smaller than 32-bit code: even though both 32-bit and x32 have 32-bit pointers, 32-bit x86 still suffers from having very few registers, which means it has to waste more code pushing to and popping from the stack in order to reuse registers. x32 also has 32-bit pointers but TWICE the number of registers, which means it can keep much more data in registers and needs much less stack push/pop code, thus making the code smaller.
    And I believe I'm right. Do you have any proof?

    The average size of an executable's instructions is really quite small. Most of it tends to be data - string values encoded in the program, for example. Even pointer-heavy apps are dominated in size by the data they are using, not the pointers themselves.
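
    One quick way to check this for any given binary is the size tool from binutils (a sketch; /bin/ls is just an arbitrary example):

    Code:
    # Berkeley format: "text" = code plus read-only data, "data", "bss"
    size /bin/ls
    # per-section breakdown (.text, .rodata with the strings, .data, ...)
    size -A /bin/ls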

  10. #20
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by smitty3268 View Post
    I could be wrong about that, but I just don't see it happening. Every new architecture they have to support just means that much more work for their limited staff - it will be much easier to just combine the x32 and x64 architectures.
    Yes, I'm doubtful of this as well. Ubuntu, as the article states, is looking into it, but that is a long way from fully supporting it; Gentoo is very much build-it-yourself from scratch, so I believe they will 'support' x32. I'm not sure what you mean by combining the x32 and x64 architectures, though; they will use the same kernel but need different libraries.

    Quote Originally Posted by smitty3268 View Post
    And I believe I'm right. Do you have any proof?
    As for 32-bit using ~20% less RAM than an equivalent 64-bit system, that has been fairly well verified (I've done it twice myself in the past, on both Windows and Ubuntu). But since it's quick to do in these days of VMs, I did a test just now: two identical setups in terms of software, one Arch 32-bit and one Arch 64-bit. After the same base installation I installed X, Openbox and Conky on both;
    after starting X/Openbox this is what Conky reported:

    http://img442.imageshack.us/img442/3500/32bit.png
    http://img232.imageshack.us/img232/8794/64bit.png
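
    For anyone wanting to reproduce this without Conky, comparing the "used" column of free right after boot on both VMs should give roughly the same picture:

    Code:
    free -m                       # compare the "used" column on both VMs
    ps aux --sort=-rss | head     # see which processes account for the difference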

    Now for the x32 vs 32-bit code size: no, I had no proof, as it was just something which seemed logical (more registers = less pushing and popping = smaller code footprint). Anyway, thanks to your scepticism I figured I should see if it was true.

    As I'm running a pure 64-bit system and the GCC I'm using (Arch vanilla) wasn't configured with 32/x32 multilib, I could compile code as 32-bit and x32 but not build a final binary. That's not so bad, though, since I can generate assembly output which actually shows us the code. I took meteor.c from the Language Shootout as the test subject, as it didn't need to link in any external functionality (I commented out main/printf), and compiled 32-bit and x32 into assembly output using:

    gcc -Os -march=native -fomit-frame-pointer -m32 -S -c meteor.c
    gcc -Os -march=native -fomit-frame-pointer -mx32 -S -c meteor.c

    The resulting x32 assembly listing turned out to be quite a bit smaller than the 32-bit one (1505 vs 1691 lines respectively), but that could be the result of the 32-bit assembly containing more compiler directives rather than actually smaller code, so obviously I had to examine the listings. I can't say I did any thorough comparison of the larger functions, but from quickly scanning I couldn't see any occurrence where the x32 code was larger, while I did see several places where it was smaller. I picked out some small (and thus easier to examine) examples from the generated assembly:

    Code:
    32-bit:
    boardHasIslands:
    .LFB19:
    	pushl	%edi
    	xorl	%eax, %eax
    	pushl	%esi
    	movb	12(%esp), %dl
    	cmpb	$39, %dl
    	jg	.L237
    	movb	$5, %cl
    	movsbw	%dl, %ax
    	movl	board+4, %edi
    	idivb	%cl
    	movl	board, %esi
    	movsbl	%al, %ecx
    	leal	(%ecx,%ecx,4), %ecx
    	shrdl	%edi, %esi
    	shrl	%cl, %edi
    	testb	$32, %cl
    	cmovne	%edi, %esi
    	andl	$32767, %esi
    	testb	$1, %al
    	je	.L238
    	movl	bad_odd_triple(,%esi,4), %eax
    	jmp	.L237
    .L238:
    	movl	bad_even_triple(,%esi,4), %eax
    .L237:
    	popl	%esi
    	popl	%edi
    	ret
    
    x32:
    boardHasIslands:
    .LFB19:
    	xorl	%eax, %eax
    	cmpb	$39, %dil
    	jg	.L231
    	movb	$5, %dl
    	movsbw	%dil, %ax
    	idivb	%dl
    	movq	board(%rip), %rdx
    	movsbl	%al, %ecx
    	leal	(%rcx,%rcx,4), %ecx
    	shrq	%cl, %rdx
    	andl	$32767, %edx
    	sall	$2, %edx
    	testb	$1, %al
    	movslq	%edx, %rdx
    	je	.L232
    	movl	bad_odd_triple(%rdx), %eax
    	ret
    .L232:
    	movl	bad_even_triple(%rdx), %eax
    .L231:
    	ret
    
    32-bit:
    record_piece:
    .LFB11:
    	pushl	%edi
    	pushl	%esi
    	pushl	%ebx
    	movl	16(%esp), %esi
    	movl	20(%esp), %eax
    	movl	32(%esp), %edx
    	imull	$50, %esi, %ebx
    	imull	$600, %esi, %esi
    	addl	%eax, %ebx
    	imull	$12, %eax, %eax
    	movl	piece_counts(,%ebx,4), %ecx
    	addl	%eax, %esi
    	movl	28(%esp), %eax
    	leal	(%esi,%ecx), %edi
    	movl	%edx, pieces+4(,%edi,8)
    	movl	%eax, pieces(,%edi,8)
    	movl	24(%esp), %eax
    	movb	%al, next_cell(%ecx,%esi)
    	incl	%ecx
    	movl	%ecx, piece_counts(,%ebx,4)
    	popl	%ebx
    	popl	%esi
    	popl	%edi
    	ret
    
    x32:
    record_piece:
    .LFB11:
    	imull	$50, %edi, %eax
    	imull	$600, %edi, %edi
    	addl	%esi, %eax
    	imull	$12, %esi, %esi
    	sall	$2, %eax
    	cltq
    	movl	piece_counts(%rax), %r8d
    	addl	%edi, %esi
    	addl	%r8d, %esi
    	incl	%r8d
    	leal	0(,%rsi,8), %edi
    	movslq	%esi, %rsi
    	movl	%r8d, piece_counts(%rax)
    	movslq	%edi, %rdi
    	movb	%dl, next_cell(%rsi)
    	movq	%rcx, pieces(%rdi)
    	ret
    Now granted, this is not irrefutable proof. I can't swear that the x32 assembly here translates to a smaller code footprint than the 32-bit version, as I'm only going by the assembly output, but it does seem likely. I also compiled with both -O2 and -O3, and in both cases the resulting x32 assembly was quite a bit smaller than the 32-bit one; I didn't examine those listings, though.

    When kernel 3.4 is released and I thus have the possibility of actually running and benchmarking x32 code, I will recompile GCC with 32/x32 multilib so that I can build and compare proper binaries.
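
    Once that's possible, the actual comparison can be as simple as assembling both versions to object files and letting size count the bytes (a sketch, assuming the installed binutils already accepts x32 objects):

    Code:
    gcc -Os -march=native -fomit-frame-pointer -m32  -c meteor.c -o meteor_32.o
    gcc -Os -march=native -fomit-frame-pointer -mx32 -c meteor.c -o meteor_x32.o
    size meteor_32.o meteor_x32.o   # "text" column = actual code bytes, not assembly lines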

    Quote Originally Posted by smitty3268 View Post
    The average size of an executable's instructions is really quite small. Most of it tends to be data - string values encoded in the program, for example. Even pointer-heavy apps are dominated in size by the data they are using, not the pointers themselves.
    Again, the RAM usage difference of roughly ~20% between equivalent 32-bit and 64-bit systems is pretty much confirmed. Also, code size does matter for performance, since the CPU cache isn't infinite.
