AMD fglrx 8.42.3 leaking gobs of memory in OpenGL apps - any known workaround ?


oyvind
11-14-2007, 08:24 AM
Hi,

Just read this article:
"ATI's New Drivers: Did The Paradise Come?" - http://www.phoronix.com/scan.php?page=article&item=914&num=1

He mentioned the latest fglrx leaking memory, so I tested this myself, and boy was he right. I've got 2GB of memory and don't do much OpenGL, so I hadn't noticed. But now I know I can forget OpenGL altogether with this driver .. unless anyone knows of a work-around ?

The leaking rate is somehow connected to FPS/ticks, and it seems to occur in every OpenGL-app I've tried. If I run fgl_glxgears and size down the window, the FPS increases, and so does the memory leaking rate. This little app now uses over 400MiB RSS, and it has only been spinning for a couple of minutes. Wow..

Guess it's back to 8.40.4 for me, this problem is definitely a showstopper :mad:.

Just for the reference: I've got an ATI X1400 on a Lenovo Thinkpad Z61m.

Syrinn
11-14-2007, 12:59 PM
I'd like to add that I am seeing the same behavior on my system, and it definitely is a show stopper if I am to attempt using anything 3D. The performance is lovely, but any 3D game that eats up all my memory within 5 minutes is hardly playable. Is anybody aware of a workaround until AMD/ATI fixes this?

I have heard of another memory leak that was caused by using kernel version 2.6.x and the kernel's own AGP driver, solved by using the built in AGP support in fglrx, but that doesn't work on my machine (fails to start X). I suspect this is different tho- most people these days use PCIe based cards... (grumble grumble need to upgrade...:( )

System:
Athlon XP (32-bit) / nForce 2
ATi X1600 PRO 512M AGP

sok-1
11-14-2007, 01:18 PM
The same issue is on my machine too.

Athlon 64 x2
Ubuntu 7.10 (32bit)
ati 8.42.3 on x1800xt

Thetargos
11-14-2007, 01:31 PM
This seems to be general enough. Running Compiz/Beryl will leak memory too, albeit a bit slower.

Alistair
11-15-2007, 03:13 AM
Holy CRAP!

Yes it leaks -- horribly.

fgl_fglxgears running for 15 minutes on my box put me into swap. And I have 2 GB of RAM.

/proc/18157 $ more status
Name: fgl_fglxgears
State: R (running)
Tgid: 18157
Pid: 18157
PPid: 18152
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1342 1342 1342 1342
FDSize: 256
Groups: 10 18 27 35 100 1337 1342
VmPeak: 1437420 kB
VmSize: 1161892 kB
VmLck: 0 kB
VmHWM: 1124884 kB
VmRSS: 1124884 kB
VmData: 1126908 kB
VmStk: 84 kB
VmExe: 20 kB
VmLib: 18564 kB
VmPTE: 2820 kB
Threads: 1
SigQ: 0/16383
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
voluntary_ctxt_switches: 8594487
nonvoluntary_ctxt_switches: 99266


That there is 1.4 GB of VM utilization.....
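A minimal sketch of pulling the same fields out programmatically, assuming a Linux /proc filesystem. The function names `parse_status` and `read_status` are made up for illustration. It also makes the distinction visible: the 1.4 GB figure is the peak virtual size (VmPeak), while the resident set (VmRSS) is the part that actually occupies RAM.

```python
import re

def parse_status(text):
    """Return the kB-valued Vm* fields of a /proc/<pid>/status dump as ints."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(Vm\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def read_status(pid):
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_status(handle.read())

# Excerpt of the dump above.
sample = """VmPeak: 1437420 kB
VmSize: 1161892 kB
VmRSS: 1124884 kB
"""
vm = parse_status(sample)
print(f"peak {vm['VmPeak'] / 1024**2:.2f} GiB, resident {vm['VmRSS'] / 1024**2:.2f} GiB")
```

Calling `read_status(pid)` every few seconds on a running OpenGL app would show whether VmRSS keeps climbing.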


I'm off to read up -- this certainly explains where my memory errors in WoW are coming from --

DirtyHairy
11-15-2007, 06:54 AM
The same for me on an X1300 Mobility (Thinkpad T60). At least to my untrained eyes it appears that memory is lost systematically with every buffer swap. For those with plenty of RAM, at least a temporary fix for damage reduction may be enforcing vsync, thus limiting the total framerate.

Alistair
11-15-2007, 12:47 PM
okay :

fresh reboot, nice clean system
x up and running kde and fglrx 8.42.3

card= XT1650Pro agp

kernel AGP turned on by default, agp_amd64 on by default (IOMMU loaded by default in 2.6.23)

Gentoo - I'm using the ebuild from the driver bump bug on bugs.gentoo.org.


vmstat 5 100
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free  buff  cache si so bi bo   in    cs us sy  id wa
 1  0    0 1156852 70340 429808  0  0 17 13  557   263  1  1  98  0
 2  0    0 1151104 70340 429808  0  0  0  0 1110 25385 31 22  47  0
 1  0    0 1145360 70340 429808  0  0  0  0 1102 24983 32 21  47  0
 1  0    0 1139528 70364 429808  0  0  0  5 1109 25164 31 22  47  0
 1  0    0 1133700 70364 429808  0  0  0  0 1101 25472 31 21  47  0
 3  0    0 1127832 70364 429808  0  0  0  0 1105 25129 31 21  47  0
 1  0    0 1121920 70364 429808  0  0  0  0 1108 25404 31 22  47  0
 1  0    0 1116216 70364 429808  0  0  0  0 1101 25464 31 22  47  0
 1  0    0 1110180 70364 429808  0  0  0  0 1101 25476 31 22  47  0
 1  0    0 1104352 70364 429808  0  0  0  3 1103 25486 31 22  47  0
 2  0    0 1098608 70364 429808  0  0  0  0 1123 25209 31 23  46  0
 1  0    0 1092736 70364 429808  0  0  0  0 1122 25224 30 24  46  0
 1  0    0 1087120 70364 429808  0  0  0  0 1124 25213 31 22  47  0
 1  0    0 1081080 70364 429808  0  0  0  0 1112 25343 31 22  48  0
 1  0    0 1075336 70364 429808  0  0  0  0 1101 25445 31 22  47  0
 1  0    0 1069468 70364 429808  0  0  0  0 1110 25403 31 22  47  0
 1  0    0 1063640 70364 429808  0  0  0  0 1109 25341 32 21  47  0
 1  0    0 1057812 70364 429808  0  0  0  0 1107 25377 31 22  47  0
 1  0    0 1051816 70364 429808  0  0  0  0 1105 25368 31 21  48  0
 0  0    0 1328040 70364 429808  0  0  0  0 1110 22143 27 20  53  0
 0  0    0 1328040 70376 429808  0  0  0  3 1124   921  1  0  99  0
 0  0    0 1328024 70376 429808  0  0  0  0 1113   897  0  0 100  0
 0  0    0 1328108 70376 429808  0  0  0  0 1102   720  0  0 100  0
 0  0    0 1328148 70376 429808  0  0  0  0 1125   932  1  0  99  0



You can see quite clearly where I kill fgl_fglxgears: the context switches die off and the memory utilization drops suddenly.
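The drop in the free column can be turned into a rate. A rough sketch, assuming the 5-second interval from the `vmstat 5 100` call and using only the rows sampled while fgl_fglxgears was running; `free_drop_kb_per_s` is a made-up helper name:

```python
def free_drop_kb_per_s(free_kb, interval_s):
    """Average decrease of the vmstat 'free' column in kB per second."""
    if len(free_kb) < 2:
        raise ValueError("need at least two samples")
    elapsed = (len(free_kb) - 1) * interval_s
    return (free_kb[0] - free_kb[-1]) / elapsed

# 'free' values from the rows sampled while fgl_fglxgears was running.
running = [1151104, 1145360, 1139528, 1133700, 1127832, 1121920, 1116216,
           1110180, 1104352, 1098608, 1092736, 1087120, 1081080, 1075336,
           1069468, 1063640, 1057812, 1051816]
print(f"{free_drop_kb_per_s(running, 5):.0f} kB/s")
```

For these samples that works out to a little under 1.2 MB of free memory lost per second.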


gcc -v
Thread model: posix
gcc version 4.1.1 (Gentoo 4.1.1-r3)

ld -v
GNU ld version 2.16.1


2.6.23-gentoo-r1iommuin #2 SMP Tue Nov 13 21:18:59 EST 2007 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux


file /usr/lib/libGL.*


/usr/lib/libGL.la: libtool library file
/usr/lib/libGL.so: symbolic link to `//usr/lib64/opengl/ati/lib/libGL.so'
/usr/lib/libGL.so.1: symbolic link to `/usr/lib/libGL.so'
/usr/lib/libGL.so.1.2: symbolic link to `libGL.so.1'

alistair@ajftl1 ~ $ file /usr/lib64/opengl/ati/lib/libGL.so
/usr/lib64/opengl/ati/lib/libGL.so: symbolic link to `libGL.so.1.2'

alistair@ajftl1 ~ $ file /usr/lib64/opengl/ati/lib/libGL.so.1.2
/usr/lib64/opengl/ati/lib/libGL.so.1.2: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), stripped

alistair@ajftl1 ~ $ ls -l /usr/lib64/opengl/ati/lib/libGL.so.1.2
-rwxr-xr-x 1 root root 601832 Nov 15 11:44 /usr/lib64/opengl/ati/lib/libGL.so.1.2


glibc version is 2.5

X -version

X Window System Version 1.3.0
Release Date: 19 April 2007
X Protocol Version 11, Revision 0, Release 1.3
Build Operating System: UNKNOWN
Current Operating System: Linux ajftl1 2.6.23-gentoo-r1iommuin #2 SMP Tue Nov 13 21:18:59 EST 2007 x86_64
Build Date: 12 November 2007
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present


I'm suspecting that the folks that ARE seeing the issue have some factor above in common -- so we need to list them all so we can find it...

DarkFoss
11-15-2007, 01:20 PM
Alistair,
I'm using Mandriva 2008 x86, kernel 2.6.22.9-desktop-1mdv.
Mandriva's default drivers for my card were the 8.40 rpm.
Using ATI's generic installer I've used both the 8.41 and currently the 8.42 drivers. I had no choice because the script is broken in 2008 so I cannot build my own rpms.

I have in my xorg Option "UseInternalAGPGART" "no"

gcc -v

Thread model: posix
gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)

ld -v
GNU ld version 2.17.50.0.12 20070128

file /usr/lib/libGL.*
/usr/lib/libGL.so: symbolic link to `/usr/lib/xorg/libGL.so.1.2'
/usr/lib/libGL.so.1: symbolic link to `/usr/lib/xorg/libGL.so.1.2'
/usr/lib/libGL.so.1.2: symbolic link to `libGL.so.1'

X -version

X Window System Version 1.3.0
Release Date: 19 April 2007
X Protocol Version 11, Revision 0, Release 1.3
Build Operating System: Linux_2.6.12-12mdksmp Mandriva
Current Operating System: Linux Tardis-1 2.6.22.9-desktop-1mdv #1 SMP Thu Sep 27 04:07:04 CEST 2007 i686
Build Date: 01 October 2007
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present

Snake
11-16-2007, 03:52 AM
... At least to my untrained eyes it appears that memory is lost systematically with every buffer swap ...

Hey, good eyes :)

I simply did a quick test running glxgears under valgrind (via "$ valgrind --tool=memcheck --leak-check=yes glxgears"). After only 4 glxgears "iterations" (the poor FPS are because of running under valgrind, of course) valgrind gave up for more than 10000000 detected errors and told me to "Go fix your program!" :D
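For anyone wanting to repeat the run with the fuller options that valgrind itself suggests in its output (`--leak-check=full`, `--show-reachable=yes`, `--error-limit=no`), a minimal Python sketch of building that command line might look like this. The helper names and the log file name are made up for illustration, and `--log-file` is added only to keep the report out of the terminal.

```python
import subprocess

def valgrind_command(program, log_file="valgrind.log"):
    """Build a memcheck command line; log_file is a made-up default name."""
    return [
        "valgrind",
        "--tool=memcheck",
        "--leak-check=full",
        "--show-reachable=yes",
        "--error-limit=no",
        f"--log-file={log_file}",
        program,
    ]

def run_under_valgrind(program):
    """Run the program under memcheck and return valgrind's exit status."""
    return subprocess.run(valgrind_command(program), check=False).returncode

# Only print the command here; glxgears runs until its window is closed.
print(" ".join(valgrind_command("glxgears")))
```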

The result is posted in the following comment (due to size constraints here). I omitted the startup output, and all those nearly identical "Invalid write of size 1"-Reports but the last one.

[Edit] This is on Lenovo Z61m / 2 GiB RAM / ATI X1400 / Gentoo AMD64 / Kernel 2.6.22 / GLIBC 2.7 / X.Org xserver 1.3 / fglrx 8.42.3

Snake
11-16-2007, 03:54 AM
Valgrind report for comment #9 (http://www.phoronix.com/forums/showpost.php?p=18340&postcount=9)


<-- PREVIOUS OUTPUT CUT -->
==16057== Invalid write of size 1
==16057== at 0x4C2292F: memcpy (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==16057== by 0x6B51441: (within /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6B4FDE8: lnxioCmdBufSubmit(IODrvConnHandleTypeRec*, unsigned, unsigned, unsigned, unsigned, IOSubmitInfoOutRec&) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A1184F: ioCmdBufSubmit2(void*, IOSubmitInfoInRec const*, IOSubmitInfoOutRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x69D1CF5: (within /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x69D17E7: coraSubmitCommandBuffer(gsl::gsCtx*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x69D0588: gsl::gsCtx::Flush() (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x69D0865: gslFlush(gslCommandStreamRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A06C58: gscxFlush(gslCommandStreamRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x66AB74B: wpWindowSurface::copySwap(bool) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x66AC0B7: wpWindowSurface::copyToScreen(bool) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x66AC735: wpWindowSurface::swapBuffers() (in /usr/lib64/dri/fglrx_dri.so)
==16057== Address 0x2B49FD3ACB88 is not stack'd, malloc'd or (recently) free'd
176 frames in 5.0 seconds = 35.052 FPS
204 frames in 5.0 seconds = 40.628 FPS
206 frames in 5.0 seconds = 41.055 FPS
202 frames in 5.0 seconds = 40.244 FPS
==16057==
==16057== More than 10000000 total errors detected. I'm not reporting any more.
==16057== Final error counts will be inaccurate. Go fix your program!
==16057== Rerun with --error-limit=no to disable this cutoff. Note
==16057== that errors may occur in your program without prior warning from
==16057== Valgrind, because errors are no longer being displayed.
==16057==
==16057== Warning: set address range perms: large range 134217728 (noaccess)
==16057==
==16057== ERROR SUMMARY: 10000000 errors from 51 contexts (suppressed: 18 from 1)
==16057== malloc/free: in use at exit: 20,084,991 bytes in 3,922 blocks.
==16057== malloc/free: 44,360 allocs, 40,439 frees, 56,168,415 bytes allocated.
==16057== For counts of detected errors, rerun with: -v
==16057== searching for pointers to 3,922 not-freed blocks.
==16057== checked 13,187,712 bytes.
==16057==
==16057==
==16057== 0 bytes in 61 blocks are definitely lost in loss record 1 of 65
==16057== at 0x4C20C6B: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==16057== by 0x6AEEC08: operator new[](unsigned long) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6255840: gllCL::gllclProgramImpl::DecodeLoopConstants(gllCL::Section const&, char const*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6255D73: (within /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6254CE8: gllCL::gllclProgramImpl::clExtractElfBinary() (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6254378: gllCL::gllclProgramImpl::ExtractUsageInfo() (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6252B9C: gllCL::scltogllclUsageInfo(sclProgram*, gllCL::gllclProgramImpl*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6253408: gllCL::clCompile(glclStateHandleTypeRec*, gllclCompileParameters const&, gllShaderLanguageEnum, unsigned long, void const*, int, _sourceStrings*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6252703: mbclCompile(glclStateHandleTypeRec*, gllclCompileParameters const&, gllShaderLanguageEnum, unsigned long, void const*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x652934D: gllMB::SurfaceFill::loadProgram(gslProgramTargetEnum, gslProgramObjectRec*&, gslMemObjectRec*&, int*&, unsigned, char const*, gllclCompileParameters const&) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x65A6F6E: gllMB::SurfaceCopy::buildFragmentProgram(gllMB::SurfaceCopyOperation, unsigned) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x65AA782: gllMB::SurfaceCopy::setupFragmentState(gllMB::SurfaceCopyOperation, unsigned) (in /usr/lib64/dri/fglrx_dri.so)
==16057==
==16057==
==16057== 664 (16 direct, 648 indirect) bytes in 1 blocks are definitely lost in loss record 36 of 65
==16057== at 0x4C20C6B: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==16057== by 0x300BA263C2: __FireGLDRIGetVisualConfigPrivates (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA281D7: (within /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA2787A: (within /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA26634: __glXInitialize (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA22FD4: glXChooseVisual (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x402AD3: (within /usr/bin/glxgears)
==16057== by 0x53421F3: (below main) (in /lib64/libc-2.7.so)
==16057==
==16057==
==16057== 1,616 bytes in 32 blocks are possibly lost in loss record 44 of 65
==16057== at 0x4C20C6B: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==16057== by 0x6AEEC08: operator new[](unsigned long) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A011A3: gsl::FetchProgramObject::SWPathStuff::construct(gsl::gsInput2ResourceTable const&) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A01686: gsl::FetchProgramObject::pack(gsl::gsCtx*, AtiElfBinary, void*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x69FB587: gslProgramString(gslCommandStreamRec*, gslProgramObjectRec*, gslProgramTargetEnum, gslProgramFormatEnum, unsigned, void const*, void*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A075B5: gsomProgramStringARB(gslCommandStreamRec*, gslProgramObjectRec*, gslProgramTargetEnum, gslProgramFormatEnum, unsigned, void const*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6529425: gllMB::SurfaceFill::loadFetchProgram(gslProgramObjectRec*&, unsigned, sclFetchShaderInstr*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6529A47: gllMB::SurfaceLoad::init(glmbStateHandleTypeRec*, gslRenderStateRec*, glclStateHandleTypeRec*, gllMB::SurfaceCopy*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6512F52: cxmbInitCtxState(glAdaptorHandleTypeRec*, glmbStateHandleTypeRec*, glshStateHandleTypeRec*, glclStateHandleTypeRec*, glcxStateHandleTypeRec*, glepStateHandleTypeRec*, gldbStateHandleTypeRec*, glsvStateHandleTypeRec*, gsCtxInfoRec*, _bool32, unsigned) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6231C09: glcxCreateContext(glAdaptorHandleTypeRec*, cmNativeContextHandleRec*, glConfigInfoRec const*, _bool32) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x669F195: wsiContext::wsiContext(glAdaptorHandleTypeRec*, cmNativeContextHandleRec*, RefPtr<wsiConfig>) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6695DAF: wsiDisplay::createContext(cmNativeContextHandleRec*, wsiConfigHandle) (in /usr/lib64/dri/fglrx_dri.so)
==16057==
==16057==
==16057== 1,728 (216 direct, 1,512 indirect) bytes in 1 blocks are definitely lost in loss record 45 of 65
==16057== at 0x4C20C6B: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==16057== by 0x300BA217F9: _gl_context_modes_create (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x6AF12E8: __driCreateNewScreen_20050727 (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x300BA2828F: (within /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA2787A: (within /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA26634: __glXInitialize (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x300BA22FD4: glXChooseVisual (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x402AD3: (within /usr/bin/glxgears)
==16057== by 0x53421F3: (below main) (in /lib64/libc-2.7.so)
==16057==
==16057==
==16057== 987,648 bytes in 1,286 blocks are definitely lost in loss record 63 of 65
==16057== at 0x4C1FD7C: calloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==16057== by 0x300BA48227: XF86DRIGetDeviceInfo (in /usr/lib64/opengl/ati/lib/libGL.so.1.2)
==16057== by 0x6AF2257: DRIGetScreenSize (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6AF5BA8: driGetScreenSize(cmNativeDisplayHandleRec*, unsigned*, unsigned*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6B5061F: lnxioGetWindowInfo(IODrvConnHandleTypeRec*, cmWindowInfoRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A117DD: ioGetWindowInfo(void*, cmWindowInfoRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x69D0B51: gslGetWindowInfo(gslCommandStreamRec*, cmWindowInfoRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x6A06CC8: gscxGetWindowInfo(gslCommandStreamRec*, cmWindowInfoRec*) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x66C39A0: wpWindowSystem::resizeIfNeeded(bool) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x66B2486: cxwpClear(glwpStateHandleTypeRec*, _bool32) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x621E5DB: epcxClear(glcxStateHandleTypeRec*, unsigned) (in /usr/lib64/dri/fglrx_dri.so)
==16057== by 0x63F1F81: gllEP::ep_tls_Clear(unsigned) (in /usr/lib64/dri/fglrx_dri.so)
==16057==
==16057== LEAK SUMMARY:
==16057== definitely lost: 987,880 bytes in 1,349 blocks.
==16057== indirectly lost: 2,160 bytes in 8 blocks.
==16057== possibly lost: 1,616 bytes in 32 blocks.
==16057== still reachable: 19,093,335 bytes in 2,533 blocks.
==16057== suppressed: 0 bytes in 0 blocks.
==16057== Reachable blocks (those to which a pointer was found) are not shown.
==16057== To see them, rerun with: --leak-check=full --show-reachable=yes
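The largest loss record above lends itself to a quick sanity check. The sketch below (Python, with a hypothetical helper `bytes_per_block`) divides the bytes of a "definitely lost" or "possibly lost" record by its block count. For the XF86DRIGetDeviceInfo record this comes out at exactly 768 bytes per block, which would fit a fixed-size allocation being repeated, although the report alone does not show how many blocks are lost per frame.

```python
import re

# Matches e.g. "987,648 bytes in 1,286 blocks are definitely lost" and also
# records with a "(16 direct, 648 indirect)" breakdown before "bytes".
LOSS_RECORD = re.compile(
    r"([\d,]+) (?:\([^)]*\) )?bytes in ([\d,]+) blocks are (definitely|possibly) lost"
)

def bytes_per_block(line):
    """Return (kind, total_bytes, blocks, bytes_per_block), or None if no match."""
    match = LOSS_RECORD.search(line)
    if match is None:
        return None
    total = int(match.group(1).replace(",", ""))
    blocks = int(match.group(2).replace(",", ""))
    return match.group(3), total, blocks, total / blocks

record = "==16057== 987,648 bytes in 1,286 blocks are definitely lost in loss record 63 of 65"
print(bytes_per_block(record))
```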

Malikith
11-16-2007, 04:11 AM
Hey, good eyes :)

I simply did a quick test running glxgears under valgrind (via "$ valgrind --tool=memcheck --leak-check=yes glxgears"). After only 4 glxgears "iterations" (the poor FPS are because of running under valgrind, of course) valgrind gave up for more than 10000000 detected errors and told me to "Go fix your program!" :D

The result is posted in the following comment (due to size constraints here). I omitted the startup output, and all those nearly identical "Invalid write of size 1"-Reports but the last one.

[Edit] This is on Lenovo Z61m / 2 GiB RAM / ATI X1400 / Gentoo AMD64 / Kernel 2.6.22 / GLIBC 2.7 / X.Org xserver 1.3 / fglrx 8.42.3

Suddenly, my hope in 8.43 just dropped like a rock.

Snake
11-16-2007, 04:40 AM
Suddenly, my hope in 8.43 just dropped like a rock.

Don't get me wrong, I did not post this result to make anyone look bad, nor to demonstrate how "horrible" 8.42 is and that's about time to give up hope :rolleyes:

It may well be the result of a single forgotten "free()" call, that simply gets multiplied with every frame (and as FPS drastically improved, so did the effect :)).

We're on linux, where a bug is no "embarrassing taboo", but something that happens all the time (me being the exception, of course ;)). This is our operating system, so we are also responsible to improve it by helping track down and fix those bugs. Perhaps AMD/ATI may even integrate a quick valgrind run into their QA procedures in future.

Let's help the devs, not bash them! It's a win-win, after all.

Malikith
11-16-2007, 05:38 AM
Don't get me wrong, I did not post this result to make anyone look bad, nor to demonstrate how "horrible" 8.42 is and that's about time to give up hope :rolleyes:

It may well be the result of a single forgotten "free()" call, that simply gets multiplied with every frame (and as FPS drastically improved, so did the effect :)).

We're on linux, where a bug is no "embarrassing taboo", but something that happens all the time (me being the exception, of course ;)). This is our operating system, so we are also responsible to improve it by helping track down and fix those bugs. Perhaps AMD/ATI may even integrate a quick valgrind run into their QA procedures in future.

Let's help the devs, not bash them! It's a win-win, after all.

Yeah I know, I never said that this doesn't happen. But it's just one of many many problems with the driver. Hopefully that memory leak is not too difficult to fix. I guess I just expect a little more from a company that, well, should have started working harder on Linux a long time ago.

I can't even begin on how many bugs there are. I've done my share of bug reporting but it seems the list grows faster and faster with every driver release hehe.

oyvind
11-16-2007, 07:43 AM
<snip>

We're on linux, where a bug is no "embarrassing taboo", but something that happens all the time (me being the exception, of course ;)). This is our operating system, so we are also responsible to improve it by helping track down and fix those bugs. Perhaps AMD/ATI may even integrate a quick valgrind run into their QA procedures in future.

Let's help the devs, not bash them! It's a win-win, after all.

I agree with you on this. And I'd certainly be willing to help in any way possible, but the one-way communication eventually gets really old.

I've reported a few bugs on http://ati.cchtml.com/, and find that I'm actually only talking to myself. I've always stated that I'm willing to help in further debugging, in the reports. There is no response, and all the bugs are still marked NEW, even though I've reported them a long time ago. So I stopped going there, because it feels utterly pointless. I just can't take that bugzilla seriously anymore.

I've also reported most of the stuff to AMD's customer feedback on the driver. But I have no idea if my reports are actually being looked at.

Then there's the Phoronix forums, but I don't know if any AMD devs are present and looking around.

Snake
11-16-2007, 09:10 AM
I agree with you on this. And I'd certainly be willing to help in any way possible, but the one-way communication eventually gets really old.

I've reported a few bugs on http://ati.cchtml.com/, and find that I'm actually only talking to myself. <--snip-->

Same for me :mad:

The sad thing is that on this forum alone there are already lots of people that put their energy and knowledge into testing, trying to figure and report what exactly is going wrong and helping others. But without a "living" bug tracker all those efforts remain scattered pieces floating around, being discussed over and over again in variations without anyone being able to really catch all available info for the bug at hand.

I wonder if AMD/ATI could be convinced to actively use http://ati.cchtml.com/ (possibly even indirectly via a "trusted" mediator). They would gain a lot from detailed and focused bug reports, and for us there would be a central place for documenting issues and known workarounds. But this requires that no one gets the feeling of throwing everything into a "black hole of no return" :)

I'm not sure whether this "AMD/ATI public bug tracker"-thing was already "officially" discussed here before. But even if it was, AMD/ATI's attitude towards the community recently changed quite a bit, so perhaps it should be done now, anyway.

Michael, are you listening? Any ideas on how we could improve this situation?

ambro814
11-16-2007, 10:06 AM
I've run glxgears and fgl_fglxgears, but I haven't noticed any memory leaks. I haven't done any extensive checks, but I've been running it for a few minutes and I haven't noticed any change in memory usage in ksysguard.
Thinkpad T60p, FireGL V5200

Alistair
11-16-2007, 12:39 PM
I'm looking at what's happening on my system and I'm noting that the worst memory leak noted by valgrind is actually in libGL ... and apparently it occurs in both the libGL built by X and in libGL built by the ati install, but *only* when using the fglrx driver.

Overall at this moment I'm stumped as to where to go next. I've dumped my ati-drivers, and xorg installation and rebuilt from the ground up once already to see if I could eliminate the issue -- but it was NOT successful -- if anything I've ended up with a faster memory leak.

yoshi314
11-16-2007, 01:18 PM
8.42 - "brown paperbag edition" :D (maybe we should buy some for fglrx devs - they're going to need them badly ;-) )

man, that's one hell of a leak. how did it get unnoticed?

Alistair
11-16-2007, 07:32 PM
Now for the killer laugh of the week.

I ***have not touched my configurations one iota since the previous tests ........


What I DID do was power down the box and put my good old trusty 9550Pro 256M card back in the box, boot, start up, and run both glxgears and fgl_fglxgears



Either there is *no* memory leak with this card, or the rate at which it leaks memory is ****substantially*** lower.....

Svartalf
11-16-2007, 08:37 PM
8.42 - "brown paperbag edition" :D (maybe we should buy some for fglrx devs - they're going to need them badly ;-) )

man, that's one hell of a leak. how did it get unnoticed?

Because they don't Valgrind or Oprofile things and the QA people probably didn't test against the cards that seem to have the serious leak issue.

Svartalf
11-16-2007, 08:39 PM
Now for the killer laugh of the week.

I ***have not touched my configurations one iota since the previous tests ........


What I DID do was power down the box and put my good old trusty 9550Pro 256M card back in the box, boot, start up, and run both glxgears and fgl_fglxgears



Either there is *no* memory leak with this card, or the rate at which it leaks memory is ****substantially*** lower.....

Different chipsets, different code pathways within the driver. It's entirely conceivable that this really IS the case here.

Alistair
11-16-2007, 09:02 PM
Different chipsets, different code pathways within the driver. It's entirely conceivable that this really IS the case here.

Certainly I figure that's the case. Now to build one.....
(goes back to collecting the evidence)

TheIcebreaker
11-16-2007, 09:45 PM
I ran fgl_glxgears for 2 hours straight, and the memory utilization reported by the proc filesystem stayed constant at about 67 MB for the whole period.


X200 chipset (intel) FC4 Xorg 6.8.2

Svartalf
11-16-2007, 11:22 PM
X200 chipset (intel) FC4 Xorg 6.8.2

You've pegged why. That's an R300 derivative chip. It works differently than any of the R400, R500, and R600 chips. It's using/stressing different code paths in the primitive ops layer when you're doing things.

Svartalf
11-16-2007, 11:31 PM
We're on linux, where a bug is no "embarrassing taboo", but something that happens all the time (me being the exception, of course ;)).

Heh... In this case, unfortunately, this is more of an embarrassing error of not doing proper QA on the part of AMD. Never mistake that I think that they are not some of the brightest developers and coders there are in the OpenGL space (they are some of the best...but they're very, very few...and I suspect that they're more Windows developers than Linux ones and haven't a clue about things like Valgrind, Oprofile, or even VTune on Linux. (Emphasis added to give 'em a hint as to what to go use- they DO read this forum, make NO mistakes on that score... :D))

I blame their employer, formerly known as ATI for not even remotely taking this seriously enough when they were ATI. I blame their employer, now known as AMD, for not taking this seriously enough when they took the company over. I can only hope they realize that what we're being handed would pretty much nuke them from orbit in the Windows world and it's about to do that in what might be one of their only future markets.

happycampers
11-16-2007, 11:32 PM
A few interesting (to me, at least) commonalities based on my own informal testing and the posts of other members:

The memory leak seems to be directly proportional to frame rates; running glxgears, I observed that increasing the frame rate (by shrinking or hiding the window) proportionally increased the memory consumption, while decreasing frame rate (by expanding the window or moving it around on the screen) decreases both frame rate and the rate of memory loss.

The size and complexity of the frame being drawn seems to have no impact on the rate of memory loss (other than slowing it down by reducing frame rate); fgl_glxgears runs slower and loses memory slower than glxgears; running Doom3 (on my slow X1400M, at least), the leak is almost not noticeable.

The trend of posts I have read seems to indicate the older or lower-end cards do not suffer from the leak or that the leak is so much reduced as to be unnoticeable.

This seems to indicate that the leak is tied to code that runs a relatively fixed number of times per frame swap / redraw. It also makes me wonder if the leak is tied to a portion of the driver code used only by cards supporting a more recent / advanced feature. The fact that the biggest leak seems to originate from somewhere in XF86DRIGetDeviceInfo (according to valgrind) might bear this out...

Ironically, it seems that the more advanced / expensive the card, the faster the frame rate and (consequently) the faster the memory leak.
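If the loss really is a fixed amount per buffer swap, the relationship can be sketched as leak rate = bytes per frame x frames per second. The numbers below are assumptions for illustration only: 768 bytes per frame is borrowed from the largest loss record in Snake's valgrind report (987,648 bytes in 1,286 blocks) on the guess that one such block is lost per frame, 60 FPS stands in for a vsync-capped rate, and the full 2 GiB is taken as available.

```python
def leak_rate_bytes_per_s(bytes_per_frame, fps):
    """Leak rate under the assumption of a fixed loss per buffer swap."""
    return bytes_per_frame * fps

def seconds_until_exhausted(free_bytes, bytes_per_frame, fps):
    """Time until free_bytes are gone at a constant frame rate."""
    return free_bytes / leak_rate_bytes_per_s(bytes_per_frame, fps)

BYTES_PER_FRAME = 768      # assumption: one 768-byte block lost per frame
FREE_BYTES = 2 * 1024**3   # assumption: all of a 2 GiB machine is available

for fps in (60, 1000, 4000):  # 60 stands in for a vsync-capped rate
    hours = seconds_until_exhausted(FREE_BYTES, BYTES_PER_FRAME, fps) / 3600
    print(f"{fps:5d} FPS: {hours:6.2f} h")
```

Under these assumptions, capping the frame rate does not remove the leak but stretches the time to exhaustion in direct proportion, which is the point of the vsync suggestion earlier in the thread.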

DarkFoss
11-17-2007, 01:03 AM
Here's the leak summary of my valgrind run. The definitely lost line is much smaller than Snake's.
Guess my x800pro/420 chipset isn't showing much of a hit.
I wonder if I ran x86_64 if it would be more pronounced.

==16992== LEAK SUMMARY:
==16992== definitely lost: 216 bytes in 63 blocks.
==16992== indirectly lost: 2,104 bytes in 8 blocks.
==16992== possibly lost: 1,488 bytes in 32 blocks.
==16992== still reachable: 18,362,334 bytes in 2,930 blocks.
==16992== suppressed: 0 bytes in 0 blocks.
==16992== Reachable blocks (those to which a pointer was found) are not shown.
==16992== To see them, rerun with: --leak-check=full --show-reachable=yes

korpenkraxar
11-17-2007, 12:54 PM
glxgears memory usage as reported by 'ps' on my Thinkpad Z61m with an ATI X1400 Mobility:

Time (s) --- RAM: VSZ / RSS (KB)
=====================================

0 --- 21744 / 9952
5 --- 53368 / 43896
10 --- 67496 / 58884
15 --- 81752 / 73188
20 --- 95744 / 87248
25 --- 109736 / 101232
30 --- 123728 / 115240
35 --- 137720 / 129240
40 --- 151712 / 143256
45 --- 165704 / 157248
50 --- 179696 / 171252
55 --- 193688 / 185212
60 --- 207812 / 199408

Hmm... so after the fast bump in memory usage during the initial 5 seconds, glxgears grabs about 14 megs of memory per 5 seconds. After one minute, glxgears has grabbed 200 megs. That is the single most disastrous memory leak I have ever seen. Nice.

As someone mentioned before, this is presumably a single simple bug being iterated over and over. It shouldn't be too difficult for ATI/AMD to fix this.
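The steady growth in the table can be turned into a rate with a least-squares fit. This is only a sketch: the function name `slope_kb_per_s` is made up, and the sample at 0 s is left out because it includes the startup jump.

```python
def slope_kb_per_s(samples):
    """Least-squares slope of RSS (KB) against time (s)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_rss = sum(rss for _, rss in samples) / n
    covariance = sum((t - mean_t) * (rss - mean_rss) for t, rss in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    return covariance / variance

# (time in s, RSS in KB) from the table above, skipping the startup sample at 0 s.
samples = [(5, 43896), (10, 58884), (15, 73188), (20, 87248),
           (25, 101232), (30, 115240), (35, 129240), (40, 143256),
           (45, 157248), (50, 171252), (55, 185212), (60, 199408)]
rate = slope_kb_per_s(samples)
print(f"{rate:.0f} KB/s, about {rate * 5 / 1024:.1f} MB per 5 s")
```

A fit like this is less sensitive to a single odd sample than comparing two neighbouring rows by hand.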

yoshi314
11-17-2007, 01:28 PM
Because they don't Valgrind or Oprofile things and the QA people probably didn't test against the cards that seem to have the serious leak issue.

at times like this i usually think to myself "what the hell is that betatester program for?"

this really should have come out in beta tests. unless we're expected to be betatesters now.

on the other hand i guess that's what the beta warning in release notes page is for.

korpenkraxar
11-17-2007, 03:50 PM
at times like this i usually think to myself "what the hell is that betatester program for?"

this really should have come out in beta tests. unless we're expected to be betatesters now.

Agreed. I could even put up with us users being beta-testers if only I had the feeling our discoveries and reports made some sort of difference.

Osado
11-20-2007, 09:46 AM
The same issue for me...:(

Testing Mandriva packages. Started ksysguard and then glxgears. Memory use increased and, about 1 minute later, the extra memory used was about 250 MB and still growing. Closing glxgears freed the memory instantly.

ATI Mobile X1400 on a Dell inspiron 6400.

Regards
Osado

Extreme Coder
11-20-2007, 11:39 AM
I can't see the issue here on my ATI Radeon X1200, using the Gnome System Monitor.

I run fgl_glxgears, and it keeps increasing, and then stops at 55.6 MB.

Svartalf
11-20-2007, 02:49 PM
I can't see the issue here on my ATI Radeon X1200, using the Gnome System Monitor.

I run fgl_glxgears, and it keeps increasing, and then stops at 55.6 MB.

I think it's an R500 part problem, from the overall impressions I've been gathering in this thread- the X1400 is a mobile version of the RV515. You, in spite of the X1200 moniker, have an R400 derivative which operates slightly differently than the rest of the X1xxx series parts do. I really, really detest their insistence on munging up the product space with things like the X1200 being actually an R400 part and so forth. It makes for fun trying to sort out these problems and it doesn't really help with their marketing efforts because the bulk of the people are buying based on advice from people like us, who figure out real damn quick what a card is and isn't.

I'm about to set up a machine with a few differing distributions on it- and an X800 pro and an X1300 card. I'll see if it's distribution specific, device class specific, or can't reproduce with the cards I've got.

korpenkraxar
11-20-2007, 06:12 PM
I'm about to set up a machine with a few differing distributions on it- and an X800 pro and an X1300 card. I'll see if it's distribution specific, device class specific, or can't reproduce with the cards I've got.

That is interesting. I've been thinking along the lines of even trying different screen resolutions and sizes of the glxgears window, just for the heck of it. Can't be worse than ATi's bug-testing procedures anyway.

goffrie
11-20-2007, 08:41 PM
Happens for me, too - it locks X up (*Lock keys, Ctrl-Alt-F1, Ctrl-Alt-Backspace, etc. don't work), but the Magic SysRq key (and, presumably, SSH) works. fgl_glxgears locked up my system in under 10 seconds, after I noticed that it had taken up a few hundred MB.

x86_64, kamikaze-sources-2.6.22-r9 on Gentoo. I have an X1900GT.

korpenkraxar
12-07-2007, 10:58 AM
Well, I finally got around to playing with the size of the glxgears window to see if it had an effect on the rate of memory leak.

I have an ATi X1400 running on a Z61m Thinkpad. I use Frugalware, which still uses 8.42.3 (but as far as I've understood, the memory leak was not fixed in 7.11 anyway).

The native resolution for the laptop screen is 1680x1050 and I toggled the glxgears window between its default size of 300x300, some intermediary resolutions and 1680x1050. I expected it either to eat memory even quicker than before when I upped the window size, or alternatively to leak at the same rate if the glxgears load remained constant and only resulted in fewer fps. As you all know, maximizing the size of that window results in a lower frames per second count since more pixels need to be computed by the graphics card. :rolleyes:

This is what I got (please note I just simply monitored output of ps aux every 5 seconds for a minute [after I discarded the 5 "burnin" seconds] so figures are approximate):

Window size --- FPS --- Leaked RSS

300x300 --- 3740 --- ~ 14MB/5s
600x600 --- 1310 --- ~ 5MB/5s
800x800 --- 760 --- ~ 3MB/5s
1000x1000 --- 500 --- ~ 2MB/5s
1680x1050 --- 300 --- ~ 1MB/5s
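The measurements above were taken by hand from ps output. As a rough sketch of how the same sampling could be automated, the Python below polls ps for one process every few seconds. The function names are hypothetical, and it assumes a ps that accepts the POSIX-style "-o vsz= -o rss= -p PID" options and reports both values in KB.

```python
import subprocess
import time


def parse_ps_line(line):
    """Parse 'VSZ RSS' (both in KB) from one line of ps output."""
    vsz, rss = line.split()
    return int(vsz), int(rss)


def read_memory(pid):
    """Ask ps for the VSZ and RSS of one process, in KB."""
    out = subprocess.run(
        # Separate -o options with empty headers suppress the header line.
        ["ps", "-o", "vsz=", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps_line(out)


def monitor(pid, interval=5, count=13, read=read_memory, sleep=time.sleep):
    """Collect (elapsed seconds, VSZ KB, RSS KB) rows for one process."""
    rows = []
    for i in range(count):
        if i:
            sleep(interval)  # wait between samples, but not before the first
        vsz, rss = read(pid)
        rows.append((i * interval, vsz, rss))
    return rows
```

Calling monitor(pid) with the PID of a running glxgears would return 13 rows covering one minute, which can then be printed in the same "Time --- VSZ / RSS" layout used earlier in the thread.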

Well... this shows two important things:

1) I do not know much about graphics driver development and infrastructure since my expectations were completely wrong. :)

2) The bug is indeed related to FPS turnover rate (which in turn of course is inversely related to window size). The bug appears to cause a small memory leak when each frame is updated. Actually, this is easily demonstrated if we look at the FPS count and memory leak over 5 seconds (more detailed figures):

Leak / Frame

300x300: 14289KB / ( 3740FPS * 5s ) = 0.76KB/frame
600x600: 4989KB / ( 1308FPS * 5s ) = 0.76KB/frame
800x800: 2890KB / ( 758FPS * 5s ) = 0.76KB/frame
1000x1000: 1940KB / ( 499FPS * 5s ) = 0.78KB/frame
1680x1050: 1129KB / ( 296FPS * 5s ) = 0.76KB/frame

The memory leak caused when each frame is rendered is not dependent on frame size but constant.
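The per-frame figures can be reproduced with a few lines of Python. This is only a sketch: the numbers are the ones listed in the post, and the names are invented for the example.

```python
# (window size, KB leaked over 5 s, measured FPS) as listed in the post
measurements = [
    ("300x300", 14289, 3740),
    ("600x600", 4989, 1308),
    ("800x800", 2890, 758),
    ("1000x1000", 1940, 499),
    ("1680x1050", 1129, 296),
]
INTERVAL_S = 5  # length of each sampling interval in seconds


def leak_per_frame(leaked_kb, fps, seconds=INTERVAL_S):
    """KB leaked per rendered frame: total leak divided by frames drawn."""
    return leaked_kb / (fps * seconds)


for size, kb, fps in measurements:
    print(f"{size:>9}: {leak_per_frame(kb, fps):.2f} KB/frame")
```

If the leak scaled with the number of pixels drawn, the per-frame value would rise with window size; a near-constant value across all five sizes is what points to a fixed cost per frame.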

If we assume that this particular leak would occur in the same way when a frame is rendered in any OpenGL app (that is, we visit the bug the same number of times per frame as we do in glxgears) and I have 1GB of RAM available for Doom3 after it has already started up and an average frame rate of 20 fps during gameplay, I would then be able to play the game:

1048576KB / (0.76KB/frame * 20fps) =
19 hours and 10 minutes

before starting to worry about a crash. Or even longer if I crank up the resolution a bit :p
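The same estimate as a small Python sketch, under the assumptions stated above (1GB free, 0.76KB leaked per frame, a steady 20 fps). The function name is made up for illustration.

```python
def seconds_until_exhausted(free_kb, leak_kb_per_frame, fps):
    """Seconds until the free memory is used up at a constant leak rate."""
    return free_kb / (leak_kb_per_frame * fps)


secs = seconds_until_exhausted(free_kb=1048576, leak_kb_per_frame=0.76, fps=20)
hours, minutes = divmod(round(secs / 60), 60)
print(f"{hours} hours and {minutes} minutes")
```

Since the time is inversely proportional to the frame rate, halving the fps doubles the playing time, which is the point of the joke about cranking up the resolution.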

Oh well.

Happy holidays everyone and take care :)

=======
UPDATE:
=======

Seriously shouldn't it be possible to actually use glxgears to pinpoint where and when the memory leak occurs using valgrind or some other tool so that we can file a real bug report to ATi?

happycampers
12-07-2007, 11:28 AM
Well... this shows two important things:

1) I do not know much about graphics driver development and infrastructure since my expectations were completely wrong. :)



Not many of us do...


Seriously shouldn't it be possible to actually use glxgears to pinpoint where and when the memory leak occurs using valgrind or some other tool so that we can file a real bug report to ATi?

A few people have done this and it appears that the culprit is a leak in the implementation of glClear(), which would make sense, in terms of the leak rate being directly proportional to framerate.

korpenkraxar
12-07-2007, 11:38 AM
A few people have done this and it appears that the culprit is a leak in the implementation of glClear(), which would make sense, in terms of the leak rate being directly proportional to framerate.

Yup. Some quick googling returns this: http://ohioloco.ubuntuforums.org/showthread.php?t=588383&page=23