How to find what's going on?
From time to time I get lockups. The screen freezes and I can't do anything. Magic sysrq does nothing. However, everything else keeps working, and I can ssh into the machine. CPU usage doesn't rev up and I see nothing in /var/log/Xorg.log or dmesg. Trying to restart the X server doesn't work. Or rather, it tells me that it restarts but the screen keeps displaying the same thing.
This happens relatively rarely, but I would like to know whether there's something I can do to i) fix it without rebooting; ii) find out something useful that could make for a bug report.
If you can ssh into the machine, it is not a hard lockup but most likely a lockup of the graphics stack. SysRq should work; if it does not, it may be because you are on a laptop whose keyboard handling leaves the SysRq key mismapped or inaccessible (e.g. the key requires an "Fn" modifier, which causes trouble when Alt and the command key are pressed at the same time).
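One more thing that can be ruled out over ssh, as a rough sketch: the kernel can restrict which SysRq functions it honours through the value in /proc/sys/kernel/sysrq. The function and constant names below are made up for illustration; the bit values are the ones listed in the kernel's sysrq documentation.

```python
from pathlib import Path

# Bit values as listed in the kernel's sysrq documentation.
SYSRQ_KEYBOARD_CONTROL = 4    # SAK and unraw (Alt+SysRq+R)
SYSRQ_SIGNAL_PROCESSES = 64   # term, kill, oom-kill
SYSRQ_REBOOT = 128            # reboot / poweroff


def sysrq_allows(mask: int, function_bit: int) -> bool:
    """Return True if the given sysrq setting permits the function.

    0 disables SysRq entirely, 1 enables every function, and any
    larger value is a bitmask of the allowed functions.
    """
    if mask == 1:
        return True
    return bool(mask & function_bit)


def read_sysrq_mask(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting from procfs (Linux only)."""
    return int(Path(path).read_text().strip())
```

Calling sysrq_allows(read_sysrq_mask(), SYSRQ_KEYBOARD_CONTROL) on the affected machine would tell you whether Alt+SysRq+R is permitted at all, independently of any keyboard mapping problem.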
If the machine is not completely locked up, you can try the following:
1) Press Ctrl+Alt+F1 to switch to a text console (this probably will not work) and stop/restart the graphics server from there (e.g. on a Kubuntu machine, something like sudo service kdm restart).
2) Press the power key _briefly_. After some timeout (e.g. 20-30 s) this could make your machine shut down cleanly.
3) SSH into the machine and stop/restart the graphics server from there.
In any case, you should probably report the freeze to the maintainers of your graphics driver, provided you can collect all the relevant information.
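As a starting point for collecting that information, here is a minimal sketch that pulls the interesting lines out of a saved X server log and a saved dmesg dump. The X server marks its error lines with "(EE)"; the kernel keyword list is only a guess on my part and should be adjusted for the driver in use. The function names are hypothetical.

```python
import re
from typing import List

# "(EE)" at the start of a line, optionally after a "[  12.345]" timestamp.
# This skips the legend line near the top of the log that merely mentions (EE).
XORG_ERROR = re.compile(r"^\s*(?:\[\s*[\d.]+\]\s*)?\(EE\)")

# Assumed keywords, not an authoritative list.
KERNEL_KEYWORDS = re.compile(r"lockup|gpu reset|drm", re.IGNORECASE)


def xorg_error_lines(log_text: str) -> List[str]:
    """Return the lines the X server marked as errors."""
    return [line for line in log_text.splitlines() if XORG_ERROR.search(line)]


def kernel_suspect_lines(dmesg_text: str) -> List[str]:
    """Return kernel log lines that mention one of the assumed keywords."""
    return [line for line in dmesg_text.splitlines()
            if KERNEL_KEYWORDS.search(line)]
```

Feed it the contents of the X log and the output of dmesg saved over ssh while the screen is still frozen, and attach both the filtered lines and the full files to the report.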
Can you simply kill X, or do you need a kill -9 to end the task?
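The difference can be sketched like this: send SIGTERM (what a plain kill sends), wait for a grace period, and only then fall back to SIGKILL (kill -9). The function names and the 5-second grace period are arbitrary choices for illustration.

```python
import os
import signal
import time


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID is still present."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stop_process(pid: int, grace_seconds: float = 5.0) -> str:
    """Try SIGTERM first; escalate to SIGKILL after the grace period.

    Returns the name of the last signal that was sent.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return "SIGTERM"
        time.sleep(0.1)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"  # it exited just after the last check
    return "SIGKILL"
```

Which signal was needed is worth noting in a bug report. Keep in mind that a process stuck in uninterruptible sleep inside the kernel will not go away even after SIGKILL.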
If X hangs somewhere, it may be helpful to attach a debugger (e.g. gdb) and get a backtrace.
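A minimal sketch of what attaching could look like, assuming the server process is called Xorg or X and that gdb is installed. The helper names are hypothetical; the gdb options used (-p to attach to a PID, -batch to exit when done, -ex to run a command) are standard ones.

```python
import os
from typing import List, Optional


def find_pid_by_name(names=("Xorg", "X"),
                     proc_root: str = "/proc") -> Optional[int]:
    """Return the PID of a process whose command name matches, if any."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() in names:
                    return int(entry)
        except OSError:
            continue  # process exited while we were scanning
    return None


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb command that attaches, dumps all backtraces, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]
```

Run the resulting command as root from the ssh session, for example with subprocess.run(gdb_backtrace_command(pid)), rather than from a terminal inside the X session, since gdb stops the server while it is attached. The backtrace is only readable if debug symbols for the X server and the driver are installed.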
Thanks a lot for the answers.
I now realize that I didn't explain the situation accurately. Thinking about it, I didn't try to restart X, only the display manager. According to the terminal output, kdm did stop and restart as requested, but the screen remained frozen as before. Next time I'll make sure I try to kill X and see what happens.
Also, when I mentioned that sysrq didn't work, I meant that I couldn't switch to a virtual terminal after trying to go to raw keyboard input (I think that's the name for alt+sysrq+r). In any case, from the remote machine I could read in the syslog that the keyboard had actually passed to default configuration (or something similar). So the sysrq commands are accepted; it's just that they don't do what I (perhaps naively) expected.
What I found most interesting is the last pointer about "attach a debugger". I didn't know you could do this, for I always used to run gdb first and start an application to trigger a bug. Now I know. Will report whenever this happens again. I estimate 3-7 days : )
Sounds like a GPU lockup to me. AFAIK the only way to get back to normal is to do a GPU reset. I don't think you can do that by restarting X or launching any other command through ssh. So probably the only option is to reboot.
There's some automatic GPU lockup detection code in the drm that tries to detect a GPU lockup and reset the GPU by itself. But it doesn't always work, and it is not implemented on Evergreen.
It would be nice if the drm registered a sysrq handler to do a GPU reset, so you could trigger one yourself from the keyboard. Because right now a GPU lockup is really a PITA on Evergreen.
Don't know if it's a similar issue or not, but I sometimes experience lockups on r300g (rs480 chip). They used to occur more frequently on r300c, and r300g does improve the situation. I haven't tried ssh-ing in, but the SysRq trick doesn't work and, as monraaf suggested, only rebooting works (hence I suspect a GPU lockup as well).
Also, as I'm quite new, could anyone explain how to attach a debugger and get the backtrace?
I found this in the Xorg wiki and also something else in the Ubuntu one:
It's not nearly as magical as I thought it would be, and I have the feeling it's pretty hard to get anything out of it in the case of lockups...