Let's split video acceleration into two parts -- "decode" acceleration and "render" acceleration, as SloggerKhan says.
Decode includes tasks like IDCT and Motion Compensation (MC). It converts a compressed bitstream (MPEG2, H.264, VC1 etc.) to uncompressed video data, but the output of decode is (a) in a format your card cannot display correctly, (b) at the size of the encoded video stream, not the size of your screen/window, and (c) typically interlaced, whereas your display is typically progressive scan.
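To make the "decode" half a bit more concrete, here is a minimal sketch of the 2D inverse DCT that IDCT acceleration offloads. This is a deliberately naive Python version for illustration only -- real decoders and hardware use fast fixed-point butterfly implementations, not this quadruple loop:

```python
import math

N = 8  # MPEG2-family codecs work on 8x8 blocks

def idct_2d(coeffs):
    """Naive 8x8 inverse DCT (JPEG/MPEG2 convention).

    Turns a block of frequency-domain coefficients back into pixel
    values; a hardware IDCT unit does this for every block of every
    frame so the CPU doesn't have to.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s / 4
    return out

# A block with only a DC coefficient decodes to a flat 8x8 patch:
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
flat = idct_2d(dc_only)
```

Motion compensation is the other big decode task: copying (and interpolating) blocks from previously decoded reference frames, which is mostly memory bandwidth rather than arithmetic.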
Render finishes the video processing task, performing colour space conversion, scaling, de-interlacing and various kinds of post-processing to improve image quality. Both decode and render can be done in software or can be GPU-assisted.
The Xv API only handles the render portion of the pipeline. Depending on the chip, most of the work can be done in dedicated hardware via a powerful overlay block, or it can largely be done using the 3D engine. Textured video is the most common way of performing render acceleration with a 3D engine -- load the output of decode into textures, then use the 3D engine to paint textured triangles on the screen, which forces the texture engines to handle scaling and colour space conversion.
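The colour space conversion step in render is just a per-pixel matrix multiply, which is exactly the sort of thing texture/shader units are good at. A rough Python sketch of the BT.601 studio-range YCbCr-to-RGB math (illustrative coefficients; the hardware does this per pixel in the overlay or shader, never on the CPU like this):

```python
def yuv_to_rgb(y, cb, cr):
    """BT.601 studio-range YCbCr -> RGB, the per-pixel conversion
    that textured video pushes onto the GPU's texture engines.
    Y is nominally 16..235, Cb/Cr are centred on 128."""
    y_ = 1.164 * (y - 16)
    r = y_ + 1.596 * (cr - 128)
    g = y_ - 0.813 * (cr - 128) - 0.392 * (cb - 128)
    b = y_ + 2.017 * (cb - 128)
    clamp = lambda v: max(0, min(255, int(round(v))))
    return clamp(r), clamp(g), clamp(b)
```

On a textured-video path the same three multiply-adds run in the fragment stage for every output pixel, and the texture filtering hardware gives you the scaling essentially for free.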
The XvMC API performs some or all of the decode portion of video acceleration. It typically outputs to an Xv port, using Xv to perform the render acceleration. Decode cares about which video format you use (MPEG2 etc.); render does not care which format was used to encode (compress) the video data, only about the format of the *uncompressed* data (YV12 etc.).
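For a sense of what that uncompressed data looks like: YV12 is a planar 4:2:0 format, one full-resolution Y (luma) plane followed by quarter-resolution V and U (chroma) planes. A small sketch of the plane sizes, assuming even frame dimensions:

```python
def yv12_plane_sizes(width, height):
    """Byte sizes of the Y, V and U planes of a YV12 (4:2:0) frame.

    Chroma is subsampled 2x in each direction, so each chroma plane
    is a quarter of the luma plane and the whole frame averages
    1.5 bytes per pixel -- this is the kind of buffer handed to Xv.
    """
    y = width * height
    c = (width // 2) * (height // 2)  # each chroma plane
    return y, c, c

# e.g. a 720x480 DVD-resolution frame:
#   Y plane 345600 bytes, V and U planes 86400 bytes each
```

This is why render doesn't care about the codec: by the time Xv sees the frame, MPEG2 and H.264 output look identical.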
XvMC is designed for MPEG2 acceleration and cannot directly handle newer formats like H.264 -- you either need non-standard XvMC extensions or a new API to handle H.264.
Today we implement Xv but not XvMC or VAAPI, so we provide render acceleration but not decode acceleration.
I've read that the nvidia drivers also dropped decode acceleration for MPEG 1/2 recently (or maybe a few versions ago). I think render acceleration is critical, but for MPEG 1/2 basically a 10-year-old cpu would do just fine with room to spare. For MPEG4-based codecs (xvid/divx/h.264), especially h.264, it's a different matter. For low-bitrate, medium-quality encodes, cpu decoding is still fine IMO.
It's the high-bitrate, uber-setting h.264 that's the killer and would probably benefit a great deal from ATI's 2xxx/3xxx GPUs. But this also requires the cooperation of the playback software (ffmpeg/libavcodec). In addition, I believe this is mostly a moot point until Linux gets HDCP support, since almost all of these kinds of encodings are from BluRay. OTA HDTV is nowhere near as intensive (most of which is MPEG2, I believe), and neither is hi-def online streaming (e.g. h.264 in a Flash container).
Well, yeah, I take that back. I do agree that it's still useful to have offloading -- always better to have more cpu cycles to spare. I also have non-BD 1080p content, but I guess I was trying to rationalize the delay in supporting decode acceleration from a business perspective (for both NV and ATI) and from the general public's point of view.