This whole thing is kind of useless. VP8 video on the internet is low-bitrate enough that it doesn't need acceleration. H.264 would be much more important to have on top of Gallium.
This is why I don't fear mobile devices taking over the PC or game console world. (Maybe the mobile game console world is in trouble, sure, but then only because Nintendo and Sony both are utter morons and either screw over every third-party developer they can or just design whacked out expensive hardware with gimmicky features but only 1/30th the power of my two year old phone.)
So he would prefer ending up with a less usable but functional (VP8) decoder, rather than a more useful (H.264) but non-functional/incomplete decoder.
Implementing a full H.264 decoder in software took him six months last time he tried (1), and this time he has to learn about shader-based optimizations as well, so it would seem to be a bit too much work for one GSoC.
Also, he started his thread on the mailing list by proposing a generic implementation of the various processes involved in video decoding, so that an arbitrary codec could hook into them and accelerate decoding that way. I'm not sure whether he has since dropped that idea:
(2) The project would be to write a state tracker which exposes some of the
most shaders-friendly decoding operations (like motion compensation,
idct, intra-predictions, deblocking filter and maybe vlc decoding)
through a common API like VDPAU or VA-API.
These APIs can be used to decode mpeg2, mpeg 4 asp/avc, vc1 and
others, but at first I intend to focus on the h264 decoding to save
time, because I know it better and it is currently widely in use, but
again the goal of the project is to be generic.
In any case, if he successfully makes this VDPAU VP8 decoder state tracker, adding support for H.264, VC-1 etc. later will be much easier than it is now.
Can anyone answer exactly how these optimizations are coded in a state tracker?
I mean, I think I get how the state tracker itself functions, on a conceptual level, at least.
But in the state tracker code, in the part of the code that deals with the actual decoding of a video stream, how, concretely, is a certain part of this decoding code - let's say iDCT - written, to allow it to be executed on a GPU (in parallel)?
Would the process of starting to write useful code for this kind of thing be something like reading a couple of papers on parallelizing the iDCT algorithm, and then writing the actual parallelization code in TGSI? Or does one use a higher-level language like GLSL? Are there any currently working code examples in Mesa that I can take a look at to get a better understanding of it?
More generally perhaps, is the point of access to the shaders of a graphics card in a Gallium state tracker always TGSI? If this is the case, wouldn't something as relatively easy as iDCT be fairly complicated to implement in TGSI? I mean, I've seen the C code, and the assembly code, that implements iDCT. Wouldn't the TGSI code look a lot like the assembly (CPU) code, except with some form of parallelization-enabled instructions?
On a third note, does anyone know where the "TGSI specification" PDF on this site has gone? I'd really like to take a look at it (even though I probably wouldn't understand much of it). It seems to be the only documentation that I can find.
Then the hardware drivers take the TGSI as input and output commands that the actual hardware works with.
I suspect that the video decoding code will be written directly in TGSI within the state tracker, but I suppose it's possible to do it in something like GLSL and then compile it down to TGSI. I'm not sure how difficult that would be to implement, but it's probably more efficient to just code in TGSI directly.
1. Add the state tracker but do everything in C code, test until it works
2. Pick a function like iDCT and write a separate test app with a shader implementing it
3. Test that shader until it works well, then use Mesa to record the TGSI code it generates
4. Move that generated code into the state tracker, test
5. Either move on to the next function to optimize, or work on optimizing the generated TGSI code directly in the state tracker. Repeat as needed.
I'm not sure what you mean by "write a separate test app with a shader implementing it" though. Why would we write a separate (test) application to implement a sub-feature of a state tracker? Or do you mean just writing an application that can be used to test whichever decoding routine we choose to optimize using shaders?
Also, in step 3: are we not writing this shader in TGSI ourselves? If so, why would we use Mesa to record "the TGSI code it generates"?