
Thread: help with TGSI part 2(Marek ;) ), i know im annoying but im getting real close

  1. #1
    Join Date
    Jun 2009
    Posts
    1,095


    OK, I've made a big step forward in the shader code, but I have a small question, Marek.

    My IDCT code is basically three matrix multiplications, which at the end of the day are only additions and multiplications of float values packed in a vec4 (like SSE's __m128). To reduce processing time I precomputed all the DCT coefficients (I assumed computing them on the GPU would be very expensive, since it requires a lot of divisions and cosines) and loaded them into 16 vec4 values. No problem so far (this gave me a nice speedup in the C/SSE version too, by the way).

    In GLSL I can load that data as 16 packed RGBA values and refer to it by the same name in all the shaders; that is, once I map those 16 RGBA values as a texture, I can keep them in GPU memory for as long as I need. (I read this in a tutorial and have written some code, but I can't say for sure that the data always stays in memory, so I may be wrong. XD) So I figured that by doing the same in TGSI I can save myself a truckload of uploads to GPU memory: load the coefficient data only once and use it for as long as macroblocks keep coming from the video Huffman parser. So the questions are:

    1.) If I use the same constant (uniform) name in all the shaders, does the compiler assume it is the same data and keep using that texture for as long as I need, without going back to RAM?
    2.) Should I first upload a shader containing all the coefficient data, and then do the processing in a separate shader that refers to the same variable names used in the first shader that declared the coefficient table?
    3.) Should I use ARL (or MOV; I'm not sure what they mean in TGSI compared to x86 asm) to load the data into the char matrix for that shader? By the way, how exactly does ARL work?
    4.) Am I absolutely wrong and should I drop dead somewhere? XD

    By the way, is this correct?

    static const char shader1_asm[] =
    "FRAG\n"
    "DCL OUT[0], POSITION\n"
    "DCL OUT2[0], POSITION\n"
    "DCL TEMP[0]\n"
    "DCL CONST[0..3]\n"
    "0: DP4 TEMP[0].x, IN[0], CONST[0]\n"
    "1: DP4 TEMP[0].y, IN[0], CONST[1]\n"
    "2: DP4 TEMP[0].z, IN[0], CONST[2]\n"
    "3: DP4 TEMP[0].w, IN[0], CONST[3]\n"

    Is this correct?
    "4: DP4 OUT2[0].x, OUT[0], CONST[0]\n"
    "5: DP4 OUT2[0].y, OUT[0], CONST[1]\n"
    "6: DP4 OUT2[0].z, OUT[0], CONST[2]\n"
    "7: DP4 OUT2[0].w, OUT[0], CONST[3]\n";

    I know it's missing some variables and other details. I'm asking about two things: first, whether writing each component of OUT separately is the right way to process the data, since the DP4 documentation says dst and not dst.x, etc.; and second, whether using OUT this way is valid.

    Thanks a lot for your help, and sorry for bothering you this much. XD

  2. #2
    Join Date
    Jun 2009
    Posts
    1,095


    OK, I made a mistake writing the code. Please mentally replace TEMP[] with OUT[]; my bad.

  3. #3
    Join Date
    Jun 2009
    Posts
    1,095


    And in the last example I forgot to ask: since I need the result of the first two matrix multiplications in order to multiply it by the third, can I use OUT[] itself as a source, or should I MOV a copy of the result into OUT2[]?

    Damn edit time limit, lol.

  4. #4
    Join Date
    Jan 2009
    Posts
    609


    I don't follow these forums often. Feel free to drop in on #dri-devel @ irc.freenode and ask there (the channel is dedicated to developers only).

    Quote Originally Posted by jrch2k8
    1.) If I use the same constant (uniform) name in all the shaders, does the compiler assume it is the same data and keep using that texture for as long as I need, without going back to RAM?
    I don't understand the question.

    Quote Originally Posted by jrch2k8
    2.) Should I first upload a shader containing all the coefficient data, and then do the processing in a separate shader that refers to the same variable names used in the first shader that declared the coefficient table?
    I don't understand the question. Think about Gallium as if it were OpenGL, and use the same techniques and approach you would use in OpenGL. You should really know how to implement your algorithm using OpenGL before going to Gallium. If you can't do it with the former, you can't do it with the latter either, because they are conceptually the same thing.

    Quote Originally Posted by jrch2k8
    3.) Should I use ARL (or MOV; I'm not sure what they mean in TGSI compared to x86 asm) to load the data into the char matrix for that shader? By the way, how exactly does ARL work?
    ARL is used for non-constant indexing. You first load an index using ARL to an address register and then use the register for indexing, example:

    VERT
    DCL IN[0]
    DCL OUT[0], POSITION
    DCL CONST[0..7]
    DCL ADDR[0]
    0: ARL ADDR[0].x, IN[0].x
    1: MOV OUT[0], CONST[ADDR[0].x+0]
    2: END

    Quote Originally Posted by jrch2k8
    4.) Am I absolutely wrong and should I drop dead somewhere? XD

    By the way, is this correct?

    static const char shader1_asm[] =
    "FRAG\n"
    "DCL OUT[0], POSITION\n"
    "DCL OUT2[0], POSITION\n"
    "DCL TEMP[0]\n"
    "DCL CONST[0..3]\n"
    "0: DP4 TEMP[0].x, IN[0], CONST[0]\n"
    "1: DP4 TEMP[0].y, IN[0], CONST[1]\n"
    "2: DP4 TEMP[0].z, IN[0], CONST[2]\n"
    "3: DP4 TEMP[0].w, IN[0], CONST[3]\n"

    Is this correct?
    "4: DP4 OUT2[0].x, OUT[0], CONST[0]\n"
    "5: DP4 OUT2[0].y, OUT[0], CONST[1]\n"
    "6: DP4 OUT2[0].z, OUT[0], CONST[2]\n"
    "7: DP4 OUT2[0].w, OUT[0], CONST[3]\n";

    I know it's missing some variables and other details. I'm asking about two things: first, whether writing each component of OUT separately is the right way to process the data, since the DP4 documentation says dst and not dst.x, etc.; and second, whether using OUT this way is valid.
    1) Use OUT[1] instead of OUT2[0].
    2) Two outputs cannot have the POSITION semantic, for the same reason you cannot have two position outputs in GLSL (there's only one gl_Position). Consider using either COLOR[0] or GENERIC[0] for the second output. Also consider using a fragment shader instead if you plan to do any kind of image processing.
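
    As a CPU-side model of what the quoted instructions compute: each DP4 produces one scalar and the destination write mask (.x, .y, .z, .w) selects where it lands, so four DP4s against CONST[0..3] amount to one matrix-vector product. The sketch below is only an illustration under stated assumptions. The names `vec4`, `dp4`, `mul_rows` and `two_stage` are hypothetical, and staging the intermediate result in a temporary (the role of TEMP[0]) is a choice made here so that nothing depends on reading an output register back, which should not be assumed to work on every driver.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* DP4: four-component dot product, yielding one scalar. */
static float dp4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Four DP4s, one per destination component, against rows[0..3]
 * (the role of CONST[0..3]): one matrix * vector product. */
static vec4 mul_rows(const vec4 rows[4], vec4 v)
{
    vec4 r;
    assert(rows != 0);
    r.x = dp4(v, rows[0]); /* DP4 dst.x, v, CONST[0] */
    r.y = dp4(v, rows[1]); /* DP4 dst.y, v, CONST[1] */
    r.z = dp4(v, rows[2]); /* DP4 dst.z, v, CONST[2] */
    r.w = dp4(v, rows[3]); /* DP4 dst.w, v, CONST[3] */
    return r;
}

/* Two chained products: the first result is held in a temporary,
 * and only the second result is written to the output. */
static vec4 two_stage(const vec4 rows[4], vec4 in)
{
    vec4 temp = mul_rows(rows, in); /* instructions 0..3 */
    return mul_rows(rows, temp);    /* instructions 4..7 */
}
```

    With an identity matrix in rows[0..3], `two_stage` returns its input unchanged, which is a quick sanity check before loading real coefficients.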

    Quote Originally Posted by jrch2k8
    And in the last example I forgot to ask: since I need the result of the first two matrix multiplications in order to multiply it by the third, can I use OUT[] itself as a source, or should I MOV a copy of the result into OUT2[]?
    I don't understand the question.

    If you use a Gallium driver, set the environment variable ST_DEBUG=tgsi and run any game or 3D application. It will print source code of all shaders in TGSI to stderr.
