
Opus 1.5 Audio Codec Able To Make Extensive Use Of Machine Learning


  • Opus 1.5 Audio Codec Able To Make Extensive Use Of Machine Learning

    Phoronix: Opus 1.5 Audio Codec Able To Make Extensive Use Of Machine Learning

    Xiph.Org's Opus open-source audio format for lossy audio coding has rolled out Opus 1.5 as a big update that is now making greater use of machine learning...


  • #2
    I thought this was an interesting development until I read this line, and then some trepidation crept in:

    "...this is the first time it has used deep learning techniques to process or generate the signals themselves."
    The processing part is fine with me, but having an ML-backed ability to generate sounds seems creepy, almost like some of the "AI trash" that some companies are trying to foist on us all ... and only getting government inquiries for their trouble and public mea culpas by their CEOs.

    Any sort of AI technology seems to be reaching the point where only fools rush in while the cautious remain just that.

    Comment


    • #3
      Originally posted by NotMine999 View Post
      I thought this was an interesting development until I read this line, and then some trepidation crept in:

      The processing part is fine with me, but having an ML-backed ability to generate sounds seems creepy, almost like some of the "AI trash" that some companies are trying to foist on us all ... and only getting government inquiries for their trouble and public mea culpas by their CEOs.

      Any sort of AI technology seems to be reaching the point where only fools rush in while the cautious remain just that.
      The "generate" isn't meant in the same sense as current generative AI. It's either continuing a phoneme, as existing traditional PLC does, or generating speech from coded information like a normal codec. In neither case should you expect any hallucinations, which is the whole point. In this context, ML is used to significantly augment DSP techniques.
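To make the PLC point concrete: traditional packet-loss concealment just extends the last good audio it received, roughly like this toy Python sketch (not Opus's actual algorithm; the frame size and fade factor here are made up for illustration). Deep PLC in Opus 1.5 replaces this kind of extrapolation with a learned model, but the job is the same: fill a gap plausibly, not invent new content.

```python
import numpy as np

def conceal_lost_frame(history: np.ndarray, frame_len: int, fade: float = 0.8) -> np.ndarray:
    """Toy packet-loss concealment: repeat the most recent good frame,
    attenuated so a long loss fades out instead of buzzing forever."""
    return history[-frame_len:] * fade

# 100 ms of a 16 kHz signal; conceal one lost 20 ms frame (320 samples).
rng = np.random.default_rng(0)
received = rng.standard_normal(1600)
substitute = conceal_lost_frame(received, 320)
```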

      Comment


      • #4
        Originally posted by NotMine999 View Post
        I thought this was an interesting development until I read this line, and then some trepidation crept in:



        The processing part is fine with me, but having an ML-backed ability to generate sounds seems creepy, almost like some of the "AI trash" that some companies are trying to foist on us all ... and only getting government inquiries for their trouble and public mea culpas by their CEOs.

        Any sort of AI technology seems to be reaching the point where only fools rush in while the cautious remain just that.
        Given the whole "Instead of designing a new ML-based codec from scratch, we prefer to improve Opus in a fully-compatible way. That is an important design goal for ML in Opus. Not only does that ensure Opus keeps working on older/slower devices, but it also provides an easy upgrade path." part, it sounds like they're just using machine learning to tune the compression on the fly... like slapping the ML buzzword on "We replaced zlib's implementation of Deflate with p7zip's for better compression ratios".

        (i.e. Using ML as a smarter way to build a huffman tree or coding dictionary or what have you.)
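For reference, the "build a Huffman tree" baseline the comment alludes to is just this: pair symbols with frequencies and repeatedly merge the two rarest nodes, so frequent symbols end up with shorter codes. A minimal Python sketch (generic textbook Huffman, not anything from the Opus codebase):

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict[str, str]:
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    # Heap entries: [total_frequency, [symbol, code], [symbol, code], ...]
    heap = [[freq, [sym, ""]] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # rarest subtree
        hi = heapq.heappop(heap)   # second rarest subtree
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

codes = huffman_codes("aaaaabbbc")
```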

        I don't see why it would be categorically different when it comes to using something like a PSNR test harness to fine-tune your implementation's ability to accurately reproduce what was fed into it.
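A PSNR harness of the kind mentioned above really is a few lines: compute the mean squared error between the original and the decoded signal and express it in dB relative to the peak level. A hedged Python sketch (the 8-level "codec" below is a stand-in for illustration, not how Opus quantizes anything):

```python
import numpy as np

def psnr(reference: np.ndarray, degraded: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: higher means closer to the reference."""
    mse = float(np.mean((reference - degraded) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Crude stand-in "codec": quantize a sine wave to 8 levels, measure the damage.
t = np.linspace(0, 1, 8000)
original = np.sin(2 * np.pi * 440 * t)
decoded = np.round(original * 4) / 4
score = psnr(original, decoded)
```

A tuning loop would then just maximize this score (or a perceptual metric) over the encoder's knobs.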

        Comment


        • #5
          The best audio codec gets better. I have been encoding my music with it for years now; transparent 96 kbps is neat af.

          Comment


          • #6
            Originally posted by ssokolow View Post
            (i.e. Using ML as a smarter way to build a huffman tree or coding dictionary or what have you.)

            I don't see why it would be categorically different when it comes to using something like a PSNR test harness to fine-tune your implementation's ability to accurately reproduce what was fed into it.
            Opus is lossy compression, so I can imagine the choice of model can actually have an audible effect. That said, I expect the neural network to only be a small part of the algorithm, so that deviations are limited.

            Comment


            • #7
              Originally posted by NotMine999 View Post
              The processing part is fine with me, but having an ML-backed ability to generate sounds seems creepy, almost like some of the "AI trash" that some companies are trying to foist on us all ... and only getting government inquiries for their trouble and public mea culpas by their CEOs.
              The rough idea is that any lossy codec breaks down the input signal, figures out what to represent in a simpler way, and packs it up. To my knowledge, Opus uses two methods depending on the input signal, Linear Predictive Coding (LPC) and the Modified Discrete Cosine Transform (MDCT), so it's a set of "speech-like" pulses or a set of cosine waves respectively. So the encoder analyses the input, trying to generate the best representation with those tools. This is where the AI presumably steps in, deconstructing the input to find the optimal representation. An ML system trained on an audio dataset is quite handy at this sort of task.

              Then the decoder receives an input that is a description of how to reconstruct the signal; such a decoder is sometimes called a "resynthesizer". The "AI" is not trying to mimic someone's speech here.
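The LPC half of that description is easy to show in miniature: each sample is predicted as a weighted sum of the previous few samples, and the codec only has to transmit the weights plus the (hopefully small) prediction residual. A toy Python sketch, using a hand-picked 2-tap filter on a signal it models perfectly (real codecs estimate the coefficients from the audio and use higher orders):

```python
import numpy as np

def lpc_predict(samples: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Predict each sample as a weighted sum of the previous len(coeffs) samples."""
    order = len(coeffs)
    preds = np.zeros(len(samples) - order)
    for i in range(order, len(samples)):
        # coeffs[0] weights the most recent sample, coeffs[1] the one before, ...
        preds[i - order] = coeffs @ samples[i - order:i][::-1]
    return preds

# A decaying sinusoid obeys x[n] = 2*r*cos(w)*x[n-1] - r^2*x[n-2] exactly,
# so a 2-tap LPC filter predicts it with (near-)zero residual.
r, w = 0.999, 0.3
x = np.array([r**n * np.cos(w * n) for n in range(100)])
coeffs = np.array([2 * r * np.cos(w), -r**2])
residual = x[2:] - lpc_predict(x, coeffs)
```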

              For good times, try a stem-separation tool sometime: those are fed music and separate the instruments (voice, drums, bass, guitar, etc.) into different audio tracks. They work by training a system on a dataset of separate and mixed sounds; given a new input, the system tries to do the same.

              Can I recommend Ultimate Vocal Remover (it can do more than just separate the voice)?
              GUI for a Vocal Remover that uses Deep Neural Networks. - Anjok07/ultimatevocalremovergui

              Comment


              • #8
                What about RISC-V vector extension support?

                Comment


                • #9
                  Originally posted by Anux
                  What's so creepy about it? We are operating in the lossy audio world and you can hear clear improvements; that's what AI should really be about, not having bl**k w*men in Naz1 uniform.
                  I tried to decrypt the ****'s


                  Comment


                  • #10
                    Originally posted by skeevy420 View Post
                    I tried to decrypt the ****'s
                    Look up https://www.theverge.com/2024/2/21/2...ate-historical if you want to know what I tried to say. Since I don't know what words exactly are the reason for the censorship, I had no other choice than to guess.

                    Comment
