# KLANG: A New Linux Audio System For The Kernel

• 08-01-2012, 06:23 AM
unknown2
Quote:

Originally Posted by datenwolf
Well, technically it is, and the RAID checksumming code actually makes use of the FPU registers, which are what the fuss is all about.

A common misconception is that floating point equals precision. What floating point gives you is a fixed number of significant (binary) digits, the mantissa, and a value scale defined by the exponent. In a floating point number you never get more precision than the mantissa can hold. The mantissa is usually normalized, which means its value is interpreted as being in the range 0…1 and the most significant bit shifted to the upper boundary. In addition you have a sign bit which gives you an effective mantissa value range of -1…1 quantized by the integer range given by the mantissa bits. As long as the effective value of a floating point number stays within the range -1…1 the scaling exponent is <=0 and what you actually have is the equivalent of an integer of the same number of bits as the mantissa has (divided by the integer value range).

Of course with small absolute values, and hence a negative exponent, the number resolution increases. But you don't benefit from this, because you're dealing with a quantized signal and eventually the signal goes into a DAC which, in principle, cannot resolve less than one bit. DACs are inherently fixed point; that's the nature of their circuitry.

So why use floating point at all then? Because what if the absolute value is larger than what would fit into the integer range? Then an integer would either wrap or saturate (depending on the operations used). A floating point number, however, then uses an exponent >0. But this means that now you must drop bits of your mantissa. If the exponent is 1, then your mantissa resolution has been halved, if the exponent is 2 it's only 1/4 of the mantissa resolution and so on.
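To make the "resolution halves with every exponent step" point concrete, here is a minimal sketch in Python. The helper names `to_float32` and `float32_step` are made up for this illustration; it only assumes IEEE 754 single precision with its 23 stored mantissa bits.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_step(x: float) -> float:
    """Spacing between neighbouring float32 values near x (normal range only)."""
    # frexp gives x = m * 2**e with 0.5 <= |m| < 1, so floor(log2|x|) = e - 1.
    _, e = math.frexp(x)
    return 2.0 ** ((e - 1) - 23)  # 23 stored mantissa bits


# The absolute step doubles each time the exponent goes up by one.
step_at_1 = float32_step(1.0)  # 2**-23
step_at_2 = float32_step(2.0)  # 2**-22
step_at_4 = float32_step(4.0)  # 2**-21

# Beyond 2**24 a float32 can no longer represent every integer.
big = to_float32(16777217.0)  # 2**24 + 1 is rounded back to 2**24
```

Within -1…1 the step is never coarser than 2^-24, which is the "integer of mantissa width" picture from the post.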

In audio applications float is used because it allows you to mix several signals without fear of saturating or wrapping them. However, this means a loss in precision. It also lets you apply a wide range of gain without thinking too much about it.

But let's look at this from an engineering standpoint. An IEEE754 32 bit floating point number has 23+1 bits of mantissa. That's the precision you get. Add to this 8 bits of exponent, allowing you to expand the value range up to 2^127, which equates to some 20*log10(2^127)=765 dB of additional dynamic range (only usable if each signal was a square wave). You'll never find this situation in audio.
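The decibel figures are easy to check. A quick sketch, using the usual 20*log10 amplitude convention (the helper name `db` is only for this example):

```python
import math


def db(amplitude_ratio) -> float:
    """Amplitude ratio expressed in decibels."""
    return 20.0 * math.log10(amplitude_ratio)


db_per_bit = db(2)         # about 6.02 dB for every extra bit
mantissa_db = db(2 ** 24)  # about 144.5 dB from the 23+1 mantissa bits
exponent_db = db(2 ** 127) # about 764.6 dB from positive exponents alone
```

So the precision you actually get from a 32 bit float is about 144.5 dB, and the exponent adds roughly 765 dB of range on top of it.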

Remember why you need that additional headroom? Because when you mix together several channels they may saturate. By how much? Well, let's assume every channel makes use of its full scale; then the total value range required is that of n channels. But every bit we add doubles the value range.

Let's say we use a 32 bit integer value, but our audio signal is normalized to 24 bits. Then we have 2^8 = 256 times the value range we can work with. Which means we can mix up to 256 channels of 24 bit audio without saturating and without loss of precision, because we don't subquantize our original value range. Eventually we'll have to attenuate this, but until then we don't lose information. And now think about it: You'll hardly find 256 audio sources playing at the same time. So we can in practice mix even more stuff, without problems.
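The 256 channel claim can be checked with worst-case numbers. This is a sketch only: Python integers never overflow, so the function `mix` (a made-up name) checks the signed 32 bit limits explicitly to stand in for a real int32 accumulator.

```python
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1
SAMPLE_MIN, SAMPLE_MAX = -2 ** 23, 2 ** 23 - 1  # signed 24 bit full scale


def mix(samples):
    """Sum one sample per channel; fail where a 32 bit accumulator would overflow."""
    total = sum(samples)
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError("mix of %d channels exceeds 32 bits" % len(samples))
    return total


# 256 channels, all at full scale at the same instant, still fit exactly.
worst_positive = mix([SAMPLE_MAX] * 256)
worst_negative = mix([SAMPLE_MIN] * 256)

# The 257th full-scale channel is the first one that can overflow.
try:
    mix([SAMPLE_MAX] * 257)
    overflowed = False
except OverflowError:
    overflowed = True
```

Every sum below that limit is exact, which is the "no loss of precision" part: no bits of any input sample are dropped while mixing.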

And similar applies for actually doing calculations on the stuff. What kind of calculations are these? Well, actually only one, namely resampling, which is implemented as a fast finite impulse response filter. Wait, what, how can a filter resample, you ask? Well, I leave that as an exercise for the reader to think about. Just a hint: If you reduce the sampling frequency, what kind of cutoff must you apply beforehand to avoid high frequency aliasing?

This is probably the number one key lesson told in every numerics lecture given (if not, it's a bad lecture): Using floating point only makes sense if your numbers vary over a large range and are not of similar scale. In particular, never subtract floating point numbers of similar absolute value. If the numbers are all of similar value, or in a narrow range, and precision is paramount, never use floating point. Use fixed point then. If you are a pure computer science student you actually might not have learnt this, because computer science doesn't deal with such practical problems ;) . But if you're in engineering or physics, then you must know how to work with digital numbers.

So what does this mean for KLANG? KLANG can be compiled with either 40 bits or 48 bits per sample processing, with a footroom of 8 bits (this is for reversible attenuation of up to 48dB, which is plenty). For the typical desktop audio user the 40 bit configuration is already total overkill, as for his el-cheapo 16 bit audio hardware this gives a headroom of 16 bits and, due to the footroom of 8 bits, calculations can be done with a precision of 24 bits. Even with 24 bit audio samples the precision is sufficient for most tasks. If desired, 48 bit processing can be enabled, but this is more snake-oil than actually audible. But: since KLANG is also to be used to pass around audio between applications, that additional precision might be required.
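One way to picture the 40 bit layout for a 16 bit source is 16 bits of headroom, 16 bits of signal and 8 bits of footroom. The sketch below illustrates that bit budget only; it is not KLANG code, and the names `to_internal`, `attenuate` and `amplify` are hypothetical.

```python
CONTAINER_BITS = 40
SAMPLE_BITS = 16
FOOTROOM_BITS = 8
HEADROOM_BITS = CONTAINER_BITS - SAMPLE_BITS - FOOTROOM_BITS  # 16


def to_internal(sample16: int) -> int:
    """Place a 16 bit sample above the footroom bits."""
    return sample16 << FOOTROOM_BITS


def attenuate(value: int, bits: int) -> int:
    """Attenuate by about 6.02 dB per bit (arithmetic shift right)."""
    return value >> bits


def amplify(value: int, bits: int) -> int:
    """Amplify by about 6.02 dB per bit (shift left)."""
    return value << bits


sample = -12345
internal = to_internal(sample)

# Attenuating by the full 8 bits (about 48 dB) only moves the sample into
# the footroom, so gaining it back up restores every bit.
quiet = attenuate(internal, FOOTROOM_BITS)
restored = amplify(quiet, FOOTROOM_BITS)
```

Attenuating by more than 8 bits would start shifting real sample bits out of the container, which is where the "up to 48dB" limit comes from.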

The theory seems OK for mixing sound, but can it be applied to resampling? If you upsample from 44.1 kHz to 48 kHz, is it feasible in fixed point? :confused:
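For what it's worth, 48000/44100 reduces to the exact ratio 160/147, so the output positions can be tracked with integers alone. Below is a minimal sketch using linear interpolation, which is the crudest possible two-tap filter; a serious resampler would use a much longer low-pass FIR, but the integer bookkeeping stays the same. The function name `resample_linear_fixed` is made up for this example.

```python
from math import gcd


def resample_linear_fixed(samples, rate_in=44100, rate_out=48000):
    """Resample integer samples using integer arithmetic only."""
    g = gcd(rate_in, rate_out)
    step = rate_in // g   # 147: input advance per output sample, in 1/den units
    den = rate_out // g   # 160: fixed-point denominator of the position
    out = []
    pos = 0                               # input position in units of 1/den
    last = (len(samples) - 1) * den
    while pos <= last:
        i, frac = divmod(pos, den)        # whole input index, fractional phase
        a = samples[i]
        b = samples[i + 1] if i + 1 < len(samples) else a
        # Weighted average of the two neighbours, no floating point involved.
        out.append((a * (den - frac) + b * frac) // den)
        pos += step
    return out


# A ramp rising by 160 per input sample must rise by exactly 147 per output.
ramp = [160 * i for i in range(442)]
resampled = resample_linear_fixed(ramp)
```

The phase never drifts, because `pos` is an exact rational position rather than an accumulated float.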
• 08-01-2012, 08:05 AM
PaulDavis
Quote:

Originally Posted by datenwolf
So what does this mean for KLANG? KLANG can be compiled with either 40 bits or 48 bits per sample processing, with a footroom of 8 bits (this is for reversible attenuation of up to 48dB, which is plenty).

Apparently you don't follow any of the trends in digital audio. Just as almost everybody else (except Avid/Digidesign) has adopted floating point as THE standard format for computer-based digital audio, you want to revert Linux to fixed point. This is supposed to be smart?

You're going to explain to all the developers of every pro-audio and music creation software who might consider Linux as a platform for their work that their samples will be converted twice, once from the floating point format that their code uses into fixed point and then again into integer format before it hits the audio hardware? And that when they share audio data between applications, which will both be using floating point format, it will be converted to and then from fixed point as an intermediate?

Could you at least spend several months hanging out with audio developers before you try to redesign the kernel subsystems that we rely on?
• 08-01-2012, 11:30 AM
datenwolf
Quote:

Originally Posted by Khudsa
Is this post on Steam: Steam Linux thread, a new audio project, from a developer of your group?

This is indeed the case.
• 08-01-2012, 11:36 AM
bug!
Looking at it from an end-user perspective, what will change?

I use ALSA + PulseAudio and I have never experienced any latency on native programs...
Heck, PulseAudio even got my "Windows-only" headphones (USB ones, which without "special" software don't even work on Windows) working via software mixing, and it also got the microphone and the Dolby feature working flawlessly!

Linux Audio isn't as bad as everyone says it is, at least not for me, but then again, the World doesn't revolve around me.

I have never written any Linux-specific software which requires audio, therefore it would be nice if someone could give their insight about writing software for our current sound system on Linux.
• 08-01-2012, 12:04 PM
PaulDavis
Quote:

Originally Posted by datenwolf
You'll hardly find 256 audio sources playing at the same time. So we can in practice mix even more stuff, without problems.

BTW, you might want to check out: http://www.3daudioinc.com/3db/showth...00-bands-of-EQ Notice that the URL alone refers to the 384 inputs on this puppy.

I don't want to advocate that every system API needs to be designed to handle the most extreme possible use cases, but seriously - 200+ channels is really fairly typical in a major movie post-production environment. If we were discussing PulseAudio, this kind of thing would just be "not part of the problem space". But you want to replace the entire kernel side system, not just the consumer/desktop audio server.

You'll note that the above-referenced mixing console runs Linux, though full disclosure requires me to note that it does not use ALSA.
• 08-01-2012, 09:37 PM
MickStep
Shame
I guess you embarrassed datenwolf to the point he will not respond.
That's the end of KLANG then!
• 08-02-2012, 12:36 AM
Thatguy
Quote:

Originally Posted by PaulDavis
BTW, you might want to check out: http://www.3daudioinc.com/3db/showth...00-bands-of-EQ Notice that the URL alone refers to the 384 inputs on this puppy.

I don't want to advocate that every system API needs to be designed to handle the most extreme possible use cases, but seriously - 200+ channels is really fairly typical in a major movie post-production environment. If we were discussing PulseAudio, this kind of thing would just be "not part of the problem space". But you want to replace the entire kernel side system, not just the consumer/desktop audio server.

You'll note that the above-referenced mixing console runs Linux, though full disclosure requires me to note that it does not use ALSA.

While I think it's absolutely fabulous that you're mixing with hardware that is being controlled by software.

What's your real soft roundtrip latency if you use all software monitoring? 'Cause frankly, 8 samples smells like bullshit to me, and I've been around pro audio since the 486 was a twinkle in Intel's eye.
• 08-02-2012, 01:16 AM
ninez
Quote:

Originally Posted by Thatguy
While I think it's absolutely fabulous that you're mixing with hardware that is being controlled by software.

What's your real soft roundtrip latency if you use all software monitoring? 'Cause frankly, 8 samples smells like bullshit to me, and I've been around pro audio since the 486 was a twinkle in Intel's eye.

It's not bullshit. While you may think that it is impressive that you have been around pro audio since the 486 was a twinkle in Intel's eye (which I find funny..lol), I find it far more interesting that Harrison Consoles has been an industry leader (for both analog and digital consoles) for decades. Maybe you should have a look at some of their products or whitepapers. Here is a link to their website:

http://www.harrisonconsoles.com/joom...tpage&Itemid=1

The product in question using 8 samples, which you think is BS, is very likely xdubber, found here:

http://www.harrisonconsoles.com/joom...d=23&Itemid=57

You should also take a look at the link Paul provided (scroll halfway down the page), so you can see the physical hardware and how much 'muscle' there is. It makes your pro audio setup look like you are still running a 486 (and I don't even need to know what your hardware is to draw that conclusion).

http://www.3daudioinc.com/3db/showth...00-bands-of-EQ

If you google "harrison consoles 8 samples" you should turn up some whitepapers on the subject as well.
• 08-02-2012, 06:27 AM
renox
A remark on inter-process communication
A remark on inter-process communication, you said:

Quote:

Originally Posted by PaulDavis
You cannot be serious. It's hard to take anyone seriously who would claim such a thing. You think that two processes both touching an mmap'ed region in user space causes some kind of kernel hell? This is just ridiculous - you make it appear that you don't know how shared memory works at all! The address spaces have the region mapped. When each process touches part of the region NOTHING HAPPENS - it's a memory access.

His point was indeed incorrect, but shared memory in itself is not enough: for several processes to cooperate they must use IPC to synchronize these accesses, and this IPC has the "two context switches" latency, so I'm not sure that putting a bigger part of the audio stack in the kernel could not reduce the average latency.

Also, about the push/pull design: IMHO (I'm not an audio programmer) both are important, and having low latency push (with low CPU usage) is important too. I'm thinking about a game which wants to have fast audio feedback when a player presses a button, so I'm not sure that a push buffer over a pull mechanism is "good enough" for this use case.

Best regards.
• 08-02-2012, 07:33 AM
devius
Quote:

Originally Posted by bug!
I use ALSA + PulseAudio and I have never experienced any latency on native programs...

Of course not. PulseAudio and ALSA work just fine for the typical consumer workload. Even if you had an audio latency of 150ms or more you would probably only notice it if you were playing a very intense FPS, where there could be a very slight delay in the sounds coming from the game compared to the image you were seeing. Even then it probably wouldn't make much of a difference to you. This all changes when you need to (e.g.) synchronize a playback and a recording track in a DAW, or apply real-time effects to audio at a live concert. You can't have a delay of 100ms between two tracks (unless it's intentional of course) and you can't send the audio mix back to the musicians (at a live concert) with such a delay. It has to be as close as possible to real-time.
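To put the numbers in this thread side by side, buffer latency is just frames divided by sample rate. A small sketch follows; the 48 kHz rate is an assumption chosen for the example and the helper names are made up.

```python
def buffer_latency_ms(frames: int, sample_rate: int) -> float:
    """Time it takes to play out one buffer of the given size."""
    return 1000.0 * frames / sample_rate


def frames_for_latency(latency_ms: float, sample_rate: int) -> int:
    """Buffer size that corresponds to a given delay."""
    return round(latency_ms * sample_rate / 1000.0)


RATE = 48000  # assumed sample rate, not taken from the thread

pro_console_ms = buffer_latency_ms(8, RATE)      # the "8 samples" figure
desktop_frames = frames_for_latency(100, RATE)   # a 100 ms delay in frames
```

At that rate 8 samples is about 0.17 ms, while 100 ms corresponds to 4800 frames per channel.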