OpenMCU: The Real Story

xiaoxiao, 2021-03-06

(This is my first translated article. In some places I was not sure what the author meant, so I have included the original text after what I believe is the correct translation. Corrections are welcome.)

I sincerely hope that this new year is better than the last one, and I suppose it has to be, because it could scarcely be worse.

There have been a few discussions about OpenMCU on the mailing lists recently (the OpenH323 mailing lists, translator's note), so I thought it might be a good time to retell the story of how this program came into existence. I hope this helps people understand why the program exists, so that they can better assess whether it meets their needs.

Back in the heady days before OpenH323 became the international success it is now (yeah, right), I had plenty of time to think about, and write code for, all sorts of interesting ideas. During that time, I remember waking up in the middle of the night with the idea of how to implement the audio mixing required by an MCU (Multi Conference Unit) fully formed in my head. The following weekend I implemented the idea in a two-day coding frenzy, and OpenMCU was born.

OpenMCU's audio mixing algorithm is very simple and is based on two ideas. The first key idea is that all audio paths are converted to and from PCM so that they can be mixed using simple algebraic operations. I know it is possible to mix audio data such as G.723.1 or G.729 without first converting it to PCM (I have seen titles and watermarks added to MPEG video streams in real time by direct manipulation of the DCT coefficients, so audio mixing in the compressed domain ought to be a walk in the park by comparison), but that was well beyond the scope of what I intended to implement at the time.
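To make the "simple algebraic operations" concrete, here is a minimal sketch of mixing PCM by addition. It assumes signed 16-bit samples and equal-length frames; the function name `mix_pcm` and the clamping step are illustrative assumptions, not taken from the OpenMCU source.

```python
from array import array

PCM_MIN, PCM_MAX = -32768, 32767  # range of a signed 16-bit sample (assumed format)


def mix_pcm(frames):
    """Sum equal-length frames of 16-bit PCM samples, clamping to the 16-bit range."""
    if not frames:
        return array("h")
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed = array("h", [0] * length)  # zero-filled output frame
    for i in range(length):
        total = sum(f[i] for f in frames)  # the algebraic sum of all sources
        mixed[i] = max(PCM_MIN, min(PCM_MAX, total))  # avoid integer wrap-around
    return mixed


a = array("h", [1000, -2000, 30000])
b = array("h", [500, -500, 10000])
print(list(mix_pcm([a, b])))
```

The last sample shows why some form of limiting is needed: 30000 + 10000 does not fit in 16 bits, so the sketch clamps it rather than letting it wrap.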

The second key idea is that the incoming audio on each connection is copied, as it arrives, into a separate queue for each of the other connections in the conference, and the outgoing audio for each connection is the algebraic sum of the incoming queues for that connection. That sounds complex, but it is really simple: in a conference with n connections, each connection has n-1 queues holding copies of the incoming audio from each of the other connections. When a packet of audio arrives on connection x, it is copied to the front of the corresponding queue of every connection from 1 to n except connection x. When it is time to send a packet of audio on connection x, the packet is created by mixing the correct amount of data off the back of each of that connection's queues.
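The queue layout described above can be sketched as follows. This is a hypothetical illustration, not the OpenMCU code: the class name `Conference` and its methods are invented here, and for simplicity it queues whole frames instead of taking "the correct amount of data" sample by sample.

```python
from collections import deque


class Conference:
    """Hypothetical sketch: each connection keeps one queue per other connection."""

    def __init__(self):
        # queues[listener][speaker] holds frames from `speaker` waiting for `listener`
        self.queues = {}

    def add_connection(self, name):
        # Every existing listener gains a queue for the newcomer...
        for per_speaker in self.queues.values():
            per_speaker[name] = deque()
        # ...and the newcomer gets one queue per existing connection (n-1 in total).
        self.queues[name] = {other: deque() for other in self.queues}

    def write_audio(self, speaker, frame):
        """Copy an incoming frame to the front of every queue except the speaker's own."""
        for listener, per_speaker in self.queues.items():
            if listener != speaker:
                per_speaker[speaker].appendleft(list(frame))

    def read_audio(self, listener, frame_len):
        """Build one outgoing frame by summing the oldest frame off the back of each queue."""
        out = [0] * frame_len
        for queue in self.queues[listener].values():
            if queue:
                for i, sample in enumerate(queue.pop()[:frame_len]):
                    out[i] += sample
        return [max(-32768, min(32767, s)) for s in out]  # clamp to 16-bit PCM


conf = Conference()
for name in ("a", "b", "c"):
    conf.add_connection(name)
conf.write_audio("a", [100, 100])
conf.write_audio("b", [10, 20])
print(conf.read_audio("c", 2))  # c hears a and b mixed
print(conf.read_audio("a", 2))  # a hears only b, never itself
```

Because each listener owns its queues, a reader that runs late simply finds more data waiting, and an empty queue contributes silence.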

The intent was to emulate how the analog mixers in telephone conference systems work. I did some tests and it worked well: multiple audio streams could be present at the same time (I tested by mixing a pre-recorded message with audio from a CD) and each stream could be heard distinctly, just like a telephone conference call.

It also turned out that this approach has several additional nice properties. The connections in a conference are unsynchronised, which removes a lot of complexity. As long as the audio data arrives at the correct rate, everything just works. It also removes the problem of local echo, because the audio mixed for each connection never contains a copy of that connection's own signal. Finally, it allows the mixing algorithm to be changed at will, because each channel has a complete copy of all the source signals. This was a nice feature, as it meant I could experiment with lots of different mixing methods without rewriting great chunks of code.

When I combined all of this into an H.323-based infrastructure, I was completely focused on the implementation of the audio mixing algorithm. Issues such as the efficient use of mutexes and connection lookups were completely ignored: it was a "straight line between two points" implementation. Looking at the code afterwards, it is pretty obvious that it could be done better, for example by saving a pointer to the connection object in each connection rather than looking it up every time via the connection token.

It quickly became apparent that the approach had several fundamental problems. The base noise level rose with each additional connection, because the background noise signals were added together, creating a loud "hiss" with more than about 4 connections. I experimented with scaling the channel amplitude in proportion to the number of channels, but all that did was decrease the volume of each channel, resulting in an inaudible blur with more than about 5 connections. In the end, the use of silence suppression fixed most of the noise problem. The issue of algorithmic complexity and storage at large numbers of connections was also fairly apparent, as the algorithm is O(n^2). Using a single mixer for the entire conference would give an O(n) algorithm, but would also re-introduce local echo. I think the best solution is to use a number of partial sums, which would make the code more complicated but would give an algorithmic complexity somewhere between the two. I never got around to trying it.
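For comparison, one common way to get a linear-time mix without local echo is to compute a single total and subtract each listener's own contribution, a technique often called "mix-minus". The sketch below is not the partial-sum scheme the author had in mind and is not from OpenMCU; the function name `mix_minus` is hypothetical, and it assumes every connection supplies a frame of the same length for the same time slice, which gives up the unsynchronised property of the per-connection queues.

```python
def mix_minus(inputs):
    """Return, for each connection, the sum of all the other connections' frames.

    `inputs` maps a connection name to an equal-length list of PCM samples.
    The total is computed once, so the work grows linearly with the number
    of connections instead of quadratically.
    """
    if not inputs:
        return {}
    frame_len = len(next(iter(inputs.values())))
    total = [0] * frame_len
    for frame in inputs.values():
        for i, sample in enumerate(frame):
            total[i] += sample

    def clamp(value):
        return max(-32768, min(32767, value))  # keep within 16-bit PCM

    # Subtracting a connection's own frame removes its local echo.
    return {
        name: [clamp(total[i] - frame[i]) for i in range(frame_len)]
        for name, frame in inputs.items()
    }


print(mix_minus({"a": [100, 200], "b": [10, 20], "c": [1, 2]}))
```

The trade-off is that only the final sum is kept, so per-listener mixing policies are harder to vary than when each channel holds a full copy of every source.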

Another serious problem is that the code operates completely without the use of hardware timers. This means that audio quality drops dramatically when the CPU reaches saturation, because the timing of the outgoing audio is lost.
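As a rough software illustration of the timing issue, the sketch below paces outgoing frames against absolute deadlines on a monotonic clock, so that time spent doing other work does not accumulate as drift. It is an assumption-laden example, not a substitute for a hardware timer and not how OpenMCU works: the function `send_paced` and its `clock`, `sleep` and `send` parameters are hypothetical, and a saturated CPU will still miss deadlines.

```python
import time


def send_paced(frames, frame_ms, send, clock=time.monotonic, sleep=time.sleep):
    """Send one frame every `frame_ms` milliseconds, measured from a fixed start.

    Each deadline is computed from the start time, not from the previous
    send, so a slow iteration shortens the next wait instead of shifting
    every later frame.
    """
    start = clock()
    for index, frame in enumerate(frames):
        deadline = start + index * frame_ms / 1000.0
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)  # wait only for the remaining time
        send(frame)


# Simulated clock: sending takes 5 ms of "work", frames are 20 ms apart.
state = {"t": 0.0}
sent = []


def fake_sleep(seconds):
    state["t"] += seconds


def fake_send(frame):
    sent.append((frame, state["t"]))
    state["t"] += 0.005


send_paced(["f0", "f1", "f2"], 20, fake_send, clock=lambda: state["t"], sleep=fake_sleep)
print(sent)
```

In the simulated run the frames leave 20 ms apart, where a fixed 20 ms sleep after each send would have spaced them 25 ms apart.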

In any case, once I had the code working and had played with some mixing algorithms, I published the code as it stood under the name OpenMCU and promptly forgot about it.

Some time afterwards, Derek Smithies added video mixing and changed the audio mixing to provide "directed" audio. This algorithm gives preference to the loudest incoming audio signal, which becomes the only audio heard by all connections, and the code also switches the video to the selected audio source to indicate who is speaking. That seemed to work OK for Derek, but I have to be honest and say that I never really liked the idea. I always wanted something that sounded like a telephone conference call, which means multiple people being able to talk all at once, right over the top of each other if need be, rather than "loudest gets all".
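A "loudest gets all" selection can be sketched in a few lines. How the directed-audio code actually measures loudness is not stated here, so the mean-square energy measure and the names `frame_energy` and `select_loudest` are assumptions made for illustration.

```python
def frame_energy(frame):
    """Mean-square energy of one frame of PCM samples (assumed loudness measure)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def select_loudest(inputs):
    """Return the name of the connection whose current frame is loudest, or None."""
    if not inputs:
        return None
    return max(inputs, key=lambda name: frame_energy(inputs[name]))


inputs = {
    "a": [10, -10, 12, -8],          # background noise
    "b": [3000, -2500, 2800, -3100], # active speaker
    "c": [0, 0, 0, 0],               # silence
}
speaker = select_loudest(inputs)
print(speaker)
```

Every connection would then receive only the selected frame, and the video would switch to the same source, which is exactly what prevents people from talking over each other.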

That's really all there is to it. OpenMCU was never really intended to be a "product". It was just an idea put into code, and there is a lot of scope for improvement. The code could be made much more efficient and robust: many people have reported instability on SMP machines, so I suspect there are a few race conditions outstanding. It would also be good to add support for the standard H.323 MCU commands that allow retrieval of the members of each room, lists of available rooms, and full attendee management.

I have no plans to do any of this myself in the near future, but if you need an MCU and are willing to pay, please contact me.

The original text is as follows:

I sincerely hope this New Year is better than the last one, but then, I guess it has to be as it could scarcely be worse.

There have been a few discussions regarding OpenMCU on the lists recently, so I thought it might be a good time to retell the story of how that particular program came into existence. Hopefully this will help people understand the raison d'etre behind this particular piece of software and may assist them in better evaluating whether it suits their needs.

Back in the heady days before OpenH323 was the international success it is now (yeah right), I had plenty of time to think about, and to write code for, all sorts of interesting ideas. During that time, I remember waking up in the middle of the night with the idea of how to implement audio mixing as required by an MCU (Multi Conference Unit) fully formed in my head. The following weekend I implemented that idea in an orgy of coding that went on for two days, and openMCU was born.

The audio mixing algorithm for openMCU is very simple and is based on two concepts. The first key idea is that all of the audio paths need to be converted to/from PCM so they can be mixed using simple algebraic operations. While I know it is possible to mix audio data like G.723.1 or G.729 without first converting them to PCM (I've seen some really kinky stuff done with MPEG video streams whereby titles and watermarks are added to video streams in real-time by direct manipulation of the DCT coefficients, so audio mixing in the compressed domain has got to be a walk in the park), that is way beyond the scope of what I intended to implement at the time.

The second key idea is that the incoming audio for each connection is copied into a separate queue for each of the other connections in a conference as it arrives, and the outgoing audio for each connection is created from the algebraic sum of the incoming queues for that connection. That sounds complex, but it is really simple: in a conference with n connections, each connection has n-1 queues that contain a copy of the incoming audio from each of the other connections. When an incoming packet of audio arrives on connection x, it is copied into the front of the correct queue for every connection 1 through connection n, but not connection x. When it is time to send a packet of audio on a connection x, it is created by mixing together the correct amount of the data off the back of each of the queues for that connection.

The intent was to emulate how the analog mixers in telephone conference systems work. I did some tests and it worked well - multiple audio streams could be present at the same time (I tested by mixing a pre-recorded message and audio from a CD) and each stream could be heard distinctly separate - just like a telephone conference call.

It also turned out that this approach has several additional nice properties. The connections associated with a conference are unsynchronised which removes a lot of complexity. As long as the audio data arrives at the correct rate then everything just works. It also removed the problem of local echo, as the incoming audio for each connection does not contain a copy of its own outgoing signal. And finally, it allowed the mixing algorithm to be changed at will because each channel has a complete copy of all of the source signals. This was a nice feature as it meant I could experiment with lots of different mixing methods without rewriting great chunks of code.

When I combined all of this into a H.323-based infrastructure, I was completely focussed on the implementation of the audio mixing algorithm. Issues of efficiency [...] - it was a "straight line between two points" implementation. Looking at the code afterwards, it's pretty obvious [...]

It became quickly apparent that the approach had several fundamental problems. The base noise level was raised with each additional connection, as multiple background noise signals were added together thus creating a loud "hiss" with more than about 4 connections. I experimented with scaling the channel amplitude in proportion to the number of channels, but all that did was decrease the volume of each channel resulting in an inaudible blur with more than about 5 connections! In the end, the use of silence suppression fixed most of the noise problem.

The issue of algorithmic complexity and storage at large numbers of connections was fairly apparent, as the algorithm being used is O(n^2). Using a single mixer for the entire conference would result in a O(n) algorithm, but would also re-introduce local echo. I think the best solution for this is to use a number of partial sums which would make the code more complicated, but would [...] give an algorithmic complexity somewhere between [...] I never got around [...]

Another serious problem is the code operates completely without use of hardware timers. This means that audio quality will drop dramatically when the CPU reaches saturation, as the timing of the outgoing audio is lost.

In any case, once I had the code working and I had played with some mixing algorithms I published the code as it stood under the name OpenMCU and promptly forgot it.

Some time afterwards, Derek Smithies added video mixing and changed the audio mixing to provide "directed" audio. This algorithm gives preference to the loudest incoming audio signal, which becomes the only audio heard by all connections, and the code also switches the video to come from the selected audio source as well to give an indication of who is speaking. That seemed to work OK for Derek, but I have to be honest and say that I never really liked that idea. I always wanted something that sounded like a telephone conference call, and that means the ability for multiple people to talk all at once, right over the top of each other if need be, rather than "loudest gets all".

That's really all there is to it. openMCU was never really intended to be a "product" - it was just an idea that was put into code and there is a lot of scope for improvement. The code could be made much more efficient and robust - there have been many people who have reported instability on SMP machines, so I suspect there are a few race conditions outstanding. It would also be good to add support for the standard H.323 MCU commands that allow retrieval of members of each room, lists of available rooms and to do full attendee management.

I've no plan to do any of this myself in the near future, but if [...] to pay [...] please contact [...]

When reposting, please credit the original address: https://www.9cbs.com/read-113509.html
