From: http://blog.9cbs.net/9cbs_gamedev/Archive/2005/04/04/336216.aspx
3D Sound VS Surround Sound
In game development, sound has never enjoyed the same status as graphics. Developers spend most of their time adding new 3D graphics features and effects; persuading them to invest more time and money in developing a game with high-quality audio is very difficult. On the hardware side, players are happy to buy the latest 3D graphics accelerators but seem indifferent to new sound cards.
However, as graphics card development approaches a plateau, players are becoming ever more demanding. An excellent game needs more than attractive visuals and beautiful effects, so the situation is now changing sharply: both users and developers pay more attention to the audio system than ever before. In modern game development plans, sound can account for 40% of the budget, schedule, and manpower.
Sound chip manufacturers and 3D audio developers have been trying to convince users and application developers that good 3D sound is a key component of the modern multimedia computer.
In the past, "stereo sound" was a rather vague notion; with 3D sound we have entered a new era of multi-channel audio: 4.1, 5.1, and 7.1 channels.
Now let us take a closer look at 3D sound and see how it differs from multi-channel solutions.
Figure 1: 3D concept
The concept of 3D sound is to position sound sources precisely in the 3D space around the listener. In a virtual game world, every object that can emit sound is a sound source.
We will use the typical first-person shooter "Vivisector: Beast Inside" to illustrate the topics in this article. The scene contains a listener and a number of sound sources. Some are stereo (such as background music; in this particular game, the wind and the jungle provide the main ambient noise); there are 8 monster sources; the player's gunfire, footsteps, and so on account for 16 sources; and there are 3 environmental sources (insects, birds, etc.).
To make a scene sound more realistic, 3D audio in a virtual world must simulate (or exaggerate) the acoustics of the real world. A variety of audio processing techniques are used: reverberation, reflection, occlusion, obstruction, distance effects (based on the distance between source and listener), and so on.
3D audio technology: positioning
Everyone perceives sound differently (depending on ear shape, age, and psychological state), so no single 3D technology can offer just one quality setting across different sound cards and processing effects. Whether sound is reproduced faithfully depends mainly on the sound card and speakers, and also on the sound processing engine in the game.
Figure 2: 3D Space
Now let's look at how 3D sound is generated, starting with 2D panning (a technique still in use since id Software's Doom). In this technique, each mono sound source is handled as a stereo pair whose left and right volume levels can be adjusted relative to each other. Although such a system has no vertical positioning, it can still modify the character of the sound (for example, with high-frequency filtering), so that when a sound comes from behind the listener, it is heard as muffled.
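The 2D panning described above can be sketched in a few lines. This is a generic constant-power pan law, not any particular vendor's implementation:

```python
import math

def pan_2d(sample: float, azimuth_deg: float) -> tuple[float, float]:
    """Constant-power 2D pan for a mono sample.

    azimuth_deg: -90 = hard left, 0 = center, +90 = hard right.
    Returns the (left, right) output pair.
    """
    # Map the azimuth to a pan angle in [0, pi/2]; cos/sin gains keep
    # the total power (L^2 + R^2) constant as the source moves.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return sample * math.cos(theta), sample * math.sin(theta)
```

A centered source gets roughly 0.707 gain on each channel, so the summed power stays the same as at the extremes.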
Modern hardware can already achieve this effect. Sound cards use HRTF (Head Related Transfer Function) technology to simulate the position of a source through two speakers or headphones, mimicking human hearing by filtering and other transformations. An HRTF describes how sound arriving from a given point in space is transformed on its way to each of the two ears. As sound propagates, our head and body act as obstacles that alter it. Our outer ears shape sounds arriving from behind differently, and we perceive those changes; the signals then enter the brain and are decoded to determine the source's correct position in space.
Figure 3: HRTF (Head Related Transfer Function)
In the figure above you can see three HRTFs for the left and right ear, measured for different source positions (135 and 36 degrees). The processing of such data is basically the same everywhere; the usual practice is to record it with special microphones in a dummy head. Sensaura uses artificially synthesized HRTFs with smoothed curves (for example, a peak at 2500 Hz and a dip at 5000 Hz), while other companies usually use averaged measured HRTFs.
The HRTF system above consists of two FIR filters whose transfer functions are the HRTFs. Since HRTFs vary smoothly with position, storing a huge library of them would waste capacity: realistic source positioning can be achieved by interpolating between stored HRTFs.
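The interpolation-plus-FIR idea can be sketched as follows. The impulse responses here are toy values, not real measured HRTF data:

```python
def lerp_hrtf(h_a: list[float], h_b: list[float], t: float) -> list[float]:
    """Linearly interpolate two HRTF impulse responses of equal length
    (t = 0 gives h_a, t = 1 gives h_b)."""
    return [a * (1.0 - t) + b * t for a, b in zip(h_a, h_b)]

def fir_filter(signal: list[float], h: list[float]) -> list[float]:
    """Direct-form FIR convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * signal[n - k]
        out.append(acc)
    return out
```

In practice one such filter runs per ear, and `t` tracks the source's angle between the two nearest measured positions.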
Limitations of HRTF
Severe distortion of localization creeps in slowly. If sources are fixed, their locations cannot be determined accurately, because the human brain needs relative movement (of the source, or of the listener's head relative to the source) to position a source precisely in space.
People often turn their heads suddenly toward a sound, and in the instant the head turns, the brain learns the sound's spatial position. Comparing the front and rear HRTF responses: if a source has no distinctive frequency content, the brain tends to ignore it; otherwise it compares the incoming data with what is stored in memory and localizes the source in space.
Headphones give the most accurate 3D audio effect, because they eliminate the problem of one ear's signal leaking into the other. However, most people dislike wearing headphones, even wireless models.
In addition, with headphones sounds tend to be perceived as closer than intended, a problem that has yet to be solved.
Figure 4: Optimal listening position and crosstalk
Speakers avoid the problems that appear with headphones, but new problems arise. First, it is not obvious how to produce the right signals with speakers: after HRTF processing, how do we keep each ear's signal from reaching the other ear? With speakers instead of headphones, both ears hear both channels; the solution is crosstalk cancellation (CC).
All 3D audio effects are heard correctly only at the optimal listening position; elsewhere the sound is distorted. We therefore need to choose the correct position when listening.
For a pair of speakers there is a position of best balance, imaging, and detail, called the sweet spot. It matters most when listening on monitors during recording and mixing. The sweet spot is usually located midway between a stereo pair, a few feet in front of it. Many experts consider the sweet spot to be the apex of the equilateral triangle whose other corners are the tweeters. Because of various practical conditions this position may shift somewhat: reflections from the desk or panels and differences between the speakers also affect the sweet spot, and some speakers have a wider one than others. Its exact location is usually found by repeated listening and adjustment. The wider the sweet spot, the better, which is why developers keep looking for ways to extend its coverage.
Figure 5: Multi-speaker configuration
In multi-speaker systems (4.1, 5.1), sound comes from speakers placed around the listener; because each sound arrives from a different speaker, the listener can localize the sources.
As a rule, simple panning is sufficient: all speakers play the same streams synchronously (depending on the speaker layout) but at different volume levels, which produces the localization. For example, Dolby Digital configurations use 6 and 8 audio streams for 5.1 and 7.1 respectively.
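Multi-speaker panning of this kind can be sketched as a set of per-speaker gains. This is a simple pairwise pan over a hypothetical quad layout, not any vendor's actual algorithm:

```python
import math

SPEAKER_ANGLES = [-45.0, 45.0, -135.0, 135.0]  # front L/R, rear L/R (degrees)

def quad_pan_gains(source_angle_deg: float) -> list[float]:
    """Per-speaker gains for a source at a given azimuth.

    Each speaker contributes according to its angular proximity to
    the source; gains are normalized to constant total power.
    """
    raw = []
    for spk in SPEAKER_ANGLES:
        # Wrap the angular difference into [-180, 180].
        diff = (source_angle_deg - spk + 180.0) % 360.0 - 180.0
        # A speaker contributes only within 90 degrees of the source.
        raw.append(math.cos(math.radians(diff)) if abs(diff) < 90.0 else 0.0)
    norm = math.sqrt(sum(g * g for g in raw)) or 1.0
    return [g / norm for g in raw]
```

A source straight ahead splits equally between the two front speakers; a source exactly at a speaker's angle plays only through that speaker.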
Sensaura MultiDrive and Creative CMSS (Creative Multi Speaker Surround) technologies can use four or more speakers to reproduce HRTF-processed sound.
Sensaura MultiDrive 3D sound requires at least four speaker channels to render 3D positioning, with each speaker outputting different content. Creative Multi Speaker Surround (CMSS) can process any mono or stereo source into a 360-degree sound field.
The speaker field is divided into front and rear hemispheres. Because the sound field is built from HRTF functions, in the sweet spot the listener perceives positioning both to the sides and along the front/rear axis. As the coverage widens, the sweet-spot region becomes sufficiently large.
Without crosstalk cancellation (CC), source positioning is impossible. Since MultiDrive applies HRTF to four or more speakers, the CC algorithm must run on all four, and this demands very powerful computation from the audio processing chip.
With HRTF, the rear speakers can position sound as precisely as the front ones. The front speakers are usually placed near the monitor, the subwoofer can go on the floor in the middle, and the rear speakers can be placed almost anywhere around the listener.
Remember that HRTF plus CC on a four-speaker system requires a great deal of computing power, so manufacturers have responded in various ways. For example, Aureal (since acquired by Creative) used a plain panning algorithm on the rear speakers, where the positioning requirements are less strict.
NVIDIA uses Dolby Digital 5.1 for 3D audio: the positioned audio stream is encoded into AC-3 format and delivered in digital form to an external decoder (a home theater receiver, for example).
Minimum/maximum distance, air effects, MacroFX (Min/Max Distance, Air Effects, MacroFX)
Figure 6: Distance model
One of the main features of a sound engine is its distance model: the farther the source, the quieter the sound. The simplest method is to reduce the volume level with distance. The sound designer assigns a minimum distance at which the sound begins to fade; within that distance the sound only changes direction. Beyond it, each doubling of the distance reduces the intensity by 6 dB. Up to the maximum distance the sound keeps attenuating; past it, the sound is no longer heard because it is too far away. Once the volume drops to a threshold level, the engine turns the sound off to free resources. The larger the maximum distance, the longer the sound remains audible. In most cases the volume follows a logarithmic law. Designers can distinguish loud and quiet sources by giving them different minimum and maximum distances: a mosquito cannot be heard beyond 50 cm, while an aircraft engine is still clearly audible several kilometers away.
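The distance model described above can be sketched directly; the -6 dB-per-doubling law is the standard DS3D-style logarithmic rolloff:

```python
import math

def distance_gain_db(distance: float, min_dist: float, max_dist: float):
    """Attenuation in dB for a source at a given distance.

    No attenuation inside min_dist; -6 dB per doubling of distance
    beyond it; None past max_dist (the engine frees the voice there).
    """
    if distance <= min_dist:
        return 0.0
    if distance >= max_dist:
        return None  # too far: sound is turned off
    return -6.0 * math.log2(distance / min_dist)
```

The mosquito-vs-aircraft example corresponds to calling this with `min_dist`/`max_dist` pairs like (0.05, 0.5) and (50, 5000).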
A3D and EAX HF rolloff
The A3D API extends the DirectSound3D distance model with high-frequency attenuation: as in the real world, high frequencies are absorbed by the air at roughly 0.05 dB per meter (at a selectable reference frequency, 5000 Hz by default). In foggy weather the air is denser, so the high frequencies attenuate faster. EAX3 models air absorption at two reference frequencies, low and high, whose effect depends on the parameters of the environment.
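A minimal sketch of this air-absorption rolloff, using the figures quoted above (0.05 dB per meter at the reference frequency; the exact numbers and curve shapes in A3D/EAX may differ):

```python
def hf_rolloff_db(distance_m: float, db_per_meter: float = 0.05) -> float:
    """High-frequency loss at the reference frequency (default rate
    ~0.05 dB/m); denser, foggy air is modeled by raising db_per_meter."""
    return -db_per_meter * distance_m

def db_to_gain(db: float) -> float:
    """Convert a dB value to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)
```

At 100 m this gives -5 dB of extra high-frequency loss on top of the ordinary distance attenuation.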
MacroFX
Most HRTF measurements are made in the far field, which simplifies calculation; but if a source comes within about one meter (the near field), ordinary HRTFs no longer work properly. This is where MacroFX comes in: it reproduces sounds in the near field. The MacroFX algorithm positions sounds extremely close to the listener, as if the sound travels from the speakers right up to, or even into, the listener's ear. The effect is based on accurate modeling of sound-wave propagation in the space immediately around the listener, implemented with efficient algorithms.
The algorithm is integrated into the Sensaura engine and driven through DirectSound3D, i.e. it is transparent to the application developer, who can use it to create many new effects.
For example, in a flight simulator, the pilot-listener can hear the air traffic controller's voice as if he were wearing a headset.
Doppler, volumetric sound sources (ZoomFX), multiple listeners
Doppler effect: when the distance between a source and the observation point changes over time, the observed wave frequency shifts. Racing and flight games benefit most from the Doppler effect, while in shooters it can be applied to rockets, laser or plasma bolts, i.e. any fast-moving object.
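The frequency shift itself is the classical Doppler formula for motion along the source-listener line, which any engine can compute per frame:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def doppler_frequency(f_source: float, v_source: float,
                      v_listener: float) -> float:
    """Observed frequency for motion along the source-listener line.

    Positive v_source means the source moves toward the listener;
    positive v_listener means the listener moves toward the source.
    """
    return f_source * (SPEED_OF_SOUND + v_listener) / (SPEED_OF_SOUND - v_source)
```

An approaching rocket is pitched up, a receding one pitched down; in practice the engine resamples the sound by the ratio `doppler_frequency(f, vs, vl) / f`.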
Volumetric sound sources
Volumetric source effects let designers create large sound sources. Think about it: a running person or a small weapon is a very small (point) source; but a cheering crowd, a huge generator, or a busy highway emits sound over a wide area. Compared with a point source, a large, composite source achieves a much more realistic effect.
A point source works well for a large but distant object, such as a moving train. In real life, when the train comes close, it no longer sounds like a point to the listener. The DS3D model, however, still treats it as a point source, which sounds unrealistic (i.e. as if a tiny train were approaching instead of a huge one). Aureal first supported large sources in its A3D 3.0 API; then Sensaura added support in ZoomFX. ZoomFX defines several sound sources as one large object (for example, a train can be composed of wheels, engine, coupled cars, and so on).
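The composite-source idea can be sketched as a handful of point sub-sources; the class and method names here are illustrative, not the actual ZoomFX or A3D API:

```python
import math

class ZoomSource:
    """A large ('volumetric') source modeled as several point
    sub-sources, in the spirit of ZoomFX (hypothetical sketch)."""

    def __init__(self, sub_positions: list[tuple[float, float]]):
        self.sub_positions = sub_positions  # (x, y) of each sub-source

    def angular_width_deg(self, listener: tuple[float, float]) -> float:
        """Apparent angular size of the source as seen by the listener;
        unlike a point source, it grows as the listener approaches."""
        angles = [math.degrees(math.atan2(y - listener[1], x - listener[0]))
                  for x, y in self.sub_positions]
        return max(angles) - min(angles)
```

A train built from two sub-sources ten meters apart looks essentially point-like from a kilometer away but spans tens of degrees up close, which is exactly the perceptual difference the text describes.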
Figure 8: Multiple listeners
Multiple Listeners is a new technique supporting two or more players on game consoles (PlayStation 2, Xbox, GameCube). For example, in the PS2 game "Gran Turismo 3" (Polyphony Digital Inc.) in split-screen mode, the two players are in different areas of the game world, so each should hear only the sounds near his own car. They can of course still hear each other, but this technology simplifies the implementation. Unfortunately, no hardware API currently supports multiple listeners; the technique is available only in the commercial sound API FMOD, which we will discuss in detail later.
3D audio technology: wave tracing vs reverb (WaveTracing vs Reverb)
Figure 9: A variety of sound effect technologies
In 1997-1998, every chip manufacturer stepped up development of what it considered the audio technology of the future. Aureal, then the industry leader, bet on extreme realism with its WaveTracing technique. Creative believed that precomputed reverb would be more efficient, so it developed EAX. Creative had acquired Ensoniq/E-mu in 1997, companies specializing in the development and manufacture of sound chips, which is why it already had reverb technology at the time. When Sensaura appeared on the market it used EAX as a foundation; its technology, named EnvironmentFX, actually comprised MultiDrive, ZoomFX, and MacroFX. NVIDIA is the newest manufacturer in this field: it implemented real-time Dolby Digital 5.1 encoding for 3D sound positioning.
Wave tracing (WaveTracing)
Figure 10: Sound paths / wave tracing
To integrate sound fully into a game, the interaction between the acoustic environment and the sound sources must be computed. As sound propagates, the waves interact with the environment. Sound waves can reach the listener along several different paths:
Direct path (Direct Path)
1st-order reflections (1st Order Reflections)
2nd-order and late reflections (2nd Order or Late Reflections)
Occlusion (Occlusion)
Aureal's wave tracing algorithm analyzes the geometric description of the 3D space and then determines, in real time, how sound waves propagate: whether they are reflected by, or pass through, elements of the 3D environment.
The geometry engine is a unique mechanism of the A3D interface that models reflections and occluding obstacles. It processes data at the geometry level: lines, triangles, and quadrangles (acoustic geometry).
Each audio polygon has its own position, size, shape, and material properties. Its position relative to the sound sources and the listener determines whether each individual sound is reflected by, transmitted through, or diffracted around the polygon. The material properties determine how much of the transmitted sound is absorbed or reflected.
The graphics geometry database can be run through a converter that translates all graphics polygons into audio polygons when a game level loads. Global reflection and occlusion values can be adjusted by setting parameters. Alternatively, in advanced mode, the polygon conversion can be done offline and the audio geometry database stored as a separate file, then loaded together with the level. In the end the sound acquires a much more natural character: mixed 3D sound, shaped by the acoustics of the room and environment, is reproduced accurately at the listener's ears. Still, the environmental model realized by Aureal is not ideal, and the same is true even of Creative's latest version of EAX.
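The most basic query such a geometry engine answers is whether the direct path from source to listener is blocked. A minimal 2D sketch (real engines work on 3D triangles, not 2D wall segments):

```python
def segments_intersect(p1, p2, p3, p4) -> bool:
    """True if segment p1-p2 properly crosses segment p3-p4 (2D)."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1 = cross(p3, p4, p1)
    d2 = cross(p3, p4, p2)
    d3 = cross(p1, p2, p3)
    d4 = cross(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def direct_path_occluded(source, listener, walls) -> bool:
    """The wall list stands in for the audio-geometry database: the
    direct path is occluded if the source-listener segment crosses
    any wall segment."""
    return any(segments_intersect(source, listener, a, b) for a, b in walls)
```

The same test, applied to reflected ray paths instead of the direct one, is the core of first-order reflection tracing.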
However impressive wave tracing is, the hardware resources available for computing reflections are very limited. That is why truly realistic sound still has a long way to go: for example, late reflections currently cannot be traced at all. Moreover, wave tracing is inflexible and very expensive to implement, which is why its results are not dramatically better than EAX's precomputed reverb, much as 3D graphics do not yet render in real time with ray tracing.
Occlusion
Now let us look at the occlusion effect. In principle it could be implemented by simply lowering the volume, but a more realistic implementation uses low-pass filtering.
Figure 11: Occlusion
In most cases one type of occlusion is sufficient: the source is positioned behind an obstacle and cannot be seen. The direct path is blocked, and the degree of filtering should depend on the geometry (thickness) and the material of the wall. Since there is no direct contact between source and listener, the source's reverberation is suppressed according to the same principle.
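The low-pass filtering mentioned above can be sketched with a one-pole filter; the mapping from wall thickness/material to the `amount` parameter is the designer's choice, not a standard formula:

```python
def occlude(samples: list[float], amount: float) -> list[float]:
    """One-pole low-pass as a crude occlusion model.

    amount in [0, 1]: 0 passes the signal through unchanged,
    values near 1 muffle it heavily (thick wall).
    Recurrence: y[n] = (1 - a) * x[n] + a * y[n-1]
    """
    a = max(0.0, min(0.99, amount))
    out, prev = [], 0.0
    for x in samples:
        prev = (1.0 - a) * x + a * prev
        out.append(prev)
    return out
```

High frequencies (fast changes in the signal) are smoothed away while the low-frequency rumble passes, which is exactly the "voice behind a wall" effect.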
Figure 12: Obstruction
Creative's API developers use a more practical concept, obstruction: an obstacle blocks the direct path between source and listener, but both are in the same room, so the reflections still reach the listener's ears unchanged.
Figure 12: Exclusion
The opposite case is exclusion: source and listener are in different rooms but have a direct connection, so the direct sound reaches the listener while the reflected sound is distorted (depending on the thickness, shape, and material properties of the partition).
In short, whichever technology is used (Aureal A3D, Creative Labs EAX, or your own hand-rolled engine), the geometry must be traced (fully or partially) to find out whether the listener has direct contact with the source. This has a big impact on performance, which is why in most cases a simplified geometric space is built just for sound (to get a more realistic effect, especially in shooters, 3D RPGs, and similar games). Fortunately, such simplified geometry usually already exists for collision detection, so the same geometry can be reused to render more sound detail.
Environment morphing (Environments Morphing)
Figure 13: Environment morphing
Another Creative Labs solution is EAX3, released in 2001. It includes an algorithm for gradually morphing the parameters of one environment into those of another. The figure above demonstrates two ways this works.
The first is a gradual transition: the reverb parameters change smoothly between two very different environments around the player (in this case, outdoor space and the interior of a house). As the player gets closer to the outdoors, the outdoor reverb parameters take more effect, and vice versa. The second is a boundary switch: when the player crosses the border of an area, the parameters change at once.
Environment morphing is the most important reverb-related function. Previously there was a problem when modifying preset parameters. Even when not used for gradual transitions, these functions can create an intermediate environment by setting the morph factor to 0.5 (for example, when we stand in an outdoor stone corridor), giving the average of two different sound fields. Before environment morphing was developed, games (such as Carnivores 2) could not blend different parameter sets; they could only use the 25 presets defined in EAX1 and EAX2. For example, for a set of caves opening into a valley, "stone corridor" was chosen by ear as the intermediate preset. Now, with morphing, a great deal of this complicated work can be avoided.
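Morphing between two reverb presets is essentially linear interpolation of their parameters. The parameter names below are illustrative, not actual EAX property fields:

```python
def morph_environment(env_a: dict, env_b: dict, t: float) -> dict:
    """Blend two reverb presets: t = 0 gives env_a, t = 1 gives env_b,
    and t = 0.5 gives the 'average' intermediate environment."""
    return {key: env_a[key] * (1.0 - t) + env_b[key] * t for key in env_a}

# Hypothetical presets for the outdoor/indoor example in the text.
outdoors = {"decay_time_s": 0.8, "wet_level": 0.2}
stone_room = {"decay_time_s": 2.4, "wet_level": 0.6}
```

The engine would recompute `t` each frame from the player's position between the two zones and feed the blended values to the reverb unit.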
Interfaces and APIs
Figure 14: Various popular API technology
Let us now discuss the APIs used for programming audio engines. The available options are: Windows Multimedia, DirectSound, OpenAL, and Aureal A3D.
Unfortunately, the Aureal A3D drivers are still buggy; on today's most popular operating systems, Windows 2000 and XP, their stability and performance remain poor.
Windows Multimedia (WinMM) is the most basic sound playback system, inherited from early Windows 3.1. Its large buffers cause considerable latency, so it is rarely used in games; however, the WinMM path used by certain semi-professional sound cards is specially optimized in their WDM drivers.
OpenAL is a cross-platform API from Loki Entertainment, conceived as an audio counterpart to OpenGL, with drivers from Creative as an alternative to DirectSound. The idea is very good, but reality has been cruel: adoption has been poor, and Loki Entertainment recently announced bankruptcy. We hope a replacement sound API appears soon, because OpenAL has been a nightmare for programmers. However, NVIDIA recently released OpenAL hardware drivers for its nForce chipset, which came as a surprise.
DirectSound and DirectSound3D are currently the best APIs. That they have no strong rivals speaks for itself; after all, they can reproduce sound properly without any outside help.
On top of these hardware APIs (APIs backed by hardware drivers, as opposed to software mixing in DirectSound or WinMM) sit so-called wrappers: libraries that use the ready-made hardware/software interfaces to build their own application interface.
As a rule, each game uses its own wrapper API. There are currently many such wrapper packages (without real hardware support of their own): Miles Sound System, RenderWare Audio, GameCODA, FMOD, Galaxy, BASS, SEAL.
Miles Sound System is one of the most famous: some 2700 games use the package. It absorbed Intel's RSX technology and is now one of the options for software 3D sound. It offers many features, but they do not make up for its defects: it runs only on Win32 and Mac platforms and requires extremely expensive licensing fees.
Galaxy Audio was developed for Unreal and is used in all Unreal-based engines; but Unreal 2 uses OpenAL, which is why we can consider Galaxy dead.
GameCODA and RenderWare Audio, from Sensaura and RenderWare respectively, are of roughly the same scale: both support PC, PS2, GameCube, Xbox, and many other features, but their licensing costs are also very high.
FMOD, a more recently introduced technology, offers a wide range of functions and excellent support for the hardware APIs, and currently holds the lead.
EAX (Environmental Audio Extensions)
EAX stands for Environmental Audio Extensions, an API standard introduced by Creative with the launch of its SB Live! sound cards. It mainly models specific environments, such as concert halls, corridors, rooms, and caves, as sound effects. When a game requests a special effect, the sound card processes it through DirectX and the driver, reproducing how sounds behave in different environments, and produces surround effects through multiple speakers. EAX started at version 1.0 and is currently at 4.0; many games now support the specification.
EAX Advanced HD (High Quality Audio and 3D Audio Technology)
In 2001, Creative announced the Audigy sound card together with a new set of EAX features called EAX Advanced HD. It includes 25 parameters for fine listener adjustment and 18 for sources (two of which serve the new occlusion effects).
Figure 15: EAX Advanced HD mode
● User-selectable settings optimized for headphones, 2, 4, or 5.1 speaker systems, and external A/V amplifier systems
● Dolby Digital decoding with output to 5.1 speakers in analog or digital form
● Scalable 3D audio architecture
● Hardware-accelerated EAX Advanced HD for games
● Creative Multi Speaker Surround (CMSS) technology that can turn any mono or stereo source into 360-degree sound
● EAX preset effects: user-selectable DSP modes simulating acoustic environments
● Advanced Time Scaling technology that adjusts playback speed without changing pitch
● Audio clean-up that removes background noise from tape recordings and clicks from records
Figure 16:
These effects are not typical realistic effects; they are used to create moods, for example dizziness or excitement. There are also depth (0...1) and modulation time (0.4...4 seconds) parameters.
EAX4 (Eax Advanced HD Version 4)
In March 2003, Creative released EAX Advanced HD version 4, expected to become officially available at the end of April or early May. Unfortunately, Creative has not described its technical details; the differences between EAX3 and EAX4 are known only conceptually.
EAX Advanced HD version 4 has the following new elements:
Studio-quality effects (Studio Quality Effects)
Multi-effect slots (Multi-Effect Slots)
Multiple environments and zoned effects (Multiple Environments and Zoned Effects)
Studio-quality effects
EAX4 provides 11 studio-quality effects, which can be applied to both 2D and 3D sources:
AGC Compressor - automatically adjusts the volume level
Auto-Wah - an automatic wah-pedal effect
Chorus - makes a single instrument sound like several
Distortion - simulates an "overdriven" amplifier
Echo - adds spatial echoes
Equalizer - a 4-band equalizer
Flanger - produces a sweeping "jet plane" effect
Frequency Shifter - shifts all frequencies of the signal
Vocal Morpher - phoneme-based voice transformation
Pitch Shifter - transposes the pitch
Ring Modulator - multiplies the signal with a carrier wave
Environment Reverb - the basic EAX reverberation component
Multi-effect slots
Several effects can be combined. For example, you can hear the sound of several environments at once, or gradually add distortion to an environment.
Figure 17: EAX Advanced HD scheme
In EAX4, the source and the listener each have their own environment; sound propagates from the source through its own environment and then through the listener's; occlusion, obstruction, and exclusion apply to the source's and the listener's environments simultaneously. Thus we obtain interaction effects between the two environments.
Zoned effects
The concept of a zone is very similar to that of a room or environment.
Zoned effects are the most promising technology, but implementing them is far harder than the theory suggests. The main current difficulties are determining the position of each source, assigning each source its nearest loaded zone, and tracking each source's diffusion, occlusion, and obstruction parameters. Of course, we do not need every effect EAX4 provides; in real work we use only the effects we need.