Dynamic range: compressed or standard. Sound compression: principles and configuration. The tasks that compression solves

The second part of the series is devoted to functions for optimizing the dynamic range of images. In it we will explain why such solutions are needed and consider various options for their implementation, as well as their advantages and disadvantages.

Capturing the intangible

Ideally, the camera should capture an image of the surrounding world exactly as a person perceives it. However, because the "vision" mechanisms of the camera and of the human eye differ significantly, a number of restrictions prevent this condition from being met.

One of the problems that users of film cameras faced in the past, and that owners of digital cameras face now, is the inability to adequately capture scenes with a large brightness range without using special accessories and/or special shooting techniques. The peculiarities of the human visual apparatus allow us to perceive the details of high-contrast scenes equally well in both brightly lit and dark areas. Unfortunately, the camera sensor is not always able to capture the image the way we see it.

The greater the brightness range of the photographed scene, the higher the probability of losing detail in the highlights and/or shadows. As a result, instead of a blue sky with lush clouds, the picture shows only a whitish patch, and objects in the shade turn into vague dark silhouettes or merge with their surroundings.

In classic photography, the ability of a camera (or of the film in film cameras) to convey a certain range of brightness is assessed using the concept of photographic latitude (see the sidebar for details). Theoretically, the photographic latitude of digital cameras is determined by the bit depth of the analog-to-digital converter (ADC). For example, with an 8-bit ADC, taking the quantization error into account, the theoretically achievable photographic latitude is 7 EV; for a 12-bit ADC it is 11 EV, and so on. However, in real devices the dynamic range of images turns out to be narrower than the theoretical maximum because of noise of various kinds and other factors.

A large difference in brightness levels is a serious problem when shooting. In this case the camera's capabilities were not sufficient to adequately convey the brightest areas of the scene, and as a result, instead of a patch of blue sky (marked by hatching), the picture shows a white "blotch"

The maximum brightness value that the photosensitive sensor can record is determined by the saturation level of its cells. The minimum value depends on several factors, including the amount of thermal noise of the sensor, the charge-transfer noise and the ADC error.

It should also be noted that the photographic latitude of one and the same digital camera may vary depending on the sensitivity set in its settings. The maximum dynamic range is achieved at the so-called base sensitivity (corresponding to the lowest of the available numerical values). As this parameter increases, the dynamic range shrinks because of the rising noise level.

The photographic latitude of modern digital cameras equipped with large sensors and 14- or 16-bit ADCs ranges from 9 to 11 EV, which is considerably more than the corresponding figure for 35 mm color negative film (on average 4 to 5 EV). Thus, even relatively inexpensive digital cameras have photographic latitude sufficient to adequately render most typical amateur shooting scenes.

However, there is a problem of a different kind, related to the limitations imposed by the existing standards for recording digital images. With the JPEG format and its 8 bits per color channel (which has become the de facto standard for recording digital images in the computer industry and digital technology), it is even theoretically impossible to save a picture with a photographic latitude of more than 8 EV.

Suppose the camera's ADC produces an image with a bit depth of 12 or 14 bits that contains distinguishable detail both in the highlights and in the shadows. If the photographic latitude of this image exceeds 8 EV, then during conversion to the standard 8-bit format without any additional processing (that is, by simply discarding the "extra" bits) part of the information recorded by the photosensitive sensor will be lost.

Dynamic range and photographic latitude

Put simply, the dynamic range is defined as the ratio of the maximum brightness value of an image to its minimum value. In classic photography the term photographic latitude is traditionally used, which in essence means the same thing.

The width of the dynamic range can be expressed as a ratio (for example, 1000:1, 2500:1, and so on), but a logarithmic scale is most often used. In this case the decimal logarithm of the ratio of the maximum brightness to its minimum value is calculated, and the number is followed by the capital letter D (from the English density), or, less often, by the abbreviation OD (from the English optical density). For example, if the ratio of the maximum brightness value to the minimum for some device is 1000:1, then its dynamic range is 3.0 D:

D = lg(1000) = 3.0.

To measure photographic latitude, so-called exposure units, denoted by the abbreviation EV (from the English Exposure Value; professionals often call them "stops" or "steps"), are traditionally used. It is in these units that the amount of exposure compensation is usually set in camera settings. An increase in photographic latitude by 1 EV is equivalent to doubling the difference between the maximum and minimum brightness levels. Thus the EV scale is also logarithmic, but base-2 logarithms are used to calculate the numerical values. For example, if a device can capture images in which the ratio of the maximum brightness value to the minimum reaches 256:1, then its photographic latitude is 8 EV:

EV = log2(256) = 8.
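Both conversions are one-liners; a quick sketch in Python (the function names are arbitrary) reproduces the two worked examples above:

```python
import math

def density(contrast_ratio: float) -> float:
    """Dynamic range in density units: D = lg(I_max / I_min)."""
    return math.log10(contrast_ratio)

def latitude_ev(contrast_ratio: float) -> float:
    """Photographic latitude in EV (stops): EV = log2(I_max / I_min)."""
    return math.log2(contrast_ratio)

print(density(1000))       # 3.0 D for a 1000:1 ratio
print(latitude_ev(256))    # 8.0 EV for a 256:1 ratio
```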

Compression: a reasonable compromise

The most effective way to preserve in full the image information recorded by the camera's photosensitive sensor is to record pictures in RAW format. However, far from all cameras have such a function, and not every photographer is prepared to do the painstaking work of selecting individual settings for every picture taken.

To reduce the likelihood of losing detail in high-contrast pictures converted inside the camera to 8-bit JPEG, many manufacturers (of compact as well as SLR models) have introduced special functions that compress the dynamic range of the saved images without user intervention. By reducing overall contrast and sacrificing a small part of the source image information, such solutions make it possible to preserve in the 8-bit JPEG file the highlight and shadow detail captured by the device's photosensitive sensor, even if the dynamic range of the source image was wider than 8 EV.

One of the pioneers in this field was HP. The HP Photosmart 945 digital camera, released in 2003, was the first to implement HP Adaptive Lighting technology, which automatically compensates for the lack of illumination in the dark areas of pictures and thus preserves shadow detail without the risk of overexposure (which is very relevant when shooting high-contrast scenes). The HP Adaptive Lighting algorithm is based on the principles set forth by the American scientist Edwin Land in his Retinex theory of visual perception.

HP Adaptive Lighting Features Menu

How does the Adaptive Lighting function work? After a 12-bit frame image is captured, an auxiliary monochrome image is extracted from it, which is essentially an illumination map. When the shot is processed, this map is used as a mask that controls the strength of a rather complex digital filter applied to the image: in areas corresponding to the darkest points of the map the effect on the future picture is minimal, and vice versa. This approach makes it possible to bring out shadow detail by selectively lightening these areas and, accordingly, reducing the overall contrast of the resulting image.
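HP has not published the exact filter, but the general idea of mask-driven shadow lightening can be sketched roughly as follows (the blur radius, the gain and the sign convention of the mask are illustrative choices, not Adaptive Lighting's actual parameters):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lighten_shadows(image: np.ndarray, gain: float = 0.6, blur_sigma: float = 25) -> np.ndarray:
    """Selectively lighten dark areas of a linear RGB image with values in [0, 1].

    A blurred luminance map serves as the mask: here, the darker the area,
    the stronger the lightening applied to it.
    """
    # Approximate luminance and smooth it so the mask follows large regions,
    # not individual pixels.
    luma = image @ np.array([0.299, 0.587, 0.114])
    mask = gaussian_filter(luma, sigma=blur_sigma)

    # Per-pixel exposure boost: strong where the mask is dark, about 1.0 where it is bright.
    boost = 1.0 + gain * (1.0 - mask)
    return np.clip(image * boost[..., None], 0.0, 1.0)
```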

Note that when the Adaptive Lighting function is enabled, the picture is processed as described above before the finished image is written to a file. All the operations are performed automatically; the user can only select one of the two Adaptive Lighting modes in the camera menu (low or high level of impact) or disable the function.

Generally speaking, many specialized functions of modern digital cameras (including the face recognition discussed in the previous article) are side products or conversions of research and development work originally carried out for military customers. As for functions that optimize the dynamic range of images, one of the best-known suppliers of such solutions is Apical. The algorithms created by its staff underlie, in particular, the SAT function (Shadow Adjustment Technology) implemented in a number of Olympus digital cameras. In short, the SAT function can be described as follows: based on the frame image, a mask corresponding to the darkest areas is created, and the exposure value is then automatically corrected for these areas.

Sony has also licensed Apical's developments. Many compact models of the Cyber-shot series and the Alpha series SLR cameras implement the so-called Dynamic Range Optimizer (DRO) function.

Photographs taken with the HP Photosmart R927 camera with the Adaptive Lighting function disabled (top)
and enabled (bottom)

When DRO is activated, the image is corrected during primary image processing (that is, before the finished JPEG file is recorded). In the basic version DRO has a two-step setting: the standard or the advanced mode of its operation can be selected in the menu. In the standard mode, the exposure value is corrected based on an analysis of the image, and a tone curve is then applied to the image to even out the overall balance. In the advanced mode a more complex algorithm is used that allows correction both in the shadows and in the highlights.

Sony's developers are constantly improving the DRO algorithm. For example, in the A700 SLR camera, when advanced DRO is activated, one of five correction options can be selected. In addition, it is possible to save three variants of one shot (a kind of bracketing) with different DRO settings.

Many Nikon digital cameras have a D-Lighting function, which is also based on Apical algorithms. However, unlike the solutions described above, D-Lighting is implemented as a filter for processing previously saved images by means of a tone curve whose shape makes the shadows lighter while preserving the rest of the image. But since in this case ready-made 8-bit images are processed (rather than the original frame image, which has a higher bit depth and, accordingly, a wider dynamic range), the possibilities of D-Lighting are very limited. The user could obtain the same result by processing the shot in a graphics editor.

When enlarged fragments are compared, it is clearly noticeable that the dark areas of the original image (left)
became lighter after the Adaptive Lighting function was turned on

There are also a number of solutions based on other principles. For example, many cameras of Panasonic's Lumix family (in particular the DMC-FX35, DMC-TZ4, DMC-TZ5, DMC-FS20, DMC-FZ18 and others) implement an Intelligent Exposure function, which is an integral part of the iA intelligent automatic shooting control system. Intelligent Exposure is based on automatic analysis of the frame image and correction of the dark areas of the picture to avoid losing shadow detail, as well as (if necessary) compression of the dynamic range of high-contrast scenes.

In some cases the dynamic range optimization function involves not only certain processing of the source frame image but also correction of the shooting settings. For example, new Fujifilm digital camera models (in particular the FinePix S100FS) implement a Wide Dynamic Range (WDR) function which, according to the developers, increases the photographic latitude by one or two stops (200% and 400% in the settings terminology).

When the WDR function is activated, the camera takes pictures with an exposure correction of -1 or -2 EV (depending on the selected setting). The frame image is thus deliberately underexposed: this is necessary in order to preserve maximum information about detail in the highlights. The resulting image is then processed with a tone curve, which evens out the overall balance and adjusts the black level. After that, the image is converted to 8-bit format and recorded as a JPEG file.
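Fujifilm does not document the exact processing, but the general "underexpose, then lift with a tone curve" scheme can be sketched like this (the gamma-style curve and its coefficient are arbitrary placeholders, not the camera's real curve):

```python
import numpy as np

def simulate_wdr(scene_linear: np.ndarray, stops_under: int = 1) -> np.ndarray:
    """Toy model of the WDR trick: shoot darker, then lift midtones back up.

    scene_linear -- linear scene luminance; values above 1.0 would normally clip.
    """
    # Step 1: underexpose by N stops so highlights up to 2**N stay below clipping.
    exposed = np.clip(scene_linear / (2 ** stops_under), 0.0, 1.0)

    # Step 2: apply a tone curve (here a simple gamma) that brings midtones and
    # shadows back up while keeping the recovered highlights below 1.0.
    gamma = 1.0 / (1.0 + 0.45 * stops_under)   # illustrative choice only
    return exposed ** gamma

highlights = np.array([0.5, 1.0, 1.8])          # 1.8 would clip at normal exposure
print(simulate_wdr(highlights, stops_under=1))  # all three values now fit below 1.0
```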

Dynamic range compression makes it possible to preserve more detail in the highlights
and shadows, but an inevitable consequence of such processing
is a reduction in overall contrast. In the lower image
the texture of the clouds is rendered much better, but
because of the lower contrast this version of the picture
looks less natural

A similar feature, called Dynamic Range Enlargement, is implemented in a number of Pentax compact and SLR cameras (Optio S12, K200D and others). According to the manufacturer, the Dynamic Range Enlargement function increases the photographic latitude of images by 1 EV without losing detail in the highlights and shadows.

A function that works in a similar way, called Highlight Tone Priority (HTP), is implemented in a number of Canon SLR models (EOS 40D, EOS 450D and others). According to the user guide, activating HTP improves the rendering of detail in the highlights (more precisely, in the range from 18% gray to the maximum highlights).

Conclusion

Let's summarize. The built-in dynamic range compression function makes it possible to convert a source image with a wide dynamic range into an 8-bit JPEG file with minimal losses. When saving frames in RAW format is not an option, the dynamic range compression mode gives the photographer the opportunity to use the camera's potential more fully when shooting high-contrast scenes.

Of course, it must be remembered that dynamic range compression is not a miracle cure but rather a compromise. Preserving detail in the highlights and/or shadows is paid for with an increased noise level in the dark areas of the picture, reduced contrast and some coarsening of smooth tonal transitions.

Like any automatic function, the dynamic range compression algorithm is not a completely universal solution capable of improving absolutely any picture. It therefore makes sense to activate it only when it is really necessary. For example, to shoot a silhouette against a well-rendered background, the dynamic range compression function must be turned off; otherwise the dramatic scene will be hopelessly spoiled.

To conclude this topic, it should be noted that dynamic range compression functions cannot "pull out" detail in the resulting image that was not recorded by the camera sensor. To obtain a satisfactory result when shooting high-contrast scenes, additional accessories (for example, graduated filters for landscape photography) or special techniques (such as shooting several frames with exposure bracketing and then combining them into one image using tone mapping) are required.

The next article will be devoted to the burst (continuous) shooting function.

To be continued

DRC (Dynamic Range Compression - dynamic range compression)

A coding technology used in DVD players with their own audio decoders and in receivers. Compression (or reduction) of the dynamic range is used to limit sound peaks when watching movies. If the viewer wants to watch a film with sharp changes in volume level (a war movie, for example) without disturbing the rest of the family, the DRC mode should be switched on. Subjectively, after DRC is turned on, the share of low frequencies in the sound decreases and high sounds lose transparency, so the DRC mode should not be enabled unless necessary.

Dreamweaver (see also FrontPage)

A visual editor of hypertext documents developed by the software company Macromedia Inc. The powerful professional Dreamweaver program provides tools for generating HTML pages of any complexity and scale and also has built-in support for large network projects. It is a visual design tool that supports the well-developed WYSIWYG concept.

DRIVER (driver) (see Driver)

A software component that allows the computer to interact with devices such as a LAN card (NIC), keyboard, printer or monitor. Network equipment (for example, a hub) connected to a PC requires drivers so that the PC can interact with this equipment.

DRM (Digital Rights Management - digital rights management, control over the copying of copyrighted material)

1. A concept involving the use of special technologies and methods of protecting digital materials to guarantee that they are provided only to authorized users.

2. A client program for interacting with the Digital Rights Management Services package, which is designed to control access to copyright-protected content and its copying. DRM Services runs in the Windows Server 2003 environment. The client software runs on Windows 98, Me, 2000 and XP, giving applications such as Office 2003 access to the corresponding services. Microsoft is also expected to release a digital rights management module for the Internet Explorer browser. In the future it is planned that installing such a program on a computer will allow working with any content that uses DRM technologies for copy protection.

DROID (Robot) (see Agent)

DSA (Digital Signature Algorithm)

A public-key digital signature algorithm. Developed by NIST (USA) in 1991.

DSL (Digital Subscriber Line)

A modern technology supported by city telephone exchanges for exchanging signals at higher frequencies than those used by conventional analog modems. A DSL modem can work simultaneously with the telephone (analog signal) and with the digital line. Since the spectrum of the voice signal from the telephone and that of the digital DSL signal do not overlap, i.e. do not affect each other, DSL allows you to work on the Internet and talk on the phone over the same physical line. Moreover, DSL technology usually uses several frequencies, and the DSL modems on both sides of the line try to choose the best of them for data transmission. A DSL modem not only transmits data but also acts as a router. Equipped with an Ethernet port, a DSL modem makes it possible to connect several computers to it.

DSOM (Distributed System Object Model, Distributed SOM - a model of distributed system objects)

IBM technology with appropriate software support.

DSR¹ (Data Set Ready - data set ready, the DSR signal)

A serial interface signal indicating that the device (for example, a modem) is ready to send data bits to the PC.

DSR² (Device Status Report - device status report)

DSR³ (Device Status Register - device status register)

DSS (Decision Support System - decision support system) (see Media players)

Records, especially old ones recorded and manufactured before 1982, are far less likely to have been through a mix in which the recording was made louder. They reproduce music naturally, with the natural dynamic range that is preserved on the record and lost in most standard digital formats and even in high-resolution formats.

Of course, there are exceptions here: listen to the latest Steven Wilson album, or to releases from MA Recordings or Reference Recordings, and you will hear how good digital sound can be. But this is a rarity; most modern recordings are loud and compressed.

Recently, music compression has come in for serious criticism, but I am willing to argue that almost all of your favorite records are compressed. Some less, some more, but compressed all the same. Dynamic range compression is a kind of scapegoat that gets blamed for bad-sounding music, yet heavily compressed music is not a new trend: just listen to albums from the 60s. The same can be said of the classic work of Led Zeppelin or the more recent albums of Wilco and Radiohead. Dynamic range compression reduces the natural ratio between the loudest and quietest sounds on a record, so that a whisper can be as loud as a scream. It is quite difficult to find pop music from the last 50 years that has not been compressed.

I recently had a pleasant talk with Larry Crane, founder and editor of Tape Op magazine, about the good, bad and "evil" sides of compression. Larry Crane has worked with artists such as Stephen Malkmus, Cat Power, Sleater-Kinney, Jenny Lewis, M. Ward, The Go-Betweens, Jason Lytle, Elliott Smith, Quasi and Richmond Fontaine. He also runs the Jackpot! recording studio in Portland, Oregon, which has been a refuge for The Breeders, The Decemberists, Eddie Vedder, Pavement, R.E.M., She & Him and many others.

As an example of surprisingly unnatural-sounding yet still excellent songs, I cite Spoon's album "They Want My Soul", released in 2014. Crane laughs and says he listens to it in the car, because it sounds great there. Which brings us to another answer to the question of why music is compressed: because compression and the added "clarity" make it easier to hear in noisy places.

Larry Crane at work. Photo by Jason Quigley

When people say they like the sound of a recording, I believe they like the music, as if sound and music were inseparable concepts. But I separate these notions for myself. From an audiophile's point of view the sound may be rough and raw, yet for most listeners that will not matter.

Many are quick to accuse mastering engineers of abusing compression, but compression is applied directly during recording, then during mixing, and only then during mastering. Unless you personally attended each of these stages, you cannot say how the instruments and vocal parts sounded at the very beginning of the process.

Crane was on a roll: "If a musician deliberately wants to make the sound insane and distorted, like a Guided by Voices record, then there is nothing wrong with that; intent always outweighs sound quality." A performer's voice is almost always compressed, and the same goes for bass, drums, guitars and synthesizers. Compression keeps the vocal at the desired level throughout the song or makes it stand out slightly against the other sounds.

Properly applied compression can make drums sound more alive or intentionally strange. For music to sound great, you need to know how to use the right tools, which is why it takes years to learn to use compression without overdoing it. If the mixing engineer has squeezed a guitar part too hard, the mastering engineer will no longer be able to fully restore the missing frequencies.

If musicians wanted you to listen to music that had not gone through mixing and mastering, it would arrive on store shelves straight from the studio. Crane says that the people who record, edit, mix and master music are not there to get in the musicians' way: they have been helping performers from the very beginning, that is, for more than a hundred years.

These people are part of the creative process that results in astonishing works of art. Crane adds: "You don't need a version of The Dark Side of the Moon that hasn't been through mixing and mastering." Pink Floyd released the album in the form in which they wanted it to be heard.

This group of methods is based on subjecting the transmitted signals to nonlinear amplitude transformations, with mutually inverse nonlinearities in the transmitting and receiving parts. For example, if the transmitter uses the nonlinear function √u, the receiver uses u². The successive application of these mutually inverse functions keeps the overall transformation linear.

The idea behind nonlinear data compression methods is that the transmitter can convey a larger range of variation of the transmitted parameter with the same amplitude of the output signals (that is, a greater dynamic range). The dynamic range is the ratio, expressed in relative units or in decibels, of the largest admissible signal amplitude to the smallest:

D = U_max / U_min ;    (2.17)
D [dB] = 20 lg (U_max / U_min) .    (2.18)

The natural desire to increase the dynamic range by reducing U_min is limited by the sensitivity of the equipment and by the growing influence of interference and the equipment's own noise.

Most often, dynamic range compression is performed with a pair of mutually inverse functions: the logarithm and the exponential. The first amplitude-transforming operation is called compression, the second expansion (stretching). These functions are chosen because of their high compression capability.

At the same time, these methods have drawbacks. The first is that the logarithm of a small number is negative, and in the limit

lim (u→0) log u = −∞,

that is, the sensitivity is highly nonlinear in the region of small signals.

To reduce these drawbacks, both functions are modified by shifts and approximations. For telephone channels, for example, the approximating function (the A-law) is

F(x) = A·|x| / (1 + ln A) · sign(x) for |x| < 1/A,
F(x) = (1 + ln(A·|x|)) / (1 + ln A) · sign(x) for 1/A ≤ |x| ≤ 1,

where A = 87.6. The gain from compression is 24 dB.
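A minimal sketch of such an A-law compander (following the standard formula above with A = 87.6; an illustration only, not production telephony code):

```python
import math

A = 87.6  # standard A-law parameter

def a_law_compress(x: float) -> float:
    """Compress a sample x in [-1, 1] with the A-law characteristic."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x < 1.0 / A:
        y = A * x / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * x)) / (1.0 + math.log(A))
    return sign * y

def a_law_expand(y: float) -> float:
    """Inverse (expanding) transformation."""
    sign = -1.0 if y < 0 else 1.0
    y = abs(y)
    if y < 1.0 / (1.0 + math.log(A)):
        x = y * (1.0 + math.log(A)) / A
    else:
        x = math.exp(y * (1.0 + math.log(A)) - 1.0) / A
    return sign * x

sample = 0.01                # a quiet sample
c = a_law_compress(sample)   # ~0.16: small signals are boosted before transmission
print(c, a_law_expand(c))    # the round trip returns ~0.01
```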

Data compression by nonlinear procedures is implemented with analog means with large errors. The use of digital tools can significantly improve the accuracy or speed of the transformation. At the same time, the direct use of general-purpose computing (that is, the direct calculation of logarithms and exponentials) gives no better result because of low speed and accumulating calculation errors.

Because of these accuracy limitations, data compression by companding is used in non-critical cases, for example for transmitting speech over telephone and radio channels.

Effective coding

Effective codes were proposed by Shannon, Fano and Huffman. The essence of these codes is that they are non-uniform, that is, the codewords have different numbers of bits, and the less probable a symbol is, the longer its code. Another remarkable property of effective codes is that they do not require separators, i.e. special characters separating neighbouring code combinations. This is achieved by observing a simple rule: a shorter code is never the beginning of a longer one. In this case a continuous stream of binary digits is decoded unambiguously, since the decoder detects the shortest code combinations first. For a long time effective codes were purely academic, but more recently they have been used in building databases, as well as for compressing information in modern modems and in software archivers.

Because the codes are non-uniform, the concept of average code length is introduced. The average length is the mathematical expectation of the code length:

L_ср = Σ p_i · l_i ,

and L_ср tends to H(x) from above, that is,

L_ср ≥ H(x) .    (2.23)

The bound in (2.23) is approached more and more closely as N increases.

There are two varieties of effective codes: Shannon-Fano and Huffman. Let us consider how they are constructed using an example. Suppose the probabilities of the symbols in a sequence have the values given in Table 2.1.

Table 2.1.

Symbol probabilities

N      1     2     3     4     5      6      7      8      9
p_i    0.1   0.2   0.1   0.3   0.05   0.15   0.03   0.02   0.05

The symbols are ranked, that is, arranged in a row in descending order of probability. After that, according to the Shannon-Fano method, the following procedure is repeated: the whole group of events is divided into two subgroups with equal (or approximately equal) total probabilities. The procedure continues until one element remains in the next subgroup, after which this element is removed and the same actions continue with the remaining ones. This goes on until only one element remains in each of the last two subgroups. Let us continue with our example, which is summarized in Table 2.2.

Table 2.2.

The Shannon-Fano method

N    p_i     Partition into subgroups
4    0.3     I    I
2    0.2     I    II
6    0.15    II   I    I
3    0.1     II   I    II
1    0.1     II   II   I
9    0.05    II   II   II   I
5    0.05    II   II   II   II   I
7    0.03    II   II   II   II   II   I
8    0.02    II   II   II   II   II   II

As can be seen from Table 2.2, the first symbol, with probability p4 = 0.3, took part in two partitioning procedures and both times fell into group I. Accordingly, it is encoded with the two-bit code 11. The second element at the first partitioning stage belonged to group I and at the second to group II, so its code is 10. The codes of the remaining symbols need no additional comment.
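A straightforward recursive sketch of the Shannon-Fano procedure described above (tie-breaking when a group can be halved in two equally good ways may differ from the table, so individual codes and the resulting average length can come out slightly different):

```python
def shannon_fano(probs: dict[str, float]) -> dict[str, str]:
    """Build Shannon-Fano codes for a symbol -> probability mapping."""
    codes = {s: "" for s in probs}

    def split(symbols: list[str]) -> None:
        if len(symbols) < 2:
            return
        total = sum(probs[s] for s in symbols)
        running, cut, best_diff = 0.0, 1, float("inf")
        # Find the cut that makes the two subgroups' probabilities as equal as possible.
        for i in range(1, len(symbols)):
            running += probs[symbols[i - 1]]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, cut = diff, i
        for s in symbols[:cut]:
            codes[s] += "1"          # group I gets bit 1
        for s in symbols[cut:]:
            codes[s] += "0"          # group II gets bit 0
        split(symbols[:cut])
        split(symbols[cut:])

    ranked = sorted(probs, key=probs.get, reverse=True)   # descending probabilities
    split(ranked)
    return codes

p = {"4": 0.3, "2": 0.2, "6": 0.15, "3": 0.1, "1": 0.1,
     "9": 0.05, "5": 0.05, "7": 0.03, "8": 0.02}
print(shannon_fano(p))   # e.g. symbol 4 -> "11", symbol 2 -> "10", symbol 5 -> "00001"
```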

Non-uniform codes are usually depicted as code trees. A code tree is a graph showing the allowed code combinations. The directions of the edges of this graph are specified in advance, as shown in Fig. 2.11 (the choice of directions is arbitrary).

The tree is used as follows: a route is traced to the chosen symbol; the number of code bits equals the number of edges in the route, and the value of each bit equals the direction of the corresponding edge. The route starts from the source point (labelled A in the figure). For example, the route to vertex 5 consists of five edges, all of which except the last have direction 0; we obtain the code 00001.

Let us calculate the entropy and the average word length for this example.

H(x) = −(0.3 log 0.3 + 0.2 log 0.2 + 0.15 log 0.15 + 2·0.1 log 0.1 + 2·0.05 log 0.05 + 0.03 log 0.03 + 0.02 log 0.02) ≈ 2.76 bits,

l_ср = 0.3·2 + 0.2·2 + 0.15·3 + 0.1·3 + 0.1·4 + 0.05·5 + 0.05·4 + 0.03·6 + 0.02·6 = 2.9 .

As can be seen, the average word length is close to the entropy.
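The entropy value can be double-checked in a couple of lines (only the probabilities from Table 2.1 are assumed):

```python
import math

p = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.03, 0.02]
H = -sum(pi * math.log2(pi) for pi in p)
print(round(H, 2))   # ~2.76 bits per symbol
```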

Huffman codes are built by a different algorithm. The encoding procedure consists of two stages. At the first stage the alphabet is repeatedly compressed one step at a time. A single compression is the replacement of the last two symbols (those with the lowest probabilities) by one symbol with their total probability. Compression is carried out until two symbols remain. At the same time a coding table is filled in, in which the resulting probabilities are entered and the routes along which the new symbols move at the next stage are drawn.

At the second stage the actual coding takes place, starting from the last stage: the first of the two remaining symbols is assigned the code 1, the second the code 0. After that, the procedure moves to the previous stage. Symbols that did not take part in the compression at that stage keep the codes assigned to them at the following stage, while the two symbols that were merged receive the code of the symbol obtained after merging, with 1 appended for the upper symbol and 0 for the lower one. If a symbol takes no further part in merging, its code remains unchanged. The procedure continues until the first stage is reached.
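A compact sketch of the same idea using a priority queue rather than the table procedure (a common textbook formulation; because of how ties are broken, individual code lengths may differ from Table 2.3, but the average length comes out the same):

```python
import heapq
from itertools import count

def huffman(probs: dict[str, float]) -> dict[str, str]:
    """Build Huffman codes: repeatedly merge the two least probable nodes."""
    tie = count()  # tie-breaker so heap tuples with equal probability compare cleanly
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # least probable subtree
        p2, _, codes2 = heapq.heappop(heap)   # second least probable subtree
        # The merged symbol's code is extended with 1 for one child, 0 for the other.
        for sym in codes1:
            codes1[sym] = "1" + codes1[sym]
        for sym in codes2:
            codes2[sym] = "0" + codes2[sym]
        heapq.heappush(heap, (p1 + p2, next(tie), {**codes1, **codes2}))
    return heap[0][2]

p = {"4": 0.3, "2": 0.2, "6": 0.15, "3": 0.1, "1": 0.1,
     "9": 0.05, "5": 0.05, "7": 0.03, "8": 0.02}
codes = huffman(p)
avg = sum(p[s] * len(codes[s]) for s in p)
print(codes, round(avg, 2))   # average length 2.8 bits
```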

Table 2.3 shows coding by the Huffman algorithm. As can be seen from the table, coding was carried out in seven stages. On the left are the symbol probabilities, on the right the intermediate codes. The arrows show the movement of newly formed symbols. At each stage the last two symbols differ only in the least significant bit, which corresponds to the coding technique. Let us calculate the average word length:

l_ср = 0.3·2 + 0.2·2 + 0.15·3 + 2·0.1·3 + 0.05·4 + 0.05·5 + 0.03·6 + 0.02·6 = 2.8 .

This is even closer to the entropy: the code is even more efficient. Fig. 2.12 shows the Huffman code tree.

Table 2.3.

Coding by the Huffman algorithm

p_i    code     I            II           III          IV           V           VI          VII
0.3    11       0.3  11      0.3  11      0.3  11      0.3  11      0.3  11     0.4  0      0.6  1
0.2    01       0.2  01      0.2  01      0.2  01      0.2  01      0.3  10     0.3  11     0.4  0
0.15   101      0.15 101     0.15 101     0.15 101     0.2  00      0.2  01     0.3  10
0.1    001      0.1  001     0.1  001     0.15 100     0.15 101     0.2  00
0.1    000      0.1  000     0.1  000     0.1  001     0.15 100
0.05   1000     0.05 1000    0.1  1001    0.1  000
0.05   10011    0.05 10011   0.05 1000
0.03   100101   0.05 10010
0.02   100100

Both codes satisfy the requirement of unique decodability: as the tables show, shorter combinations are never the beginning of longer codes.

As the number of symbols increases, so does the efficiency of the codes, so in some cases larger blocks are encoded (for example, in the case of texts, the most frequent syllables, words and even phrases can be encoded).

The effect of introducing such codes is determined by comparison with a uniform code:

K = n / l_ср ,    (2.24)

where n is the number of bits of the uniform code that is replaced by the effective one.

Modifications of Huffman codes

The classic Huffman algorithm is a two-pass one, i.e. it first requires gathering statistics on the symbols and messages, and only then applies the procedures described above. This is inconvenient in practice, since it increases message processing time and dictionary accumulation. More practical are single-pass methods, in which the accumulation and coding procedures are combined. Such methods are also called adaptive Huffman compression [46].

The essence of adaptive Huffman compression is the construction of an initial code tree and its successive modification after each subsequent symbol arrives. As before, the trees here are binary, i.e. at most two arcs emanate from each vertex of the tree graph. The original vertex is customarily called the parent, and the two vertices connected to it its children. Let us introduce the concept of vertex weight: this is the number of symbols (words) corresponding to a given vertex obtained when the initial sequence is fed in. Obviously, the sum of the children's weights equals the parent's weight.

After the next symbol of the input sequence arrives, the code tree is revised: the vertex weights are recalculated and, if necessary, the vertices are rearranged. The rearrangement rule is as follows: the weights of the lower vertices are the smallest, and among vertices at the same level those on the left have the smallest weights.

At the same time the vertices are numbered. Numbering begins with the lower (hanging, i.e. childless) vertices from left to right, then moves to the upper level, and so on up to the last, root vertex. This achieves the following result: the smaller the weight of a vertex, the smaller its number.

The rearrangement is carried out mainly for hanging vertices. During rearrangement the formulated rule is observed: vertices with greater weight receive a greater number.

After the sequence (also called a control or test sequence) has been passed through, code combinations are assigned to all hanging vertices. The assignment rule is similar to the one above: the number of code bits equals the number of vertices through which the route passes from the root to the given hanging vertex, and the value of a particular bit corresponds to the direction from the parent to the "child" (say, going left from the parent corresponds to 1, going right to 0).

The resulting code combinations are entered into the memory of the compression device together with their analogues and form a dictionary. The algorithm is used as follows: the sequence of characters to be compressed is divided into fragments in accordance with the existing dictionary, after which each fragment is replaced by its code from the dictionary. Fragments not found in the dictionary form new hanging vertices, acquire a weight and are also entered into the dictionary. In this way an adaptive algorithm for replenishing the dictionary is formed.

To increase the efficiency of the method it is desirable to increase the size of the dictionary; the compression ratio then rises. In practice the dictionary size is 4-16 KB of memory.


Let us illustrate the algorithm with an example. Fig. 2.13 shows the initial diagram (also called a Huffman tree). Each vertex of the tree is drawn as a rectangle containing two numbers separated by a slash: the first is the vertex number, the second its weight. As can easily be checked, the correspondence between vertex weights and their numbers is satisfied.

Suppose now that the symbol corresponding to vertex 1 occurs a second time in the test sequence. The vertex weights change, as shown in Fig. 2.14, and as a result the vertex numbering rule is violated. At the next stage we change the arrangement of the hanging vertices: we swap vertices 1 and 4 and renumber all the vertices of the tree. The resulting graph is shown in Fig. 2.15. The procedure then continues in the same way.

It should be remembered that each hanging vertex in the Huffman tree corresponds to a particular symbol or group of symbols. The parent differs from its children in that the group of symbols corresponding to it is one symbol shorter than that of its children, and its children differ from each other in the last symbol. For example, if the parent corresponds to the symbols "car", its children may correspond to the sequences "cara" and "carp".

The algorithm described above is not purely academic: it is actively used in archiver programs, including for compressing graphic data (these will be discussed below).

Lempel-Ziv algorithms

These are the most commonly used compression algorithms. They are used in most archiver programs (for example PKZIP, ARJ, LHA). The essence of the algorithms is that a certain set of characters is replaced, during archiving, by its number in a specially generated dictionary. For example, the phrase "In reply to your letter ref. no. ...", which is frequently encountered in business correspondence, may occupy position 121 in the dictionary; then, instead of transmitting or storing this phrase (30 bytes), its number can be stored (1.5 bytes in binary-coded decimal form or 1 byte in binary).

The algorithms are named after the authors who first proposed them in 1977. The first of these is LZ77. For archiving, a so-called sliding window is created, consisting of two parts. The first, larger part serves to form the dictionary and has a size of the order of several kilobytes. The second, smaller part (usually up to 100 bytes) receives the current characters of the text being examined. The algorithm tries to find in the dictionary a set of characters matching those in the viewing window. If it succeeds, a code is generated consisting of three parts: the offset of the matched substring in the dictionary relative to its beginning, the length of this substring, and the character that follows it. For example, suppose the matched substring consists of 6 characters of the word "application" and the character following it is "e". Then, if the substring has address 45 (its place in the dictionary), the entry takes the form "45, 6, e". After that the contents of the window shift by one position and the search continues. In this way the dictionary is formed.
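A toy version of the scheme just described, emitting (offset, length, next character) triples; here the offset is counted back from the current position rather than from the start of the dictionary, and the window sizes are deliberately tiny:

```python
def lz77_compress(data: str, window: int = 32, lookahead: int = 8) -> list[tuple[int, int, str]]:
    """Encode data as (offset, length, next_char) triples over a sliding window."""
    out, pos = [], 0
    while pos < len(data):
        start = max(0, pos - window)          # left edge of the dictionary part
        best_off, best_len = 0, 0
        # Look for the longest match of the lookahead buffer inside the window.
        for off in range(start, pos):
            length = 0
            while (length < lookahead and pos + length < len(data)
                   and data[off + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = pos - off, length
        nxt = data[pos + best_len] if pos + best_len < len(data) else ""
        out.append((best_off, best_len, nxt))
        pos += best_len + 1
    return out

print(lz77_compress("abracadabra abracadabra"))
```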

An advantage of the algorithm is the easily formalized procedure for building the dictionary. Moreover, decompression is possible without the initial dictionary (although it is desirable to have a test sequence): the dictionary is formed in the course of decompression.

The drawbacks of the algorithm appear as the dictionary grows: the search time increases. In addition, if a string of characters is not found in the current window, each character is written as a three-element code, i.e. the result is not compression but expansion.

The LZSS algorithm, proposed in 1978, has better characteristics. It differs in how the sliding window is maintained and in the codes the compressor outputs. In addition to the window, the algorithm builds a binary tree, similar to a Huffman tree, to speed up the search for matches: each substring leaving the current window is added to the tree as one of the children. This allows the size of the current window to be increased further (it is desirable that its size be a power of two: 128, 256, etc. bytes). The sequence codes are also formed differently: a 1-bit prefix is additionally introduced to distinguish unencoded characters from "offset, length" pairs.

Even greater compression is obtained with LZW-type algorithms. The algorithms described earlier have a fixed window size, which makes it impossible to enter into the dictionary phrases longer than the window. In LZW algorithms (and their predecessor LZ78) the viewing window has unlimited size, and the dictionary accumulates phrases (rather than collections of characters, as before). The dictionary has unlimited length, and the encoder (decoder) works in a phrase-waiting mode. When a phrase matching the dictionary is formed, the match code (i.e. the code of this phrase in the dictionary) and the code of the symbol following it are output. If, as symbols accumulate, a new phrase is formed, it too is entered into the dictionary, like the shorter one. The result is a recursive procedure that provides fast encoding and decoding.
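A minimal LZW encoder in the same spirit (one common textbook formulation: the dictionary here is unbounded and codes are returned as plain integers, whereas real implementations limit and periodically reset the dictionary):

```python
def lzw_compress(data: str) -> list[int]:
    """Classic LZW: emit the code of the longest known phrase, then extend the dictionary."""
    # Start with all single characters that occur in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(data)))}
    phrase, out = "", []
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                               # keep growing the current phrase
        else:
            out.append(dictionary[phrase])             # emit the code of the known phrase
            dictionary[phrase + ch] = len(dictionary)  # remember the new, longer phrase
            phrase = ch
    if phrase:
        out.append(dictionary[phrase])
    return out

print(lzw_compress("abababababab"))   # repeated phrases quickly collapse into single codes
```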

An additional compression opportunity is provided by compact encoding of repeated characters. If some characters follow one another in a row in the sequence (for example, spaces in a text, or a run of zeros in a numerical sequence), it makes sense to replace them with the pair "symbol, run length" or "flag, run length". In the first case the code contains a flag indicating that a run is being encoded (usually 1 bit), then the code of the repeated symbol and the length of the run. In the second case (provided for the most frequently repeated symbols) the prefix simply contains a repetition flag.
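A sketch of the first variant (flag, symbol, run length); the flag byte, the threshold and the byte layout are arbitrary illustration choices:

```python
def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    """Replace runs of identical bytes with (flag, byte, count); short runs are copied as-is.

    A real codec must also escape literal FLAG bytes; this sketch skips that step.
    """
    FLAG = 0xFF
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run >= min_run:
            out += bytes([FLAG, data[i], run])   # encoded run
        else:
            out += data[i:i + run]               # literal bytes, too short to encode
        i += run
    return bytes(out)

print(rle_encode(b"AB" + b" " * 10 + b"C"))   # the ten spaces collapse into a 3-byte token
```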

People who are enthusiastic about home audio demonstrate an interesting paradox. They are ready to treat the listening room acoustically and build loudspeakers with exotic drivers, yet they stop short before the musical "canned goods" like a wolf before a line of red flags. But really, why not jump over the flags and try to cook something more edible out of those canned goods?

Complaints regularly appear on forums: "Recommend some well-recorded albums." It is understandable. Special audiophile editions may please the ear for the first minute, but nobody listens to them all the way through: the repertoire is just too dreary. As for the rest of the music library, the problem seems obvious. You can economize, or you can spare no expense and sink a pile of money into components. Either way, you still don't want to listen to your favorite music at high volume, and the amplifier's capabilities have nothing to do with it.

Today, even in Hi-Res albums, the peaks of the recording are shaved off and the level is driven into clipping. It is assumed that the majority listens to music on any old junk, so the sound has to be "pumped up", dumbed down, so to speak.


Of course, this is not done specifically to upset audiophiles; hardly anyone remembers them at all. Well, except that someone thought of selling the master files from which the main run is copied: CDs, MP3s and so on. Of course, the master has long since been flattened by the compressor; no one consciously prepares special versions for HDtracks. There is, perhaps, a certain procedure for the vinyl release, which for that reason sounds more humane. But in the digital path everything ends the same way: with a big fat compressor.

So at present literally 100% of published recordings, classical music aside, are compressed during mastering. Some perform this procedure more or less skillfully, others quite clumsily. As a result we have pilgrimages on the forums with DR-meter plugin readouts, agonizing comparisons of releases, and flight to vinyl, where you still have to be lucky enough to hit on a good pressing.

At the sight of all these disgraces, the most hard-core have plunged into something like audio shamanism. No joke: they read the holy scripture of the sound source backwards! Modern sound-editing programs have tools for restoring a waveform that has been driven into clipping.

Initially this functionality was intended for studios. During mixing there are situations when clipping has crept into a recording and, for a number of reasons, the session can no longer be redone; this is where the audio editor's arsenal comes to the rescue: declipper, decompressor and so on.

And ordinary listeners, whose ears bleed after yet another new release, are reaching for such software ever more boldly. Some prefer iZotope, some Adobe Audition, some split the operations between several programs. The point of restoring the former dynamics is to correctly repair the clipped signal peaks which, pressed up against 0 dB, resemble gear teeth.

True, there is no question of a 100% revival of the original, since the interpolation is based on fairly speculative algorithms. Still, some of the processing results seemed to me interesting and worth studying.

Take, for example, Lana Del Rey's album "Lust for Life", thoroughly squashed, ugh, "slammed". In the original, the song "When the World Was at War We Kept Dancing" looked like this.


And after a series of declippers and decompressors it came to look like this. The DR value changed from 5 to 9. Download and listen to the sample before and after processing.


I cannot say that the method is universal and suits every flattened album, but in this case I preferred to keep in my collection precisely this version, processed by a rutracker activist, rather than the official 24-bit release.

Even if artificially pulling the peaks out of this sonic mince does not bring back the true dynamics of the musical performance, your DAC will thank you anyway. It is hard for it to work without errors at limiting levels, where the probability of so-called intersample peaks (ISP) is high. Now only rare bursts of the signal will reach up to 0 dB. In addition, the processed recording, when compressed to FLAC or another lossless codec, will be smaller in size: more "air" in the signal saves hard drive space.

Try to revive the albums of yours that were most badly mauled in the "loudness war". To leave headroom for the dynamics, first lower the track level by 6 dB and then run the declipper. Those who do not trust computers can simply insert a studio expander between the CD player and the amplifier. This device essentially does the same thing: as best it can, it restores and stretches out the peaks of an over-compressed audio signal. Such devices from the 80s and 90s are not exactly expensive, and trying them out as an experiment is very interesting.
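For the curious, here is a very crude sketch of what such processing amounts to: lower the level, find the flattened runs, and re-draw them from the neighbouring samples. It is a toy illustration of the idea, not a substitute for iZotope or Audition:

```python
import numpy as np

def declip(x: np.ndarray, clip_level: float = 0.98, headroom_db: float = -6.0) -> np.ndarray:
    """Crude declipper: lower the level, then re-draw flattened peaks with a parabola."""
    y = x.astype(float) * 10 ** (headroom_db / 20)   # make room for the restored peaks
    clipped = np.abs(x) >= clip_level                # samples that sat at the ceiling
    n = len(y)
    i = 0
    while i < n:
        if not clipped[i]:
            i += 1
            continue
        j = i
        while j < n and clipped[j]:                  # find the whole flattened run
            j += 1
        left = np.arange(max(0, i - 4), i)           # a few intact samples on each side
        right = np.arange(j, min(n, j + 4))
        support = np.concatenate([left, right])
        if len(support) >= 3:
            a, b, c = np.polyfit(support, y[support], 2)   # parabola through the neighbours
            run = np.arange(i, j)
            y[run] = a * run**2 + b * run + c              # replace the flat top with a curve
        i = j
    return np.clip(y, -1.0, 1.0)
```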


The dbx 3BX dynamic range controller processes the signal separately in three bands: low, mid and high frequencies

There was a time when equalizers were taken for granted as a component of an audio system, and nobody was afraid of them. Today there is no need to compensate for the high-frequency roll-off of magnetic tape, but something has to be done about the crippled dynamics, brothers.