System implementation. Image preprocessing algorithms applied in neural network recognition

Digital noise is an image defect consisting of randomly located areas whose dimensions are close to the pixel size and which differ from the source image in brightness or color. Noise reduction plays an important role in the transmission, processing and compression of video sequences and images.

Noise on video can occur for several reasons:

1. Imperfect video capture equipment.

2. Poor shooting conditions - for example, night photo/video shooting, or shooting in rainy weather.

3. Interference during transmission over analog channels - pickup from sources of electromagnetic fields, intrinsic noise of the active components (amplifiers) of the transmission line. An example is a television signal.

4. Inaccurate filtering when separating the luminance and color-difference signals from an analog composite signal, etc.

The amount of noise in an image can vary from an almost imperceptible speckle on a digital photograph taken in good light to astronomical pictures in which noise hides most of the useful information, recoverable only by labor-intensive image processing.

Noise comes in different types, depending on the nature of the random distribution of interference in the image. In practice, the following types are encountered most often:

White Gaussian noise

One of the most common kinds of noise is additive Gaussian noise, which is characterized by adding to each pixel a value drawn from a normal distribution with zero mean. The term "additive" means that this type of noise is summed with the useful signal. It usually arises under poor signal reception conditions.

Digital noise

The occurrence of digital noise is most often associated with the characteristics of the equipment used for shooting - usually with insufficient photosensitivity of the sensor. This type of noise is characterized by replacing some of the image pixels with a fixed or random value. If the brightness of the points is approximately equal, the digital noise is also called "impulse" noise. If the intensity of the points can vary from black to white, such noise is called "salt and pepper" noise.

Usually this type of noise affects only a small number of pixels of the image.

Combined noise

Much less common are cases where the image is corrupted by a mixture of Gaussian noise and random impulses. Such a combination is called combined noise.

Image scanning defects

Extraneous effects may also appear in the image, such as cracks, scratches and abrasions. These artifacts do not have a homogeneous structure, and determining their shape and location is mostly not amenable to mathematical analysis. Defects of this kind can be combated only by manual image processing, so they are not considered in this work.

Noise removal algorithms

There are a large number of algorithms for eliminating noise from images, implemented not only by specialized processing software but also by some cameras and camcorders. Despite this, there is still no universal filtering algorithm, since when processing an image one must invariably choose between the degree of noise elimination and the preservation of small details that have characteristics similar to noise. In addition, an algorithm that easily copes with one type of noise may only spoil an image corrupted by another type of interference.

Let us consider several of the best-known noise suppression algorithms for images.

Linear pixel averaging

The simplest idea for removing noise is to average the pixel values in a spatial neighborhood. Since the noise varies independently from pixel to pixel, the noise of neighboring pixels compensates when summed. A rectangular window is specified, which is superimposed on each pixel of the image in turn. The value of the central pixel is computed from an analysis of all the pixels adjacent to it within the window. Accordingly, the larger the window, the more averaged the resulting value, which leads to a strong blurring effect.

In the simplest variant, the analysis of neighboring pixels consists of taking their arithmetic mean. To reduce the influence of pixels that do not belong to the same area as the one under consideration (for example, a dark contour on a light background), a numerical threshold can be introduced, so that only those neighbors whose difference from the central pixel does not exceed this threshold are taken into account. The greater the threshold value, the stronger the averaging. This variant can be further refined by introducing weight coefficients for each neighboring pixel, depending on its distance from the center of the region under consideration.
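As an illustration, here is a minimal numpy sketch of such a thresholded averaging filter (the function and parameter names are illustrative; the radius and threshold would be tuned to the noise level):

import numpy as np

def threshold_mean_filter(img, radius=1, threshold=30):
    # Average each pixel with the window neighbors whose difference
    # from the central pixel does not exceed the threshold.
    padded = np.pad(img.astype(np.float64), radius, mode='edge')
    out = np.empty(img.shape, dtype=np.float64)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            center = padded[y + radius, x + radius]
            mask = np.abs(window - center) <= threshold
            out[y, x] = window[mask].mean()
    return out.astype(img.dtype)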

This method can also be used in the time domain, averaging each pixel over adjacent frames of the video stream (each pixel is averaged with the pixels located at the same position in the adjacent frames).

This algorithm is very simple, but it does not give a good result, at the same time leading to strong blurring of image details.

Gaussian filter

Its principle of operation is similar to the previous method, and it also belongs to the class of smoothing filters. However, noise reduction with a linear averaging filter has a significant drawback: all neighbors of the processed pixel affect the result equally, regardless of their distance from it. The Gaussian filter also averages the central pixel with its neighbors in some region, but this happens according to a specific law set by the Gaussian function.

The Gaussian function has the form:

G(x, y) = A·exp(-(x² + y²) / (2σ²)),

where the parameter σ sets the degree of blur and the parameter A ensures normalization. As a result, the central pixel of the region under consideration has the greatest weight, corresponding to the peak of the Gaussian distribution, while the weights of the remaining elements have less and less influence the farther they are from the center.

The matrix filter computed by this formula is called a Gaussian; the larger its size, the stronger the blur (for a fixed σ). Since this filter is separable, it can be represented as

G(x, y) = g(x) · g(y),

where g is a one-dimensional Gaussian function. It follows that the convolution can be performed sequentially over the rows and then over the columns, which leads to a significant acceleration of the method at large filter sizes.
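A minimal numpy sketch of a separable Gaussian blur, illustrating the row-then-column convolution (the sigma value and the 3-sigma radius are illustrative choices):

import numpy as np

def gaussian_blur(img, sigma=1.5):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()  # normalized 1D Gaussian kernel
    # Separability: convolve every row with g, then every column.
    tmp = np.apply_along_axis(np.convolve, 1, img.astype(np.float64), g, mode='same')
    return np.apply_along_axis(np.convolve, 0, tmp, g, mode='same')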

Algorithm 2DCleaner

Replaces each image pixel with the average value of the neighboring pixels taken within a region bounded by some radius. At the same time, not all points falling within the radius are considered, but only those whose value differs from the central pixel by no more than a predetermined value (threshold). Due to this, uniformly colored areas are blurred more strongly than the sharp boundaries of objects. This makes it possible to reduce low-level noise in the image while keeping small details intact.

Median filtering

Linear algorithms are very effective at suppressing Gaussian interference, where neighboring pixels, although they have some random spread of values, still remain within a certain mean characteristic of the region to which they belong. However, sometimes one has to deal with images distorted by other types of interference. An example of such interference is impulse noise, which manifests itself as chaotically scattered points of random brightness. Averaging in this case "smears" each such point over the adjacent pixels, degrading image quality.

The standard method of suppressing impulse noise is median filtering. This nonlinear image processing method makes it possible to eliminate sharp outliers, but, in contrast to averaging linear algorithms, leaves monotonous sequences of pixels unchanged. Due to this, median filters can preserve without distortion the contours of objects and the transitions between areas of different brightness, while effectively suppressing uncorrelated interference and small details.

The filtering principle: a window of odd size is specified, which is sequentially superimposed on each image pixel. Among all the pixels in the region under consideration, including the central one, the median value is found, which is finally assigned to the central pixel of the region. The median here is the middle element of the sorted array of pixel values belonging to the region. The odd window size is chosen precisely to guarantee the existence of a median pixel.
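A straightforward numpy sketch of median filtering with an odd (2R+1) × (2R+1) window (names are illustrative; production code would use an optimized routine such as OpenCV's medianBlur):

import numpy as np

def median_filter(img, radius=1):
    # Slide an odd-sized window over the image; assign the median of
    # the sorted neighborhood values to the central pixel.
    padded = np.pad(img, radius, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            out[y, x] = np.median(window)
    return out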

A median filter can also be used to suppress white Gaussian noise in an image. However, studies of noise suppression by median filtering show that its effectiveness in this task is lower than that of linear filtering.

Median filtering is not free of a drawback characteristic of most noise-suppressing filters: as the mask size is increased to improve the degree of noise reduction, image definition decreases and its contours blur. However, the negative effects can be minimized by using median filtering with a dynamic mask size (adaptive median filtering). Its principle remains the same; only the size of the sliding filtering window may vary depending on the brightness of the neighboring pixels.

Increasing image sharpness

Almost all noise suppression algorithms blur the image; as a result, small details are lost and perception of the image is hampered. Sharpening can partially compensate for this negative effect and restore the lost contour contrast and color transitions. Sharpness can also depend on many other factors: the quality of the lens, the aperture used, and the thickness of the anti-moire filter found on the sensor of most digital cameras, which blurs the image to varying degrees. In addition, the sharpness of images often needs to be increased after reducing their size, since part of the information, and with it the clarity of contours, is inevitably lost.

Unsharp masking is a technique that makes it possible to increase the contrast of the transitions between image tones, improving visual perception through the illusion of increased sharpness. In fact, the sharpness remains at the same level, since it is in principle impossible to restore lost image details, but improving the contrast between areas of different brightness causes the image to be perceived as clearer.

Figure 5.1 - Illustration of the concept of "Contour Sharpness"

The sharpness of an image depends on the magnitude of the brightness difference between the areas (W) that form its contours, and on the steepness of the change of this difference (H).

Unsharp masking was first applied to the processing of film photographs. The method adapted to digital image processing differs little from the original: the so-called "unsharp mask", a blurred and inverted copy of the image, is subtracted from the image. The result is a new image containing only the light contours of the original. The dark contours can be obtained by simply inverting the result.

If the dark contours are then subtracted from the original image and the light ones are added, a significant increase in contrast is obtained at every brightness transition.

To blur the original in order to obtain the "unsharp mask", any of the noise-suppressing filters can be used, for example the Gaussian filter.
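A minimal unsharp masking sketch using OpenCV's Gaussian blur (sigma and amount are illustrative parameters):

import cv2
import numpy as np

def unsharp_mask(img, sigma=2.0, amount=1.0):
    src = img.astype(np.float32)
    blurred = cv2.GaussianBlur(src, (0, 0), sigma)  # the blurred "mask" source
    contours = src - blurred                        # light/dark contour component
    return np.clip(src + amount * contours, 0, 255).astype(np.uint8)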

Figure 5.2 - Result of applying unsharp masking

The convolution operation is used quite often in image processing. Besides increasing sharpness, it is used for blurring, increasing brightness, lightening, and so on.

Image convolution is the operation of computing a new value for a given pixel that takes into account the values of the surrounding neighboring pixels. In the general sense, the term denotes an action that is performed on each part of the image.

The main element of convolution is the convolution mask: a matrix (of arbitrary size and aspect ratio). Such a mask is often called a filter, kernel, template or window. The values of the matrix elements are called coefficients.

Most often a square matrix is used as the convolution kernel.

Image processing by the convolution operation proceeds as follows: the central element of the matrix, called the "anchor", is sequentially superimposed on each pixel of the image. The new value of the pixel under consideration is computed as the sum of the values of the neighboring pixels multiplied by the corresponding coefficients of the convolution mask.

The resulting effect depends on the selected convolution kernel.

The kernel of a contrast-enhancing filter has a value greater than 1 at the point (0, 0), with the total sum of all values equal to 1. For example, contrast-enhancing filters are filters with kernels defined by matrices such as:

 0 -1  0      -1 -1 -1
-1  5 -1      -1  9 -1
 0 -1  0      -1 -1 -1

The contrast-enhancing effect is achieved because the filter emphasizes the difference between the intensities of neighboring pixels, pulling these intensities apart. The effect is the stronger, the greater the value of the central element of the kernel.
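For example, the first of these kernels can be applied with OpenCV's filter2D (the input path is illustrative):

import cv2
import numpy as np

kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)  # center > 1, sum = 1

img = cv2.imread('input.png')
sharpened = cv2.filter2D(img, -1, kernel)  # -1 keeps the source bit depth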

Linear contrast-enhancing filtering based on convolution can lead to visible colored halos around the contours of the image.

Compensating for lighting differences

Problems with image lighting most often occur when windows, the sun or other uncontrolled light sources enter the frame.

This situation is called "excess light" and leads to the loss of the details and colors of objects located against overly bright backgrounds, because of the too-bright backlighting; they become difficult to distinguish.

The opposite situation, a lack of light, also often occurs. It can be caused by shooting in dark, poorly lit rooms, as well as by the limited sensitivity range of the video equipment.

Single Scale Retinex algorithm

When you try to lighten an image by increasing the brightness of each pixel by some fixed value, initially light areas may turn out completely overexposed.

In such cases, "smart" color correction is needed, capable of equalizing the lighting in the image by processing light areas to a lesser extent than dark ones.

These requirements are satisfied by the Single Scale Retinex algorithm, which is based on the principles of the retinal receptors. The main purpose of the algorithm is to divide the image into components responsible separately for the illumination and for the detail. Since the problems in the image are related to the lighting of the scene, once the component responsible for the lighting has been obtained, it becomes possible to transform it separately from the image, thereby significantly improving image quality.

Any image can be represented as the product of a high-frequency signal (the reflectance R) and a low-frequency signal (the illumination I):

S(x, y) = I(x, y) · R(x, y) (5.6)


Figure 5.3 - Image representation in the Retinex algorithm.

An approximate image of the illumination can be obtained by low-pass filtering - in other words, by simply blurring the original image, for example with the Gaussian filter:

I(x, y) = S(x, y) * G(x, y),

where G is a Gaussian filter

Since taking the logarithm of a signal does not change its frequency content, and thanks to the property of the logarithmic function that the logarithm of a product equals the sum of the logarithms of the factors, the task of separating a product of signals can be simplified to separating a sum of signals.

After that, it only remains to take the exponential of the resulting signal in order to return it to the original amplitude scale. The resulting high-frequency component can be added to the blurred and brightened source image, which acts as a new illumination model.

The effect obtained from equalizing the illumination may turn out to be too strong (dark areas will become as bright as light ones). To weaken the effect, the processed image can simply be blended with the source image in some proportion.
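A minimal Single Scale Retinex sketch for a grayscale image, including the final blend with the source (sigma and blend are illustrative values):

import cv2
import numpy as np

def single_scale_retinex(img, sigma=80.0, blend=0.7):
    s = img.astype(np.float64) + 1.0                   # avoid log(0)
    illumination = cv2.GaussianBlur(s, (0, 0), sigma)  # low-frequency component
    r = np.log(s) - np.log(illumination)               # reflectance estimate
    r = cv2.normalize(r, None, 0, 255, cv2.NORM_MINMAX)
    out = blend * r + (1.0 - blend) * (s - 1.0)        # soften the effect
    return np.clip(out, 0, 255).astype(np.uint8)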

Gamma correction

The original purpose of gamma correction is to compensate for differences in the colors displayed by various output devices, so that an image looks the same when viewed on different monitors. Thanks to the nonlinear form of the power function used, gamma correction also makes it possible to increase the contrast of dark areas of an image without overexposing bright details and without losing the distinguishability of the boundaries of image objects.

Information about brightness, both in analog form in television and in digital form in most common graphic formats, is stored on a nonlinear scale. The brightness of a pixel on the monitor screen can be considered proportional to a power of the stored value:

I = V^g,

where I is the brightness of the pixel on the display screen (or the brightness of the color components red, green and blue separately),

V is the numerical color value from 0 to 1, and

g is the gamma correction exponent.

If g is less than 1, the level transfer characteristic is convex and the resulting image is lighter than the original. If g is greater than 1, the level transfer characteristic is concave and the resulting image is darker than the original.

By default the parameter g equals 1, which corresponds to a linear level transfer characteristic and the absence of gamma correction.
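A gamma correction sketch following the formula above (the value of g is illustrative; g < 1 lightens, g > 1 darkens):

import numpy as np

def gamma_correct(img, g=2.2):
    v = img.astype(np.float64) / 255.0  # normalize V to the range 0..1
    return (np.power(v, g) * 255.0).astype(np.uint8)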

Extracting image contours

Contour analysis can be used to describe, recognize, compare and search for graphic objects represented in the form of their external outlines. Since the use of contours excludes the interior points of an object from consideration, it can significantly reduce the computational and algorithmic complexity of these operations.

Figure 5.4 - Change in the shape of the power function depending on the parameter g

An object's contour is a list of points representing a curve in the image that separates the object from the background. Most often, a jump in brightness or color occurs along the contour.

To simplify the search for contours in an image, it can first be binarized.

The Sobel filter highlights object boundaries based on their brightness. Since the color component is not taken into account, the images must first be converted to grayscale.

The Sobel filter is applied sequentially to each pixel, computing an approximate value of the gradient of its brightness. The gradient at each point of the image (of the brightness function) is a two-dimensional vector whose components are the derivatives of the image brightness in the horizontal and vertical directions.

At each point of the image the gradient vector is oriented in the direction of the greatest increase in brightness, and its length corresponds to the magnitude of the change in brightness. These data make it possible to form an assumption about the probability that the point under consideration lies on the boundary of some object, and about the orientation of that boundary.

Thus, the result of the Sobel operator at a point within a region of constant brightness is a zero vector, while at a point lying on the boundary of regions of different brightness it is a vector crossing the boundary in the direction of increasing brightness.

To compute the approximate values of the derivatives at each point of the image, the Sobel filter uses a 3×3 matrix.

The Sobel kernel coefficients for the horizontal (Gx) and vertical (Gy) derivatives:

Gx:            Gy:
-1  0  +1      -1  -2  -1
-2  0  +2       0   0   0
-1  0  +1      +1  +2  +1

The total magnitude of the gradient is computed approximately by the formula:

|G| = |Gx| + |Gy|
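A Sobel gradient sketch with OpenCV, using the approximation |G| = |Gx| + |Gy| (the input path is illustrative):

import cv2
import numpy as np

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal derivative
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical derivative
magnitude = np.abs(gx) + np.abs(gy)              # |G| = |Gx| + |Gy|
edges = np.uint8(np.clip(magnitude, 0, 255))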

Canny edge detector

Although Canny's work took place at the dawn of computer vision (1986), the Canny edge detector still remains one of the best detectors. The Canny method is a multi-stage algorithm that includes the following steps:

1. Cleaning the image of noise and unnecessary details.

2. Finding the image gradients, for example with the Sobel operator.

3. Non-maximum suppression: only local maxima are marked as edges.

4. Double threshold filtering: potential boundaries are determined by thresholds.

5. Contour tracing: linking the edges into contours.

Since even slight noise in an image can break the integrity of its contours, it is recommended to filter the image with some noise-suppression method before starting the search. Because of its high speed and simplicity of implementation, the Gaussian filter is used most often. Boundaries in an image can lie in different directions, so the Canny algorithm uses four filters to detect horizontal, vertical and diagonal boundaries. Using an edge detection operator (for example, the Sobel operator), values of the first derivative in the horizontal direction (Gx) and the vertical direction (Gy) are obtained. From this gradient the angle of the boundary direction is obtained:

θ = arctan(Gy / Gx).

The boundary direction angle is rounded to one of four angles representing the vertical, the horizontal and the two diagonals (0, 45, 90 and 135 degrees). Only those pixels at which a local maximum of the gradient is reached in the direction of the gradient vector, which must be a multiple of 45°, are declared boundaries. After non-maximum suppression, the edges become more accurate and thin.

In the next step, threshold filtering determines for each pixel under consideration whether it belongs to the image boundaries. The higher the threshold, the more homogeneous the found contours will be, but weak edges may be ignored. On the other hand, lowering the threshold increases the algorithm's susceptibility to noise. Canny edge detection uses two filtering thresholds: if a pixel value is above the upper threshold, it takes the maximum value (the boundary is considered reliable); if it is below the lower threshold, the pixel is suppressed; points with values falling in the range between the thresholds take a fixed intermediate value (they will be refined at the next stage).

The final processing step is linking the individual edges into homogeneous contours. The pixels that received the intermediate value at the previous step are either suppressed (if they do not touch any already detected edge) or attached to the corresponding contour.
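In OpenCV the whole pipeline is available as a single call; a minimal sketch (the thresholds and the input path are illustrative):

import cv2

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)  # pre-filter against noise
edges = cv2.Canny(blurred, 100, 200)           # lower and upper thresholds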

Segmentation

Most images obtained from photo and video equipment are raster images, that is, they consist of colored points arranged in a rectangular grid. However, people perceive the surrounding world as a set of solid objects, not as a matrix of points. The human brain is able to unite scattered parts of an image into homogeneous areas, subconsciously dividing it into objects. This process is called segmentation, and it can be implemented programmatically when solving problems of computer image analysis and recognition. Segmentation is performed in the first phases of analysis, and the quality of its execution can strongly affect the speed and accuracy of the subsequent processing.

Segmentation methods can be divided into two classes: automatic, requiring no interaction with the user, and interactive, using user input directly during operation.

In the first case, no a priori information about the properties of the regions is used; instead, certain conditions are imposed on the partitioning of the image itself (for example, all regions must be homogeneous in color and texture). Since this formulation of the segmentation problem does not use prior information about the depicted objects, the methods of this group are universal and applicable to any images.

For a rough assessment of the quality of a method in a particular task, several properties that a good segmentation should possess are usually fixed:

§ homogeneity of the regions (uniformity of color or texture);

§ dissimilarity of neighboring regions;

§ smoothness of the boundary of the region;

§ a small number of small "holes" within the regions;

Threshold segmentation

Thresholding is the simplest method, aimed at processing images whose individual homogeneous regions differ in average brightness. However, if the image is unevenly illuminated, some objects may match the background in intensity, which makes threshold segmentation ineffective.

The simplest, and at the same time frequently used, type of threshold segmentation is binary segmentation, when only two types of homogeneous segments are distinguished in the image.

In this case, each point of the source image is converted into a point of the output image according to the rule:

Y(m, n) = Y0 if X(m, n) < X0, and Y(m, n) = Y1 otherwise,

where X0 is the only processing parameter, called the threshold. The output brightness levels Y0 and Y1 can be arbitrary; they only perform the function of labels with which the resulting map is marked up, assigning its points to the classes K1 or K2 respectively. If the result is being prepared for visual perception, the levels usually correspond to black and white. If there are more than two classes, then a family of thresholds separating the brightnesses of the different classes from one another must be specified during threshold processing.

Threshold segmentation is well suited for extracting a small number of non-intersecting objects that have a homogeneous structure and stand out sharply against the background. As the degree of heterogeneity of the image grows, and with it the number of segments and their complexity, this type of segmentation becomes ineffective.
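A binary thresholding sketch following the rule above (the threshold and labels are illustrative):

import numpy as np

def binary_threshold(img, x0=128, y0=0, y1=255):
    # Points below the threshold X0 are labeled Y0 (class K1),
    # the rest Y1 (class K2).
    return np.where(img < x0, y0, y1).astype(np.uint8)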

Segmentation based on graph partitioning

Methods of graph theory are one of the most actively developing directions in image segmentation.

The general idea of the methods of this group is as follows. The image is represented as a weighted graph with vertices at the points of the image. The weight of a graph edge reflects the similarity of the points in some sense (the distance between the points according to some metric). Partitioning of the image is modeled by graph cuts.

Usually, in graph-theoretic methods, a functional of the "cost" of a cut is introduced, reflecting the quality of the resulting segmentation. Thus the task of splitting an image into homogeneous regions reduces to the optimization problem of finding the minimum-cost cut in the graph. Such an approach makes it possible to take into account, in addition to the color and texture of the segments, their shape, size, boundary complexity and so on.

Various methods are applied to find the minimum-cost cut: greedy algorithms (at each step an edge is chosen so that the total cost of the cut is minimal), dynamic programming methods (which guarantee that choosing the optimal edge at each step results in the optimal path), Dijkstra's algorithm, etc.

Interpolation

In computer graphics, interpolation is often used in the process of changing image scale. By changing the number of image points, interpolation helps avoid undesirable pixelation of the picture when enlarging it, or loss of important details when reducing it.

During interpolation, additional points are inserted between the pixels of the image; their estimated tone and color are computed by a special algorithm based on an analysis of the available data about neighboring regions. Unfortunately, since any interpolation is only an approximation, an image inevitably loses quality every time it undergoes interpolation.

Nearest-neighbor interpolation

This algorithm is the simplest kind of interpolation; it simply enlarges each pixel of the image to the required scale. It requires the least processing time, but leads to the worst results.

Bilinear interpolation

This type of interpolation is performed for each coordinate of the two-dimensional grid. The image is considered as a surface, with color as the third dimension. If the image is in color, interpolation is carried out separately for the three colors. For each unknown point of the new image, bilinear interpolation examines a square of four surrounding known pixels. The weighted average of these four pixels is used as the interpolated value. As a result, the image looks much smoother than the result of the nearest-neighbor method.

Bilinear interpolation works well at integer scaling coefficients, but it noticeably blurs sharp boundaries in the image.

Bicubic interpolation goes one step further than bilinear by considering an array of 4×4 surrounding pixels, 16 in all. Since they lie at different distances from the unknown pixel, the nearest pixels receive a greater weight in the computation. Bicubic interpolation produces noticeably sharper images than the previous two methods and is arguably optimal in terms of the ratio of processing time to output quality. For this reason it has become standard in many image editing programs (including Adobe Photoshop), printer drivers and in-camera interpolation.
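The three interpolation methods discussed above can be compared directly with OpenCV's resize (the input path and the 2x factor are illustrative):

import cv2

img = cv2.imread('input.png')
nearest  = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
bicubic  = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)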

A scaled image can turn out significantly less sharp. Interpolation algorithms that better preserve sharpness are more susceptible to moire, while those that exclude moire usually give a softer result. Unfortunately, this trade-off cannot be avoided when scaling.

One of the best ways to combat this is to apply unsharp masking immediately after scaling, even if the original has already been sharpened.

5.2 Justification of the choice of algorithms used in the subsystem

The main requirement for the developed software package was to minimize the delay of the video stream during its preprocessing on the computing cluster. In addition, shooting can take place in any conditions, so it was necessary, in a short time, to study a large number of negative factors that appear on video and to implement simple filters to neutralize them. Algorithms satisfying these requirements must be readily available, well optimized, highly reliable and at the same time simple. The functions of the OpenCV library have these properties, so when choosing specific methods to implement the video stream filters, priority was given to algorithms contained, in one form or another, in this library.

All algorithms considered in the theoretical part of this graduation qualification work were implemented in test form in order to compare their characteristics in practice. In particular, preference was given to a compromise between the processing speed of the video stream and the quality of the obtained result.

As a result, the following algorithms were selected to implement the video stream processing filters on a computing cluster:

1. To remove "additive white" noise, the Gaussian blur algorithm was chosen as the most rational noise reduction method; it is very well optimized and accordingly has a high speed of operation.

2. To remove "impulse" noise, median filtering was selected. This method is also well optimized and, moreover, was designed specifically to eliminate impulse noise and "salt and pepper" noise.

3. To increase image sharpness, convolution was selected, since it works much faster than unsharp masking while giving acceptable results.

4. The OpenCV library does not contain color correction algorithms, so it was decided to implement the most common and well-documented algorithm, Single Scale Retinex. This method is very effective, but requires optimization to increase its speed.

5. As the contour extraction method, the Canny algorithm was chosen, since it gives better results than the Sobel filter.

6. The pyramid segmentation algorithm provided in the OpenCV library works extremely slowly, so it was decided to use the segmentation algorithm considered earlier.

7. Interpolation: the bicubic interpolation method was selected as the most sensible compromise between the speed and the quality of the result.

Installing and configuring the software used

A GNU/Linux operating system (Ubuntu) was installed on the computing cluster.

After installing the operating system, several libraries must be installed that support reading and writing image files, drawing on the screen, working with video, and so on.

Installing CMake

The project is built using CMake (version 2.6 or higher). It can be installed with the command:

apt-get install cmake

The following libraries may also be needed:

build-essential libjpeg62-dev libtiff4-dev libjasper-dev libopenexr-dev libtbb-dev libeigen2-dev libfaac-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev

Installing ffmpeg

For OpenCV to process video files correctly, the FFmpeg library must be installed. This is done with the following commands:

1) Downloading the library source code

wget http://ffmpeg.org/releases/ffmpeg-0.7-rc1.tar.gz

2) Unpacking archive with source codes

tar -xvzf ffmpeg-0.7-rc1.tar.gz

3) Configuring the library

./configure --enable-gpl --enable-version3 --enable-nonfree --enable-postproc \
--enable-libfaac --enable-libopencore-amrnb --enable-libopencore-amrwb \
--enable-libtheora --enable-libvorbis --enable-libxvid --enable-x11grab \
--enable-swscale --enable-shared

4) Building and installing the library
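The build commands themselves are not reproduced above; assuming the usual autotools flow, this step would be:

make              # build the library (assumed standard autotools step)
sudo make install # install into the system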

Installing GTK

To display OpenCV windows, the GTK+ 2.x library or higher must be installed, including the header files (libgtk2.0-dev):

apt-get install libgtk2.0-dev

Installing OpenCV

After installing all the associated libraries, OpenCV 2.2 is installed with the following commands:

1) Downloading the OpenCV library source code

wget http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.2/OpenCV-2.2.0.tar.bz2

2) Unpacking an archive with source codes

tar -xvf OpenCV-2.2.0.tar.bz2

3) Generating the Makefile using CMake

4) Building and installing the OpenCV library
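Steps 3 and 4 are not spelled out above; a typical sequence, assuming an in-source CMake build, would be:

cmake .            # generate the Makefile (assumed in-source build)
make               # build the library
sudo make install  # install into the system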

5) If necessary, register the path to the libraries:

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Installing and compiling the developed software package

The source code must be copied from the disk attached to this explanatory note. The BUILD_ALL.SH batch file must be copied into the same folder and then run. If the GCC compiler is installed in the system, the build will proceed automatically.

Digital Signal Processing

Topic 17. Image Processing

There is nothing that man's imagination would not dare.

Titus Lucretius, Roman philosopher and poet, 1st century BC.

Imagination is a good thing. But to drag a house-spirit out of the basement, wash it, turn it into Apollo, pack it into a matchbox and e-mail it to a friend - a good graphics program will do it better.

Anatoly Pyshmintis, Novosibirsk geophysicist of the Ural school, 20th century.

Introduction

1. Basic concepts. Graphic representation of images. Representation of color in machine graphics. The RGB color model. The CIE XYZ color system.

2. Geometric transformations of raster images. Areas and stages of transformations. Sampling. Interpolation series for reconstructing a two-dimensional signal. Frequency distortions of images and their elimination. Image resampling.

3. Image filtering. Linear filters. Smoothing filters. Contrast-enhancing filters. Difference filters. Two-dimensional cyclic convolution. Nonlinear filters. Threshold filtering. Median filtering. Extremum filters.

4. Image compression. Run-length encoding (RLE) algorithms. Dictionary algorithms. Statistical coding algorithms. Lossy image compression. Estimating losses in images. The Fourier transform. The wavelet transform.

Introduction

The pace of research in the field of digital image processing is growing rapidly. This is because image processing is the processing of multidimensional signals, and most signals in the real world are multidimensional.


In mathematical representation, an image is a two-dimensional signal carrying a huge amount of information. A color image of 500 × 500 elements is an array of several hundred thousand bytes. Such information can be processed only with a rational organization of computations. For specific image processing tasks, effective processing methods can be used that take into account the features and restrictions of that particular task. But when speaking of image processing for solving a wide class of tasks, it is necessary to select a set of standard operations from which algorithms for solving arbitrary tasks can be built. These include linear transformations, two-dimensional convolution and the two-dimensional discrete Fourier transform.

Nonlinear transformations are also widely used in image processing. A feature of images is that individual image elements are in a certain relationship with adjacent elements. Therefore most image transformation algorithms are local in nature, i.e. they process images by groups of elements located in a neighborhood around a given one. Linear transformations satisfy the locality property and allow the construction of algorithms whose computational complexity depends little on the size of the covered neighborhood. The same properties are required of nonlinear image transformations. The class of such transformations includes algorithms called rank filtering algorithms, based on computing local rank statistics of images. When computing rank statistics and their derivatives, simplifications are possible that are associated with the informational redundancy of images. The best-known algorithm of this class is the median filtering algorithm. Other examples of rank algorithms are extremum filtering algorithms, which replace the analyzed image element with the maximum or minimum over the neighborhood. Another property of rank algorithms is local adaptation to the characteristics of the processed image and the potential for using them not only for smoothing and noise cleaning, but also for feature extraction in automatic image recognition.

Methods for processing one-dimensional signals are widely used in image processing when they can be generalized to multidimensional signals. At the same time, it must be taken into account that mathematical methods for describing multidimensional systems are not complete. Multidimensional systems have a large number of degrees of freedom, and their design acquires a flexibility not characteristic of one-dimensional systems. At the same time, multidimensional polynomials do not factor into simple factors, which complicates the analysis and synthesis of multidimensional systems.

17.1. Basic concepts

Graphic representation of images. Two approaches are used to represent graphic information on a two-dimensional plane (a monitor screen): raster and vector.

With the vector approach, graphic information is described as a set of abstract geometric objects: lines, segments, curves, rectangles, etc. A vector description presupposes a priori knowledge of the image structure.

Raster graphics operate with arbitrary images in the form of rasters. A raster is a description of an image on a plane obtained by splitting (sampling) it into identical elements on a regular grid and assigning each element its color and any other attributes. The simplest raster is rectangular; the most economical in the number of samples needed to transfer an image is hexagonal. Mathematically, a raster is a piecewise-constant approximation, on the plane, of a continuous image function.

A raster element is called a pixel. The standard identification of a pixel is:


f(i, j) = (A(i, j), C(i, j)), (17.1.1)

where A(i, j) ⊂ R² is the pixel area and C(i, j) ∈ C is the pixel attribute (as a rule, color). Two types of attributes are used most often:

C(i, j) = I(i, j) - the intensity (brightness) of the pixel;

C(i, j) = (R(i, j), G(i, j), B(i, j)) - color attributes in the RGB color model.

In matrix form:

Mij = (Aij, Cij).

When sampling continuous images, the values Aij can be defined in two ways: either as the values of the points Aij = (i, j) for which the attributes Cij are defined, or as the values of the squares Aij = (i, i+1) × (j, j+1) (or of any other shape), with Cij defined by averaging over that shape (Fig. 17.1.1).

In practice, as a rule, X and Y are bounded sets of non-negative integers of a square or rectangular raster, with an aspect ratio of raster width to height written as, for example, "4:3".

Representation of color in machine graphics. The concept of color is based on the perception by the human eye of electromagnetic waves in a certain frequency range. The daylight we perceive has wavelengths λ from 400 nm (violet) to 700 nm (red). A light flux can be described by its spectral function I(λ). Light is called monochromatic if its spectrum contains only one specific wavelength.

There are two types of receptors on the retina: rods and cones. The spectral sensitivity of the rods (Fig. 17.1.2) is directly proportional to the brightness of the incident light. The cones are divided into three types, each of which has a certain sensitivity in a limited range, with maxima at red, green and blue, and they sharply lose sensitivity in the dark. The eye's sensitivity to blue is significantly lower than to the other two colors. An important property of human light perception is its linearity when adding colors with different wavelengths.

The RGB color model (Red, Green, Blue) is currently the most common in machine graphics. In this model, the spectral function is represented as the sum of the sensitivity curves of each type of cone with non-negative weight coefficients (normalized from 0 to 1), which are denoted R, G and B. The model is characterized by additivity in obtaining new colors. For example, the encodings of some spectral functions:

Black: FBLACK = 0, (R, G, B) = (0, 0, 0);

Violet: FVIOLET = FRED + FBLUE, (R, G, B) = (1, 0, 1);

White: FWHITE = FRED + FGREEN + FBLUE, (R, G, B) = (1, 1, 1).

The three-dimensional color space of the RGB model is shown in Fig. 17.1.3. Due to the peculiarities of light perception by the receptors, not all colors visible to humans can be represented in this model. However, the share of reproducible colors is much larger than the share of colors not representable in this model.

The CIE XYZ color system. The international color representation standard CIE XYZ (CIE - Commission Internationale de l'Eclairage) was adopted in 1931 by the International Commission on Illumination. It defines three basic functions ρx(λ), ρy(λ), ρz(λ), depending on the wavelength, whose linear combinations with non-negative coefficients (X, Y and Z) allow any color visible to humans to be obtained. These functions take into account the relative perception of light intensity by the eye's receptors. In three-dimensional space, the CIE color system forms a cone in the first quadrant and is used for high-quality color image representation.

17.2. Geometric transformations of raster images

Areas and stages of transformations. Images can be divided into texture images and detailed images. In texture images, all samples (elements) carry information (an image on a TV screen). A detailed image is one in which interfering objects, a background and useful objects can be distinguished.

There are three main groups of image processing algorithms on computers:

1. Primary (preliminary) image processing for the purposes of restoration, cleaning from random noise, quality improvement and correction of geometric distortions of optical systems (defocusing, aberrations, etc.).

2. Image description and image recognition. Performed to determine the parameters of image details and includes: finding image regions homogeneous in illumination and color, extracting features of the image shape, determining the coordinates of special points of objects, etc.

3. Effective coding to reduce the volume during transmission and storage.

Most primary processing methods are based on the use of linear spatially invariant (LSI) filters. Linear algorithms are implemented using two-dimensional analogues of one-dimensional FIR and IIR filters. They can be applied, for example, in filters for reducing noise levels in images.

FIR filters are implemented by convolution. The advantages of two-dimensional FIR filters are clarity, simplicity and absolute stability. IIR filters are implemented using difference equations and Z-transforms. They are faster than FIR filters, but can be unstable. The synthesis of two-dimensional IIR filters differs from the synthesis of one-dimensional ones, since for a two-dimensional function the poles cannot be identified explicitly.

Nonlinear methods may be required to restore images and improve their quality. For example, to suppress noise while preserving the contour parts of images, it is necessary to use nonlinear or linear spatially non-invariant (LSNI) filters, which are implemented by rank algorithms. All rank nonlinear filters are based on fast algorithms for computing local histograms.

One of these methods is median filtering. The use of median filters is effective for suppressing certain types of noise and periodic interference without simultaneously distorting the signal, for example for suppressing bursts of noise emissions, including line dropouts. The method can also be used in tasks associated with recognition, for example to extract thin lines and small isolated objects.

Algorithms for image description and recognition are, as a rule, nonlinear and heuristic in nature. The features of objects are usually the area of the object's image, the perimeter of the image contour, and the ratio of the area to the square of the perimeter. The shape of an object can be characterized by the radius of a circle inscribed in, or circumscribed around, the object's image, and by the lengths of the minimum and maximum radius vectors from the "center of mass" of the image.

Sampling. Image transformations in a computer and the storage of processed data are performed in discrete form. Sampling is used to obtain a discrete representation of the continuous analog images of the real world. In practice it is performed by input devices (a digital camera, scanner and so on). For the visual perception of processed images on output devices (a display, plotter, etc.), an analog image is reconstructed from its discretized representation.

In the simplest case of black-and-white images, we have a two-dimensional array sa(x, y). For color images in the RGB model, taking into account the additivity property when adding colors, each layer R, G and B can also be considered and processed as a two-dimensional array, with subsequent summation of the results.

Among the ways of generalizing one-dimensional periodic sampling to the two-dimensional case, periodic sampling in rectangular coordinates is the simplest:

s(n, m) = sa(n·Dx, m·Dy),

where Dx and Dy are the horizontal and vertical sampling intervals of the two-dimensional continuous signal sa(x, y) with continuous coordinates x and y. Below, the values of Dx and Dy, as in the one-dimensional case, are taken equal to 1.

Sampling a two-dimensional signal also leads to periodization of its spectrum, and vice versa. Informational equivalence of the coordinate and frequency representations of a discrete signal is maintained when the number of sampling points is equal in the main ranges of the signal. For rectangular sampling, the direct and inverse Fourier transforms are defined by the expressions:

S(k, l) = Σn Σm s(n, m)·exp(-j·n·2πk/N - j·m·2πl/M), (17.2.1)

S(k, l) = Σn exp(-j·n·2πk/N) Σm s(n, m)·exp(-j·m·2πl/M), (17.2.1')

s(n, m) = Σk Σl S(k, l)·exp(j·n·2πk/N + j·m·2πl/M), (17.2.2)

s(n, m) = Σk exp(j·n·2πk/N) Σl S(k, l)·exp(j·m·2πl/M). (17.2.2')

Fig. 17.2.1. Periodization of the spectrum.

These expressions show that a two-dimensional DFT on a rectangular sampling raster can be computed using sequential one-dimensional DFTs. The second sums of expressions (17.2.1') and (17.2.2') are one-dimensional DFTs of the sections of the functions s(n, m) and S(k, l) along the lines n and k respectively, and the first sums are one-dimensional DFTs of the computed functions in sections over m and l. In other words, the initial matrices of the values s(n, m) and S(k, l) are first converted into intermediate matrices by a DFT over the rows (or columns), and the intermediate ones into the final matrices by a DFT over the columns (or rows, respectively).
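This row-column scheme is easy to check in numpy (a small sketch; the 8 × 8 size is arbitrary):

import numpy as np

def dft2_rows_then_cols(s):
    tmp = np.fft.fft(s, axis=1)     # 1D DFT along every row
    return np.fft.fft(tmp, axis=0)  # then along every column

s = np.random.rand(8, 8)
assert np.allclose(dft2_rows_then_cols(s), np.fft.fft2(s))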

In order for the periodic repetition of the spectrum (Fig. 17.2.1) caused by sampling the analog signal with frequencies Fx = 1/Dx and Fy = 1/Dy not to distort the spectrum in the main frequency range (with respect to the original analog signal), it is necessary and sufficient that the maximum frequency components fmax in the spectrum of the analog signal, both along the rows and along the columns, not exceed the Nyquist frequency (fmax.x ≤ fN = Fx/2, fmax.y ≤ fN = Fy/2). This means that the sampling frequency of the signal must be at least twice the maximum frequency component in its spectrum:

Fx ≥ 2·fmax.x, Fy ≥ 2·fmax.y, (17.2.3)

which ensures that the spectral functions fall to zero values at the ends of the main spectral range.

Interpolation series for reconstruction of a two-dimensional signal. If the continuous signal sa(x, y) has a limited spectrum and the sampling periods are chosen small enough that the spectra of adjacent periods do not overlap:

Sa(Wx, Wy) = 0 for |Wx| ≥ π/Dx, |Wy| ≥ π/Dy,

then, as in the one-dimensional case, the signal sa(x, y) can be reconstructed from the discrete signal using a two-dimensional analogue of the Kotelnikov-Shannon series:

sa(x, y) = Σn Σm s(n, m)·sinc[π(x - n·Dx)/Dx]·sinc[π(y - m·Dy)/Dy]. (17.2.4)

Frequency distortions of images and their elimination. A signal with an unlimited spectrum can also be sampled, but in this case the spectra of adjacent periods overlap, and the high frequencies above the Nyquist frequency will be "masked", as in the one-dimensional case, as the low frequencies of the main period. The effect of "reflections" from the period boundaries gives an even more complex picture due to the interference of frequencies reflected in different coordinates. A similar effect, known as aliasing, is also observed when the image sampling frequency is insufficient. This effect can be observed especially clearly on sharp, contrasting changes in brightness.

To combat such phenomena, pre-filtering (anti-aliasing) is used: a preliminary convolution of the analog image with a weighting function that cuts off the high-frequency components that could lead to aliasing. In the two-dimensional case, the filtering is described as follows:

z(x, y) = h(x', y') ③③ s(x - x', y - y'). (17.2.5)

It should be noted that analog images exist only in the optical range, for example in the form of a light display on a screen, on photographic paper or on film, but cannot exist in computer memory. Therefore the physical execution of pre-filtering is possible only when registering an image by defocusing it, which is usually not done. Primary information should always be registered with maximum completeness and accuracy, and cleaning the primary information of unnecessary details and redundancy is a matter of subsequent data processing. Therefore, with respect to equation 17.2.5, two-dimensional pre-filtering in its practical form can only be the filtering of images sampled with a large margin over the main frequency range (with excess resolution), and it is used, as a rule, when resampling to a larger step, for example when compressing images. Pre-filtering can also be built into image construction algorithms.

Fig. 17.2.3 and Table 17.2.1 below give examples of the most common one-dimensional anti-aliasing filters. They can also be implemented as analog filters and applied, for example, when transmitting television image lines in analog form over radio channels (horizontal anti-aliasing). In principle, a similar operation can be performed over the columns, and after summation the anti-aliasing operation will be complete over the whole image, but this method belongs more to the field of special scientific research.

Table 17.2.1.

Main weighting functions

Time window - Weighting function - Fourier image

Rectangular (П): p(t) = 1 for |t| ≤ τ; p(t) = 0 for |t| > τ. Fourier image: П(ω) = 2τ·sinc(ωτ).

Bartlett (Δ): b(t) = 1 - |t|/τ. Fourier image: B(ω) = τ·sinc²(ωτ/2).

Hann (Hanning): p(t) = 0.5·[1 + cos(πt/τ)]. Fourier image: 0.5·П(ω) + 0.25·П(ω + π/τ) + 0.25·П(ω - π/τ).

Hamming: p(t) = 0.54 + 0.46·cos(πt/τ). Fourier image: 0.54·П(ω) + 0.23·П(ω + π/τ) + 0.23·П(ω - π/τ).

Carré (2nd window): p(t) = b(t)·sinc(πt/τ). Fourier image: τ·B(ω)*П(ω), where П(ω) = 1 for |ω| < π/τ.

Laplace-Gauss: p(t) = exp[-β²·(t/τ)²/2]. Fourier image: [(τ/β)·exp(-τ²ω²/(2β²))] ③ П(ω).

Two-dimensional analogues of one-dimensional filters f1(x) are built in two variants of symmetry: either as a function of the radius:

f2(x, y) = f1(√(x² + y²)),

or as a product:

f2(x, y) = f1(x) × f1(y).

The first variant is more correct, but the second has the property of separability, i.e. a two-dimensional convolution can be performed by two one-dimensional convolutions in series: over the rows with f1(x) and over the columns with f1(y).

Image resampling is a change in the sampling rate of a digital signal. As applied to digital images, this means a change in image size.

There are various image resampling algorithms. For example, to enlarge an image by a factor of 2 using bilinear interpolation, the intermediate columns and rows are obtained by linear interpolation of the adjacent columns and rows. Each point of the new image can also be obtained as a weighted sum of a larger number of source points (bicubic and other types of interpolation). The highest-quality resampling is obtained using algorithms that take into account not only the time domain but also the frequency domain of the signal.

Let us consider a resampling algorithm with maximum preservation of the frequency information of the image. The operation of the algorithm will be considered on one-dimensional signals, since a two-dimensional image can first be stretched or compressed horizontally (by rows) and then vertically (by columns), reducing the resampling of a two-dimensional image to the resampling of one-dimensional signals.

Suppose we have a one-dimensional signal (Fig. 17.2.4) given on the interval 0-T and sampled with step Dt = 1 (N intervals). The signal needs to be "stretched" M times. The signal spectrum shown in the figure is computed by the fast Fourier transform (FFT; the number of spectrum samples equals the number of signal samples) and is given in the main FFT range (0-2π, with the Nyquist frequency ωN = π/Dt = π, or 0.5N in the numbering of the spectrum samples, with a spectrum step of Df = 1/T or Dω = 2π/T). The stretching requires two steps.

The first step is zero interpolation, which increases the signal length M times (Fig. 17.2.5). All samples of the original signal are multiplied by M, and M-1 zero values are then inserted after each signal sample. The interval 0-T, whose value remains unchanged, now contains M times more sampling intervals (M·N), and the new sampling step is Dx = Dt/M. Accordingly, the new Nyquist frequency of this signal is M·π/Dt = M·π. But the physical value of the spectrum step in frequency units is the reciprocal of the physical length of the signal interval (Df = 1/T), so the FFT over the M·N points of the signal will compute M·N spectrum samples in the main FFT range 0-2πM with the same spectrum step as in the original signal, and the spectrum will contain M periods of the original spectrum (one main period and M-1 side periods).

The second step is filtering out the side ranges of the spectrum using a low-pass filter, either in the time domain or in the spectral domain. In Fig. 17.2.6, the spectrum is cleaned and the inverse Fourier transform performed, resulting in a signal M times longer than the original with complete preservation of all frequency information.
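A numpy sketch of this two-step stretching: zero insertion, then an ideal low-pass filter in the spectral domain (the sketch assumes an even number of samples N):

import numpy as np

def stretch(signal, m):
    n = len(signal)
    up = np.zeros(n * m)
    up[::m] = m * signal              # scale samples, insert M-1 zeros
    spectrum = np.fft.fft(up)
    mask = np.zeros(n * m)
    mask[:n // 2] = 1                 # keep only the main spectral period
    mask[-(n // 2):] = 1
    return np.fft.ifft(spectrum * mask).real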

A signal compression (decimation) algorithm by a factor of N can be constructed on a similar principle, with the order of the steps reversed. When compressing a signal, the sampling step is increased and the Nyquist frequency correspondingly reduced, and the cut-off high frequencies (noise and insignificant high-frequency parts of the signal spectrum) would be reflected from the boundary of the main range and superimposed on the main information, creating distortions. To exclude this phenomenon, low-pass filtering of the signal is first performed with a cutoff frequency equal to the new Nyquist frequency (anti-aliasing), and only then is the signal decimated by thinning.

When resampling is performed only in the time domain, the stretching and compression algorithms are usually combined into a single serial process with the resampling step specified as a ratio M/N, which allows integer values of M and N to be used for fractional values of the sampling step. This greatly simplifies the algorithms and improves the efficiency and quality of their work. For example, to stretch a signal 1.5 times with M/N = 3/2, the signal is first stretched 3 times (a simple uniform padding of all samples with zeros), then low-pass filtering is performed, after which the signal is decimated by a factor of two. A separate anti-aliasing filter is not required, since its cutoff frequency is covered by the cutoff frequency of the first low-pass filter. For the reverse compression operation (for example, M/N = 2/3), only an anti-aliasing filter is used.
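As an illustration, here is a minimal numpy sketch of the M/N scheme described above; the windowed-sinc low-pass design and all function names are assumptions for this example, not part of the source:

import numpy as np

def lowpass_fir(cutoff, numtaps=101):
    # Windowed-sinc low-pass FIR; cutoff is normalized to the Nyquist frequency.
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(numtaps)
    return h / h.sum()

def resample_rational(x, M, N):
    # Stretch M times: insert M-1 zeros after each sample (amplitudes scaled by M).
    up = np.zeros(len(x) * M)
    up[::M] = np.asarray(x, dtype=float) * M
    # One low-pass pass covers both image rejection and the anti-aliasing cut,
    # since the lower of the two Nyquist limits is used.
    y = np.convolve(up, lowpass_fir(min(1.0 / M, 1.0 / N)), mode='same')
    # Compress N times: decimation by thinning.
    return y[::N]

t = np.arange(200) / 200.0
stretched = resample_rational(np.sin(2 * np.pi * 5 * t), 3, 2)  # 1.5x the samples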

17.3. Image filtering

Image filtering is understood as an operation whose result is an image of the same size, obtained from the original according to some rule. Usually the intensity (color) of each pixel of the resulting image is determined by the intensities (colors) of the pixels located in some neighborhood of it in the source image.

Filtering rules can be very diverse. Image filtering is one of the most fundamental operations of computer vision, pattern recognition and image processing. The work of the overwhelming majority of image processing methods begins with one or another filtering of the source images.

Linear filters have a very simple mathematical description. Assume that the original halftone image A is given, and denote the intensity of its pixels A(x, y). A linear filter is defined by a real-valued function h (the filter kernel) specified on the raster. The filtering itself is performed using the discrete convolution operation (weighted summation):

B(x, y) = h(i, j) ⊛ A(x, y) = Σ(i, j) h(i, j)·A(x - i, y - j). (17.3.1)

The result is image B. Usually the filter kernel is nonzero only in some neighborhood N of the point (0, 0); outside this neighborhood h(i, j) is zero or very close to it and can be neglected. The summation is carried out over (i, j) ∈ N, and the value of each pixel B(x, y) is determined by the pixels of image A that lie in the window N centered at the point (x, y) (denoted N(x, y)). A filter kernel specified on a rectangular neighborhood N can be considered as an m × n matrix, where the lengths of the sides are odd numbers; when specified as a matrix, the kernel should be centered. If the pixel (x, y) lies near the edges of the image, the coordinates A(x - i, y - j) for certain (i, j) may correspond to non-existent pixels of A outside the image. This problem can be resolved in several ways.

Do not perform filtering for such pixels, either cropping image B along the edges or assigning them the source values of A.

Do not include the missing pixel in the summation, distributing its weight h(i, j) evenly among the other pixels of the neighborhood N(x, y).

Define the values of pixels beyond the image borders by extrapolation.

Define the values of pixels beyond the image borders by a mirror continuation of the image.

The choice of method is made taking into account the specific filter and features of the image.
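A small sketch of convolution (17.3.1) with the edge strategies listed above, using scipy.ndimage boundary modes; the mapping of modes to the list items is an assumption of this example:

import numpy as np
from scipy.ndimage import convolve

A = np.random.rand(64, 64)          # halftone image A(x, y)
h = np.ones((3, 3)) / 9.0           # centered 3x3 kernel

B_mirror = convolve(A, h, mode='mirror')              # mirror continuation
B_edge   = convolve(A, h, mode='nearest')             # extrapolation by edge values
B_zero   = convolve(A, h, mode='constant', cval=0.0)  # fixed value outside the image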

Smoothing filters. The simplest rectangular smoothing filter of radius R is specified by a matrix of size (2R+1) × (2R+1), all of whose values equal 1/(2R+1)², so that the sum of the values is one. This is a two-dimensional analogue of the one-dimensional low-pass moving-average (box) filter. When filtering with such a kernel, the pixel value is replaced by the average of the pixels in a square with side 2R+1 around it. An example of a 3 × 3 filter mask:

(1/9) | 1 1 1 |
      | 1 1 1 |
      | 1 1 1 |

One application of this filter is noise reduction. Noise changes independently from pixel to pixel and, provided that its mathematical expectation is zero, the noise of neighboring pixels compensates itself under averaging. The larger the filtering window, the lower the averaged intensity of the noise, but the stronger the accompanying blurring of meaningful details of the image. The image of a white point on a black background under such filtering (the response to a unit impulse) will be a uniform gray square.

Noise suppression with a rectangular filter has a significant drawback: all pixels in the filter mask, at any distance from the processed one, affect the result equally. A somewhat better result is obtained by modifying the filter to increase the weight of the central point:

(1/10) | 1 1 1 |
       | 1 2 1 |
       | 1 1 1 |

More efficient noise reduction can be achieved if the influence of pixels on the result decreases with increasing distance from the processed one. This property is possessed by the Gaussian filter with kernel h(i, j) = (1/(2πσ²))·exp(-(i² + j²)/(2σ²)). The Gaussian filter has a nonzero kernel of infinite size; however, the kernel values decrease very quickly away from (0, 0), so in practice the convolution can be limited to a small window around (0, 0), for example, taking the window radius equal to 3σ.

Gaussian filtering is also smoothing. However, in contrast to the rectangular filter, the image of a point under Gaussian filtering will be a symmetric blurred spot, with brightness decreasing from the middle to the edges. The degree of image blur is determined by the parameter σ.
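A possible construction of the truncated Gaussian kernel (window radius 3σ, kernel renormalized so its sum is one); the helper name is illustrative:

import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(sigma):
    r = int(np.ceil(3 * sigma))                      # truncate at radius 3*sigma
    i, j = np.mgrid[-r:r + 1, -r:r + 1]
    h = np.exp(-(i ** 2 + j ** 2) / (2 * sigma ** 2))
    return h / h.sum()                               # renormalize the cut-off tails

A = np.random.rand(64, 64)
B = convolve(A, gaussian_kernel(1.5), mode='mirror')  # blur grows with sigma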

Contrast-enhancing filters. If smoothing filters reduce the local contrast of the image, blurring it, contrast-enhancing filters produce the opposite effect and are, in essence, filters of high spatial frequencies. The kernel of a contrast-enhancing filter has a value greater than 1 at the point (0, 0), with the total sum of its values equal to 1. Examples are filters with kernels defined by the matrices:

|  0 -1  0 |      | -1 -1 -1 |
| -1  5 -1 |  ,   | -1  9 -1 |
|  0 -1  0 |      | -1 -1 -1 |

An example of the use of such a filter is shown in Fig. 17.3.1. The effect of increased contrast is achieved because the filter emphasizes the difference between the intensities of neighboring pixels, pulling these intensities apart. The effect is stronger, the larger the value of the central element of the kernel. A characteristic artifact of linear contrast filtering is noticeable bright and less noticeable dark halos around the boundaries.

Difference filters are linear filters specified by discrete approximations of differential operators (by the finite-difference method). These filters play a crucial role in many applications, for example, in the tasks of finding boundaries (edges) in the image.

The simplest differential operator is taking the derivative with respect to the x-coordinate, d/dx, which is defined for continuous functions. Common variants of similar operators for discrete images are the Prewitt and Sobel filters:

Prewitt:            Sobel:
| -1 0 1 |          | -1 0 1 |
| -1 0 1 |          | -2 0 2 |
| -1 0 1 |          | -1 0 1 |

Filters approximating the derivative operator with respect to the y-coordinate, d/dy, are obtained by transposing these matrices.

The simplest algorithm for calculating the norm of the gradient uses three adjacent points:

G(x, y) = sqrt((A(x+1, y) - A(x, y))² + (A(x, y+1) - A(x, y))²).

A simplified computational formula is also used:

G(x, y) ≈ |A(x+1, y) - A(x, y)| + |A(x, y+1) - A(x, y)|.

Calculation of the gradient norm from four adjacent points (the Roberts operator):

G(x, y) = sqrt((A(x, y) - A(x+1, y+1))² + (A(x+1, y) - A(x, y+1))²).

In the Sobel algorithm, eight brightness samples in the neighborhood of the central point are used:

G(x, y) = sqrt(Gx² + Gy²),  G(x, y) ≈ |Gx| + |Gy|,

Gx(x, y) = [A(x+1, y-1) + 2A(x+1, y) + A(x+1, y+1)] - [A(x-1, y-1) + 2A(x-1, y) + A(x-1, y+1)],

Gy(x, y) = [A(x-1, y+1) + 2A(x, y+1) + A(x+1, y+1)] - [A(x-1, y-1) + 2A(x, y-1) + A(x+1, y-1)].

Along with a more accurate determination of the gradient norm, the Sobel algorithm makes it possible to determine the direction of the gradient vector in the image plane, as the angle φ between the gradient vector and the direction of the matrix rows:

φ(x, y) = arctan(Gy(x, y) / Gx(x, y)).
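A compact sketch of the Sobel gradient, its norm and direction; the threshold at the end is an arbitrary illustration:

import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
A = np.random.rand(64, 64)
Gx = convolve(A, sobel_x, mode='nearest')
Gy = convolve(A, sobel_x.T, mode='nearest')   # y-derivative: transposed kernel
G = np.hypot(Gx, Gy)                          # gradient norm
phi = np.arctan2(Gy, Gx)                      # gradient direction
edges = G > 0.5 * G.max()                     # crude boundary selection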

In contrast to smoothing and contrast-enhancing filters, which do not change the average intensity of the image, the application of difference operators usually produces an image with an average pixel value close to zero. Sharp drops (boundaries) of the source image correspond to pixels with large absolute values in the resulting image. Therefore, difference filters are also called filters for extracting the boundaries of objects.

Similarly to the above filters, filters for other differential operators can be constructed by the finite-difference method. In particular, the Laplace differential operator (Laplacian) Δ = ∂²/∂x² + ∂²/∂y², important for many applications, can be approximated for discrete images by a filter with the matrix (one of the variants):

| 0  1  0 |
| 1 -4  1 |
| 0  1  0 |

As can be seen in Fig. 17.3.2, as a result of applying the discrete Laplacian, large absolute values correspond to both vertical and horizontal brightness drops. The filter thus finds boundaries of any orientation. Boundaries in the image can be found by applying this filter and taking all pixels whose absolute value exceeds some threshold.

However, such an algorithm has significant drawbacks. The main one is the uncertainty in choosing the threshold value: for different parts of the image, an acceptable result is usually obtained with substantially different thresholds. In addition, difference filters are very sensitive to image noise.

Two-dimensional cyclic convolution. As with one-dimensional signals, two-dimensional convolution can be performed in the spatial frequency domain using fast Fourier transform algorithms, by multiplying the two-dimensional spectra of the image and the filter kernel. The convolution is cyclic and is usually performed in a sliding version. Taking the cyclicity into account, when the constant kernel-spectrum template is calculated, the dimensions of the filter kernel mask are doubled along the axes and padded with zeros, and the same mask dimensions are used to select the window sliding over the image within which the FFT is performed. Implementing a filter with the FFT is particularly effective if the filter has a large support area.
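A sketch of two-dimensional convolution through the FFT, with both spectra computed on a common zero-padded grid so the cyclic wrap-around does not contaminate the result; the function name is illustrative:

import numpy as np

def fft_convolve2d(image, kernel):
    # Pad both arrays to a common enlarged size, multiply the spectra, invert.
    sh = (image.shape[0] + kernel.shape[0], image.shape[1] + kernel.shape[1])
    F = np.fft.rfft2(image, sh) * np.fft.rfft2(kernel, sh)
    full = np.fft.irfft2(F, sh)
    r, c = kernel.shape[0] // 2, kernel.shape[1] // 2   # crop back to input size
    return full[r:r + image.shape[0], c:c + image.shape[1]]

A = np.random.rand(256, 256)
h = np.ones((31, 31)) / 31 ** 2     # large support area: the FFT route pays off
B = fft_convolve2d(A, h)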

Nonlinear filters. In digital image processing, nonlinear algorithms based on rank statistics are widely used to restore images damaged by various noise models. They make it possible to avoid additional image distortion when removing noise, and also to significantly improve filtering results on images with a high degree of noise.

We introduce the concept of the M-neighborhood of the image element A(x, y), which is central to this neighborhood. In the simplest case, the M-neighborhood contains N pixels: the points falling into the filter mask, including (or not including) the central one. The values of these N elements can be arranged in a variational series V(r), ranked in ascending (or descending) order, and certain moments of this series can be computed, for example, the average brightness value mN and the dispersion dN. The output value of the filter, which replaces the central sample, is calculated by the formula:

B(x, y) = a·A(x, y) + (1 - a)·mN. (17.3.2)

The value of the coefficient a is related by a certain dependence to the sample statistics in the filter window, for example:

a = dN / (dN + k·dS), (17.3.3)

where dS is the dispersion of the noise over the image as a whole, or over an S-neighborhood with S > M and M ∈ S, and k is a confidence constant for the dispersion of the S-neighborhood. As follows from this formula, for k = 1 and dN ≈ dS we have a ≈ 0.5, and the value B(x, y) = (A(x, y) + mN)/2 is formed equally from the value of the central sample and the average value of the pixels of its M-neighborhood. As dN increases, the contribution of the central sample to the result grows; as it decreases, the contribution of mN grows. The weight of the average value of the M-neighborhood can be adjusted via the coefficient k.

The choice of the statistical function and the nature of the dependence of the coefficient a on it can be quite diverse (for example, based on the dispersion of the sample differences between the M-neighborhood and the central sample), and depends both on the size of the filter aperture and on the nature of the images and noise. In essence, the value of the coefficient a should reflect the degree of damage to the central sample and, accordingly, the extent to which corrective samples are borrowed from its M-neighborhood.
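A direct (slow, loop-based) sketch of filter (17.3.2)-(17.3.3); the global noise-dispersion estimate dS used here is a placeholder assumption:

import numpy as np

def adaptive_rank_filter(A, radius=1, k=1.0, d_s=None):
    A = A.astype(float)
    if d_s is None:
        d_s = A.var()                       # crude stand-in for the noise dispersion dS
    B = A.copy()
    for y in range(radius, A.shape[0] - radius):
        for x in range(radius, A.shape[1] - radius):
            win = A[y - radius:y + radius + 1, x - radius:x + radius + 1]
            m_n, d_n = win.mean(), win.var()        # moments of the M-neighborhood
            a = d_n / (d_n + k * d_s + 1e-12)       # formula (17.3.3)
            B[y, x] = a * A[y, x] + (1 - a) * m_n   # formula (17.3.2)
    return B

noisy = np.random.rand(64, 64)
restored = adaptive_rank_filter(noisy, radius=1, k=1.0)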

The easiest and most common types of nonlinear filters for image processing are threshold and median filters.

Threshold filtering is specified, for example, as follows:

B(x, y) = A(x, y), if |A(x, y) - mN| ≤ p;  B(x, y) = mN, if |A(x, y) - mN| > p.

The value p is the filtering threshold: if the value of the central point differs from the average value mN of the samples in its M-neighborhood by more than the threshold, it is replaced by the average value. The threshold can be either a constant or a function of the value of the central point.

Median filtering is defined as follows:

B(x, y) = med(M(x, y)),

i.e., the result of filtering is the median value of the pixels of the neighborhood, whose shape is determined by the filter mask. Median filtering can effectively remove from the image interference that independently affects individual pixels, for example "dead" pixels in digital shooting, or "snow" noise, when some pixels are replaced by pixels of maximum intensity. The advantage of median filtering is that a "hot" pixel on a dark background is replaced by a dark one, rather than "smeared" over the surrounding area.

Median filtering has a pronounced selectivity with respect to array elements that form a non-monotonic component of the sequence of numbers within the filter aperture; the monotonic component of the sequence the median filter leaves unchanged. Thanks to this feature, median filters with an optimally chosen aperture preserve sharp boundaries of objects without distortion, while suppressing uncorrelated or weakly correlated interference and small-sized details.
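Median filtering is available in one call; a small illustration with an isolated impulse:

import numpy as np
from scipy.ndimage import median_filter

A = np.random.rand(64, 64)
A[10, 10] = 1.0                  # an isolated "hot" pixel
B = median_filter(A, size=3)     # 3x3 aperture removes the impulse;
                                 # monotonic regions and edges survive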

Extremum filters are defined by the rules:

Bmin(x, y) = min(M(x, y)),

Bmax(x, y) = max(M(x, y)),

i.e., the filtering result is the minimum or maximum pixel value in the filter mask. Such filters are applied, as a rule, to binary images.

17.4. Compression of images

A typical image with a resolution of about 3000 × 2000 pixels at 24 bits per pixel for color has a volume of 17 megabytes. For professional devices the size of the resulting raster can be significantly larger, the color depth up to 48 bits per pixel, and the size of a single image can exceed 200 megabytes. Therefore image compression algorithms, which reduce the amount of data representing the image, are highly relevant.

There are two main classes of algorithms:

1. Lossless compression, when there exists an inverse algorithm A⁻¹ such that for any image H with A[H] = H1 we have A⁻¹[H1] = H. Lossless compression is used in graphic image formats such as GIF, PCX, PNG, TGA, TIFF, and in the processing of highly valuable primary information (medical images, aerial and space photographs, etc.), when even the slightest distortion is undesirable.

2. Lossy compression, when the original image cannot be restored exactly. The algorithm paired with A for approximate image recovery is denoted A*. The pair (A, A*) is selected so as to provide high compression ratios while maintaining visual quality. Lossy compression is used in graphic formats such as JPEG, JPEG2000, etc.

All algorithms and statements apply both to images and to arbitrary sequences whose elements can take a finite number of values. It should be borne in mind that there are no ideal algorithms that compress any data set without loss.

Run-length encoding (RLE) algorithms are based on a simple principle: replacing repeated groups of elements of the original sequence by a pair (count, element), or by the count alone.

Bit level. Let us consider the source data at the level of a bit sequence, for example one representing a black-and-white image. Several 0s or 1s usually occur in a row, so one can encode the number of identical digits standing in a row; but the repetition count itself must also be encoded in bits. Assume that each repetition count ranges from 0 to 7 (a 3-bit code) and that the codes alternate between runs of ones and runs of zeros. For example, a run of 11 ones can be associated with the numbers 7 0 4, i.e. 7 ones, 0 zeros, 4 ones; the longer the runs of the same bit, the greater the gain. Thus, a sequence of 21 ones, 21 zeros, 3 ones and 7 zeros is encoded by twelve 3-bit counts (7 0 7 0 7, 7 0 7 0 7, 3, 7), i.e. from an initial sequence 52 bits long we obtain a sequence 36 bits long.

Byte level. Suppose a halftone image is fed to the input, where 1 byte is allocated to the pixel intensity value; here the expectation of long chains of identical bits is significantly reduced.

We split the input stream into bytes (codes from 0 to 255) and encode repeated bytes by a pair (count, letter); a single byte can be left unchanged. Thus, the bytes AABBBCDAA are encoded as (2A)(3B)(C)(D)(2A).
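A byte-level RLE sketch matching the (count, letter) description above:

def rle_encode(data: bytes):
    pairs, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1                      # extend the run (count capped at 255)
        pairs.append((j - i, data[i]))
        i = j
    return pairs

def rle_decode(pairs):
    return b''.join(bytes([v]) * n for n, v in pairs)

enc = rle_encode(b'AABBBCDAA')          # [(2,65), (3,66), (1,67), (1,68), (2,65)]
assert rle_decode(enc) == b'AABBBCDAA'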

However, modifications of this algorithm are rarely used by themselves (for example, in the PCX format), since the subclass of sequences on which the algorithm is effective is relatively narrow. More often they are used as one of the stages of a compression pipeline.

Dictionary (word) algorithms encode chains of elements of the incoming sequence instead of single elements. A dictionary of chains, built from the input sequence, is used to encode new ones.

The LZ77 algorithm was one of the first to use a dictionary. The dictionary is the N most recent already encoded elements of the sequence. In the process of compression, the dictionary window "slides" along the incoming sequence. A chain of elements is encoded as follows: the position of the matching part of the processed chain in the dictionary (an offset relative to the current position), the length of the match, and the first element following the matched part of the chain. The length of the matching chain is bounded above by N. Accordingly, the task is to find the longest chain in the dictionary that matches the processed sequence. If there is no match, a zero offset, unit length and the first element of the unencoded sequence are recorded.

The coding scheme described above leads to the concept of a sliding window, consisting of two parts:

a subsequence of already encoded elements of length N (the dictionary) - the search buffer;

a subsequence of length n of the chain of elements for which a match will be sought - the look-ahead buffer.

Decoding a compressed sequence consists in decoding the recorded codes: each record is mapped to a chain from the dictionary plus the explicitly recorded element, after which the dictionary is shifted. The dictionary is recreated as the decoding algorithm runs.

This algorithm is the ancestor of a whole family of algorithms. Its advantages include a decent degree of compression on sufficiently large sequences and fast unpacking; its disadvantages are a slow compression speed and a lower compression ratio than alternative algorithms.

LZW algorithm. The dictionary in this algorithm is a table that is filled with chains of elements as the algorithm operates. During compression, the longest chain already recorded in the dictionary is sought. Each time a new chain of elements is not found in the dictionary, it is added to the dictionary and the code of the found chain is recorded. In theory, no restriction is imposed on the size of the table, but limiting its size improves the compression ratio, since useless (never re-encountered) chains accumulate. The more entries the table has, the more information must be allocated for storing the codes.

Decoding consists in direct decoding of the codes, i.e., in building the dictionary and outputting the corresponding chains. The dictionary is initialized in the same way as in the encoder. The advantages of the algorithm are a high degree of compression and fairly high speed of both compression and decoding.
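A minimal LZW encoder sketch; the table starts with all single bytes and grows without the size limit discussed above:

def lzw_encode(data: bytes):
    table = {bytes([b]): b for b in range(256)}
    codes, chain = [], b''
    for byte in data:
        candidate = chain + bytes([byte])
        if candidate in table:
            chain = candidate               # keep growing the longest known chain
        else:
            codes.append(table[chain])      # emit the code of the longest match
            table[candidate] = len(table)   # register the new chain
            chain = bytes([byte])
    if chain:
        codes.append(table[chain])
    return codes

print(lzw_encode(b'abababab'))              # [97, 98, 256, 258, 98]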

Statistical coding algorithms assign to each element of the sequence a code whose length corresponds to the probability of occurrence of the element. Compression occurs by replacing the elements of the original sequence, which all have equal lengths (each element occupies the same number of bits), with elements of different lengths proportional to the negative logarithm of their probability, so that elements occurring more often than the rest receive shorter codes.

The Huffman algorithm uses a prefix code of variable length, which has a special property: shorter codes do not coincide with prefixes (initial parts) of longer ones. Such a code allows unambiguous decoding. The compression process consists in replacing each element of the input sequence by its code. The construction of the set of codes is usually carried out using so-called code trees.

The Huffman algorithm is two-pass: the first pass over the image creates a table of element weights, and during the second pass encoding takes place. There are implementations of the algorithm with a fixed table. It often happens that the a priori probability distribution of the alphabet elements is unknown, since the entire sequence is not available at once; in this case adaptive modifications of the Huffman algorithm are used.
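A two-pass Huffman sketch: the first pass builds the weight table, and the second pass would replace elements by the prefix codes produced here:

import heapq
from collections import Counter

def huffman_codes(data: bytes):
    # Each heap node: [total weight, [symbol, code], [symbol, code], ...]
    heap = [[w, [sym, '']] for sym, w in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]         # lighter subtree gets bit 0
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

codes = huffman_codes(b'abracadabra')
bits = ''.join(codes[b] for b in b'abracadabra')   # second pass: encoding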

Lossy image compression. The amount of information needed for storing images is usually large. Classical algorithms, being general-purpose, do not take into account that the compressed information is an image, a two-dimensional object, and do not provide sufficient compression.

Lossy compression is based on the peculiarities of human perception of an image: the greatest sensitivity in a certain range of color wavelengths, and the ability to perceive the image as a whole without noticing small distortions. The main class of images on which lossy algorithms are oriented is photographs, i.e. images with smooth color transitions.

Estimating losses in images. There are many measures for assessing losses in images after recovery (decoding) from the compressed form. However, one can choose two images whose measure of difference is quite large while the differences are almost imperceptible to the eye; conversely, one can choose images that differ greatly to the eye but have a small measure of difference.

The standard numerical measure of loss is usually the root-mean-square deviation of the pixel values of the recovered image from the source. Nevertheless, the most important "measure" of loss is the opinion of the observer: the fewer differences the observer discovers (ideally, none), the higher the quality of the compression algorithm. Lossy compression algorithms often let the user choose the amount of "lost" data, i.e. trade off between the quality and the size of the compressed image. Naturally, the better the visual quality at a given compression ratio, the better the algorithm.

Fourier transform. In general, an image can be viewed as a function of two variables defined at the points of a finite raster. The set of such functions on the points of a fixed finite raster forms a finite-dimensional Euclidean space, and the discrete Fourier transform, i.e. the spectral representation of the image, can be applied to them. It provides:

- decorrelation and independence of the spectrum coefficients, i.e., the accuracy of representation of one coefficient does not depend on any other;

- energy compaction: the transform concentrates the basic information in a small number of coefficients. This property is most pronounced on photorealistic images.

The spectral representation coefficients are the amplitudes of the spatial frequencies of the image. In the case of images with smooth transitions, most of the information is contained in the low-frequency part of the spectrum.

The compression algorithm used in the JPEG format is built on the discrete cosine Fourier transform. The compression scheme is a pipeline, in which this transform is only one of the stages, but one of the main ones. The algorithm contains the following basic operations:

1. Transfer to the YCbCr color space, where Y is the brightness component and Cb and Cr are the chroma components. The human eye is more sensitive to brightness than to color, so it is more important to maintain greater accuracy when transmitting Y than when transmitting Cb and Cr.

2. Discrete cosine transform (DCT). The image is broken into 8 × 8 blocks, and the DCT is applied to each block (separately for the Y, Cb and Cr components).

3. Reduction of the high-frequency components in the DCT matrices. The human eye practically does not notice changes in the high-frequency components, so the coefficients responsible for high frequencies can be stored with less accuracy.

4. Zigzag ordering of the matrices. This is a special traversal of the matrix that produces a one-dimensional sequence: first the element T00, then T01, T10, T20, and so on. For typical photorealistic images, the nonzero coefficients corresponding to the low-frequency components come first, followed by a long run of zeros (the high-frequency components).

5. Compression first by the RLE method, and then by the Huffman method.

The image recovery algorithm acts in the reverse order. Compression ratios range from 5 to 100 or more. At the same time, the visual quality of most photorealistic images remains good at compression ratios up to 15. The algorithm and format are the most common for transferring and storing full-color images.
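An illustrative fragment of the JPEG-style pipeline for one 8 × 8 block; the uniform quantization step stands in for the real quantization tables:

import numpy as np
from scipy.fft import dctn

def zigzag(n=8):
    # Traversal order T00, T01, T10, T20, T11, T02, ... for an n x n block.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # one Y block
coeffs = dctn(block, norm='ortho')        # stage 2: 2D DCT
q = np.round(coeffs / 16.0) * 16.0        # stage 3: crude uniform quantization
seq = [q[i, j] for i, j in zigzag()]      # stage 4: zigzag ordering
# Stage 5 would feed `seq` to RLE and then Huffman coding.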

Wavelet transformation of signals is a generalization of the classical Fourier transform. The term "wavelet" in translation from English means "small (short) wave". Wavelets are a generic name for families of mathematical functions of a certain form that are local in time and frequency, and in which all functions are obtained from one basic function through its shifts and stretchings along the time axis.

In lossy compression algorithms, as a rule, all operations of the compression pipeline are retained, with the discrete Fourier transform replaced by a discrete wavelet transform. Wavelet transforms have very good frequency-spatial localization and surpass traditional Fourier transforms in this respect. This makes it possible to apply stronger quantization, improving the properties of the sequence for subsequent compression. Image compression algorithms based on this transform show better preservation of image quality at the same compression ratio.



This article develops image processing algorithms for intelligent mobile robots based on fuzzy logic and neural networks, providing boundary extraction in the image using the Sobel operator. The essence of the image processing is to bring the source image of the scene to a form that allows the task of recognizing its objects to be solved. The main problems of the primary preparation of an image for recognition, as well as ways to solve them, are considered. The pre-processing algorithm using fuzzy logic and the image binarization process is described in detail. An algorithm of fuzzy processing for extracting boundaries in the image using the Sobel operator is constructed.

image processing

fuzzy logic

intelligent system

object recognition


Currently, automatic image processing is one of the most important directions in the field of artificial intelligence and implies the development of robotic complexes that perform image recognition. One of the most effective tools for image recognition is a system built on fuzzy logic and artificial neural networks. A technical vision system (TVS) requires several methods and algorithms that solve the same task in different ways while ensuring the necessary speed and reliability of identification.

The essence of the hybrid image processing algorithm in the TVS of mobile robotic complexes (MRC) is to bring the source image of the scene to a form that allows the task of recognizing its objects to be solved.

Algorithm for image pre-processing with a fuzzy system in the TVS

In image processing, fuzzy processing is a collection of different fuzzy approaches to the understanding, representation and processing of images, their segments and features as fuzzy sets. In the process of image recognition, preliminary fuzzy image processing is of great importance, since the quality of the data fed to the neural network inputs depends precisely on it. Within the framework of the problem, the developed algorithm of preliminary fuzzy processing can be represented as the following sequence of steps (Fig. 1): capturing an image using a webcam; converting the resulting color image into a grayscale image; fuzzy image processing.

Fig. 1. Algorithm of preliminary fuzzy image processing

Thus, the first step of preliminary fuzzy processing is converting the image from color to shades of gray. The conversion is as follows: the entire color palette is represented as a cube whose vertices correspond to pure colors, and the gray scale is located on the cube diagonal connecting the black and white vertices.

To convert the image to shades of gray, for each image point the intensities of the red, green and blue components are extracted, and the color is transformed according to the following formula:

I = 0.299R + 0.587G + 0.114B, where I is the new color value, R is the intensity of the red component, G the intensity of the green component, and B the intensity of the blue component. The output of each grayscale conversion method lies between 0 and 1. There are several methods of converting an image to shades of gray. In the lightness method, the average of the largest and the smallest color component is used: I = (max(R, G, B) + min(R, G, B))/2. In the average method, the mean of all three colors is used: I = (R + G + B)/3. In the luminosity method, a weighted average of all three colors is used, taking human perception into account: since the human eye is most sensitive to green, its weight is the largest, as in the formula above. The luminosity method is used by image processing software; it is implemented by the rgb2gray function in the MATLAB environment and is often used in computer vision. The preliminary fuzzy processing developed here converts images from color (RGB) to shades of gray using the luminosity method. The image is then converted from shades of gray to black and white (Fig. 2).

Fig. 2. The process of converting images from color to shades of gray
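A numpy sketch of the luminosity method; the weights are the standard values used by rgb2gray:

import numpy as np

def rgb_to_gray(rgb):
    # Weighted average; green carries the largest weight.
    return rgb @ np.array([0.299, 0.587, 0.114])

rgb = np.random.rand(48, 64, 3)    # H x W x 3 image scaled to [0, 1]
gray = rgb_to_gray(rgb)            # output also lies in [0, 1]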

Image binarization during pre-processing

The purpose of preliminary fuzzy image processing is the formation and subsequent improvement of the image, its binarization and coding (in particular, obtaining a contour representation). Image binarization is the process of converting an image consisting of gradations of one color (in our case, gray) into a binary image, i.e., an image in which each pixel can have only one of two colors (in our case, black and white). As a result of such a transformation, the color of a pixel is conventionally considered to be zero or one; pixels with a zero value (here, white pixels) are called the background, and pixels with a value equal to one (black) are called the foreground. The binary image obtained by such a transformation is distorted compared to the original, which manifests itself in breaks and blurring of objects, the appearance of noise in homogeneous areas, and loss of integrity of the structure of objects.

The loss of an object's integrity, as well as breaks in the object, occur for a number of reasons, such as strong unevenness of the object's illumination, or touching (or overlapping) of objects. Overlapping (with touching as a special case) causes particular difficulty in processing: on the one hand, an image of several objects can be interpreted as one object; on the other hand, algorithms that check the geometric integrity of an object will form gaps in the overlap region, treating these areas as background. The difficulty lies in the absence of a theoretical solution to the problem of interpreting overlapping objects, since part of the information is lost. When implementing algorithms in practice, one of the options is taken as the correct solution: either the intersection is considered a continuation of the current object, or the overlap area is considered background.

Threshold processing converts a color or gray image into a black-and-white image. Threshold transformations occupy a central place in applied image segmentation tasks due to their intuitive properties and ease of implementation. For each pixel of the image, its intensity level is examined: if its value is above some threshold level, the pixel is set to white; if below, it is set to black. The threshold level lies between 0 and 255.

Currently, there are a large number of binarization methods. The essence of this transformation of raster images is a comparative analysis of the brightness of the current pixel with a certain threshold value: if the brightness of the current pixel exceeds the threshold, the color of the pixel in the binary image will be white; otherwise it will be black. The threshold surface is a matrix whose dimensions correspond to the dimensions of the source image.

By the principle of constructing the threshold surface, all binarization methods are divided into two groups: global and local. In global binarization methods, the threshold surface is a plane with a constant threshold brightness value, i.e., the threshold is computed from the histogram analysis of the entire image and is the same for all pixels of the original image. Global threshold processing has a significant drawback: if the original image has inhomogeneous lighting, poorly lit areas are entirely classified as foreground. In local binarization methods, the threshold value changes for each point based on some features of a region belonging to some neighborhood of that point. The disadvantage of this kind of transformation is the low speed of the algorithms, associated with recalculating the threshold for each image point.

As a method for solving the task, we use the Bernsen method. The method is based on the idea of comparing the brightness level of the converted pixel with local average values computed in its neighborhood. Image pixels are processed one by one by comparing their intensity with the average brightness values in windows centered at the corresponding points (Fig. 3).

Fig. 3. Pixel conversion image
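A local-threshold sketch following the description above, comparing each pixel with the mean of a window centered on it; note that the classical Bernsen method compares against the midrange (max + min)/2 instead, so this is an assumption-laden simplification:

import numpy as np
from scipy.ndimage import uniform_filter

def local_binarize(gray, window=15):
    local_mean = uniform_filter(gray.astype(float), size=window)
    return (gray < local_mean).astype(np.uint8)  # 1 = dark foreground, 0 = background

gray = np.random.rand(64, 64)
binary = local_binarize(gray)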

Algorithm of fuzzy processing for boundary extraction and image segmentation

After the image has been converted to black and white, a gradient image is obtained using the Sobel operator and fed to the inputs of the fuzzy image processing system (Fig. 4).

Fuzzy image processing consists of three main stages: image fuzzification, a fuzzy inference system over the membership values, and image defuzzification. The main work takes place in the middle stage (the fuzzy inference system): after the image data are transferred from the gray-level domain to the membership domain, the fuzzy inference system determines the membership values. Fuzzification (coding of the image data) and defuzzification (decoding of the results) are what make it possible to process images with fuzzy methods.

An image of size M × N with L gray levels can be defined as an array of fuzzy singletons (fuzzy sets whose support is a single point), indicating the membership value of each pixel with respect to a predefined image property (for example, brightness, smoothness, etc.):

A = ⋃m ⋃n μmn / gmn,  m = 1, …, M,  n = 1, …, N. (1)

where μmn is the membership value of the pixel gmn in fuzzy-set notation. The determination of the membership values depends on the specific requirements of the particular application and on the corresponding knowledge base.

The output of the fuzzy inference system for the input image is determined by the following formula:

(2)

Fig. 4. Algorithm of fuzzy image processing for boundary extraction

Application of neural networks for image recognition

A multilayer perceptron is an artificial neural network consisting of several input nodes forming the input layer, one or more computational layers of neurons, and one output layer (Fig. 6). In such networks, the signal fed to the input layer is transmitted sequentially from layer to layer. This type of ANN is successfully applied to a variety of tasks, in particular the image recognition task.

A backpropagation neural network consists of several layers of neurons, with each neuron of the previous layer connected to each neuron of the subsequent layer. In such networks, after the number of layers and the number of elements in each layer are determined, the values of the weights and thresholds of the network must be calculated so as to minimize the prediction error. This task is solved by various learning algorithms, whose essence is fitting the network to the training data. The error of the network is determined by running all the input data through it and comparing the actual output values with the target values; the differences are summed into a common error function characterizing the overall network error. Most often, the sum of squared errors is taken as the error function.

One of the most common algorithms for training multilayer neural networks is the error backpropagation algorithm. In this algorithm, the gradient vector of the error surface is computed; we then move by a certain amount in the direction opposite to the gradient (which indicates the direction of steepest descent), where the error value will be smaller. Such successive movement gradually leads to minimization of the error. A difficulty arises in choosing the step size. If the step is relatively large, descent is fast, but there is a chance of "jumping over" the desired point or going off in the wrong direction if the surface has a rather complicated shape; for example, if the surface is a narrow ravine with steep slopes, the algorithm will advance very slowly, jumping from one slope to the other. If the step size is small, the direction will be close to optimal, but the number of iterations may grow significantly. To achieve the best result, the step size is taken proportional to the slope steepness, with a constant called the learning rate. The choice of this constant is carried out experimentally and depends on the conditions of the particular task.

We introduce the following notation. The matrix of weight coefficients from the inputs to the hidden layer is denoted W(1), and the matrix of weights connecting the hidden and output layers W(2). Inputs are numbered by the index i, elements of the hidden layer by the index j, and outputs by the index k. The number of network inputs is n, the number of neurons in the hidden layer is m, and the number of neurons in the output layer is p. Let the network be trained on a sample of pairs (x^t, d^t), t = 1, …, T. Then the algorithm for training the multilayer perceptron looks as follows:

Step 1. Network initialization. The weight coefficients are assigned small random values, for example from the range (-0.3, 0.3); the required learning accuracy ε is specified, along with the learning rate parameter η (which, as a rule, may decrease during training) and the maximum allowable number of iterations.

Step 2. Calculation of the current output signal. One of the images of the training sample is fed to the network input, and the output values of all neurons of the network are determined.

Step 3. Adjustment of the synaptic weights. The change in the weights of the output layer is calculated by the formulas (a standard variant for sigmoid activation):

Δw(2)jk = η·δk·yj,  δk = yk(1 - yk)(dk - yk),  k = 1, …, p,

where yj is the output of hidden neuron j, yk the output of output neuron k, and dk the target value. The change in the weights of the hidden layer is calculated by the formulas:

Δw(1)ij = η·δj·xi,  δj = yj(1 - yj)·Σk δk·w(2)jk,

where xi is the i-th network input.

Step 4. Steps 2-3 are repeated for all training vectors. Training is completed when, for each of the training images, the value of the error function does not exceed ε, or after the maximum permissible number of iterations.

In step 2, it is better to present the vectors of the training sequence to the input in random order.

The number of inputs and outputs of the network is, as a rule, dictated by the conditions of the problem, while the size of the hidden layer is found experimentally; usually the number of neurons in it is 30-50% of the number of inputs. Too many hidden-layer neurons lead to the network losing its ability to generalize (it simply memorizes the elements of the training sample thoroughly and does not respond to similar samples, which is unacceptable for recognition tasks). If the number of neurons in the hidden layer is too small, the network is simply unable to learn.
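A toy numpy implementation of steps 1-4 for a sigmoid multilayer perceptron; the dimensions and the synthetic data are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n, m, p, eta = 4, 2, 1, 0.5                     # inputs, hidden, outputs, learning rate
W1 = rng.uniform(-0.3, 0.3, (n, m))             # step 1: small random weights
W2 = rng.uniform(-0.3, 0.3, (m, p))

X = rng.random((8, n))                          # toy training sample
D = (X.sum(axis=1, keepdims=True) > 2).astype(float)

for epoch in range(1000):                       # step 4: repeat until done
    for t in rng.permutation(len(X)):           # random presentation order
        x, d = X[t], D[t]
        h = sigmoid(x @ W1)                     # step 2: forward pass
        y = sigmoid(h @ W2)
        delta_k = y * (1 - y) * (d - y)         # step 3: output-layer correction
        delta_j = h * (1 - h) * (W2 @ delta_k)  # hidden-layer correction
        W2 += eta * np.outer(h, delta_k)
        W1 += eta * np.outer(x, delta_j)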

Conclusion

The main problems of the primary preparation of an image for recognition, as well as ways to solve them, have been considered. The pre-processing algorithm using fuzzy logic and the image binarization process has been described in detail. An algorithm of fuzzy processing for extracting boundaries in the image using the Sobel operator has been constructed.


Bibliographic reference:

Aung Ch.Kh., Tant Z.P., Fedorov A.R., Fedorov P.A. Development of image processing algorithms for intelligent mobile robots based on fuzzy logic and neural networks // Modern Problems of Science and Education. 2014. No. 6. URL: http://science-education.ru/ru/Article/View?id=15579 (accessed 01.02.2020).

Presentation of images

There are two main types of image representations - vector and raster.

In the vector representation, an image is described by a set of lines (vectors) specified by the coordinates of their start and end points, the curvature of the lines and other geometric characteristics; the rules for constructing various areas and the color characteristics are also described. In other words, a vector representation requires the formation of some mathematical model. Therefore, the vector representation is used mainly in solving problems of image synthesis, although some image recognition algorithms require for their work a vector representation that must be obtained from the original image.

A raster image is one or more matrices describing the spatial distribution of image characteristics on some Cartesian coordinate grid. In this case, the image is built from a set of points and has a raster structure. The basic element of the raster representation of an image is the pixel (a contraction of "picture elements"), which has coordinates in the raster coordinate system and some attributes (color, brightness, transparency, etc.). The number of pixels along the x and y coordinates (horizontally and vertically) sets the resolution (dimension) of the image representation. The color of a pixel is specified by its depth: the number of bits necessary to specify any color.

Depending on the methods of specifying the pixel color and the properties of the original image, raster images are divided into:

Binary

Halftone

Palette

Full color

In the binary representation, the color of a pixel can be either white or black and is encoded by one bit. The image is a matrix in which each element I(i, j) has the value 0 or 1, where i is the row number and j is the column number of the element corresponding to the given pixel (Fig. 1).

In halftone images, pixels represent brightness values corresponding to shades of gray. The indices of the matrix describing a halftone image define the position of a pixel on the raster, and the value of the matrix element I(i, j) specifies its brightness (Fig. 2).

Palette images are described by two matrices (Fig. 3). One stores index values that refer to rows of the palette matrix. The palette matrix is a color map: it contains 3 groups of columns, corresponding to the red "R", green "G" and blue "B" colors, which define the color of the corresponding pixel.

The palette is a matrix of dimension Nc × 3, where Nc is the number of colors.


Full-color images are constructed in the RGB format and consist of three matrices R(i, j), G(i, j), B(i, j). The corresponding elements of each matrix contain the intensity values of red, green and blue for the pixel defined by the matrix indices. Thus, a full-color image has no color map, and the color of each pixel is represented by three numbers taken from the corresponding matrices (Fig. 4).

The format of the numbers in the matrices can be either integer or floating-point. The first case refers to so-called digitized images obtained with various devices: scanners, digital cameras, video cameras, etc. It is in this format that information about images is stored in standard graphic files.

The second option is used for the internal representation of images during processing. In this case, it is convenient to normalize the intensity data to one range, for example [0, 1], carry out the various calculations in floating point, and then convert the result back to the original integer form. This method reduces calculation errors and improves the accuracy of the processing result.

For full-color images, one of the parameters is the maximum number of colors that can be represented in the given format. The most commonly used images have 16, 256, 65536 (High Color) and 16.7 million (True Color) colors.

Fig. 1 shows a fragment of a binary image matrix (zeros and ones); Fig. 2, a fragment of a halftone brightness matrix; Fig. 3, an index matrix together with its palette matrix.

A full-color image can be represented not only in the RGB format but also with the help of other color systems.

In the HSB system, color is represented by three characteristics: Hue (color tone), Saturation, and Brightness.

It is believed that this color system corresponds to the peculiarities of human color perception.

In the Lab system, color is considered as a combination of lightness and two independent color values that determine the true color of the pixel: the color component a is selected in the range from purple to green, and the second color component b in the range from yellow to blue.

There are other color representation systems. Naturally, they are all related, and one representation can be obtained from another. The variety of color systems is due to the tasks solved with their help: for example, color correction is more conveniently performed in the Lab system, image reproduction on a monitor screen in the RGB system, and printing is better done using the CMYK representation. However, in any case, image processing and recognition operate on the raster representation of images, containing one or more matrices.

Classification of pre-processing algorithms

Pre-processing algorithms are subdivided into various groups depending on the classifying feature. All pre-processing algorithms either improve the image quality in some sense or convert it to the form most convenient for subsequent processing.

Algorithms aimed at improving the color rendition of an image are called color correction algorithms. This group also includes algorithms that work with halftone images, changing their brightness and contrast characteristics.

Algorithms aimed at processing the spatial characteristics of images are called spatial filtering algorithms. This group includes interference suppression algorithms, spatial smoothing and spatial sharpening algorithms, and algorithms for suppressing and amplifying spatial frequencies.

Algorithms that perform geometric operations on an image are called geometric processing algorithms. These include:


cropping the image - selecting from the source image some part of rectangular shape;

resizing the image. These algorithms use various interpolation methods that allow either correct filling of the missing pixels in an enlarged image or recalculation of pixel values when the image is reduced;

rotating the image. These algorithms rotate the source image by a specified angle, correctly recalculating the pixel values using various interpolation methods.

Algorithms that perform transformations from one color system to another are called color conversion algorithms. These include algorithms for converting color images to halftone, and binarization algorithms that translate the original image into binary form.

Algorithms that select from the source image some areas according to various, often informal, conditions are called segmentation algorithms. An example is an algorithm that separates text and graphic information in the image of a document, or an algorithm that selects, in the image of a text, the areas corresponding to individual words.


Spatial filtering algorithms

Spatial filtering of an image in mathematical form is a discrete convolution of the discrete image with a certain impulse response of the spatial filter:

If(i, j) = Σ (m = N11 … N21) Σ (n = N12 … N22) Im(i - m, j - n)·h(m, n),

where Im and If are the matrices of the original and the filtered images, h is the matrix of the impulse response of the filter, N11 and N21 are the lower and upper boundaries of the columns of the impulse response, and N12 and N22 are the left and right boundaries of its rows.

The impulse response matrix can be obtained by calculating the spatial filter from the specified parameters. A large amount of literature on digital filtering is devoted to methods for calculating spatial filters. For practical calculations, standard mathematical packages can be used; for example, the MATLAB system includes the Image Filter Design filter calculation tool.

Note that filtering can also be carried out in the frequency domain; in that case the order of filtering is as follows:

transfer the image from the spatial domain to the frequency domain using the two-dimensional discrete Fourier transform;

perform element-wise multiplication of the frequency matrix of the image by the frequency matrix of the filter;

convert the result back to the spatial domain using the inverse two-dimensional discrete Fourier transform:

Im(x, y) → IM(fx, fy),

IF(fx, fy) = IM(fx, fy)·H(fx, fy),

IF(fx, fy) → If(x, y).

Filtering images in the frequency domain is rarely applied because of the large amount of computation. However, this method is widely used in theoretical calculations when analyzing image processing options: it allows one to see clearly what kind of filtering is needed. For example, if sharp brightness drops need to be selected in the image, it is obvious that high-pass filters must be used; on the contrary, to get rid of low-frequency interference (trembling contours, individual spikes, etc.), low-pass filters must be applied. Specific filter parameters are selected based on frequency analysis of the interference and the properties of the original image.