Digital signal processing and the DSP (digital signal processor): why do we need DSP processors?

What is DSP?

Digital signal processors (DSPs) take as input pre-digitized physical signals, such as sound, video, temperature, pressure and position, and perform mathematical manipulations on them. Their internal structure is specially designed so that they can perform mathematical operations such as addition, subtraction, multiplication and division very quickly.

Signals must be processed so that the information they contain can be displayed graphically, analyzed, or converted into another type of useful signal. In the real world, signals corresponding to physical phenomena such as sound, light, temperature or pressure are detected and manipulated by analog components. Then, an analog-to-digital converter takes the real signal and converts it into a digital format as a series of ones and zeros. At this stage, a digital signal processor enters the process, which collects digitized information and processes it. It then outputs the digitized information back into the real world for further use. Information is provided in one of two ways - digital or analogue. In the second case, the digitized signal is passed through a digital-to-analog converter. All these actions are performed at very high speed.
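
The chain described above (analog-to-digital conversion, digital processing, digital-to-analog conversion) can be sketched in a few lines of Python. This is only an illustration under assumed parameters: the 8-bit resolution, the reference voltage `v_ref`, the 8 kHz sampling rate and the function names `adc`, `process` and `dac` are hypothetical and are not taken from any particular device.

```python
import math

def adc(voltage, v_ref=1.0, bits=8):
    """Quantize a voltage in [-v_ref, +v_ref] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1                   # 127 for 8 bits
    code = round(voltage / v_ref * max_code)
    return max(-max_code - 1, min(max_code, code))   # clip out-of-range inputs

def dac(code, v_ref=1.0, bits=8):
    """Convert a signed integer code back to a voltage."""
    max_code = 2 ** (bits - 1) - 1
    return code / max_code * v_ref

def process(codes, gain_num=1, gain_den=2):
    """Stand-in for the DSP step: integer volume control (gain = num/den)."""
    return [c * gain_num // gain_den for c in codes]

# Assumed test signal: a 1 kHz sine of amplitude 0.5, sampled at 8 kHz
fs, f = 8000, 1000
analog_in = [0.5 * math.sin(2 * math.pi * f * n / fs) for n in range(8)]
digital = [adc(v) for v in analog_in]               # analog-to-digital conversion
analog_out = [dac(c) for c in process(digital)]     # processing, then digital-to-analog
```

Here `process` merely halves the volume; in a real system this is the place where filtering, compression or encoding would be performed.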

To illustrate this concept, consider the block diagram below, which shows how a digital signal processor is used as part of an MP3 audio player. During the recording phase, an analog audio signal enters the system from a receiver or other source. This analog signal is converted to a digital signal using an analog-to-digital converter and sent to a digital signal processor. The digital signal processor encodes it into MP3 format and stores the file in memory. During the playback phase, the file is retrieved from memory, decoded by a digital signal processor, and converted by a digital-to-analog converter back into an analog signal that can be played back in the speaker system. In a more complex example, the digital signal processor may perform additional functions such as volume control, frequency compensation, and providing a user interface.

The information generated by a digital signal processor can be used by a computer, for example, to control security systems, telephones, home theater systems, or video compression. Signals can be compressed to allow faster and more efficient transmission from one location to another (for example, in teleconferencing systems for transmitting voice and video over telephone lines). Signals may also be subject to additional processing to improve their quality or provide information that is not initially available to humans (for example, in echo cancellation tasks in mobile phones or computer image enhancement). Physical signals can be processed in analog form, but digital processing provides improved quality and speed.

Because the DSP is programmable, it can be used in a wide variety of applications. When creating a project, you can write your own software or use software provided by Analog Devices or third parties.

For more information on the benefits of using DSPs in real-world signal processing, you can read the first part of the article Digital Signal Processing 101 - An Introduction to DSP System Design, entitled “Why a DSP?”

What's inside a Digital Signal Processor (DSP)?

The digital signal processor includes the following key components:

Program memory: Contains programs that the digital signal processor uses to process data
Data memory: Contains information that needs to be processed
Computing core: Performs mathematical processing by accessing the program contained in the program memory and the data contained in the data memory
I/O subsystem: Provides a range of functions to interface with the outside world

Digital signal processing is a complex subject, and it can overwhelm even the most seasoned DSP professionals. We've only given a brief overview here, but Analog Devices also offers additional resources that provide more detailed information about digital signal processing:

Series of articles in Analog Dialogue magazine: (in English)

Part 1: Why do you need a digital signal processor? DSP architectures and advantages of digital signal processing over traditional analog circuits
Part 2: Learn more about digital filters
Part 3: Implementation of algorithms on a hardware platform
Part 4: Programming Considerations for Real-Time I/O Support

Hands-on DSP labs are a quick and effective way to become familiar with the use of Analog Devices DSPs. They will enable you to gain confident, hands-on skills in working with Analog Devices digital signal processors through a course of lectures and hands-on exercises. You can find the schedule and registration information on the Training and Development page.

A digital signal processor (DSP) is a specialized programmable microprocessor designed for real-time manipulation of a stream of digital data. DSP processors are widely used to process streams of graphic information, audio and video signals.

Any modern computer is equipped with a central processor, and only a few are equipped with a digital signal processor (DSP). The CPU is obviously a digital system and processes digital data, so the difference between digital data and digital signals, that is, the signals processed by the DSP, is not clear at first glance.

In the general case, it is natural to include all digital information flows that are formed in the process of telecommunications as digital signals. The main thing that distinguishes this information is that it is not necessarily stored in memory (and therefore may not be available in the future), therefore, it must be processed in real time.

The number of sources of digital information is almost unlimited. For example, downloaded files in MP3 format contain digital signals that actually represent the sound recording. Some camcorders digitize video signals and record them in digital format. High-end cordless and cell phones also convert voice to a digital signal before transmission.

Variations on a theme

DSP processors are fundamentally different from the microprocessors that form the central processing unit of a desktop computer. Due to the nature of its activity, the central processor has to perform unifying functions. It must manage the operation of various computer hardware components, such as disk drives, graphic displays, and the network interface, to ensure they operate in harmony.

This means that desktop CPUs have a complex architecture because they must support basic functions such as memory protection, integer arithmetic, floating point operations, and vector graphics processing.

As a result, a typical modern central processor supports several hundred instructions that perform all of these functions. It therefore needs an instruction decoding unit able to handle a complex instruction set, as well as a large number of hardware circuits that actually carry out the actions the instructions specify. Consequently, a typical processor in a desktop computer contains tens of millions of transistors.

The DSP processor, on the contrary, must be a “narrow specialist”. Its only job is to change the flow of digital signals, and to do it quickly. A DSP consists primarily of high-speed arithmetic and bit-manipulating hardware circuits optimized to quickly change large amounts of data.

Because of this, the DSP has a much smaller instruction set than a desktop computer's central processor: the number of instructions does not exceed 80. This means that the DSP needs only a lightweight instruction decoder and far fewer execution units. In addition, all execution units must ultimately support high-performance arithmetic operations. Thus, a typical DSP processor consists of no more than a few hundred thousand transistors.

Being highly specialized, the DSP processor does its job perfectly. Its mathematical functions allow you to continuously receive and change digital signals (such as MP3 audio recordings or cell phone conversations) without slowing down or losing information. To increase throughput, the DSP processor is equipped with additional internal data buses, which provide faster data transfer between arithmetic modules and processor interfaces.

Why do we need DSP processors?

The DSP's specific information processing capabilities make it ideal for many applications. Using algorithms based on the appropriate mathematical apparatus, the DSP processor can perceive a digital signal and perform convolution operations to enhance or suppress certain properties of the signal.

Because DSPs have significantly fewer transistors than CPUs, they consume less power, allowing them to be used in battery-powered products. Their production is also extremely simplified, so they find application in inexpensive devices. The combination of low power consumption and low cost leads to the use of DSP processors in cell phones and robotic toys.

However, the range of their applications is far from limited to this. Due to the large number of arithmetic modules, the presence of on-chip memory and additional data buses, some DSP processors can be used to support multiprocessing. They can perform compression/decompression of live video when transmitted over the Internet. Such high-performance DSP processors are often used in video conferencing equipment.

Inside DSP

The diagram shown here illustrates the core structure of the Motorola DSP 5680x processor. Separate internal command, data, and address buses contribute to a dramatic increase in computing system throughput. The presence of a secondary data bus allows the arithmetic unit to read two values, multiply them and perform an accumulation operation of the result in one processor cycle.

Let us now consider the function x = f(t), which represents some sound or some other vibration. Let this fluctuation be described by a graph on a time interval (Fig. 16.2).

To process this signal in a computer, you need to sample it. For this purpose, the time interval is divided into N-1 parts, and the values of the function x_0, x_1, x_2, ..., x_{N-1} are stored for the N points on the boundaries of the intervals.

Fig. 16.2.

As a result of the direct discrete Fourier transform, N values X_k can be obtained according to (16.1).

If we now apply the inverse discrete Fourier transform, the original sequence (x_n) will be obtained. The original sequence consisted of real numbers, while the sequence (X_k) is generally complex. If we equate its imaginary part to zero, we get:

(16.8)

Comparing this formula with formulas (16.4) and (16.6) for harmonics, we see that expression (16.8) is the sum of N harmonic oscillations of different frequencies, phases and amplitudes. That is, the physical meaning of the discrete Fourier transform is to represent a discrete signal as a sum of harmonics. The parameters of each harmonic are calculated by the direct transform, and the sum of the harmonics is calculated by the inverse one.

Now, for example, a low-pass filter, which cuts from a signal all frequencies above a specified value, can simply set the coefficients corresponding to the frequencies to be removed to zero and then apply the inverse transform.
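
The low-pass operation just described can be illustrated with a small Python sketch. It computes the DFT directly from the definition, which is slow but short. The function names and the test signal are hypothetical, and the choice to put the 1/N factor on the inverse transform is an assumption, since formula (16.1) is not reproduced in this text.

```python
import cmath
import math

def dft(x):
    """Direct DFT by definition: N sums of N terms each."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT; the 1/N normalization is placed here (an assumed convention)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def low_pass(x, cutoff_bin):
    """Zero every coefficient above cutoff_bin, then transform back."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        # For a real signal, bins k and N-k describe the same frequency.
        if min(k, N - k) > cutoff_bin:
            X[k] = 0
    return [v.real for v in idft(X)]

# Assumed test signal: a slow harmonic (bin 2) plus a fast one (bin 10)
N = 32
slow = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]
fast = [0.5 * math.sin(2 * math.pi * 10 * n / N) for n in range(N)]
filtered = low_pass([s + f for s, f in zip(slow, fast)], cutoff_bin=4)
```

After filtering, only the slow harmonic remains, so `filtered` matches `slow` up to rounding error.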

Let us look at the peculiarities of digital signal processing using the example of a non-recursive filtering algorithm. The structure of the device that implements this algorithm is shown in Fig. 16.3.

Processing consists of generating an output signal Y[k] based on the values of the N last input samples x[k], which are received at the device input after a certain time interval T. Received samples are stored in circular buffer cells. When the next sample is received, the contents of all buffer cells are rewritten to the adjacent position, the oldest sample leaves the buffer, and the new one is written to its lowest cell.

Analytically, the algorithm for operating a non-recursive filter is written as:

$$Y[k] = \sum_{i=0}^{N-1} a_i \, x[k-i] \qquad (16.9)$$

where a_i are coefficients determined by the filter type.

Samples from the outputs of the buffer elements are sent to multipliers, whose second inputs receive the coefficients a_i. The products are added to form a sample of the output signal Y[k], after which the contents of the buffer are shifted by one position and the filter cycle repeats. The output sample Y[k] must be calculated before the next input sample arrives, that is, within the interval T. This is the essence of the device's real-time operation.

The time interval T is set by the sampling frequency, which is determined by the filter's area of application. By a corollary of Kotelnikov's theorem, the period of the highest frequency representable in a discrete signal equals two sampling periods. When processing an audio signal, the sampling frequency can be taken as 40 kHz. In this case, to implement a 50th-order digital non-recursive filter, 50 multiplications and 50 accumulations of the products must be performed within 1/(40 kHz) = 25 μs. For video signal processing, the available time interval is several orders of magnitude shorter.
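
As a rough illustration of the filter loop and of the real-time budget, here is a minimal Python sketch. The names `fir_filter` and `time_per_mac_ns` are hypothetical; the delay line is modeled with a `deque`, which shifts the stored samples in the same way as the buffer described above.

```python
from collections import deque

def fir_filter(samples, coeffs):
    """Non-recursive filter: each output is the sum of a_i * x[k-i]."""
    delay_line = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        delay_line.appendleft(x)       # newest sample enters, oldest drops out
        acc = 0.0
        for a, xi in zip(coeffs, delay_line):
            acc += a * xi              # one multiply-accumulate per coefficient
        out.append(acc)
    return out

def time_per_mac_ns(sample_rate_hz, taps):
    """Time available for one multiply-accumulate, in nanoseconds."""
    return 1e9 / sample_rate_hz / taps

# A 4-point moving average applied to a step input
smoothed = fir_filter([4, 4, 4, 4, 4], [0.25] * 4)   # [1.0, 2.0, 3.0, 4.0, 4.0]

# The audio example from the text: 40 kHz sampling, 50 coefficients
budget = time_per_mac_ns(40_000, 50)                  # 500.0 ns
```

With 25 μs per sample and 50 coefficients, each multiplication with accumulation must finish in 500 ns, and that figure does not yet include reading the operands or controlling the loop.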

If you compute the DFT of the input sequence directly, strictly according to the original formula, it will take a lot of time. Calculating by definition (summing N terms N times), we obtain an operation count of the order of N^2.

However, you can get by with a significantly smaller number of operations.

The most popular of the algorithms for accelerated DFT calculation is the Cooley-Tukey method, which computes the DFT for a number of samples N = 2^k in a time of the order of N*log2(N) (hence the name fast Fourier transform, FFT). The main idea of this method is to recursively split an array of numbers into two subarrays and reduce the calculation of the DFT of the whole array to the calculation of the DFTs of the subarrays separately. The division of the original array into subarrays is carried out using bit-reversal sorting.

First, the input array is divided into two subarrays: the elements with even indices and those with odd indices. Each subarray is renumbered and again divided into two subarrays with even and odd indices. This sorting continues until the size of each subarray reaches 2 elements. As a result (which can be shown mathematically), the index of each original element, written in binary, is reversed. For example, for single-byte numbers, the binary number 00000011 becomes 11000000, and the number 01010101 becomes 10101010.
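
The equivalence between repeated even/odd splitting and bit reversal is easy to check with a short Python sketch. Both function names are hypothetical and serve only this illustration.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of a non-negative integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)   # move the lowest bit of value into result
        value >>= 1
    return result

def even_odd_split_order(indices):
    """Split into even and odd positions until subarrays have 2 elements."""
    if len(indices) <= 2:
        return list(indices)
    return even_odd_split_order(indices[0::2]) + even_odd_split_order(indices[1::2])

# For 8 samples the splitting yields the order 0, 4, 2, 6, 1, 5, 3, 7
order = even_odd_split_order(list(range(8)))
same = order == [reverse_bits(i, 3) for i in range(8)]   # True

# The single-byte examples from the text
a = format(reverse_bits(0b00000011, 8), "08b")           # '11000000'
b = format(reverse_bits(0b01010101, 8), "08b")           # '10101010'
```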

There are FFT algorithms for cases where N is a power of an arbitrary prime number (not just two), and also for cases where N is a product of powers of prime numbers, that is, for any number of samples. However, the FFT implemented using the Cooley-Tukey method for the case N = 2^k has become the most widely used. The reason is that the algorithm built using this method has a number of very good technological properties:

the structure of the algorithm and its basic operations do not depend on the number of samples (only the number of runs of the basic operation changes);
the algorithm is easily parallelized using a basic operation and pipelined, and is also easily cascaded (FFT coefficients for 2N samples can be obtained by converting the coefficients of two FFTs over N samples, obtained by “decimating” the original 2N samples through one);
The algorithm is simple and compact, allows data processing “in place” and does not require additional RAM.
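
To make the idea of reducing the DFT to two half-size DFTs concrete, here is a minimal recursive sketch in Python, checked against the direct computation. It is not the in-place form mentioned in the list above: for brevity it allocates new lists at every level, whereas an in-place version would first reorder the input by bit reversal and then work on a single array. The function names are hypothetical.

```python
import cmath
import math

def dft_naive(x):
    """Direct DFT by definition, of the order of N^2 operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])                # DFT of the even-indexed samples
    odd = fft(x[1::2])                 # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + t           # basic operation: one pair of outputs
        out[k + N // 2] = even[k] - t
    return out

signal = [math.sin(2 * math.pi * 3 * n / 16) + 0.1 * n for n in range(16)]
max_error = max(abs(a - b) for a, b in zip(fft(signal), dft_naive(signal)))
```

For N = 1024 the direct method needs on the order of N^2 = 1,048,576 operations, while N*log2(N) is only 10,240, roughly a hundred times fewer.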

Single-chip microcontrollers and even general-purpose microprocessors are relatively slow when performing DSP-specific operations. In addition, the requirements for the quality of analog signal conversion are constantly increasing. In signal microprocessors, such operations are supported at the hardware level and are therefore performed quite quickly. Real-time operation also requires the processor to support, at the hardware level, such actions as interrupt handling and program loops.

All this means that DSP processors, while architecturally incorporating many features of both general-purpose microprocessors (especially those with RISC architecture) and single-chip microcontrollers, at the same time differ significantly from them. A general-purpose microprocessor, in addition to purely computational operations, serves as the unifying link for the entire microprocessor system, in particular the computer.

It must control the operation of various hardware components such as disk drives, graphic displays and the network interface in order to ensure their coordinated work. This leads to a rather complex architecture, since it must support, alongside integer arithmetic and floating-point operations, basic functions such as memory protection, multiprogramming, vector graphics processing and so on. As a result, a typical general-purpose microprocessor with a CISC, and often a RISC, architecture has a set of several hundred instructions that carry out all these functions, together with the corresponding hardware support. This is why such a microprocessor needs tens of millions of transistors.

At the same time, a DSP processor is a highly specialized device. Its only task is to quickly process a stream of digital signals. It consists mainly of high-speed hardware circuits that perform arithmetic functions and bit manipulation, optimized to process large amounts of data quickly. Because of this, the instruction set of a DSP is much smaller than that of a general-purpose microprocessor: the number of instructions usually does not exceed 80. This means that a DSP needs only a lightweight instruction decoder and far fewer execution units. In addition, all execution units must ultimately support high-performance arithmetic operations. So a typical DSP processor consists of no more than several hundred thousand transistors (and not tens of millions, as in modern CISC microprocessors). Because of this, such microprocessors consume less energy, which allows them to be used in battery-powered products. Their production is greatly simplified, so they find application in inexpensive devices. The combination of low energy consumption and low cost allows them to be used not only in high-tech areas of telecommunications, but also in cell phones and robot toys.

Let's note the main architecture features of digital signal processors:

Harvard architecture, which is based on the physical and logical separation of instruction memory and data memory. The key instructions of a DSP processor are multi-operand, and speeding them up requires reading several memory cells simultaneously. Accordingly, the chip has separate address and data buses (some types of processors have several address and data buses). This allows operand fetching and instruction execution to overlap in time. The modified Harvard architecture additionally allows operands to be stored not only in data memory, but also in instruction memory along with programs. For example, when implementing digital filters, the coefficients may be stored in program memory and the data values in data memory. The coefficient and the data value can therefore be fetched in one machine cycle. To fetch the instruction in the same machine cycle as well, either a program cache is used or program memory is accessed twice per machine cycle.
A hardware multiplier, used to reduce the execution time of one of the main operations of digital signal processing, multiplication. In general-purpose processors, this operation is implemented over several shift-and-add cycles and takes a lot of time, whereas in DSP processors, thanks to a specialized multiplier, only one cycle is needed. The built-in hardware multiplication circuit performs the main DSP operation, multiply-accumulate (MAC), in one clock cycle for 16- and/or 32-bit operands.
Hardware support for circular buffers. For example, for the filter shown in Fig. 16.3, each time a sample of the output signal is calculated, a new sample of the input signal is used, which is stored in memory in place of the oldest. A fixed area of RAM can be used for such a circulating buffer. In this case, during calculations, only sequential values of RAM addresses are generated, regardless of what operation - writing or reading - is currently being performed. The hardware implementation of cyclic buffers allows you to set buffer parameters (start address, length) in the program outside the body of the filtering loop, which allows you to reduce the execution time of the cyclic section of the program.
Reducing the duration of the instruction cycle. This is largely ensured by techniques characteristic of RISC processors. The main ones are placing the operands of most instructions in registers and pipelining at the instruction and microinstruction levels. The pipeline has from 2 to 10 stages, which allows up to 10 instructions to be in various stages of execution simultaneously. Register addresses are generated in parallel with the execution of arithmetic operations, and multiport memory access is used. This also includes a technique characteristic of general-purpose microprocessors with EPIC architecture: the use of very long instruction words (VLIW) formed at the compilation stage of the program. The Harvard architecture discussed above, which is typical of single-chip microcontrollers, serves the same purpose.
The presence of internal memory on the processor chip, which makes DSPs similar to single-chip microcontrollers. Memory built into the processor is usually much faster than external memory. Built-in memory can significantly simplify the system as a whole, reducing its size, power consumption and cost. Internal memory capacity is the result of a compromise: increasing it raises the price of the processor and its power consumption, while a limited program memory capacity does not allow complex algorithms to be stored. Most fixed-point DSP processors have small internal memories, typically from 4 to 256 KB, and narrow external data buses connecting the processor to external memory. Floating-point DSPs, in contrast, usually work with large data sets and complex algorithms and have either large built-in memory or wide address buses for connecting external memory (and sometimes both).
Wide possibilities for hardware interaction with external devices, including:
- a wide variety of interfaces, including CAN industrial local network controllers, built-in communication (SCI) and peripheral (SPI) interfaces, I2C, UART;
- several inputs for analog signals and, accordingly, a built-in ADC;
- pulse-width modulation (PWM) output channels;
- a well-developed system of external interrupts;
- direct memory access controllers.
Some DSP families provide special hardware that facilitates the creation of multiprocessor systems with parallel data processing to increase productivity.
DSP processors are widely used in mobile devices, where power consumption is the main characteristic. To reduce energy consumption, signal processors use a variety of techniques, including reducing the supply voltage and introducing power management functions such as dynamic clock frequency control, switching to sleep or standby mode, and turning off peripherals not currently in use. It should be noted that these measures have a significant impact on the speed of the processor and, if used incorrectly, can make the designed device inoperable (as an example, we can mention some cell phones that, as a result of errors in control programs [...]).
Alongside the reduced instruction set, DSP processors also use hardware-supported instructions that are typical of MMX processing, such as instructions for finding the minimum and maximum, obtaining an absolute value, and adding with saturation, in which, if an overflow occurs when adding two numbers, the result is assigned the maximum possible value representable in the given word length. This leads to fewer pipeline conflicts and improves processor efficiency.
On the other hand, DSPs contain a number of instructions whose presence is determined by the specifics of their application and which, as a result, are rarely present in other types of microprocessors. First of all, these are, of course, the multiply-accumulate instruction and instructions operating on address bits, such as address bit reversal.
Programming microprocessors of this class also has its own peculiarities. The considerable developer convenience usually associated with high-level languages often results in less compact and slower code. Since DSPs must operate in real time, this makes it necessary to use more powerful and more expensive DSPs to solve the same problems. This situation is especially critical for high-volume products, where the difference in the cost of a more powerful DSP or an additional processor plays an important role. At the same time, in modern conditions, the speed of development (and therefore of bringing a new product to market) can bring more benefit than the time spent optimizing code written in assembly language.
A compromise approach here is to use assembler to write the most time-critical and resource-intensive sections of the program, while the main part of the program is written in a high-level language, usually C or C++.
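
Returning to the circular buffers described in the list above: the following Python sketch imitates in software what the hardware address generator does. The class name `CircularFir` is hypothetical, and Python is used only for readability; the wrap-around of the index, written here as an explicit modulo operation, is exactly the step that a DSP performs in hardware at no extra cost.

```python
class CircularFir:
    """Non-recursive filter whose samples live in a fixed-size circular buffer."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.length = len(self.coeffs)
        self.buffer = [0.0] * self.length   # fixed memory area, never shifted
        self.newest = 0                     # index of the most recent sample

    def step(self, sample):
        # Step the pointer back by one with wrap-around; it now points at the
        # oldest sample, which is overwritten by the new one.
        self.newest = (self.newest - 1) % self.length
        self.buffer[self.newest] = sample

        acc = 0.0
        idx = self.newest
        for a in self.coeffs:
            acc += a * self.buffer[idx]     # coefficients meet samples newest first
            idx = (idx + 1) % self.length   # modular address increment
        return acc

fir = CircularFir([3, 2, 1])
impulse_response = [fir.step(v) for v in [1, 0, 0, 0]]   # [3.0, 2.0, 1.0, 0.0]
```

Unlike a shifting buffer, no stored sample is ever moved: only one memory cell is written per input sample, and everything else is address arithmetic.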

Features of DSPs

DSPs are specialized processors for computationally intensive applications.
If we take a closer look, for example, at how a traditional microprocessor multiplies two numbers and stores the result, we can see where the machine time goes: first an instruction is fetched (the instruction address is placed on the address bus), then the first operand is fetched (the operand address is placed on the address bus), then the operand is transferred to the accumulator, then the second operand is fetched, and so on. This process cannot be accelerated in a general-purpose processor because it has a single address bus, a single data bus and a single memory bank. As a result, all the operations of fetching operands from memory, fetching an instruction and storing an operand are performed sequentially over the same data bus and address bus. In addition, if we consider the cyclic summation of an arithmetic series, we can see that time overhead arises from remembering the address of the first instruction of the loop, checking the loop condition (the counter) and returning to the first instruction. Large overheads also arise in subroutine call and return operations (saving register values to the stack and restoring them) and in many other operations. Finally, given the huge number of mathematical operations involved in digital signal processing, significant losses of accuracy due to rounding are inevitable, which cannot but affect the overall result. This occurs because all registers of a general-purpose processor have the same width.
With digital signal processing, all these costs are unacceptable. In order to overcome this shortcoming of general-purpose processors, digital signal processors (DSP - Digital Signal Processor) were developed.

Three-bus Harvard architecture

Its peculiarity lies primarily in the fact that, unlike the familiar arrangement of two buses (an address bus and a data bus) and one memory bank, the DSP has at least 6-7 different buses and 2-3 memory banks. This feature aims to speed up as much as possible the multiply operation with storing of the result, which is undoubtedly the most frequently used and most resource-intensive operation in digital signal processing. The DSP architecture allows the following to be performed in one machine cycle:

fetching a command via the program address bus and the program data bus;
fetching two operands for a multiplication operation via two data address lines;
entering operands into accumulators via two data buses;
multiplication operation;
saving the result in the accumulator.

Thus, the three-bus Harvard architecture allows almost any operation to be performed in one machine cycle.
As an example of the effectiveness of DSPs in implementing digital signal processing algorithms, a complex 1024-point Fourier transform takes 20 ms on a 66 MHz 486DX2 (32-bit), 3.23 ms on the 24-bit 33 MHz DSP56001 from Motorola, and 3.1 ms on the 32-bit 33 MHz TMS320C30 floating-point DSP from Texas Instruments.
However, as already mentioned, digital signal processors are distinguished not only by high performance, measured in the speed of multiply/accumulate operations (MIPS, millions of instructions per second), but also by features of program sequencing, arithmetic and memory addressing that reduce unproductive time to a minimum. In general, a DSP differs from other types of microprocessors and microcontrollers in the following five main ways:

Fast arithmetic.

The DSP processor must perform multiplication, multiplication with accumulation, cyclic shift, as well as standard arithmetic and logical operations in one cycle.

Extended dynamic range for multiply/accumulate operations.

The operation of calculating the sum of a certain sequence of values is fundamental for algorithms implemented on the DSP. Overflow protection is necessary to avoid data loss.
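
The effect of overflow, and two common ways of guarding against it, can be shown with a short Python sketch. The 16-bit word length and all function names are assumptions chosen for illustration. Python integers are unbounded, so `mac_wide` merely stands in for an accumulator that is wider than the operands.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(value):
    """Result of plain 16-bit two's-complement arithmetic (silent overflow)."""
    return (value + 32768) % 65536 - 32768

def saturate16(value):
    """Saturating arithmetic: clamp to the largest or smallest 16-bit value."""
    return max(INT16_MIN, min(INT16_MAX, value))

def mac_wide(samples, coeffs):
    """Sum of products kept in a wide accumulator, narrowed only at the end."""
    acc = 0                      # unbounded Python int models the wide accumulator
    for x, a in zip(samples, coeffs):
        acc += x * a
    return acc

total = 30000 + 10000            # 40000 does not fit in 16 bits
wrapped = wrap16(total)          # -25536: the sign flips, the data is corrupted
clamped = saturate16(total)      # 32767: distorted, but still the nearest valid value
```

Saturation limits the damage of a single overflow, while a wide accumulator avoids intermediate overflows altogether, so that the result only has to be narrowed once, after the whole sum has been formed.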

Fetching two operands in one cycle.

Obviously, most operations performed by a DSP require two operands. Thus, to achieve maximum performance, the processor must be able to fetch two operands simultaneously, which also requires a flexible addressing system.

Hardware support for circular buffers (in on-chip and external memory).

A wide class of algorithms implemented on DSPs requires circular buffers. Hardware support for wrapping the address pointer, known as modulo addressing, reduces processor overhead and simplifies algorithm implementation.
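The address-pointer wrapping described above can be sketched in software as follows. This is a minimal Python illustration in which the `%` operation stands in for what the DSP address hardware does for free; the class name `CircularBuffer` and its methods are hypothetical:

```python
class CircularBuffer:
    """Fixed-size buffer whose write pointer wraps around (modulo addressing)."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.data = [0] * size
        self.index = 0  # next write position

    def push(self, sample):
        """Store a sample, overwriting the oldest one when the buffer is full."""
        self.data[self.index] = sample
        self.index = (self.index + 1) % len(self.data)  # pointer wrap

    def latest(self, count):
        """Return the `count` most recent samples, newest first."""
        size = len(self.data)
        return [self.data[(self.index - 1 - k) % size] for k in range(count)]


buf = CircularBuffer(4)
for s in [1, 2, 3, 4, 5, 6]:
    buf.push(s)
recent = buf.latest(3)
print(recent)
```

In software every access pays for the modulo operation or an equivalent bounds check; with hardware support the pointer update and wrap happen alongside the memory access.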

Loops and branches without loss of performance.

DSP algorithms involve many repetitive operations that can be implemented as loops. The ability to execute code in a loop without loss of performance distinguishes DSPs from other processors. Likewise, time lost on conditional branches is unacceptable in digital signal processing.
However, one should not think that DSPs can completely replace general-purpose processors. Digital signal processors typically have a simplified instruction set that does not allow non-mathematical operations to be performed as efficiently as on general-purpose processors. Attempting to combine mathematical computing power with flexibility for other types of operations in a single processor leads to an unjustified increase in cost. DSPs are therefore often used as coprocessors (math coprocessors, graphics accelerators, etc.) alongside a main processor, or as stand-alone processors where that is sufficient.

Motorola DSPs

Motorola currently produces three families of digital signal processors: the DSP56100, DSP56000 and DSP96000 series. All chips in these series are based on the DSP56000 architecture and differ in word length (16, 24 and 32 bits, respectively) and in some on-chip peripherals. This provides upward compatibility across all three families. All Motorola DSPs are built on the same three-bus Harvard architecture described earlier, with a large number of components, ports, controllers, memory banks and buses operating in parallel to achieve maximum performance.
Data transfer takes place over bidirectional data buses (one for the DSP56100 (XDB) and two for the DSP56000 and DSP96000 (XDB and YDB)), the program data bus (PDB), and the global data bus (GDB). In addition, the DSP96000 has a separate direct memory access data bus (DDB). Data is transferred between buses by an internal bus control unit.
Addressing is carried out over two unidirectional buses: the data address bus and the program address bus.
The bit manipulation unit lets you flexibly control the state of any bit in registers and memory locations. This capability is an advantage over other manufacturers' DSPs.
The arithmetic logic unit (ALU) performs all arithmetic and logical operations and includes input registers, accumulators, accumulator extension registers (8-bit, allowing up to 256 overflows without loss of precision), a parallel single-cycle multiply-accumulate unit (MAC), and shift registers. A flexible instruction set allows the ALU to execute multiply, multiply-accumulate, add, subtract, shift and logical instructions in one cycle. A characteristic feature of Motorola DSPs is the ability to concatenate the ALU input registers and thus increase the word length of the numbers processed. Another important feature is the division operation, which other manufacturers often omit and replace with multiplication by the reciprocal, at the cost of accuracy.
The address generation unit performs all calculations needed to determine memory addresses. It operates independently of the other processor units. In one cycle, two memory reads or one memory write can be performed. Motorola's DSPs have an extremely powerful addressing system that allows almost any data manipulation to be performed in a single instruction, which distinguishes them from their counterparts. Modulo addressing makes it possible to implement ring buffers without bounds checks, avoiding the associated time overhead. Bit-reversed addressing simplifies the implementation of the FFT.
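The bit-reversed (bit-inverted) addressing mentioned above refers to the order in which a radix-2 FFT reads or writes its data. A short Python sketch of that index order follows; the function names are hypothetical, and on a DSP the address generator produces this sequence in hardware rather than in a loop:

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Return the bit-reversed index sequence for an n-point transform."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


order = bit_reversed_order(8)
print(order)
```

For an 8-point transform, index 1 (binary 001) maps to 4 (binary 100), index 3 (011) maps to 6 (110), and so on. Computing this in software costs several instructions per sample, which is the overhead the dedicated addressing mode avoids.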
The program control unit contains 6 registers, including the loop address register and the loop counter, which provide hardware support for loops in Motorola DSPs: no additional machine cycles are needed to test the loop exit condition or to update the loop counter. The DO loop instruction specifies the number of repetitions explicitly.
The system stack is a separate 15-word area of RAM that can hold information for 15 interrupts, 7 loops, or 15 subroutine calls. Data is read from the stack in one cycle, which reduces processor overhead.
The main distinguishing feature of Motorola's DSPs is that every chip has an on-chip emulator, which lets you debug programs without additional hardware, so there is no need to purchase expensive debugging tools. The emulator can read and write registers and memory locations, set breakpoints, single-step programs and perform other actions in response to commands sent over a 4-wire bus.
To reduce power consumption when no computation is under way, two low-power modes are provided: STOP and WAIT.
A built-in HOST interface is provided for working with other processors and direct memory access channels.
In addition to all of the above properties required for digital signal processing, Motorola DSPs have an extremely powerful and flexible instruction set that lets the user work with the processors conveniently and efficiently.

DSP96000 Family

The DSP96000 family has a 32-bit architecture and supports floating-point operations. The chips are designed for multimedia computer systems. DSPs of this series can operate as stand-alone chips, and through two independent 32-bit ports they can sequentially exchange data with other processors.
The chips in this family include 6 memory banks, 8 buses and 4 autonomous units: an ALU, a program control unit, a dual address generation unit and a built-in two-channel direct memory access controller.
Characteristics of DSP96000 family chips:

49.5 MIPS at 40 MHz
60 MFLOPS at 40 MHz, 50 ns cycle
32-bit organization
2 banks of data memory RAM 512x32 bits
2 banks of ROM data memory 512x32 bits
Program RAM 1024x32 bit
56 byte boot ROM
addressable external memory: 2 x 2^32 32-bit words of data and program memory
built-in emulator
2 channels DMA
2 channels of exchange with external processors
223-pin PGA or QFP package

Texas Instruments DSPs

This company's DSPs are represented by the following microprocessors: TMS32010, TMS320C20, TMS320C25, TMS320C30, TMS320C40, TMS320C50.

Features of the TMS320C25 architecture

The TMS320C2x architecture is based on that of the TMS32010, the first member of the DSP microprocessor family. Its instruction set is a superset of the TMS32010 instruction set, which preserves upward software compatibility.
The TMS320C2x microprocessor has a single accumulator and uses a Harvard architecture in which data memory and program memory occupy separate address spaces. This allows instruction fetch and execution to overlap completely in time. The instruction set includes instructions for transferring data between the two memory spaces. Outside the microprocessor, the data and program memory spaces are multiplexed onto the same bus in order to maximize the address range of both spaces while minimizing the number of pins. Inside the microprocessor, the program and data spaces use separate buses to increase processing power and execution speed.
Design flexibility is increased by two large on-chip RAM blocks, one of which can be used as either program memory or data memory. Most instructions execute in a single machine cycle, whether from fast external program memory or from internal RAM. The TMS320C2x can also be connected to slow external memory or peripheral devices using the READY signal, but in that case instructions take several machine cycles.

Memory organization

The TMS32020 chip contains 544 16-bit words of RAM, of which 288 words (blocks B1 and B2) are always allocated to data, while 256 words (block B0) can be used as either data memory or program memory, depending on the processor configuration. The TMS320C25 also has a 4K-word mask ROM, and the TMS320E25 has a 4K-word UV-erasable EPROM.
The TMS320C2x has three separate address spaces, for program memory, data memory and I/O devices, as shown in Fig. 6.5. Off-chip, these spaces are distinguished by the *PS, *DS and *IS signals (for the program, data and I/O spaces, respectively). The on-chip memory blocks B0, B1 and B2 together comprise 544 words of random access memory (RAM). RAM block B0 (256 words) occupies pages 4 and 5 of data memory when it is allocated to data, or addresses >FF00 - >FFFF when it is part of program memory. Block B1 (data only) occupies pages 6 and 7, and block B2 occupies the highest 32 words of page 0. The remainder of page 0 is occupied by 6 addressable registers and a reserved area; pages 1 - 3 are also reserved. Reserved areas cannot be used to store information, and their contents are undefined when read.
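The page numbers above can be turned into data-memory address ranges with a little arithmetic. The Python sketch below assumes a data page of 128 words, a figure that is not stated in the text but is consistent with the 256-word block B0 spanning two pages; the helper names are hypothetical:

```python
PAGE_SIZE = 128  # words per data page (assumption, see the note above)


def page_range(first_page, last_page):
    """Return the (start, end) word addresses covered by a run of pages."""
    start = first_page * PAGE_SIZE
    end = (last_page + 1) * PAGE_SIZE - 1
    return start, end


def size(addr_range):
    """Number of words in an inclusive (start, end) range."""
    return addr_range[1] - addr_range[0] + 1


b0 = page_range(4, 5)                 # block B0 when configured as data memory
b1 = page_range(6, 7)                 # block B1
b2 = (PAGE_SIZE - 32, PAGE_SIZE - 1)  # highest 32 words of page 0

total = size(b0) + size(b1) + size(b2)
for name, r in (("B2", b2), ("B0", b0), ("B1", b1)):
    print(f"{name}: >{r[0]:04X} - >{r[1]:04X} ({size(r)} words)")
```

Under that assumption the three blocks add up to the 544 words quoted in the text, and blocks B0 and B1 form one contiguous region of data memory.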
The internal program memory (ROM) on the processor chip can be used as the lower 4K words of program memory. To enable this, a low level must be applied to the MP/*MC pin. To disable the internal ROM, the MP/*MC pin must be driven high.

External memory and I/O interface

The TMS32020 microprocessor supports a wide range of interface systems. The data, program, and I/O address space provides interfaces to memory and external devices, increasing system capabilities. The local memory interface consists of:

16-bit data bus (D0-D15);
16-bit address bus (A0-A15);
data, program and I/O address spaces, selected by the *DS, *PS and *IS signals;
various system control signals.

The R/*W signal determines the direction of the transfer, and the *STRB signal strobes it.
The I/O space contains 16 input ports and 16 output ports. These ports provide a full 16-bit interface to external devices over the data bus. A single I/O operation using the IN or OUT instruction takes two instruction cycles; with a repeat counter, however, the time per port access drops to one cycle.
I/O is simplified by the fact that it is performed in the same way as memory access. I/O devices are mapped into the I/O address space using the processor's external address and data buses, just like memory. When internal memory is addressed, the data bus is in the high-impedance (third) state and the control signals are inactive (high).
Interaction with memory and I/O devices of different speeds is handled by the READY signal. When communicating with a slow device, the TMS320C2x waits until the device completes its operation and signals this on the READY line, and then continues.

Central Arithmetic Logic Unit

The Central Arithmetic Logic Unit (CALU) contains a 16-bit scaling shift register, a 16 x 16 parallel multiplier, a 32-bit arithmetic logic unit (ALU), a 32-bit accumulator, and several additional shift registers located at the output of the multiplier and at the output of the accumulator.
Any ALU operation is performed in the following sequence:

data is fetched from RAM onto the data bus,
the data passes through the scaling shift register and through the ALU, in which arithmetic operations are performed,
the result is transferred to the accumulator.

One input to the ALU is always connected to the accumulator output, and the second can receive information either from the multiplier product register (PR) or loaded from memory via a scaling shift register.
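The data path described above (memory word, scaling shifter, ALU, accumulator) can be modelled in a few lines of Python. This is a simplified sketch, not a description of the actual instruction set: it assumes the 16-bit data word is sign-extended before shifting and that the 32-bit accumulator simply wraps on overflow, and the names `to_signed` and `calu_add` are hypothetical:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def calu_add(acc, data_word, shift):
    """Add a 16-bit memory word, left-shifted by `shift`, to a 32-bit accumulator."""
    if not 0 <= shift <= 15:
        raise ValueError("shift must be between 0 and 15")
    operand = to_signed(data_word, 16) << shift  # scaling shifter (sign-extended)
    return to_signed(acc + operand, 32)          # ALU result goes back to the accumulator


acc = 0
acc = calu_add(acc, 0x0003, 4)  # adds 3 << 4
acc = calu_add(acc, 0xFFFF, 0)  # adds -1 (0xFFFF as a signed 16-bit word)
print(acc)
```

The accumulator is both an input and the destination of every step, which matches the statement that one ALU input is always connected to the accumulator output.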

Pipeline operations

The instruction pipeline consists of the sequence of external bus operations that occur during instruction execution. The prefetch-decode-execute pipeline is normally invisible to the user, except in cases where the pipeline must be broken (for example, on a branch). While the pipeline is running, instruction prefetch, decode and execution are independent of one another, which allows instructions to overlap. Thus, during any one cycle two or three instructions can be active, each at a different stage. The result is a two-level pipeline on the TMS32020 and a three-level pipeline on the TMS320C25.
The number of pipeline levels does not always affect instruction execution speed. Most instructions execute in the same number of cycles regardless of whether they are fetched from external memory, internal RAM, or internal ROM.
Additional hardware on the TMS320C25 extends the pipeline to three levels, which improves processor performance. This hardware includes the prefetch counter (PFC), the 16-bit microcall stack (MCS), the instruction register (IR), and the queue instruction register (QIR).
With a three-level pipeline, the PFC holds the address of the next instruction to be prefetched. Once the prefetch completes, the instruction is loaded into the IR. If the IR still holds an instruction that has not yet been executed, the prefetched instruction is placed in the QIR instead. The PFC is then incremented by 1. As soon as the current instruction has executed, the instruction in the QIR is moved into the IR for execution.
The program counter (PC) holds the address of the instruction to be executed next and is not used for fetch operations.
Instead, the PC normally serves as a pointer to the current position in the program. Its contents are incremented after each instruction is executed. When an interrupt or subroutine call occurs, the contents of the PC are pushed onto the stack so that execution can later return to the right place in the program.
The prefetch, decode and execute cycles of the pipeline are independent of one another, which allows instructions to overlap in time. During any cycle, three instructions can be active simultaneously, each at a different stage of completion.
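The overlap of the three stages described above can be visualized with a small Python sketch. It models an idealized three-level pipeline with no branches or wait states; the function name `pipeline_timeline` and the instruction labels I1 to I4 are made up for the example:

```python
STAGES = ("prefetch", "decode", "execute")


def pipeline_timeline(instructions):
    """Return, for each cycle, which instruction occupies each stage (or None)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            i = cycle - depth  # instruction that reached this stage in this cycle
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        timeline.append(row)
    return timeline


program = ["I1", "I2", "I3", "I4"]  # hypothetical instruction labels
timeline = pipeline_timeline(program)
for cycle, row in enumerate(timeline, start=1):
    print(cycle, row)
```

From the third cycle onward all three stages are busy at once, so one instruction completes per cycle even though each instruction takes three cycles from prefetch to execution. A branch would break this pattern, which is why the text notes that the pipeline becomes visible in that case.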