OLAP tools. OLAP technology. Storing operational data in a relational database

The OLAP mechanism is one of the most popular methods of data analysis. There are two main approaches to implementing it. The first, Multidimensional OLAP (MOLAP), implements the mechanism with a multidimensional database on the server side. The second, Relational OLAP (ROLAP), builds cubes "on the fly" from SQL queries to a relational DBMS. Each approach has its pros and cons, but comparing them is beyond the scope of this article. Here we describe our implementation of the core of a desktop ROLAP module.

The task arose after we had used a ROLAP system built on the Decision Cube components shipped with Borland Delphi. Unfortunately, this component set performed poorly on large volumes of data. The problem can be mitigated by filtering out as much data as possible before feeding it into cube construction, but this is not always enough.

A lot of information about OLAP systems can be found on the Internet and in the press, but almost nowhere is it explained how they work internally. As a result, most problems had to be solved by trial and error.

Scheme of operation

A desktop OLAP system generally operates according to the following algorithm:

  1. Obtain data as a flat table or as the result of an SQL query.
  2. Cache the data and transform it into a multidimensional cube.
  3. Display the constructed cube using a cross-table, a chart, etc. In general, any number of views can be attached to a single cube.

Let us look at how such a system can be organized internally, starting with the part you can see and touch, that is, with the views.

OLAP systems most often use two kinds of views: cross-tables and charts. Consider the cross-table, which is the main and most common way to display a cube.

Cross Table

In the figure below, the rows and columns containing aggregated totals are highlighted. Cells containing facts are light gray, and cells containing dimension data are dark gray.

The table can therefore be divided into the following elements, which we will work with later:

When filling the matrix with facts, we proceed as follows:

  • From the dimension values, determine the coordinates of the element being added to the matrix.
  • Determine the coordinates of the total columns and rows that the added element contributes to.
  • Add the element to the matrix and to the corresponding total columns and rows.
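The three steps above can be sketched in Python as follows. This is a minimal illustration, not the CubeBase code. It assumes one row dimension and one column dimension, uses summation as the aggregate, and stores the matrix as a dictionary keyed by coordinates, so empty cells take no memory. The names records, TOTAL and build_cross_table are hypothetical.

```python
from collections import defaultdict

TOTAL = "<total>"  # coordinate used for a totals row or totals column

# Hypothetical flat input: (row dimension value, column dimension value, fact)
records = [
    ("Alice", "Apples", 10.0),
    ("Alice", "Pears", 5.0),
    ("Bob", "Apples", 7.0),
    ("Alice", "Apples", 3.0),
]


def build_cross_table(rows):
    """Accumulate facts into a sparse cross-table with row, column and grand totals."""
    cells = defaultdict(float)  # (row_key, col_key) -> summed fact
    for row_key, col_key, value in rows:
        # Steps 1 and 2: the fact cell's coordinates come from its dimension
        # values, and the fact also affects its row total, its column total
        # and the grand total.
        targets = [
            (row_key, col_key),
            (row_key, TOTAL),
            (TOTAL, col_key),
            (TOTAL, TOTAL),
        ]
        # Step 3: add the value to the cell and to every affected total.
        for coord in targets:
            cells[coord] += value
    return dict(cells)


table = build_cross_table(records)
print(table[("Alice", "Apples")])  # 13.0
print(table[(TOTAL, TOTAL)])       # 25.0
```

Cells that never receive a fact, such as ("Bob", "Pears"), do not appear in the dictionary at all.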

Note that the resulting matrix will be highly sparse. Organizing it as a two-dimensional array (the obvious option) is therefore not only wasteful but most likely impossible, because the matrix is so large that no amount of RAM would suffice. For example, if our cube holds one year of sales information and has only 3 dimensions (customers (250), products (500) and dates (365)), the fact matrix will have the following size:

Number of elements = 250 x 500 x 365 = 45,625,000

Yet only a few thousand of these elements may actually be filled. Moreover, the more dimensions there are, the sparser the matrix becomes.

Working with this matrix therefore requires special techniques for sparse matrices. A sparse matrix can be organized in various ways, and these are well described in the programming literature, for example in the first volume of Donald Knuth's classic "The Art of Computer Programming".
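One of the simplest sparse organizations is a "dictionary of keys": a hash map from coordinate tuples to values, so memory grows with the number of filled cells rather than with the product of the dimension sizes. Knuth describes other schemes, such as lists linked along both rows and columns. The Python sketch below uses a dictionary only because it is the shortest to show. The add_fact helper and the sample facts are illustrative assumptions.

```python
from math import prod

# Dimension sizes from the example: customers, products, dates
dim_sizes = {"customer": 250, "product": 500, "date": 365}
dense_cells = prod(dim_sizes.values())  # cells a dense array would need


def add_fact(store, coord, value):
    """Add a fact at an integer coordinate tuple, summing repeated coordinates."""
    store[coord] = store.get(coord, 0.0) + value


# Dictionary-of-keys storage: only cells that actually hold a fact are kept.
# The facts below are hypothetical sample data.
sparse = {}
add_fact(sparse, (0, 10, 31), 120.0)
add_fact(sparse, (0, 10, 31), 30.0)
add_fact(sparse, (249, 499, 364), 75.5)

print(dense_cells)  # 45625000
print(len(sparse))  # 2
```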

Now consider how to determine the coordinates of a fact from its dimension values. To do this, let us look at the header structure in more detail:

With this structure, it is easy to determine the number of the corresponding cell and of the totals it falls into. Several approaches are possible. One is to look up the corresponding cells in a tree, which can be built while scanning the data set. Alternatively, a recursive analytical formula for the required coordinates is easy to derive.
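The text does not give the recursive formula itself, so what follows is one possible version. It assumes that every member at a level has the same number of children (uniform fan-out) and that each group's subtotal line comes directly after its children. Under these assumptions, the number of lines B[k] taken by a member at level k satisfies B[last] = 1 and B[k] = n[k+1] * B[k+1] + 1. A leaf with member indices (i0, i1, ...) then lands on line i0*B[0] + i1*B[1] + .... Real hierarchies with uneven fan-out need the tree-based lookup mentioned above instead. The function names below are hypothetical.

```python
def block_sizes(fanouts):
    """Header lines occupied by one member at each level of the hierarchy.

    fanouts[k] is the number of members at level k (uniform fan-out assumed).
    A member at the deepest level takes 1 line; a member at a higher level
    takes the lines of all its children plus 1 line for its subtotal.
    """
    sizes = [1] * len(fanouts)
    for k in range(len(fanouts) - 2, -1, -1):
        sizes[k] = fanouts[k + 1] * sizes[k + 1] + 1
    return sizes


def leaf_position(path, fanouts):
    """Zero-based line of a leaf cell, given its member index at every level."""
    sizes = block_sizes(fanouts)
    return sum(i * s for i, s in zip(path, sizes))


def subtotal_position(prefix, fanouts):
    """Zero-based line of the subtotal for the member identified by `prefix`.

    An empty prefix gives the grand total.
    """
    depth = len(prefix)
    if depth >= len(fanouts):
        raise ValueError("leaf members have no subtotal line")
    sizes = block_sizes(fanouts)
    start = sum(i * s for i, s in zip(prefix, sizes))
    # The subtotal follows all children of this member.
    return start + fanouts[depth] * sizes[depth]


# Year (2 members) x Quarter (4 members)
print(leaf_position((1, 2), [2, 4]))    # 7
print(subtotal_position((1,), [2, 4]))  # 9
print(subtotal_position((), [2, 4]))    # 10
```

With zero-based indices, the cell (year 1, quarter 2) is on line 7, the subtotal for year 1 is on line 9, and the grand total is on line 10.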

Data preparation

The data stored in the table must be transformed before it can be used. To speed up hypercube construction, it is desirable to find the unique values stored in the columns that serve as cube dimensions. In addition, facts can be pre-aggregated for records with identical dimension values. As noted above, what matters are the unique values found in the dimension fields. For storage, the following structure can then be proposed:

Such a structure significantly reduces memory requirements. This matters because, for speed, it is advisable to keep the data in RAM. Moreover, only the array of elements needs to stay in memory. Their values can be offloaded to disk, since they are needed only when the cross-table is displayed.
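One plausible reading of this structure is shown in the Python sketch below. It assumes dictionary encoding: each dimension column is replaced by integer codes that index a per-dimension list of unique values, and rows with identical codes are pre-aggregated by summation. The function and column names are hypothetical. The fact table then holds only compact integer tuples, and the value lists could be kept elsewhere (for example, on disk) until the cross-table needs labels.

```python
from collections import defaultdict


def encode_and_preaggregate(rows, dim_columns, fact_column):
    """Replace dimension values with small integer codes and sum duplicate facts.

    Returns (dictionaries, facts): dictionaries[d] lists the unique values of
    dimension d in first-seen order, and facts maps a tuple of codes to the
    summed fact for all rows with those dimension values.
    """
    dictionaries = {d: [] for d in dim_columns}
    index = {d: {} for d in dim_columns}  # value -> code, for fast lookup
    facts = defaultdict(float)
    for row in rows:
        key = []
        for d in dim_columns:
            value = row[d]
            code = index[d].get(value)
            if code is None:  # first occurrence: assign the next code
                code = len(dictionaries[d])
                index[d][value] = code
                dictionaries[d].append(value)
            key.append(code)
        facts[tuple(key)] += row[fact_column]  # pre-aggregation
    return dictionaries, dict(facts)


rows = [
    {"customer": "Acme", "product": "Bolt", "amount": 10.0},
    {"customer": "Acme", "product": "Nut", "amount": 4.0},
    {"customer": "Zeta", "product": "Bolt", "amount": 6.0},
    {"customer": "Acme", "product": "Bolt", "amount": 2.0},
]
dicts, facts = encode_and_preaggregate(rows, ["customer", "product"], "amount")
print(dicts)  # {'customer': ['Acme', 'Zeta'], 'product': ['Bolt', 'Nut']}
print(facts)  # {(0, 0): 12.0, (0, 1): 4.0, (1, 0): 6.0}
```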

The CubeBase component library

The ideas described above were implemented in the CubeBase component library.

TCubeSource caches the data, transforms it into the internal format and pre-aggregates it. TCubeEngine computes the hypercube and performs operations on it. In effect, it is the OLAP engine that turns a flat table into a multidimensional data set. TCubeGrid displays the cross-table on screen and controls how the hypercube is displayed. TCubeChart shows the hypercube as charts, and TCubePivote manages pivoting of the cube.

Performance comparison

This component set proved much faster than Decision Cube. On a data set of 45 thousand records, the Decision Cube components took 8 minutes to build the pivot table. CubeBase loaded the data in 7 seconds and built the pivot table in 4 seconds. In a test on 700 thousand records, Decision Cube did not respond within 30 minutes, after which we terminated the task. CubeBase loaded the data in 45 seconds and built the cube in 15 seconds.

On data volumes of thousands of records, CubeBase proved tens of times faster than Decision Cube. On tables of hundreds of thousands of records, it was hundreds of times faster. High performance is one of the most important characteristics of an OLAP system.

An OLAP system makes it possible to automate the strategic level of an organization's management. OLAP (Online Analytical Processing, that is, analytical data processing in real time) is a powerful technology for processing and exploring data. Systems built on OLAP technology offer virtually unlimited capabilities for compiling reports, performing complex analytical calculations, building forecasts and scenarios, and developing multiple variants of plans.

Full-fledged OLAP systems appeared in the early 1990s as information systems for decision support evolved. They are designed to turn diverse, often scattered data into useful information. OLAP systems can organize data according to a given set of criteria, and those criteria need not be clearly defined.

OLAP systems are used in many areas of strategic management: business performance management, strategic planning, budgeting, development forecasting, financial reporting, performance analysis, simulation modeling of the external and internal environment, and data storage and reporting.

Structure of OLAP systems

An OLAP system is based on processing multidimensional data arrays. These arrays are organized so that each element has many links to other elements. To build a multidimensional array, the OLAP system must obtain source data from other systems (for example, an ERP or CRM system) or through external input. The user of the OLAP system receives the required data in structured form according to his or her request. This procedure suggests the structure of an OLAP system.

In general, an OLAP system consists of the following elements:

  • database. The database is the source of information for the OLAP system. Its type depends on the type of OLAP system and on the algorithms of the OLAP server. Relational databases, multidimensional databases, data warehouses, etc. are typically used.
  • OLAP server. It manages the multidimensional data structure and the interaction between the database and the users of the OLAP system.
  • user applications. This element handles user requests and produces the results of database access (reports, charts, tables, etc.).

Depending on how data is organized, processed and stored, OLAP systems can run on users' local computers or on dedicated servers.

There are three basic ways to store and process data:

  • locally. Data is placed on users' computers, and data processing, analysis and management are performed on local workstations. This structure has significant drawbacks related to processing speed, data security and limited use of multidimensional analysis.
  • relational databases. These are used when the OLAP system works together with a CRM or ERP system. Data is stored on the servers of these systems as relational databases or data warehouses. The OLAP server queries these databases to build the required multidimensional structures and perform analysis.
  • multidimensional databases. In this case, data is organized as a special data warehouse on a dedicated server. All data operations are performed on this server, which converts the source data into multidimensional structures called OLAP cubes. The data for building an OLAP cube comes from relational databases and/or client files. The data server prepares and processes the data in advance. The OLAP server works with the OLAP cube without direct access to the data sources (relational databases, client files, etc.).

Types of OLAP Systems

Depending on how data is stored and processed, OLAP systems fall into three main types.


1. ROLAP (Relational OLAP). This type of OLAP system works with relational databases. Data is accessed directly in the relational database and is stored as relational tables. Users can still perform multidimensional analysis, as in traditional OLAP systems. This is achieved with SQL and specially constructed queries.

One advantage of ROLAP is its ability to handle large volumes of data more efficiently. Another is its ability to process both numeric and text data efficiently.

The disadvantages of ROLAP include lower performance than traditional OLAP systems, because data processing is performed by the OLAP server. Another disadvantage is limited functionality due to reliance on SQL.
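Below is a minimal ROLAP-style sketch in Python. It uses the standard sqlite3 module as a stand-in for the relational DBMS. Each request for a view of the cube becomes a GROUP BY query over the detailed table, so nothing is precomputed and the DBMS aggregates at query time. The table, column names and data are hypothetical.

```python
import sqlite3

# In-memory table standing in for the relational source (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Bolt", 10.0), ("North", "Nut", 4.0),
     ("South", "Bolt", 6.0), ("North", "Bolt", 2.0)],
)

ALLOWED_DIMS = {"region", "product"}


def rolap_view(conn, dims):
    """Build one view of the cube on the fly with a single GROUP BY query.

    Column names cannot be bound as SQL parameters, so `dims` is checked
    against a whitelist before being placed in the query text.
    """
    if not dims or not set(dims) <= ALLOWED_DIMS:
        raise ValueError("dims must be a non-empty subset of ALLOWED_DIMS")
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    # Each row is (member_1, ..., member_k, total); key the result by the members.
    return {tuple(row[:-1]): row[-1] for row in conn.execute(sql)}


print(rolap_view(conn, ["region"]))  # {('North',): 16.0, ('South',): 6.0}
```

A different view, such as rolap_view(conn, ["region", "product"]), is simply another query. This flexibility is what ROLAP trades against query-time cost.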


2. MOLAP (Multidimensional OLAP). This type covers the traditional OLAP systems. What distinguishes a traditional OLAP system from the others is that data is prepared and optimized in advance. These systems usually use a dedicated server on which the data is preprocessed and organized into multidimensional arrays, that is, OLAP cubes.

MOLAP systems are the most efficient at data processing, because they make it easy to reorganize and structure data for different user queries. MOLAP analytical tools can perform complex calculations. Another advantage of MOLAP is the ability to run queries and obtain results quickly, which is made possible by building the OLAP cubes in advance.

The disadvantages of MOLAP include limits on data volume and data redundancy, because building multidimensional cubes for different perspectives requires duplicating data.
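The Python sketch below makes this trade-off concrete with MOLAP-style pre-computation. It assumes summation as the only aggregate and a tiny in-memory data set, and all names are hypothetical. Every combination of dimensions is aggregated in advance, so a query becomes a lookup. With d dimensions, however, there are 2^d group-bys, and every detailed record contributes to each of them. That is the duplication described above.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("region", "product", "month")

# Hypothetical detailed records: (region, product, month, amount)
detail = [
    ("North", "Bolt", 1, 10.0),
    ("North", "Nut", 1, 4.0),
    ("South", "Bolt", 2, 6.0),
    ("North", "Bolt", 2, 2.0),
]


def precompute_cube(records):
    """Materialize every group-by (all 2**d subsets of the dimensions) in advance.

    The result maps a tuple of dimension names to {member tuple: total}.
    """
    cube = {}
    for r in range(len(DIMS) + 1):
        for subset in combinations(range(len(DIMS)), r):
            agg = defaultdict(float)
            for rec in records:
                agg[tuple(rec[i] for i in subset)] += rec[-1]
            cube[tuple(DIMS[i] for i in subset)] = dict(agg)
    return cube


cube = precompute_cube(detail)
# A query is now a dictionary lookup; the detailed data is not scanned again.
print(cube[("region",)][("North",)])  # 16.0
print(len(cube))                      # 8
```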


3. HOLAP (Hybrid OLAP). Hybrid OLAP systems combine ROLAP and MOLAP, aiming to join the benefits of both: the use of multidimensional databases and the management of relational databases. HOLAP systems store large amounts of data in relational tables, while processed data is placed in pre-built multidimensional OLAP cubes. The advantages of this type of system are scalability, fast data processing and flexible access to data sources.

There are other types of OLAP systems, but they owe more to vendors' marketing than to being genuinely distinct kinds of OLAP system.

These include:

  • WOLAP (Web OLAP). An OLAP system with a web interface, through which the databases can be accessed.
  • DOLAP (Desktop OLAP). This type lets users download a database to their local workstation and work with it locally.
  • Mobile OLAP. An OLAP system feature that allows working with a database remotely from mobile devices.
  • SOLAP (Spatial OLAP). This type is designed to process spatial data. It emerged from the integration of geographic information systems with OLAP systems. These systems can process data not only in alphanumeric form but also as visual objects and vectors.

Advantages of OLAP systems

An OLAP system enables an organization to forecast and analyze various situations related to its current activities and development prospects. These systems can be viewed as a complement to enterprise-level automation systems. All the advantages of OLAP systems depend directly on the accuracy, reliability and volume of the source data.

The main advantages of OLAP systems are:

  • consistency of source information and analysis results. With an OLAP system, it is always possible to trace the source of the information and establish the logical link between the results and the source data. This reduces the subjectivity of the analysis results.
  • multivariate analysis. An OLAP system can produce a variety of scenarios of how events may develop from a single set of source data. Its analysis tools let you simulate situations on a "what if" basis.
  • control over the level of detail. The level of detail in the results can vary with users' needs, without complex system configuration or recalculation. A report can contain exactly the information needed for decision-making.
  • detection of hidden dependencies. By building multidimensional relationships, it is possible to identify hidden dependencies in processes or situations that affect production activities.
  • a single platform. An OLAP system makes it possible to create a single platform for all of an enterprise's forecasting and analysis processes. In particular, OLAP system data serves as the basis for budget forecasts, sales forecasts, procurement forecasts, strategic development plans, etc.

Intense competition and the growing dynamics of the external environment place increased demands on enterprise management systems. The development of management theory and practice brought new methods, technologies and models aimed at improving efficiency, and these in turn contributed to the emergence of analytical systems. Demand for analytical systems in Russia is high. They are of most interest in the financial sector: banks, insurance companies and investment companies. The output of analytical systems is needed above all by people whose decisions determine the development of the company: managers, experts and analysts. Analytical systems solve problems of consolidation, reporting, optimization and forecasting. To date there is no definitive classification of analytical systems, since the field has no common system of definitions for its terms. The information structure of an enterprise can be represented as a sequence of levels. Each level has its own way of processing and managing information and its own role in the management process. Analytical systems are thus located hierarchically at different levels of this infrastructure.

  • transaction system level;
  • data warehouse level;
  • data mart level;
  • OLAP system level;
  • analytical application level.

OLAP systems (Online Analytical Processing, that is, analytical processing in real time) are a technology for comprehensive multidimensional data analysis. They are applicable wherever multifactor data must be analyzed, and they are an effective means of analysis and report generation. The data warehouses, data marts and OLAP systems mentioned above all belong to business intelligence (BI) systems.

Information and analytical systems built for direct use by decision-makers are very often extremely easy to use but rigidly limited in functionality. In the literature, such static systems are called executive information systems (EIS). They contain predefined sets of queries. These are sufficient for routine review, but they cannot answer every question about the available data that may arise during decision-making. Such a system usually produces multi-page reports, and after studying them carefully the analyst has a new set of questions. However, each new query that was not foreseen when the system was designed must be formally described, coded by a programmer and only then executed. The wait can take hours or days, which is not always acceptable. Thus the outward simplicity of static decision support systems (DSS), which most customers of information and analytical systems actively push for, turns into a catastrophic loss of flexibility.



Dynamic DSS, by contrast, are designed to handle analysts' unregulated (ad hoc) queries against the data. The requirements for such systems were examined most thoroughly by E. F. Codd in the article that laid the foundation of the OLAP concept. Analysts work with these systems as an interactive sequence of forming queries and studying their results.

However, dynamic DSS are not limited to online analytical processing (OLAP). Management decisions based on accumulated data can be supported in three basic areas.

The sphere of detailed data. This is the domain of most information retrieval systems. In most cases, relational DBMSs cope well with the tasks that arise here. The generally accepted standard language for manipulating relational data is SQL. Information retrieval systems that give end users an interface for finding detailed information can be used as add-ons both over the databases of individual transaction systems and over a common data warehouse.

The sphere of aggregated indicators. Taking a comprehensive view of the information collected in the data warehouse, summarizing and aggregating it, representing it as a hypercube and analyzing it multidimensionally are the tasks of online analytical processing (OLAP) systems. Here one can either use special multidimensional DBMSs or stay within relational technologies. In the latter case, pre-aggregated data can be collected in a star-schema database, or the information can be aggregated on the fly while scanning the detailed tables of the relational database.

The sphere of patterns. Intelligent processing is performed by data mining methods. Their main tasks are finding functional and logical patterns in accumulated information and building models and rules that explain detected anomalies and/or predict how certain processes will develop.

Online analytical processing

The OLAP concept is based on the principle of multidimensional data representation. In a 1993 article, E. F. Codd examined the shortcomings of the relational model, pointing first of all to the inability to "combine, view and analyze data in terms of multiple dimensions, that is, in the way most understandable to corporate analysts". He also defined general requirements for OLAP systems, which extend the functionality of relational DBMSs and include multidimensional analysis as one of their characteristics.

Classification of OLAP products by data representation method

Many products currently on the market provide OLAP functionality to varying degrees. About 30 of the best known are listed on the review web server http://www.olapreport.com/. All of them provide the user interface with a multidimensional conceptual view of the source database, and they fall into three classes according to the type of that source database.

The very first online analytical processing systems (for example, Essbase from Arbor Software and Oracle Express Server from Oracle) belonged to the MOLAP class, that is, they could work only with their own multidimensional databases. They are based on proprietary multidimensional DBMS technologies and are the most expensive. These systems provide a complete OLAP processing cycle. Besides the server component, they either include their own integrated client interface or use external spreadsheet programs to communicate with the user. Maintaining such systems requires dedicated staff to install and support them and to build data views for end users.

Relational online analytical processing (ROLAP) systems present data stored in a relational database in multidimensional form. They transform the information into a multidimensional model through an intermediate metadata layer. ROLAP systems are well suited to large data warehouses. Like MOLAP systems, they require considerable maintenance effort from IT specialists and support multi-user operation.

Finally, hybrid systems (Hybrid OLAP, HOLAP) aim to combine the advantages and minimize the shortcomings of the two previous classes. Speedware Media/MR belongs to this class. According to its developers, it combines the analytical flexibility and response speed of MOLAP with the constant access to live data characteristic of ROLAP.

Multidimensional OLAP (MOLAP)

In specialized DBMSs based on multidimensional data representation, data is organized not as relational tables but as ordered multidimensional arrays:

1) hypercubes (all cells stored in the database must have the same dimensionality, that is, be defined over the full set of dimensions) or

2) polycubes (each variable is stored with its own set of dimensions, and all the associated processing complexity is shifted to the internal mechanisms of the system).
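The difference between the two layouts can be illustrated with plain Python dictionaries. This is a conceptual sketch only, with hypothetical variables and data. Real multidimensional DBMSs use their own on-disk array formats. In the hypercube, every variable is keyed by the full set of dimensions, so a variable that does not depend on one of them must be repeated. In the polycube, each variable carries its own dimension list, and the lookup logic (here the hypothetical polycube_value) absorbs the extra complexity.

```python
# Hypothetical variables: "sales" depends on product, region and month,
# while "budget" is planned per product and month only.

# Hypercube: every variable is keyed by the full (product, region, month)
# tuple, so the budget value has to be repeated for each region.
hypercube = {
    ("Bolt", "North", 1): {"sales": 10.0, "budget": 15.0},
    ("Bolt", "South", 1): {"sales": 6.0, "budget": 15.0},  # budget repeated
}

# Polycube: each variable is stored with its own set of dimensions.
polycube = {
    "sales": {
        "dims": ("product", "region", "month"),
        "cells": {("Bolt", "North", 1): 10.0, ("Bolt", "South", 1): 6.0},
    },
    "budget": {
        "dims": ("product", "month"),
        "cells": {("Bolt", 1): 15.0},
    },
}


def polycube_value(cube, variable, coords):
    """Look up a variable using only the dimensions it is actually stored with."""
    entry = cube[variable]
    key = tuple(coords[d] for d in entry["dims"])
    return entry["cells"].get(key)


point = {"product": "Bolt", "region": "South", "month": 1}
print(polycube_value(polycube, "budget", point))  # 15.0
print(polycube_value(polycube, "sales", point))   # 6.0
```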

Using multidimensional databases in online analytical processing systems has the following advantages.

With a multidimensional DBMS, data is searched and retrieved much faster than through a multidimensional conceptual view of a relational database. This is because the multidimensional database is denormalized, contains pre-aggregated indicators and provides optimized access to the requested cells.

Multidimensional DBMSs easily support a variety of built-in functions in the information model. On relational DBMSs, the inherent limitations of SQL make these tasks quite complex and sometimes impossible.

On the other hand, there are significant limitations.

Multidimensional DBMSs cannot handle large databases. In addition, because of denormalization and pre-computed aggregation, the volume of a multidimensional database typically corresponds (by Codd's estimate) to a volume of source detailed data that is 2.5 to 100 times smaller.

Multidimensional DBMSs use external memory very inefficiently compared with relational ones. In the overwhelming majority of cases the information hypercube is highly sparse. Since the data is stored in ordered form, undefined values can be eliminated only by choosing the optimal sort order, one that groups the data into the largest possible contiguous blocks. Even then, the problem is solved only partially. Moreover, a sort order that is optimal for storage will most likely differ from the order most often used in queries. Real systems therefore have to strike a compromise between speed and the redundancy of the disk space occupied by the database.

Consequently, a multidimensional DBMS is justified only under the following conditions.

The volume of source data for analysis is not too large (no more than a few gigabytes), that is, the level of data aggregation is fairly high.

The set of information dimensions is stable, since almost any change in their structure requires a complete restructuring of the hypercube.

The system's response time to ad hoc queries is the most critical parameter.

Complex built-in functions must be used extensively to perform cross-dimensional calculations over hypercube cells, including the ability to write user-defined functions.

Relational OLAP (ROLAP)

Using relational databases directly in online analytical processing systems has the following advantages.

In most cases, corporate data warehouses are implemented with relational DBMSs, and ROLAP tools can analyze them directly. In this case, warehouse size is not as critical a parameter as it is for MOLAP.

When the dimensionality of the problem varies and the dimension structure has to be changed fairly often, a ROLAP system with a dynamic representation of dimensions is the optimal solution, because such changes do not require physical reorganization of the database.

Relational DBMSs provide a significantly higher level of data protection and good facilities for restricting access rights.

The main drawback of ROLAP compared with multidimensional DBMSs is lower performance. To achieve performance comparable to MOLAP, relational systems require careful design of the database schema and tuning of indexes, that is, considerable effort from database administrators. Only with star schemas can the performance of well-tuned relational systems approach that of systems based on multidimensional databases.

Online analytical processing, or OLAP, is an effective data processing technology that produces results from huge arrays of diverse data. It is a powerful tool for accessing, extracting and viewing information on a PC and analyzing it from different points of view.

OLAP is a tool that supports long-term strategic planning. It takes the basic information from operational data and looks ahead over a horizon of 5, 10 or more years. Data is stored in the database with dimensions, which are its attributes. Users can view the same data set by different attributes, depending on the goals of the analysis.

History of OLAP

OLAP is not a new concept and has been in use for decades. In fact, the origins of the technology can be traced back to 1962. However, the term itself was coined only in 1993 by database researcher Edgar F. (Ted) Codd, who also formulated 12 rules for such products. Like many other applications, the concept went through several stages of evolution.

The history of OLAP technology proper dates back to 1970, when Express, the first OLAP server, was released. Oracle acquired it from Information Resources in 1995, and it later became the basis of the multidimensional online analytical processing engine that Oracle provided in its database. In 1992, Arbor Software released another well-known online analytical processing product, Essbase (acquired by Oracle in 2007).

In 1998, Microsoft released its online analytical processing server, MS Analysis Services. This made the technology more popular and spurred the development of other products. Today several world-famous vendors offer OLAP applications, including IBM, SAS, SAP, Essbase, Microsoft, Oracle and icCube.

Online analytical processing

OLAP is a tool that supports decisions about planned events. A typical OLAP calculation can be more complex than simple data aggregation. Analytical queries per minute (AQM) is used as a standard benchmark for comparing the performance of different tools. These systems should shield users as much as possible from the syntax of complex queries and ensure consistent response times for all queries, regardless of their complexity.

OLAP has the following main characteristics:

  1. Multidimensional data representation.
  2. Support for complex calculations.
  3. Time intelligence.

Multidimensional representation provides the basis for analytical processing by giving flexible access to corporate data. It allows users to analyze data along any dimension and at any level of aggregation.
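Analyzing data "along any dimension and at any level of aggregation" comes down to two basic operations: fixing some dimension members (a slice) and summing over the remaining ones (a roll-up). The Python sketch below assumes a small in-memory cube keyed by member tuples and uses summation as the aggregate. The helper names are hypothetical rather than part of any OLAP product.

```python
from collections import defaultdict

DIMS = ("year", "region", "product")

# Hypothetical base cells: (year, region, product) -> sales
cells = {
    (2023, "North", "Bolt"): 10.0,
    (2023, "North", "Nut"): 4.0,
    (2023, "South", "Bolt"): 6.0,
    (2024, "North", "Bolt"): 12.0,
}


def slice_cube(cells, **fixed):
    """Keep only cells whose members match the fixed dimension values."""
    idx = {DIMS.index(d): v for d, v in fixed.items()}
    return {k: v for k, v in cells.items() if all(k[i] == val for i, val in idx.items())}


def roll_up(cells, keep):
    """Aggregate away every dimension not listed in `keep` (sum of sales)."""
    positions = [DIMS.index(d) for d in keep]
    out = defaultdict(float)
    for key, value in cells.items():
        out[tuple(key[i] for i in positions)] += value
    return dict(out)


north = slice_cube(cells, region="North")
print(roll_up(north, ["year"]))  # {(2023,): 14.0, (2024,): 12.0}
print(roll_up(cells, []))        # {(): 32.0}
```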

Support for complex calculations is the foundation of OLAP software.

Time intelligence is used to evaluate the performance of any analytical application over a given period. For example, this month can be compared with last month, or with the same month last year.
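Both comparisons in the example come down to looking up a second cell along the time dimension and computing a relative change. The Python sketch below assumes monthly data keyed by (year, month) and expresses growth as a fraction. The data and function names are hypothetical.

```python
# Hypothetical monthly sales keyed by (year, month)
sales = {
    (2023, 5): 100.0,
    (2023, 6): 120.0,
    (2024, 5): 110.0,
    (2024, 6): 150.0,
}


def previous_month(year, month):
    """Return the (year, month) immediately before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def growth(current, base):
    """Relative change from base to current; None if the base is missing or zero."""
    if base is None or base == 0:
        return None
    return (current - base) / base


def time_comparisons(sales, year, month):
    """Month-over-month and year-over-year growth for one period."""
    current = sales[(year, month)]
    return {
        "mom": growth(current, sales.get(previous_month(year, month))),
        "yoy": growth(current, sales.get((year - 1, month))),
    }


result = time_comparisons(sales, 2024, 6)
print({k: round(v, 3) for k, v in result.items()})  # {'mom': 0.364, 'yoy': 0.25}
```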

Multidimensional data structure

One of the main features of online analytical processing is its multidimensional data structure. A cube can have several dimensions. This model makes OLAP analysis simple for managers and executives, because the objects represented in the cells are real-world business objects. In addition, the model lets users process not only structured arrays but also unstructured and semi-structured data. All this makes it especially popular for data analysis and BI applications.

The main characteristics of OLAP systems:

  1. Use multidimensional data analysis methods.
  2. Provide advanced database support.
  3. Create easy-to-use end-user interfaces.
  4. Support client/server architecture.

One of the main components of the OLAP concept is the server on the client side. Besides aggregating and pre-processing data from the relational database, it provides advanced calculation and write-back options, additional functions, basic advanced queries and other features.

Depending on the application the user selects, various data models and tools are available, including real-time alerts, "what if" scenario functions, optimization and complex OLAP reports.

Cubic form

The concept is based on the cube form. The way data is arranged in the cube shows how OLAP follows the principle of multidimensional analysis, producing a data structure designed for fast and efficient analysis.

An OLAP cube is also called a "hypercube". It consists of numeric facts (measures) classified by facets (dimensions). Dimensions are the attributes that define a business problem. Simply put, a dimension is a label that describes a measure. For example, in sales reports the measure is sales, and the dimensions include the sales period, the salespeople, the product or service and the sales region. In reports on manufacturing operations, the measures may be total production costs and output. The dimensions would then be the date or time of production, the production stage or phase, and even the workers involved in the production process.

OLAP cube data is the cornerstone of the system. The data in a cube is organized using either a star or a snowflake schema. At the center is a fact table containing aggregates (measures). It is linked to a number of dimension tables that describe the measures. Dimensions describe how these measures can be analyzed. If a cube has more than three dimensions, it is often called a hypercube.
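A star schema can be sketched in plain Python: dimension tables map surrogate keys to descriptive attributes, and each fact row carries one foreign key per dimension plus the measure. The table names, keys, and data below are hypothetical, and the join is done by hand only for illustration; in practice it runs inside the database.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (illustrative data).
dim_product = {1: {"name": "Laptop", "category": "Computers"},
               2: {"name": "Phone", "category": "Mobile"}}
dim_store = {10: {"city": "Moscow", "region": "Central"},
             20: {"city": "Kazan", "region": "Volga"}}

# Fact table at the center: one foreign key per dimension plus the numeric measure.
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 1200.0},
    {"product_id": 2, "store_id": 10, "amount": 800.0},
    {"product_id": 2, "store_id": 20, "amount": 650.0},
]


def total_by(product_attr: str, store_attr: str) -> dict:
    """Join each fact row to its dimension rows and sum the measure per attribute pair."""
    totals = defaultdict(float)
    for row in fact_sales:
        product = dim_product[row["product_id"]]
        store = dim_store[row["store_id"]]
        totals[(product[product_attr], store[store_attr])] += row["amount"]
    return dict(totals)


print(total_by("category", "region"))
# {('Computers', 'Central'): 1200.0, ('Mobile', 'Central'): 800.0, ('Mobile', 'Volga'): 650.0}
```

In a snowflake schema, attributes such as `region` would themselves live in a separate table referenced from `dim_store`.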

One of the key properties of a cube is that it is static: once built, it cannot be changed. Consequently, designing the cube and configuring the data model is a decisive step toward correct data processing in an OLAP architecture.

Data aggregation

Aggregations are the main reason queries are processed much faster in OLAP tools than in OLTP systems. Aggregations are data summaries precomputed during processing. The members stored in the OLAP dimension tables determine which queries the cube can answer.

In a cube, aggregated values are stored in cells whose coordinates are given by specific dimension members. The number of aggregates a cube can contain depends on all possible combinations of dimension members, so a typical cube in an application may contain an extremely large number of aggregates. Therefore, precomputation is performed only for key aggregates distributed throughout the cube. This significantly reduces the time needed to compute any aggregate when a query runs against the data model.
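The sketch below shows why the number of aggregates explodes: it computes every group-by over every subset of the dimensions (the "data cube" lattice), using `*` as a hypothetical placeholder for "all members". With d dimensions there are 2^d group-bys, and a dimension with n members contributes n + 1 possible coordinates (its members plus "all"), so a cube can hold up to the product of (n_i + 1) cells. This sketch materializes everything, which is exactly what a real system avoids by precomputing only key aggregates; the sample rows are illustrative.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # placeholder meaning "all members of this dimension"


def compute_cube(rows, dims, measure):
    """Compute every group-by over every subset of dims (2**len(dims) group-bys)."""
    cube = defaultdict(float)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(row[d] if d in subset else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


rows = [
    {"city": "Moscow", "product": "Laptop", "amount": 1200.0},
    {"city": "Moscow", "product": "Phone", "amount": 800.0},
    {"city": "Kazan", "product": "Phone", "amount": 650.0},
]
cube = compute_cube(rows, ("city", "product"), "amount")
print(cube[(ALL, ALL)])       # 2650.0 (grand total)
print(cube[("Moscow", ALL)])  # 2000.0
print(cube[(ALL, "Phone")])   # 1450.0

# Upper bound on cells: (2 cities + ALL) * (2 products + ALL) = 9.
max_cells = (2 + 1) * (2 + 1)
print(len(cube), "of at most", max_cells)  # 8 of at most 9 (Kazan/Laptop has no sales)
```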

There are also two aggregation-related options for improving the performance of a finished cube: designing aggregations that are precomputed and cached, and designing aggregations based on an analysis of user queries.
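A greatly simplified sketch of the second option: count how often each combination of dimensions appears in a query log and precompute only the most frequent ones. The log format and the pure frequency heuristic are assumptions for illustration; real tools also weigh the estimated size and cost of each candidate aggregation.

```python
from collections import Counter

# Hypothetical query log: each entry lists the dimensions a user query grouped by.
query_log = [
    ("region",), ("region", "product"), ("region",),
    ("period",), ("region",), ("product", "region"),
]


def choose_aggregations(log, budget):
    """Pick the `budget` most frequently requested group-bys to precompute and cache."""
    # Sorting makes ("region", "product") and ("product", "region") the same group-by.
    counts = Counter(tuple(sorted(q)) for q in log)
    return [group_by for group_by, _ in counts.most_common(budget)]


print(choose_aggregations(query_log, 2))  # [('region',), ('product', 'region')]
```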

Principle of operation

Typically, operational data obtained from transactions can be analyzed in a simple spreadsheet, with data values laid out in rows and columns. This works well, given the two-dimensional nature of such data. OLAP is different because it deals with multidimensional data arrays. Since these often come from different sources, a spreadsheet cannot always process them efficiently.

The cube solves this problem and also keeps the data in OLAP storage logical and ordered. Businesses collect data from numerous sources in different formats, such as text files, multimedia files, Excel spreadsheets, Access databases, and even OLTP databases.

All data is collected in a data warehouse that is loaded directly from the sources. The raw information obtained from OLTP and other sources is cleansed of any erroneous, incomplete, and inconsistent transactions.

After cleansing and transformation, the information is stored in a relational database. It is then loaded into a multidimensional OLAP server (or OLAP cube) for analysis. End users responsible for business applications, data mining, and other business operations access the information they need from the OLAP cube.
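The cleanse-then-load flow can be sketched in a few lines. The validation rules below (a missing field means incomplete, an unparsable amount means erroneous, a negative sale means inconsistent) are assumptions chosen for illustration, and the intermediate relational store is skipped: cleansed rows go straight into a small (month, region) cube.

```python
from collections import defaultdict

# Raw records as they might arrive from OLTP systems and files (illustrative).
raw = [
    {"date": "2023-01-05", "region": "North", "amount": "120.50"},
    {"date": "2023-01-06", "region": None, "amount": "80.00"},    # incomplete
    {"date": "2023-01-07", "region": "South", "amount": "abc"},   # erroneous
    {"date": "2023-01-07", "region": "North", "amount": "-5"},    # inconsistent
    {"date": "2023-02-01", "region": "South", "amount": "200"},
]


def clean(records):
    """Drop incomplete, unparsable, or inconsistent records and convert types."""
    for r in records:
        if not r.get("date") or not r.get("region"):
            continue
        try:
            amount = float(r["amount"])
        except (TypeError, ValueError):
            continue
        if amount < 0:
            continue
        yield {"month": r["date"][:7], "region": r["region"], "amount": amount}


def load_cube(records):
    """Aggregate cleansed rows into (month, region) cells."""
    cube = defaultdict(float)
    for r in records:
        cube[(r["month"], r["region"])] += r["amount"]
    return dict(cube)


cube = load_cube(clean(raw))
print(cube)  # {('2023-01', 'North'): 120.5, ('2023-02', 'South'): 200.0}
```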

Advantages of the array model

OLAP delivers fast query performance thanks to optimized storage, multidimensional indexing, and caching, which is a significant advantage of the system. Other benefits include:

  1. Smaller data size on disk.
  2. Automatic computation of higher-level aggregates.
  3. Natural indexing provided by array models.
  4. Efficient data retrieval thanks to pre-structuring.
  5. Compact storage for low-dimensional data sets.
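The "natural indexing" of array models can be illustrated with a dense, row-major array: once each dimension's members are mapped to positions, a cell's location is computed arithmetically, with no separate index structure. The dimensions and members below are hypothetical; empty cells hold `None` so that missing data stays distinct from zero.

```python
from math import prod

# Members of each dimension in a fixed order; a member's position is its index.
periods = {name: i for i, name in enumerate(["Q1", "Q2", "Q3", "Q4"])}
regions = {name: i for i, name in enumerate(["North", "South", "East"])}
products = {name: i for i, name in enumerate(["Laptop", "Phone"])}
shape = (len(periods), len(regions), len(products))

# Dense storage: one flat list with a slot for every combination of members.
cells = [None] * prod(shape)


def offset(period: str, region: str, product: str) -> int:
    """Row-major offset computed directly from member positions."""
    i, j, k = periods[period], regions[region], products[product]
    return (i * shape[1] + j) * shape[2] + k


cells[offset("Q2", "South", "Phone")] = 650.0
print(offset("Q2", "South", "Phone"))  # 9
print(len(cells))                      # 24
```

The same layout also shows the trade-off: every combination gets a slot, which is compact for low-dimensional, densely filled data but wasteful when most cells are empty.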

A disadvantage of OLAP is that the processing step in some solutions can take a long time, especially with large volumes of data. This is usually addressed by incremental processing, in which only the data that has changed is processed.
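A minimal sketch of incremental processing, assuming an additive measure such as a sum: newly arrived rows are folded into the existing aggregates instead of rebuilding the whole cube. Updates and deletions of old rows, or non-additive measures such as distinct counts, would need more work than this.

```python
def apply_increment(cube, new_rows):
    """Fold only newly arrived rows into existing aggregates instead of rebuilding."""
    for row in new_rows:
        key = (row["month"], row["region"])
        cube[key] = cube.get(key, 0.0) + row["amount"]
    return cube


cube = {("2023-01", "North"): 120.5}
apply_increment(cube, [{"month": "2023-01", "region": "North", "amount": 30.0},
                       {"month": "2023-02", "region": "South", "amount": 200.0}])
print(cube)  # {('2023-01', 'North'): 150.5, ('2023-02', 'South'): 200.0}
```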

Main analytical operations

Roll-up (drill-up), also known as "consolidation". Roll-up involves collecting all available data and aggregating it along one or more dimensions. Most often this requires applying a mathematical formula. As an example, consider a retail chain with outlets in different cities. To identify patterns and forecast future sales trends, data from all outlets is "rolled up" to the company's main sales department for consolidation and calculation.
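For the retail chain example, a roll-up can be sketched as replacing the city level with its parent region (via a hierarchy mapping) and summing the measure. The city-to-region hierarchy and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical geography hierarchy: city -> region.
city_to_region = {"Moscow": "Central", "Tver": "Central", "Kazan": "Volga"}

# Sales by (city, month) as reported by the outlets.
sales_by_city = {
    ("Moscow", "2023-01"): 500.0,
    ("Tver", "2023-01"): 120.0,
    ("Kazan", "2023-01"): 300.0,
    ("Moscow", "2023-02"): 450.0,
}


def roll_up(cells, hierarchy):
    """Replace the city level with its parent region and sum the measure."""
    result = defaultdict(float)
    for (city, month), value in cells.items():
        result[(hierarchy[city], month)] += value
    return dict(result)


print(roll_up(sales_by_city, city_to_region))
# {('Central', '2023-01'): 620.0, ('Volga', '2023-01'): 300.0, ('Central', '2023-02'): 450.0}
```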

Drill-down. This is the opposite of roll-up. The process starts with a large data set and then divides it into smaller parts, allowing users to view the details. In the retail chain example, an analyst would examine the sales data and look at the individual brands or products that are bestsellers at each outlet in different cities.
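Continuing the retail example, a drill-down can be sketched as expanding one region into its (city, product) cells and then finding the best-selling product in each city. The detail data and function names are hypothetical.

```python
# Detail-level facts: (region, city, product) -> sales (illustrative).
detail = {
    ("Central", "Moscow", "Laptop"): 300.0,
    ("Central", "Moscow", "Phone"): 200.0,
    ("Central", "Tver", "Phone"): 120.0,
    ("Volga", "Kazan", "Laptop"): 300.0,
}


def drill_down(cells, region):
    """Expand one region into its cities and products."""
    return {(city, product): v for (reg, city, product), v in cells.items() if reg == region}


def bestsellers(breakdown):
    """Return the best-selling product in each city of a drilled-down breakdown."""
    best = {}
    for (city, product), v in breakdown.items():
        if city not in best or v > best[city][1]:
            best[city] = (product, v)
    return {city: product for city, (product, _) in best.items()}


central = drill_down(detail, "Central")
print(bestsellers(central))  # {'Moscow': 'Laptop', 'Tver': 'Phone'}
```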

Slice and dice. This operation consists of two actions: extracting a specific subset of data from the OLAP cube (the "slicing" aspect of the analysis) and viewing it from different perspectives or angles. This can happen once all the outlets' data has been collected and loaded into the hypercube. The analyst extracts a sales-related subset of data from the OLAP cube and then examines it while analyzing sales of individual units in each region. Meanwhile, other users can focus on assessing the cost-effectiveness of sales or the effectiveness of marketing and advertising campaigns.
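A minimal sketch of both operations on a cube stored as a dictionary keyed by (month, region, product): a slice fixes one dimension to a single member and drops that axis, while a dice keeps only the cells whose members fall into chosen subsets on several axes. The data and function names are illustrative; `slice_` has a trailing underscore only to avoid shadowing Python's built-in `slice`.

```python
# Cells keyed by (month, region, product) (illustrative data).
cube = {
    ("2023-01", "North", "Laptop"): 500.0,
    ("2023-01", "South", "Phone"): 300.0,
    ("2023-02", "North", "Phone"): 250.0,
    ("2023-02", "South", "Laptop"): 400.0,
    ("2023-03", "North", "Laptop"): 350.0,
}


def slice_(cells, axis, member):
    """Slice: fix one dimension to a single member and drop that axis."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cells.items() if k[axis] == member}


def dice(cells, allowed):
    """Dice: keep cells whose member on each listed axis is in the allowed set."""
    return {k: v for k, v in cells.items()
            if all(k[axis] in members for axis, members in allowed.items())}


print(slice_(cube, 1, "North"))
# {('2023-01', 'Laptop'): 500.0, ('2023-02', 'Phone'): 250.0, ('2023-03', 'Laptop'): 350.0}
print(dice(cube, {0: {"2023-01", "2023-02"}, 2: {"Laptop"}}))
# {('2023-01', 'North', 'Laptop'): 500.0, ('2023-02', 'South', 'Laptop'): 400.0}
```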

Pivot (rotation). It rotates the data axes to provide an alternative presentation of the information.
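For a two-dimensional cross-table, a pivot simply swaps the row and column axes. The sketch below assumes the table is held as nested dictionaries `{row: {column: value}}`; the sample data is hypothetical.

```python
def pivot(table):
    """Swap the row and column axes of a 2-D cross-table given as {row: {col: value}}."""
    rotated = {}
    for row, cols in table.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated


by_region = {"North": {"Q1": 500.0, "Q2": 250.0},
             "South": {"Q1": 300.0, "Q2": 400.0}}
print(pivot(by_region))
# {'Q1': {'North': 500.0, 'South': 300.0}, 'Q2': {'North': 250.0, 'South': 400.0}}
```

Pivoting twice returns the original table, since no values are aggregated or lost.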

Types of OLAP systems

In principle, each of these types is built around a typical OLAP cube: multidimensional data is processed analytically using an OLAP cube, or any data cube, so that dimensions can be added to the analysis as needed. Any information loaded into a multidimensional database is stored or archived and can be retrieved whenever necessary.


Relational OLAP (ROLAP)

ROLAP is an extended relational DBMS combined with a mapping of multidimensional data onto standard relational operations.
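A minimal ROLAP-style sketch using Python's built-in `sqlite3` module: a cube request (a list of dimensions plus a measure) is translated into an SQL `GROUP BY` query, and the relational engine computes the aggregates on the fly. The table, columns, and the `rolap_query` helper are hypothetical; identifiers are interpolated directly into the SQL, so this assumes they are trusted names, never user input.

```python
import sqlite3


def rolap_query(conn, table, dims, measure):
    """Translate a cube request into a GROUP BY query and run it on the relational engine."""
    # Identifiers are assumed to be trusted names (no quoting or validation is done here).
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "Laptop", 500.0), ("North", "Phone", 250.0),
    ("South", "Laptop", 400.0), ("North", "Laptop", 100.0),
])
print(rolap_query(conn, "sales", ["region"], "amount"))
# [('North', 850.0), ('South', 400.0)]
print(rolap_query(conn, "sales", ["region", "product"], "amount"))
# [('North', 'Laptop', 600.0), ('North', 'Phone', 250.0), ('South', 'Laptop', 400.0)]
```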

Multidimensional OLAP (MOLAP)

MOLAP works directly with data stored in a multidimensional database.

Hybrid online analytical processing (HOLAP)

In the HOLAP approach, aggregated totals are stored in a multidimensional database, and detailed data is stored in a relational database. This combines the data efficiency of the ROLAP model with the performance of the MOLAP model.
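The HOLAP split can be sketched as a query router: summary queries are answered from precomputed totals (standing in for the multidimensional store), and detail queries are computed from detail rows (standing in for the relational store). Both stores and the `holap_query` helper are hypothetical in-memory stand-ins.

```python
# Aggregated totals kept in the multidimensional store (modelled here as a dict).
aggregate_store = {("North",): 850.0, ("South",): 400.0}

# Detail rows kept in the relational store (modelled here as a list of tuples).
detail_rows = [("North", "Laptop", 500.0), ("North", "Phone", 250.0),
               ("South", "Laptop", 400.0), ("North", "Laptop", 100.0)]


def holap_query(region, product=None):
    """Serve summary queries from the aggregate store, detail queries from the rows."""
    if product is None:
        return aggregate_store.get((region,))
    matches = [amount for r, p, amount in detail_rows if r == region and p == product]
    return sum(matches) if matches else None


print(holap_query("North"))            # 850.0 (from precomputed aggregates)
print(holap_query("North", "Laptop"))  # 600.0 (computed from detail rows)
```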

Desktop OLAP (DOLAP)

In desktop OLAP, the user downloads part of the data from the database to a local machine or desktop and analyzes it there. DOLAP is relatively cheap to deploy because it offers very little functionality compared to other OLAP systems.

Web OLAP (WOLAP)

Web OLAP is an OLAP system accessible through a web browser. WOLAP has a three-tier architecture consisting of three components: the client, middleware, and the database server.

Mobile OLAP

Mobile OLAP helps users receive and analyze OLAP data using their mobile devices.

Spatial OLAP (SOLAP)

SOLAP is designed to facilitate the management of both spatial and non-spatial data in a geographic information system (GIS).

There are other, less well-known OLAP systems and technologies, but these are the main ones currently used by large corporations, businesses, and even governments.

OLAP Tools

Online analytical processing tools are widely available on the Internet in both paid and free versions.

The most popular of them:

  1. Dundas BI from Dundas Data Visualization is a browser-based platform for business intelligence and data visualization that includes integrated dashboards, OLAP reports, and data analytics.
  2. Yellowfin is a business analytics platform: a single integrated solution designed for companies of various industries and sizes. The system can be configured for enterprises in fields such as accounting, advertising, and agriculture.
  3. ClicData is a business intelligence (BI) solution intended mainly for small and medium-sized businesses. The tool allows end users to create reports and dashboards. Board is designed to combine business intelligence and corporate performance management; it is a full-featured system serving mid-sized and enterprise companies.
  4. Domo is a cloud-based business management suite that integrates multiple data sources, including spreadsheets, databases, social networks, and any existing cloud or on-premises software solution.
  5. InetSoft Style Intelligence is a business intelligence software platform that lets users create dashboards, visual OLAP analyses, and reports using a mashup mechanism.
  6. Birst from Infor is a networked business intelligence and analytics solution that connects the insights of various teams and helps make informed decisions. The tool allows decentralized users to extend the enterprise model.
  7. Halo is a comprehensive supply chain management and business intelligence system that helps with business planning and inventory forecasting for supply chain management. The system uses data from sources of all sizes: large, small, and in between.
  8. Chartio is a cloud-based business intelligence solution that gives founders, business teams, data analysts, and product teams organizational tools for their everyday work.
  9. Exago BI is a web-based solution designed to be embedded in web applications. Embedding Exago BI allows companies of all sizes to provide their customers with ad hoc, operational, and interactive reporting.

Impact on business

Users will find OLAP in most business applications across industries. OLAP analysis serves not only the business itself but also other stakeholders.

Some of its most common applications include:

  1. OLAP analysis of marketing data.
  2. Financial reporting, which covers sales and expenses, budgeting and financial planning.
  3. Business process management.
  4. Sales analysis.
  5. Marketing databases.

These industries continue to grow, which means users will soon see more OLAP applications. Customized multidimensional processing provides more dynamic analysis. This is why OLAP systems and technologies are used to evaluate "what-if" scenarios and alternative business scenarios.

The concept of multidimensional data analysis is closely associated with operational analysis, which is performed by OLAP systems.

OLAP (On-Line Analytical Processing) is a technology for operational analytical data processing that uses methods and tools for collecting, storing, and analyzing multidimensional data to support decision-making.

The main purpose of OLAP systems is to support analytical work and arbitrary queries (often called ad hoc queries) from analyst users. The goal of OLAP analysis is to test emerging hypotheses.

E. Codd, the founder of the relational approach, stands at the origins of OLAP technology. In 1993, he published an article titled "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate". The paper presents the main concepts of operational analytical processing and defines 12 requirements that products providing operational analytical processing must satisfy (Tokmakov G.P. Databases. The Concept of Databases, the Relational Data Model, SQL Languages, p. 51).

The rules formulated by Codd that define OLAP are listed below.

1. Multidimensionality - at the conceptual level, the OLAP system should present data as a multidimensional model, which simplifies analysis and the perception of information.

2. Transparency - the OLAP system must hide from the user the actual implementation of the multidimensional model: how it is organized, where the data comes from, and how it is processed and stored.

3. Accessibility - the OLAP system should provide the user with a single, consistent, and holistic data model, giving access to data regardless of how and where it is stored.

4. Consistent reporting performance - the performance of the OLAP system should not degrade significantly as the number of dimensions in the analysis grows.

5. Client-server architecture - the OLAP system must be able to work in a client-server environment, because most of the data that needs operational analytical processing today is stored in a distributed way. The main idea is that the server component of the OLAP tool should be intelligent enough to build a common conceptual schema by generalizing and consolidating the various logical and physical schemas of corporate databases, which provides the transparency effect.

6. Generic dimensionality - the OLAP system must support a multidimensional model in which all dimensions are equal. If necessary, individual dimensions can be given additional characteristics, but this capability must be available to every dimension.

7. Dynamic sparse matrix handling - the OLAP system should handle sparse matrices optimally. Access speed should be maintained regardless of where the data cells are located and should stay constant for models with different numbers of dimensions and different degrees of data sparsity.

8. Multi-user support - the OLAP system should allow several users to work together on one analytical model or to create different models for them from the same data. Both reading and writing data are possible, so the system must ensure data integrity and security.

9. Unrestricted cross-dimensional operations - when performing any slice, rotation, consolidation, or drill-down operation, the OLAP system should preserve the functional relationships between hypercube cells that are described in some formal language. The system must transform these relationships independently (automatically), without requiring the user to redefine them.

10. Intuitive data manipulation - the OLAP system should let the user perform slice, rotation, consolidation, and drill-down operations on a hypercube without numerous interface actions. The dimensions defined in the analytical model must contain all the information needed to perform these operations.

11. Flexible reporting - the OLAP system must support various methods of data visualization, i.e., reports must be presentable in any possible orientation. Reporting tools must present synthesized data, or information derived from the data model, in any possible orientation. This means that rows, columns, or pages must be able to show from 0 to N dimensions at a time, where N is the number of dimensions in the entire analytical model. In addition, each dimension shown in a row, column, or page must allow any subset of its elements (values) to be shown in any order.

12. Unlimited dimensions and aggregation levels - research on the number of dimensions needed in an analytical model has shown that up to 19 dimensions may be used at the same time. This leads to the recommendation that an analytical tool should support at least 15, and preferably 20, dimensions simultaneously. Moreover, no dimension should be limited in the number of user-defined aggregation levels and consolidation paths.
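Rule 7 (sparse matrix handling) is easy to motivate with arithmetic: a dense layout needs a slot for every combination of members, while real data fills only a tiny fraction of them. The sketch below assumes a hash map keyed by coordinate tuples as the sparse layout; the dimension sizes are hypothetical, and production engines use more elaborate compressed structures.

```python
from math import prod

# Hypothetical dimension sizes: 365 days x 200 stores x 5000 products.
dims = {"day": 365, "store": 200, "product": 5000}
dense_slots = prod(dims.values())
print(dense_slots)  # 365000000 slots for a dense array

# Sparse storage keeps only the cells that actually hold data.
sparse = {(0, 17, 42): 12.0, (1, 17, 42): 7.5, (200, 3, 4999): 3.0}
print(len(sparse))  # 3


def lookup(cells, coord):
    """Average constant-time lookup, wherever the cell lies in the space."""
    return cells.get(coord)  # None means "no data", not zero


print(lookup(sparse, (1, 17, 42)))  # 7.5
```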

Additional rules by Codd

This set of requirements, which served as the de facto definition of OLAP, is quite often criticized: for example, rules 1, 2, 3, and 6 are genuine requirements, while rules 10 and 11 are informal wishes (Tokmakov G.P. Databases. The Concept of Databases, the Relational Data Model, SQL Languages, p. 68). Thus, Codd's 12 requirements do not allow OLAP to be defined precisely. In 1995, Codd added the following six rules to the list:

13. Batch extraction vs. interpretation - the OLAP system should provide equally efficient access to both its own data and external data.

14. Support for all OLAP analysis models - the OLAP system must support all four data analysis models defined by Codd: categorical, exegetical, contemplative, and formulaic.

15. Treatment of non-normalized data - the OLAP system must be integrated with non-normalized data sources. Data modifications made in the OLAP environment must not change the data stored in the external source systems.

16. Storing OLAP results separately from the source data - an OLAP system operating in read-write mode must save the results of modifying the source data separately. In other words, the safety of the source data is ensured.

17. Exclusion of missing values - when presenting data to the user, the OLAP system must discard all missing values. In other words, missing values must be distinguished from zero values.

18. Treatment of missing values - the OLAP system must ignore all missing values, regardless of their source. This feature is related to rule 17.
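Rules 17 and 18 matter most when aggregating. In the sketch below, `None` marks a missing value and `0.0` is a genuine zero (for example, an outlet that sold nothing). Missing values are ignored, zeros still count, and the average is taken only over present values; treating the missing month as zero would instead give 37.5. The data is hypothetical.

```python
from typing import List, Optional

# None marks a missing value; 0.0 is a genuine zero (e.g. an outlet that sold nothing).
monthly_sales: List[Optional[float]] = [100.0, 0.0, None, 50.0]


def total(values):
    """Sum only present values; missing ones are ignored, zeros still count."""
    return sum(v for v in values if v is not None)


def average(values):
    """Average over present values only, so a missing month does not drag it down."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


print(total(monthly_sales))    # 150.0
print(average(monthly_sales))  # 50.0
```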

In addition, Codd divided all 18 rules into the following four groups, which he called features. The groups were named B, S, R, and D.

The basic features (B) include the following rules:

Multidimensional conceptual view of data (rule 1);

Intuitive data manipulation (rule 10);

Accessibility (rule 3);

Batch extraction vs. interpretation (rule 13);

Support for all OLAP analysis models (rule 14);

Client-server architecture (rule 5);

Transparency (rule 2);

Multi-user support (rule 8).

Special features (S):

Treatment of non-normalized data (rule 15);

Storing OLAP results separately from the source data (rule 16);

Exclusion of missing values (rule 17);

Treatment of missing values (rule 18).

Reporting features (R):

Flexible reporting (rule 11);

Consistent reporting performance (rule 4);

Automatic adjustment of the physical level (modified original rule 7).

Dimension control (D):

Generic dimensionality (rule 6);

Unlimited dimensions and aggregation levels (rule 12);

Unrestricted cross-dimensional operations (rule 9).