Relational database elements. Basic concepts of relational databases

The relational database and its features. Types of relationships between relational tables

A relational database is a set of interrelated tables, each of which contains information about objects of a certain type. A table row contains data about one object (for example, a product or a client), and the table columns describe the various characteristics of these objects, their attributes (for example, name, product code, customer information). Records, i.e. the table rows, all have the same structure: they consist of fields that store the attributes of the object. Each field, i.e. column, describes only one characteristic of the object and has a strictly defined data type. All records have the same fields; they differ only in the values those fields hold, that is, in the properties of the particular object.

In a relational database, each table must have a primary key: a field or combination of fields that uniquely identifies each row of the table. If the key consists of several fields, it is called composite. The key must be unique and must unambiguously determine the record, so that a single record can be found by its key value. Keys are also used to order the information in the database.

Tables of a relational database must meet the requirements of normalization of relations. Normalization of relations is a formal apparatus of restrictions on how tables are formed; it eliminates duplication, ensures the consistency of the data stored in the database, and reduces the labor cost of maintaining the database.

Suppose there is a Student table containing the following fields: group number, full name, record book number, date of birth, specialty name, faculty name. Such a storage organization has a number of shortcomings:

Duplication of information (the names of the specialty and faculty are repeated for each student), so the volume of the database increases;
Updating information in the table is difficult because every affected table record has to be edited.

Normalization of tables is designed to eliminate these shortcomings. Three normal forms of relations are considered here.

First normal form. A relational table is in first normal form if and only if none of its rows contains more than one value in any field and none of its key fields is empty. For example, if information has to be retrieved by a student's surname, the full name field should be split into surname, first name and patronymic.

Second normal form. A relational table is in second normal form if it satisfies the requirements of first normal form and all of its fields that are not part of the primary key are fully functionally dependent on the primary key. To bring a table to second normal form, the functional dependencies between its fields must be determined. A functional dependency between fields is a dependency in which, in an instance of an information object, a given value of the key attribute corresponds to exactly one value of the descriptive attribute.

Third normal form. A table is in third normal form if it meets the requirements of second normal form and none of its non-key fields is functionally dependent on any other non-key field. For example, in the table Student (group number, full name, record book number, date of birth, group leader) three fields, the record book number, the group number and the group leader, are in a transitive dependency. The group number depends on the record book number, and the group leader depends on the group number. To eliminate the transitive dependency, some of the fields of the Student table must be moved to another table, Group. The tables then take the following form: Student (group number, full name, record book number, date of birth), Group (group number, group leader).
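The Student/Group decomposition can be tried out in a few lines. The following is only a minimal sketch using Python's built-in sqlite3 module; the table name StudyGroup (GROUP is a reserved word in SQL), the column names and all sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts about a group are stored once per group, not once per student.
    CREATE TABLE StudyGroup (
        group_number TEXT PRIMARY KEY,
        group_leader TEXT NOT NULL
    );
    CREATE TABLE Student (
        record_book_no TEXT PRIMARY KEY,
        full_name      TEXT NOT NULL,
        date_of_birth  TEXT,
        group_number   TEXT NOT NULL REFERENCES StudyGroup(group_number)
    );
    INSERT INTO StudyGroup VALUES ('G-101', 'Ivanov');
    INSERT INTO Student VALUES ('Z-001', 'Petrov P. P.', '2001-03-14', 'G-101');
    INSERT INTO Student VALUES ('Z-002', 'Sidorova A. I.', '2001-07-02', 'G-101');
""")

# Changing the group leader now touches exactly one row.
cur = conn.execute(
    "UPDATE StudyGroup SET group_leader = 'Sidorova' WHERE group_number = 'G-101'"
)
rows_changed = cur.rowcount

# The leader of each student's group is recovered through a join.
leaders = conn.execute("""
    SELECT s.full_name, g.group_leader
    FROM Student AS s JOIN StudyGroup AS g USING (group_number)
    ORDER BY s.record_book_no
""").fetchall()
```

With the group leader kept in the Student table instead, the same change would have to be repeated in the row of every student of the group.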

The following operations can be performed on relational tables:

Union of tables with the same structure. Result: a combined table containing first the records of the first table, then those of the second (concatenation).
Intersection of tables with the same structure. Result: the records that are present in both tables.
Difference of tables with the same structure. Result: the records of the first table that are not present in the table being subtracted.
Selection (horizontal subset). Result: the records that meet the specified conditions.
Projection (vertical subset). Result: a relation containing some of the fields of the source tables.
Cartesian product of two tables. Result: a table whose records are obtained by combining each record of the first table with each record of the second.
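The six operations listed above map onto SQL fairly directly. Here is a minimal sketch with Python's built-in sqlite3 module; the tables A and B and their contents are invented for illustration. Note that SQL's UNION removes duplicate rows, whereas UNION ALL would simply concatenate the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, name TEXT);
    INSERT INTO A VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
    INSERT INTO B VALUES (3, 'pad'), (4, 'cap');
""")

def q(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

union        = q("SELECT * FROM A UNION SELECT * FROM B ORDER BY id")
intersection = q("SELECT * FROM A INTERSECT SELECT * FROM B")
difference   = q("SELECT * FROM A EXCEPT SELECT * FROM B ORDER BY id")
selection    = q("SELECT * FROM A WHERE id >= 2 ORDER BY id")  # horizontal subset
projection   = q("SELECT name FROM A ORDER BY name")           # vertical subset
product      = q("SELECT * FROM A CROSS JOIN B")               # every pairing of rows
```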

Relational tables can be linked to each other, so data can be retrieved from several tables at once. Tables are linked to each other in order to ultimately reduce the volume of the database. A link between a pair of tables is possible when they contain identical columns.

The following types of relationships exist:

one-to-one;
one-to-many;
many-to-many.

A one-to-one relationship assumes that one attribute value of the first table corresponds to exactly one attribute value of the second table, and vice versa.

A one-to-many relationship assumes that one attribute value of the first table corresponds to several attribute values of the second table.

A many-to-many relationship assumes that one attribute value of the first table corresponds to several attribute values of the second table, and vice versa.

Relational databases are currently the most widespread, although along with generally recognized advantages they have a number of shortcomings. The advantages of the relational approach include:

A small set of abstractions that make it relatively simple to model most common subject areas and that admit precise formal definitions while remaining intuitive;

A simple yet powerful mathematical apparatus, based mainly on set theory and mathematical logic, that provides the theoretical basis for the relational approach to organizing databases;

The possibility of non-navigational manipulation of data without the need to know the specific physical organization of the database in external memory.

Relational systems did not become widespread immediately. Although the main theoretical results in this area were obtained in the 1970s, and the first prototypes of relational DBMS appeared at the same time, for a long time it was considered impossible to implement such systems efficiently. However, the advantages noted above and the gradual accumulation of methods and algorithms for organizing and managing relational databases led to relational systems practically displacing the early DBMS from the world market by the mid-1980s.

Currently, the main criticism of relational DBMS concerns not their insufficient efficiency but certain limitations inherent in these systems (a direct consequence of their simplicity) when they are used in so-called non-traditional areas (the most common examples are design automation systems), which require extremely complex data structures. Another frequently noted shortcoming of relational databases is that they cannot adequately reflect the semantics of the subject area. In other words, the ability to represent knowledge about the semantic specifics of the subject area in relational systems is very limited. Modern research in the field of post-relational systems is mainly devoted to eliminating these shortcomings.

The main concepts of relational databases are data type, domain, attribute, tuple, primary key and relation.

The concept of a data type in the relational data model is fully equivalent to the concept of a data type in programming languages. Modern relational databases usually allow the storage of character data, numeric data, bit strings, specialized numeric data (such as "money"), and special "temporal" data (date, time, time interval). An approach that extends relational systems with abstract data types is also being developed (systems of the Ingres/Postgres family, for example, have such capabilities).

Data structure of the relational model. The relational data model organizes and represents data in the form of tables, or relations. Relation is a term that came from mathematics and denotes a simple two-dimensional table. The relational approach to building databases uses the terminology of the theory of relations, in which the simplest two-dimensional table is defined as a relation.

The table is the main type of data structure (object) of the relational model. The structure of a table is determined by its set of columns. Each row of the table contains one value in each column. A table cannot contain two identical rows. The total number of rows is not limited.

A column corresponds to some data element, an attribute, which is the simplest data structure. A table cannot contain multiple elements, groups or repeating groups, as in the network and hierarchical models considered above. Each column of the table must have the name of the corresponding data element (attribute).

The column of a table with the values of the corresponding attribute is called a domain, and the rows with the values of different attributes are called tuples.

Relational table-relation. Fig. 9 shows a relational table for a relation R. The formal definition of a relation R (a relational table) relies on the notion of its domains $D_i$ (columns) and tuples $K_j$ (rows). A relation R defined on a set of domains $\{D_i\}$ is a subset of the Cartesian (direct) product of the domains $D_1 \times D_2 \times \dots \times D_n$.

The table-relation (see Fig. 1) contains columns with the names of data elements, the attributes ($A_1, A_2, \dots$). The attribute values $d_{ij}$ are located in the body of the table and form rows and columns. The set of attribute values in one column forms one domain $D_i$. The set of attribute values in one row forms one tuple $K_j$. The relation R is formed by a set of ordered tuples:

$$R = \{K_j\},\quad j = 1, \dots, m; \qquad K_j = (d_{1j}, d_{2j}, \dots, d_{nj}),$$

where n is the number of domains of the relation, which determines the degree (dimension) of the relation;

j is the tuple number;

m is the total number of tuples in the relation, called the cardinality of the relation.

Fig. 9. Illustration of a relational table-relation
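The definition of a relation as a subset of a Cartesian product can be checked mechanically. The sketch below uses three small, invented domains; the names D1, D2, D3 and R and all their values are illustrative assumptions.

```python
from itertools import product

# Hypothetical domains D_1, D_2, D_3.
D1 = {"Ivanov", "Petrov"}
D2 = {101, 102}
D3 = {"math", "physics"}

# The Cartesian product holds every possible tuple: 2 * 2 * 2 = 8 of them.
cartesian = set(product(D1, D2, D3))

# A relation R is any subset of that product.
R = {
    ("Ivanov", 101, "math"),
    ("Petrov", 102, "physics"),
    ("Petrov", 101, "math"),
}

is_relation = R <= cartesian   # subset test
degree = len(next(iter(R)))    # n: number of domains (attributes)
cardinality = len(R)           # m: number of tuples
```

Because R is a set, it cannot contain two identical tuples and has no inherent row order, which matches the properties of a relational table.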

Domain. In the most general form, a domain is defined by specifying a base data type to which the elements of the domain belong and an arbitrary logical expression applied to an element of that data type. If evaluating this logical expression gives the result "true", the data element is an element of the domain.

The most correct intuitive interpretation of the concept of a domain is as the admissible potential set of values of a given type. For example, the domain "Names" in our example is defined on the base type of character strings, but its values can include only those strings that can represent a name (in particular, such strings cannot begin with a soft sign).

The semantic load of the concept of a domain should also be noted: data are considered comparable only when they belong to the same domain. In our example, the values of the domains "Pass numbers" and "Group numbers" are of integer type but are not comparable. Note that in most relational DBMS the concept of a domain is not used, although Oracle V.7 already supports it.
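The idea of a domain as "base type plus logical expression" can be sketched as follows. This is not how any particular DBMS implements domains; the Domain class, the comparable helper and the concrete predicates are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Domain:
    name: str
    base_type: type
    predicate: Callable[[object], bool]  # the "logical expression" of the domain

    def contains(self, value) -> bool:
        # A value belongs to the domain if it has the base type
        # and the logical expression evaluates to true for it.
        return isinstance(value, self.base_type) and self.predicate(value)

# Two hypothetical domains built on the same base type (integers).
pass_numbers = Domain("PassNumbers", int, lambda v: v > 0)
group_numbers = Domain("GroupNumbers", int, lambda v: 1 <= v <= 999)

def comparable(a: Domain, b: Domain) -> bool:
    """Values are comparable only when they come from the same domain."""
    return a.name == b.name
```

Here pass_numbers.contains(42) holds, while a negative number or the string "42" is rejected, and the two domains are not comparable even though both are integer-based.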

Relation schema, database schema. A relation schema is a named set of pairs (attribute name, domain name (or type name, if the concept of a domain is not supported)). The degree, or arity, of a relation schema is the cardinality of this set. The degree of the relation in the example is six, that is, it is 6-ary. If all attributes of a relation are defined on different domains, it makes sense to name the attributes after the corresponding domains (not forgetting, of course, that this is only a convenient way of naming and does not eliminate the difference between the concepts of domain and attribute). A database schema (in the structural sense) is a set of named relation schemas.

A list in which the names of the relational tables are given together with their attributes (keys are underlined) and the definitions of foreign keys is called the relational schema of the database. It is the preliminary result of the design stage of the relational database life cycle. Example:

Worker [Worker-ID, Name, Hourly-Rate, Skill-Type, Supv-ID]

Foreign keys: Skill-Type refers to Skill

Supv-ID refers to Worker

Assignment [Worker-ID, Bldg-ID, Start-Date, Number-of-Days]

Foreign keys: Worker-ID refers to Worker

Bldg-ID refers to Building

Building [Bldg-ID, Address, Type, Qlty-Level, Status]

Skill [Skill-Type, Bonus-Rate, Hours-Per-Week]
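A relational schema of this kind translates almost line by line into table definitions. The following is a hedged sketch with Python's built-in sqlite3 module. Hyphens are not allowed in unquoted SQL identifiers, so the names are written with underscores (Supv_ID, Bldg_ID, Building and so on). The column types and the choice of primary keys are assumptions, since the underlining of keys is not preserved in the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Skill (
        Skill_Type     TEXT PRIMARY KEY,
        Bonus_Rate     REAL,
        Hours_Per_Week INTEGER
    );
    CREATE TABLE Building (
        Bldg_ID    INTEGER PRIMARY KEY,
        Address    TEXT,
        Type       TEXT,
        Qlty_Level INTEGER,
        Status     INTEGER
    );
    CREATE TABLE Worker (
        Worker_ID   INTEGER PRIMARY KEY,
        Name        TEXT,
        Hourly_Rate REAL,
        Skill_Type  TEXT    REFERENCES Skill(Skill_Type),
        Supv_ID     INTEGER REFERENCES Worker(Worker_ID)  -- refers to the same table
    );
    CREATE TABLE Assignment (
        Worker_ID      INTEGER REFERENCES Worker(Worker_ID),
        Bldg_ID        INTEGER REFERENCES Building(Bldg_ID),
        Start_Date     TEXT,
        Number_of_Days INTEGER,
        PRIMARY KEY (Worker_ID, Bldg_ID)  -- assumed composite key
    );
""")

# Read the declared foreign keys back from the database catalog.
fk_targets = {
    table: sorted(row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})"))
    for table in ("Worker", "Assignment")
}
```

Reading the foreign keys back shows that Worker refers to Skill and to itself, and that Assignment refers to Worker and Building, as in the listing.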

Tuple, relation. A tuple corresponding to a given relation schema is a set of pairs (attribute name, value) that contains exactly one occurrence of each attribute name belonging to the relation schema. The "value" is an admissible value of the domain of that attribute (or of its data type, if the concept of a domain is not supported). Thus the degree, or arity, of a tuple, i.e. the number of elements in it, coincides with the arity of the corresponding relation schema. Simply put, a tuple is a set of named values of specified types.

A relation is a set of tuples corresponding to one relation schema. Sometimes, to avoid confusion, people speak of the "relation schema" and the "relation instance"; sometimes the relation schema is called the heading of the relation, and the relation as a set of tuples is called the body of the relation. In fact, the concept of a relation schema is closest to the concept of a structured data type in programming languages. It would be quite logical to allow a relation schema to be defined separately, and then one or more relations with that schema.

However, this is not customary in relational databases. The name of a relation schema in such databases always coincides with the name of the corresponding relation instance. In classical relational databases, once the database schema has been defined, only the relation instances change: new tuples may appear in them, and existing tuples may be deleted or modified. Many implementations, however, also allow the database schema to be changed by defining new relation schemas and altering existing ones. This is called evolution of the database schema.

The usual everyday representation of a relation is a table whose heading is the relation schema and whose rows are the tuples of the relation instance; the attribute names then name the columns of this table. That is why people sometimes say "column of a table" when they mean "attribute of a relation". When we move on to the practical issues of organizing relational databases and the means of managing them, we will use this everyday terminology. Most commercial relational DBMS adhere to it as well.

A relational database is a set of relations whose names coincide with the names of the relation schemas in the database schema.

As can be seen, the main structural concepts of the relational data model (except for the concept of a domain) have a very simple intuitive interpretation, although in the theory of relational databases they are all defined absolutely formally and precisely.

Key of a table-relation. Tuples must not be repeated within a table-relation and, accordingly, must have a unique identifier, the primary key. One or more attributes whose values unambiguously identify a table row are the key of the table.

A primary key is called simple when it consists of one attribute, or composite when it consists of several attributes. In addition to the primary key, a relation may also have secondary keys.

A secondary key is a key whose values may be repeated in different rows (tuples). There may be a group of rows with the same value of the secondary key.

A foreign (external) key is a set of attributes of one table that is the key of another (or the same) table. Foreign keys provide the important links between tables: they are used to associate data in one table with data in another. The attributes of a foreign key do not have to have the same names as the key attributes they correspond to.


With this article we begin a new series devoted to databases and to modern technologies for accessing and processing data. Over the course of this series we plan to consider the most popular desktop and server database management systems (DBMS), data access mechanisms (OLE DB, ADO, BDE, etc.) and utilities for working with databases (administration tools, report generators, tools for graphical presentation of data). In addition, we plan to pay attention to methods of publishing data on the Internet, as well as to such popular methods of processing and storing data as OLAP (On-Line Analytical Processing) and the creation of data warehouses (Data Warehouse).

In this article we will consider the basic concepts and principles underlying database management systems. We will discuss the relational data model, the concept of referential integrity and the principles of data normalization, as well as data design tools. Then we will describe what kinds of DBMS exist, which objects databases can contain, and how queries are made against these objects.

Basic concepts of relational databases

Let's start with the basic concepts of DBMS and a brief introduction to the theory of relational databases, the most popular way of storing data today.

Relational data model

The relational data model was proposed by E. F. Codd (Dr. E. F. Codd), a well-known database researcher, in 1969, when he was an IBM employee. The basic concepts of this model were first published in 1970 (A Relational Model of Data for Large Shared Data Banks, CACM, 1970, 13 N 6).

A relational database is a data store containing a set of two-dimensional tables. A set of tools for managing such a store is called a relational database management system (RDBMS). An RDBMS may contain utilities, applications, services, libraries, application development tools and other components.

Any table of a relational database consists of rows (also called records) and columns (also called fields). In this series we will use both pairs of terms.

Table rows contain information about the facts represented in the table (or documents, or people; in a word, about objects of the same type). At the intersection of a column and a row is a specific data value contained in the table.

Data in tables obey the following principles:

Each value at the intersection of a row and a column must be atomic (that is, not divisible into several values).
Data values in the same column must belong to the same type, one that is available for use in the given DBMS.
Each record in the table is unique, that is, no two records in the table have a fully coinciding set of field values.
Each field has a unique name.
The sequence of fields in the table is insignificant.
The sequence of records is also insignificant.

Although the rows of tables are considered unordered, any database management system allows the rows and columns in selections from it to be sorted in the way the user needs.

Since the sequence of columns in a table is insignificant, they are referred to by name, and these names are unique within the table (but need not be unique across the whole database).

So, now we know that relational databases consist of tables. To illustrate some theoretical points and to create examples, we need to choose a database. In order not to reinvent the wheel, we will use the NorthWind database included with Microsoft SQL Server and Microsoft Access.

Now let's look at the relationships between tables.

Keys and relationships

Let's take a look at a fragment of the Customers table from the NorthWind database (we have removed the fields that are not essential for illustrating relationships between tables).

Since the rows in a table are unordered, we need a column (or a set of several columns) to uniquely identify each row. Such a column (or set of columns) is called the primary key. The primary key of any table must contain unique non-empty values for each row.

If the primary key consists of more than one column, it is called a composite primary key.

A typical database usually consists of several related tables. Consider a fragment of the Orders table.

The CustomerID field of this table contains the identifier of the client who placed the order. If we need to find out the name of the company that placed the order, we must look for the same client identifier value in the CustomerID field of the Customers table and read the value of the CompanyName field in the row found. In other words, we need to link the two tables, Customers and Orders, on the CustomerID field. A column that points to a record in another table related to the given record is called a foreign key. As we can see, in the case of the Orders table the foreign key is the CustomerID column (Fig. 1).

In other words, a foreign key is a column or a set of columns whose values coincide with existing values of the primary key of another table.

Such a link between tables is called a relationship. A relationship between two tables is established by assigning values of the primary key of one table to the foreign key of the other.

If each client in the Customers table can place only one order, the two tables are said to be linked by a one-to-one relationship. If each client in the Customers table can place zero, one or many orders, the two tables are said to be linked by a one-to-many relationship, or a master-detail relationship. Such relationships between tables are the ones used most often. In this case the table containing the foreign key is called the detail table, and the table containing the primary key that determines the possible values of the foreign key is called the master table.
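The lookup of a company name through the CustomerID foreign key corresponds to a join between the master and detail tables. Below is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the text; the rows are invented and are not actual NorthWind contents, and the helper company_for_order is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (              -- master table
        CustomerID  TEXT PRIMARY KEY,
        CompanyName TEXT NOT NULL
    );
    CREATE TABLE Orders (                 -- detail table
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT REFERENCES Customers(CustomerID),  -- foreign key
        OrderDate  TEXT
    );
    INSERT INTO Customers VALUES ('C001', 'Example Trading');
    INSERT INTO Customers VALUES ('C002', 'Sample Foods');
    INSERT INTO Orders VALUES (1, 'C001', '1997-08-25');
    INSERT INTO Orders VALUES (2, 'C001', '1997-10-03');
    INSERT INTO Orders VALUES (3, 'C002', '1997-10-16');
""")

def company_for_order(order_id):
    """Follow the foreign key from an order to the company that placed it."""
    row = conn.execute("""
        SELECT c.CompanyName
        FROM Orders AS o
        JOIN Customers AS c ON c.CustomerID = o.CustomerID
        WHERE o.OrderID = ?
    """, (order_id,)).fetchone()
    return row[0] if row else None

# One-to-many: each customer may have zero, one or many orders.
orders_per_customer = conn.execute("""
    SELECT c.CustomerID, COUNT(o.OrderID)
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID
    ORDER BY c.CustomerID
""").fetchall()
```

The LEFT JOIN keeps customers that have no orders at all, which is the "zero" case of the one-to-many relationship.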

A group of related tables is called a database schema. Information about tables, their columns (names, data types, field lengths), primary and foreign keys, and other database objects is called metadata.

Any manipulation of data in a database, such as selecting, inserting, deleting or updating data, or changing or selecting metadata, is called a query to the database. Queries are usually formulated in a language that may be either standard across different DBMS or specific to a particular DBMS.

Referential integrity

We have already said above that the primary key of any table must contain unique non-empty values for that table. This statement is one of the rules of referential integrity. Some (but not all) DBMS can enforce the uniqueness of primary keys. If the DBMS enforces the uniqueness of primary keys, then on an attempt to assign to a primary key a value that already exists in another record, the DBMS generates a diagnostic message that usually contains the phrase "primary key violation". This message can then be passed on to the application through which the end user manipulates the data.

If two tables are linked by a master-detail relationship, the foreign key of the detail table must contain only values that already exist among the primary key values of the master table. If the correctness of foreign key values is not enforced by the DBMS, referential integrity can be violated. In that case, if we delete from the Customers table a record that has at least one related detail record in the Orders table, the Orders table will end up containing records of orders placed by nobody knows whom. If the DBMS enforces the correctness of foreign key values, then on an attempt to assign to a foreign key a value that is absent from the primary key values of the master table, or to delete or modify master table records in a way that would violate referential integrity, the DBMS generates a diagnostic message that usually contains the phrase "foreign key violation", which can then be passed on to the user application.

Most modern DBMS, such as Microsoft Access 97, Microsoft Access 2000 and Microsoft SQL Server 7.0, can enforce referential integrity rules if such rules are described in the database. For this purpose these DBMS use various database objects (we will discuss them a little later). In this case all attempts to violate referential integrity rules are rejected, and diagnostic messages or exceptions (database exceptions) are generated at the same time.
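The behavior described above can be observed with any DBMS that enforces these rules. As a hedged sketch, the example below uses SQLite through Python's built-in sqlite3 module rather than the products named in the text. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and its diagnostic messages are worded differently from the phrases quoted above. The helper try_sql and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID  TEXT PRIMARY KEY,
        CompanyName TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT NOT NULL REFERENCES Customers(CustomerID)
    );
    INSERT INTO Customers VALUES ('C001', 'Example Trading');
    INSERT INTO Orders VALUES (1, 'C001');
""")

def try_sql(sql):
    """Run one statement; report whether the DBMS accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    # Duplicate primary key value.
    try_sql("INSERT INTO Customers VALUES ('C001', 'Duplicate')"),
    # Foreign key value missing from the master table.
    try_sql("INSERT INTO Orders VALUES (2, 'NOPE')"),
    # Deleting a master record that still has detail records.
    try_sql("DELETE FROM Customers WHERE CustomerID = 'C001'"),
    # A valid detail record is accepted.
    try_sql("INSERT INTO Orders VALUES (3, 'C001')"),
]
```

The first three statements are rejected with an exception that the application can catch and show to the user; only the last one succeeds.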

Introduction to data normalization

Data design is the process of defining metadata in accordance with the tasks of the information system in which the future database will be used. Details of how to analyze the subject area and create entity-relationship diagrams (ERD, Entity-Relationship Diagrams) and data models are beyond the scope of this series. Those interested in these questions can refer, for example, to the book by C. J. Date, "An Introduction to Database Systems" ("Dialektika", Kiev, 1998).

In this article we will discuss only one of the basic principles of data design, the principle of normalization.

Normalization is the process of reorganizing data by eliminating repeating groups and other inconsistencies in data storage, in order to bring tables to a form that allows consistent and correct editing of data.

Normalization theory is based on the concept of normal forms. A table is said to be in a given normal form if it satisfies a certain set of requirements. In theory there are five normal forms, but in practice only the first three are usually used. Moreover, the first two normal forms are essentially intermediate steps toward bringing the database to third normal form.

First normal form

We will illustrate the normalization process with an example that uses data from the NorthWind database. Suppose we record all ordered products in the following table. The structure of this table is shown in Fig. 2.

For a table to be in first normal form, all values of its fields must be atomic and all records must be unique. Therefore any relational table, including the OrderedProducts table, is by definition already in first normal form.

Nevertheless, this table contains redundant data; for example, the same client information is repeated in the record for each ordered product. Data redundancy results in data modification anomalies: problems that arise when records are added, changed or deleted. For example, when editing data in the OrderedProducts table, the following problems may occur:

The address of a particular client can be stored in the database only if the client has ordered at least one product.
Deleting the record of an ordered product also deletes the information about the order itself and about the client who placed it.
If, God forbid, the client has changed address, all the records of products ordered by that client have to be updated.

Some of these problems can be solved by bringing the database to second normal form.

Second normal form

A relational table is said to be in second normal form if it is in first normal form and its non-key fields depend fully on the entire primary key.

The OrderedProducts table is in first but not in second normal form, because the CustomerID, Address and OrderDate fields depend only on the OrderID field, which is part of the composite primary key (OrderID, ProductID).

To move from first normal form to second, the following steps must be performed:

Determine into which parts the primary key can be split so that some of the non-key fields depend on one of these parts (these parts need not consist of a single column!).
Create a new table for each such part of the key and the group of fields that depend on it, and move them to this table. The part of the former primary key becomes the primary key of the new table.
Remove from the source table the fields moved to other tables, except those that will become foreign keys.

For example, to bring the OrderedProducts table to second normal form, the CustomerID, Address and OrderDate fields must be moved to a new table (let's call it OrdersInfo), with the OrderID field becoming the primary key of the new table (Fig. 3).

As a result, the new tables take this form. However, tables that are in second but not in third normal form still contain data modification anomalies. Here is what they are, for example, for the OrdersInfo table:

The address of a particular client can still be stored in the database only if the client has ordered at least one product.
Deleting the record of an order in the OrdersInfo table also deletes the record of the client.
If the client has changed address, several records have to be updated (although usually fewer than in the previous case).

These anomalies can be eliminated by moving to third normal form.

Third normal form

A relational table is said to be in third normal form if it is in second normal form and all of its non-key fields depend only on the primary key.

The OrderDetails table is already in third normal form: the non-key Quantity field depends fully on the composite primary key (OrderID, ProductID). The OrdersInfo table, however, is not in third normal form, because it contains a dependency between non-key fields (called a transitive dependency): the Address field depends on the CustomerID field.

To move from second normal form to third, the following steps must be performed:

Determine all fields (or groups of fields) on which other fields depend.
Create a new table for each such field (or group of fields) and the group of fields that depend on it, and move them to this table. The field (or group of fields) on which all the other moved fields depend becomes the primary key of the new table.
Remove the moved fields from the source table, leaving only those that will become foreign keys.

To bring the OrdersInfo table to third normal form, we create a new Customers table and move the CustomerID and Address fields into it. The Address field is removed from the source table, while the CustomerID field is kept, since it is now a foreign key (Fig. 4).

So, after the source table has been brought to third normal form, there are three tables: Customers, Orders and OrderDetails.
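The whole path from the flat table to third normal form can be retraced on a handful of rows. The sketch below is plain Python: each table is modeled as a dictionary keyed by its primary key. The field list of OrderedProducts follows the text, while the sample values are invented.

```python
# Flat OrderedProducts rows:
# (OrderID, ProductID, Quantity, CustomerID, Address, OrderDate)
ordered_products = [
    (1, 11, 5, "C001", "1 Main St", "1997-08-25"),
    (1, 42, 2, "C001", "1 Main St", "1997-08-25"),
    (2, 11, 1, "C001", "1 Main St", "1997-10-03"),
    (3, 72, 9, "C002", "9 Side Rd", "1997-10-16"),
]

# 2NF step: fields that depend only on OrderID move to OrdersInfo;
# Quantity stays with the full composite key (OrderID, ProductID).
order_details = {(o, p): qty for o, p, qty, _, _, _ in ordered_products}
orders_info = {o: (c, addr, date) for o, _, _, c, addr, date in ordered_products}

# 3NF step: Address depends on CustomerID (a transitive dependency),
# so it moves to Customers; CustomerID stays in Orders as a foreign key.
customers = {c: addr for c, addr, _ in orders_info.values()}
orders = {o: (c, date) for o, (c, _, date) in orders_info.items()}

# How many times is the address of customer C001 stored?
address_copies_before = sum(1 for row in ordered_products if row[3] == "C001")
address_copies_after = sum(1 for c in customers if c == "C001")
```

In this sample the address of customer C001 is stored three times in the flat table and once after normalization, so a change of address touches a single record.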

Advantages of normalization

Normalization eliminates data redundancy, which reduces the amount of stored data and gets rid of the data modification anomalies described above. For example, once the database considered above has been brought to third normal form, the following improvements are evident:

Information about a client's address can be stored in the database even if this is only a potential client who has not yet placed a single order.
Information about an ordered product can be deleted without fear of deleting the data about the client and the order.
Changing a client's address or the registration date of an order now requires changing only one record.

How databases are designed

Modern DBMS usually contain tools for creating tables and keys. There are also utilities supplied separately from the DBMS (and even serving several different DBMS at once) that allow tables, keys and relationships to be created.

Another way to create tables, keys and relationships in a database is to write a so-called DDL script (DDL stands for Data Definition Language; we will talk about it a little later).

Finally, there is one more way, which is becoming more and more popular: the use of special tools called CASE tools (CASE stands for Computer-Aided System Engineering). There are several types of CASE tools, but for creating databases the ones used most often are tools for creating entity-relationship diagrams (E/R diagrams). With these tools a so-called logical data model is created, which describes the facts and objects to be recorded in the database (in such models the prototypes of tables are called entities, and the prototypes of fields are called their attributes). After the links between entities have been established, the attributes defined and normalization carried out, a so-called physical data model is created for a specific DBMS, in which all tables, fields and other database objects are defined. After that, either the database itself or a DDL script for creating it can be generated.

List of the most popular CASE tools.

Tables and fields

All relational DBMSs support tables, whose fields can store data of different types. The most common data types.

Indexes

A little earlier we talked about the role of primary and foreign keys. In most relational DBMSs, keys are implemented using objects called indexes, which can be defined as a list of record numbers indicating the order in which the records should be presented.

We already know that records in relational tables are unordered. Nevertheless, at any given moment each record has a definite physical location in the database file, although this location can change as data is edited or as a result of the "internal activities" of the DBMS itself.

Suppose that at some moment the records in the Customers table were stored in a particular physical order.

Suppose we need to get this data ordered by the CustomerID field. Leaving out the technical details, we can say that the index on this field is the sequence of record numbers in which the records must be output, that is:

1,6,4,2,5,3

If we want to order the records by the Address field, the sequence of record numbers will be different:

5,4,1,6,2,3

Storing indexes requires significantly less space than storing differently sorted versions of the table itself.

If we need to find the customers whose CustomerID begins with the characters "Bo", we can use the index to find the location of these records (in this case 2 and 5; these records obviously appear next to each other in the index) and then read just the second and fifth records instead of scanning the entire table. Thus, the use of indexes reduces the time needed to select data.
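The idea of an index as an ordered list of record numbers can be sketched in a few lines of Python. The CustomerID values below are hypothetical (the article's table is not reproduced here); they were chosen only so that the resulting order matches the sequence 1,6,4,2,5,3 given above. A real DBMS would typically use a B-tree or a similar structure rather than a sorted list.

```python
from bisect import bisect_left

# Hypothetical physical layout: record number -> CustomerID.
records = {1: "ALFKI", 2: "BOLID", 3: "BSBEV", 4: "BERGS", 5: "BONAP", 6: "ANTON"}

# The index keeps (key, record number) pairs sorted by key.
index = sorted((customer_id, rec_no) for rec_no, customer_id in records.items())
record_order = [rec_no for _, rec_no in index]

def find_by_prefix(idx, prefix):
    """Return record numbers whose key starts with prefix, using binary search."""
    pos = bisect_left(idx, (prefix,))  # first entry whose key is >= prefix
    found = []
    while pos < len(idx) and idx[pos][0].startswith(prefix):
        found.append(idx[pos][1])
        pos += 1
    return found

print(record_order)                 # [1, 6, 4, 2, 5, 3]
print(find_by_prefix(index, "BO"))  # [2, 5]
```

Because matching keys are adjacent in the index, the search stops as soon as the prefix no longer matches, and only the listed records need to be read from the table.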

We have already mentioned that the physical location of records can change as users edit the data, and also as a result of manipulations with database files performed by the DBMS itself (for example, data compression, garbage collection and so on). If the index is changed accordingly at the same time, it is called a maintained index, and such indexes are used in most modern DBMSs. Maintaining such indexes means that any change to the data in a table entails a change to its associated indexes, which increases the time the DBMS needs to perform such operations. Therefore, when using such a DBMS, you should create only those indexes that are really necessary, guided by which queries will be run most often.

Constraints and rules

Most modern server DBMSs contain special objects called constraints (Constraints) or rules (Rules). These objects contain information about the restrictions imposed on the possible values of fields. For example, with such an object you can set the maximum or minimum value for a field, after which the DBMS will not allow a record that violates this condition to be saved in the database.

In addition to constraints that limit the range of data values, there are also referential constraints. For example, the master-detail relationship between the Customers and Orders tables can be implemented as a constraint requiring that the value of the CustomerID field (a foreign key) in the Orders table be equal to one of the existing values of the CustomerID field in the Customers table.

Note that not all DBMSs support constraints. In that case, to implement similar functionality you can either use other objects (for example, triggers) or keep these rules in the client applications that work with the database.
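Both kinds of constraint can be tried out with Python's sqlite3 module. The sketch below is illustrative only: the Quantity column, its 1 to 100 range and the helper try_insert are assumptions, and SQLite stands in for a server DBMS simply because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # referential checks are off by default in SQLite
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    -- referential constraint: the value must exist in Customers
    CustomerID TEXT NOT NULL REFERENCES Customers(CustomerID),
    -- range constraint: assumed limits, for illustration only
    Quantity   INTEGER NOT NULL CHECK (Quantity BETWEEN 1 AND 100)
);
INSERT INTO Customers VALUES ('ALFKI');
""")

def try_insert(order_id, customer_id, quantity):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO Orders VALUES (?, ?, ?)",
                         (order_id, customer_id, quantity))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, "ALFKI", 5))    # True
print(try_insert(2, "ALFKI", 500))  # False: violates the CHECK range
print(try_insert(3, "NOONE", 5))    # False: no such customer
```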

Views

Almost all relational DBMSs support views (Views). A view is a virtual table that presents data from one or more real tables. It does not actually contain any data; it only describes where the data comes from.

Such objects are often created to store complex queries in the database. In effect, a view is a stored query.

In most modern DBMSs, views are created with special visual tools that let you display the required tables on the screen, establish relationships between them, select the fields to be shown, impose restrictions on records, and so on.

These objects are often used to ensure data security, for example by allowing data to be viewed through them without granting direct access to the tables. In addition, some views can return different data depending on, for example, the user name, which lets each user receive only his or her own data.
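That a view stores a query rather than data can be seen in a small sqlite3 sketch. The view name OrderCounts, the OrderDate column and the sample rows are hypothetical and serve only the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Address TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, OrderDate TEXT);
INSERT INTO Customers VALUES ('ALFKI', 'Address A'), ('BONAP', 'Address B');
INSERT INTO Orders VALUES (1, 'ALFKI', '2000-03-01'),
                          (2, 'ALFKI', '2000-03-05'),
                          (3, 'BONAP', '2000-03-07');

-- The view holds only the query text, not the rows it returns.
CREATE VIEW OrderCounts AS
    SELECT c.CustomerID, COUNT(o.OrderID) AS OrderCount
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID;
""")

query = "SELECT CustomerID, OrderCount FROM OrderCounts ORDER BY CustomerID"
before = conn.execute(query).fetchall()

# A change in the underlying table is visible through the view immediately.
conn.execute("INSERT INTO Orders VALUES (4, 'BONAP', '2000-03-09')")
after = conn.execute(query).fetchall()

print(before)  # [('ALFKI', 2), ('BONAP', 1)]
print(after)   # [('ALFKI', 2), ('BONAP', 2)]
```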

Triggers and stored procedures

Triggers and stored procedures, which are supported by most modern server DBMSs, are used to store executable code.

A stored procedure is a special kind of procedure that is executed by the database server. Stored procedures are written in a procedural language that depends on the specific DBMS. They can call each other, read and change data in tables, and be called from a client application that works with the database.

Stored procedures are usually used for frequently performed tasks (for example, reconciling an accounting balance). They can have arguments and can return values, error codes and sometimes sets of rows and columns (such a data set is sometimes called a dataset). However, procedures of this last kind are not supported by all DBMSs.

Triggers also contain executable code but, unlike procedures, cannot be called from a client application or a stored procedure. A trigger is always associated with a specific table and runs when the event it is bound to (for example, inserting, deleting or updating a record) occurs while this table is being edited.

In most DBMSs that support triggers, you can define several triggers that run when the same event occurs and specify the order of their execution.
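A trigger bound to an update event might look as follows in SQLite, which is used here only because it ships with Python; it supports triggers but not stored procedures. The AddressLog table and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Address TEXT);
CREATE TABLE AddressLog (CustomerID TEXT, OldAddress TEXT, NewAddress TEXT);

-- Runs automatically for every row whose Address is updated;
-- the client application never calls it directly.
CREATE TRIGGER LogAddressChange
AFTER UPDATE OF Address ON Customers
FOR EACH ROW
BEGIN
    INSERT INTO AddressLog VALUES (OLD.CustomerID, OLD.Address, NEW.Address);
END;

INSERT INTO Customers VALUES ('ALFKI', 'Address A');
""")

conn.execute("UPDATE Customers SET Address = 'Address B' WHERE CustomerID = 'ALFKI'")
log = conn.execute("SELECT * FROM AddressLog").fetchall()
print(log)  # [('ALFKI', 'Address A', 'Address B')]
```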

Objects for generating primary keys

Very often primary keys are generated by the DBMS itself. This is more convenient than generating them in the client application, because in multi-user operation, key generation by the DBMS is the only way to avoid duplicate keys and to obtain consecutive key values.

Different DBMSs use different objects to generate keys. Some of these objects store an integer together with the rules by which the next value is generated; the generation itself is usually performed with the help of triggers. Such objects are supported, for example, in Oracle (where they are called sequences, Sequences) and in IB Database (where they are called generators, Generators).

Some DBMSs support special field types for primary keys. When records are added, such fields are filled automatically with consecutive values (usually integers). In Microsoft Access and Microsoft SQL Server such fields are called identity fields (Identity Fields), and in Corel Paradox they are called autoincrement fields (Autoincrement Fields).
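As an illustration of DBMS-generated keys, SQLite offers an autoincrementing integer primary key; it stands in here for the identity and autoincrement fields mentioned above. The helper add_order and the sample values are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY AUTOINCREMENT,  -- value assigned by the DBMS
    CustomerID TEXT NOT NULL
)
""")

def add_order(customer_id):
    """Insert an order without supplying a key and return the generated key."""
    cur = conn.execute("INSERT INTO Orders (CustomerID) VALUES (?)", (customer_id,))
    return cur.lastrowid

ids = [add_order("ALFKI"), add_order("BONAP"), add_order("ALFKI")]
print(ids)  # [1, 2, 3]
```

The client never chooses a key value itself, so two clients inserting at the same time cannot pick the same one.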

Users and roles

Preventing unauthorized access to data is a serious problem that is solved in different ways. The simplest is password protection of either an entire table or some of its fields (such a mechanism is supported, for example, in Corel Paradox).

Currently another method of data protection is more popular: creating a list of users (Users) with names and passwords (Passwords). In this case every database object belongs to a specific user, and this user grants other users permission to read or modify the data in this object or to modify the object itself. This method is used in all server DBMSs and in some desktop ones (for example, Microsoft Access).

Some DBMSs, mostly server ones, support not only a list of users but also roles (Roles). A role is a set of privileges. When a specific user is given one or more roles, the user also receives all the privileges defined for those roles.

Database queries

Modifying and selecting data, changing metadata and some other operations are carried out using queries (Query). Most modern DBMSs (and some application development tools) include tools for generating such queries.

One way of manipulating data is called "query by example" (QBE). QBE is a tool for visually linking tables and selecting the fields to be displayed in the query result.

In most DBMSs (with the exception of some desktop ones), building a query visually with QBE generates the text of the query in a special query language, SQL (Structured Query Language). You can also write a query directly in SQL.

Cursors

Often the result of a query is a set of rows and columns (a dataset). Unlike in a relational table, the rows in such a set are ordered, and their order is determined by the source query (and sometimes by the presence of indexes). Therefore we can define the current row in such a set and a pointer to it, which is called a cursor (Cursor).

Most modern DBMSs support so-called bidirectional cursors (Bi-Directional Cursors), which allow you to move through the resulting data set both forward and backward. However, some DBMSs support only unidirectional cursors, which allow you to move through the data set only forward.
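A unidirectional cursor can be observed with Python's sqlite3 module, whose cursor objects only move forward through the result set. The table contents below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Customers VALUES (?)",
                 [("BONAP",), ("ALFKI",), ("BERGS",)])

# The ORDER BY clause of the query defines the order of rows in the result set.
cursor = conn.execute("SELECT CustomerID FROM Customers ORDER BY CustomerID")

first = cursor.fetchone()   # the cursor now points past the first row
second = cursor.fetchone()  # each fetch advances the current position
rest = cursor.fetchall()    # everything that is left
end = cursor.fetchone()     # None: the set is exhausted and cannot be rewound

print(first, second, rest, end)  # ('ALFKI',) ('BERGS',) [('BONAP',)] None
```

To see the rows again with such a cursor, the query has to be executed once more.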

SQL language

Structured Query Language (SQL) is a non-procedural language used to formulate database queries in most modern DBMSs, and it is currently an industry standard.

That the language is non-procedural means that you can specify what needs to be done with the database, but you cannot describe the algorithm for doing it. All algorithms for processing SQL statements are generated by the DBMS itself and do not depend on the user. The SQL language consists of a set of statements that can be divided into several categories:

Data Definition Language (DDL): statements that allow you to create, delete and change objects in databases
Data Manipulation Language (DML): statements that allow you to modify, add and delete data in existing database objects
Data Control Language (DCL): statements used to manage user privileges
Transaction Control Language (TCL): statements for managing changes made by groups of statements
Cursor Control Language (CCL): statements for defining a cursor, preparing SQL statements for execution and some other operations

We will discuss the SQL language in more detail in one of the following articles in this series.

User-defined functions

Some DBMSs allow the use of user-defined functions (UDF, User-Defined Functions). These functions are usually stored in external libraries and must be registered in the database, after which they can be used in queries, triggers and stored procedures.

Since user-defined functions are contained in libraries, they can be created with any development tool that can build libraries for the platform on which the DBMS runs.
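The register-then-use pattern can be sketched with sqlite3, with one difference from the description above: here the function lives in the host Python program rather than in an external library. The function initials and the Contacts table are hypothetical.

```python
import sqlite3

def initials(full_name):
    """Hypothetical user-defined function: 'Maria Anders' -> 'MA'."""
    return "".join(part[0].upper() for part in full_name.split())

conn = sqlite3.connect(":memory:")
# Register the function under an SQL name; 1 is the number of arguments.
conn.create_function("initials", 1, initials)

conn.execute("CREATE TABLE Contacts (Name TEXT)")
conn.executemany("INSERT INTO Contacts VALUES (?)",
                 [("Maria Anders",), ("Ana Trujillo",)])

# Once registered, the function can be called from a query like a built-in one.
result = conn.execute(
    "SELECT Name, initials(Name) FROM Contacts ORDER BY Name"
).fetchall()
print(result)  # [('Ana Trujillo', 'AT'), ('Maria Anders', 'MA')]
```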

Transactions

A transaction (Transaction) is a group of data operations that are either all carried out together or all canceled together.

Committing (Commit) a transaction means that all operations included in the transaction have completed successfully and their results have been saved in the database.

Rolling back (Rollback) a transaction means that all operations of the transaction that have already been executed are canceled and all database objects affected by these operations are returned to their original state. To make rollback possible, many DBMSs write to log files, which allow the original data to be restored during a rollback.
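Commit and rollback can be demonstrated with a small sqlite3 sketch. The Accounts table, the non-negative balance rule and the transfer helper are assumptions chosen so that the second operation of a transaction can fail after the first has already been executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Accounts (
    Name    TEXT PRIMARY KEY,
    Balance INTEGER NOT NULL CHECK (Balance >= 0)
)
""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Accounts SET Balance = Balance + ? WHERE Name = ?",
                         (amount, dst))
            conn.execute("UPDATE Accounts SET Balance = Balance - ? WHERE Name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT Name, Balance FROM Accounts"))

ok = transfer("A", "B", 30)       # both updates are committed
failed = transfer("A", "B", 500)  # the credit to B is rolled back with the failed debit
print(ok, failed, balances())     # True False {'A': 70, 'B': 80}
```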

A transaction may consist of several nested transactions.

Some DBMSs support two-phase commit (Two-Phase Commit), a process that allows transactions to be carried out over several databases belonging to the same DBMS.

To support distributed transactions (that is, transactions over databases managed by different DBMSs), there are special tools called transaction monitors (Transaction Monitors).

Conclusion

In this article we discussed the main concepts behind relational DBMSs and the basic principles of data design, and described the objects that can be created in databases.

In the next article we will introduce our readers to the most popular desktop DBMSs (dBase, Paradox, Access, Visual FoxPro, Works) and discuss their main features.

ComputerPress 3 "2000


Lecture: Databases. Chapter 2. Relational databases

2.1. Terms and definitions

The development of relational databases began in the late 1960s, when the first papers appeared that discussed the possibility of using tables, a formalized way of presenting data that was already familiar to specialists. Some experts called this way of presenting information decision tables, others called it tabular algorithms. Theorists of relational databases called the tabular way of presenting information datalogical models.

The founder of the theory of relational databases is Dr. E. F. Codd, an employee of IBM, who on June 6, 1970 published the article "A Relational Model of Data for Large Shared Data Banks". This article was the first to use the term "relational data model", which marked the beginning of relational databases.

The theory of relational databases, developed in the 1970s in the United States by Dr. E. F. Codd, relied on the mathematical apparatus of set theory. He proved that any data set can be represented in the form of two-dimensional tables of a special kind, known in mathematics as relations. The name "relational data model" comes from the English word "relation".

Currently the theoretical basis for database design is the mathematical apparatus of relational algebra (see subsection 1.2).

Thus, a relational database is information (data) about objects, represented in the form of two-dimensional arrays (tables) that are united by certain relationships. A database may consist of a single table.

Before proceeding to further study of relational databases, let us consider the terms and definitions used in theory and practice.

A database table is a two-dimensional array containing information about one class of objects. In the theory of relational algebra, a two-dimensional array (table) is called a relation.
A table consists of the following elements: field, cell, record (Fig. 2.1).

A field contains the values of one of the attributes that characterize the database objects. The number of fields in a table corresponds to the number of attributes that characterize the database objects.

A cell contains the specific value of the corresponding field (an attribute of one object).

A record is a row of the table. It contains the values of all the attributes that characterize one object. The number of records (rows) corresponds to the number of objects whose data is contained in the table. In database theory the term record corresponds to the concept of a tuple: a sequence of attributes connected to each other by the AND relation. In graph theory a tuple means a simple branch of a directed graph (a tree).

Table 2.1 shows the terms used in the theory and practice of developing relational databases.

One of the important concepts needed to build an optimal structure of relational databases is the concept of a key, or key field. A key is a field whose values uniquely determine the values of all the other fields in the table. For example, the "passport number" field or the "taxpayer identification number (INN)" field uniquely determines the characteristics of any individual (when the corresponding database tables are drawn up for personnel departments or the accounting office of an enterprise).

A key may consist of not one but several fields. In this case, a set of fields can be a candidate key only when two time-independent conditions are satisfied: uniqueness and minimality. Every field that is not part of the primary key is called a non-key field of the table.

Uniqueness of the key means that at no moment can the database table contain two different records with the same values of the key fields. The uniqueness condition is mandatory.

Minimality of the key fields means that only the combination of values of the selected fields meets the uniqueness requirement for the records of the database table. It also means that none of the fields in the key can be removed from it without violating uniqueness.

When forming a table key that consists of several fields, follow these guidelines:

do not include in the key any field whose values by themselves uniquely identify the records in the table. For example, do not create a key containing both "passport number" and "taxpayer identification number", since each of these attributes can uniquely identify the records in the table;
do not include a non-unique field in the key, i.e., a field whose values can repeat in the table.

Each table must have at least one candidate key, which is selected as the primary key. If the table has several fields each of which uniquely identifies the records, these fields can be taken as alternate keys. For example, if the taxpayer identification number is selected as the primary key, the passport number will be an alternate key.

2.2. Normalization of relational database tables

A relational database is a set of interconnected tables. The number of tables in one file or one database depends on many factors, the main ones being: the composition of the database users; ensuring the integrity of the information (especially important in multi-user information systems); and ensuring the smallest amount of required memory and the minimum data processing time.

These factors are taken into account in the design of relational databases by normalizing tables and establishing relationships between them.

Table normalization is a method of splitting one database table into several tables that, taken together, meet the requirements listed above. Normalizing a table means successively changing its structure until it satisfies the requirements of the last normal form. There are six normal forms in total:

first normal form;
second normal form;
third normal form;
Boyce-Codd normal form;
fourth normal form;
fifth normal form.

The following concepts are used in describing normal forms: "functional dependence between fields"; "full functional dependence between fields"; "multivalued functional dependence between fields"; "transitive functional dependence between fields"; "mutual independence of fields".

A functional dependence between fields A and B is a dependence in which each value of A at any moment corresponds to exactly one of all the possible values of B. An example of a functional dependence is the relationship between a taxpayer identification number and the number of the taxpayer's passport.

A full functional dependence between a composite field A and a field B is a dependence in which field B depends functionally on field A and does not depend functionally on any subset of field A.

A multivalued functional dependence between fields is defined as follows. Field A multidetermines field B if for each value of field A there is a "well-defined set" of corresponding values of field B. For example, if we consider a table of pupils' performance at school that includes the fields "Subject" (field A) and "Grade" (field B), then field B has a "well-defined set" of permissible values: 1, 2, 3, 4, 5; that is, for each value of the "Subject" field there is a multivalued "well-defined set" of values of the "Grade" field.

A transitive functional dependence between fields A and C exists if field C depends functionally on field B, and field B depends functionally on field A, while field A does not depend functionally on field B.

Mutual independence of fields is defined as follows. Several fields are mutually independent if none of them is functionally dependent on another.

First normal form. A table is in first normal form if and only if none of its fields contains more than one value and no key field is empty.
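Before the normal forms are examined in detail, the definition of functional dependence given above can be tried out on sample data with a short Python sketch. The function name, the field names and the rows are hypothetical. Note that a sample of rows can only refute a dependence; it cannot prove that the dependence holds for all possible data.

```python
def functionally_determines(rows, a_fields, b_fields):
    """Return True if equal values of a_fields always go with equal values of b_fields."""
    seen = {}
    for row in rows:
        a_value = tuple(row[f] for f in a_fields)
        b_value = tuple(row[f] for f in b_fields)
        # Remember the first B seen for this A; any different B breaks the dependence.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True

# Hypothetical sample: StudentID -> Group and Group -> Faculty,
# so Faculty depends on StudentID transitively.
rows = [
    {"StudentID": 1, "Group": "G-11", "Faculty": "Physics"},
    {"StudentID": 2, "Group": "G-11", "Faculty": "Physics"},
    {"StudentID": 3, "Group": "G-12", "Faculty": "Mathematics"},
]

print(functionally_determines(rows, ["StudentID"], ["Group"]))  # True
print(functionally_determines(rows, ["Group"], ["Faculty"]))    # True
print(functionally_determines(rows, ["Group"], ["StudentID"]))  # False
```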
First normal form is the basis of the relational model. Any table in a relational database is automatically in first normal form; by definition nothing else is possible. Such a table must not contain fields (attributes) that could be divided into several fields (attributes).

As a rule, unnormalized tables are tables that were not originally intended for computer processing of the information they contain. For example, Table 2.2 shows a fragment of a table from the reference book "Universal metal-cutting machines", published by the Experimental Scientific Research Institute of Metal-Cutting Machine Tools (ENIMS). This table is unnormalized for the following reasons.

1. It contains rows that have several values of one field in a single cell: "Largest machining diameter, mm" and "Spindle speed, rpm".
2. One field, "Overall dimensions (length x width x height), mm", can be divided into three fields: "Length, mm", "Width, mm" and "Height, mm". Such a division may be justified by the need for subsequent calculations of occupied areas or volumes.

The source table must be converted to first normal form. To do this:

divide the fields "Largest machining diameter, mm" and "Spindle speed, rpm" into several fields according to the number of values contained in one cell;
divide the field "Overall dimensions (length x width x height), mm" into three fields: "Length, mm", "Width, mm" and "Height, mm".

The key field of this table may be the field "Machine model" or "Item No." The table reduced to first normal form is shown in Table 2.3.

Consider another example. Fig. 2.2 shows a fragment of the form of an examination record sheet which, as in the previous example, was not originally intended for computer processing. Suppose we want to create a database for automated processing of the results of the test and examination session in accordance with the content of the examination record sheet. To do this, we convert the contents of the form into database tables. Given the need to satisfy the conditions of functional dependence between fields, at least two tables must be formed (Fig. 2.3; the key fields in each table are highlighted in bold). The first table contains the results of the test (exam) taken by each student in a specific subject. The second table contains the summary results of the test (exam) taken by a specific group of students in a specific subject. In the first table the key is the "Student's full name" field, and in the second table it is the "Discipline" field. The tables must be linked by the fields "Discipline" and "Group code".

The table structures presented fully meet the requirements of first normal form, but have the following disadvantages:

adding new data to a table requires entering values for all fields;
in every row of each table, repeated values of the fields "Discipline", "Teacher's full name" and "Group code" have to be entered.

Consequently, with this composition and structure of tables there is obvious redundancy of information, which naturally requires additional memory. To avoid these shortcomings, the tables must be brought to second or third normal form.

Second normal form. A table is in second normal form if it meets the requirements of first normal form and all of its fields that are not part of the primary key are connected with the primary key by a full functional dependence.

If a table has a simple primary key consisting of only one field, it is automatically in second normal form.

If the primary key is composite, the table is not necessarily in second normal form. In that case it must be divided into two or more tables so that the primary key uniquely identifies the value in any field. If the table has at least one field that does not depend on the primary key, additional columns must be included in the primary key. If there are no such columns, a new column must be added.

Based on these conditions that define second normal form, the following conclusions can be drawn about the tables that were composed (see Fig. 2.3). In the first table there is no direct relationship between the key field and the "Teacher's full name" field, since different teachers can conduct the test or exam in a subject. In the table there is a full functional dependence only between all the other fields and the key field "Discipline". Similarly, in the second table there is no direct relationship between the key field and the "Teacher's full name" field.

To optimize the database, in particular to reduce the amount of memory required because the values of the "Discipline" and "Teacher's full name" fields have to be repeated in every record, the database structure must be changed: the source tables must be converted to second normal form. The composition of the tables in the modified database structure is shown in Fig. 2.4. The transformed database structure consists of six tables, two of which are interconnected (the key fields in each table are highlighted in bold). All the tables satisfy the requirements of second normal form. The fifth and sixth tables have repeated values in their fields but, since these values are integers rather than text data, the total amount of memory required to store the information is significantly smaller than in the source tables (see Fig. 2.1). In addition, the new database structure makes it possible for the tables to be filled in by different specialists (divisions of the management services).
Further optimization of the database tables comes down to bringing them to third normal form.

Third normal form. A table is in third normal form if it satisfies the definition of second normal form and none of its non-key fields depends functionally on any other non-key field.

One can also say that a table is in third normal form if it is in second normal form and no non-key field depends transitively on the primary key. The requirement of third normal form comes down to ensuring that all non-key fields depend only on the primary key and do not depend on each other. According to these requirements, among the database tables (see Fig. 2.3) the first, second, third and fourth tables are in third normal form. To bring the fifth and sixth tables to third normal form, we create a new table containing information about the set of subjects in which exams or tests are held in the student groups. As the key we create a "Counter" field, that is, the number of the record in the table, since each record must be unique.

As a result we obtain a new database structure, shown in Fig. 2.5 (the key fields in each table are highlighted in bold). This structure contains seven tables that meet the requirements of third normal form.

Boyce-Codd normal form. A table is in Boyce-Codd normal form if and only if every functional dependence between its fields reduces to a full functional dependence on a candidate key. By this definition, all the tables in the database structure (see Fig. 2.4) meet the requirements of Boyce-Codd normal form. Further optimization of the database tables comes down to full decomposition of the tables.

A full decomposition of a table is a set of an arbitrary number of its projections whose join coincides completely with the contents of the table. A projection is a copy of the table that omits one or more columns of the original table.

Fourth normal form. Fourth normal form is a special case of fifth normal form in which the full decomposition must be a join of two projections.

It is very difficult to find a table that is in fourth normal form but does not satisfy the definition of fifth normal form.

Fifth normal form.The table is in the fifth normal form if and only if all the projections contain all projections in each full dek position. A table that does not have a single complete decomposition is also in the fifth normal form. In practice, the optimization of the database tables ends with a third normal form. The creation of tables to the fourth and fifth normal forms represents, in our opinion, purely theoretical interest. Almost this problem decides, the development of requests for creating a new table. 2.3. Design of ties between tables The process of normalizing the source tables of databases allows you to create an optimal information system structure - once-working a database that requires the smallest memory resources and, as a result, ensuring the smallest access to information. At the same time, the separation of one source table into several COM requires the implementation of one of the most important conditions for the design of information systems - ensuring the integrity of the in-formation during the operation of the database. In the above example of the normalization of the source tabs (see Fig. 2.3), of the two tables, ultimately we received seven tables given to the third and fourth normal forms. As practice shows, in real production and business databases are multiplayer systems. This applies to both the creation and maintenance of data in sifting tables and the use of information for the decisions. In the example above, in the actually functioning system of managing the educational process in a university or college, the per-in-charge formation of training groups is made by acceptance commissions when enrolling applicants based on the results of entrance exams. Further maintenance of information on the composition of students in groups in universities is assigned to the deanants, and in colleges - on educational departments or relevant structures. 
The composition of academic disciplines for each group is determined by other services or specialists. Information about the teaching staff is compiled in the personnel departments. The results of test and examination sessions are needed by the heads of dean's offices and departments, among other things to decide whether to award scholarships to students or to withdraw scholarships from students who perform poorly.

A change in any one of the database tables must entail a corresponding change in all other tables. This is the essence of database integrity. In practice, this task is solved by establishing relationships between the database tables.

We formulate the basic rules for establishing relationships between tables.

1. Of the two tables being linked, select the main table and the subordinate table.
2. In each table, select a key field. The key field of the main table is called the primary key. The key field of the subordinate table is called the foreign (external) key.
3. The linking fields of the two tables must have the same data type.
4. The following types of relationships between tables exist: "one to one", "one to many", and "many to many":
- a "one to one" relationship is established when a specific row of the main table is related, at any moment, to only one row of the subordinate table;
- a "one to many" relationship is established when a specific row of the main table is related, at any moment, to several rows of the subordinate table, while any row of the subordinate table is related to only one row of the main table;
- a "many to many" relationship is established when a specific row of the main table is related, at any moment, to several rows of the subordinate table, and at the same time one row of the subordinate table is related to several rows of the main table.

When the value of the primary key in the main table is changed, the following behaviors are possible for the dependent table.

Cascading. When primary key data in the main table is changed, the corresponding foreign key data in the dependent table is changed as well. All existing relationships are preserved.

Restrict. An attempt to change a primary key value to which rows of the dependent table are related is rejected. Only those primary key values that have no related rows in the dependent table may be changed.

Set NULL (Relation). When primary key data is changed, the foreign key is set to an undefined value (NULL). The information about which main-table row the dependent rows belonged to is lost. If several primary key values are changed, the dependent table ends up with several groups of rows that were previously related to the changed keys, and it is no longer possible to determine which row was related to which primary key.

Fig. 2.6 shows the relationship diagram for the database tables shown in Fig. 2.5.

Control questions

1. Define the following elements of a database table: field, cell, record.
2. What do the terms "key" and "key field" mean?
3. Which key field is called the primary key, and which is called the foreign (external) key?
4. What does the process of normalizing database tables consist of?
5. Which five normal forms of database tables do you know?
6. Define the following types of relationships between database tables: "one to one", "one to many", "many to many".
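
The three options for changing a primary key described in this section (cascading, restrict, setting to NULL) can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module, chosen only because it needs no database server. The table names groups and students, the column names, and the helper function update_parent_key are assumptions made for this illustration and do not come from the textbook's figures. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

# Update actions covered by this sketch (standard SQL "ON UPDATE" clauses).
ALLOWED_ACTIONS = {"CASCADE", "RESTRICT", "SET NULL"}


def update_parent_key(action):
    """Rename group 'G1' to 'G2' in the main table and report the effect.

    Returns a pair (outcome, foreign_key), where outcome is "updated" or
    "rejected" and foreign_key is the value left in the subordinate row.
    All names here are hypothetical and exist only for this example.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: " + action)

    conn = sqlite3.connect(":memory:")
    # SQLite ignores foreign key constraints unless this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")

    # Main table: its key field is the primary key.
    conn.execute("CREATE TABLE groups (group_no TEXT PRIMARY KEY)")
    # Subordinate table: group_no is the foreign key, same data type.
    conn.execute(
        "CREATE TABLE students ("
        " id INTEGER PRIMARY KEY,"
        " group_no TEXT REFERENCES groups(group_no) ON UPDATE " + action +
        ")"
    )
    conn.execute("INSERT INTO groups VALUES ('G1')")
    conn.execute("INSERT INTO students VALUES (1, 'G1')")

    try:
        conn.execute("UPDATE groups SET group_no = 'G2' WHERE group_no = 'G1'")
        outcome = "updated"
    except sqlite3.IntegrityError:
        # Restrict: the change is refused because a related row exists.
        outcome = "rejected"

    foreign_key = conn.execute(
        "SELECT group_no FROM students WHERE id = 1"
    ).fetchone()[0]
    conn.close()
    return outcome, foreign_key


for action in ("CASCADE", "RESTRICT", "SET NULL"):
    print(action, update_parent_key(action))
# CASCADE ('updated', 'G2')
# RESTRICT ('rejected', 'G1')
# SET NULL ('updated', None)
```

With cascading the relationship survives the change, with restrict the main table is left untouched, and with SET NULL the student row no longer records which group it belonged to, which matches the loss of information described above.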