About GCC, compilation and libraries. Optimal options for x86 GCC

GCC is a freely available optimizing compiler for the C and C++ languages.

The gcc program, run from the command line, is a front end (driver) for a collection of compilers. Depending on the file name extensions of its arguments and on additional options, gcc runs the necessary preprocessor, compilers and linker.

Files with the extension .cc or .C are treated as C++ source files, files with the extension .c as C programs, and files with the extension .o as object files.

To compile the C++ source code in the file F.cc and create an object file F.o, run the command:

gcc -c F.cc

The -c option means "compile only".

To link one or more object files derived from source code - F1.o, F2.o, ... - into a single executable file F, you need to enter the command:

gcc -o F F1.o F2.o

The -o option specifies the name of the executable file.

You can combine two processing steps - compilation and linking - into one general stage using the command:

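gcc -o F [options] F1.cc ... -lg++ [libraries]

Here [options] stands for possible additional compilation and linking options, and [libraries] for possible additional libraries. The -lg++ option tells the linker to include the standard C++ library (in modern GCC versions this library is libstdc++, so use -lstdc++ instead, or simply invoke g++, which links it automatically).
After linking, an executable file F is created, which can be run with the command

./F [arguments]

where [arguments] is a list of command line arguments for your program.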
Libraries are often used during linking. A library is a collection of object files grouped into a single, indexed file. When the linker encounters a library in the list of files to link, it checks whether the object files linked so far contain calls to functions defined in one of the library's members. If such calls are found, they are resolved to the corresponding object code from the library. Libraries are included with the -lname option: the linker then searches standard directories such as /lib, /usr/lib and /usr/local/lib for a file named libname.so or libname.a (by default the shared libname.so is preferred when both exist). Libraries should be listed after the source or object files that call their functions.

Compilation options

Among the many compilation and linking options, the most commonly used are:

Option	Purpose
-c	Compile only. Object files named name.o are created from the program's source files; no linking is performed.
-Dname=value	Define name in the compiled program with the value value. The effect is the same as a line #define name value at the beginning of the program. The =value part can be omitted, in which case the value defaults to 1.
-o file-name	Use file-name as the name for the created file.
-lname	Link with the library libname.so (or libname.a)
-Llib-path -Iinclude-path	Add lib-path and include-path to the standard search directories for libraries and header files, respectively.
-g	Place debugging information for the gdb debugger in the object or executable file. The option must be specified for both compilation and linking. When using -g, it is recommended to disable optimization with -O0 (see below).
-MM	Output dependencies on header files used in a C or C++ program in a format suitable for the utility make. No object or executable files are created.
-pg	Place profiling code in the object or executable file to generate information used by the gprof utility. The option must be specified for both compilation and linking. A program built with -pg writes a statistics file (gmon.out) when it runs; gprof uses this file to produce a report showing the time spent in each function.
-Wall	Enable warnings about the most common questionable constructs found while compiling the program (not every possible warning; see below).
-O1 -O2 -O3	Various levels of optimization.
-O0	Do not optimize. If you use multiple -O options, with or without level numbers, the last one takes effect.
-I	Used to add your own directories to search for header files during the build process
-L	Passed to the linker. Used to add your own library search directories during the build process.
-l	Passed to the linker. Used to add your own libraries to be searched during the build process.

Now that you know something about the C standard, let's look at the options the gcc compiler offers to help ensure that the C you write complies with the standard. There are three ways to ensure that your C code is standards-compliant and free of flaws: options that control the version of the standard you intend to conform to, definitions that control header files, and warning options that trigger stricter code checking.

gcc has a huge range of options, and here we will cover only those we consider the most important. A complete list of options can be found in the gcc online manual pages. We'll also briefly discuss some of the #define options that can be used; typically they should be specified in your source code before any #include lines, or defined on the gcc command line. You may be surprised at how many options there are for selecting which standard to use, rather than a single flag that forces the current standard. The reason is that many older programs rely on historical compiler behavior and would require significant work to update them to the latest standards. You will rarely, if ever, want a compiler update to break working code. As standards change, it is important to be able to work against a particular standard, even if it is not the most recent one.

Even if you're writing a small program for personal use, where standards compliance may not matter much, it often makes sense to enable additional gcc warnings so that the compiler looks for errors in your code before you run the program. This is almost always more efficient than stepping through the code in a debugger and wondering where the problem might be. The compiler has many options that go beyond simple standards checking, such as the ability to detect code that conforms to the standard but may have questionable semantics. For example, a program may have an execution order that allows a variable to be accessed before it is initialized.

If you are writing a program for others to use, then once you have chosen the degree of compliance and the types of compiler warnings you consider sufficient, it is very important to spend a little more effort and get your code to compile without any warnings at all. If you allow some warnings to appear and get into the habit of ignoring them, one day a more serious warning may appear that you risk missing. If your code always compiles without warnings, a new warning will inevitably attract your attention. Compiling code without warnings is a good habit to adopt.

Compiler options for standards tracking

-ansi is the most important standards option; it forces the compiler to follow the ISO C90 language standard. It disables some non-standard gcc extensions, disables C++-style comments (//) in C programs, and enables handling of ANSI trigraphs (three-character sequences). In addition, it defines the __STRICT_ANSI__ macro, which disables some extensions in header files that are not compatible with the standard. The adopted standard may change in subsequent versions of the compiler.

-std= - this option provides finer control over the standard used, taking a parameter that specifies exactly the required standard. The main possible values are:

c89 - support the C89 standard;

iso9899:1999 - support the ISO C99 standard (the latest version of the ISO standard at the time of writing);

gnu89 - support the C89 standard, but allow some GNU extensions and some C99 functionality. In gcc version 4.2 this option is the default (since GCC 5, the default is gnu11).

Options for standard tracking in define directives

Some constants (#defines) can be specified either as options on the compiler command line or as definitions in the program's source code. We generally think of them as being set on the compiler command line.

__STRICT_ANSI__ - forces the ISO C standard to be used. Defined when the -ansi option is given on the compiler command line.

_POSIX_C_SOURCE=2 - enables functionality defined in IEEE Std 1003.1 and 1003.2. We will return to these standards later in this chapter.

_BSD_SOURCE - enables functionality of BSD systems. If it conflicts with POSIX definitions, the BSD definitions take precedence. (Since glibc 2.20 this macro is deprecated in favor of _DEFAULT_SOURCE.)

_GNU_SOURCE - enables a wide range of features and functions, including GNU extensions. If these definitions conflict with POSIX definitions, the latter take precedence.

Compiler options for warning output

These options are passed to the compiler from the command line. Again, we'll only list the main ones; a complete list can be found in the gcc online reference manual.

-pedantic is the most powerful option for checking the purity of C code. In addition to enforcing the selected C standard, it flags some traditional C constructs forbidden by the standard and diagnoses all GNU extensions to the standard. This option should be used to maximize the portability of your C code. The downside is that the compiler becomes very fussy about the cleanliness of your code, and sometimes you have to think hard to get rid of the last few warnings.

-Wformat - checks that the argument types in calls to printf-family functions match the format string.

-Wparentheses - warns about omitted parentheses where the code is valid but their absence is likely to confuse, such as an assignment used as a truth value or operators with surprising precedence. This option is very useful for verifying that complex expressions are evaluated as intended.

-Wswitch-default - checks that every switch statement has a default case, which is generally considered good programming style.

-Wunused - checks a variety of cases, for example, static functions declared but not defined, unused variables, unused parameters (together with -Wextra), and computed results that are discarded.

-Wall - enables most gcc warning types, including -Wformat, -Wparentheses and most of the -Wunused checks (it does not include -Wswitch-default or -pedantic). It makes clean code easy to achieve.

Note

There are many more advanced warning options available; see the gcc web pages for details. In general we recommend using -Wall: it is a good compromise between thorough checking, which helps ensure high-quality code, and a flood of trivial warnings that becomes difficult to reduce to zero.

It is a common belief that GCC lags behind other compilers in performance. In this article we will try to figure out what basic optimizations of the GCC compiler should be applied to achieve acceptable performance.

What are the default options in GCC?

(1) The default optimization level in GCC is “-O0”. It is clearly not optimal from a performance standpoint and is not recommended for compiling the final product.
GCC does not detect the architecture of the machine on which compilation runs unless the “-march=native” option is passed. By default, GCC uses the architecture specified when it was configured. To find out the GCC configuration, just run:

gcc -v

If the “Configured with:” line of its output contains, for example, “--with-arch=corei7”, GCC will add “-march=corei7” to your options (unless another architecture is specified).
Most GCC compilers for x86 (the baseline for 64-bit Linux) add “-mtune=generic -march=x86-64” to the given options, since their configuration did not specify any architecture options. You can always find out all the options passed when GCC starts, as well as its internal options, using the command:

As a result, “-march=native” is often used. Specifying the architecture is important for performance. The only exception is programs in which calls to library functions take up almost the entire run time: GLIBC can select the optimal version of a function for the given architecture at run time. Note, however, that with static linking some GLIBC functions do not have versions for different architectures, so dynamic linking is preferable if the speed of GLIBC functions matters.
(2) By default, most GCC compilers for x86 in 32-bit mode use the x87 floating-point model, since they were configured without “-mfpmath=sse”. Only if the GCC configuration contains “--with-mfpmath=sse” will the compiler use the SSE model by default. In all other cases, it is better to add the “-mfpmath=sse” option when building in 32-bit mode.
So, adding the “-mfpmath=sse” option is important in 32-bit mode! The exception is a compiler that has “--with-mfpmath=sse” in its configuration.

32 bit mode or 64 bit?

32-bit mode is usually used to reduce the amount of memory used and, as a result, speed up work with it (more data fits into the cache).
In 64-bit mode (compared to 32-bit), the number of available general-purpose registers increases from 6 to 14, and the number of XMM registers from 8 to 16. Also, all 64-bit x86 processors support the SSE2 extension, so in 64-bit mode there is no need to add the “-mfpmath=sse” option.
64-bit mode is recommended for computational tasks, and 32-bit mode for mobile applications.

How to get maximum performance?

There is no single set of options that gives the best performance, but GCC has many options that are worth trying. Below is a table with recommended options and expected speedups for Intel Atom and 2nd Generation Intel Core i7 processors relative to the “-O2” option. The estimates are based on the geometric mean of the results of a specific set of benchmarks compiled with GCC version 4.7. It is also assumed that the compiler was configured for generic x86-64.
Expected speedup on mobile applications relative to “-O2” (32-bit mode only, since it is the main mode for the mobile segment):

Expected speedup on computational tasks relative to “-O2” (in 64-bit mode):

Options	Speedup vs “-O2”
-m64 -Ofast -flto	~17%
-m64 -Ofast -flto -march=native	~21%
-m64 -Ofast -flto -march=native -funroll-loops	~22%

The advantage of 64-bit mode over 32-bit mode for computational tasks with the “-O2 -mfpmath=sse” options is about 5%.
All data in the article are estimates based on the results of a specific set of benchmarks.
Below is a description of the options used in the article. Full description: http://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/Optimize-Options.html

"-Ofast": similar to "-O3 -ffast-math"; enables a higher level of optimization and more aggressive optimizations for floating-point arithmetic (e.g. reassociation of floating-point operations)
"-flto": link-time (inter-module) optimization
"-m32": 32-bit mode
"-mfpmath=sse": use the XMM registers for floating-point arithmetic (instead of the x87 floating-point register stack)
"-funroll-loops": enables loop unrolling

GCC is included in every Linux distribution and is usually installed by default. GCC's interface is the standard compiler interface on the UNIX platform, with roots in the late 1960s and early 1970s: the command line interface. Do not be alarmed: over the years this way of interacting with the user has been honed to the perfection possible in this case, and working with GCC (plus a few additional utilities and a good text editor) is easier than with any of the modern visual IDEs. The authors of the collection have tried to automate the process of compiling and building applications as much as possible. The user calls the gcc control program, which interprets the command line arguments passed to it (options and file names) and, for each input file, runs the compiler for its programming language; then, if necessary, gcc automatically calls the assembler and the linker.

Interestingly, compilers are among the few UNIX applications that care about file extensions. From the extension, GCC determines what kind of file it is dealing with and what needs to (or can) be done with it. C source files must have the .c extension, C++ source files, for example, .cpp, C header files .h, object files .o, and so on. If you use the wrong extension, gcc will not work correctly (if it agrees to do anything at all).

Let's move on to practice and write, compile and run a simple program. Without trying to be original, let's create a C source file with the following content:

/* hello.c */

#include <stdio.h>

int main( void )
{
    printf("Hello World\n");
    return 0;
}

Now, in the directory containing hello.c, run the command:

$ gcc hello.c

After a few fractions of a second, the file a.out will appear in the directory:

$ ls
a.out hello.c

This is the finished executable file of our program. By default, gcc names the output executable a.out (once upon a time this name meant assembler output).

$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

Let's run the resulting program:

$ ./a.out
Hello World

Why do you have to specify the path explicitly to run a file from the current directory? If the path to the executable file is not given explicitly, the shell looks for the file in the directories listed in the PATH environment variable.

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

Directories in the list are separated by colons. When searching for files, the shell looks through the directories in the order in which they are listed. By default, for security reasons, the current directory . is not included in this list, so the shell does not look for executables there.

Why is it not recommended to add . to PATH?

file
displays information about the type (from the system's point of view) of the file passed on the command line; for some file types it also displays additional information about the file's contents.
$ file hello.c
hello.c: ASCII C program text
$ file annotation.doc
annotation.doc: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1251, Author: MIH, Template: Normal.dot, Last Saved By: MIH, Revision Number: 83, Name of Creating Application: Microsoft Office Word, Total Editing Time: 09:37:00, Last Printed: Thu Jan 22 07:31:00 2009, Create Time/Date: Mon Jan 12 07:36:00 2009, Last Saved Time/Date: Thu Jan 22 07:34:00 2009, Number of Pages: 1, Number of Words: 3094, Number of Characters: 17637, Security: 0

That's all that is required from the user for successful use of gcc :)

The name of the output executable file (as well as of any other file generated by gcc) can be changed with the -o option:

$ gcc -o hello hello.c
$ ls
hello hello.c
$ ./hello
Hello World

gcc is a control program designed to automate the compilation process. Let's see what actually happens when the gcc hello.c command is executed. The compilation process can be divided into 4 main stages: preprocessing, compilation proper, assembly and linking.

gcc options allow you to interrupt the process at any of these stages. For example, with the -E option you can stop gcc after preprocessing and view the contents of the file produced by the preprocessor.

$ gcc -E -o hello.i hello.c
$ ls
hello.c hello.i
$ less hello.i
. . .
# 1 "/usr/include/stdio.h" 1 3 4
# 28 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
. . .
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
. . .
extern int printf (__const char *__restrict __format, ...);
. . .
# 4 "hello.c" 2
int main (void)
{
printf("Hello World\n");
return 0;
}

After preprocessing, the source text of our program has swollen and become unreadable. The code we wrote ourselves has been reduced to a few lines at the very end of the file. The reason is the inclusion of the C standard library header file stdio.h, which itself contains a lot of different things and also includes other header files.

Note the extension of the file hello.i. By gcc convention, the .i extension denotes C source files that do not require preprocessing. Such files are compiled bypassing the preprocessor:

$ gcc -o hello hello.i
$ ls
hello hello.c hello.i
$ ./hello
Hello World

After preprocessing comes compilation. The compiler translates the program's source code from the high-level language into assembly language.

The meaning of the word compilation is vague. Wikipedia, for example, citing international standards, defines compilation as “the conversion by a compiler program of the source text of a program written in a high-level programming language into a language close to machine code or into object code.” In principle, this definition suits us: assembly language is indeed closer to machine language than C is. But in everyday usage, compilation most often means any operation that converts the source code of a program in any programming language into executable code, so a process that includes all four stages mentioned above can also be called compilation. The same ambiguity is present in this text. On the other hand, the operation of converting the source text of a program into assembly language can also be called translation: “converting a program written in one programming language into a program in another language that is, in a certain sense, equivalent to the first one.”

You can stop the process of creating an executable file after the compilation stage with the -S option:

$ gcc -S hello.c
$ ls
hello.c hello.s
$ file hello.s
hello.s: ASCII assembler program text
$ less hello.s
.file "hello.c"
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $.LC0, (%esp)
call puts
movl $0, %eax
leave
ret
.size main, .-main

The file hello.s, containing an implementation of the program in assembly language, has appeared in the directory. Note that in this case there was no need to specify the output file name with the -o option: gcc generated it automatically by replacing the .c extension of the source file name with .s. For most basic operations gcc forms the output file name by such a replacement. The .s extension is standard for assembly language source files.

Of course, you can also get the executable code from the hello.s file:

$ gcc -o hello hello.s
$ ls
hello hello.c hello.s
$ ./hello
Hello World

The next stage, assembly, translates assembly language code into machine code. The result is an object file. An object file contains blocks of ready-to-execute machine code, data blocks, and a list of the functions and external variables defined in the file (the symbol table), but it does not contain the absolute addresses of references to functions and data. An object file cannot be run directly, but later (at the linking stage) it can be combined with other object files; the addresses of cross-references between the files are then calculated and filled in according to the symbol tables. The gcc -c option stops the process after the assembly stage:

$ gcc -c hello.c
$ ls
hello.c hello.o
$ file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

The standard extension for object files is .o.

If the resulting object file hello.o is passed to the linker, the latter will calculate the link addresses, add program startup and termination code, code for calling library functions, and as a result we will have a ready-made executable program file.

$ gcc -o hello hello.o
$ ls
hello hello.c hello.o
$ ./hello
Hello World

What we have just done (or rather, what gcc did for us) is the last stage: linking.

That's probably all about compilation. Now let's touch on some gcc options that are, in my opinion, important.

The -I path/to/directory/with/header/files option adds the specified directory to the list of search paths for header files. A directory added with -I is searched first, and then the search continues in the standard system directories. If there are several -I options, the directories they specify are searched from left to right, in the order the options appear.

The -Wall option displays warnings about potential errors in the code that do not prevent the program from compiling but which, in the compiler's opinion, may lead to problems when it runs. It is an important and useful option, and the gcc developers recommend always using it. For example, compiling a file like this produces a lot of warnings:

1 /* remark.c */
2
3 static int k = 0 ;
4 static int l( int a);
5
6 main()
7 {
8
9 int a;
10
11 int b, c;
12
13 b + 1 ;
14
15 b = c;
16
17 int*p;
18
19 b = *p;
20
21 }

$ gcc -o remark remark.c
$ gcc -Wall -o remark remark.c
remark.c:7: warning: return type defaults to 'int'
remark.c:13: warning: statement with no effect
remark.c:9: warning: unused variable ‘a’
remark.c:21: warning: control reaches end of non-void function
remark.c: At top level:
remark.c:3: warning: ‘k’ defined but not used
remark.c:4: warning: ‘l’ declared ‘static’ but never defined
remark.c: In function 'main':
remark.c:15: warning: ‘c’ is used uninitialized in this function
remark.c:19: warning: ‘p’ is used uninitialized in this function

The -Werror option turns all warnings into errors: if any warning is issued, compilation fails. It is used together with the -Wall option.

$ gcc -Werror -o remark remark.c
$ gcc -Werror -Wall -o remark remark.c
cc1: warnings being treated as errors
remark.c:7: error: return type defaults to 'int'
remark.c: In function 'main':
remark.c:13: error: statement with no effect
remark.c:9: error: unused variable 'a'

The -g option places the information needed by the gdb debugger in the object or executable file. When building a project for subsequent debugging, -g must be specified at both the compilation and the linking stages.

The -O1, -O2 and -O3 options set the level of optimization of the code generated by the compiler. The higher the number, the more aggressive the optimization. The effect of these options can be seen in this example.

Original file:

/* circle.c */

int main( void )
{
    int i;

    for(i = 0; i < 10; ++i)
        ;

    return i;
}

Compiling with the default optimization level (-O0):

$ gcc -S circle.c
$ less circle.s
.file "circle.c"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $0, -4(%ebp)
jmp .L2
.L3:
addl $1, -4(%ebp)
.L2:
cmpl $9, -4(%ebp)
jle .L3
movl -4(%ebp), %eax
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits

Compiling with the maximum optimization level:

$ gcc -S -O3 circle.c
$ less circle.s
.file "circle.c"
.text
.p2align 4,15
.globl main
.type main, @function
main:
pushl %ebp
movl $10, %eax
movl %esp, %ebp
popl %ebp
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits

In the second case, the resulting code does not contain even a hint of a loop. Indeed, the value of i can be calculated at compile time, and the compiler did exactly that.

Alas, for real projects the difference in performance between optimization levels is often far less dramatic than in this example.

The -O0 option disables all code optimization. It is needed at the debugging stage: as shown above, optimization can change the structure of a program beyond recognition, so the correspondence between the executable and the source code is no longer obvious and step-by-step debugging becomes practically impossible. When you use -g, it is recommended to also use -O0.

The -Os option optimizes for the size of the resulting file rather than for speed: it enables the -O2 optimizations that do not typically increase code size, plus further optimizations designed to reduce it.

The -march=architecture option specifies the target processor architecture. The list of supported architectures is extensive; for example, for Intel/AMD processors you can specify i386, pentium, prescott, opteron-sse3, etc. Users of binary distributions should keep in mind that for programs built with this option to work correctly, it is desirable that all linked libraries be compiled with the same option.

The options passed to the linker will be discussed below.

Small addition:

In our example, the main() function returns the seemingly unnecessary value 0. In UNIX-like systems it is customary for a program to return an integer to the shell when it finishes: zero on success, any other number otherwise. The shell automatically stores this value in the special parameter ?, whose contents you can view with the command echo $?:

$ ./hello
Hello World
$ echo $?
0

gcc determines the type (programming language) of the files passed to it by their extensions and processes them according to the guessed type (language). The user is therefore required to keep track of the extensions of the files they create, choosing them as gcc conventions require. In fact, you can give gcc files with arbitrary names: the -x option explicitly specifies the programming language of the files being compiled. It applies to all files that follow it on the command line (up to the next -x option). Possible option arguments:

c c-header c-cpp-output

c++ c++-header c++-cpp-output

objective-c objective-c-header objective-c-cpp-output

objective-c++ objective-c++-header objective-c++-cpp-output

assembler assembler-with-cpp

ada

f77 f77-cpp-input

f95 f95-cpp-input

java

The purpose of the arguments should be clear from their names (here cpp has nothing to do with C++: it stands for the C preprocessor, so, for example, c-cpp-output means C source code that has already been preprocessed). Let's check:

$ mv hello.c hello.txt
$ gcc -Wall -x c -o hello hello.txt
$ ./hello
Hello World

Separate compilation

The strong point of C/C++ is the ability to split a program's source code into several files. One can go further and say that separate compilation is fundamental to the language: without it, effective use of C is unthinkable. It is multi-file programming that makes it possible to implement large projects in C, such as Linux (here the word Linux refers both to the kernel and to the system as a whole). What does separate compilation give a programmer?

1. It makes the program (project) code more readable. A source file several dozen screens long becomes almost impossible to take in. If, according to some (well thought-out) logic, you split it into a number of small fragments, each in a separate file, it becomes much easier to cope with the complexity of the project.

2. Allows you to reduce the time it takes to recompile a project. If changes are made to one file, there is no point in recompiling the entire project; it is enough to recompile only this changed file.

3. It allows work on a project to be distributed among several developers. Each programmer writes and debugs their own part of the project, but at any moment all the results can be assembled (reassembled) into the final product.

4. Without separate compilation, libraries would not exist. Libraries are how code is reused and distributed, not only as C/C++ source but also as binary code, which on the one hand gives developers a simple mechanism for including it in their programs, and on the other hand hides specific implementation details from them. When working on a project, you should always ask whether something you have already done might be needed again someday. Maybe it's worth separating part of the code and organizing it as a library in advance? In my opinion, this approach greatly simplifies life and saves a lot of time.

GCC, of course, supports separate compilation, and does not require any special instructions from the user. In general, everything is very simple.

Here is a practical example (though a very, very artificial one).

Set of source code files:

/* main.c */

#include <stdio.h>

#include "first.h"
#include "second.h"

int main( void )
{
    first();
    second();

    printf("Main function...\n");

    return 0;
}

/* first.h */

void first( void );

/* first.c */

#include

#include "first.h"

void first( void )
{

Printf("First function... \n " );

/* second.h */

void second( void );

/* second.c */

#include

#include "second.h"

void second( void )
{

Printf("Second function... \n " );

So we have:

$ ls
first.c first.h main.c second.c second.h

All of this can be compiled with a single command:

$ gcc -Wall -o main main.c first.c second.c
$ ./main
First function...
Second function...
Main function...

But this gives us practically no benefit, apart from more structured and readable code spread over several files. All the advantages listed above appear with this approach to compilation:

$ gcc -Wall -c main.c
$ gcc -Wall -c first.c
$ gcc -Wall -c second.c
$ ls
first.c first.h first.o main.c main.o second.c second.h second.o
$ gcc -o main main.o first.o second.o
$ ./main
First function...
Second function...
Main function...

What have we done? We compiled each source file (with the -c option) into an object file, then linked the object files into the final executable. This takes more gcc commands, of course, but nobody builds projects by hand; build utilities do that (the most popular being make). It is with a build utility that all the advantages of separate compilation listed above come into play.

The question arises: how does the linker manage to put object files together while correctly computing the call addresses? How does it even know that second.o contains the code of the second() function and that main.o contains a call to it? It turns out to be simple: an object file contains a so-called symbol table, which lists the names of certain positions in the code (functions and external variables). The linker goes through the symbol table of each object file, matches up entries with the same names, works out where the code of each function used (or each data block) actually ends up, and patches the call addresses in the executable accordingly.

You can view the symbol table with the nm utility.

$ nm main.o
         U first
00000000 T main
         U puts
         U second
$ nm first.o
00000000 T first
         U puts
$ nm second.o
         U puts
00000000 T second

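The puts entries appear because GCC replaced the calls to the standard library function printf() with puts() at compile time: a printf() call whose only argument is a constant string ending in a newline can be done more cheaply with puts().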
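To see more of the symbol kinds nm reports, here is a small hypothetical file, symbols.c (not part of the article's example), whose comments say which letter each name should get. It is a sketch assuming GCC on Linux and gcc -c symbols.c at the default -O0; the exact letters can differ between toolchains and options.

```c
/* symbols.c: a hypothetical file, used only to show symbol kinds in nm */
#include <stdio.h>

int limit = 100;          /* initialized global: 'D' (.data section) */
int counter;              /* uninitialized global: 'B' (.bss), or 'C' (common)
                             with compilers defaulting to -fcommon (GCC < 10) */
static int step = 1;      /* file-local data: lowercase 'd'; other files
                             cannot link against it */

static void advance(void) /* file-local function: lowercase 't' (at -O0; with
                             optimization it may be inlined and disappear) */
{
    counter += step;
    step++;
}

void bump(void)           /* exported function: 'T' */
{
    if (counter < limit)
        advance();
}

void report(void)         /* printf with a real format stays printf, so nm
                             shows 'U printf' (defined elsewhere, in libc) */
{
    printf("counter = %d\n", counter);
}
```

Uppercase letters mark names visible to other object files; lowercase ones stay local to the file, which is why only the uppercase ones can take part in the name matching the linker performs.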

The symbol table is written not only in the object file, but also in the executable file:

$ nm main
08049f20 d _DYNAMIC
08049ff4 d _GLOBAL_OFFSET_TABLE_
080484fc R _IO_stdin_used
         w _Jv_RegisterClasses
08049f10 d __CTOR_END__
08049f0c d __CTOR_LIST__
08049f18 D __DTOR_END__
08049f14 d __DTOR_LIST__
08048538 r __FRAME_END__
08049f1c d __JCR_END__
08049f1c d __JCR_LIST__
0804a014 A __bss_start
0804a00c D __data_start
080484b0 t __do_global_ctors_aux
08048360 t __do_global_dtors_aux
0804a010 D __dso_handle
         w __gmon_start__
080484aa T __i686.get_pc_thunk.bx
08049f0c d __init_array_end
08049f0c d __init_array_start
08048440 T __libc_csu_fini
08048450 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
0804a014 A _edata
0804a01c A _end
080484dc T _fini
080484f8 R _fp_hw
080482b8 T _init
08048330 T _start
0804a014 b completed.7021
0804a00c W data_start
0804a018 b dtor_idx.7023
0804840c T first
080483c0 t frame_dummy
080483e4 T main
         U puts@@GLIBC_2.0
08048420 T second

The symbol table in the executable is mainly there to make debugging easier; it is not needed to run the program. For real-world executables, with many function definitions and external variables and a bunch of different libraries, the symbol table becomes quite large. To reduce the size of the output file, you can strip it with the gcc -s option.

$ gcc -s -o main main.o first.o second.o
$ ./main
First function...
Second function...
Main function...
$ nm main
nm: main: no symbols

Note that the linker does not check the context of function calls: it checks neither the return type nor the types and number of the arguments (it has nowhere to get that information from). All checks that calls are correct must be done at compile time. In multi-file programs, this is what C header files are for: every file that calls or defines a function includes the same declaration.
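The role of the header can be sketched in a single file. This is a hypothetical illustration (scale() and call_scale() are made-up names), with comments marking what would live in first.h, first.c and main.c in a real multi-file project.

```c
/* Hypothetical single-file illustration; in a real project the three parts
   would be first.h, first.c and main.c. */

/* --- first.h: the one shared declaration --- */
int scale(int value, int factor);

/* --- first.c (includes first.h): the compiler checks this definition
   against the declaration above. Changing it to, say, int scale(double value)
   produces "conflicting types for 'scale'" at compile time instead of a
   silent mismatch that the linker would never notice. --- */
int scale(int value, int factor)
{
    return value * factor;
}

/* --- main.c (includes first.h): every call is checked against the
   prototype; scale(6) would be rejected for having too few arguments. --- */
int call_scale(void)
{
    return scale(6, 7);
}
```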

Libraries

A library, in C, is a file containing object code that can be attached, at the link stage, to a program that uses it. In fact, a library is a set of specially arranged object files.

The purpose of libraries is to give the programmer a standard mechanism for reusing code, and a simple and reliable one.

From the point of view of the operating system and application software, libraries are either static or shared (dynamic).

The code of a static library is included in the executable file when the executable is linked. The library is "hardwired" into the file: its code is merged with the rest of the program's code. A program that uses static libraries is self-contained and can run on almost any computer with a suitable architecture and operating system.

The code of a shared library is loaded and attached to the program by the operating system (more precisely, by the dynamic loader) when the program runs. The executable does not contain the dynamic library's code, only a reference to the library. As a result, a program that uses shared libraries is no longer self-contained and can only be launched successfully on a system where the libraries it needs are installed.

The shared library paradigm provides three significant advantages:

1. The size of executable files is reduced many times over. In a system with many binaries using the same code, there is no need to store a copy of that code in each executable.

2. Shared library code used by several applications is kept in RAM as a single copy (in reality it is not quite that simple...), so the system needs less RAM.

3. There is no need to rebuild every executable when the code of the library they share changes. Changes and fixes to the dynamic library are automatically picked up by every program that uses it.

Without shared libraries there would be no precompiled (binary) Linux distributions (not a single one). Imagine the size of a distribution in which every binary contained the code of the C standard library (and of every other library it uses). And imagine what updating the system would take after a critical vulnerability was fixed in one of the widely used libraries...

Now for some practice.

For illustration, we will use the set of source files from the previous example. In our homemade library we will place the code (implementation) of the first() and second() functions.

Linux uses the following naming scheme for library files (although it is not always followed): the file name starts with the prefix lib, followed by the library name itself, and ends with the extension .a (archive) for a static library or .so (shared object) for a shared (dynamic) one; for a dynamic library only, the extension is followed by the version number, its parts separated by dots. The name of the header file corresponding to the library (again, as a rule) consists of the library name (without the prefix and version) and the extension .h. For example: libogg.a, libogg.so.0.7.0, ogg.h.

First, let's create and use a static library.

The functions first() and second() will make up the contents of our libhello library, so the library file will be named libhello.a. The corresponding header file is hello.h.

/* hello.h */
#ifndef HELLO_H
#define HELLO_H
void first(void);
void second(void);
#endif /* HELLO_H */

Of course, the lines:

#include "first.h"

#include "second.h"

in the files main.c, first.c and second.c must be replaced with:

#include "hello.h"

Now enter the following sequence of commands:

$ gcc -Wall -c first.c
$ gcc -Wall -c second.c
$ ar crs libhello.a first.o second.o
$ file libhello.a
libhello.a: current ar archive

As already mentioned, a library is a collection of object files. With the first two commands we created these object files.

Next, the object files have to be gathered into one set. This is done with the ar archiver: the utility "glues" several files into one, and the resulting archive includes the information needed to restore (extract) each individual file (including its ownership, permissions and timestamps). The archive contents are not compressed or otherwise transformed.

The c option: create the archive (here libhello.a) if it does not exist, without warning about it; if it already exists, the files are added to it.

The r option selects replace mode: if the archive already contains a file with the same name, that member is replaced with the new one; files not yet in the archive are added to its end.

The s option adds (or updates) the archive index. Here the archive index is a table that maps each symbol name defined in the archived files (a function name or a data block) to the name of the object file that defines it. The index speeds up work with the library: to find a definition, there is no need to go through the symbol tables of all archive members; the linker can go straight to the member that contains the name it is looking for. You can view the archive index with the already familiar nm utility and its -s option (the symbol tables of all object files in the archive are shown as well):

$ nm -s libhello.a
Archive index:
first in first.o
second in second.o

first.o:
00000000 T first
U puts

second.o:
U puts
00000000 T second
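Conceptually, the index printed above is just a symbol-to-member table. The following toy model in C shows the lookup it makes possible; it is not the real on-disk format of ar archives, and the type and function names are hypothetical.

```c
/* Toy model of an archive index (not the real ar on-disk format):
   every symbol defined in the archive maps to the member that defines it. */
#include <stddef.h>
#include <string.h>

struct index_entry {
    const char *symbol;   /* a defined name, e.g. "first" */
    const char *member;   /* the object file defining it, e.g. "first.o" */
};

/* Same contents as the "Archive index:" part of nm -s libhello.a */
static const struct index_entry libhello_index[] = {
    { "first",  "first.o"  },
    { "second", "second.o" },
};

/* Returns the member defining symbol, or NULL if the archive has none.
   One table lookup replaces scanning the symbol table of every member. */
const char *find_member(const char *symbol)
{
    size_t n = sizeof libhello_index / sizeof libhello_index[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(libhello_index[i].symbol, symbol) == 0)
            return libhello_index[i].member;
    return NULL;
}
```

find_member("puts") returns NULL: puts is only referenced inside libhello.a, not defined there, so the linker has to find it elsewhere (in libc).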

There is also a dedicated utility for creating the archive index, ranlib. The library libhello.a could have been created like this:

$ ar cr libhello.a first.o second.o
$ ranlib libhello.a

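In practice, GNU ar writes the index by default even without s, and the GNU linker cannot use an archive that has no index at all: it stops with an error suggesting to run ranlib.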

Now let's use our library:

$ gcc -Wall -c main.c
$ gcc -o main main.o -L. -lhello
$ ./main
First function...
Second function...
Main function...

Works...

Now the comments. Two new gcc options have appeared:

-lname: passed to the linker, telling it to link the library libname into the executable. Linking here means recording that certain functions (external variables) are defined in a certain library. In our example the library is static, so all symbol names will refer to code located directly in the executable. Note that the -l option takes the library name without the lib prefix.

-L/path/to/directory/with/libraries: passed to the linker, giving the path to a directory that contains the libraries to link. In our case the path is . (a dot), so the linker looks for libraries in the current directory first and then in the directories defined by the system.
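How -lname and the -L directories turn into a file name can be sketched as follows. This is a simplified model assuming GNU ld on an ELF system, where each directory is checked for libname.so before libname.a and -static restricts the search to libname.a; resolve_l_option() is a hypothetical helper, not a linker API, and it ignores linker scripts, sysroots and the real list of system directories.

```c
/* Simplified model of how GNU ld maps -lNAME to a file (hypothetical helper,
   not a linker API). Each directory, in order, is checked for libNAME.so and
   then libNAME.a; with -static only libNAME.a is considered. */
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* dirs: the -L directories in command-line order, followed by the system
   directories. On success the chosen path is written to out, 0 is returned. */
int resolve_l_option(const char *name, const char *const dirs[], size_t ndirs,
                     bool static_only, char *out, size_t outlen)
{
    static const char *const suffixes[] = { ".so", ".a" };

    for (size_t d = 0; d < ndirs; d++) {
        for (size_t s = static_only ? 1 : 0; s < 2; s++) {
            int n = snprintf(out, outlen, "%s/lib%s%s", dirs[d], name, suffixes[s]);
            if (n < 0 || (size_t)n >= outlen)
                continue;                 /* path too long: skip it */
            if (access(out, F_OK) == 0)   /* first existing file wins */
                return 0;
        }
    }
    return -1;                            /* ld would say "cannot find -lNAME" */
}
```

With dirs = { ".", "/usr/lib" }, resolve_l_option("hello", ...) would pick ./libhello.so if it exists and ./libhello.a otherwise. This per-directory preference for .so over .a is the behavior the article describes later, when both versions of libhello sit in the same directory.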

A small note is needed here. For a number of gcc options, the order in which they appear on the command line matters. In particular, the linker looks for code matching the names in a file's symbol table only in the libraries listed on the command line after that file; the contents of libraries listed before the file name are ignored:

$ gcc -Wall -c main.c
$ gcc -o main -L. -lhello main.o
main.o: In function `main':
main.c:(.text+0xa): undefined reference to `first'
main.c:(.text+0xf): undefined reference to `second'

$ gcc -o main main.o -L. -lhello
$ ./main
First function...
Second function...
Main function...

This gcc behavior stems from the developers' wish to let users combine files and libraries in different ways, use overlapping names, and so on. In my opinion, it is better not to get into this if you can avoid it. As a rule, libraries should be listed after the names of the files that reference them.
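The order dependence can be reproduced with a toy model of a single left-to-right pass: object files are always included, while an archive member is pulled in only if it defines a symbol that is undefined at the moment the archive is scanned. This is a simplification of GNU ld's behavior, not its actual algorithm, and all names in the code are hypothetical.

```c
/* Toy model of one left-to-right link pass (a simplification of GNU ld,
   not its real algorithm); all names here are hypothetical. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS    32
#define MAX_MEMBERS 8

struct object {                 /* an object file or an archive member */
    const char *name;
    const char *defines[4];     /* NULL-terminated, at most 3 names */
    const char *needs[4];       /* NULL-terminated, at most 3 names */
};

struct input {                  /* one item of the command line */
    const struct object *objs;  /* a single object, or all archive members */
    size_t count;
    bool is_archive;
};

struct symset { const char *name[MAX_SYMS]; size_t n; };

static bool contains(const struct symset *s, const char *sym)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->name[i], sym) == 0)
            return true;
    return false;
}

static void insert(struct symset *s, const char *sym)
{
    if (!contains(s, sym) && s->n < MAX_SYMS)
        s->name[s->n++] = sym;
}

/* Link o in: remember what it defines and what it references. */
static void add_object(const struct object *o, struct symset *def, struct symset *ref)
{
    for (size_t i = 0; o->defines[i] != NULL; i++)
        insert(def, o->defines[i]);
    for (size_t i = 0; o->needs[i] != NULL; i++)
        insert(ref, o->needs[i]);
}

/* Returns how many referenced symbols are still undefined after the pass. */
size_t link_in_order(const struct input *in, size_t n)
{
    struct symset def = {0}, ref = {0};
    for (size_t k = 0; k < n; k++) {
        if (!in[k].is_archive) {                 /* object files always go in */
            add_object(&in[k].objs[0], &def, &ref);
            continue;
        }
        bool taken[MAX_MEMBERS] = {false};
        for (bool progress = true; progress; ) { /* rescan this archive only */
            progress = false;
            for (size_t m = 0; m < in[k].count && m < MAX_MEMBERS; m++) {
                const struct object *o = &in[k].objs[m];
                for (size_t i = 0; !taken[m] && o->defines[i] != NULL; i++)
                    if (contains(&ref, o->defines[i]) && !contains(&def, o->defines[i])) {
                        add_object(o, &def, &ref);   /* it resolves something now */
                        taken[m] = progress = true;
                    }
            }
        }
    }
    size_t missing = 0;
    for (size_t i = 0; i < ref.n; i++)
        missing += !contains(&def, ref.name[i]);
    return missing;
}

/* The example's files; puts is left out because libc, which gcc appends
   after all user inputs, resolves it. */
static const struct object main_o[] = {
    { "main.o", { "main", NULL }, { "first", "second", NULL } },
};
static const struct object libhello_a[] = {
    { "first.o",  { "first", NULL },  { NULL } },
    { "second.o", { "second", NULL }, { NULL } },
};

/* library_first = true mimics "gcc -o main -L. -lhello main.o". */
size_t demo(bool library_first)
{
    const struct input obj = { main_o, 1, false };
    const struct input lib = { libhello_a, 2, true };
    const struct input order[2] = { library_first ? lib : obj,
                                    library_first ? obj : lib };
    return link_in_order(order, 2);
}
```

demo(false) mirrors gcc -o main main.o -L. -lhello and returns 0; demo(true) mirrors the failing command line and returns 2, matching the two undefined references reported above.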

There is an alternative way to tell gcc where libraries are located. The LIBRARY_PATH environment variable can hold a colon-separated list of directories in which gcc looks for libraries at link time (not to be confused with LD_LIBRARY_PATH, which the dynamic loader consults when a program runs). As a rule, these variables are not set by default, but nothing stops you from setting them:

$ echo $LD_LIBRARY_PATH

$ gcc -o main main.o -lhello
/usr/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../i686-pc-linux-gnu/bin/ld: cannot find -lhello
collect2: ld returned 1 exit status
$ export LIBRARY_PATH=.
$ gcc -o main main.o -lhello
$ ./main
First function...
Second function...
Main function...

Manipulating environment variables is useful when creating and debugging your own libraries, and also when an application needs some non-standard shared library (an outdated, updated or modified one, in general different from the one shipped with the distribution).

Now let's create and use a dynamic library.

The set of source files remains unchanged. We enter the commands, see what happens, read the comments:

$ gcc -Wall -fPIC -c first.c
$ gcc -Wall -fPIC -c second.c
$ gcc -shared -o libhello.so.2.4.0.5 -Wl,-soname,libhello.so.2 first.o second.o

What did we get?

$ file libhello.so.2.4.0.5
libhello.so.2.4.0.5: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

The file libhello.so.2.4.0.5 is our shared library. We will talk about how to use it below.

Now the comments:

-fPIC: tells the compiler to generate position-independent code (PIC) in the object files. Its main difference lies in how addresses are handled: instead of fixed (absolute) addresses, the code uses addresses relative to its own position and reaches global data and functions through the global offset table (GOT), which is filled in when the code is loaded. Position-independent code can therefore be loaded at any address and attached to the main program when the program starts. Accordingly, the main purpose of position-independent code is building dynamic (shared) libraries.

-shared: tells gcc to produce not an executable but a shared object, that is, a dynamic library.

-Wl,-soname,libhello.so.2: sets the library's soname. We will discuss soname in detail in the next paragraph; first, the format of the option. This construction with commas, strange at first glance, lets the user talk to the linker directly. During compilation gcc invokes the linker automatically and, at its own discretion, passes it the options needed to complete the job. If the user needs to intervene in the linking process, gcc offers the special option -Wl,-option,value1,value2,..., which means: pass the linker (-Wl) the option -option with the arguments value1, value2 and so on. In our case the linker received the option -soname with the argument libhello.so.2.
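The indirection behind the GOT, described under -fPIC above, can be illustrated by analogy in plain C. This is not what the compiler literally emits (real PIC uses PC-relative addressing, and the dynamic loader patches the GOT); got[], loader_fill_got() and call_via_got() are hypothetical names that only mimic the idea: the calling code knows a table slot, and the table is filled in at load time.

```c
/* Analogy only: the compiler does not emit this C, but PIC calls external
   functions in the same spirit, through a table filled in at load time.
   got, loader_fill_got() and call_via_got() are hypothetical names. */
#include <stdio.h>

static void first_impl(void)  { puts("First function..."); }
static void second_impl(void) { puts("Second function..."); }

enum { GOT_FIRST, GOT_SECOND, GOT_SIZE };

/* The "global offset table": one slot per external symbol. */
static void (*got[GOT_SIZE])(void);

/* Plays the part of the dynamic loader: store the real addresses,
   wherever the code happened to be loaded. */
void loader_fill_got(void)
{
    got[GOT_FIRST] = first_impl;
    got[GOT_SECOND] = second_impl;
}

/* Plays the part of position-independent code: it contains no absolute
   address of first or second, only the slot numbers. */
void call_via_got(void)
{
    got[GOT_FIRST]();
    got[GOT_SECOND]();
}
```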

Now about soname. When libraries are created and distributed, the problem of compatibility and version control arises. So that the system, specifically the dynamic library loader, knows which version of a library the application was built against (and therefore needs in order to work), a special identifier was introduced: the soname, stored both in the library file itself and in the application's executable. The soname is a string consisting of the library name with the prefix lib, a dot, the extension so, another dot, and one or two (dot-separated) numbers of the library version: libname.so.x.y. That is, the soname matches the library file name up to the first or second number of the version. If the file name of our library is libhello.so.2.4.0.5, its soname can be libhello.so.2. When a library's interface changes, its soname must change! Any modification of the code that breaks compatibility with previous releases must come with a new soname.

How does it all work? Suppose some application needs a library named hello, the system has one, its file name is libhello.so.2.4.0.5, and the soname recorded in it is libhello.so.2. When the application is built, the linker, following the -lhello option, searches the system for a file named libhello.so. On a real system, libhello.so is a symbolic link to libhello.so.2.4.0.5. Having opened the library file, the linker reads the soname recorded in it and, among other things, places it in the application's executable. When the application is launched, the dynamic library loader receives a request to load the library with the soname read from the executable and tries to find a library on the system whose file name matches that soname; that is, it looks for a file named libhello.so.2. If the system is set up correctly, there is a symbolic link libhello.so.2 pointing to libhello.so.2.4.0.5; the loader gets access to the required library and then, without hesitation (and without checking anything else), attaches it to the application. Now imagine that we have moved the application built this way to another system that only has the previous version of the library, with soname libhello.so.1. Running the program will fail, because there is no file named libhello.so.2 on that system.

Thus, at build time the linker needs a library file (or a symlink to one) named libname.so, while at run time the loader needs a file (or a symlink) named libname.so.x.y, and that name must match the soname string of the library used.
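The naming convention can be expressed as a small hypothetical helper that cuts a library file name down to libname.so plus the first keep version numbers. It only models the convention: the real soname is whatever was passed with -Wl,-soname, and both the linker and the loader read it from the file rather than deriving it.

```c
/* Hypothetical helper modelling the naming convention only: keep
   "libNAME.so" plus the first `keep` version numbers of a file name. */
#include <string.h>

/* Writes the result to out; returns 0, or -1 if file has no ".so."
   part or out is too small. */
int soname_from_file(const char *file, int keep, char *out, size_t outlen)
{
    const char *p = strstr(file, ".so.");
    if (p == NULL)
        return -1;
    const char *end = p + 3;          /* the '.' right after "so" */
    for (int i = 0; i < keep && *end == '.'; i++) {
        end++;                        /* skip the dot */
        while (*end != '\0' && *end != '.')
            end++;                    /* skip one version number */
    }
    size_t len = (size_t)(end - file);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, file, len);
    out[len] = '\0';
    return 0;
}
```

With "libhello.so.2.4.0.5" and keep = 1 this gives libhello.so.2, the run-time name from the example; with keep = 0 it gives libhello.so, the name the linker looks for at build time.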

In binary distributions, as a rule, the library file libhello.so.2.4.0.5 and the link libhello.so.2 pointing to it go into the libhello package, while the link libhello.so, needed only for building, goes together with the header file hello.h into the libhello-devel package (the devel package also contains the static version of the library, libhello.a, which is likewise only used at build time). When the packages are installed, all the listed files and links (except hello.h) end up in the same directory.

Let's make sure that the soname we specified is actually recorded in our library file, using the mighty objdump utility with the -p option:

$ objdump -p libhello.so.2.4.0.5 | grep SONAME
SONAME libhello.so.2

objdump is a powerful tool that provides comprehensive information about the contents (and structure) of an object or executable file. Its man page says that objdump is primarily useful to programmers who write debugging and compilation tools, not just application programs :) In particular, with the -d option it works as a disassembler. We used the -p option, which displays various meta-information about the object file.

(Why, by the way, do the examples run ./main instead of just main? Because the current directory . is usually not in PATH. The reasoning is that on a real multi-user system there will always be some bad person who puts, in a publicly writable directory, a malicious program whose name matches a command that a local administrator often runs with superuser rights; the plan succeeds if . is in the administrator's PATH.)

In the library example above we strictly followed the principles of separate compilation. Of course, the library could also be built with a single gcc call:

$ gcc -shared -Wall -fPIC -o libhello.so.2.4.0.5 -Wl,-soname,libhello.so.2 first.c second.c

Now let's try to use the resulting library:

$ gcc -Wall -c main.c
$ gcc -o main main.o -L. -lhello
/usr/bin/ld: cannot find -lhello
collect2: ld returned 1 exit status

The linker complains. Recall what was said above about symbolic links: create libhello.so and try again:

$ ln -s libhello.so.2.4.0.5 libhello.so
$ gcc -o main main.o -L. -lhello -Wl,-rpath,.

Now everyone is happy. Launch the resulting binary:

$ ./main

Error... The loader complains that it cannot find the libhello.so.2 library. Let's make sure that the executable really refers to libhello.so.2:

$ objdump -p main | grep NEEDED
NEEDED libhello.so.2
NEEDED libc.so.6

$ ln -s libhello.so.2.4.0.5 libhello.so.2
$ ./main
First function...
Second function...
Main function...

It works. Now some comments on the new gcc options.

-Wl,-rpath,.: the already familiar construction, which passes the linker the -rpath option with the argument . (a dot). With -rpath you can record in the program's executable additional paths along which the shared library loader will look for library files. Here the path . is recorded, so the search for library files starts in the current directory.

$ objdump -p main | grep RPATH
RPATH                .

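Thanks to this option, there is no need to change environment variables when starting the program. Note, however, that a relative entry such as . is resolved against the current working directory at run time, not against the directory that contains the executable. So what matters is where the program is started from: below, the binary is moved one level up but launched from the directory that holds libhello.so.2, so it still finds the library; started from any other directory, it would fail with a loader error message:

$ mv main ..
$ ../main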
First function...
Second function...
Main function...

You can also find out which shared libraries your application needs with the ldd utility:

$ ldd main
linux-vdso.so.1 => (0x00007fffaddff000)
libhello.so.2 => ./libhello.so.2 (0x00007f9689001000)
libc.so.6 => /lib/libc.so.6 (0x00007f9688c62000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9689205000)

For each required library, ldd shows its soname and the full path to the library file, determined according to the system settings.

Now is the time to talk about where library files are supposed to be placed in the system, where the loader tries to find them, and how to manage this process.

According to the FHS (Filesystem Hierarchy Standard), the system must have at least two directories for library files:

/lib: the essential libraries of the distribution, needed by the programs in /bin and /sbin;

/usr/lib: the libraries needed by application programs in /usr/bin and /usr/sbin.

The header files corresponding to the libraries must be located in the /usr/include directory.

The loader will by default look for library files in these directories.

In addition, the system must have the directory /usr/local/lib, which holds libraries installed by the user independently, bypassing the package management system (not part of the distribution). For example, libraries compiled from source end up in this directory by default (programs installed from source go to /usr/local/bin and /usr/local/sbin; we are, of course, talking about binary distributions). The header files of such libraries go to /usr/local/include.

In some distributions the loader is not configured to search /usr/local/lib, so if the user installs a library from source, the system will not see it. Presumably the distribution's developers did this on purpose, to teach users to install software only through the package management system. What to do in this case is described below.

In fact, to simplify and speed up the search for library files, the loader does not scan the directories above every time; it uses a database stored in the file /etc/ld.so.cache (the library cache), which records where on the system the library file for a given soname is located. Having received the list of libraries required by a particular application (the list of sonames recorded in its executable), the loader uses /etc/ld.so.cache to determine the path to each required library file and loads it into memory. Before consulting the cache, the loader also searches the directories recorded in the executable's RPATH field (see above) and those listed in the LD_LIBRARY_PATH environment variable. LIBRARY_PATH, by contrast, is used only by gcc at link time.
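The search can be sketched as follows. This is a heavily simplified model of the glibc loader, assuming only RPATH, LD_LIBRARY_PATH and the default directories are consulted; the ld.so.cache lookup, RUNPATH, $ORIGIN and other rules of the real loader are left out, and find_library() and search_list() are hypothetical helpers.

```c
/* Heavily simplified model of the glibc loader's search for a NEEDED soname:
   RPATH, then LD_LIBRARY_PATH, then (skipped here) /etc/ld.so.cache, then the
   default directories. find_library() and search_list() are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try each directory of a colon-separated list. Note: strtok skips empty
   entries, whereas the real loader treats them as the current directory. */
static int search_list(const char *list, const char *soname, char *out, size_t outlen)
{
    if (list == NULL)
        return -1;
    char *copy = strdup(list);            /* strtok modifies its argument */
    if (copy == NULL)
        return -1;
    int result = -1;
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        int n = snprintf(out, outlen, "%s/%s", dir, soname);
        if (n >= 0 && (size_t)n < outlen && access(out, R_OK) == 0) {
            result = 0;
            break;
        }
    }
    free(copy);
    return result;
}

/* rpath: the RPATH string recorded in the executable (may be NULL). */
int find_library(const char *rpath, const char *soname, char *out, size_t outlen)
{
    if (search_list(rpath, soname, out, outlen) == 0)
        return 0;
    if (search_list(getenv("LD_LIBRARY_PATH"), soname, out, outlen) == 0)
        return 0;
    /* The real loader would consult /etc/ld.so.cache at this point. */
    return search_list("/lib:/usr/lib", soname, out, outlen);
}
```

find_library(".", "libhello.so.2", buf, sizeof buf) mirrors the -Wl,-rpath,. example: ./libhello.so.2 is a relative path, so it is found only when the program is started in the directory that contains the library.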

The ldconfig utility is used to manage the library cache and keep it up to date. When run, ldconfig scans the directories given on its command line, the trusted directories /lib and /usr/lib, and the directories listed in /etc/ld.so.conf. For each library file found in these directories, it reads the soname, creates a symbolic link named after the soname, and updates the information in /etc/ld.so.cache.
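The symlink step of ldconfig corresponds to the ln -s used earlier. Here is a hypothetical C helper doing just that step with the POSIX symlink() call; ldconfig itself also reads the soname from the file and updates the cache, which this sketch does not do.

```c
/* Hypothetical helper doing only the symlink part of ldconfig's work:
   dir/soname -> target, like "ln -s libhello.so.2.4.0.5 libhello.so.2". */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* target is stored as given, so a bare file name is resolved relative to dir.
   An existing entry named soname is replaced. Returns 0 on success, -1 on error. */
int make_soname_link(const char *dir, const char *target, const char *soname)
{
    char link_path[4096];
    int n = snprintf(link_path, sizeof link_path, "%s/%s", dir, soname);
    if (n < 0 || (size_t)n >= sizeof link_path)
        return -1;
    if (unlink(link_path) != 0 && errno != ENOENT)
        return -1;                        /* could not remove an old entry */
    return symlink(target, link_path);
}
```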

Let's make sure of what has been said:

$ ls
hello.h libhello.so libhello.so.2.4.0.5 main.c
$ gcc -Wall -o main main.c -L. -lhello
$ sudo ldconfig /full/path/to/directory/with/example
$ ls
hello.h libhello.so libhello.so.2 libhello.so.2.4.0.5 main main.c
$ ./main
First function...
Second function...
Main function...

The ldconfig call added our library to the cache (a later ldconfig call without arguments would remove it again, since our directory is not listed in /etc/ld.so.conf). Note that main was built without the -Wl,-rpath,. option, so the loader looked for the required libraries only in the cache.

Now it should be clear what to do if the system does not see a library installed from source. First, add the full path of the directory containing the library files (by default /usr/local/lib) to /etc/ld.so.conf. The format of /etc/ld.so.conf: the file contains a list of directories in which libraries are searched for, separated by colons, spaces, tabs or newlines. Then run ldconfig without any options, but with superuser rights. Everything should work.
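Reading such a file boils down to splitting it on those separators. A minimal sketch, assuming the format described above plus '#' comments; it does not handle the include directive that many current distributions use in /etc/ld.so.conf to pull in /etc/ld.so.conf.d/*.conf, and parse_ld_so_conf() is a hypothetical helper.

```c
/* Hypothetical helper: split ld.so.conf-style text into directory names.
   Separators: colon, space, tab, newline; '#' starts a comment.
   The "include" directive is not handled. */
#include <stddef.h>
#include <string.h>

/* Modifies text in place; stores up to max pointers into dirs and
   returns how many were stored. */
size_t parse_ld_so_conf(char *text, const char *dirs[], size_t max)
{
    /* Blank out every comment, from '#' to the end of its line. */
    for (char *p = strchr(text, '#'); p != NULL; p = strchr(p, '#'))
        while (*p != '\0' && *p != '\n')
            *p++ = ' ';

    size_t n = 0;
    for (char *tok = strtok(text, ": \t\n"); tok != NULL && n < max;
         tok = strtok(NULL, ": \t\n"))
        dirs[n++] = tok;
    return n;
}
```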

Finally, let's talk about how the static and dynamic versions of a library get along. What exactly is the issue? Above, when discussing the conventional names and locations of library files, we said that the files of the static and dynamic versions of a library are stored in the same directory. How does gcc know which kind of library we want to use? By default, the dynamic library is preferred: if the linker finds a dynamic library file, it links it to the program's executable without hesitation:

$ ls
hello.h libhello.a libhello.so libhello.so.2 libhello.so.2.4.0.5 main.c
$ gcc -Wall -c main.c
$ gcc -o main main.o -L. -lhello -Wl,-rpath,.
$ ldd main
linux-vdso.so.1 => (0x00007fffe1bb0000)
libhello.so.2 => ./libhello.so.2 (0x00007fd50370b000)
libc.so.6 => /lib/libc.so.6 (0x00007fd50336c000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd50390f000)
$ du -h main
12K main

Note the size of the program's executable: it is as small as it can be, because all the libraries it uses are linked dynamically.

gcc has a -static option, which tells the linker to use only the static versions of all the libraries the application needs:

$ gcc -static -o main main.o -L. -lhello
$ file main
main: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.15, not stripped
$ ldd main
is not a dynamic executable
$ du -h main
728K main

The executable is about 60 times larger than in the previous example, because the code of the C standard library is now included in the file. Our application can now be safely moved from directory to directory and even to other machines: the hello library code is inside the file, and the program is completely self-contained.

What if you need to link only some of the libraries statically? One possible solution is to give the static version of the library a name different from the shared one and to say, when linking the application, which version we want this time:

$ mv libhello.a libhello_s.a
$ gcc -o main main.o -L. -lhello_s
$ ldd main
linux-vdso.so.1 => (0x00007fff021f5000)
libc.so.6 => /lib/libc.so.6 (0x00007fd0d0803000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd0d0ba4000)
$ du -h main
12K main

Since libhello's code is tiny,

$ du -h libhello_s.a
4.0K libhello_s.a

the resulting executable is practically the same size as the one created with dynamic linking.

Well, that's probably all. Many thanks to everyone who read this far.


Compiler options for standards tracking

-std=: gives finer control over the standard used; its argument specifies exactly which standard is required. The main possible values are:

c89: support the C89 standard;

iso9899:1999: support the ISO C99 standard;

gnu89: support the C89 standard, but allow some GNU extensions and some C99 features. In gcc version 4.2 this is the default.
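One way to see the effect of -std= is the predefined macro __STDC_VERSION__, which is absent in C89/C90 mode and has values such as 199901L for C99. A small sketch; c_standard() is a hypothetical helper, and the GNU modes (gnu89, gnu99, ...) report the same value as the corresponding ISO mode.

```c
/* Report the C standard in effect, based on the predefined macro
   __STDC_VERSION__. c_standard() is a hypothetical helper. */
const char *c_standard(void)
{
#if !defined(__STDC_VERSION__)
    return "C89/C90";                 /* -std=c89, -std=gnu89, -ansi */
#elif __STDC_VERSION__ >= 201710L
    return "C17 or later";
#elif __STDC_VERSION__ >= 201112L
    return "C11";
#elif __STDC_VERSION__ >= 199901L
    return "C99";                     /* -std=c99, -std=iso9899:1999, -std=gnu99 */
#else
    return "C90 with Amendment 1";    /* -std=iso9899:199409 (value 199409L) */
#endif
}
```

Compiling the same file with gcc -std=c89, -std=gnu89 and -std=iso9899:1999 and printing c_standard() should show C89/C90, C89/C90 and C99 respectively.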

Options for tracking standards with #define directives

__STRICT_ANSI__: defined by the compiler when the -ansi option, or a -std= value requesting strict ISO conformance (such as -std=c99), is given on the command line; the system headers then restrict themselves to ISO standard C.

_POSIX_C_SOURCE=2: enables the functionality defined in IEEE Std 1003.1 and 1003.2. We will return to these standards later in this chapter.

_BSD_SOURCE: enables BSD functionality. Where it conflicts with POSIX definitions, the BSD definitions take precedence. (Since glibc 2.20, _BSD_SOURCE is deprecated in favor of _DEFAULT_SOURCE.)

_GNU_SOURCE: enables a wide range of features and functions, including GNU extensions. Where these definitions conflict with POSIX definitions, the latter take precedence.
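A feature-test macro only works if it is defined before the first #include, because glibc evaluates these macros when <features.h> is first included. A short sketch, assuming glibc: _POSIX_C_SOURCE (here with the POSIX.1-2008 value 200809L rather than 2) makes strdup() available even in strict ISO mode, and __STRICT_ANSI__ reveals which mode the compiler is in. copy_string() and strict_iso_mode() are hypothetical helper names.

```c
/* Feature-test macros must come before the first #include. */
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#include <stdlib.h>
#include <string.h>

/* strdup() is POSIX (ISO C only gained it in C23); thanks to the macro
   above it is declared even under -std=c99. Caller frees the result. */
char *copy_string(const char *s)
{
    return strdup(s);
}

/* 1 under -ansi or a strict -std=cNN, 0 under the GNU modes. */
int strict_iso_mode(void)
{
#ifdef __STRICT_ANSI__
    return 1;
#else
    return 0;
#endif
}
```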

Compiler options for warning output

These options are passed to the compiler on the command line. Again, we list only the main ones; the full list can be found in the gcc online reference manual.
