Concepts needed for UNIX SYSTEM PROGRAMMING

Copyright (c) 2002, Gene Cooperman
Rights to copy and distribute for non-commercial purposes only are freely
granted, but only as long as this copyright statement remains.

UNIX help

  The table of contents is:  man 1 intro, man 2 intro, man 3 intro, etc.
  The index is done with something like:  apropos stream
  Section 1 (UNIX commands), Section 2 (System calls),
  Section 3 (C library calls), etc.
  READ THEM.  `man 2 intro', `man ps' and `man proc'
    Solaris:  man -s 2 intro
    SGI:      man 2 intro
    Linux:    Where?  (Linux doesn't appear to have the detailed info in intro)

  For documentation, consider:  info, man, apropos, google on the web
  info:  info is an older hypertext documentation system from before the web.
    It still has some of the best documentation on core UNIX/Linux tools.
    CCIS is missing some of the common info pages, but most Linux
    distributions have them.
    Try:  `info info' for a help page on how to get started.
  emacs:  Try:  `emacs' and then type:  ^Ht  (where ^H is "control"-H)
  Consider also:
    grep KEYWORD /usr/man/man?/*
  For documentation on includes, try:
    grep KEYWORD /usr/include/*
    grep KEYWORD /usr/include/*/*
    find /usr/include -name '*.h' -exec grep KEYWORD {} \;
    find ~/. -name '*PARTIAL-FILENAME*'
    etc.
  For documentation on what shell scripts do internally, try:
    sh -x SHELL_SCRIPT PARAMS
    sh -v SHELL_SCRIPT PARAMS
  For documentation on what type a file is:
    file (1) - determine file type

Network, and different hosts:

  nslookup     nslookup (1m) - query name servers interactively
  ping         ping (1m) - send ICMP (ICMP6) ECHO_REQUEST packets to network hosts
  traceroute   traceroute (8) - print the route packets take to network host
  uname        uname (1) - print system information
               [ Especially, try:  `uname -a' ]

Processes

  READ QUICKLY FROM BEGINNING TO END ONCE.  THE CONCEPTS WILL REMAIN IN YOUR
  MIND IN CASE YOU NEED TO REFER BACK LATER.
  On non-Linux UNIX, `man 2 intro' or `man -s 2 intro' has a detailed
  explanation of processes.
  READ man 2 intro  [ VERY IMPORTANT ]
  ps           ps (1) - report process status
               proc (5) - process information pseudo-filesystem

Handling libraries, executables, and linking

  READ `man gcc', `man ld', `man a.out', `man core'.
  READ QUICKLY FROM BEGINNING TO END ONCE.  THE CONCEPTS WILL REMAIN IN YOUR
  MIND IN CASE YOU NEED TO REFER BACK LATER.

  Object files (executables, a.out, and sometimes libraries):
    nm (1)      - list symbols from object files.
    objcopy (1) - copy and translate object files
    objdump (1) - display information from object files.
                  [ For disassembly, compile with -g, and `objdump -dglrC' ]
    size (1)    - print section sizes in bytes of object files
    strip (1)   - Discard symbols from object files, makes file smaller.
    strings (1) - print the strings of printable characters in files
                  (For example:  strings a.out | grep /
                   inspects all references to filepaths with '/' in a binary)
  Libraries:
    ld (1) - the linker
             (Try `man gcc' or `info gcc' to see how gcc calls `ld')
    Static libraries (.a):
      ar (1)     - create, modify, and extract from archive (from .a file).
      ranlib (1) - generate index to archive
                   (not required on some operating systems)
    Dynamic (shared) libraries (.so):
      ld.so (8)    - a.out dynamic linker/loader
      ldconfig (8) - determine run-time link bindings  [ LINUX ONLY? ]
      ldd (1)      - print shared library dependencies
      Linux:    ld.so/ld-linux.so (8) - dynamic linker/loader
      Solaris:  ld.so.1 - runtime linker for dynamic objects
    READ ESPECIALLY ABOUT THE LD_LIBRARY_PATH ENVIRONMENT VARIABLE.
    YOU WILL NEED IT TO CONFIGURE SOME SOFTWARE AT RUNTIME.
    If you forget the name LD_LIBRARY_PATH, try `printenv | grep PATH'

Debugging source code:

  GNU gdb OR SOME OTHER SYMBOLIC DEBUGGER IS ESSENTIAL.
  If you use C++, try typing 'foo( and then the TAB key in gdb; the TAB
  tells gdb to show all completions.
  READ GDB HELP:
    gdb
    (gdb) help
    (gdb) help breakpoints
  SIMILARLY, READ help on:
    (gdb) help:  data, files, internals, running, stack, status, user-defined
  READ QUICKLY FROM BEGINNING TO END ONCE.  THE CONCEPTS WILL REMAIN IN YOUR
  MIND IN CASE YOU NEED TO REFER BACK LATER.

Debugging calls to libraries:

  truss        truss (1) - trace system calls and signals
  sotruss      sotruss (1) - trace shared library procedure calls
  ptrace       ptrace (2) - allows a parent process to control the execution
               of a child process
  apptrace     apptrace (1) - trace application function calls to Solaris
               shared libraries

Segmentation fault at beginning of execution:

  Do:  limit
  Maybe stacksize or datasize or memoryuse set too small.
  [ Try:  limit datasize 100M ]

FILE MANIPULATION

  locate       locate (1) - list files in databases that match a pattern
  find         find (1) - search for files in a directory hierarchy
  grep         grep (1) - print lines matching a pattern
               [ On CCS, note that /arch/gnu/bin/grep has additional options,
                 especially grep -B 2 -A 3 ...
                 [ also show 2 lines Before and 3 lines After ] ]

  KNOW WHAT FILES ARE ON THE SYSTEM.  IF locate WORKS, USE IT
  (locate database is out-of-date for CCS when I last checked)
  YOU CAN CREATE YOUR OWN DATABASE OF FILES SIMILAR TO locate:
    # WARNING:  This can use many megabytes of disk space.
    echo "" > /tmp/mylocate.db
    find /usr -exec echo {} >> /tmp/mylocate.db \; &
    find /share -exec echo {} >> /tmp/mylocate.db \; &
    find /etc -exec echo {} >> /tmp/mylocate.db \; &
    find /sbin -exec echo {} >> /tmp/mylocate.db \; &
    find /net -exec echo {} >> /tmp/mylocate.db \; &
    wait     # let the background finds finish before compressing
    gzip /tmp/mylocate.db
    gzip -dc /tmp/mylocate.db.gz | grep ANY_FILE

  which        which (1) - locate a command; display its pathname or alias
  file         file (1) - determine file type
  less         less (1) - opposite of more
  head         head (1) - display first few lines of files
  tail         tail (1) - deliver the last part of a file  [ NOTE ESPECIALLY:  tail -f ...
               ]
  cat          cat (1) - concatenate and display files
  echo         echo (1) - echo arguments
  diff         diff (1) - display line-by-line differences between pairs of
               text files
  xemacs:  LOOK AT MENU:  Tools > Compare > Two Files
           [ Make backup of files before experimenting with this. ]
  cat -v:  I DO:  alias see cat -v

  GET USED TO USING grep, diff, less, head, tail, cat, echo, file, which, find
  AND * (wild cards) FOR FAST SEARCHING FOR INFORMATION.
  CHECK OUT `cat -v' (I alias it to see) TO MAKE NON-PRINTING CHARACTERS
  VISIBLE.

  AS AN EXAMPLE, ASSUME WE ARE LOOKING FOR INFORMATION IN THE man FILES ABOUT
  zombie PROCESSES.  IN LINUX, FILES ARE STORED AS .gz.  IN OTHER UNIXES, YOU
  MAY WANT TO REPLACE gzcat BY cat.
  FOR EXAMPLE, IF /usr/man/man2 stores its files as .gz (as on Linux):
    gzcat /usr/man/man2/*.gz | grep -B3 -A3 zombie
  ABOVE, I USED THE GNU/Linux VERSION OF GREP, WITH -B and -A for context.
  ANOTHER METHOD IS:
    find /usr/man -type f \
      -exec sh -c 'gzcat "$1" | cat -v | grep zombie' sh {} \; -print
  [ find does not run a shell itself, so the pipeline is handed to `sh -c'.
    With grep last, -print shows only the names of files that matched. ]
  IF IN csh, OR tcsh, YOU CAN DO THE FOLLOWING TO LOOK AT THE FIRST 10 LINES
  OF EVERY man FILE:
    foreach file (/usr/man/man2/*.gz)
      gzcat $file | head -10
    end

Disk space:

  THE FIRST THINGS TO CHECK ARE:  `df -k', `quota -v', and `man du'
  KNOW ABOUT /scratch (on CCS, it's called /ccs/tmp)
  KNOW THE DIFFERENCE BETWEEN /tmp and /var/tmp (or /usr/tmp)

TRICKS FOR UNDERSTANDING BINARIES:

  To make the binary smaller,
    strip a.out
    [ None of the symbols will be there.  To test:  strings a.out | grep main ]
  Was the binary compiled with "-g"?
    # Test if path to include files was saved:
    strings a.out | grep /usr/include
    # Test if path to working directory of source file was saved:
    strings a.out | grep WORKING_DIR_WHERE_COMPILED
    # Test if main() routine has location information:
    strings a.out | grep main
    [ Is there something like:  main:F(0,1) ]
  Was the binary dynamically linked or statically linked with other libraries:
  (dynamically linked is shared library, known as "dll" in Microsoft world)
    ldd a.out
      libc.so.6 => /lib/libc.so.6 (0x4001b000)
      /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
  Are the dynamic libraries themselves dynamically linked with libraries:
    ldd /lib/libc.so.6
    [ Note, dynamic library ends in .so.VERSION_NUMBER and static library
      ends in .a ]

LINKING:

  gcc will automatically invoke a link if the `-c' option was not specified.
  If the required library is a static library, at link time the linker will
  search for any unresolved symbols in the libraries on the command line,
  specified by `-l'.  If static libraries are used, at link time you will see
  an error if a symbol cannot be found in one of the libraries.
  The gcc option, `-L', specifies a directory in which to search for
  libraries at link time.
  If the required library is a dynamic library, the search path for dynamic
  libraries is determined at runtime by:  LD_LIBRARY_PATH
  See `ldd' for tracing dependencies of an object file on dynamic libraries.

  Are there variables created by the C compiler pointing to the beginning and
  end of the data segment?
    nm a.out | grep data
      0804a5c0 D __data_start
      0804a6b8 A _edata
      0804a5c0 W data_start
  Similar ideas apply for "bss" (uninitialized data segment), and "text"
  (code segment).

  Understand the relationship of the preprocessor, compiler, assembler,
  linker, run-time library (libc.a), and include files.  Read carefully the
  man page and/or info or manual on web.
  Study the command-line options for gcc, as, ld:
    cpp   http://gcc.gnu.org/onlinedocs/cpp/
          (preprocessor, can also be invoked in GNU as `gcc -E')
    gcc   http://www.gnu.org/software/gcc/onlinedocs/
          [ Note that gcc searches for missing symbols by scanning its
            command line _once_ only from left to right for clues. ]
    as    (To generate assembly in GNU, `gcc -S';
           To assemble and link a file:  gcc file.s)
    ld    (To link in GNU, just:  gcc file1.o file2.o file3.a )
    C library   http://www.gnu.org/manual/glibc-2.0.6/libc.html
          (e.g.:  libc.a, glibc.a (GNU libc), the version number is important)
    include files   (/usr/include/, must correspond to version of libc and
          to local architecture)

  Understand and use `gdb' for debugging.  It's essential for productivity.

  Understand ar and the concepts behind an archive file (.a file or .so file).
  Note that the archive files are file formats for libraries.  Each original
  file is compiled into a module in a .a or .so file.  When linking, one links
  all symbols in a module or none at all.

RUNNING:

  gdb --args COMMAND-LINE
    > run
    > where
    [ To see in what routine a binary is crashing.
      See elsewhere for more on gdb. ]
  strace COMMAND-LINE
    shows results of all system calls to kernel as the binary executes.
    When a binary is crashing and gdb doesn't help, try this.

Binary crashes:

  echo $status  (a value above 128 means 128 + the number of the signal that
                 killed the process; for errors returned by system calls,
                 see errno, perror, strerror)
  quota -v      (overran quota)
  df            (what disks are available, how much disk space?)
  limit         (Resources)
  top           - overran virtual memory

Terminal messed up:

  vt100 works on all terminals;
    Try (in sh or bash):   term=vt100; TERM=vt100; export TERM
    Try (in csh or tcsh):  set term = vt100; setenv TERM vt100;
  If backspace or delete is not working, look at `stty'
    stty -a   to print all parameters.
    Look at things like  erase  rows  columns
    Use stty to change them:
      stty erase '^H'   (backspace key)
      stty erase '^?'   (delete key)
      stty rows 24 ...
      and so on ...;  man stty for details

Creating a public distribution:

  KNOW ABOUT `gzip -dc | tar tvf -' AND `gzip -dc | tar xvf -'
    AND  tar cvf mydir.tar ./mydir; gzip mydir.tar
  (If in GNU/Linux, `tar zxvf -' exists for .gz files.)

  TERMINOLOGY:  source directory, build directory, install directory
    [ These can all be the same or different.  The build directory is where
      all the compiling and linking takes place.  The install directory is
      where the public files (binaries, etc.) for multiple users are placed.
      Use of the install directory will require root privilege, but ideally
      the other directories will not require root privilege. ]

  make (Makefile)
    [ Typical targets for make:  install, clean, distclean, dist
      (First target is default, usually it does the build.) ]
  automake
    [ optional, useful only in larger distributions with complicated build
      dependencies ]
  autoheader
    [ optional, generates config.h.in; config.h can then be included in all
      source files, and it defines the configuration macros that the source
      files test ]
  autoscan
  autoconf (./configure configure.in)
  configure (run by end-user for configuring and installing.)
    Most important options:
      ./configure --help
      ./configure --prefix=INSTALL_DIRECTORY
    It will then typically modify Makefile, so that `make install' will
    create and/or use one or more of:
      INSTALL_DIRECTORY/bin
      INSTALL_DIRECTORY/lib
      INSTALL_DIRECTORY/man
      INSTALL_DIRECTORY/info
      INSTALL_DIRECTORY/include

  Files used in preparing a software package for distribution:

     your source files --> [autoscan*] --> [configure.scan] --> configure.in

     configure.in --.   .------> autoconf* -----> configure
                    +---+
     [aclocal.m4] --+   `---.
     [acsite.m4] ---'       |
                            +--> [autoheader*] -> [config.h.in]
     [acconfig.h] ----.     |
                      +-----'
     [config.h.top] --+
     [config.h.bot] --'

     Makefile.in -------------------------------> Makefile.in

  Files used in configuring a software package:

                            .-------------> config.cache
     configure* ------------+-------------> config.log
                            |
     [config.h.in] -.       v            .-> [config.h] -.
                    +--> config.status* -+               +--> make*
     Makefile.in ---'                    `-> Makefile ---'

Browsing source code:

  ctags (for vi), etags (for emacs)
  In emacs:  M-. to search for symbol, M-, to continue searching
    ("M-." means type meta or escape key, then type ".")
  For other emacs commands:  M-x tags-SPACE
    (SPACE forces emacs to show all possible completions.)
  For help on functions, try something like:  C-h f tags-query-replace

RETURN VALUES FROM SYSTEM CALLS

  Section 2 of the UNIX manual is for system calls (direct calls to the
  kernel), and Section 3 is for calls to the C library (and sometimes to other
  libraries that must be linked).  This is why we have both:
    man 2 read   (Section 2, direct kernel call using file descriptors)
    man 3 fread  (Section 3, C library version using streams)
  Most system calls and C library calls return -1 or NULL for error.
  Check the man page on this.  Most system calls return -1 if there was an
  error, and perror is used to report the error.  For example
  (FILENAME is a placeholder):
    #include <stdio.h>     /* perror */
    #include <fcntl.h>     /* open */
    if ( -1 == open(FILENAME, O_RDONLY) )
      perror("open");

SHELL

  READ:  man sh
  This describes the Bourne shell.  It is short enough to read comfortably,
  and is the shell of choice for writing shell scripts.  You can later skim
  `man tcsh' or whatever other shell you use interactively.

EMACS and LATEX

  Both are important in the UNIX culture.  In emacs, type ^H (Control-H), and
  then start exploring from there according to what interests you.
  For LATEX, there are several short introductions.
  Try:  google introduction latex, and choose one you like.
  Here's a reasonable choice:
    A Gentle Introduction to TeX (TeX format)
      ftp://ctan.tug.org/tex-archive/documentation/gentle.tex
    The Not So Short Introduction to LaTeX (.zip)
      ftp://ctan.tug.org/tex-archive/documentation/lshort/english/src.zip
  Excellent reference for LaTeX when you know it:
    http://www.giss.nasa.gov/latex/ltx-2.html
    (Try 'locate ltx-2.html' in Linux; This is part of the tetex distribution)
  In both cases, if you search on the web, you'll find versions already in
  .pdf or in .ps format.
  One nice trick I use is at the beginning of a LaTeX file, I write:
    % -*- mode: latex; mode: auto-fill -*-
  This is a TeX comment, but emacs reads it, and emacs sets its modes for
  latex and auto-fill inside the emacs editor.

  For manipulating text and converting formats, read about:
    dvips, pstops, psselect, dviselect (if on your system), pdflatex, pdftex,
    pstotext, ps2ascii, ps2pdf, ps2ps, and others.
  See also dos2unix (removes extra carriage-returns inserted by PC software;
  this is a special case of the more general software, recode).
  For converting from Microsoft .doc (Word), try:
    wvWare or StarOffice or OpenOffice