The POSIX shell is a descendant of the KornShell, which was a descendant of the Bourne shell. The basic syntax has remained the same, and Bourne shell scripts will usually run successfully in a POSIX shell. Not all of the KornShell features are included in POSIX (there are no arrays, for example), but the most important ones are. Those, for me, are string manipulation and arithmetic.
The scripts in this book make extensive use of features of the POSIX shell, and keep external commands to a minimum. This chapter presents an overview of the features of the POSIX shell and the external Unix commands used in this book, without going into great detail. Further information is available in the documentation for the shell and the various commands as well as on many web pages, a number of which are listed in the Appendix. I will also explain some of the idiosyncrasies used in the scripts, and present a library of functions that are used by many scripts.
These descriptions are brief overviews of the built-in commands; for a complete
description, see your shell's man page.
1 echo
The echo command prints its arguments separated by single spaces followed by a
newline. If an unquoted variable contains characters present in $IFS (see
"Parameters and Variables" later in this chapter), then the variable will be
split on those characters:
$ list="a   b  c d e    f g h"
$ echo $list
a b c d e f g h
If the variable is quoted, all internal characters will be preserved:
$ echo "$list"
a   b  c d e    f g h
In the early days of Unix, two different versions of echo appeared. One
version converted escape sequences, such as \t and \n, into the characters they
represent in the C language; \c suppressed the newline, and discarded any
further characters. The other used the -n option to suppress the trailing
newline and did not convert escape sequences. The POSIX standard for echo says
that "Implementations shall not support any options" and "If the first operand
is -n, or if any of the operands contain a backslash ( '\' ) character, the
results are implementation-defined." In other words, you cannot rely on echo's
behavior being one or the other. It is best not to use echo unless you know
exactly what echo is going to print, and you know that it will not contain any
problem characters. The preferred command is printf.
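As a minimal sketch of why printf is the safer choice, the following lines print a string containing a backslash without depending on either flavor of echo. The variable names prompt and out are illustrative only and are not used elsewhere in this book.

```shell
# A string containing a backslash, which echo may or may not interpret
prompt='Enter a path such as C:\temp: '

# "%s" prints its argument literally; no newline is added unless the
# format asks for one (the portable counterpart of "echo -n")
printf "%s" "$prompt"
printf "\n"

# Capture the output to confirm that the backslash sequence is untouched
out=$(printf "%s" 'a\tb')
printf "%s\n" "$out"
```

Because the data is passed as an argument rather than as part of the format, nothing in it is treated as an escape sequence or an option.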
2 printf
This command may be built into the shell itself, or it may be an external
command. Like the C-language function on which it is based, printf takes a
format operand that describes how the remaining arguments are to be printed, and
any number of optional arguments. The format string may contain literal
characters, escape sequences, and conversion specifiers. Escape sequences (the
most common ones being \n for newline, \t for tab, and \r for carriage return)
in format will be converted to their respective characters. Conversion
specifiers, %s, %b, %d, %x, and %o, are replaced by the corresponding argument
on the command line. Some implementations support other specifiers, but they are
not used in this book. When there are more arguments than specifiers, the format
string is reused until all the arguments have been consumed.
The %s specifier interprets its argument as a string and prints it literally:
$ printf "%s\n" "qwer\ty" 1234+5678
qwer\ty
1234+5678
The %b specifier is like %s, but converts escape sequences in the argument:
$ printf "%b\n" "qwer\ty" "asdf\nghj"
qwer y
asdf
ghj
The %d, %x, and %o specifiers print their arguments as decimal, hexadecimal,
and octal numbers, respectively.
$ printf "%d %x %o\n" 15 15 15
15 f 17
The conversion specifiers may be preceded by flags for width specification,
optionally preceded by a minus sign indicating that the conversion is to be
printed flush left, instead of flush right, in the specified number of columns:
$ printf "%7d:\n%7s:\n%-7s:\n" 23 Cord Auburn
     23:
   Cord:
Auburn :
In a numeric field, a 0 before the width flag indicates padding with zeroes:
$ printf "%07d\n" 13
0000013
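The reuse of the format string and the width flags can be combined to lay out a small table. This is only a sketch; the variable name table and the sample data are made up for the illustration.

```shell
# One format, six arguments: the format is applied twice.
# %-8s left-justifies in 8 columns, %5d right-justifies in 5 columns,
# and %03d pads with zeroes to 3 digits.
table=$(printf "%-8s|%5d|%03d\n" apples 12 7 pears 3 42)
printf "%s\n" "$table"
```

Each group of three arguments consumes the format once, so the number of rows depends only on how many arguments are supplied.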
3 set
In the Oxford English Dictionary, the longest entry is for the word set, which
runs to thirty-two pages in my Compact Edition. In the Unix shell, the set
command is really
three commands in one. Without any arguments, it prints the names and values of
all shell variables (including functions). With one or more option arguments, it
alters the shell's behavior. Any non-option arguments are placed in the
positional parameters.
Only three options to set are used in this book:
* -v: Print shell input lines as they are read.
* -x: Print commands and their arguments as they are executed.
* -f: Disable file name generation (globbing).
Given this script, which I call xx.sh:
echo "Number of positional parameters: $#"
echo "First parameter: ${1:-EMPTY}"
shift $(( $# - 1 ))
echo "Last parameter: ${1:-EMPTY}"
Its output is:
$ xx.sh the quick brown fox
Number of positional parameters: 4
First parameter: the
Last parameter: fox
If set -v is added to the top of the script, and the standard output
redirected to oblivion, the script itself is printed:
$ xx.sh the quick brown fox >/dev/null
echo "Number of positional parameters: $#"
echo "First parameter: ${1:-EMPTY}"
shift $(( $# - 1 ))
echo "Last parameter: ${1:-EMPTY}"
If set -v is replaced with set -x, variables and arithmetic expressions are
replaced by their values when the lines are printed; this is a useful debugging
tool:
$ xx.sh the quick brown fox >/dev/null
++ echo 'Number of positional parameters: 4'
++ echo 'First parameter: the'
++ shift 3
++ echo 'Last parameter: fox'
++ exit
To demonstrate the set -f option, and the use of + to reverse the operation, I
ran the following script in an empty directory:
## Create a number of files using brace expansion (bash, ksh)
touch {a,b,c,d}${RANDOM}_{e,r,g,h}${RANDOM}
## Turn off filename expansion
set -f
printf "%-22s%-22s%-22s\n" * ; echo ## Display asterisk
printf "%-22s%-22s%-22s\n" *h* ; echo ## Display "*h*"
## Turn filename expansion back on
set +f
printf "%-22s%-22s%-22s\n" * ; echo ## Print all filenames
printf "%-22s%-22s%-22s\n" *h* ; echo ## Print filenames containing "h"
When the script is run, this is the output:
$ xx.sh
*
*h*
a12603_e6243 a28923_h23375 a29140_r28413
a5760_g7221 b17774_r4121 b18259_g11343
b18881_e10656 b660_h32228 c22841_r19358
c26906_h14133 c29993_g6498 c6576_e25837
d11453_h12972 d25162_e3276 d7984_r25591
d8972_g31551
a28923_h23375 b660_h32228 c26906_h14133
d11453_h12972
You can use set to split strings into pieces by changing the value of $IFS.
For example, to split a date, which could be 2005-03-01 or 2003/09/29 or
2001.01.01, $IFS can be set to all the possible characters that could be used as
separators. When an unquoted variable is expanded, the shell performs word
splitting on any character contained in $IFS (a literal word typed on the
command line is not split):
$ IFS=' -/.'
$ date=2005-03-01
$ set $date
$ printf "%s\n" "$@"
2005
03
01
When the value to be set is contained in a variable, a double dash should be
used to ensure that the value of the variable is not taken to be an option:
$ var="-f -x -o"
$ set -- $var
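The pieces described above (changing $IFS, set --, and set -f) can be put together in a small function. This is a sketch under a few assumptions: the function name split_date and the variables _YEAR, _MONTH, _DAY, and old_ifs are hypothetical, $IFS is assumed to be set when the function is called, and filename expansion is assumed to be wanted again afterward.

```shell
split_date() {
    old_ifs=$IFS          # remember the current separators
    IFS=' -/.'            # any of these characters may separate the fields
    set -f                # no globbing while the value is expanded unquoted
    set -- $1             # split the date into the positional parameters
    set +f
    IFS=$old_ifs          # put the separators back
    _YEAR=$1 _MONTH=$2 _DAY=$3
}

split_date 2003/09/29
printf "%s\n" "$_YEAR" "$_MONTH" "$_DAY"
```

Inside a function, set replaces only the function's own positional parameters; those of the calling script are left alone.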
4 shift
The leading positional parameters are removed, and the remaining parameters are
moved up. By default, one parameter is removed, but an argument may specify
more:
$ set 1 2 3 4 5 6 7 8
$ echo "$* ($#)"
1 2 3 4 5 6 7 8 (8)
$ shift
$ echo "$* ($#)"
2 3 4 5 6 7 8 (7)
$ shift 3
$ echo "$* ($#)"
5 6 7 8 (4)
Some shells will complain if the argument to shift is larger than the number
of positional parameters.
5 type
The POSIX standard says type "shall indicate how each argument would be
interpreted if used as a command name." Its return status may be used to
determine whether a command is available:
if type stat > /dev/null 2>&1 ## discard the output
then
stat "$file"
fi
If the command is an executable file, type prints the path to the file;
otherwise, it prints the type of command that will be invoked: function, alias,
or shell builtin. Its output is not standard across different shells, and
therefore cannot be used reliably in a shell script.
The four arguments to type in the following example represent an executable
file, a function, a nonexistent command, and an alias.
$ type ls pr1 greb ecoh
ls is hashed (/bin/ls)
pr1 is a function
pr1 ()
{
case $1 in
-w)
pr_w=
;;
*)
pr_w=-.${COLUMNS:-80}
;;
esac;
printf "%${pr_w}s\n" "$@"
}
bash: type: greb: not found
ecoh is aliased to `echo'
Unlike most shells, bash will print the definition of a function.
6 getopts
The command getopts parses the positional parameters according to a string of
acceptable options. If an option is followed by a colon, an argument is expected
for that option, and will be stored in $OPTARG. This example accepts -a, -b, and
-c, with -b expecting an argument:
while getopts ab:c opt
do
case $opt in
a) echo "Option -a found" ;;
b) echo "Option -b found with argument $OPTARG" ;;
c) echo "Option -c found" ;;
*) echo "Invalid option: $opt"; exit 5 ;;
esac
done
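After the options have been parsed, they are usually removed so that only the remaining arguments are left; getopts keeps the index of the next argument in $OPTIND for this purpose. The following is a sketch only: the function parse_opts and the variables verbose, outfile, and remaining are invented for the example.

```shell
parse_opts() {
    OPTIND=1                   # reset so the function can be called again
    verbose=0
    outfile=
    while getopts vo: opt
    do
        case $opt in
            v) verbose=1 ;;
            o) outfile=$OPTARG ;;
            *) return 1 ;;     # getopts has already printed a message
        esac
    done
    shift $(( $OPTIND - 1 ))   # discard the options that were parsed
    remaining=$*
}

parse_opts -v -o out.txt file1 file2
printf "%s\n" "verbose=$verbose outfile=$outfile remaining=$remaining"
```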
7 case
A workhorse among the shell's built-in commands, case allows multiple branches,
and is the ideal tool, rather than grep, for determining whether a string
contains a pattern or multiple patterns. The format is:
case STRING in
PATTERN [| PATTERN ...]) [list] ;;
[PATTERN [| PATTERN ...]) [list] ;; ...]
esac
The PATTERN is a pathname expansion pattern, not a regular expression, and the
list of commands following the first PATTERN that matches is executed. (See the
"Patterns" section further on for an explanation of the two types of pattern
matching.)
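As an illustration of using case instead of grep to test whether one string contains another, here is a minimal sketch. The function name contains is hypothetical; quoting "$2" inside the pattern makes its contents match literally rather than as a pattern.

```shell
contains() {
    case $1 in
        *"$2"*) return 0 ;;   # $2 appears somewhere in $1
        *) return 1 ;;
    esac
}

if contains "usr/local/bin" "local"
then
    printf "%s\n" "found"
fi
```

No external command is run, so the test costs almost nothing even inside a loop.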
8 eval
The command eval causes the shell to evaluate the rest of the line, then execute
the result. In other words, it makes two passes at the command line. For
example, given the command:
eval "echo \${$#}"
The first pass will generate echo ${4} (assuming that there are 4 positional
parameters). This will then print the value of the last positional parameter,
$4.
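The same two-pass technique can store the last parameter in a variable instead of printing it. This is a sketch; the names last_arg and _LAST_ARG are assumptions made for the example.

```shell
last_arg() {
    [ $# -gt 0 ] || return 1     # nothing to do without arguments
    ## First pass: \${$#} becomes ${4} (for four arguments)
    ## Second pass: the assignment _LAST_ARG=${4} is executed
    eval "_LAST_ARG=\${$#}"
}

last_arg the quick brown fox
printf "%s\n" "$_LAST_ARG"
```

Only the parameter count is expanded in the first pass; the value itself is never re-parsed, so it may safely contain spaces or other special characters.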
9 local
The local command is used in functions; it takes one or more variables as
arguments and makes those local to the function and its children. Though not
part of the POSIX standard, it is built into many shells; bash and the ash
family have it, and pdksh has it as a standard alias for typeset (which is also
not included in POSIX). In KornShell 93 (generally referred to as ksh93), if a
function is defined in the portable manner (as used throughout this book), there
is no way to make a variable local to a function.
In this book, local is used only in the few scripts that are written
specifically for bash, most often for setting $IFS without having to restore it
to its original value:
local IFS=$NL
Parameters and Variables
Parameters are names used to represent information. There are three classes of
parameters: positional parameters are the command-line arguments, and are
numbered beginning with $1; variables are parameters denoted by a name that
contains only letters, numbers and underscores, and that begins with a letter or
an underscore; and special parameters are represented by non-alphanumeric
characters.
1 Positional Parameters
Positional parameters are the command-line arguments passed to a script or a
function, and are numbered beginning with 1. Parameters greater than 9 must be
enclosed in braces: ${12}. This is to preserve compatibility with the Bourne
shell, which could only access the first nine positional parameters; $12
represents the contents of $1, followed by the number 2. The positional
parameters can be assigned new values, with the set command. (See the example
under "Special Parameters.")
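The difference between $12 and ${12} is easy to see with twelve parameters. The variable names wrong and right are used only for this sketch.

```shell
set -- a b c d e f g h i j k l

wrong="$12"      # $1 followed by a literal 2
right="${12}"    # the twelfth positional parameter

printf "%s %s\n" "$wrong" "$right"
```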
2 Special Parameters
The parameters $* and $@ expand to all the positional parameters, and $#
represents the number of positional parameters. This function demonstrates the
features of these parameters:
demo()
{
printf "Number of parameters: %d\n" $#
printf " The first parameter: %s\n" "$1"
printf "The second parameter: %s\n" "$2"
printf "\nAll the parameters, each on a separate line:\n"
printf "\t%s\n" "$@"
printf "\nAll the parameters, on one line:\n"
printf "\t%s\n" "$*"
printf "\nEach word in the parameters on its own line:\n"
printf "\t%s\n" $*
}
Here, the demo function is run with three arguments:
$ demo The "quick brown" fox
Number of parameters: 3
The first parameter: The
The second parameter: quick brown
All the parameters, each on a separate line:
The
quick brown
fox
All the parameters, on one line:
The quick brown fox
Each word in the parameters on its own line:
The
quick
brown
fox
The decimal exit code of the previous command executed (0 for success,
non-zero for failure) is stored in $?:
$ true; echo $?
0
$ false; echo $?
1
The shell's current option flags are stored in $-; the shell's process ID is
in $$; $! is the process ID of the most recently executed background command,
and $0 is the name of the current shell or script:
$ sleep 4 &
[1] 12725
$ printf "PID: %d\nBackground command PID: %d\n" $$ $!
PID: 12532
Background command PID: 12725
$ printf "Currently executing %s with options: %s\n" "$0" "$-"
Currently executing bash with options: fhimBH
3 Shell Variables
These are the variables that are assigned values at the command line or in a
script. The system or the shell itself also sets a number of variables; those
used in this book are:
* $HOME: The path name of the user's home directory (e.g., /home/chris).
* $IFS: A list of characters used as internal field separators for
word splitting by the shell. The default characters are space, tab, and
newline. Strings of characters can be broken up by changing the value of
$IFS:
$ IFS=-; date=2005-04-11; printf "%s\n" $date
2005
04
11
* $PATH: This colon-separated list of directories tells the shell
which directories to search for a command. To execute a command in other
directories, including the current working directory, an explicit path must
be given (/home/chris/demo_script or ./demo_script, not just demo_script).
* $PWD: This is set by the shell to the pathname of the current
working directory:
$ cd $HOME && echo $PWD
/home/chris
$ cd "$puzzles" && echo $PWD
/data/cryptics
2 standard-vars: A Collection of Useful Variables
My standard-vars file begins with these lines:
NL='
'
CR='
'
TAB=' '
You might be able to guess that these three variables represent newline,
carriage return, and tab, but it's not clear, and cannot be cut and pasted from
a web site or newsgroup posting. Once those variables are successfully assigned,
however, they can be used, without ambiguity, to represent those characters. The
standard-vars file is read by the shell and executed in the current environment
(known as sourcing, it is described later in the chapter) in most of my shell
scripts, usually via standard-funcs, which appears later in this chapter.
I created the file with the following script, then added other variables as I
found them useful:
printf "%b\n" \
"NL=\"\n\"" \
"CR=\"\r\"" \
"TAB=\"\t\"" \
"ESC=\"\e\"" \
"SPC=\"\040\"" \
"export NL CR TAB ESC SPC" > $HOME/scripts/standard-vars-sh
The -sh extension is part of the system I use for working on scripts without
contaminating their production versions. It is explained in Chapter 20.
Patterns
Two types of patterns are used in shell scripts: pathname expansion and regular
expressions. Pathname expansion is also known as globbing, and is done by the
shell; regular expressions are more powerful (and much more complicated), and
are used in external commands such as sed, awk, and grep.
1 Pathname Expansion
Three special characters tell the shell to interpret an unquoted string as a
pattern:
*: Matches any string, including an empty one. By itself, an asterisk matches
all files in the current directory, except those that begin with a dot.
?: Matches any single character. By itself, a question mark matches all files
in the current directory whose name is a single character, other than a dot.
[: When matched with a closing bracket, ], matches any of the characters
enclosed. These may be individual characters, a range of characters, or a
mixture of the two.
These patterns can be combined to form complex patterns for matching strings
in case statements, and for building lists of files. Here are a few examples
executed in a directory containing these files:
a b c
d ee ef
eg eh fe
ff fg fh
ge gf gg
gh he hf
hg hh i_158_d
i_261_e i_502_f i_532_b
i_661_c i_846_g i_942_a
j_114_b j_155_f j_248_e
j_326_d j_655_c j_723_g
j_925_a k_182_a k_271_c
k_286_e k_292_f k_294_g
To display all files with single-character names:
$ echo ?
a b c d
The next example prints all files whose names end with f:
$ echo *f
ef ff gf hf i_502_f j_155_f k_292_f
All files containing a number whose first digit is in the range 3 to 6 can be
shown with:
$ echo *_[3-6]*
i_502_f i_532_b i_661_c j_326_d j_655_c
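The same patterns work in a case statement, where they are matched against a string rather than against the files in a directory. In this sketch the function name classify and the variable _CLASS are hypothetical; the first pattern that matches wins.

```shell
classify() {
    case $1 in
        ?) _CLASS=single ;;             # exactly one character
        *_[3-6]*) _CLASS=midrange ;;    # underscore followed by 3, 4, 5, or 6
        *f) _CLASS=ends_in_f ;;         # anything ending in f
        *) _CLASS=other ;;
    esac
}

for name in a i_502_f ff k_182_a
do
    classify "$name"
    printf "%s: %s\n" "$name" "$_CLASS"
done
```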
2 Regular Expressions
When I started writing shell scripts, I had problems with grep. I used the
asterisk as a wildcard, expecting it to match any string. Most of the time, all
was well, but occasionally grep would print a line I didn't want. For instance,
when I wanted lines that contained call, I might get calculate as well, because
I used 'call*' as the search pattern.
At some point, it dawned on me that the patterns used by grep were not the
wildcards I had been using for years to match files, but regular expressions, in
which * stood for "zero or more occurrences of the preceding character or range
of characters". To match any string, the pattern is .*, as the period matches
any character, and the combination matches "zero or more occurrences of any
character." As with pathname expansion, [...] matches any of the characters
enclosed in the brackets.
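The mistake described above can be reproduced in a few lines. The variable names words, loose, and strict are made up for this sketch; the point is that 'call*' means "cal followed by zero or more l characters".

```shell
words=$(printf "%s\n" call calculate caller)

# 'call*' matches "cal" plus any number of l's, so calculate is included
loose=$(printf "%s\n" "$words" | grep 'call*')

# 'call' matches only lines that really contain the four letters
strict=$(printf "%s\n" "$words" | grep 'call')

printf "%s\n" "$loose" "--" "$strict"
```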
To match non-empty lines, search for any single character; that is, a dot:
$ printf "%s\n" January February March "" May June July | grep .
January
February
March
May
June
July
To print lines containing a b or a c, brackets are used:
$ printf "%s\n" January February March " " May June July | grep '[bc]'
February
March
In addition, the caret, ^, matches the expression only at the beginning of a
line, and the dollar sign, $, matches only at the end of a line. Combining the
two, ^...$, matches only the entire line. By anchoring the match to the
beginning of the line, we can match lines with a as the second letter (the first
letter can be anything):
$ printf "%s\n" January February March " " May June July | grep '^.a'
January
March
May
Using both the caret and the dollar sign, we can match lines beginning with J
and ending with y:
$ printf "%s\n" January February March " " May June July | grep '^J.*y$'
January
July
There are various flavors of regular expressions, including basic (BREs) and
extended (EREs). The Perl language has its own set (which has been incorporated
into Python), but the basics are common to all versions.
Regular expressions can be very complex (the example in the "Notes" to the
printat function in Chapter 12 is daunting at first glance, but actually fairly
simple), and are sometimes described as "write only"; once a regex (or regexp,
the common abbreviations for regular expression) is written, it can be very hard
to read it and understand how it works. A.M. Kuchling put it well in his Regular
Expression HOWTO[1] (replace Python with whatever language you are using):
"There are also tasks that can be done with regular expressions, but the
expressions turn out to be very complicated. In these cases, you may be
better off writing Python code to do the processing; while Python code will
be slower than an elaborate regular expression, it will also probably be
more understandable."
If you want to delve deeper into regular expressions, the classic book from
O'Reilly, sed & awk, has a very good section, and they are covered
comprehensively in the Apress book, Regular Expression Recipes: A
Problem-Solution Approach. There are also some links in the Appendix to online
resources. In this book, you will find very few regular expressions, and none
that cannot be easily understood.
Parameter Expansion
At its most basic, parameter expansion substitutes the value of the variable
when it is preceded by a dollar sign ($). The variable may be enclosed in braces
(${var}), and if the variable is a positional parameter greater than 9, the
braces must be used. You can use three other forms of expansion within the
braces: Bourne, POSIX, and shell specific.
The original Bourne shell parameter expansions tested whether the variable was
set or empty, and acted on the results of that test. The KornShell added
expansions to return the length of the variable's contents, and to remove the
beginning or end of the value if it matched a pattern; these have been
incorporated into the POSIX standard. KornShell 93 (ksh93) added the
search-and-replace and substring capabilities that have also been included in
bash.
1 The Bourne Shell Expansions
The original Bourne shell expansions have two forms. With a colon, they test
whether a variable is null or unset; without the colon, the test is only whether
the variable is unset.
1 ${var:-DEFAULT}
If $var is unset or null, the expression expands to DEFAULT; otherwise, it
expands to the contents of the variable:
$ var=
$ echo ${var:-y}
y
$ var=x
$ echo ${var:-y}
x
Without the colon, the variable must be unset, not just null, for DEFAULT to
be used (the result of the variable expansion is surrounded by slashes):
$ var=
$ echo /${var-y}/
//
$ unset var
$ echo /${var-y}/
/y/
2 ${var:=DEFAULT}
The only difference between this and the previous expansion, is that this also
assigns a value to var:
$ var=
$ echo "${var:=q}"
q
$ echo "${var:=z}"
q
3 ${var:+VALUE}
This expansion (which was not in the very first Bourne shell) is the opposite of
the previous two. If var is not null (or, without the colon, if it is set),
VALUE is used. In the first example, var is unset, so the variable expands to an
empty string, with or without the colon:
$ unset var
$ echo /${var:+X}/
//
$ echo /${var+X}/
//
In the next example, var is set but null. With the colon, the test is for a
non-null string, so X is not printed. Without it, X is printed, because the test
is for whether the variable is set.
$ var=
$ echo /${var:+X}/
//
$ echo /${var+X}/
/X/
Finally, when the variable is set and not null, VALUE is used, with or without
the colon:
$ var=A
$ echo /${var:+X}/
/X/
$ echo /${var+X}/
/X/
A common use for this type of expansion is when building a list in which a
separator character is wanted between items. If we just used concatenation, we'd
end up with the separator at the beginning where it is not wanted:
$ for f in a b c d e
> do
> list=$list,$f
> done
$ echo $list
,a,b,c,d,e
With this expansion, we can insert the separator only if $list is not empty:
list=${list:+$list,}$f
This is equivalent to:
if [ -n "$list" ]
then
list=$list,$f
else
list=$f
fi
Using this expansion in place of the simple variable in the preceding example,
there is no initial comma:
$ unset list
$ for f in a b c d e
> do
> list=${list:+$list,}$f
> done
$ echo $list
a,b,c,d,e
4 ${var:?MESSAGE}
If var is unset (or, with the colon, null), an error or MESSAGE will be printed.
If the shell is not interactive (as in the case of a script), it will exit.
$ unset var
$ echo ${var?}
bash: var: parameter null or not set
$ echo ${1?No value supplied}
bash: 1: No value supplied
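The four expansions are often used together near the top of a script. The following is a sketch, not taken from the scripts in this book; the variable names (name, greeting, who, line, msg, status) are arbitrary. The ${var:?} test is run in a subshell so that only the subshell exits, and the wording of the error message varies from shell to shell.

```shell
unset name greeting

# := assigns the default the first time; ":" is a do-nothing command
# whose only purpose here is to force the expansion
: "${greeting:=Hello}"

# :- substitutes a default without assigning anything to name
who=${name:-World}

# :+ adds the separator only when there is something to separate
line=${greeting:+$greeting, }$who

# :? fails when a required value is missing
status=0
msg=$( ( : "${name:?name is required}" ) 2>&1 ) || status=$?

printf "%s\n" "$line"
```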
2 POSIX Parameter Expansions
The expansions introduced by ksh, and adopted by POSIX, perform string
manipulations that were once the province of the expr command. In these
expansions, PATTERN is a file-globbing pattern, not a regular expression.
1 ${#var}: Length of Variable's Contents
This expansion returns the length of the expanded value of the variable:
$ var=LENGTH
$ echo ${#var}
6
2 ${var%PATTERN}: Remove the Shortest Match from the End
The variable is expanded, and the shortest string that matches PATTERN is
removed from the end of the expanded value:
$ var=usr/local/bin/crafty
$ echo "${var%/*}"
usr/local/bin
3 ${var%%PATTERN}: Remove the Longest Match from the End
The variable is expanded, and the longest string that matches PATTERN from the
end of the expanded value is removed:
$ var=usr/local/bin/crafty
$ echo "${var%%/*}"
usr
4 ${var#PATTERN}: Remove the Shortest Match from the Beginning
The variable is expanded, and the shortest string that matches PATTERN is
removed from the beginning of the expanded value:
$ var=usr/local/bin/crafty
$ echo "${var#*/}"
local/bin/crafty
5 ${var##PATTERN}: Remove the Longest Match from the Beginning
The variable is expanded, and the longest string that matches PATTERN is removed
from the beginning of the expanded value:
$ var=usr/local/bin/crafty
$ echo "${var##*/}"
crafty
6 Combining Expansions
The result of one expansion can be used as the PATTERN in another expansion to
get, for example, the first or last character of a string:
$ var=abcdef
$ echo ${var%${var#?}}
a
$ echo ${var#${var%?}}
f
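A typical application of these expansions is taking a path apart without calling any external command. This sketch uses invented variable names (path, dir, file, ext, stem, first) and an example path; quoting the inner expansion makes it match literally.

```shell
path=/usr/local/bin/crafty.tar.gz

dir=${path%/*}               # shortest match of /* removed from the end
file=${path##*/}             # longest match of */ removed from the start
ext=${file##*.}              # everything up to the last dot removed
stem=${file%%.*}             # everything from the first dot removed
first=${file%"${file#?}"}    # the file name minus all but its first character

printf "%s\n" "$dir" "$file" "$ext" "$stem" "$first"
```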
3 Shell-Specific Expansions, bash2, and ksh93
I use two shell-specific parameter expansions in this book, either in the
bash/ksh93 versions of functions (for example, substr in Chapter 3), or in
bash-only scripts.
1 ${var//PATTERN/STRING}: Replace All Instances of PATTERN with STRING
Because the question mark matches any single character, this example converts
all the characters to tildes to use as an underline:
$ var="Chapter 1"
$ printf "%s\n" "$var" "${var//?/~}"
Chapter 1
~~~~~~~~~
This expansion can also be used with a single slash, which means to replace
only the first instance of PATTERN.
2 ${var:OFFSET:LENGTH}: Return a Substring of $var
A substring of $var starting at OFFSET is returned. If LENGTH is specified, that
number of characters is substituted; otherwise, the rest of the string is
returned. The first character is at offset 0:
$ var=abcdefgh
$ echo "${var:3:2}"
de
$ echo "${var:3}"
defgh
Shell Arithmetic
In the Bourne shell, all arithmetic had to be done by an external command. For
integer arithmetic, this was usually expr. The KornShell incorporated integer
arithmetic into the shell itself, and it has been incorporated into the POSIX
standard. The form is $(( expression )), and the standard arithmetic operators
are supported: +, -, *, /, and %, for addition, subtraction, multiplication,
division, and modulus (or remainder). There are other operators, but they are
not used in this book; your shell's documentation will have all the details.
The standard order of operator precedence that we remember from high school
algebra applies here; multiplication and division are performed before addition
and subtraction, unless the latter are grouped by parentheses:
$ a=3
$ echo $(( $a + 4 * 12 ))
51
$ echo $(( ($a + 4) * 12 ))
84
The POSIX specification allows variables in arithmetic expressions to be used
without a leading dollar sign, like this: echo $(( a + 4 )) instead of echo $((
$a + 4 )). This was not clear from early versions of the standard, and a major
group of otherwise POSIX-compliant shells (ash, dash, and sh on BSD systems) did
not implement it. In order for the scripts in this book to work in those shells,
the dollar sign is always used.
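Integer division and modulus are enough for many everyday conversions. As a sketch, with variable names chosen only for the example, a number of seconds can be converted to hours, minutes, and seconds:

```shell
secs=3725

hours=$(( $secs / 3600 ))          # whole hours
mins=$(( $secs % 3600 / 60 ))      # whole minutes left after the hours
rem=$(( $secs % 60 ))              # seconds left over

hms=$(printf "%d:%02d:%02d" "$hours" "$mins" "$rem")
printf "%s\n" "$hms"
```

The % and / operators have the same precedence and are evaluated left to right, so no parentheses are needed in the calculation of the minutes.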
Aliases
Aliases are the simple replacement of a typed command with another. In a POSIX
shell, they can only take arguments after the command. Their use in scripts (and
on the command line) can be replaced entirely by functions; there are no aliases
in this book.
Sourcing a File
When a script is executed, it can obtain the values of variables that have been
placed in the environment with export, but any changes it makes to those or
other variables will not be visible to the script that called it. Functions
defined or changes to the current directory also will not affect the calling
environment. For these to affect the calling environment, the script must be
sourced. By using the dot command, the file is executed in the current shell's
environment:
. filename
This technique is used throughout the book, most often to define a library of
functions.
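The difference between executing a file and sourcing it can be demonstrated with a one-line file. This is a sketch only: the temporary file name and the variables tmp, greeting, after_exec, and after_source are assumptions made for the demonstration, and sh is assumed to be on the $PATH.

```shell
tmp=${TMPDIR:-/tmp}/source-demo.$$
printf "%s\n" 'greeting=hello' > "$tmp"

unset greeting
sh "$tmp"                           # executed: runs in a separate process
after_exec=${greeting-unset}

. "$tmp"                            # sourced: runs in the current shell
after_source=${greeting-unset}

rm -f "$tmp"
printf "%s\n" "$after_exec" "$after_source"
```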
Functions
Functions group one or more commands under a single name. Functions are called
in the same way as any other command, complete with arguments. They differ from
ordinary commands in that they are executed in the current shell environment.
This means they can see all the variables of the calling shell; they do not need
to be exported. Variables set or changed in a function are also set or changed
in the calling shell. And, most of all, a function that does all its work with
shell commands and syntax is faster than an external command.
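A short sketch shows that a function changes variables in the calling shell, while the same function run in a subshell does not. The names bump, counter, in_shell, and after_subshell are hypothetical.

```shell
counter=0

bump() {
    counter=$(( $counter + 1 ))
}

bump
bump
in_shell=$counter          # both calls changed the caller's variable

( bump )                   # the change is made in a subshell and then lost
after_subshell=$counter

printf "%s %s\n" "$in_shell" "$after_subshell"
```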
1 Functions Are Fast
In Chapter 6, the basename and dirname functions replace the external commands
of the same name, and do the job in a fraction of the time. Even a function more
than 70 lines long can execute much faster than an external command. In Chapter
5, the _fpmul function is faster than the calc function, which uses awk, unless
there are dozens of operands.
Under normal circumstances, I wouldn't think of writing a shell function for
floating-point multiplication; I'd let awk do it. I wrote _fpmul as a challenge,
just to show that it could be done. Now that it's done, and it has proved to be
faster than other methods, I do use it in scripts. A single line is all that's
needed to make the function available:
. math-funcs
Other operations on decimal fractions are more complicated, and therefore
aren't worth writing unless there's a specific need to do so.
2 Command Substitution Is Slow
When I discovered that using command substitution to store the results of a
function in a variable was so slow (in all shells except ksh93) that it severely
reduced the advantage of using functions, I started looking for ways to mitigate
the phenomenon. For a while I tried using a variable to tell a function whether
to print the result:
[ ${SILENT_FUNCS:-0} = 1 ] || echo "${_FPMUL}"
This worked, but I found it ugly and cumbersome; when I didn't want a function
to print anything, I had to set SILENT_FUNCS to 1 usually by preceding the call
with SILENT_FUNCS=1. Occasionally, I could set it at the beginning of a section
of code and have it in force for all subsequent function calls. I was well into
writing this book when the solution occurred to me, and I had to backtrack and
rewrite parts of earlier chapters to incorporate it.
Whenever a function returns a value (other than an exit status), I now write
two functions. One has the expected behavior of printing the result; the other,
which begins with an underscore, sets a variable that is the function's name
(including the underscore) converted to uppercase. To illustrate, here is a pair
of functions to multiply two integers:
_mul()
{
_MUL=$(( $1 * $2 ))
}
mul()
{
_mul "$@" && printf "%s\n" "$_MUL"
}
I can now print the result of the multiplication with
$ mul 12 13
156
Or, I can store the result in a variable with
$ _mul 12 13
$ product=$_MUL
The extra few milliseconds it takes to use command substitution...
$ time mul 123 456
56088
Real: 0.000 User: 0.000 System: 0.000
$ time { q=$(mul 123 456); }
Real: 0.005 User: 0.001 System: 0.003
...may not seem significant, but scripts often loop hundreds or even thousands
of times, and may perform several such substitutions inside a loop. The result
is a sluggish program.
3 Using the Functions in This Book
I use functions in three ways: at the command line, as commands in scripts, and
as a reference. For use at the command line, I source some of the function
libraries in my shell startup file; others I load at the command line when
I need them. In scripts, I usually source a single function library, and it will
load any other libraries it needs. At other times, I use the function library as
a reference, and I copy the code, sometimes modifying it, into the body of
another script or function.
The functions in this book are mostly stored in libraries of related
functions. You may find a different structure more suited to your coding style.
If so, go ahead and reorganize them. I would recommend that you avoid having the
same function in more than one library, unless the subsequent versions offer
additional features that are not usually needed.
The first library in this book is a collection of functions that I use in many
scripts. All are used in at least one script in the book.
standard-funcs: A Collection of Useful Commands
The functions in this library encapsulate commonly used tasks in a consistent
interface that makes using them easy. When I need to get a keystroke from the
user, I call get_key; when I want to display a date, I use show_date; when I
want to exit from a script because something has failed, I use die. With menu1,
I can display anything from a one-line to full-screen menu and execute a command
based on the user's response.
After defining the functions, the library loads standard-vars, shown earlier in
this chapter:
. standard-vars
1 1.1 get_key: Get a Single Keystroke from the User
In some circumstances, such as when asking a user to select from a menu, only a
single key needs to be pressed. The shell read command requires a newline before
it exits. Is there a way to read a single key?
1 How It Works
The bash read command has an option, -n, to read only a specified number of
characters, but that is lacking in most other shells. The portable way uses stty
to turn off the terminal's buffering, and to set the minimum number of
characters required for a complete read. A single character can then be read by
dd.
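The dd half of this technique can be tried without a terminal. The sketch below uses a hypothetical helper, read_byte, that reads exactly one byte from standard input; it leaves out the stty calls, which only matter when input comes from a keyboard. A period is appended inside the command substitution so that a newline, if that is the byte read, is not stripped.

```shell
## read_byte is a hypothetical name used only for this illustration
read_byte()
{
    _BYTE=$(dd bs=1 count=1 2>/dev/null; echo .)
    _BYTE=${_BYTE%?}    ## remove the period
}

## Only the first byte of the input is consumed
read_byte <<EOF
yes
EOF
printf "First byte: %s\n" "$_BYTE"

## A newline survives because of the appended period
read_byte <<EOF

EOF
[ "$_BYTE" = "
" ] && printf "%s\n" "Read a newline"
```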
1 Usage
get_key [VAR]
If a variable is specified, the key will be read into that variable. If not,
it will be in $_KEY.
2 The Script
get_key()
{
    [ -t 0 ] && { ## Check whether input is coming from a terminal
        [ -z "$_STTY" ] && {
            _STTY=$(stty -g) ## Store the current settings for later restoration
        }
        ## By default, the minimum number of keys that needs to be entered is 1
        ## This can be changed by setting the dd_min variable
        ## If the TMOUT variable is set greater than 0, the time-out is set to
        ## $TMOUT seconds
        if [ ${TMOUT:--1} -ge 0 ]
        then
            _TMOUT=$TMOUT
            stty -echo -icanon time $(( $_TMOUT * 10 )) min ${dd_min:-1}
        else
            stty -echo -icanon min ${dd_min:-1}
        fi
    }
    ## Read a key from the keyboard, using dd with a block size (bs) of 1.
    ## A period is appended, or command substitution will swallow a newline
    _KEY=$(dd bs=1 count=1 2>/dev/null; echo .)
    _KEY=${_KEY%?} ## Remove the period
    ## If a variable has been given on the command line, assign the result to it
    [ -n "$1" ] &&
        ## Due to quoting, either ' or " needs special treatment; I chose '
        case $_KEY in
            "'") eval "$1=\"'\"" ;;
            *) eval "$1='$_KEY'" ;;
        esac
    [ -t 0 ] && stty "$_STTY" ## reset terminal
    [ -n "$_KEY" ] ## Succeed if a key has been entered (not timed out)
}
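The quoting concern in the final case statement can be seen in isolation. The sketch below wraps the same two eval forms in a hypothetical function, set_var, and stores four awkward characters in variables named at run time. It relies on the value being a single character; with longer, untrusted strings this kind of eval would not be safe.

```shell
## set_var NAME CHAR  (hypothetical helper, for illustration only)
set_var()
{
    case $2 in
        "'") eval "$1=\"'\"" ;;   ## a single quote cannot sit inside '...'
        *) eval "$1='$2'" ;;      ## any other single character can
    esac
}

set_var key1 'a'
set_var key2 "'"
set_var key3 '"'
set_var key4 '$'
printf "%s %s %s %s\n" "$key1" "$key2" "$key3" "$key4"
```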
2 Notes
The get_key function is often redefined in other scripts to allow entry of
cursor and function keys, and even mouse clicks. For an example, which is too
long to include in this book, see the mouse_demo script on my web site.[2]
2 1.2 getline: Prompt User to Enter a Line
For interactive scripts, I like the editing capabilities that bash's readline
library offers, but I still want the script to work in other POSIX shells. I
want the best of both worlds!
1 How It Works
The getline function checks for the existence of the $BASH_VERSION variable, and
uses the readline library if it is set. If not, a POSIX read command is used.
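The version test can be tried on its own. In this sketch, readline_available is a hypothetical helper that takes a version string as its argument, so that it can be exercised with sample values as well as with the real $BASH_VERSION; the patterns are the same ones the script uses on the major version number.

```shell
## readline_available VERSION  (hypothetical helper, for illustration only)
## Succeeds if VERSION looks like bash 2 or later
readline_available()
{
    case ${1%%.*} in              ## keep only the major version number
        [2-9]|[1-9][0-9]) return 0 ;;
        *) return 1 ;;            ## empty (not bash) or bash 1.x
    esac
}

if readline_available "${BASH_VERSION-}"
then
    printf "%s\n" "bash with readline: read -e can be used"
else
    printf "%s\n" "falling back to a plain POSIX read"
fi
```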
1 Usage
getline "PROMPT" [VAR]
If no VAR is given, the line is read into the _GETLINE variable. If the
variable name used is password, the keystrokes will not be echoed to the
terminal.
2 The Script
_getline()
{
    ## Check that the parameter given is a valid variable name
    case $2 in
        [!a-zA-Z_]* | *[!a-zA-Z0-9_]* ) die 2 "Invalid variable name: $2" ;;
        *) var=${2:-_GETLINE} ;;
    esac
    ## If the variable name is "password" do not turn on echoing
    [ -t 0 ] && [ "$2" != "password" ] && stty echo
    case ${BASH_VERSION%%.*} in
        [2-9]|[1-9][0-9])
            read ${TMOUT:+-t$TMOUT} -ep "$1: " -- $var
            ;;
        *) printf "%s: " "$1" >&2
            IFS= read -r $var
            ;;
    esac
    [ -t 0 ] && stty -echo ## turn echoing back off
}
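The variable-name check at the top of the function is useful wherever a name arrives as an argument. A minimal sketch follows, using a hypothetical function is_var_name built on the same two patterns. Unlike the script, which lets an empty name fall through to the default, this version also rejects the empty string.

```shell
## is_var_name STRING  (hypothetical helper, for illustration only)
is_var_name()
{
    case $1 in
        "" | [!a-zA-Z_]* | *[!a-zA-Z0-9_]* ) return 1 ;;
        *) return 0 ;;
    esac
}

for name in answer _GETLINE 2fast "bad name" 'x;ls'
do
    if is_var_name "$name"
    then
        printf "%-10s valid\n" "$name"
    else
        printf "%-10s invalid\n" "$name"
    fi
done
```

The first pattern rejects a leading digit or punctuation character; the second rejects any disallowed character anywhere in the string.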
3 1.3 press_any_key: Prompt for a Single Keypress
Despite the now-trite comment, "But my keyboard doesn't have an ANY key," it is
often desirable to display the message "PRESS ANY KEY TO CONTINUE" and pause
execution of a script until the user presses a key.
1 How It Works
The get_key function (shown two functions previously) provides the mechanism to
read a single keypress, and printf and carriage returns display and erase the
message.
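The display-and-erase half of that mechanism can be sketched without any keyboard input. The function below, show_then_erase, is a hypothetical illustration and not the press_any_key script itself: it prints a message at the start of the line, then overwrites it with the same number of spaces and returns the cursor to the first column.

```shell
## show_then_erase MESSAGE  (hypothetical helper, for illustration only)
show_then_erase()
{
    msg=$1
    printf "\r%s" "$msg"         ## display the message at the start of the line
    ## ...a real script would wait for a keypress here...
    printf "\r%${#msg}s\r" ""    ## overwrite with spaces, return to column 1
}

show_then_erase "PRESS ANY KEY"
```

Because the field width in the second printf is taken from ${#msg}, the erasure always matches the length of whatever message was shown.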
1 Usage
press_any_key
At one time, this script accepted an argument: the name of a variable in which
to store the key. I never used it, so I removed it. If you do want the
keystroke, you can get it from the $_KEY variable set by get_key.
2 The Script
press_any_key()
{
printf "\r