If you use the shell for serious programming, as I do, speed of execution is an important issue. A script should not appear sluggish; it should not be noticeably slower than a program written in Perl or Python, or even C. One of the major causes of slow scripts is starting a new process, whether for an external command or for command substitution. (All shells except KornShell 93 create a new process for command substitution.)
When I started writing Unix shell scripts, I used a Bourne shell. It
was far more powerful than the Amiga or MS-DOS shells I had used
previously, but it still relied on external commands for most useful
work. There was no arithmetic in the shell; I used
expr and awk for calculations. I used
expr, cut, tr,
basename, and various other commands to manipulate
strings.
With the Korn shell, and the later POSIX/SUS standardization, string
chopping (via parameter expansion: ${var%PATTERN},
${var#PATTERN}, etc.) and integer arithmetic were
brought into the shell itself, speeding up many operations. It
became possible to write a large number of useful programs without
calling any external commands.
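As a rough illustration (not taken from the original examples), here is how a few of the external calls mentioned above can be replaced by POSIX parameter expansion and arithmetic expansion. The path is a hypothetical example, and the expansions only approximate basename and dirname: they do not handle trailing slashes or paths without a slash the way the real commands do.

```bash
#!/bin/bash
# In-shell replacements for some external commands (no new processes).
# The path below is a hypothetical example.
path=/home/chris/src/strftime.c

base=${path##*/}     # like: basename "$path"     -> strftime.c
dir=${path%/*}       # like: dirname "$path"      -> /home/chris/src
stem=${base%.c}      # like: basename "$path" .c  -> strftime
ext=${base##*.}      # text after the last dot    -> c
total=$(( 6 * 7 ))   # like: expr 6 \* 7          -> 42

printf '%s\n' "$base" "$dir" "$stem" "$ext" "$total"
```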
This still left many trivial operations requiring external commands
(converting uppercase letters to lowercase, for example).
Bash has a solution: commands that can be compiled and
loaded at run time if and when needed.
Compiling and Loading Bash Built-Ins
The bash source package has a directory full of
examples ready to be compiled. To do that, download the source from
ftp://ftp.cwru.edu/pub/bash/bash-3.1.tar.gz. Unpack the
tarball, cd into the top level directory, and run
the configure script.
wget ftp://ftp.cwru.edu/pub/bash/bash-3.1.tar.gz
gunzip bash-3.1.tar.gz
tar xf bash-3.1.tar
cd bash-3.1
./configure
The configure script creates Makefiles
throughout the source tree, including one in
examples/loadables. In that directory are the source
files for built-in versions of a number of standard commands "whose
execution time is dominated by process startup time". You can
cd into that directory and run make:
cd examples/loadables
make -k ## I use -k because I get some errors.
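Because make -k keeps going after errors, it is worth checking which built-ins were actually produced. A minimal sketch of such a check follows; the helper name built_loadables is hypothetical, and it assumes that each NAME.c yields a shared object called NAME with no extension, as the Makefile targets in examples/loadables do. A "missing" entry may simply be a source file that has no target of its own.

```bash
#!/bin/bash
# built_loadables [DIR]  (hypothetical helper)
# Reports, for each NAME.c in DIR, whether a compiled object NAME exists.
built_loadables() {
  local dir=${1:-.} src name
  for src in "$dir"/*.c; do
    [ -e "$src" ] || continue      # the glob matched nothing
    name=${src##*/}                # strip the directory
    name=${name%.c}                # strip the .c suffix
    if [ -f "$dir/$name" ]; then
      printf 'built:   %s\n' "$name"
    else
      printf 'missing: %s\n' "$name"
    fi
  done
}
```

Run it with no argument from inside examples/loadables, or pass the directory as the first argument.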
You'll now have a number of commands ready to load into your
bash shell. These include:
logname basename dirname tee
head mkdir rmdir uname
ln cat id whoami
There are also some useful new commands:
print ## Compatible with the ksh print command
finfo ## Print file information
strftime ## Format date and time
These built-ins can be loaded into a running shell with:
enable -f filename built-in-name
They include documentation, and the help command can be
used with them, just as with other built-in commands:
$ enable -f ./strftime strftime
$ help strftime
strftime: strftime format [seconds]
Converts date and time format to a string and displays it on the
standard output. If the optional second argument is supplied, it
is used as the number of seconds since the epoch to use in the
conversion, otherwise the current time is used.
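Since enable -f fails if the shared object cannot be found (for example, when a script runs on a machine without your build), a script can try to load the built-in and fall back to something slower. A minimal sketch, assuming the object is at ./strftime (adjust the path to your build) and that GNU date, with its -d @SECONDS syntax, is available for the fallback:

```bash
#!/bin/bash
# Load the strftime built-in if possible; otherwise define a stand-in.
# The path ./strftime is an assumption: point it at your compiled object.
if ! enable -f ./strftime strftime 2>/dev/null; then
  # Fallback with the same basic interface: strftime FORMAT [SECONDS].
  # Uses GNU date, so every call costs an extra process.
  strftime() {
    if [ $# -ge 2 ]; then
      date -d "@$2" "+$1"
    else
      date "+$1"
    fi
  }
fi
```

Afterwards, type -t strftime reports either builtin or function, and the rest of the script can call strftime the same way in both cases.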
Modifying Loadable Built-Ins
With the strftime command, I can now do date arithmetic
without external commands. For example, to get yesterday's date (a
very frequently asked question in the newsgroups):
strftime %Y-%m-%d $(( $(strftime %s) - 86400 ))
That script has one drawback: it uses command substitution. The
timing of commands must not be taken too literally (they can vary a
great deal even on the same system, depending on what else is
running at the time), but they give a useful basis for comparison.
The difference between using the built-in strftime
(with command substitution) and the GNU date command is surprisingly
small:
$ time strftime %Y-%m-%d $(( $(strftime %s) - 86400 ))
2006-04-04
real 0m0.006s
user 0m0.000s
sys 0m0.005s
$ time date -d yesterday +%Y-%m-%d
2006-04-04
real 0m0.007s
user 0m0.000s
sys 0m0.007s
In absolute terms, it's not very long, but in a script there may be
many such commands and they may be repeated many times. Since
built-in commands are executed in the current shell, why not have it
set a variable instead of printing the result? I added an option to
strftime to store the result in a variable rather than
printing it on stdout. The difference was significant:
$ time {
strftime -v now %s
strftime %Y-%m-%d $(( $now - 86400 ))
}
2006-04-04
real 0m0.000s
user 0m0.000s
sys 0m0.000s
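For comparison only: bash 4.2 and later (released well after the 3.1 version used here) added a %(FORMAT)T conversion to the printf built-in, and printf -v stores the output in a variable, which covers the same ground as the modified strftime. A minimal sketch, assuming bash 4.2 or newer and a C library whose strftime understands %s (glibc does):

```bash
#!/bin/bash
# Requires bash 4.2+ for printf's %(FORMAT)T conversion.
# An argument of -1 stands for the current time.
printf -v now '%(%s)T' -1
printf -v yesterday '%(%Y-%m-%d)T' "$(( now - 86400 ))"
printf '%s\n' "$yesterday"
```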
The changes to strftime.c are relatively minor. First,
I included the header for bash's internal options
parser:
#include "bashgetopt.h"
Then I declared two variables:
int ch;
char *var = NULL;
The longest piece of code parses the options, which are passed as a
linked list and parsed by bash's own function:
reset_internal_getopt ();
while ((ch = internal_getopt (list, "v:")) != -1)
switch(ch) {
case 'v':
var = list_optarg; /* should add check for valid variable name */
break;
default:
builtin_usage();
return (EX_USAGE);
}
list = loptend;
The bind_variable function stores the result in a shell
variable if the -v option was used:
if ( var )
bind_variable (var, tbuf, 0);
else
printf ("%s\n", tbuf); /* otherwise print the result, as before */
Finally, two lines must be added to the documentation. The first is added
to the array of strings that are printed when the help
strftime command is used:
"OPTION: -v VAR - Store the result in shell variable VAR",
The second replaces the existing short documentation, or usage, string:
"strftime [-v VAR] format [seconds]", /* usage synopsis; becomes short_doc */
The complete modified file is available as strftime.c.
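The same option-handling pattern exists at the shell level. As an illustration only (it is not part of the built-in), here is a hypothetical shell function, basename_v, that accepts -v VAR through getopts and either assigns its result with printf -v or prints it, mirroring the internal_getopt loop and the bind_variable call in strftime.c. It strips only the leading directories and does not handle a trailing slash the way basename does.

```bash
#!/bin/bash
# basename_v [-v VAR] PATH  (hypothetical helper, for illustration only)
# Locals use a _bv_ prefix so that they are unlikely to shadow VAR.
basename_v() {
  local OPTIND=1 _bv_opt _bv_var=
  while getopts v: _bv_opt; do          # like internal_getopt (list, "v:")
    case $_bv_opt in
      v) _bv_var=$OPTARG ;;
      *) echo 'usage: basename_v [-v VAR] PATH' >&2; return 2 ;;
    esac
  done
  shift "$(( OPTIND - 1 ))"             # like list = loptend;

  if [ $# -ne 1 ]; then
    echo 'usage: basename_v [-v VAR] PATH' >&2
    return 2
  fi

  if [ -n "$_bv_var" ]; then
    printf -v "$_bv_var" '%s' "${1##*/}" # like bind_variable (var, ...)
  else
    printf '%s\n' "${1##*/}"
  fi
}
```

With -v the function starts no process at all; without it, the caller needs command substitution to capture the result, which is the same trade-off measured above.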
Writing New Bash Built-Ins
To write your own loadable commands, create a directory for them and
copy the Makefile and the template.c files
from bash-3.1/examples/loadables into it. The
Makefile, which was created by running
./configure at the root of the bash source
tree, contains the location of that source so that header files can
be found. Make sure that topdir points to the same
place as BUILD_DIR. I also strip out all that I don't
need. My resulting Makefile looks like this:
#
# Simple makefile for the sample loadable builtins
#
# Copyright (C) 1996 Free Software Foundation, Inc.
# Modified 2006, Chris F.A. Johnson

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111 USA.

# Include some boilerplate Gnu makefile definitions.
prefix = /usr/local
exec_prefix = ${prefix}
bindir = ${exec_prefix}/bin
libdir = ${exec_prefix}/lib
infodir = ${prefix}/info
includedir = ${prefix}/include

## The next line should point to your bash source tree
BUILD_DIR = /home/chris/src/bash-3.1
topdir = ${BUILD_DIR}
srcdir = .
VPATH = .

CC = gcc
RM = rm -f

SHELL = /bin/sh

host_os = linux-gnuoldld
host_cpu = i686
host_vendor = pc

CFLAGS = -g -O2
LOCAL_CFLAGS =
DEFS = -DHAVE_CONFIG_H
LOCAL_DEFS = -DSHELL

CPPFLAGS =

BASHINCDIR = ${topdir}/include

LIBBUILD = ${BUILD_DIR}/lib

INTL_LIBSRC = ${topdir}/lib/intl
INTL_BUILDDIR = ${LIBBUILD}/intl
INTL_INC =
LIBINTL_H =

CCFLAGS = $(DEFS) $(LOCAL_DEFS) $(LOCAL_CFLAGS) $(CFLAGS)

#
# These values are generated for configure by ${topdir}/support/shobj-conf.
# If your system is not supported by that script, but includes facilities for
# dynamic loading of shared objects, please update the script and send the
# changes to bash-maintainers@gnu.org.
#
SHOBJ_CC = gcc
SHOBJ_CFLAGS = -fPIC
SHOBJ_LD = ${CC}
SHOBJ_LDFLAGS = -shared -Wl,-soname,$@
SHOBJ_XLDFLAGS =
SHOBJ_LIBS =

INC = -I. -I.. -I$(topdir) -I$(topdir)/lib -I$(topdir)/builtins \
	-I$(BASHINCDIR) -I$(BUILD_DIR) -I$(LIBBUILD) \
	-I$(BUILD_DIR)/builtins $(INTL_INC)

.c.o:
	$(SHOBJ_CC) $(SHOBJ_CFLAGS) $(CCFLAGS) $(INC) -c -o $@ $<

all:

clean:
	$(RM) $(OTHERPROG) *.o
The template.c file is compilable and has the bare bones
necessary to write a dynamically loadable built-in, plus the skeleton
for adding command-line options. There are three necessary
sections:
- the function that implements the built-in,
- an array of strings containing the documentation to be printed by the
help command, and
- a struct telling bash where to find the built-in and its documentation,
along with a short documentation or usage string.
These are outlined in examples/loadables/hello.c:
/* Sample builtin to be dynamically loaded with enable -f and create a new
   builtin. */

/* See Makefile for compilation details. */

#include <config.h>

#if defined (HAVE_UNISTD_H)
#  include <unistd.h>
#endif

#include <stdio.h>

#include "builtins.h"
#include "shell.h"
#include "bashgetopt.h"

/* A builtin `xxx' is normally implemented with an `xxx_builtin' function.
   If you're converting a command that uses the normal Unix argc/argv
   calling convention, use argv = make_builtin_argv (list, &argc) and call
   the original `main' something like `xxx_main'.  Look at cat.c for an
   example.

   Builtins should use internal_getopt to parse options.  It is the same as
   getopt(3), but it takes a WORD_LIST *.  Look at print.c for an example
   of its use.

   If the builtin takes no options, call no_options(list) before doing
   anything else.  If it returns a non-zero value, your builtin should
   immediately return EX_USAGE.  Look at logname.c for an example.

   A builtin command returns EXECUTION_SUCCESS for success and
   EXECUTION_FAILURE to indicate failure. */
int
hello_builtin (list)
     WORD_LIST *list;
{
  printf("hello world\n");
  fflush (stdout);
  return (EXECUTION_SUCCESS);
}

/* An array of strings forming the `long' documentation for a builtin xxx,
   which is printed by `help xxx'.  It must end with a NULL. */
char *hello_doc[] = {
	"this is the long doc for the sample hello builtin",
	(char *)NULL
};

/* The standard structure describing a builtin command.  bash keeps an array
   of these structures.  The flags must include BUILTIN_ENABLED so the
   builtin can be used. */
struct builtin hello_struct = {
	"hello",		/* builtin name */
	hello_builtin,		/* function implementing the builtin */
	BUILTIN_ENABLED,	/* initial flags for builtin */
	hello_doc,		/* array of long documentation strings. */
	"hello",		/* usage synopsis; becomes short_doc */
	0			/* reserved for internal use */
};
To write a new built-in command, I use newbi-sh
to change the references to template in
template.c to the name of my built-in, and add it to the
Makefile:
#! /bin/bash
#@ If no name is given on the command line, prompt the user for it
if [ -n "$1" ]
then
builtin=$1
else
read -ep "Name of builtin: " builtin
[ -z "$builtin" ] && exit 1
fi
#@ If a C file for the builtin exists, ask whether to overwrite it
if [ -s "$builtin.c" ]
then
ls -l "$builtin.c"
read -sn1 -p "Overwrite $builtin.c [y/N/c]? " ok
case $ok in
[yY]) ;;
[cC]) exit 2 ;;
* ) ok= ;;
esac
printf "\n"
else
ok=1
fi
#@ Use template.c as the basis for the new builtin
if [ -n "$ok" ]
then
sed "s/template/$builtin/g" template.c > "$builtin.c"
fi
#@ Make a copy of the existing makefile
cp Makefile mkf || exit 2
{
#@ Add the new builtin to Makefile
#@ [should add check to see if it's already there]
fmt="\t\$(SHOBJ_LD) \$(SHOBJ_LDFLAGS) \$(SHOBJ_XLDFLAGS)"
fmt="$fmt -o \$@ %s.o \$(SHOBJ_LIBS)\n"
sed "/^all:/ s/\$/ $builtin/" mkf
printf "\n%s: %s.o\n" "$builtin" "$builtin"
printf "$fmt" "$builtin"
printf "\n%s.o: %s.c\n" "$builtin" "$builtin"
} > Makefile
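The script's own comment notes that it should check whether the built-in is already in the Makefile. One possible sketch of that check is a hypothetical helper, in_makefile, that newbi-sh could call before copying the Makefile. It reads the file with a shell loop rather than calling grep, in keeping with avoiding external commands, and treats a line beginning with NAME: as an existing target.

```bash
#!/bin/bash
# in_makefile NAME [MAKEFILE]  (hypothetical helper)
# Succeeds if MAKEFILE (default: Makefile) already has a rule for NAME.
in_makefile() {
  local name=$1 mk=${2:-Makefile} line
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      "$name:"*) return 0 ;;     # quoted, so NAME is matched literally
    esac
  done < "$mk"
  return 1
}
```

In newbi-sh it could guard the Makefile update, for example: if in_makefile "$builtin"; then echo "$builtin is already in Makefile" >&2; exit 3; fi (the exit status 3 is an arbitrary choice).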
One of the scripts most frequently requested in the Unix and Linux
newsgroups converts filenames from uppercase (or partly uppercase)
to lowercase. This usually means calling tr once for
every file. (ksh has typeset -l and typeset -u, but they are
non-standard and are not implemented in bash 3.1.)
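For reference, a typical version of that script might look like the following sketch (the function name lc_rename is hypothetical). Each file costs a subshell for the command substitution plus a tr process, which is exactly the per-file overhead discussed above. It lowercases only the last path component and skips a file if the lowercase name already exists.

```bash
#!/bin/bash
# lc_rename FILE...  (hypothetical): rename files to lowercase names.
# Starts a subshell and a tr process for every file.
lc_rename() {
  local f dir base new
  for f in "$@"; do
    case $f in
      */*) dir=${f%/*}/ base=${f##*/} ;;
      *)   dir= base=$f ;;
    esac
    new=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
    # Do not clobber an existing file with the lowercase name.
    if [ "$base" != "$new" ] && [ ! -e "$dir$new" ]; then
      mv -- "$f" "$dir$new"
    fi
  done
}
```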
A shell function is quite efficient for converting short strings:
lcase()
{
word=$1
while [ -n "$word" ]
do
temp=${word#?}
case ${word%"$temp"} in
A*) _LWR=a ;; B*) _LWR=b ;;
C*) _LWR=c ;; D*) _LWR=d ;;
E*) _LWR=e ;; F*) _LWR=f ;;
G*) _LWR=g ;; H*) _LWR=h ;;
I*) _LWR=i ;; J*) _LWR=j ;;
K*) _LWR=k ;; L*) _LWR=l ;;
M*) _LWR=m ;; N*) _LWR=n ;;
O*) _LWR=o ;; P*) _LWR=p ;;
Q*) _LWR=q ;; R*) _LWR=r ;;
S*) _LWR=s ;; T*) _LWR=t ;;
U*) _LWR=u ;; V*) _LWR=v ;;
W*) _LWR=w ;; X*) _LWR=x ;;
Y*) _LWR=y ;; Z*) _LWR=z ;;
*) _LWR=${word%"$temp"} ;; # any other character is kept as-is
esac
printf "%s" "$_LWR"
word=$temp
done
}
...but it drags when used for long words, and approaches the
execution time of tr. A built-in command would be an
order of magnitude faster, so I wrote lcase:
/* lcase - convert string to lowercase */
#include <config.h>
#if defined (HAVE_UNISTD_H)
# include <unistd.h>
#endif
#include "bashansi.h"
#include <stdio.h>
#include <errno.h>
#include <ctype.h> /* for tolower() */
#include "builtins.h"
#include "shell.h"
#include "bashgetopt.h"
#if !defined (errno)
extern int errno;
#endif
extern char *strerror ();
int
lcase_builtin (list)
WORD_LIST *list;
{
int n = 0;
int ch;
char *string;
char *var = NULL;
reset_internal_getopt ();
while ((ch = internal_getopt (list, "v:")) != -1)
switch(ch) {
case 'v':
var = list_optarg;
break;
default:
builtin_usage();
return (EX_USAGE);
}
list = loptend;
if (list == 0 || list->next)
{
builtin_usage ();
return (EX_USAGE);
}
string = list->word->word;
while ( string[n] )
{
string[n] = tolower((unsigned char) string[n]);
++n;
}
if ( var )
bind_variable (var, string, 0);
else
printf ("%s\n", string);
return (EXECUTION_SUCCESS);
}
char *lcase_doc[] = {
"The STRING is converted to lower case and either:",
" stored in the variable supplied with -v",
" or",
" printed to stdout if no variable is given",
(char *)NULL
};
struct builtin lcase_struct = {
"lcase", /* builtin name */
lcase_builtin, /* function implementing the builtin */
BUILTIN_ENABLED, /* initial flags for builtin */
lcase_doc, /* array of long documentation strings. */
"lcase [-v VAR] STRING", /* usage synopsis */
0 /* reserved for internal use */
};
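Where the built-in cannot be loaded, the lcase shell function shown earlier can still avoid command substitution if it assigns its result to a variable, as the -v option does. A sketch of such a variant, under stated assumptions: the name lcase_v is hypothetical, it relies on bash's printf -v and ${var:offset:length}, and it replaces the 26-branch case statement with two lookup strings, so only the ASCII letters A to Z are converted.

```bash
#!/bin/bash
# lcase_v VAR STRING  (hypothetical): store STRING, lowercased, in VAR.
# VAR should not be one of the _-prefixed locals used below.
lcase_v() {
  local _upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
  local _lower=abcdefghijklmnopqrstuvwxyz
  local _word=$2 _out= _ch _pre
  while [ -n "$_word" ]; do
    _ch=${_word:0:1}
    _word=${_word:1}
    _pre=${_upper%%"$_ch"*}       # letters before _ch (all 26 if absent)
    if [ ${#_pre} -lt 26 ]; then
      _ch=${_lower:${#_pre}:1}    # same position in the lowercase alphabet
    fi
    _out=$_out$_ch
  done
  printf -v "$1" '%s' "$_out"
}
```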
Having done that, I added the inverse, ucase, to convert
lowercase to uppercase. Then came icase, to convert upper to
lower and lower to upper. Next came ncase, which creates a pattern
that matches either upper or lower case:
$ icase "John Doe"
jOHN dOE
$ ncase qwerty
[Qq][Ww][Ee][Rr][Tt][Yy]
Finally, I added cap, to capitalize the first letters
of words. I amalgamated all of these into a single file
(case.c), and they are all enabled with a single command:
enable -f $HOME/src/loadables/case ucase lcase icase ncase cap