POSIX shell idioms
You sit down before your obnoxiously clacky mechanical keyboard and
bang out the familiar #!/bin/sh
with that precious sense of optimism
like writing the first words in a new notebook: the margins are
aligned and you begin to think that your handwriting isn’t so bad
after all. Soon enough, though, it’s riddled with scribbled out
paragraphs and arrows that swing around the pages.
Maintaining safe and bug-free shell scripts can become troublesome as
they grow in complexity, and limiting one’s craft only to those few
blessed features defined by POSIX can feel restrictive.
However, what is lost in flexibility is gained in portability. With some casual experimentation and a handful of powerful idioms, perhaps the purely POSIX shell language can prove more elegant and expressive than it is given credit for.
What follows are three ways to use one such idiom: the read(1)-while loop.
The read(1) builtin
Built into any POSIX compatible shell is the read(1) command.
read [-r] var…
In its most basic usage, it can be used to gather a line of input from the user at the terminal; that might be to answer a yes/no prompt, or to provide a text string. But more versatile is its ability to parse text files in a rudimentary but reliable way.
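The interactive case might look something like the sketch below. The name confirm is one I made up for illustration, and the here-document only stands in for a person typing at the terminal so that the example runs unattended.

```shell
#!/bin/sh
# confirm is a hypothetical helper: print a question, read one line,
# and succeed only if the answer is some spelling of "y" or "yes".
confirm() {
    printf '%s [y/N] ' "$1"
    IFS= read -r answer || return 1   # end-of-file counts as "no"
    case $answer in
        [Yy]|[Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}

# The here-document plays the part of the user.
if confirm 'Carry on?' <<EOF
yes
EOF
then printf 'Carrying on.\n'
else printf 'Stopping.\n'
fi
```

In a real script you would drop the here-document and let read(1) take its line from the terminal.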
The read(1) command consumes one line of text at a time, taken from standard input or whatever you pipe or redirect into it. When it hits end-of-file it returns a non-zero exit status, which is what brings a while loop to an end.
Below is a simple example which prints each line read. The entire content of each line, except the terminating newline character, is assigned to the line variable. We set IFS to the empty string so that leading and trailing white space is kept intact. The variable here is named line, but really you can call it anything you like.
while IFS= read -r line
do printf '%s\n' "$line"
done
Unless you know what you’re doing, you probably want to pass the -r option, which stops read(1) from treating backslashes as escape characters; the details are in the POSIX Programmer’s Manual.
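To see what -r buys you, here is a small sketch that reads the same line twice, once with the option and once without. The sample string is made up; any text containing backslashes would do.

```shell
#!/bin/sh
# A made-up line of input containing backslashes.
input='C:\temp\new'

# With -r the backslashes are kept as ordinary characters.
IFS= read -r raw <<EOF
$input
EOF

# Without -r each backslash escapes the next character and is dropped.
IFS= read cooked <<EOF
$input
EOF

printf 'with -r:    %s\n' "$raw"      # with -r:    C:\temp\new
printf 'without -r: %s\n' "$cooked"   # without -r: C:tempnew
```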
Beware that read(1) still fails if a non-empty line is missing its newline character before EOF, so the loop will not be entered for that final line. The solution is to check whether line is non-empty in the event that read fails:
while IFS= read -r line || [ "$line" ]
do …
done
but you may instead prefer to assume that the input is well formed—garbage in, garbage out.
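The difference is easy to see by counting lines in input whose last line lacks a newline. This is only a sketch; count_plain and count_tolerant are hypothetical names for the two loop styles.

```shell
#!/bin/sh
# count_plain: the ordinary loop, which stops as soon as read fails.
count_plain() {
    n=0
    while IFS= read -r line
    do n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# count_tolerant: also runs the body for a final line with no newline.
count_tolerant() {
    n=0
    while IFS= read -r line || [ "$line" ]
    do n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# The input has two lines, but the second is not newline-terminated.
printf 'one\ntwo' | count_plain      # prints 1
printf 'one\ntwo' | count_tolerant   # prints 2
```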
Case study 1 of 3: Linewise filtering
Let’s say we want to remove lines that begin with a hash sign (#), something akin to grep -v ^\#. This can be achieved with the #* glob pattern in a case statement.
while IFS= read -r line
do
    case $line in
        \#*) ;;                       # drop lines that start with a hash
        *) printf '%s\n' "$line" ;;   # pass everything else through
    esac
done
The backslash escape in \#*
is necessary to protect the hash
sign from becoming a shell comment. Pattern matching each
line like this is a powerful way to interpret file contents. Let’s
also ignore leading white space:
while IFS= read -r line
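do
    # Match against the line with its leading white space removed.
    case ${line#${line%%[![:space:]]*}} in
        \#*) ;;                       # drop comment lines
        *) printf '%s\n' "$line" ;;   # print the line unmodified
    esac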
done <<EOF
one
# two
# three
four
EOF
The precise workings of this parameter expansion wizardry are described in Dylan Araps’s excellent pure-sh-bible, but understanding them is not necessary going forward; just know that it expands to the value of line stripped of any leading white space.
This time I fed it a here
document, resulting in
the following two lines being printed:
one
four
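For the curious, the nested expansion can be pulled apart into two steps. This is just a sketch with a made-up sample line; indent and trimmed are names I chose for the intermediate values.

```shell
#!/bin/sh
# A made-up line with three leading spaces.
line='   # indented comment'

# Step 1: the %% operator removes the longest suffix matching the pattern.
# [![:space:]]* matches from the first non-space character to the end,
# so all that remains is the leading white space.
indent=${line%%[![:space:]]*}

# Step 2: the # operator removes that white space as a prefix of line.
# Quoting "$indent" makes the shell treat it literally, not as a pattern.
trimmed=${line#"$indent"}

printf '[%s]\n' "$indent"    # prints [   ]
printf '[%s]\n' "$trimmed"   # prints [# indented comment]
```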
Case study 2 of 3: Key-value pairs
Let’s now try to parse a file consisting of lines of the form
key=value
where key contains no equals signs and value can be
anything at all.
For this we set the IFS
to the equals sign and pass two variable
names as arguments to read(1).
while IFS='=' read -r key value
do printf 'Key ‘%s’ has value ‘%s’.\n' "$key" "$value"
done <<EOF
name=banana
type=fruit
colour=yellow
EOF
Setting IFS='=' tells read(1) to split the line into fields at every equals sign. Since we named only two variables (key and value) to assign fields to, the second field and everything after it, equals signs included, ends up in value. In other words, all characters after the first equals sign are considered part of the value, so no quoting or escaping is necessary.
Running the above code results in the following output:
Key ‘name’ has value ‘banana’.
Key ‘type’ has value ‘fruit’.
Key ‘colour’ has value ‘yellow’.
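Here is a sketch that leans on that behaviour with values that themselves contain equals signs, and borrows the case statement from the first case study to skip comments and blank lines. The helper name parse_pairs and the sample data are my own inventions.

```shell
#!/bin/sh
# parse_pairs is a hypothetical helper: read key=value lines from
# standard input and print each pair, skipping comments and blank lines.
parse_pairs() {
    while IFS='=' read -r key value
    do
        case $key in
            ''|\#*) continue ;;   # blank line or comment
        esac
        printf '%s: [%s]\n' "$key" "$value"
    done
}

# The quoted 'EOF' keeps the shell from expanding anything in the data.
parse_pairs <<'EOF'
# made-up settings
url=https://example.org/?q=shell&lang=en
expr=a=b

empty=
EOF
```

This should print url: [https://example.org/?q=shell&lang=en], then expr: [a=b], then empty: [].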
Case study 3 of 3: User’s full name
Since the original Unix, the fifth field of a user’s passwd(5) record has been the GECOS field, where users’ contact information is stored. Here’s mine:
$ grep ^$USER: /etc/passwd
greg:x:1000:1000:Gregory Chamberlain,,,:/home/greg:/bin/bash
Among the colon-separated fields, you can see that my full name is the first of four comma-separated values within the GECOS field.
Assigning the first field to user and the fifth to gecos, we use underscores in place of the other fields, which we don’t care about.
while IFS=: read -r user _ _ _ gecos _
do [ x"$user" = x"$USER" ] && name="${gecos%%,*}"
done < /etc/passwd
Don’t be tempted to move the < /etc/passwd
file redirection closer
to the read statement; it needs to feed into the while loop itself so
that it progresses over successive lines.
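The sketch below shows what goes wrong with the tempting version. The scratch file path is made up for the demonstration, and the counter exists only to break out of what would otherwise be an endless loop.

```shell
#!/bin/sh
# A hypothetical scratch file holding three lines.
file=${TMPDIR:-/tmp}/read-demo.$$
printf 'first\nsecond\nthird\n' > "$file"

# Wrong: the file is reopened on every iteration, so read(1) sees the
# first line every time and never reaches end-of-file.
wrong='' n=0
while IFS= read -r line < "$file"
do
    wrong="$wrong $line"
    n=$((n + 1))
    [ "$n" -ge 3 ] && break   # guard against looping forever
done

# Right: the file is opened once and shared by every read in the loop.
right=''
while IFS= read -r line
do right="$right $line"
done < "$file"

rm -f "$file"
printf 'wrong:%s\n' "$wrong"   # wrong: first first first
printf 'right:%s\n' "$right"   # right: first second third
```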
After reading each line, the resulting values of user and USER are tested for equality. It’s important to protect arbitrary strings from [(1) options parsing by prefixing each with any ol’ character, in this case x; otherwise $user could expand to something like -z and cause an error.
[ x"$user" = x"$USER" ]
If the two match, then we know we are looking at the right line and so we begin parsing the gecos string for a name:
name="${gecos%%,*}"
We can’t use the same read(1) trick again because we already absorbed the line. Anyway, we’re only interested in the first field, so we can just use a greedy suffix pattern in the expansion of gecos. You can read about Parameter Expansion in dash(1) for the details, but in short %%,* removes the first comma and everything after it.
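The greedy part matters. As a quick sketch, compare the %% operator with its non-greedy sibling % on the GECOS string from earlier; the variable names longest and shortest are mine.

```shell
#!/bin/sh
gecos='Gregory Chamberlain,,,'

# %% removes the longest suffix matching ,* which starts at the first comma.
longest=${gecos%%,*}

# % removes the shortest suffix matching ,* which is just the last comma.
shortest=${gecos%,*}

printf '[%s]\n' "$longest"    # prints [Gregory Chamberlain]
printf '[%s]\n' "$shortest"   # prints [Gregory Chamberlain,,]
```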
Below is a short script that illustrates how this could be integrated into a larger program. I’ve also thrown in a read(1) command that prompts interactively for a name if one is not found, demonstrating how read(1) can be used outside of a loop as well.
#!/bin/sh
while IFS=: read -r user _ _ _ gecos _
do [ x"$user" = x"$USER" ] && name="${gecos%%,*}"
done < /etc/passwd
if [ "$name" ]
then
    printf 'Your real name is %s.\n' "$name"
else
    printf 'What’s your name? '
    IFS= read -r name
    printf 'Hello %s!\n' "$name"
fi
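If you wanted to reuse the lookup, one possible variation is to wrap the loop in a function that stops at the first match. This is a sketch rather than part of the script above: the name full_name and its optional second argument (a passwd-format file, defaulting to /etc/passwd) are assumptions of mine.

```shell
#!/bin/sh
# full_name is a hypothetical helper.
# Usage: full_name LOGIN [FILE]
# Prints the first comma-separated GECOS value for LOGIN and succeeds,
# or prints nothing and fails if LOGIN is not found in FILE.
full_name() {
    while IFS=: read -r user _ _ _ gecos _
    do
        if [ x"$user" = x"$1" ]
        then
            printf '%s\n' "${gecos%%,*}"
            return 0   # stop reading at the first match
        fi
    done < "${2:-/etc/passwd}"
    return 1
}

# Fall back to an empty name if the lookup fails.
name=$(full_name "${USER:-root}") || name=''
printf 'Name on record: %s\n' "$name"
```

Returning early means the rest of the file is never read, which the original loop cannot do because it keeps going until end-of-file.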
Reasons not to use Bash
Most prominently, Bash scripts are not portable. From Drew DeVault’s Introduction to POSIX shell:
Any shell that utilizes features specific to Bash are not portable, which means you cannot take them with you to any other system. […] This is bad if your users wish to utilize your software anywhere other than GNU/Linux. If your build tooling utilizes bashisms, your software will not build on anything but GNU/Linux. If you ship runtime scripts that use bashisms, your software will not run on anything but GNU/Linux.
He goes on to argue that
you should stick to POSIX shell for your personal scripts, too. You might not care now, but when you feel like flirting with other Unicies you’ll thank me when all of your scripts work.
Also, Bash is monstrously complex; even its man
page
confesses it’s too big and too slow.
And let’s not forget
Shellshock,
the arbitrary code execution vulnerability in Bash responsible for
millions of attacks
on web-facing servers.
See also
- Shell Command Language—part of the POSIX specification published by The Open Group.
- Dylan Araps’s pure sh bible, the sacred holy text for disciples of purely POSIX shell scripting. See also his pash, pfetch and shfm. All MIT licensed.
- Rich’s sh (POSIX shell) tricks clears up common misconceptions and pitfalls.
- Comprehensive POSIX Shell Tutorial by the Grymoire.
- Joining strings in POSIX shell by Chris Lamb.