Testing awk
I’ve been testing wak against the other awk implementations I’ve been able to obtain. I have recent versions of nnawk (Kernighan’s One True Awk, the original Unix awk updated), gawk, mawk, goawk, and bbawk (busybox awk).
As of this writing, the versions are:
nnawk: awk version 20231228 (compiled 2024-01-23)
gawk: GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1 (source 2023-11-02; compiled 2023-11-19)
mawk: mawk 1.3.4 20231102 (compiled 2023-11-16)
goawk: V1.25.0 (compiled 2024-01-23)
bbawk: 2023-12-31 (compiled 2024-01-23)
I’m sure wak still has plenty of bugs. Until recently, Brian Kernighan maintained the original Unix awk, as he had from its start almost 40 years ago. Its “FIXES” file has over 200 entries from 1987 to 2023, and the record continues in a separate file for the “second edition” of One True Awk. If Kernighan has been fixing bugs for 35+ years, I doubt I can make a bug-free awk.
Some “bugs” are really differences in how implementations interpret certain awk features. No two versions of awk (original awk, gawk, mawk, goawk, Busybox awk, etc.) agree completely.
Please report bugs to raygard at gmail.com.
Testing strategy
I have used the test files that come with existing awk implementations, plus some I’ve written. The original One True Awk comes with a folder, testdir, of about 315 files. Kernighan’s README.TESTS file describes three groups:

- about 60 small tests, p.*, from the first two chapters of The AWK Programming Language (1st ed.), which are basic stuff;
- about 160 small tests, t.*, that are “a random sampling of awk constructions collected over the years. Not organized, but they touch almost everything.”;
- about 20 tt.* files, which are timing tests.
The testdir folder also has about 30 T.* files that are “more systematic tests of specific language features”. Unfortunately, these are shell scripts that each test a single awk implementation, checking its output against known good data built into the scripts. That makes them hard to use for comparing my implementation against all the others in one pass, but I can run the scripts on wak separately.
gawk also comes with a folder of about 1475 files, most of which are sets of foo.awk, foo.in, and foo.ok files. In each case, the foo.awk file is run with foo.in as input, and the result can be compared with foo.ok. Some are standalone tests that do not need an input file, so there is sometimes no foo.in file.
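For the gawk-style tests, a single run comes down to feeding foo.in to the program and comparing stdout with foo.ok. The Python sketch below illustrates that step. It is not test_awk.py itself. The function name run_gawk_style_test, the choice to feed foo.in on standard input, and the timeout are assumptions for illustration. It also does not reproduce any special options or environment a test may expect.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_gawk_style_test(awk_cmd: Sequence[str], awk_file: Path,
                        timeout: float = 30.0) -> bool:
    """Run foo.awk with foo.in (if present) and compare stdout with foo.ok.

    awk_cmd is a command prefix such as ["gawk"] or ["busybox", "awk"].
    Assumption: foo.in is fed on standard input; a standalone test with no
    foo.in gets empty input so it cannot hang waiting on the terminal.
    """
    in_file = awk_file.with_suffix(".in")
    ok_file = awk_file.with_suffix(".ok")
    stdin_data = in_file.read_bytes() if in_file.exists() else b""
    result = subprocess.run(
        [*awk_cmd, "-f", str(awk_file)],
        input=stdin_data,
        capture_output=True,
        timeout=timeout,
    )
    # Only stdout is compared here; stderr and the exit code are ignored.
    return result.stdout == ok_file.read_bytes()
```

For example, run_gawk_style_test(["busybox", "awk"], Path("tests/gawktests/double1.awk")) would check bbawk against double1.ok. The function takes a command prefix rather than a single path so that multi-call binaries like busybox can be passed the same way as gawk or mawk.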
I have a not-very-neat test driver, test_awk.py, that I can use to run a batch of tests at once against several awk implementations and see how they compare, for example all the t.* files in testdir. The p.* and t.* files in testdir are intended to use certain input files (test.countries and test.data), and their outputs are compared via MD5 hashes. Each unique output is saved for later examination. For the gawk-style tests, the program can compare the output against the foo.ok file and give a pass/fail result. If there is a non-zero exit code or an exception, that is noted in the test_awk.py output. The output looks like this:
```
                     ======= ======= ======= ======= ======= ======= =======
====versions====>>>> gawk    nnawk   mawk    goawk   bbawk   tbawk   muwak
                     ====    ====    ====    ====    ====    ====    ====
[...]
Test delarpm2.awk    dd8e2e5 8841567 NNAWK   c992867 gawk    GOAWK   GOAWK
Test dfacheck1.awk   03a19ad 0000000 NNAWK   NNAWK   gawk    !3a19ad TBAWK
ERR: tbawk: awk: file tests/gawktests/dfacheck1.awk line 1: warning: '\<' -- unknown regex escape
ERR: muwak: muwak: file tests/gawktests/dfacheck1.awk line 1: warning: '\<' -- unknown regex escape
Test double1.awk     8c7dbdf 819e6db gawk    351564a d97b6e5 BBAWK   BBAWK
Test double2.awk     4dbdb44 4941b67 gawk    7a665a2 acfcfe0 0124355 TBAWK
Test dtdgport.awk    a916caa gawk    gawk    gawk    !!00000 gawk    gawk
RET: bbawk: 1
ERR: bbawk: awk: tests/gawktests/dtdgport.awk:37: %*x formats are not supported
```
The hex values are the first 7 hex digits of the MD5 hash of each output. If the output is empty, the hash is replaced with all zeroes to make it easier to spot. If there is any stderr output, the first digit is replaced with a ‘!’; if the exit code is non-zero, the second digit is replaced with a ‘!’. In either case, the stderr output (ERR:) and exit code (RET:) are printed. These reports (for tests that are neither pass/fail nor timing tests) always show a hash of the output in the first column, i.e. for the first awk version tested. In each later column, the hash is shown if it does not match any previous column. If it does match a previous column, the name of that column’s awk version is shown instead, upper-cased if that output differs from the first column’s.
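The column encoding can be restated as a small Python sketch. The names RunResult, short_hash, and label_row are hypothetical. This is a reading of the rules above, not the code in test_awk.py. Comparisons use the marked hash, so a run that matches gawk’s output but writes to stderr (tbawk for dfacheck1.awk) appears as its own hash rather than as “gawk”.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of running one test program with one awk version."""
    name: str          # awk version label, e.g. "gawk"
    stdout: bytes
    stderr: bytes
    returncode: int


def short_hash(r: RunResult) -> str:
    """First 7 hex digits of the MD5 of stdout, with the markers applied."""
    h = hashlib.md5(r.stdout).hexdigest()[:7] if r.stdout else "0000000"
    if r.stderr:              # any stderr output: first digit becomes '!'
        h = "!" + h[1:]
    if r.returncode != 0:     # non-zero exit code: second digit becomes '!'
        h = h[0] + "!" + h[2:]
    return h


def label_row(results: list[RunResult]) -> list[str]:
    """Label each column: a hash for a new output, else an earlier column's name."""
    first = short_hash(results[0])
    seen: dict[str, str] = {}  # marked hash -> name of the first column with it
    labels: list[str] = []
    for r in results:
        h = short_hash(r)
        if h in seen:
            # Repeat of an earlier column: show that column's name,
            # upper-cased when it differs from the first column's output.
            name = seen[h]
            labels.append(name if h == first else name.upper())
        else:
            seen[h] = r.name
            labels.append(h)
    return labels
```

Given results in the order gawk, nnawk, mawk, goawk, bbawk, tbawk, muwak with the same pattern of matches as delarpm2.awk, label_row yields the same shape as that row: a hash, a hash, NNAWK, a hash, gawk, GOAWK, GOAWK.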
So, for example, for delarpm2.awk, nnawk has a different output from gawk, mawk matches nnawk, goawk has yet another different output, bbawk matches gawk, and both tbawk (toybox awk, my awk for toybox) and muwak (my awk compiled with musl libc) match goawk. For dfacheck1.awk, gawk gave some output and nnawk produced no output. mawk and goawk also produced no output (matching nnawk), and bbawk matched gawk. tbawk produced the same output as gawk but had stderr output, and muwak matched tbawk, including the stderr output.
The gawk tests were originally intended to be run via the supplied Makefile, and some of them use special gawk options, environment setup, etc. As a result, when test_awk.py runs such a foo.awk file directly, it may not produce the correct foo.ok output, even with gawk. Because of this, I sifted the output from all the gawk tests against all the awk versions into several parts and moved the tests into corresponding folders:

- gawktests/allfail has tests that fail for all versions, including gawk;
- gawktests/allpass has tests that pass for all versions;
- gawktests/gawkonly has tests that pass for gawk and fail for all others (usually because they use gawk-only features);
- gawktests has all the remaining tests.
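The sifting of the gawk tests amounts to a three-way check on the per-version pass/fail results. Below is a minimal Python sketch with a hypothetical function name. The folder names are the ones listed above, and the caller decides which versions to include in the passed mapping.

```python
def sift_gawk_test(passed: dict[str, bool]) -> str:
    """Choose a folder for one gawk test from per-version pass/fail results.

    `passed` maps each awk version name (it must include "gawk") to
    whether that version's output matched foo.ok.
    """
    if not any(passed.values()):
        return "gawktests/allfail"
    if all(passed.values()):
        return "gawktests/allpass"
    others = [ok for name, ok in passed.items() if name != "gawk"]
    if passed["gawk"] and not any(others):
        return "gawktests/gawkonly"
    return "gawktests"
```

The order of the checks matters. A test that every version passes must not be mistaken for a gawk-only test, so the allpass check comes before the gawkonly check.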
I also wrote a shell script and an awk script to sift the resulting test output files into several categories. I usually have test results in columns for gawk, nnawk, mawk, goawk, bbawk, my awk within toybox (tbawk), and my awk standalone (which may be compiled with the ASAN sanitizer, with musl libc, or in some other configuration). The order is significant. I consider my result golden if it matches both gawk and nnawk, still good if it matches gawk or nnawk, and less good if it matches only mawk, goawk, or bbawk.
If my two awks (the last two columns) differ, the test goes into a set_mismatch file. That is usually a result of differences in the random number generators, or of differences (bugs?) in the musl regex functions.
If any “allfail” tests do not all fail, or any “allpass” tests do not all pass, or any “gawkonly” tests do not pass only for gawk, they go into separate files.
If a test is pass/fail, I put tests that my awk fails into a set_fail file. If my awk passes and both gawk and nnawk pass, the test goes into a set_good file. If it matches gawk or nnawk, it goes into a set_gawk or set_nawk file. Otherwise it goes into a general set_pass file, or into set_passx if it passed but had stderr output or a non-zero exit code.
If a test is not pass/fail, the checks run in this order:

- if my result matches both gawk and nnawk, it goes into a set_good file;
- else if it matches gawk, it goes into the set_gawk file;
- else if it matches nnawk, it goes into the set_nawk file;
- else if it matches mawk, goawk, or bbawk, it goes into a set_mawk, set_goawk, or set_bbawk file, respectively.
If a result doesn’t fit into any of those buckets, it doesn’t match any other implementation and goes into a set_odd file. The set_odd file needs the closest scrutiny, as those are possible bugs in my implementation. Some, though, differ from gawk and/or nnawk only in having stderr output, usually warnings. Others are due to different implementations iterating over arrays in different orders. (An annoyance for testing is that goawk usually doesn’t traverse arrays the same way on different runs, because Go intentionally randomizes map iteration order, and that apparently cannot be turned off.)
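The bucket rules above can be restated in Python. The actual sifting is done with a shell script and an awk script, so this is only a sketch under two assumptions. First, for pass/fail tests, “matches gawk” is read as “gawk also passed”. Second, the set_mismatch check and the files for misbehaving allfail/allpass/gawkonly tests are left out. The function names are hypothetical.

```python
def passfail_bucket(mine_passed: bool, mine_noisy: bool,
                    gawk_passed: bool, nnawk_passed: bool) -> str:
    """Bucket a pass/fail test.

    mine_noisy means my awk had stderr output or a non-zero exit code.
    """
    if not mine_passed:
        return "set_fail"
    if gawk_passed and nnawk_passed:
        return "set_good"
    if gawk_passed:
        return "set_gawk"
    if nnawk_passed:
        return "set_nawk"
    return "set_passx" if mine_noisy else "set_pass"


def output_bucket(mine: str, ref_hashes: dict[str, str]) -> str:
    """Bucket a non-pass/fail test by comparing my output hash with the
    reference implementations' hashes, in order of preference."""
    def same(name: str) -> bool:
        return ref_hashes.get(name) == mine

    if same("gawk") and same("nnawk"):
        return "set_good"
    if same("gawk"):
        return "set_gawk"
    if same("nnawk"):
        return "set_nawk"
    for name in ("mawk", "goawk", "bbawk"):
        if same(name):
            return f"set_{name}"
    return "set_odd"  # matches no other implementation: scrutinize these first
```

Because the checks stop at the first hit, a result that matches both nnawk and mawk lands in set_nawk, which reflects the preference order described above.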
Currently, 30 of the 1182 tests run are in the set_odd category. I believe relatively few of these are actual bugs.
Here is an approximate breakdown of the current test results:
category | count |
---|---|
set_good | 736 |
set_gawk | 87 |
set_nawk | 158 |
set_mawk | 50 |
set_goawk | 8 |
set_bbawk | 10 |
set_pass | 2 |
set_passx | 11 |
set_fail | 47 |
set_badpass | 2 |
set_badfail | 2 |
set_badgawkonly | 1 |
set_mismatch | 38 |
set_odd | 30 |
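The counts in the table add up exactly to the 1182 tests run, and a few lines of Python turn them into shares of the total. For instance, set_good alone is about 62% of the tests, and set_good, set_gawk, and set_nawk together are about 83%. The names COUNTS and shares are only for illustration.

```python
# Category counts copied from the table above.
COUNTS = {
    "set_good": 736,
    "set_gawk": 87,
    "set_nawk": 158,
    "set_mawk": 50,
    "set_goawk": 8,
    "set_bbawk": 10,
    "set_pass": 2,
    "set_passx": 11,
    "set_fail": 47,
    "set_badpass": 2,
    "set_badfail": 2,
    "set_badgawkonly": 1,
    "set_mismatch": 38,
    "set_odd": 30,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's percentage of the total number of tests."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(COUNTS.values()))
    for name, pct in shares(COUNTS).items():
        print(f"{name:16} {pct:5.1f}%")
```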
To put the 47 set_fail results in perspective, all but two of them are also failed by at least one of gawk, nnawk, mawk, or goawk, and those two are both failed by bbawk. Still, I would like to make wak/toybox awk handle more of those cases as well.
Also, the tests need some cleanup; there is some overlap and duplication.