Age | Commit message | Author |
I botched readrec's definition of a record when I implemented
RS regular expression support. This is the relevant hunk from the
old diff:
```
- return c == EOF && rr == buf ? 0 : 1;
+ isrec = *buf || !feof(inf);
+ dprintf( ("readrec saw <%s>, returns %d\n", buf, isrec) );
+ return isrec;
```
Problem #1
Unlike testing with EOF, `*buf || !feof(inf)` is blind to stdio
errors. This can cause an infinite loop in which each iteration
fabricates an empty record.
The following demonstration uses standard terminal access control
policy to produce a persistent error condition. Note that the "i/o
error" message does not come from readrec(). It's produced much later
by closeall() at shutdown.
```
$ trap '' SIGTTIN && awk 'END {print NR}' &
[1] 33517
$ # After fg, type ^D
$ fg
trap '' SIGTTIN && awk 'END {print NR}'
13847376
awk: i/o error occurred on /dev/stdin
input record number 13847376, file
source line number 1
```
Each time awk tries to read the terminal from the background
while ignoring SIGTTIN, the read fails with EIO, getc returns EOF,
the stream's end-of-file indicator remains clear, and `!feof`
erroneously promotes the empty buffer to an empty record. So long
as the error persists, the stream's position does not advance and
end-of-file is never set.
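The difference between the two predicates can be modeled without a
terminal. The sketch below is only an illustration, not awk's code:
`struct read_result` and both function names are hypothetical, `at_eof`
stands in for what feof() would report, and an empty `buf` stands in for
`rr == buf`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical snapshot of one read attempt (not a type from awk). */
struct read_result {
    int c;              /* last value returned by getc() */
    int at_eof;         /* what feof() would report afterwards */
    const char *buf;    /* bytes collected for the record */
};

/* Regressed predicate: consults only the end-of-file indicator. */
static int isrec_feof(const struct read_result *r)
{
    assert(r != NULL && r->buf != NULL);
    return *r->buf != '\0' || !r->at_eof;
}

/* Original predicate: consults getc()'s return value instead. */
static int isrec_eof_char(const struct read_result *r)
{
    assert(r != NULL && r->buf != NULL);
    return (r->c == EOF && *r->buf == '\0') ? 0 : 1;
}
```

For a failed read (`c == EOF`, end-of-file indicator clear, empty buffer)
the first predicate reports a record and the second does not, which is
the fabricated empty record described above.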
Problem #2
When RS is a regex, `*buf || !feof(inf)` can't see an empty record's
terminator at the end of a stream.
```
$ echo a | awk 1 RS='a\n'
$
```
That pipeline should have found one empty record and printed a blank
line, but `*buf || !feof(inf)` considers reaching the end of the
stream the conclusion of a fruitless search. That's only correct when
the terminator is a single character, because a regex RS search can
set the end-of-file marker even when it succeeds.
The Fix
`isrec` must be 0 **iff** no record is found. The correct definition
of "no record" is a failure to find a record terminator and a
failure to find any data (possibly from a final, unterminated
record). Conceptually, for any RS:
```
isrec = (noTERM && noDATA) ? 0 : 1
```
noDATA is an expression that's true if `buf` is empty, false otherwise.
When RS is null or a single character, noTERM is an expression
that is true when the sought-after character is not found, false
otherwise. Since the search for a single character can only end with
that character or EOF, noTERM is `c == EOF`.
```
isrec = (c == EOF && rr == buf) ? 0 : 1
```
When RS is a regular expression: noTERM is an expression that is
true if a match for RS is not found, false otherwise. This is simply
the inverse of the result of the function that conducts the search,
`!found`.
```
isrec = (found == 0 && *buf == '\0') ? 0 : 1
```
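As a rough check of the rule against the `echo a | awk 1 RS='a\n'` case
from Problem #2, the definitions can be written as two small functions.
This is a sketch under assumptions: the function names are made up here,
and `found` is simply taken to be the result of the RS search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Conceptual rule: no record only when both terminator and data are missing. */
static int isrec(bool no_term, bool no_data)
{
    return (no_term && no_data) ? 0 : 1;
}

/* Regex RS: noTERM is !found, noDATA is an empty buf. */
static int isrec_regex(bool found, const char *buf)
{
    assert(buf != NULL);
    return isrec(!found, *buf == '\0');
}
```

With the terminator matched and nothing before it, `isrec_regex(true, "")`
is 1, so the empty record is kept even if the search also reached the end
of the stream.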
|
|
RS ^-anchoring needs to know if it's reading the first record of a file.
Unfortunately, innew, the flag that the main i/o loop uses to track
this, didn't make it from NetBSD unscathed. This commit restores the
last of the wayward lines.
Without this fix, when reading the first record of an input file named
on the command line, the regular expression machinery will be
misconfigured, precluding a successful match.
Relevant commits:
1. 643a5a3dad633431c6ce8831944c23059a6be309 (Initial import)
2. ffee7780fe08fa77f662a0903477545d9e26334f (Restoring innew)
|
|
This resulted in the NUL terminator being written to the end of the
buffer, which was not the same as the end of the string. That in
turn caused garbage bytes from malloc() to be processed. Also
change the NUL termination to be less error-prone by writing the
NUL immediately after the last byte copied.
Reproducible with the following under valgrind:
echo '#!/usr/bin/awk' | awk \
'/^#! ?\/.*\/[a-z]{0,2}awk/ {sub(/^#! ?\/.*\/[a-z]{0,2}awk/,"#! awk"); print}'
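The "NUL immediately after the last byte copied" idea can be sketched as
follows. The helper name and signature are hypothetical and are not taken
from awk's source; the caller is assumed to guarantee that `src` holds at
least `n` readable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most n bytes of src into dst (capacity dstsize) and terminate
 * right after the last byte copied. Returns the number of bytes copied.
 */
static size_t copy_terminated(char *dst, size_t dstsize,
                              const char *src, size_t n)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    if (n > dstsize - 1)
        n = dstsize - 1;    /* leave room for the terminator */
    memcpy(dst, src, n);
    dst[n] = '\0';          /* not dst[dstsize - 1], which leaves a gap */
    return n;
}
```

Writing the terminator at `dst[n]` keeps the end of the string and the
terminator in the same place regardless of how large the buffer is.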
|
|
|
|
We cannot have a test that destroys, e.g., /etc/passwd if someone
runs it as root.
|
|
after dec 18 changes. updated FIXES, adjusted version date.
|
|
Though some implementations include this header indirectly through
string.h by default, the POSIX header that declares strcasecmp is
strings.h[0].
[0] https://pubs.opengroup.org/onlinepubs/9699919799/functions/strcasecmp.html
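A minimal illustration of the include in question. The wrapper function
is hypothetical and exists only to show the header being used.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>    /* POSIX header that declares strcasecmp() */

/* Return 1 if a and b are equal ignoring case, 0 otherwise. */
static int same_word(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcasecmp(a, b) == 0;
}
```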
|
|
If awk prints an error message while compile_time is still set
to ERROR_PRINTING, don't try to print the context since there is
none. This can happen due to a problem with, e.g., unknown command
line options.
|
|
When awk reaches EOF parsing the program file, curpfile is incremented.
However, cursource() uses curpfile without checking it against npfile
which can cause an out of bounds access of pfile[] if there is a syntax
error at the end of the program file.
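One possible shape for such a guard is sketched below. Only the names
pfile, npfile, curpfile and cursource come from the message above; the
types, the MAX_PFILE capacity and the choice to return NULL are
assumptions made for illustration, and the actual fix may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PFILE 20            /* illustrative capacity, not awk's */

static char *pfile[MAX_PFILE];  /* program file names */
static size_t npfile;           /* number of valid entries in pfile[] */
static size_t curpfile;         /* index of the file being parsed */

/* Return the current source name, or NULL once parsing has moved past the last file. */
static char *cursource(void)
{
    assert(npfile <= MAX_PFILE);
    if (curpfile < npfile)
        return pfile[curpfile];
    return NULL;                /* curpfile == npfile after EOF on the last file */
}
```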
|
|
Co-authored-by: Tim van der Molen <tim@kariliq.nl>
|
|
* Cast to uschar when storing a char in an int that will be used as an index.
Fixes a heap underflow when the input char has the high bit set and
FS is a regex.
* Add regress test for underflow when RS is a regex and input is 8-bit.
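The effect of the cast can be shown in isolation. This is a sketch, not
the patched code: `table` and `lookup` are hypothetical, and the input is
typed as `signed char` to make the high-bit case explicit on any platform.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned char uschar;

static int table[UCHAR_MAX + 1];    /* hypothetical per-byte lookup table */

/* Look up a byte that arrives as a possibly negative signed char. */
static int lookup(signed char c)
{
    int i = (uschar)c;              /* 0xE9 stays 233 instead of becoming -23 */
    assert(i >= 0 && i <= UCHAR_MAX);
    return table[i];
}
```

Without the cast, `int i = c;` would be -23 for the byte 0xE9 and
`table[i]` would read before the start of the array.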
|
|
* In closeall(), skip stdin and flush std{err,out} instead of closing.
Otherwise awk could fclose(stdin) twice (it may appear more than once)
and closing stderr means awk cannot report errors with other streams.
For example, "awk 'BEGIN { getline < "-" }' < /dev/null" will call
fclose(stdin) twice, with undefined results.
* If closefile() is called on std{in,out,err}, freopen() /dev/null instead.
Otherwise, awk will continue trying to perform I/O on a closed stdio
stream, the behavior of which is undefined.
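The freopen() approach from the second item might look roughly like
this. The helper name is hypothetical and the mode is left to the caller;
how awk itself picks the mode is not shown in this message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Instead of fclose()ing a standard stream, point it at /dev/null so
 * that later I/O on it stays well defined.
 * Returns 0 on success, EOF on failure.
 */
static int park_std_stream(FILE *fp, const char *mode)
{
    assert(fp != NULL && mode != NULL);
    return freopen("/dev/null", mode, fp) != NULL ? 0 : EOF;
}
```

After `park_std_stream(stdin, "r")`, a later read from stdin simply
reports end-of-file rather than touching a closed stream.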
|
|
Commit 0d8778bbbb415810bfd126694a38b6bf4b71e79c reintroduced a
regression that was fixed in commit
97a4b7ed215ae6446d13fe0eab15b5b3ae4da7da. The length of SUBSEP needs to
be rechecked after calling execute(), in case SUBSEP itself has been
changed.
Co-authored-by: Tim van der Molen <tim@kariliq.nl>
|
|
The optimization in commit 1d6ddfd9c0ed7119c0601474cded3a385068e886
reintroduced the regression that was fixed in commit
e26237434fb769d9c1ea239101eb5b24be588eea.
Co-authored-by: Tim van der Molen <tim@kariliq.nl>
|
|
POSIX specifies a dprintf function that operates on an fd instead of
a stdio stream. Using upper case for macros is more idiomatic too.
We no longer need to use an extra set of parentheses for debugging
printf statements.
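A variadic macro along these lines removes the need for the doubled
parentheses. This is a sketch and may not match awk's definition exactly:
the `dbg` flag, the `do { } while (0)` wrapper and the `report_record`
helper are assumptions.

```c
#include <assert.h>
#include <stdio.h>

static int dbg;     /* assumed debug flag; nonzero enables output */

/* Upper-case name avoids clashing with POSIX dprintf(int fd, ...). */
#define DPRINTF(...) do { if (dbg) printf(__VA_ARGS__); } while (0)

/* Hypothetical call site, modeled on the readrec debug line. */
static int report_record(const char *buf, int isrec)
{
    assert(buf != NULL);
    DPRINTF("readrec saw <%s>, returns %d\n", buf, isrec);
    return isrec;
}
```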
|
|
|
|
The errcheck() function treats an errno of ERANGE or EDOM as something
to report, so make sure errno is set to zero before invoking a
function to check so that a previous such errno value won't result
in a false positive. This could happen simply due to input line fields
that looked enough like floating-point input to trigger ERANGE.
Reported by Jordan Geoghegan, fix from Philip Guenther.
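The clear-then-check pattern can be sketched with strtod(), used here
purely as a convenient call that sets ERANGE. The helper name is
hypothetical and this is not the errcheck() code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse s as a double; *range_err is set only if this call went out of range. */
static double parse_number(const char *s, int *range_err)
{
    assert(s != NULL && range_err != NULL);
    errno = 0;                      /* discard any stale ERANGE or EDOM */
    double d = strtod(s, NULL);
    *range_err = (errno == ERANGE);
    return d;
}
```

If errno were not reset, an ERANGE left over from an earlier field such
as "1e999" would be blamed on a later, perfectly valid call.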
|
|
* Replace __attribute__((__noreturn__)) with _Noreturn.
* Change _Noreturn to noreturn and #include <stdnoreturn.h>
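For reference, the C11 spelling looks like this in use. Both functions
are hypothetical examples and are not taken from awk's source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>    /* C11: defines noreturn as _Noreturn */

/* Hypothetical fatal-error helper. */
static noreturn void die(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "fatal: %s\n", msg);
    exit(2);
}

static int checked_div(int a, int b)
{
    if (b == 0)
        die("division by zero");    /* control never comes back from here */
    return a / b;
}
```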
|
|
* LC_NUMERIC radix issue.
According to https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html,
the period character is the character recognized in processing awk
programs. Make it so that during output we also print the period
character, since this is what other awk implementations do, and it
makes sense from an interoperability point of view.
* print "T.builtin" in the error message
* Fix backslash continuation line handling.
* Keep track of RS processing so we apply the regex properly only once
per record.
|
|
* Fix hwasan global overflow.
Crash found with https://source.android.com/devices/tech/debug/hwasan
but also detectable by regular ASan. Here's an ASan crash:
==215690==ERROR: AddressSanitizer: global-buffer-overflow on address
0x55d90f8da140 at pc 0x55d90f8b7503 bp 0x7ffd3dae6100 sp 0x7ffd3dae60f8
READ of size 4 at 0x55d90f8da140 thread T0
#0 0x55d90f8b7502 in word /tmp/awk/lex.c:496
#1 0x55d90f8b939f in yylex /tmp/awk/lex.c:191
#2 0x55d90f894ab9 in yyparse /tmp/awk/awkgram.tab.c:2366
#3 0x55d90f89edc2 in main /tmp/awk/main.c:216
#4 0x7ff263a78bba in __libc_start_main ../csu/libc-start.c:308
#5 0x55d90f8945a9 in _start (/tmp/awk/a.out+0x115a9)
0x55d90f8da141 is located 0 bytes to the right of global variable
'infunc' defined in 'awkgram.y:35:6' (0x55d90f8da140) of size 1
SUMMARY: AddressSanitizer: global-buffer-overflow /tmp/awk/lex.c:496 in word
Shadow bytes around the buggy address:
0x0abba1f133d0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x0abba1f133e0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x0abba1f133f0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x0abba1f13400: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
0x0abba1f13410: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
=>0x0abba1f13420: 04 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 f9 f9 f9 f9
0x0abba1f13430: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
0x0abba1f13440: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
0x0abba1f13450: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
0x0abba1f13460: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
0x0abba1f13470: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
And here's the stack trace from hwasan:
Stack Trace:
RELADDR FUNCTION FILE:LINE
00000000000168d4 word external/one-true-awk/lex.c:496:18
000000000002d1ec yyparse y.tab.c:2460:16
000000000001c82c main external/one-true-awk/main.c:179:2
00000000000b41a0 __libc_init bionic/libc/bionic/libc_init_dynamic.cpp:151:8
As it says, we're doing a 4-byte read from a 1-byte global.
`infunc` is declared as an int but defined as a bool.
Signed-off-by: Evgenii Stepanov <eugenis@google.com>
* Add ASan cflags to makefile.
They're not used by default, but this way they're readily to hand the
next time they're wanted.
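The infunc mismatch comes down to a declaration and a definition that
disagree on type. A single-file sketch of the consistent form follows;
the real declaration and definition live in separate files, and the
accessor function is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What a shared header would declare; the type must match the definition. */
extern bool infunc;

/* The single definition (in awkgram.y, per the ASan report). */
bool infunc = false;

/* Hypothetical reader: a 1-byte load, since it sees the variable as bool. */
static bool in_function_body(void)
{
    static_assert(sizeof infunc == sizeof(bool),
                  "declaration and definition agree on size");
    return infunc;
}
```

With `extern int infunc;` in another translation unit, the compiler would
not see the conflict and the reader would load sizeof(int) bytes from a
1-byte object, which is the overflow the sanitizers flagged.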