Age | Commit message | Author
2021-07-24 | Merge branch 'fix-readrec' of https://github.com/mpinjr/awk into staging | ozan yigit
2021-07-24 | Merge branch 'fix-RS' of https://github.com/mpinjr/awk into staging | ozan yigit
2021-04-23 | Fix readrec's definition of a record | Miguel Pineiro Jr
I botched readrec's definition of a record when I implemented RS regular expression support. This is the relevant hunk from the old diff:

```
-	return c == EOF && rr == buf ? 0 : 1;
+	isrec = *buf || !feof(inf);
+	dprintf( ("readrec saw <%s>, returns %d\n", buf, isrec) );
+	return isrec;
```

Problem #1:

Unlike testing with EOF, `*buf || !feof(inf)` is blind to stdio errors. This can cause an infinite loop, each iteration of which fabricates an empty record.

The following demonstration uses standard terminal access control policy to produce a persistent error condition. Note that the "i/o error" message does not come from readrec(); it's produced much later by closeall() at shutdown.

```
$ trap '' SIGTTIN && awk 'END {print NR}' &
[1] 33517
$ # After fg, type ^D
$ fg
trap '' SIGTTIN && awk 'END {print NR}'
13847376
awk: i/o error occurred on /dev/stdin
 input record number 13847376, file
 source line number 1
```

Each time awk tries to read the terminal from the background while ignoring SIGTTIN, the read fails with EIO, getc returns EOF, the stream's end-of-file indicator remains clear, and `!feof` erroneously promotes the empty buffer to an empty record. So long as the error persists, the stream's position does not advance and end-of-file is never set.

Problem #2:

When RS is a regex, `*buf || !feof(inf)` can't see an empty record's terminator at the end of a stream.

```
$ echo a | awk 1 RS='a\n'
$
```

That pipeline should have found one empty record and printed a blank line, but `*buf || !feof(inf)` treats reaching the end of the stream as the conclusion of a fruitless search. That's only correct when the terminator is a single character, because a regex RS search can set the end-of-file indicator even when it succeeds.

The Fix:

`isrec` must be 0 **iff** no record is found. The correct definition of "no record" is a failure to find a record terminator and a failure to find any data (possibly from a final, unterminated record).
Conceptually, for any RS:

```
isrec = (noTERM && noDATA) ? 0 : 1
```

noDATA is an expression that's true if `buf` is empty, false otherwise.

When RS is null or a single character, noTERM is an expression that is true when the sought-after character is not found, false otherwise. Since the search for a single character can only end with that character or EOF, noTERM is `c == EOF`.

```
isrec = (c == EOF && rr == buf) ? 0 : 1
```

When RS is a regular expression, noTERM is an expression that is true if a match for RS is not found, false otherwise. This is simply the inverse of the result of the function that conducts the search, `!found`.

```
isrec = (found == 0 && *buf == '\0') ? 0 : 1
```
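To make the corrected rule concrete, here is a minimal C sketch under stated assumptions: `read_record()` and `isrec_regex()` are hypothetical helpers, not awk's readrec(). They only reproduce the two `isrec` expressions above, one for a single-character RS and one for a regex RS whose search result is passed in as `found`.

```c
#include <stdio.h>

/* Hypothetical single-character-RS reader in the spirit of readrec()
 * (not awk's code). Reads bytes up to `sep` or EOF into buf, which must
 * hold at least 2 bytes, and returns 1 if a record was found, else 0. */
static int read_record(FILE *inf, int sep, char *buf, size_t size)
{
    char *rr = buf;
    int c = EOF;

    while ((size_t)(rr - buf) + 1 < size && (c = getc(inf)) != EOF && c != sep)
        *rr++ = (char)c;
    *rr = '\0';
    /* noTERM is `c == EOF`, noDATA is `rr == buf`. A read error also makes
     * getc() return EOF, so an error with an empty buffer yields 0 instead
     * of a fabricated empty record, whatever feof() reports. */
    return (c == EOF && rr == buf) ? 0 : 1;
}

/* Regex-RS form of the same rule: `found` is the result of the search. */
static int isrec_regex(int found, const char *buf)
{
    return (found == 0 && *buf == '\0') ? 0 : 1;
}
```

Fed the bytes `a\n\nb` with `sep` set to `'\n'`, successive calls yield `a`, an empty record, the unterminated `b`, and then 0. For the `RS='a\n'` pipeline above, a successful search with an empty buffer corresponds to `isrec_regex(1, "")`, which returns 1.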
2021-04-16 | Fix regular expression RS ^-anchoring | Miguel Pineiro Jr
RS ^-anchoring needs to know if it's reading the first record of a file. Unfortunately, innew, the flag that the main i/o loop uses to track this, didn't make it from NetBSD unscathed. This commit restores the last of the wayward lines.

Without this fix, when reading the first record of an input file named on the command line, the regular expression machinery will be misconfigured, precluding a successful match.

Relevant commits:
1. 643a5a3dad633431c6ce8831944c23059a6be309 (Initial import)
2. ffee7780fe08fa77f662a0903477545d9e26334f (Restoring innew)
2021-03-02 | Fix size computation in replace_repeat() for special_case REPEAT_WITH_Q. | Todd C. Miller
This resulted in the NUL terminator being written at the end of the buffer, which was not the same as the end of the string. That in turn caused garbage bytes from malloc() to be processed. Also change the NUL termination to be less error prone by writing the NUL immediately after the last byte copied.

Reproducible with the following under valgrind:

    echo '#!/usr/bin/awk' | awk \
    '/^#! ?\/.*\/[a-z]{0,2}awk/ {sub(/^#! ?\/.*\/[a-z]{0,2}awk/,"#! awk"); print}'
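The termination idea can be sketched in C. `build_repeat()` and `append()` are hypothetical helpers, not the code in replace_repeat(); the sketch only shows writing the NUL at the write cursor, so the terminator always lands right after the last byte copied regardless of how large the buffer was allocated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes to dst and return the new write cursor. */
static char *append(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    return dst + n;
}

/* Hypothetical: write `prefix` followed by `count` copies of `atom` into buf.
 * Returns the resulting string length, or -1 if buf is too small. */
static int build_repeat(char *buf, size_t bufsize, const char *prefix,
                        const char *atom, int count)
{
    size_t plen = strlen(prefix), alen = strlen(atom);
    char *p;

    if (count < 0 || plen + (size_t)count * alen + 1 > bufsize)
        return -1;
    p = append(buf, prefix, plen);
    for (int i = 0; i < count; i++)
        p = append(p, atom, alen);
    *p = '\0';  /* right after the last byte copied, not at buf[bufsize - 1] */
    return (int)(p - buf);
}
```

Because the terminator position is derived from the cursor rather than from a separately computed size, a miscalculated allocation size can no longer leave uninitialized bytes inside the string.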
2021-02-15 | Fix compiling with g++. | Arnold D. Robbins
2021-01-10 | Change T.errmsg print to file fail test. | ozan s. yigit
We cannot have a test that destroys, e.g., /etc/passwd if someone runs it as root.
2021-01-06 | Fix a decision bug with trailing stuff in lib.c:is_valid_number | ozan s. yigit
after Dec 18 changes. Updated FIXES, adjusted version date.
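One way to picture a "trailing stuff" rule is the C sketch below. `looks_numeric()` is hypothetical and deliberately simpler than awk's is_valid_number(), which may treat some inputs (for example inf and nan) differently; it accepts a string only if strtod() consumes a numeric prefix and nothing but whitespace follows.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical check: a numeric prefix, then only optional whitespace. */
static bool looks_numeric(const char *s)
{
    char *end;

    (void)strtod(s, &end);       /* strtod also skips leading whitespace */
    if (end == s)
        return false;            /* nothing numeric at all */
    while (isspace((unsigned char)*end))
        end++;                   /* trailing blanks are allowed */
    return *end == '\0';         /* any other trailing text rejects it */
}
```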
2020-12-25 | Merge branch 'staging' for README.md | ozan s. yigit
2020-12-25 | updated: new maintainer | ozan s. yigit
2020-12-18 | Inf and NaN values fixed and printing improved. "This time for sure!" | Arnold D. Robbins
2020-12-15 | Update FIXES and version. | Arnold D. Robbins
2020-12-15 | Include <strings.h> for strcasecmp (#99) | Michael Forney
Though some implementations include this header indirectly through string.h by default, the POSIX header that declares strcasecmp is strings.h[0].

[0] https://pubs.opengroup.org/onlinepubs/9699919799/functions/strcasecmp.html
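A minimal C illustration of the portable include. `same_ignoring_case()` is a hypothetical wrapper; the point is only that strcasecmp() is taken from <strings.h>, the header POSIX specifies, rather than relying on <string.h> to pull it in.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations */
#include <strings.h>             /* POSIX header that declares strcasecmp */

/* Hypothetical helper: case-insensitive equality of two names. */
static int same_ignoring_case(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}
```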
2020-12-08 | Update FIXES and version in main.c. | Arnold D. Robbins
2020-12-08 | Rework floating point conversions. (#98) | Arnold Robbins
2020-12-03 | Update version and FIXES. | Arnold D. Robbins
2020-12-03 | Don't print extra newlines on error before awk starts parsing. (#97) | Todd C. Miller
If awk prints an error message while compile_time is still set to ERROR_PRINTING, don't try to print the context, since there is none. This can happen due to a problem with, e.g., unknown command-line options.
2020-11-24 | Add .TF macro to man page. Closes Issue #96. | Arnold D. Robbins
2020-10-13 | Make it compile with g++. | Arnold D. Robbins
2020-08-16 | Additional fixes for DJGPP. | Arnold D. Robbins
2020-08-07 | Update FIXES and version in main.c. | Arnold D. Robbins
2020-08-07 | printf: The argument p shall be a pointer to void. (#93) | Chris
2020-08-04 | Fix Issue #92; see FIXES. | Arnold D. Robbins
2020-07-30 | Update version and FIXES. | Arnold D. Robbins
2020-07-30 | Move exclusively to bison as parser generator. | Arnold D. Robbins
2020-07-29 | Avoid accessing pfile[] out of bounds on syntax error at EOF. (#90) | Todd C. Miller
When awk reaches EOF parsing the program file, curpfile is incremented. However, cursource() uses curpfile without checking it against npfile which can cause an out of bounds access of pfile[] if there is a syntax error at the end of the program file.
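The bounds check can be sketched in C. `cursource_checked()` is hypothetical: it takes `pfile`, `npfile` and `curpfile` (the names used in the message above) as parameters instead of globals, and returns NULL once `curpfile` has moved past the last program file.

```c
#include <stddef.h>

/* Hypothetical bounds-checked variant of cursource(). */
static const char *cursource_checked(const char *const pfile[], int npfile,
                                     int curpfile)
{
    if (curpfile >= 0 && curpfile < npfile)
        return pfile[curpfile];
    return NULL;  /* past the last program file (e.g. after EOF): no name */
}
```

A caller printing error context would then skip the file name when NULL comes back, instead of reading one element past the end of the array.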
2020-07-29 | Fix the T.errmsg test (#91) | Tim van der Molen
Co-authored-by: Tim van der Molen <tim@kariliq.nl>
2020-07-29 | Cast to uschar when storing a char in an int that will be used as an index (#88) | Todd C. Miller
* Cast to uschar when storing a char in an int that will be used as an index. Fixes a heap underflow when the input char has the high bit set and FS is a regex.
* Add regress test for underflow when RS is a regex and input is 8-bit.
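The failure mode and the fix fit in a few lines of C. `count_bytes()` is a hypothetical example rather than awk code, and the `uschar` typedef here only mirrors the name in the commit title. Where plain char is signed, `int c = *s;` turns a byte such as 0xE9 into a negative value, so indexing a table with it reads before the start of the array; converting through unsigned char first keeps the index in 0..255.

```c
#include <stddef.h>

typedef unsigned char uschar;

/* Tally byte values in a NUL-terminated string. */
static void count_bytes(const char *s, size_t counts[256])
{
    for (; *s != '\0'; s++) {
        int c = (uschar)*s;  /* not `int c = *s;`, which may be negative */
        counts[c]++;
    }
}
```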
2020-07-27 | Avoid using stdio streams after they have been closed. (#89) | Todd C. Miller
* In closeall(), skip stdin and flush std{err,out} instead of closing. Otherwise awk could fclose(stdin) twice (it may appear more than once), and closing stderr means awk cannot report errors with other streams. For example, "awk 'BEGIN { getline < "-" }' < /dev/null" will call fclose(stdin) twice, with undefined results.
* If closefile() is called on std{in,out,err}, freopen() /dev/null instead. Otherwise, awk will continue trying to perform I/O on a closed stdio stream, the behavior of which is undefined.
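A C sketch of the second change, assuming a POSIX system where /dev/null exists. `close_stream()` is a hypothetical stand-in for closefile(): for the standard streams it calls freopen() on /dev/null, which keeps the FILE object valid, and it calls fclose() only on other streams.

```c
#include <stdio.h>

/* Hypothetical closefile() helper; returns 0 on success, -1 on failure. */
static int close_stream(FILE *fp)
{
    if (fp == stdin)
        return freopen("/dev/null", "r", stdin) == NULL ? -1 : 0;
    if (fp == stdout || fp == stderr) {
        fflush(fp);  /* push out pending output before redirecting */
        return freopen("/dev/null", "w", fp) == NULL ? -1 : 0;
    }
    return fclose(fp) == EOF ? -1 : 0;
}
```

After `close_stream(stdin)`, a later `getc(stdin)` is still well defined and simply returns EOF.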
2020-07-02 | Add a note about low-level maintenance. | Arnold D. Robbins
2020-07-02 | Add regression script for bugs-fixed directory. | Arnold D. Robbins
2020-07-02 | Fix regression with changed SUBSEP in subscript (#86) | Tim van der Molen
Commit 0d8778bbbb415810bfd126694a38b6bf4b71e79c reintroduced a regression that was fixed in commit 97a4b7ed215ae6446d13fe0eab15b5b3ae4da7da. The length of SUBSEP needs to be rechecked after calling execute(), in case SUBSEP itself has been changed.

Co-authored-by: Tim van der Molen <tim@kariliq.nl>
2020-07-02 | Fix concatenation regression (#85) | Tim van der Molen
The optimization in commit 1d6ddfd9c0ed7119c0601474cded3a385068e886 reintroduced the regression that was fixed in commit e26237434fb769d9c1ea239101eb5b24be588eea.

Co-authored-by: Tim van der Molen <tim@kariliq.nl>
2020-06-25 | Update FIXES and date in main.c. | Arnold D. Robbins
2020-06-25 | Merge branch 'staging' | Arnold D. Robbins
2020-06-25 | Fix onetrueawk#83 (#84) | awkfan77
2020-06-25 | Rename dprintf to DPRINTF and use C99 cpp variadic arguments. (#82) | Todd C. Miller
POSIX specifies a dprintf function that operates on an fd instead of a stdio stream. Using upper case for macros is more idiomatic too. We no longer need to use an extra set of parentheses for debugging printf statements.
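A minimal sketch of a C99 variadic debugging macro in this style. The `dbg` flag and the `dbgout` destination are assumptions made for the example, and awk's own DPRINTF may be defined differently; the point is that `__VA_ARGS__` lets call sites write `DPRINTF("x=%d\n", x)` without the extra parentheses the old macro needed.

```c
#include <stdio.h>

static int dbg = 0;          /* hypothetical debug switch */
static FILE *dbgout = NULL;  /* hypothetical destination; NULL means stderr */

/* All macro arguments are forwarded to fprintf unchanged. The
 * do { } while (0) wrapper makes it act as a single statement. */
#define DPRINTF(...) \
    do { if (dbg) fprintf(dbgout ? dbgout : stderr, __VA_ARGS__); } while (0)
```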
2020-06-12 | Fix Issue 78 and apply PR 80. | Arnold D. Robbins
2020-06-12 | Clear errno before using errcheck() to avoid spurious errors. (#80) | Todd C. Miller
The errcheck() function treats an errno of ERANGE or EDOM as something to report, so make sure errno is set to zero before invoking a function to check so that a previous such errno value won't result in a false positive. This could happen simply due to input line fields that looked enough like floating-point input to trigger ERANGE. Reported by Jordan Geoghegan, fix from Philip Guenther.
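The pattern can be sketched in C with strtod(), which sets errno to ERANGE on overflow but does not clear errno on success, so a stale ERANGE from an earlier conversion would otherwise be misread as a new failure. `errno_flags_problem()` and `convert_checked()` are hypothetical stand-ins for errcheck() and its callers, not awk's code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for errcheck(): reports ERANGE or EDOM. */
static int errno_flags_problem(void)
{
    return errno == ERANGE || errno == EDOM;
}

/* Convert s; *problem is set only if THIS conversion failed. */
static double convert_checked(const char *s, int *problem)
{
    errno = 0;  /* the fix: discard any value left by an earlier call */
    double d = strtod(s, NULL);
    *problem = errno_flags_problem();
    return d;
}
```

Without the `errno = 0;` line, converting `"2.5"` right after an overflowing field such as `"1e999"` would be reported as an error.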
2020-06-05 | In fldbld(), check that inputFS is set. | Arnold D. Robbins
2020-05-15 | Fix test for use of noreturn. | Arnold D. Robbins
2020-04-16 | Fix noreturn for old compilers. | Arnold D. Robbins
2020-04-05 | Update FIXES and version date. | Arnold D. Robbins
2020-04-05 | Replace __attribute__((__noreturn__)) with _Noreturn. (#77) | awkfan77
* Replace __attribute__((__noreturn__)) with _Noreturn.
* Change _Noreturn to noreturn and #include <stdnoreturn.h>
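A sketch of how a C11 `noreturn` declaration can look, with a fallback for compilers that predate C11. `fatal_error()` is a hypothetical example, and the empty-macro fallback is an assumption for illustration, not necessarily what the "Fix noreturn for old compilers" commit does.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#include <stdnoreturn.h>  /* C11: defines noreturn as _Noreturn */
#else
#define noreturn          /* hypothetical fallback: no annotation at all */
#endif

/* A function that never returns, e.g. a fatal-error reporter. */
static noreturn void fatal_error(const char *msg)
{
    fprintf(stderr, "awk: %s\n", msg);
    exit(EXIT_FAILURE);
}
```

With the annotation in place, the compiler knows code after a `fatal_error()` call is unreachable and need not warn about missing return values there.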
2020-02-28 | Fixes from Christos Zoulas. | Arnold D. Robbins
2020-02-28 | Merge branch 'master' into staging | Arnold D. Robbins
2020-02-28 | 3 more fixes (#75) | zoulasc
* LC_NUMERIC radix issue. According to https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html, the period character is the character recognized in processing awk programs. Make it so that during output we also print the period character, since this is what other awk implementations do, and it makes sense from an interoperability point of view.
* Print "T.builtin" in the error message.
* Fix backslash continuation line handling.
* Keep track of RS processing so we apply the regex properly only once per record.
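One possible way to get a period on output regardless of the user's locale, sketched in C under the assumption that the program sets its locale once at startup: take everything from the environment, then force only LC_NUMERIC back to "C". `init_locale()` and `format_number()` are hypothetical names, and `%.6g` is used because it is awk's default OFMT.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical startup hook: user locale for everything except numbers. */
static void init_locale(void)
{
    setlocale(LC_ALL, "");       /* honour LANG / LC_* from the environment */
    setlocale(LC_NUMERIC, "C");  /* but keep '.' as the radix character */
}

/* Hypothetical output helper using awk's default OFMT. */
static int format_number(char *buf, size_t size, double d)
{
    return snprintf(buf, size, "%.6g", d);
}
```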
2020-02-28 | Fix hwasan global overflow. (#76) | enh-google
* Fix hwasan global overflow.

Crash found with https://source.android.com/devices/tech/debug/hwasan but also detectable by regular ASan. Here's an ASan crash:

==215690==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55d90f8da140 at pc 0x55d90f8b7503 bp 0x7ffd3dae6100 sp 0x7ffd3dae60f8
READ of size 4 at 0x55d90f8da140 thread T0
    #0 0x55d90f8b7502 in word /tmp/awk/lex.c:496
    #1 0x55d90f8b939f in yylex /tmp/awk/lex.c:191
    #2 0x55d90f894ab9 in yyparse /tmp/awk/awkgram.tab.c:2366
    #3 0x55d90f89edc2 in main /tmp/awk/main.c:216
    #4 0x7ff263a78bba in __libc_start_main ../csu/libc-start.c:308
    #5 0x55d90f8945a9 in _start (/tmp/awk/a.out+0x115a9)

0x55d90f8da141 is located 0 bytes to the right of global variable 'infunc' defined in 'awkgram.y:35:6' (0x55d90f8da140) of size 1
SUMMARY: AddressSanitizer: global-buffer-overflow /tmp/awk/lex.c:496 in word
Shadow bytes around the buggy address:
  0x0abba1f133d0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x0abba1f133e0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x0abba1f133f0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x0abba1f13400: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
  0x0abba1f13410: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
=>0x0abba1f13420: 04 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 f9 f9 f9 f9
  0x0abba1f13430: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
  0x0abba1f13440: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x0abba1f13450: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x0abba1f13460: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
  0x0abba1f13470: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9

And here's the stack trace from hwasan:

Stack Trace:
  RELADDR           FUNCTION      FILE:LINE
  00000000000168d4  word          external/one-true-awk/lex.c:496:18
  000000000002d1ec  yyparse       y.tab.c:2460:16
  000000000001c82c  main          external/one-true-awk/main.c:179:2
  00000000000b41a0  __libc_init   bionic/libc/bionic/libc_init_dynamic.cpp:151:8

As it says, we're doing a 4-byte read from a 1-byte global.
`infunc` is declared as an int but defined as a bool.

Signed-off-by: Evgenii Stepanov <eugenis@google.com>

* Add ASan cflags to makefile.

They're not used by default, but this way they're close at hand next time they're wanted.
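The int/bool mismatch behind the ASan report only shows up across translation units: a C compiler rejects conflicting types for `infunc` when both declarations are visible in one file, but not when a header and the grammar file are compiled separately. The sketch below assumes the fix keeps `bool` and makes the shared declaration agree with the definition; which side the actual commit changed is not stated here.

```c
#include <stdbool.h>

/* What a shared header would declare. It must match the definition below;
 * `extern int infunc;` here would make code compiled against the header
 * read sizeof(int) bytes from an object only sizeof(bool) bytes long. */
extern bool infunc;

/* What the grammar file would define. */
bool infunc = false;
```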
2020-02-20 | Small fix to the man page. | Arnold D. Robbins
2020-02-19 | Update FIXES, version. | Arnold D. Robbins