aboutsummaryrefslogtreecommitdiff
path: root/BUGS
diff options
context:
space:
mode:
Diffstat (limited to 'BUGS')
-rw-r--r--BUGS133
1 files changed, 133 insertions, 0 deletions
diff --git a/BUGS b/BUGS
new file mode 100644
index 0000000..138f3cd
--- /dev/null
+++ b/BUGS
@@ -0,0 +1,133 @@
+* ABOUT BUGS
+
+Before reporting a bug, please check the list of known bugs
+and the list of oft-reported non-bugs (below).
+
+Bugs and comments may be sent to bonzini@gnu.org; please
+include in the Subject: header the first line of the output of
+``sed --version''.
+
+Please do not send a bug report like this:
+
+ [while building frobme-1.3.4]
+ $ configure
+ sed: file sedscr line 1: Unknown option to 's'
+
+If sed doesn't configure your favorite package, take a few extra
+minutes to identify the specific problem and make a stand-alone test
+case.
+
+A stand-alone test case includes all the data necessary to perform the
+test, and the specific invocation of sed that causes the problem. The
+smaller a stand-alone test case is, the better. A test case should
+not involve something as far removed from sed as ``try to configure
+frobme-1.3.4''. Yes, that is in principle enough information to look
+for the bug, but that is not a very practical prospect.
+
+
+
+* NON-BUGS
+
+`N' command on the last line
+
+ Most versions of sed exit without printing anything when the `N'
+ command is issued on the last line of a file. GNU sed instead
+ prints pattern space before exiting unless of course the `-n'
+ command switch has been specified. More information on the reason
+ behind this choice can be found in the Info manual.
+
+
+regex syntax clashes (problems with backslashes)
+
+ sed uses the Posix basic regular expression syntax. According to
+ the standard, the meaning of some escape sequences is undefined in
+ this syntax; notable in the case of GNU sed are `\|', `\+', `\?',
+ `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.
+
+ As in all GNU programs that use Posix basic regular expressions, sed
+ interprets these escape sequences as meta-characters. So, `x\+'
+ matches one or more occurrences of `x'. `abc\|def' matches either
+ `abc' or `def'.
+
+ This syntax may cause problems when running scripts written for other
+ seds. Some sed programs have been written with the assumption that
+ `\|' and `\+' match the literal characters `|' and `+'. Such scripts
+ must be modified by removing the spurious backslashes if they are to
+ be used with recent versions of sed (not only GNU sed).
+
+ On the other hand, some scripts use `s|abc\|def||g' to remove occurrences
+ of _either_ `abc' or `def'. While this worked until sed 4.0.x, newer
+ versions interpret this as removing the string `abc|def'. This is
+ again undefined behavior according to POSIX, but this interpretation
+ is arguably more robust: the older one, for example, required that
+ the regex matcher parsed `\/' as `/' in the common case of escaping
+ a slash, which is again undefined behavior; the new behavior avoids
+ this, and this is good because the regex matcher is only partially
+ under our control.
+
+ In addition, GNU sed supports several escape characters (some of
+ which are multi-character) to insert non-printable characters
+ in scripts (`\a', `\c', `\d', `\o', `\r', `\t', `\v', `\x'). These
+ can cause similar problems with scripts written for other seds.
+
+
+-i clobbers read-only files
+
+ In short, `sed d -i' will let one delete the contents of
+ a read-only file, and in general the `-i' option will let
+ one clobber protected files. This is not a bug, but rather a
+ consequence of how the Unix filesystem works.
+
+ The permissions on a file say what can happen to the data
+ in that file, while the permissions on a directory say what can
+ happen to the list of files in that directory. `sed -i'
+ will not ever open for writing a file that is already on disk,
+ rather, it will work on a temporary file that is finally renamed
+ to the original name: if you rename or delete files, you're actually
+ modifying the contents of the directory, so the operation depends on
+ the permissions of the directory, not of the file). For this same
+ reason, sed will not let one use `-i' on a writeable file in a
+ read-only directory, and will break hard or symbolic links when
+ `-i' is used on such a file.
+
+
+`0a' does not work (gives an error)
+
+ There is no line 0. 0 is a special address that is only used to treat
+ addresses like `0,/RE/' as active when the script starts: if you
+ write `1,/abc/d' and the first line includes the word `abc', then
+ that match would be ignored because address ranges must span at least
+ two lines (barring the end of the file); but what you probably wanted is
+ to delete every line up to the first one including `abc', and this
+ is obtained with `0,/abc/d'.
+
+
+`[a-z]' is case insensitive
+`s/.*//' does not clear pattern space
+
+ You are encountering problems with locales. POSIX mandates that `[a-z]'
+ uses the current locale's collation order -- in C parlance, that means
+ strcoll(3) instead of strcmp(3). Some locales have a case insensitive
+ strcoll, others don't.
+
+ Another problem is that [a-z] tries to use collation symbols. This
+ only happens if you are on the GNU system, using GNU libc's regular
+ expression matcher instead of compiling the one supplied with GNU sed.
+ In a Danish locale, for example, the regular expression `^[a-z]$'
+ matches the string `aa', because `aa' is a single collating symbol that
+ comes after `a' and before `b'; `ll' behaves similarly in Spanish
+ locales, or `ij' in Dutch locales.
+
+ Another common localization-related problem happens if your input stream
+ includes invalid multibyte sequences. POSIX mandates that such
+ sequences are _not_ matched by `.', so that `s/.*//' will not clear
+ pattern space as you would expect. In fact, there is no way to clear
+ sed's buffers in the middle of the script in most multibyte locales
+ (including UTF-8 locales). For this reason, GNU sed provides a `z'
+ command (for `zap') as an extension.
+
+ However, to work around both of these problems, which may cause bugs
+ in shell scripts, you can set the LC_ALL environment variable to `C',
+ or set the locale on a more fine-grained basis with the other LC_*
+ environment variables.
+