-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sed #2768
Add sed #2768
Conversation
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #2768 +/- ##
==========================================
+ Coverage 75.17% 75.19% +0.01%
==========================================
Files 432 433 +1
Lines 42933 43040 +107
==========================================
+ Hits 32275 32363 +88
- Misses 10658 10677 +19 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to follow the same pattern as we do for example grep command, add support of BRE and add more tests.
cmds/exp/sed/sed.go
Outdated
for r.Scan() { | ||
line := r.Text() | ||
for _, transform := range cfg.transforms { | ||
line = strings.ReplaceAll(line, transform.from, transform.to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sed command should support BRE, here it just a string replace. See https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/sed.html
So sed -e 's/m.ngo/apple/' FILE
will not work as expected.
I'm not a sed/ed expert but replaceAll should happen if command is like s/p1/p2/g
in keys of s/old/new/
it should replace only first occurrence per line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. only replace all on a line with g otion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do someting like this
f :=- strings.Replace
if global
f = strings.ReplaceAll
and consider passing f as a parameter to transformCopy, which makes testing easier, as f can something other than strings.Replace.
cmds/exp/sed/sed.go
Outdated
inplace bool | ||
} | ||
|
||
var cfg = config{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid global variables in command flags it makes testing hard. See /cmds/core/grep
for reference. It's better to create a structure with command flags and other necessary things. This will simplify testing significantly because it you no longer need to unwind global variables.
cmds/exp/sed/sed.go
Outdated
return fmt.Sprint(*t) | ||
} | ||
|
||
func (t *transforms) Set(value string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe sed command parse should be more complicated:
for example this valid command doesn't work:
./sed -e 's/m.ngo/apple/' FILE
Also ./sed 's/mango/apple' FILE
don't work either, but it should according to POSIX spec.
So probably the whole sed command parser is complicated, so it's ok to start small. But it need to be aware of regular expression, so I don't think strings.Split will be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
starting small is fine, right now we have zero :-)
cmds/exp/sed/sed_test.go
Outdated
"testing" | ||
) | ||
|
||
type nopCloser struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
standard library io package has NopCloser(r Reader) ReadCloser
cmds/exp/sed/sed.go
Outdated
fiDir := path.Dir(fiName) | ||
ftmp, err := os.CreateTemp(fiDir, "sed*.txt") | ||
if err != nil { | ||
fmt.Printf("unable to create temp file: %#v\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better to return errors rather than os.Exit or doing log.Fatalf
cmds/exp/sed/sed.go
Outdated
if cfg.inplace { | ||
fiName := readStreams[i].(*os.File).Name() | ||
fiDir := path.Dir(fiName) | ||
ftmp, err := os.CreateTemp(fiDir, "sed*.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to create temp file? We probably can rewrite it in place, also if something go wrong we need to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure rewrite in place is an option. If there is a failure at any point, and we have gotten part way through a rewrite in place, that will be very bad. At the same time, reading the entire file in is not really an option, as on limited memory systems it might not fit.
here is what real sed does on linux:
1240935 futex(0x7fefb0c1aa6c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1240935 openat(AT_FDCWD, "/tmp/hosts", O_RDONLY) = 3
1240935 ioctl(3, TCGETS, 0x7ffc2eb5c010) = -1 ENOTTY (Inappropriate ioctl for device)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 umask(077) = 002
1240935 openat(AT_FDCWD, "/tmp/sedyrouyD", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
1240935 umask(002) = 077
1240935 fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 read(3, "127.0.0.1\tlocalhost\n::1\t\tlocalhost\n127.0.1.1\tpop-os.localdomain\tpop-os\n\n192.168.0.1 centre usbadapter\n192.168.0.2 visionfive2 u6ccf39004821 u6ccf39004820 u7eda2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6cc
f39003f6a\n\n# windows craptop\n192.168.1.2 galaxy g u00e04c680a0f\n", 4096) = 267
1240935 newfstatat(4, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508433, st_mode=S_IFREG|0600, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=0, st_size=0, st_atime=1695668175 /* 2023-09-25T11:56:15.182309592-0700
*/, st_atime_nsec=182309592, st_mtime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_mtime_nsec=182309592, st_ctime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_ctime_nsec=182309592}, AT_EMPTY_PATH) = 0
1240935 read(3, "", 4096) = 0
1240935 fchown(4, 1000, 1000) = 0
1240935 fgetxattr(3, "system.posix_acl_access", 0x7ffc2eb5bfb0, 132) = -1 ENODATA (No data available)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 fsetxattr(4, "system.posix_acl_access", "\2\0\0\0\1\0\6\0\377\377\377\377\4\0\4\0\377\377\377\377 \0\4\0\377\377\377\377", 28, 0) = 0
1240935 close(3) = 0
1240935 write(4, "127.0.0.1\tlocblhost\n::1\t\tlocblhost\n127.0.1.1\tpop-os.locbldomain\tpop-os\n\n192.168.0.1 centre usbbdapter\n192.168.0.2 visionfive2 u6ccf39004821 u6ccf39004820 u7edb2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6c
cf39003f6b\n\n# windows crbptop\n192.168.1.2 gblaxy g u00e04c680a0f\n", 267) = 267
1240935 close(4) = 0
1240935 rename("/tmp/sedyrouyD", "/tmp/hosts") = 0
1240935 close(1) = 0
1240935 exit_group(0) = ?
I suspect this is the best compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You right, sed doing exactly that. Also on linux if I run sed 's/a/b/' file > file
I got an empty file at the end, it's kinda surprising result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is but it isn't.
The shell evaluates the command and the redirect, so the first thing that happens, after parsing, is the shell truncates file to 0 bytes (> file) and then runs sed on it :-)
It's something we could wish worked differently, but fixing it is harder than telling people not to do it :-)
cmds/exp/sed/sed.go
Outdated
} | ||
|
||
if err := w.Flush(); err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to panic it's better to return error and then log.Fatalf like we do in other commands.
thanks very much for this contribution. Please make the requests from binjip978, they are all very sensible, and we would like to have this command! thanks! |
cmds/exp/sed/sed.go
Outdated
if cfg.inplace { | ||
fiName := readStreams[i].(*os.File).Name() | ||
fiDir := path.Dir(fiName) | ||
ftmp, err := os.CreateTemp(fiDir, "sed*.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure rewrite in place is an option. If there is a failure at any point, and we have gotten part way through a rewrite in place, that will be very bad. At the same time, reading the entire file in is not really an option, as on limited memory systems it might not fit.
here is what real sed does on linux:
1240935 futex(0x7fefb0c1aa6c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1240935 openat(AT_FDCWD, "/tmp/hosts", O_RDONLY) = 3
1240935 ioctl(3, TCGETS, 0x7ffc2eb5c010) = -1 ENOTTY (Inappropriate ioctl for device)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 umask(077) = 002
1240935 openat(AT_FDCWD, "/tmp/sedyrouyD", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
1240935 umask(002) = 077
1240935 fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 read(3, "127.0.0.1\tlocalhost\n::1\t\tlocalhost\n127.0.1.1\tpop-os.localdomain\tpop-os\n\n192.168.0.1 centre usbadapter\n192.168.0.2 visionfive2 u6ccf39004821 u6ccf39004820 u7eda2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6cc
f39003f6a\n\n# windows craptop\n192.168.1.2 galaxy g u00e04c680a0f\n", 4096) = 267
1240935 newfstatat(4, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508433, st_mode=S_IFREG|0600, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=0, st_size=0, st_atime=1695668175 /* 2023-09-25T11:56:15.182309592-0700
*/, st_atime_nsec=182309592, st_mtime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_mtime_nsec=182309592, st_ctime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_ctime_nsec=182309592}, AT_EMPTY_PATH) = 0
1240935 read(3, "", 4096) = 0
1240935 fchown(4, 1000, 1000) = 0
1240935 fgetxattr(3, "system.posix_acl_access", 0x7ffc2eb5bfb0, 132) = -1 ENODATA (No data available)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 fsetxattr(4, "system.posix_acl_access", "\2\0\0\0\1\0\6\0\377\377\377\377\4\0\4\0\377\377\377\377 \0\4\0\377\377\377\377", 28, 0) = 0
1240935 close(3) = 0
1240935 write(4, "127.0.0.1\tlocblhost\n::1\t\tlocblhost\n127.0.1.1\tpop-os.locbldomain\tpop-os\n\n192.168.0.1 centre usbbdapter\n192.168.0.2 visionfive2 u6ccf39004821 u6ccf39004820 u7edb2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6c
cf39003f6b\n\n# windows crbptop\n192.168.1.2 gblaxy g u00e04c680a0f\n", 267) = 267
1240935 close(4) = 0
1240935 rename("/tmp/sedyrouyD", "/tmp/hosts") = 0
1240935 close(1) = 0
1240935 exit_group(0) = ?
I suspect this is the best compromise.
cmds/exp/sed/sed.go
Outdated
for r.Scan() { | ||
line := r.Text() | ||
for _, transform := range cfg.transforms { | ||
line = strings.ReplaceAll(line, transform.from, transform.to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. only replace all on a line with g otion.
cmds/exp/sed/sed.go
Outdated
for r.Scan() { | ||
line := r.Text() | ||
for _, transform := range cfg.transforms { | ||
line = strings.ReplaceAll(line, transform.from, transform.to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do someting like this
f :=- strings.Replace
if global
f = strings.ReplaceAll
and consider passing f as a parameter to transformCopy, which makes testing easier, as f can something other than strings.Replace.
cmds/exp/sed/sed.go
Outdated
return fmt.Sprint(*t) | ||
} | ||
|
||
func (t *transforms) Set(value string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
starting small is fine, right now we have zero :-)
cmds/exp/sed/sed.go
Outdated
writeStreams := make([]io.WriteCloser, l) | ||
for i := range writeStreams { | ||
if cfg.inplace { | ||
fiName := readStreams[i].(*os.File).Name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of cast is a bit dangerous. Not all ReadClosers are os.File.
Passing ReadCloser is good for testing.
options:
implement your own type that implements ReadCloser and provides Name()
fiName, ok := readStreams[i].(*os.File).Name()
if ! ok {
return fmt.Errorf("can not do in-place rewrite on %T:%w", readStreams[i], os.ErrInvalid) // use %w to make tests easier
}
cmds/exp/sed/sed.go
Outdated
func createWriteStreams(cfg config, readStreams []io.ReadCloser) []io.WriteCloser { | ||
l := len(readStreams) | ||
writeStreams := make([]io.WriteCloser, l) | ||
for i := range writeStreams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do:
for i, r := range readStreams {
the code might be easier to write/read.
cmds/exp/sed/sed.go
Outdated
for _, filename := range flag.Args() { | ||
fh, err := os.Open(filename) | ||
if err != nil { | ||
fmt.Printf("unable to open input stream: %s. %#v\n", filename, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use log.Fatalf instead for fmt.Printf follow by os.Exit. It's the same thing.
cmds/exp/sed/sed.go
Outdated
if cfg.inplace { | ||
fo := writeStreams[i] | ||
fo.Close() | ||
fiName := fi.(*os.File).Name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, this cast is dangerous if fi is not an os.File.
Thanks @rminnich @binjip978 for the review. I will give it another iteration (probably this weekend as time permits). And thanks for the encouragement, it helps :) |
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution, PR is in much better shape now. Also I think you should move command to core from exp.
} | ||
|
||
func newTmpWriter(filename string) (*tmpWriter, error) { | ||
ftmp, err := os.CreateTemp("/tmp", ".sed*.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, but you can remove "/tmp" and just pass a ""
, I don't expect someone will run sed on android or windows, but anyway it better to use defaults.
if err != nil { | ||
return err | ||
} | ||
os.Rename(tw.ftmp.Name(), tw.filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we got any error when we calling Close for tmpWriter we should delete tmp file. Anyway we should not ignore os.Rename error.
replaced = t.from.ReplaceAllString(line, t.to) | ||
} else { | ||
counter := 0 | ||
// Replaces only the first occurrence which golang stdlib does not offer! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use both regex and strings to simplify:
found := t.from.FindString(line)
if found != "" {
strings.Replace(line, f, t.to, 1)
}
switch e[:1] { | ||
case "s": | ||
separator := e[1:2] | ||
eParts := strings.Split(e, separator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably can't use strings.Split here, the problem is escaping.
See this example:
input file: /home/dir
sed -e 's//home//Home/' file
/Home/dir
./sed -e 's//home//Home/' file
2023/10/03 08:16:44 error parsing expressions: unable to parse transformation. This should be of the form s/old/new/
Also you allow such transformations:
./sed -e 's/hello/world' file
but sed will return an error:
sed -e 's/hello/world' file
sed: 1: "s/hello/world
": unescaped newline inside substitute pattern
hi, if you can, please take a lot and we can get this very useful command in! Thanks! |
Thanks, I will surely try to allocate some time in the next few weeks to hammer it in shape. Just been too busy |
I think this person is not coming back. |
Fixes: #2769
Implement sed from https://github.com/u-root/u-root/blob/main/roadmap.md
This supports
Or (from one or more files to stdout)
Or (from files, inplace write to the same file)