Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sed #2768

Closed
wants to merge 11 commits into from
Closed

Add sed #2768

wants to merge 11 commits into from

Conversation

kim0
Copy link

@kim0 kim0 commented Sep 24, 2023

Fixes: #2769

Implement sed from https://github.com/u-root/u-root/blob/main/roadmap.md

This supports

echo "This is foo word" | sed -e 's/foo/bar'

Or (from one or more files to stdout)

echo "This is foo word" > file.txt
sed -e 's/foo/bar' file.txt  # output to stdout

Or (from files, inplace write to the same file)

echo "This is foo word" > file.txt
cp file.txt file2.txt
sed -i -e 's/foo/bar' file.txt file2.txt  # output to input files

Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
@codecov
Copy link

codecov bot commented Sep 24, 2023

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Comparison is base (37ed9ac) 75.17% compared to head (32cdd2d) 75.19%.

Files Patch % Lines
cmds/exp/sed/sed.go 82.24% 19 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2768      +/-   ##
==========================================
+ Coverage   75.17%   75.19%   +0.01%     
==========================================
  Files         432      433       +1     
  Lines       42933    43040     +107     
==========================================
+ Hits        32275    32363      +88     
- Misses      10658    10677      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@binjip978 binjip978 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to follow the same pattern as we do for example grep command, add support of BRE and add more tests.

for r.Scan() {
line := r.Text()
for _, transform := range cfg.transforms {
line = strings.ReplaceAll(line, transform.from, transform.to)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sed command should support BRE, here it just a string replace. See https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/sed.html
So sed -e 's/m.ngo/apple/' FILE will not work as expected.

I'm not a sed/ed expert but replaceAll should happen if command is like s/p1/p2/g in keys of s/old/new/ it should replace only first occurrence per line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. only replace all on a line with g otion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do someting like this
f :=- strings.Replace
if global
f = strings.ReplaceAll

and consider passing f as a parameter to transformCopy, which makes testing easier, as f can something other than strings.Replace.

inplace bool
}

var cfg = config{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid global variables in command flags it makes testing hard. See /cmds/core/grep
for reference. It's better to create a structure with command flags and other necessary things. This will simplify testing significantly because it you no longer need to unwind global variables.

return fmt.Sprint(*t)
}

func (t *transforms) Set(value string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe sed command parse should be more complicated:

for example this valid command doesn't work:
./sed -e 's/m.ngo/apple/' FILE

Also ./sed 's/mango/apple' FILE don't work either, but it should according to POSIX spec.

So probably the whole sed command parser is complicated, so it's ok to start small. But it need to be aware of regular expression, so I don't think strings.Split will be enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting small is fine, right now we have zero :-)

"testing"
)

type nopCloser struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standard library io package has NopCloser(r Reader) ReadCloser

fiDir := path.Dir(fiName)
ftmp, err := os.CreateTemp(fiDir, "sed*.txt")
if err != nil {
fmt.Printf("unable to create temp file: %#v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to return errors rather than os.Exit or doing log.Fatalf

if cfg.inplace {
fiName := readStreams[i].(*os.File).Name()
fiDir := path.Dir(fiName)
ftmp, err := os.CreateTemp(fiDir, "sed*.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to create temp file? We probably can rewrite it in place, also if something go wrong we need to remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure rewrite in place is an option. If there is a failure at any point, and we have gotten part way through a rewrite in place, that will be very bad. At the same time, reading the entire file in is not really an option, as on limited memory systems it might not fit.

here is what real sed does on linux:

1240935 futex(0x7fefb0c1aa6c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1240935 openat(AT_FDCWD, "/tmp/hosts", O_RDONLY) = 3
1240935 ioctl(3, TCGETS, 0x7ffc2eb5c010) = -1 ENOTTY (Inappropriate ioctl for device)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 umask(077)                      = 002
1240935 openat(AT_FDCWD, "/tmp/sedyrouyD", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
1240935 umask(002)                      = 077
1240935 fcntl(4, F_GETFL)               = 0x8002 (flags O_RDWR|O_LARGEFILE)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 read(3, "127.0.0.1\tlocalhost\n::1\t\tlocalhost\n127.0.1.1\tpop-os.localdomain\tpop-os\n\n192.168.0.1 centre usbadapter\n192.168.0.2 visionfive2 u6ccf39004821  u6ccf39004820 u7eda2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6cc
f39003f6a\n\n# windows craptop\n192.168.1.2 galaxy g u00e04c680a0f\n", 4096) = 267
1240935 newfstatat(4, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508433, st_mode=S_IFREG|0600, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=0, st_size=0, st_atime=1695668175 /* 2023-09-25T11:56:15.182309592-0700
*/, st_atime_nsec=182309592, st_mtime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_mtime_nsec=182309592, st_ctime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_ctime_nsec=182309592}, AT_EMPTY_PATH) = 0
1240935 read(3, "", 4096)               = 0
1240935 fchown(4, 1000, 1000)           = 0
1240935 fgetxattr(3, "system.posix_acl_access", 0x7ffc2eb5bfb0, 132) = -1 ENODATA (No data available)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 fsetxattr(4, "system.posix_acl_access", "\2\0\0\0\1\0\6\0\377\377\377\377\4\0\4\0\377\377\377\377 \0\4\0\377\377\377\377", 28, 0) = 0
1240935 close(3)                        = 0
1240935 write(4, "127.0.0.1\tlocblhost\n::1\t\tlocblhost\n127.0.1.1\tpop-os.locbldomain\tpop-os\n\n192.168.0.1 centre usbbdapter\n192.168.0.2 visionfive2 u6ccf39004821  u6ccf39004820 u7edb2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6c
cf39003f6b\n\n# windows crbptop\n192.168.1.2 gblaxy g u00e04c680a0f\n", 267) = 267
1240935 close(4)                        = 0
1240935 rename("/tmp/sedyrouyD", "/tmp/hosts") = 0
1240935 close(1)                        = 0
1240935 exit_group(0)                   = ?

I suspect this is the best compromise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right, sed doing exactly that. Also on linux if I run sed 's/a/b/' file > file I got an empty file at the end, it's kinda surprising result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is but it isn't.
The shell evaluates the command and the redirect, so the first thing that happens, after parsing, is the shell truncates file to 0 bytes (> file) and then runs sed on it :-)
It's something we could wish worked differently, but fixing it is harder than telling people not to do it :-)

}

if err := w.Flush(); err != nil {
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to panic it's better to return error and then log.Fatalf like we do in other commands.

@binjip978 binjip978 added the Awaiting author Waiting for new changes or feedback for author. label Sep 25, 2023
@rminnich
Copy link
Member

thanks very much for this contribution. Please make the requests from binjip978, they are all very sensible, and we would like to have this command! thanks!

if cfg.inplace {
fiName := readStreams[i].(*os.File).Name()
fiDir := path.Dir(fiName)
ftmp, err := os.CreateTemp(fiDir, "sed*.txt")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure rewrite in place is an option. If there is a failure at any point, and we have gotten part way through a rewrite in place, that will be very bad. At the same time, reading the entire file in is not really an option, as on limited memory systems it might not fit.

here is what real sed does on linux:

1240935 futex(0x7fefb0c1aa6c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
1240935 openat(AT_FDCWD, "/tmp/hosts", O_RDONLY) = 3
1240935 ioctl(3, TCGETS, 0x7ffc2eb5c010) = -1 ENOTTY (Inappropriate ioctl for device)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 umask(077)                      = 002
1240935 openat(AT_FDCWD, "/tmp/sedyrouyD", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
1240935 umask(002)                      = 077
1240935 fcntl(4, F_GETFL)               = 0x8002 (flags O_RDWR|O_LARGEFILE)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 read(3, "127.0.0.1\tlocalhost\n::1\t\tlocalhost\n127.0.1.1\tpop-os.localdomain\tpop-os\n\n192.168.0.1 centre usbadapter\n192.168.0.2 visionfive2 u6ccf39004821  u6ccf39004820 u7eda2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6cc
f39003f6a\n\n# windows craptop\n192.168.1.2 galaxy g u00e04c680a0f\n", 4096) = 267
1240935 newfstatat(4, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508433, st_mode=S_IFREG|0600, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=0, st_size=0, st_atime=1695668175 /* 2023-09-25T11:56:15.182309592-0700
*/, st_atime_nsec=182309592, st_mtime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_mtime_nsec=182309592, st_ctime=1695668175 /* 2023-09-25T11:56:15.182309592-0700 */, st_ctime_nsec=182309592}, AT_EMPTY_PATH) = 0
1240935 read(3, "", 4096)               = 0
1240935 fchown(4, 1000, 1000)           = 0
1240935 fgetxattr(3, "system.posix_acl_access", 0x7ffc2eb5bfb0, 132) = -1 ENODATA (No data available)
1240935 newfstatat(3, "", {st_dev=makedev(0xfd, 0x1), st_ino=113508431, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=8, st_size=267, st_atime=1695668151 /* 2023-09-25T11:55:51.782631585-070
0 */, st_atime_nsec=782631585, st_mtime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_mtime_nsec=782631585, st_ctime=1695668151 /* 2023-09-25T11:55:51.782631585-0700 */, st_ctime_nsec=782631585}, AT_EMPTY_PATH) = 0
1240935 fsetxattr(4, "system.posix_acl_access", "\2\0\0\0\1\0\6\0\377\377\377\377\4\0\4\0\377\377\377\377 \0\4\0\377\377\377\377", 28, 0) = 0
1240935 close(3)                        = 0
1240935 write(4, "127.0.0.1\tlocblhost\n::1\t\tlocblhost\n127.0.1.1\tpop-os.locbldomain\tpop-os\n\n192.168.0.1 centre usbbdapter\n192.168.0.2 visionfive2 u6ccf39004821  u6ccf39004820 u7edb2b555f3a\n192.168.0.3 vf3 u6ccf39003f6b u6c
cf39003f6b\n\n# windows crbptop\n192.168.1.2 gblaxy g u00e04c680a0f\n", 267) = 267
1240935 close(4)                        = 0
1240935 rename("/tmp/sedyrouyD", "/tmp/hosts") = 0
1240935 close(1)                        = 0
1240935 exit_group(0)                   = ?

I suspect this is the best compromise.

for r.Scan() {
line := r.Text()
for _, transform := range cfg.transforms {
line = strings.ReplaceAll(line, transform.from, transform.to)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. only replace all on a line with g otion.

for r.Scan() {
line := r.Text()
for _, transform := range cfg.transforms {
line = strings.ReplaceAll(line, transform.from, transform.to)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do someting like this
f :=- strings.Replace
if global
f = strings.ReplaceAll

and consider passing f as a parameter to transformCopy, which makes testing easier, as f can something other than strings.Replace.

return fmt.Sprint(*t)
}

func (t *transforms) Set(value string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting small is fine, right now we have zero :-)

writeStreams := make([]io.WriteCloser, l)
for i := range writeStreams {
if cfg.inplace {
fiName := readStreams[i].(*os.File).Name()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of cast is a bit dangerous. Not all ReadClosers are os.File.
Passing ReadCloser is good for testing.
options:
implement your own type that implements ReadCloser and provides Name()
fiName, ok := readStreams[i].(*os.File).Name()
if ! ok {
return fmt.Errorf("can not do in-place rewrite on %T:%w", readStreams[i], os.ErrInvalid) // use %w to make tests easier
}

func createWriteStreams(cfg config, readStreams []io.ReadCloser) []io.WriteCloser {
l := len(readStreams)
writeStreams := make([]io.WriteCloser, l)
for i := range writeStreams {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do:
for i, r := range readStreams {
the code might be easier to write/read.

for _, filename := range flag.Args() {
fh, err := os.Open(filename)
if err != nil {
fmt.Printf("unable to open input stream: %s. %#v\n", filename, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use log.Fatalf instead for fmt.Printf follow by os.Exit. It's the same thing.

if cfg.inplace {
fo := writeStreams[i]
fo.Close()
fiName := fi.(*os.File).Name()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this cast is dangerous if fi is not an os.File.

@kim0
Copy link
Author

kim0 commented Sep 27, 2023

Thanks @rminnich @binjip978 for the review. I will give it another iteration (probably this weekend as time permits). And thanks for the encouragement, it helps :)

Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Signed-off-by: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Copy link
Contributor

@binjip978 binjip978 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, PR is in much better shape now. Also I think you should move command to core from exp.

}

func newTmpWriter(filename string) (*tmpWriter, error) {
ftmp, err := os.CreateTemp("/tmp", ".sed*.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, but you can remove "/tmp" and just pass a "", I don't expect someone will run sed on android or windows, but anyway it better to use defaults.

if err != nil {
return err
}
os.Rename(tw.ftmp.Name(), tw.filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we got any error when we calling Close for tmpWriter we should delete tmp file. Anyway we should not ignore os.Rename error.

replaced = t.from.ReplaceAllString(line, t.to)
} else {
counter := 0
// Replaces only the first occurrence which golang stdlib does not offer!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use both regex and strings to simplify:
found := t.from.FindString(line)
if found != "" {
strings.Replace(line, f, t.to, 1)
}

switch e[:1] {
case "s":
separator := e[1:2]
eParts := strings.Split(e, separator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably can't use strings.Split here, the problem is escaping.

See this example:

input file: /home/dir
sed -e 's//home//Home/' file
/Home/dir

./sed -e 's//home//Home/' file
2023/10/03 08:16:44 error parsing expressions: unable to parse transformation. This should be of the form s/old/new/

Also you allow such transformations:
./sed -e 's/hello/world' file
but sed will return an error:
sed -e 's/hello/world' file
sed: 1: "s/hello/world
": unescaped newline inside substitute pattern

@rminnich
Copy link
Member

hi, if you can, please take a lot and we can get this very useful command in! Thanks!

@kim0
Copy link
Author

kim0 commented Nov 16, 2023

hi, if you can, please take a lot and we can get this very useful command in! Thanks!

Thanks, I will surely try to allocate some time in the next few weeks to hammer it in shape. Just been too busy

@rminnich
Copy link
Member

I think this person is not coming back.

@rminnich rminnich closed this Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting author Waiting for new changes or feedback for author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement sed command
3 participants