Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(jailer): enable multi-threaded test #4438

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

cm-iwata
Copy link
Contributor

@cm-iwata cm-iwata commented Feb 9, 2024

Changes

Previously, all tests shared same temporary file/directory, causing concurrency conflicts when running tests in multi-threaded.
Resolved test concurrency issues by incorporating random strings into file/directory names.

Reason

Fixes #4412

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@cm-iwata cm-iwata marked this pull request as ready for review February 9, 2024 05:19
Copy link

codecov bot commented Feb 9, 2024

Codecov Report

Attention: Patch coverage is 96.82540% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 81.66%. Comparing base (19f3623) to head (e01d256).

❗ Current head e01d256 differs from pull request most recent head 05b0fea. Consider uploading reports for the commit 05b0fea to get more accurate results

Files Patch % Lines
src/jailer/src/env.rs 85.71% 1 Missing ⚠️
src/jailer/src/main.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4438      +/-   ##
==========================================
- Coverage   82.13%   81.66%   -0.48%     
==========================================
  Files         255      243      -12     
  Lines       31268    29871    -1397     
==========================================
- Hits        25683    24394    -1289     
+ Misses       5585     5477     -108     
Flag Coverage Δ
4.14-c5n.metal 78.99% <96.82%> (-0.65%) ⬇️
4.14-c7g.metal ?
4.14-m5n.metal 78.98% <96.82%> (-0.65%) ⬇️
4.14-m6a.metal 78.12% <96.82%> (-0.72%) ⬇️
4.14-m6g.metal 77.08% <96.82%> (+0.39%) ⬆️
4.14-m6i.metal 78.97% <96.82%> (-0.65%) ⬇️
4.14-m7g.metal 77.08% <96.82%> (+0.39%) ⬆️
5.10-c5n.metal 81.64% <96.82%> (-0.52%) ⬇️
5.10-m5n.metal 81.62% <96.82%> (-0.52%) ⬇️
5.10-m6a.metal 80.86% <96.82%> (-0.58%) ⬇️
5.10-m6g.metal 79.96% <96.82%> (+0.49%) ⬆️
5.10-m6i.metal 81.62% <96.82%> (-0.52%) ⬇️
5.10-m7g.metal 79.96% <96.82%> (+0.49%) ⬆️
6.1-c5n.metal 81.64% <96.82%> (-0.52%) ⬇️
6.1-m5n.metal 81.62% <96.82%> (-0.52%) ⬇️
6.1-m6a.metal 80.86% <96.82%> (-0.59%) ⬇️
6.1-m6g.metal 79.96% <96.82%> (+0.49%) ⬆️
6.1-m6i.metal 81.62% <96.82%> (-0.51%) ⬇️
6.1-m7g.metal 79.96% <96.82%> (+0.50%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zulinx86
Copy link
Contributor

zulinx86 commented Feb 9, 2024

Hello @cm-iwata ,

Thank you for your contribution!
Could you please fix clippy test errors?
https://buildkite.com/firecracker/firecracker-pr/builds/8932#018d8d15-d42f-4863-969e-4df250266426/55-402

@cm-iwata
Copy link
Contributor Author

cm-iwata commented Feb 9, 2024

@zulinx86
Sorry. fixed it.

@xmarcalx xmarcalx added the Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled ` label Feb 12, 2024
@zulinx86
Copy link
Contributor

zulinx86 commented Feb 12, 2024

@cm-iwata

Could you please do the following things?:

Thanks!

@zulinx86
Copy link
Contributor

It's a matter of taste, but it might be better to use "test(jailer)" rather than "fix(jailer)". "fix" may look like fixing a bug of jailer. I believe it's like improvement of jailer unit tests.

@cm-iwata cm-iwata force-pushed the fix/parallel-jailer-test branch 4 times, most recently from 3134792 to 33ced74 Compare February 13, 2024 02:10
@cm-iwata
Copy link
Contributor Author

@zulinx86
Thank you for the review!

Add a commit message explaining the context like why this commit is needed

fixed.

Fix all the unit test failures

I believe I've probably fixed the relevant parts.
Since the tests were passing in my dev environment before the fix, I haven't been able to confirm correctly that the fix made the tests pass.
Also, since some unit tests are failing in my dev environment, I haven't been able to confirm that all unit tests are passing.
Could you please run CI?

※I use GCEfor dev environment.

Post the result here when running it in parallel as instructed in Fix running cargo test in parallel for vmm #4412 since we cannot still run all the rust unit tests in parallel at the moment.

here is result.

root@dbaa0c959c45:/firecracker# cargo test --package jailer
warning: patch for `kvm-bindings` uses the features mechanism. default-features and features will not take effect because the patch dependency does not support this mechanism
    Finished test [unoptimized + debuginfo] target(s) in 0.10s
     Running unittests src/main.rs (build/cargo_target/debug/deps/jailer-2ccd13efbd56ca77)

running 32 tests
test cgroup::tests::test_cgroup_builder_no_mounts ... ok
test cgroup::tests::test_get_controller ... ok
test env::tests::test_dup2 ... ok
test env::tests::test_join_netns ... ok
test env::tests::test_parse_resource_limits ... ok
test cgroup::tests::test_inherit_from_parent ... ok
test env::tests::test_copy_exec_to_chroot ... ok
test resource_limits::tests::test_default_resource_limits ... ok
test env::tests::test_validate_exec_file ... ok
test cgroup::tests::test_cgroup_builder_v2 ... ok
test resource_limits::tests::test_display_resource ... ok
test cgroup::tests::test_cgroup_builder_v1_with_v2_mounts ... ok
test resource_limits::tests::test_from_resource ... ok
test resource_limits::tests::test_install ... ok
test resource_limits::tests::test_set_resource_limits ... ok
test tests::test_clean_env_vars ... ok
test env::tests::test_save_exec_file_pid ... ok
test cgroup::tests::test_cgroup_builder_v1 ... ok
test cgroup::tests::test_cgroup_builder_v2_with_v1_mounts ... ok
test tests::test_to_cstring ... ok
test env::tests::test_copy_cache_info ... ok
test env::tests::test_cgroups_parsing ... ok
test env::tests::test_setup_jailed_folder ... ok
test env::tests::test_userfaultfd_dev ... ok
test env::tests::test_mknod_and_own_dev ... ok
test cgroup::tests::test_cgroup_build ... ok
test cgroup::tests::test_cgroup_build_invalid ... ok
test cgroup::tests::test_cgroup_v2_write_value ... ok
test env::tests::test_new_env ... ok
test tests::test_fds_close_range ... ok
test tests::test_sanitize_process ... ok
test tests::test_fds_proc ... ok

test result: ok. 32 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s

Comment on lines +25 to 30
pub struct CgroupBuilder<'a> {
version: u8,
hierarchies: HashMap<String, PathBuf>,
mount_points: Vec<CgroupMountPoint>,
proc_mounts_path: &'a str,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may use more memory than before.

It would be better to do:

Suggested change
pub struct CgroupBuilder<'a> {
version: u8,
hierarchies: HashMap<String, PathBuf>,
mount_points: Vec<CgroupMountPoint>,
proc_mounts_path: &'a str,
}
pub struct CgroupBuilder {
version: u8,
hierarchies: HashMap<String, PathBuf>,
mount_points: Vec<CgroupMountPoint>,
proc_mounts_path: &'static str,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonathanWoollett-Light
Thank you for your suggestion. However, since the field requires dynamic values.
Because it's necessary to isolate the environment between multiple threads running tests.
If static lifetime were used, multiple threads running tests would access the same directory, potentially causing fail tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only saw static strings passed to be used as proc_mounts_path, if the number of threads in tests is constant we could have a static string for each. Could you point to where a non-static string is used?

This is a nit regardless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although only for testing, dynamic values are used in the following places:

let mock_proc_mounts = format!(
"/tmp/firecracker/test/{}/jailer/proc/mounts",
rand::rand_alphanumerics(4).into_string().unwrap()

let builder = CgroupBuilder::new(1, mock_cgroups.proc_mounts_path.as_str());

How about changing the type of proc_mounts_path to Cow<'static, str>? With this change, the production code can accept the constant PROC_MOUNTS, while the test code can dynamically generate a String. What do you think about this approach?

@zulinx86 zulinx86 changed the title fix(jailer): enable multi-threaded test test(jailer): enable multi-threaded test Mar 13, 2024
@zulinx86
Copy link
Contributor

@cm-iwata Could you please squash 3134792 and 33ced74 into one commit? Two identical commits are shown in https://github.com/firecracker-microvm/firecracker/pull/4438/commits.

Comment on lines 451 to 456
pub fn get_mock_proc_mounts() -> String {
format!(
"/tmp/firecracker/test/{}/jailer/proc/mounts",
rand::rand_alphanumerics(4).into_string().unwrap()
)
}
Copy link
Contributor

@zulinx86 zulinx86 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_mock_proc_mounts() is always called just before MockCgroupFs::new(). How about putting get_mock_proc_monts() inside the MockCgroupFs::new()? By doing this and owning the string, the lifetime parameter <'a> of MockCgroupFs can be removed as follows:

-    pub fn get_mock_proc_mounts() -> String {
-        format!(
-            "/tmp/firecracker/test/{}/jailer/proc/mounts",
-            rand::rand_alphanumerics(4).into_string().unwrap()
-        )
-    }
-
     #[derive(Debug)]
-    pub struct MockCgroupFs<'a> {
+    pub struct MockCgroupFs {
         mounts_file: File,
-        pub proc_mount_path: &'a str,
+        pub proc_mount_path: String,
     }
 
     // Helper object that simulates the layout of the cgroup file system
     // This can be used for testing regardless of the availability of a particular
     // version of cgroups on the system
-    impl<'a> MockCgroupFs<'a> {
+    impl MockCgroupFs {
         pub fn create_file_with_contents<P: AsRef<Path> + Debug>(
             filename: P,
             contents: &str,
@@ -480,10 +473,12 @@ pub mod test_util {
             Ok(())
         }
 
-        pub fn new(
-            mock_proc_mount: &'a str,
-        ) -> std::result::Result<MockCgroupFs<'a>, std::io::Error> {
-            let mock_proc_dir = Path::new(mock_proc_mount).parent().unwrap();
+        pub fn new() -> std::result::Result<MockCgroupFs, std::io::Error> {
+            let mock_proc_mount = format!(
+                "/tmp/firecracker/test/{}/jailer/proc/mounts",
+                rand::rand_alphanumerics(4).into_string().unwrap()
+            );
+            let mock_proc_dir = Path::new(&mock_proc_mount).parent().unwrap();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 540eea5 and 14f8f38

let f = File::open(PROC_MOUNTS)
.map_err(|err| JailerError::FileOpen(PathBuf::from(PROC_MOUNTS), err))?;
let f = File::open(b.proc_mounts_path)
.map_err(|err| JailerError::FileOpen(PathBuf::from(&b.proc_mounts_path), err))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking, but & can be removed.

Suggested change
.map_err(|err| JailerError::FileOpen(PathBuf::from(&b.proc_mounts_path), err))?;
.map_err(|err| JailerError::FileOpen(PathBuf::from(b.proc_mounts_path), err))?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in e77496a

@@ -72,7 +68,8 @@ impl CgroupBuilder {
).map_err(JailerError::RegEx)?;

for l in BufReader::new(f).lines() {
let l = l.map_err(|err| JailerError::ReadLine(PathBuf::from(PROC_MOUNTS), err))?;
let l =
l.map_err(|err| JailerError::ReadLine(PathBuf::from(&b.proc_mounts_path), err))?;
Copy link
Contributor

@zulinx86 zulinx86 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: See #4438 (comment)

Suggested change
l.map_err(|err| JailerError::ReadLine(PathBuf::from(&b.proc_mounts_path), err))?;
l.map_err(|err| JailerError::ReadLine(PathBuf::from(b.proc_mounts_path), err))?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in e77496a

mounts_file: File,
pub proc_mount_path: &'a str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: the name proc_mounts_path is used for CgroupBuilder and the file path /proc/mounts, so proc_mounts_path would be preferable here as well.

Suggested change
pub proc_mount_path: &'a str,
pub proc_mounts_path: &'a str,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in a9979ed

@@ -478,28 +480,45 @@ pub mod test_util {
Ok(())
}

pub fn new() -> std::result::Result<MockCgroupFs, std::io::Error> {
pub fn new(
mock_proc_mount: &'a str,
Copy link
Contributor

@zulinx86 zulinx86 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: See #4438 (comment).

Suggested change
mock_proc_mount: &'a str,
mock_proc_mounts: &'a str,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in a9979ed

Previously, all tests shared same temporary file/directory,
causing concurrency conflicts when running tests in multi-threaded.

Resolved test concurrency issues by incorporating random strings
into file/directory names.

Link: firecracker-microvm#4412

Signed-off-by: Tomoya Iwata <iwata.tomoya@classmethod.jp>
Renamed several variables.
Because these variables are typically assigned to `/proc/mounts`.

Signed-off-by: Tomoya Iwata <iwata.tomoya@classmethod.jp>
Move mocked path prepare process inside `MockCgroupFs::new()`.

Signed-off-by: Tomoya Iwata <iwata.tomoya@classmethod.jp>
Remove unnecessary `&` from function call arguments

Signed-off-by: Tomoya Iwata <iwata.tomoya@classmethod.jp>
Move mocked path prepare process inside `MockCgroupFs::new()`.

Signed-off-by: Tomoya Iwata <iwata.tomoya@classmethod.jp>
@cm-iwata
Copy link
Contributor Author

Could you please squash 3134792 and 33ced74 into one commit?

Sorry. I made a mistake while rebasing.
I squashed tow commits.

@cm-iwata
Copy link
Contributor Author

cm-iwata commented Mar 14, 2024

I thought the code was functioning correctly after the fix, but I've noticed occasional test failures.
Could you please advise on the following issues?

1.Too many open files

Sometimes tests fail as follows.

---- tests::test_fds_proc stdout ----
thread 'tests::test_fds_proc' panicked at src/jailer/src/main.rs:409:33:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }

---- tests::test_fds_close_range stdout ----
thread 'tests::test_fds_close_range' panicked at src/jailer/src/main.rs:409:33:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- tests::test_sanitize_process stdout ----
thread 'tests::test_sanitize_process' panicked at src/jailer/src/main.rs:409:33:
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }

The code causing this issue can be found here.

for i in 0..n {
let maybe_file = File::create(format!("{}/{}", &tmp_dir_path, i));
fds.push(maybe_file.unwrap().into_raw_fd());
}

When running the tests, 100 files are create per thread, it causing the file descriptor limit to be exceeded when running tests in multithread. Changing the loop condition n from 100 to 10 resolved the issue. Is there any specific reason for using the number 100? If not, would it be acceptable to change n to 10?

2. Bad file descriptor

Even after fixing 1, tests may still fail.

---- env::tests::test_new_env stdout ----
thread 'env::tests::test_new_env' panicked at src/jailer/src/env.rs:911:14:
This another new environment should be created successfully.: ReadLine("/tmp/firecracker/test/MCzQ/jailer/proc/mounts", Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I believe the cause is the following code.

fn test_fds_proc() {
run_close_fds_test(close_fds_by_reading_proc);
}

The error should occur in the following sequence:

  • thread1 opens fd4.
  • thread2 opens fd5.
  • thread1 closes fd4 and fd5.
  • thread 2 access to fd5, then Bad file descriptor error will occur

Commenting out the above test cases allows the tests to run successfully. Do you have any suggestions for how to fix the code? I couldn't think of any good ideas, so I would appreciate any advice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled `
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix running cargo test in parallel for Jailer
4 participants