
NBFTReliability #401
Draft · wants to merge 18 commits into master
Conversation

OlivierHecart (Contributor)

No description provided.

@p-avital (Contributor)

For now, my main blocker is not code (I haven't read much of it yet), but lack of documentation:

  • What does NBFT mean?
  • What are its contracts? (What problems are solved, under what conditions)
  • How should it be used?
    • How many caches?
    • How should the caches be configured?

The filenames are also a bit awkward; why not group all NBFT things into a directory rather than using prefixed files (at least for the ones in src)?

For the builder, I would ask why they don't just extend the standard builders, but I guess it's mostly because you might need access to data that the builders keep private. If possible, it would be nice to add a small trait that lets one do sub_builder.reliable() and obtain the NBFTReliableSubscriberBuilder. I only see this as possible if the reliable builders can have a SubscriberBuilder field, which might not be possible.

This last comment makes me think: maybe it would be useful to have a (forever unstable) way to break a builder into its component parts, so that zenoh-ext would less often need to re-implement the builders fully just to add one option.
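
For illustration, a minimal self-contained sketch of the sub_builder.reliable() idea; every type here is a simplified stand-in rather than the real zenoh / zenoh-ext builder, and the trait name is hypothetical:

```rust
// Hypothetical sketch: an extension trait turning a standard subscriber
// builder into the NBFT reliable one. All types are simplified stand-ins.
#![allow(dead_code)]

struct SubscriberBuilder {
    key_expr: String,
}

struct NBFTReliableSubscriberBuilder {
    // Only possible if the reliable builder can keep the standard builder
    // as a field, as discussed above.
    inner: SubscriberBuilder,
    history: bool,
}

trait ReliableBuilderExt {
    fn reliable(self) -> NBFTReliableSubscriberBuilder;
}

impl ReliableBuilderExt for SubscriberBuilder {
    fn reliable(self) -> NBFTReliableSubscriberBuilder {
        NBFTReliableSubscriberBuilder { inner: self, history: false }
    }
}

fn main() {
    let sub_builder = SubscriberBuilder { key_expr: "demo/example/**".to_string() };
    // The ergonomics being asked for:
    let _reliable = sub_builder.reliable();
}
```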

@OlivierHecart (Contributor, Author)

> For now, my main blocker is not code (I haven't read much of it yet), but lack of documentation.

Most of the answers (if they exist) are there: https://github.com/eclipse-zenoh/roadmap/blob/main/rfcs/ALL/Non%20Blocking%20Fault%20Tolerant%20Reliability.md
Should we point there from the rustdoc?

> The filenames are also a bit awkward, why not group all NBFT things into a directory rather than prefixed files (at least for the ones in src)?

Sure

> For the builder, I would ask why they don't just extend the standard builders, but I guess it's mostly because you might need access to data that the builders keep private.

That's indeed the main reason.

let _cache = session
    .declare_reliability_cache(key_expr)
    .history(history)
    .queryable_prefix(prefix)
Member

What does queryable_prefix do? How does it differ from the key_expr used in declare_reliability_cache?

Contributor (Author)

As described here, data published on key k by a publisher with id <publisher_id> will be made available by the cache for queries on <publisher_id>/k. So by changing this prefix you indicate for which publishers this cache is storing data (see here). Typically the cache attached to an NBFTReliablePublisher will have the publisher id as prefix, to only store data from this publisher, while a cache deployed in the infrastructure will have a * prefix to store data from all publishers.
The code was partially inspired by the existing publication cache and names were taken from there. But I agree better names could be found.
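
To make the two deployments concrete, a hedged sketch using the builder calls shown in this PR's examples; publisher_id is a placeholder, and the res_sync() terminator is borrowed from the other snippets rather than confirmed for this builder:

```rust
// Cache attached to an NBFTReliablePublisher: the prefix is the publisher id,
// so only data from that publisher is stored and served under <publisher_id>/k.
let _local_cache = session
    .declare_reliability_cache(&key_expr)
    .queryable_prefix(publisher_id) // placeholder for this publisher's id
    .history(1024)
    .res_sync()?;

// Cache deployed in the infrastructure: wildcard prefix, stores data
// from all publishers matching the key expression.
let _infra_cache = session
    .declare_reliability_cache(&key_expr)
    .queryable_prefix("*")
    .history(1024)
    .res_sync()?;
```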

println!("Declaring NBFTReliabilityCache on {}", key_expr);
let _cache = session
.declare_reliability_cache(key_expr)
.history(history)
Member

If history is not configured, what's the default? Is it mandatory?

Contributor (Author)

The default history is 1024.

Member

The size of the reliability cache really depends on the use case and should be configured accordingly. Making it mandatory would force the user to think about how big or small it has to be. The same applies to the publisher and subscriber. Should we make it mandatory?

println!("Declaring NBFTReliablePublisher on {}", &key_expr);
let publ = session
.declare_reliable_publisher(&key_expr)
.with_cache(cache)
Member

Here I was expecting with_cache to accept a cache object created by declare_reliability_cache. Instead, it seems to take a bool. This is a bit counter-intuitive for the with_X pattern.

Contributor (Author)

A cache associated with a publisher will be preconfigured for it: it will typically have the publisher id as queryable prefix. So taking a cache declared independently does not seem the best way to me. Still, a better name for the function could be found...
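
For illustration, a hedged sketch of how this reads on the user side; the method names follow the PR snippets, and the commented-out block only restates the description above, it is not the actual implementation:

```rust
// `with_cache` takes a bool and declares/attaches a cache preconfigured
// for this publisher (publisher id as queryable prefix).
let publ = session
    .declare_reliable_publisher(&key_expr)
    .with_cache(true) // a bool, not a cache object
    .history(1024)
    .res_sync()?;

// Conceptually, with_cache(true) behaves as if the publisher declared,
// internally:
//
//   session.declare_reliability_cache(&key_expr)
//          .queryable_prefix(<this publisher's id>)
//          .history(1024)
//
// which is why an independently declared cache is not accepted here.
```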

let publ = session
    .declare_reliable_publisher(&key_expr)
    .with_cache(cache)
    .history(history)
Member

If history is not configured, what's the default? Is it mandatory?

Contributor (Author)

The default history is 1024.

println!("Declaring NBFTReliableSubscriber on {}", key_expr);
let subscriber = session
.declare_reliable_subscriber(key_expr)
.history(history)
Member

If history is not configured, what's the default? Is it mandatory?

Contributor (Author)

The default is false.

While the history function of the NBFTReliabilityCache and NBFTReliablePublisher builders accepts an integer and configures the depth of the reliability cache in number of samples for each key, the history function of the NBFTReliableSubscriber builder is a boolean that defines whether the NBFTReliableSubscriber should query for historical data at startup. This is indeed a bit confusing; one of them should be renamed.
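
To illustrate the confusion, and one possible fix, a hedged sketch; the builder calls follow the PR snippets, and query_history is a purely hypothetical rename:

```rust
// Cache / publisher builders: `history` is an integer, the per-key depth
// of the reliability cache in samples (default 1024).
let _cache = session
    .declare_reliability_cache(&key_expr)
    .history(1024);

// Subscriber builder: `history` is a boolean, whether to query for
// historical data at startup (default false).
let _sub = session
    .declare_reliable_subscriber(&key_expr)
    .history(true);

// Hypothetical disambiguation: keep `.history(n)` for cache depth and
// rename the subscriber option, e.g. `.query_history(true)`.
```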

zenoh-ext/src/nbftreliability_cache.rs (outdated; conversation resolved)
zenoh-ext/src/nbftreliable_subscriber.rs (conversation resolved)
.accept_replies(ReplyKeyExpr::Any)
.target(self.query_target)
.timeout(self.query_timeout)
.res_sync();
Member

This should be res_async since it is executed within the async fn run.
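
A minimal sketch of the suggested change, assuming the SyncResolve/AsyncResolve pattern already used by this code; session.get(&query_selector) is assumed since the snippet starts mid-chain:

```rust
// Inside `async fn run`, resolve the query asynchronously so the executor
// thread is not blocked while the query is issued.
let _ = session
    .get(&query_selector)              // assumed: the snippet starts mid-chain
    .accept_replies(ReplyKeyExpr::Any)
    .target(self.query_target)
    .timeout(self.query_timeout)
    .res_async()                       // was: .res_sync()
    .await;
```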

.callback({
    move |r: Reply| {
        if let Ok(s) = r.sample {
            let (ref mut states, wait) = &mut *zlock!(handler.statesref);
Member

Is there a reason to use a sync lock here instead of an async one? In case a lock needs to be used in both async and sync contexts, I would use an async lock and task::block_on in the sync context. This way we avoid potentially blocking the executor in the async context.
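
A hedged sketch of that suggestion, assuming async-std (used by zenoh here) and a placeholder state type:

```rust
use std::sync::Arc;

use async_std::sync::Mutex;
use async_std::task;

type States = Vec<String>; // placeholder for the real state type

// In async context: lock without blocking the executor.
async fn handle_async(states: Arc<Mutex<States>>) {
    let mut guard = states.lock().await;
    guard.push("from async".to_string());
}

// In the sync callback: block only the calling thread on the same lock.
fn handle_sync_callback(states: Arc<Mutex<States>>) {
    let mut guard = task::block_on(states.lock());
    guard.push("from sync callback".to_string());
}

fn main() {
    let states = Arc::new(Mutex::new(Vec::new()));
    handle_sync_callback(states.clone());
    task::block_on(handle_async(states));
}
```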

move |r: Reply| {
    if let Ok(s) = r.sample {
        let (ref mut states, wait) =
            &mut *zlock!(handler.statesref);
Member

It seems that here the lock is taken and kept until handle_sample has finished. However, handle_sample calls a callback while keeping the lock. Shouldn't the lock be released before calling the callback?
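
A hedged sketch of the pattern being suggested, with placeholder types: decide what to deliver while holding the lock, drop the guard, then invoke the user callback:

```rust
use std::sync::{Arc, Mutex};

type Sample = String; // placeholder for the real sample type

fn handle_sample(
    states: &Arc<Mutex<Vec<Sample>>>,
    sample: Sample,
    callback: &(dyn Fn(Sample) + Send + Sync),
) {
    // 1. Update the state and decide what can be delivered, under the lock.
    let ready: Vec<Sample> = {
        let mut guard = states.lock().unwrap();
        guard.push(sample);
        guard.drain(..).collect()
    }; // 2. The guard is dropped here, before any callback runs.

    // 3. Invoke the user callback without holding the lock.
    for s in ready {
        callback(s);
    }
}

fn main() {
    let states = Arc::new(Mutex::new(Vec::new()));
    handle_sample(&states, "sample".to_string(), &|s| println!("delivered {s}"));
}
```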

@evshary (Contributor)

evshary commented Sep 1, 2023

@OlivierHecart Since it's been a while, is this PR still alive? Being on guard duty this week, I just want to make sure no PR is forgotten. 😃

@fuzzypixelz (Member)

@OlivierHecart Please change your pull request's base branch to main (the new default branch), and rebase your branch against main, as it is missing a status check that is necessary to merge this pull request but is only available on main.

@milyin mentioned this pull request on Jan 25, 2024
@imstevenpmwork (Contributor)

imstevenpmwork commented Mar 14, 2024

@milyin I see this PR was mentioned in #669 which is part of the next release. Would you mind linking this PR to the issue and adding it to the project roadmap? Bonus points if you use the release tag and set a timeline :D

@milyin (Contributor)

milyin commented Mar 15, 2024

> @milyin I see this PR was mentioned in #669 which is part of the next release. Would you mind linking this PR to the issue and adding it to the project roadmap? Bonus points if you use the release tag and set a timeline :D

This implementation is going to be rewritten according to the requirements in #669. This will definitely be started after the release; currently we don't have time for it. Converting the PR to draft.

@milyin marked this pull request as draft on March 15, 2024 10:31