Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 6884: Random Read Perf Tuning - Token Cache #6885

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

yaol7
Copy link

@yaol7 yaol7 commented Sep 26, 2022

Change log description

  • Add one SimpleCache for DelegationTokenProvider for every segment to avoid the time cost operation, namely token fetch from controller and token parse and so on.
  • When the EventSegmentReader is created, it should be initialized with the correct read offset from the EventPointer.
  • Fix one bug in the SegmentStore side for ReadSegment operation. Here is the error message. This issue will result in that there is no ReplyProcessor for the response of ReadSegment operation. The error handler won't be triggered as normal.

2022-08-26 07:42:36,610 398197 [ClientSocketReaders-5] WARN io.pravega.client.connection.impl.FlowHandler: [] - No ReplyProcessor found for the provided flowId 0. Ignoring response

Purpose of the change
Fixes #6884

What the code does
(Detailed description of the code changes)

Test Result
//before perf tuning, result of query result.

50th percentile latency,term,53123.77484299941,ms
90th percentile latency,term,56851.09998769185,ms
99th percentile latency,term,57644.67728016287,ms
100th percentile latency,term,57716.90405299887,ms
50th percentile service time,term,86.67103000334464,ms
90th percentile service time,term,129.02330341166817,ms
99th percentile service time,term,170.20670108206104,ms
100th percentile service time,term,183.68700201972388,ms

//After perf tuning

50th percentile latency,term,15752.61373948888,ms
90th percentile latency,term,17127.619731408777,ms
99th percentile latency,term,17377.725965366117,ms
100th percentile latency,term,17410.35679000197,ms
50th percentile service time,term,64.25273600325454,ms
90th percentile service time,term,73.91316668363288,ms
99th percentile service time,term,84.0185228383052,ms
100th percentile service time,term,89.00812402134761,ms


protected DelegationTokenProvider getDelegationTokenProvider(Segment segment) {
log.debug("getDelegationTokenProvider, scopedName: {}, cacheInfo: {}", segment.getScopedName(), tokenProviderCache.printCache());
DelegationTokenProvider tokenProvider = tokenProviderCache.get(segment.getScopedName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using a per-segment cache it makes more sense to have a per stream cache. It is possible to create a DelegationTokenProvider given a stream. If we take advantage of this we will be able to consolidate entries in the cache.

}

protected DelegationTokenProvider createDelegationTokenProvider(Segment segment) {
return DelegationTokenProviderFactory.create(controller, segment, AccessOperation.READ);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the stream rather than the segment. (See above)

DelegationTokenProvider mockDelegationTokenProvider = mock(DelegationTokenProvider.class);
tokenProviderSimpleCache.put("1", mockDelegationTokenProvider);

Thread.sleep(60000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sleep is not allowed in unit tests. We run over 800 tests and they complete in 13 minutes. Please insure that your tests do not take disproportionate time. They should run in a few milliseconds. To do this we need all tests operated based on time to use a Supplier<Long> instead of just using systemTimeMillis that was an alternate implementation can be provided, or alternatively find a way to use a mock or have test methods to make the test deterministic without relying on time.


private static final Logger log = LoggerFactory.getLogger(SegmentInputStreamFactoryImpl.class);
private static final int CACHE_MAX_SIZE = 100;
private static final int EXPIRATION_TIME_MINUTE = 60;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it for every 60 minutes token gets evicted from cache? and cache is updated with new token in very next call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be in milis. That is more standard.

@ShwethaSNayak
Copy link
Contributor

Looks like client Junit test is failing SegmentInputStreamFactoryImplTest.unnecessary Mockito stubbings FAILED, Check https://github.com/pravega/pravega/actions/runs/3150227954/jobs/5122840362 for more details

@yaol7 yaol7 force-pushed the issue-6884-PSEARCH-perf-token-cache branch 3 times, most recently from 008c6e2 to 8b5caec Compare October 10, 2022 09:38
@yaol7 yaol7 force-pushed the issue-6884-PSEARCH-perf-token-cache branch from 6411a61 to f0faab0 Compare October 11, 2022 02:21
private final Controller controller;
private final ConnectionPool cp;
private SimpleCache<String, DelegationTokenProvider> tokenProviderCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Copy link
Contributor

@RaulGracia RaulGracia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I have some comments:

  1. Client tests are failing because they are terminating with code 99. If you look to our guidelines (https://github.com/pravega/pravega/wiki/Contributing#builddebugging), this may indicate that at some point of your test some resource is being left open. Please, check this when you have the time.
  2. Have we executed a performance regression test with this change? That would be important to do before being safe when merging this PR.
  3. What would be the performance implication of this change when security is enabled compared to when it is not? Maybe we need to test Pravega with this change and security enabled.

public class SegmentInputStreamFactoryImpl implements SegmentInputStreamFactory {

private static final Logger log = LoggerFactory.getLogger(SegmentInputStreamFactoryImpl.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally we use @Slf4j annotation that creates the logger transparently.

this.controller = controller;
this.cp = cp;
this.tokenProviderCache = new SimpleCache<>(CACHE_MAX_SIZE,
Duration.ofMillis(EXPIRATION_TIME_MILLIS), (k, v) -> log.info("key: {} is evicted from tokenProviderCache.", k));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this log to be info level? I think it may be too verbose, so please consider making it debug

public class SegmentInputStreamFactoryImpl implements SegmentInputStreamFactory {

private static final Logger log = LoggerFactory.getLogger(SegmentInputStreamFactoryImpl.class);
private static final int CACHE_MAX_SIZE = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know the heap space of the cache with 100 elements? Why 100 concretely? Do we have an idea of when elements are added or removed from this cache to ensure that this number is good enough?

@@ -88,4 +108,23 @@ public SegmentInputStream createInputStreamForSegment(Segment segment, Delegatio
async.getConnection();
return new SegmentInputStreamImpl(async, startOffset);
}

DelegationTokenProvider getDelegationTokenProvider(final String scope, final String streamName) {
final String scopedStreamName = NameUtils.getScopedStreamName(scope, streamName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know how expensive this String operation is? This is being executed on a per-read basis? If so, we may want to think how to get rid of this string operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random Read Perf Tuning - Token Cache
4 participants