Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORM-3100 : Introducing CustomIndexArray to speedup HashMap lookups #2711

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

roshannaik
Copy link
Contributor

@roshannaik roshannaik commented Jun 10, 2018

  • Added Unit Tests
  • Ran below perf numbers on my mid 2015 macbook pro for single worker.

Ran these perf topos to check perf impact:

1) ConstSpoutIdBoltNullBoltTopo With 1 Acker

Before: Throughput= 1,067,128 /sec . Latency = 1.038 ms
After: Throughput= 1,294,511 /sec . Latency = 1.037 ms

  • Throughput improved ~21%. No signifiant change in latency.

2) ConstSpoutIdBoltNullBoltTopo With 0 Acker

Before: Throughput= 4,389,133
After: Throughput= 4,748,744 /sec

  • Throughput improved ~8%.

3) TVL : --rate 350000 --spouts 1 --splitters 2 --counters 2 -c topology.max.spout.pending=500 -c topology.acker.executors=2 -c topology.disable.loadaware.messaging=true

No significant change observed in Throughput or Latency.

@roshannaik roshannaik force-pushed the STORM-3100 branch 2 times, most recently from bc9cbb7 to 9d875a0 Compare June 14, 2018 09:14
@roshannaik
Copy link
Contributor Author

Looks like tester_bolt_metrics.py under storm-core is failing. I see that it can be run via mvn clojure:test under storm-core. But unable to figure out what really going wrong. Any help would be appreciated. thanks.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Jun 19, 2018

@roshannaik
It looks like Zk issue but not 100% sure. If it is happening intermittently we might be fine. Could you setup Travis to your fork and run build couple of times?

Copy link
Contributor

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good. I have a number of nits/questions Mostly about the tests.

Integer lowerIndex = Integer.MAX_VALUE;
Integer upperIndex = Integer.MIN_VALUE;

for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would src.keySet() be simpler because we don't use the values?


this.elements = makeNullInitializedArray(elemCount);

for (Map.Entry<Integer, T> entry : src.entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would calling set improve code reuse a little?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, calling set() directly instead of elements.set() would have been nice..But normally I dont call instance methods from constructors since the object is not considered as fully initialized until the constructor has finished.

/**
* Always returns true as this cannot be empty.
*/
public boolean isEmpty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this at all. It does not implement collection so we don't need it.

/**
* Returns the index range as a Set.
*/
public Set<Integer> indices() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we are keeping the order of the indices here, but it is not documented in the javadocs, and it is not needed, as the only place this is used is for doing a set difference.

It would also be nice to indicate that this is not a cheap call so people don't abuse it calling it a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sue.. I just felt its intuitive and nice to maintain the integers in order even if not absolutely necessary. Updating the javadoc.

@@ -63,7 +64,7 @@
public static Tuple generateTestTuple(String source, String index, String type, String id) {
TopologyBuilder builder = new TopologyBuilder();
GeneralTopologyContext topologyContext = new GeneralTopologyContext(builder.createTopology(),
new Config(), new HashMap<>(), new HashMap<>(), new HashMap<>(), "") {
new Config(), new CustomIndexArray<String>(0,1), new HashMap<>(), new HashMap<>(), "") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space.

@@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source,
GeneralTopologyContext topologyContext = new GeneralTopologyContext(
builder.createTopology(),
new Config(),
new HashMap<>(),
new CustomIndexArray<String>(0,1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: java should be able to figure out that it is a String from the required type for GeneralTopologyContext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mostly for readability .. when reading this code its unclear what the deduced type will be without further checking the parameter types for the constructor. Happy to change if you prefer the other way.

@@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source,
GeneralTopologyContext topologyContext = new GeneralTopologyContext(
builder.createTopology(),
new Config(),
new HashMap<>(),
new CustomIndexArray<String>(0,1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why does this need to be 0,1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just any non empty range. will change it to (0,0)

@@ -69,8 +70,8 @@ public void testTimeFormat() {
}

private TopologyContext createTopologyContext(Map<String, Object> topoConf) {
Map<Integer, String> taskToComponent = new HashMap<Integer, String>();
taskToComponent.put(7, "Xcom");
CustomIndexArray<String> taskToComponent = new CustomIndexArray<>(0,8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 0,8 if all we are setting is 7?

@@ -112,15 +112,15 @@ public Task(Executor executor, Integer taskId) throws IOException {
public List<Integer> getOutgoingTasks(Integer outTaskId, String stream, List<Object> values) {
if (debug) {
LOG.info("Emitting direct: {}; {} {} {} ", outTaskId, componentId, stream, values);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think grouping check should happen even if debug is not enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was part of an older PR. From what I recall ... on profiling I had noticed that the grouping check was expensive in the critical path due to the fact that it needed lookups in three (now down to 2) hashmaps : streamComponentToGrouper & componentGrouping. Since neither were keyed on Integer, the CustomIndexArray style optimization was not possible.

@@ -660,8 +661,8 @@ public static Tuple testTuple(List<Object> values, MkTupleParam param) {
}
}

Map<Integer, String> taskToComp = new HashMap<>();
taskToComp.put(task, component);
CustomIndexArray<String> taskToComp = new CustomIndexArray<>(task,task+1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lower and upper are inclusive why do we need task, task+1?

@roshannaik
Copy link
Contributor Author

Sorry for not being able to address the comments sooner. Have been traveling and should be able to get back to this soon after jul 12 th.

@HeartSaVioR
Copy link
Contributor

@roshannaik

https://travis-ci.org/apache/storm/jobs/403972342
https://travis-ci.org/apache/storm/jobs/403972347
https://travis-ci.org/apache/storm/jobs/403972348
https://travis-ci.org/apache/storm/jobs/403972349

Travis CI build is failing and failures look like related to the changeset. Could you check above build result and track?

@HeartSaVioR
Copy link
Contributor

Looks like the build still fails and relevant to this patch.

@roshannaik
Copy link
Contributor Author

roshannaik commented Aug 3, 2018

Spent sometime trying to triage this failure but unable to figure out the cause. i know the issue has to do with this patch as it passes w/o this patch. I think it will take me some more time to get to the bottom of this. I did notice another (kind of significant) performance issue which is simple to address. Given that this is a relatively minor optimization, I am planning to provide a fix for that one first, and then revert back to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants