Which is used for BERT training benchmark #84

LeoZhao-Habana · 2019-05-22T07:13:34Z

Which script is used for the BERT training benchmark? I see there are two kinds of scripts: one for pre-training (e.g. train.py) and one for fine-tuning (e.g. run_classify.py).
Which one is used for the benchmark?

LeoZhao-Habana · 2019-05-22T07:13:51Z

@luotao1

luotao1 · 2019-05-22T10:33:28Z

We use run_classify.py.

LeoZhao-Habana · 2019-05-23T08:56:04Z

@luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just speed?

LeoZhao-Habana · 2019-05-28T06:29:03Z

> @luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just speed?

@luotao1 Any feedback on this question?

luotao1 · 2019-05-28T06:48:32Z

1 CPU_NUM: speed xxx
16 CPU_NUM: speed xxx

We report the speed separately for each CPU_NUM setting. We don't use speed * CPU_NUM, which would be a throughput figure.

LeoZhao-Habana · 2019-05-28T06:53:42Z

Then how can we tell whether the speed is comparable with a V100? For example:
V100: BS=1, speed 3.4 steps/s
Xeon: BS=1, CPU_NUM=8, speed 0.43 steps/s

Are these two figures equivalent?

luotao1 · 2019-05-28T07:43:18Z

They are not equivalent.
Does BS=1, CPU_NUM=8 at 0.43 steps/s mean that BS=1, CPU_NUM=1 would run at 0.43/8 steps/s?
Also, the speed may not scale linearly as CPU_NUM increases.
You can report the result for BS=1, CPU_NUM=ALL.

LeoZhao-Habana · 2019-05-28T08:06:47Z

Yes, speed is not linear in CPU_NUM. But I checked the code and found that this speed reflects the execution time of one iteration, not the number of samples actually processed. That is, each iteration actually processes batchsize * CPU_NUM samples.
I can confirm this.

So my point is that, for CPU vs. GPU, we may not be able to compare the speed values from the logs directly. CPU_NUM is a virtual concept for using multiple CPU cores through data parallelism, while a GPU setup needs additional discrete cards to scale out. This speed is more like a latency measure.

We can report different speeds for different CPU_NUM values, but how to compare them fairly with GPU is what I want to ask.

luotao1 · 2019-05-29T04:24:36Z

> but how to compare them with GPU fairly, that is what I want to ask.

How about computing samples/s to compare CPU and GPU?
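To make the samples/s idea concrete, here is a minimal sketch that converts the example figures quoted earlier in this thread from steps/s to samples/s. It relies on the relation reported above (each iteration processes batchsize * CPU_NUM samples). The function name `samples_per_second` is hypothetical, and treating the V100 figure as a single-GPU run is an assumption, since the thread does not state the GPU count.

```python
def samples_per_second(steps_per_sec, batch_size, num_devices):
    """Convert a logged steps/s value into samples/s.

    Assumes every step processes batch_size * num_devices samples,
    where num_devices is CPU_NUM on CPU or the GPU count on GPU.
    """
    return steps_per_sec * batch_size * num_devices


# Example figures from the thread (illustrative, not official results).
xeon = samples_per_second(0.43, batch_size=1, num_devices=8)
# Assumption: the V100 number comes from a single GPU.
v100 = samples_per_second(3.4, batch_size=1, num_devices=1)

print(f"Xeon CPU_NUM=8: {xeon:.2f} samples/s")
print(f"V100 (1 GPU):   {v100:.2f} samples/s")
```

Under these assumptions the two example figures come out close in samples/s even though the logged steps/s values differ by roughly a factor of eight.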

LeoZhao-Habana · 2019-05-29T05:27:01Z

I see that the benchmark's run.sh already uses this samples/s calculation; it accounts for both CPU_NUM and BS.
I think that makes more sense.
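For anyone who wants to reproduce this kind of measurement outside run.sh, the following is a rough sketch of timing a training loop and reporting samples/s. It is not the logic from run.sh, which is not shown in this thread. `measure_samples_per_sec` and `run_step` are hypothetical names, and `run_step` stands in for whatever executes one training iteration.

```python
import time


def measure_samples_per_sec(run_step, num_steps, batch_size, num_devices,
                            clock=time.perf_counter):
    """Time num_steps calls of run_step and return samples/s.

    Assumes each step processes batch_size * num_devices samples.
    The clock argument is injectable so the function can be tested
    without real waiting.
    """
    start = clock()
    for _ in range(num_steps):
        run_step()
    elapsed = clock() - start
    return num_steps * batch_size * num_devices / elapsed


if __name__ == "__main__":
    # Placeholder step: a real benchmark would run one training iteration.
    def dummy_step():
        time.sleep(0.01)

    rate = measure_samples_per_sec(dummy_step, num_steps=20,
                                   batch_size=1, num_devices=8)
    print(f"{rate:.1f} samples/s")
```

In a real run it is usually worth excluding a few warm-up iterations before starting the timer, since the first steps often include one-time setup cost.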
