Create a load testing EPP client #2415
Conversation
Reviewed 5 of 11 files at r1, 6 of 6 files at r2, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @sarahcaseybot)
build.gradle
line 307 at r2 (raw file):
afterEvaluate { if (rootProject.enableDependencyLocking.toBoolean() && (project.name != 'integration' && project.name != 'load-testing')) {
Any reason we don't want to do dependency locking for the load-testing project?
config/presubmits.py
line 103 at r2 (raw file):
"Source files must end in a newline.", # System.(out|err).println should only appear in tools/ or load-testing/
I think we can probably use loggers in the load testing client? There is no need to directly use System.(out|err).
load-testing/build.gradle
line 36 at r2 (raw file):
distZip.shouldRunAfter(build)
Not sure why this is necessary? I think if you use createUberJar
(mentioned below), you should be able to skip this task.
load-testing/build.gradle
line 38 at r2 (raw file):
jar { manifest {
You can use the createUberJar function to create tasks that build the uber jar.
load-testing/src/main/java/google/registry/client/EppClient.java
line 75 at r2 (raw file):
/** A simple EPP client that can be used for load testing. */ @Parameters(separators = " =") @SuppressWarnings("FutureReturnValueIgnored")
We probably don't need this annotation (and all the unused return values) anymore. They were there because google3 has a linter that forces you to explicitly deal with return values.
load-testing/src/main/java/google/registry/client/EppClient.java
line 183 at r2 (raw file):
} static byte[] readBytesFromFile(String filename) {
Is this function used anywhere?
Also, all the static functions here can be private, right?
load-testing/src/main/java/google/registry/client/EppClient.java
line 279 at r2 (raw file):
public void run() { String outputFolder = DateTime.now(UTC).toString(); int numberOfThreads = Runtime.getRuntime().availableProcessors();
In retrospect, it seems ill-advised to use all available cores for log-writing. We are probably better off using a thread pool with a modest fixed number of cores here. We want to dedicate as many cores to network IO as possible (which NioEventLoopGroup does if you don't specify the thread pool size, I believe).
load-testing/src/main/java/google/registry/client/EppClientHandler.java
line 82 at r2 (raw file):
ctx.channel().attr(REQUEST_SENT).get().add(now); ctx.channel().attr(LOGGING_REQUEST_COMPLETE).set(ctx.executor().newPromise()); if (ctx.channel().attr(LOGGING_REQUEST_COMPLETE).get() == null) {
I don't recall why I did this. Do you know...? We just set this channel attribute in the line above. It doesn't seem possible for it to be null now. Also, if it is null, how can we call setSuccess() on it...?
load-testing/src/main/java/google/registry/client/ProxyHeaderHandler.java
line 32 at r2 (raw file):
*/ @SuppressWarnings("FutureReturnValueIgnored") public class ProxyHeaderHandler extends ChannelInboundHandlerAdapter {
I don't think this is necessary any more. Have you tried not installing this handler in the channel pipeline?
load-testing/src/main/java/google/registry/client/README.md
line 58 at r2 (raw file):
$ mkdir stage $ cp -r load-testing/build/distributions/load-testing/ stage/ $ cp -r jdk-21_linux-x64_bin.tar.gz stage/
You are probably better off storing these files in GoB and writing Gradle tasks that take care of building an executable jar file that can be used as-is. Even better, maybe you want to create Docker images instead of providing jars, which eliminates the need to configure Java on your client machines.
All I'm saying here is that if you want to productionize this client, it's worth it to do the initial heavy lifting to make it as easy to use as possible, without the need to consult the README file and run a bunch of commands from it.
load-testing/src/main/java/google/registry/client/run.sh
line 20 at r2 (raw file):
tar -xvf jdk-21_linux-x64_bin.tar.gz jdk-21.0.2/bin/java -jar load-testing/load-testing.jar --host epp.registry-sandbox.google --certificate certificate.pem -k key.pem -ft
We certainly do not want to leak the endpoints here. Also, having a re-usable shell script that extracts the JDK installation every time certainly looks sub-optimal. You should either just expect Java to exist on the machines (because you have pre-configured the machines), or use a docker image that brings all of its dependencies with it.
Reviewable status: 4 of 11 files reviewed, 13 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)
build.gradle
line 307 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Any reason we don't want to do dependency locking for the load-testing project?
Dependency locking did not work when making a jar how I was doing it previously. Now that I am using createUberJar, I was able to add back dependency locking.
config/presubmits.py
line 103 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I think we can probably use loggers in the load testing client? There is no need to directly use System.(out|err).
Agreed that we can use loggers for the output of the EPP commands, but I think we need to print to system output for the summary at the end of the load test; otherwise the summary info would be logged separately in the logs of each individual instance.
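A minimal sketch of the split being discussed, with hypothetical method names (logCommand, printSummary, and the summary format are illustrative, not from the actual client): per-command output goes through a logger so it stays in each instance's own logs, while the end-of-run summary is written to standard output so it is easy to collect from every VM.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch (method names are hypothetical): per-command EPP output
// goes through a logger, while the end-of-run summary is printed to stdout.
class LoadTestReporting {
  private static final Logger logger = Logger.getLogger(LoadTestReporting.class.getName());

  // Per-command results stay in each instance's own logs.
  static void logCommand(String command, long latencyMillis) {
    logger.log(Level.INFO, "{0} completed in {1} ms", new Object[] {command, latencyMillis});
  }

  static String formatSummary(int total, int failures, double avgLatencyMillis) {
    return String.format(
        Locale.ROOT, "total=%d failures=%d avgLatencyMs=%.1f", total, failures, avgLatencyMillis);
  }

  // The summary deliberately bypasses the logger so that it is trivially
  // visible on each VM's stdout at the end of the run.
  static void printSummary(int total, int failures, double avgLatencyMillis) {
    System.out.println(formatSummary(total, failures, avgLatencyMillis));
  }
}
```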
load-testing/build.gradle
line 36 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Not sure why this is necessary? I think if you use createUberJar (mentioned below), you should be able to skip this task.
Yes, now that I figured out how createUberJar works, I was able to remove this.
load-testing/build.gradle
line 38 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
You can use the createUberJar function to create tasks that build the uber jar.
Done.
load-testing/src/main/java/google/registry/client/EppClient.java
line 343 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Unread local variable
Variable 'Future<?> unusedFuture' is never read.
Done.
load-testing/src/main/java/google/registry/client/EppClient.java
line 75 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
We probably don't need this annotation (and all the unused return values) anymore. They were there because google3 has a linter that forces you to explicitly deal with return values.
Nope, this was not from Google3. I added this annotation because it is needed for the Gradle build to pass, since we have -Werror specified.
load-testing/src/main/java/google/registry/client/EppClient.java
line 183 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Is this function used anywhere?
Also, all the static functions here can be private, right?
Done and removed; this isn't necessary anymore now that I'm reading the key and certificate file from input.
load-testing/src/main/java/google/registry/client/EppClient.java
line 279 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
In retrospect, it seems ill-advised to use all available cores for log-writing. We are probably better off using a thread pool with a modest fixed number of cores here. We want to dedicate as many cores to network IO as possible (which NioEventLoopGroup does if you don't specify the thread pool size, I believe).
Is 5 a good number of threads?
load-testing/src/main/java/google/registry/client/EppClientHandler.java
line 82 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I don't recall why I did this. Do you know...? We just set this channel attribute in the line above. It doesn't seem possible for it to be null now. Also, if it is null, how can we call setSuccess() on it...?
Hmm, this does look strange. Seems like these are just unnecessary lines; removed.
load-testing/src/main/java/google/registry/client/README.md
line 58 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
You are probably better off storing these files in GoB and writing Gradle tasks that take care of building an executable jar file that can be used as-is. Even better, maybe you want to create Docker images instead of providing jars, which eliminates the need to configure Java on your client machines.
All I'm saying here is that if you want to productionize this client, it's worth it to do the initial heavy lifting to make it as easy to use as possible, without the need to consult the README file and run a bunch of commands from it.
Moved some of the files to GoB and created Gradle tasks to run most of these commands. Updated the README with the new instructions on how to run this.
load-testing/src/main/java/google/registry/client/ProxyHeaderHandler.java
line 53 at r1 (raw file):
Previously, github-advanced-security[bot] wrote…
Use of default toString()
Default toString(): ProxyHeaderHandler inherits toString() from Object, and so is not suitable for printing.
Done.
load-testing/src/main/java/google/registry/client/ProxyHeaderHandler.java
line 32 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I don't think this is necessary any more. Have you tried not installing this handler in the channel pipeline?
Tried removing it and things seem to still work.
load-testing/src/main/java/google/registry/client/run.sh
line 20 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
We certainly do not want to leak the endpoints here. Also, having a re-usable shell script that extracts the JDK installation every time certainly looks sub-optimal. You should either just expect Java to exist on the machines (because you have pre-configured the machines), or use a docker image that brings all of its dependencies with it.
Moved this script to GoB and moved the JDK extraction into a step in the Gradle build so it only needs to be done once.
Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)
build.gradle
line 307 at r2 (raw file):
Previously, sarahcaseybot wrote…
Dependency locking did not work when making a jar how I was doing it previously. Now that I am using createUberJar, I was able to add back dependency locking.
You can remove the extra whitespace added and revert this file now.
load-testing/build.gradle
line 38 at r3 (raw file):
task makeStagingDirectory { new File('stage').mkdirs()
Better to make a staging directory under build, which will be cleaned up by the clean task.
load-testing/build.gradle
line 41 at r3 (raw file):
} task copyJarToStaging(dependsOn: makeStagingDirectory, type: Copy) {
You don't have to create a task to copy everything; just use the include/exclude/filter pattern:
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.Copy.html
load-testing/build.gradle
line 63 at r3 (raw file):
task copyJavaToStaging(dependsOn: copyKeyToStaging, type: Copy) { from "${rootDir}/load-testing/jdk-21_linux-x64_bin.tar.gz" into "${rootDir}/stage"
I still don't understand why you'd want to package your own JDK (at the very least a JRE should suffice).
If you look at the other uber jars that we make, we never package JDKs with them. I think it is a reasonable assumption that the runtime will be provided by the environment. If you really want to make your client platform-independent, make it a Docker image. See buildToolImage.
load-testing/build.gradle
line 79 at r3 (raw file):
task deployLoadTest(dependsOn: deployStagingToInstances) { doLast {println 'deploying load tests'}
Why the extra task? It doesn't do anything...?
load-testing/src/main/java/google/registry/client/EppClient.java
line 343 at r1 (raw file):
Previously, sarahcaseybot wrote…
Done.
Well, that's unfortunate. It looks like GitHub code analysis doesn't understand unusedFuture.... I guess a class-level suppression is the way to go if you want to get rid of all these warnings, or you can just ignore GitHub, as this is not a blocking violation.
load-testing/src/main/java/google/registry/client/EppClient.java
line 75 at r2 (raw file):
Previously, sarahcaseybot wrote…
Nope, this was not from Google3. I added this annotation since it is needed for the Gradle build to pass since we have -Werror specified
OK, so it's the same Error Prone rule that checks for this condition. You can just assign the return value to a variable called unusedFuture to signal that it is intentionally unused. It's better than having a class-level suppression.
https://errorprone.info/bugpattern/FutureReturnValueIgnored
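A minimal sketch of the pattern described above, using a stdlib executor in place of Netty's channel futures (the class and method names here are illustrative, not from the actual client): assigning the ignored Future to a variable whose name starts with "unused" is the per-call-site alternative to a class-level suppression.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch (not the actual client code): silencing Error Prone's
// FutureReturnValueIgnored at a single call site by assigning the ignored
// Future to a variable whose name starts with "unused".
class UnusedFutureDemo {
  static int runAndWait() throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    int[] counter = {0};
    // The return value is intentionally ignored; the variable name signals
    // that intent to Error Prone, so no @SuppressWarnings is needed.
    Future<?> unusedFuture = executor.submit(() -> counter[0]++);
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
    return counter[0];
  }
}
```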
load-testing/src/main/java/google/registry/client/EppClient.java
line 279 at r2 (raw file):
Previously, sarahcaseybot wrote…
is 5 a good number of threads?
I'd imagine a small number like 5 would suffice as writing logs is mostly disk IO-bound. It also depends on the machines that you run on, as you want to leave enough threads to do the network IO work.
load-testing/src/main/java/google/registry/client/EppClient.java
line 287 at r3 (raw file):
.group(eventLoopGroup) .channel(NioSocketChannel.class) .handler(makeChannelInitializer(outputFolder, ImmutableList.of(loggingExecutors)));
You still need a list of single thread executors, not a thread pool executor with a certain number of threads. Have you looked at the logs after this change? You should have seen logs from different network activities interwoven with each other.
With a list of single thread executors and each channel getting mapped to the same executor, logging jobs from the same channel submitted to the same executor will be processed in order (as there is only one thread to work).
If you instead use a thread pool, even though network activities are processed in order (and therefore the submission of the corresponding logging jobs is also in order), different threads in the pool might pick up these jobs and write to the same file at the same time.
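The ordering argument above can be sketched outside Netty with plain java.util.concurrent (class and method names like executorFor are illustrative): each channel id is pinned to one single-thread executor from a fixed list, so jobs from the same channel run in submission order while different channels can still proceed in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the scheme described above, outside Netty: a fixed
// list of single-thread executors, with each channel pinned to one of them so
// that its logging jobs run in submission order.
class PerChannelExecutors {
  private final List<ExecutorService> executors = new ArrayList<>();

  PerChannelExecutors(int size) {
    for (int i = 0; i < size; i++) {
      executors.add(Executors.newSingleThreadExecutor());
    }
  }

  // The same channel id always maps to the same single-thread executor, so
  // two jobs from one channel can never run concurrently.
  ExecutorService executorFor(int channelId) {
    return executors.get(Math.floorMod(channelId, executors.size()));
  }

  void shutdownAndWait() throws InterruptedException {
    for (ExecutorService executor : executors) {
      executor.shutdown();
      executor.awaitTermination(5, TimeUnit.SECONDS);
    }
  }
}
```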
load-testing/src/main/java/google/registry/client/EppClient.java
line 316 at r3 (raw file):
.closeFuture() .awaitUninterruptibly( Duration.ofSeconds(
What's the point of calling Duration.ofSeconds().getSeconds()?
load-testing/src/main/java/google/registry/client/README.md
line 51 at r3 (raw file):
* Run the load test. Configurations of the load test can be made by configuring this `run.sh` file locally and re-deploying.
I don't think you have a run.sh any more?
load-testing/src/main/java/google/registry/client/deploy.sh
line 16 at r3 (raw file):
# limitations under the License. HOSTS=$(gcloud compute instances list | awk '/^loadtest/ { print $5 }')
The file should be under load-testing.
load-testing/src/main/java/google/registry/client/deploy.sh
line 17 at r3 (raw file):
HOSTS=$(gcloud compute instances list | awk '/^loadtest/ { print $5 }') for host in $HOSTS; do ssh $host sudo apt-get -y install rsync; done
If you know you are going to use a set number of VMs, it is better to have a separate script to set them up once (like installing the JDK, installing rsync, etc.), and leave this script to just do rsync.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)
load-testing/src/main/java/google/registry/client/README.md
line 51 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I don't think you have a run.sh any more?
Oh, I see it's in GoB now, you can probably just copy paste the content and redact the password and hostnames here, so the public README is still useful.
Reviewable status: 5 of 15 files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)
load-testing/build.gradle
line 38 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Better to make a staging directory under build, which will be cleaned up by the clean task.
Done.
load-testing/build.gradle
line 41 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
You don't have to create a task to copy everything, just the include/exclude/filter pattern:
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.Copy.html
For some reason, the include/exclude pattern did not seem to work when I tried it. I was able to combine the copies into a single task though. Needed to include a specified base directory first https://discuss.gradle.org/t/multiple-copies-in-a-single-copy-task-works-a-bit-weird/23187
load-testing/build.gradle
line 63 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I still don't understand why you'd want to package your own JDK (at the very least a JRE should suffice).
If you look at the other uber jars that we make, we never package JDKs with them. I think it is a reasonable assumption that the runtime will be provided by the environment. If you really want to make your client platform-independent, make it a Docker image. See buildToolImage.
I modified this as suggested below to instead have a one-time setup script for the instances that configures Java 21.
load-testing/build.gradle
line 79 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Why the extra task? It doesn't do anything...?
I figured this would be a nice "trigger" task to trigger the deployment of the string of tasks, but if that's unusual I'll remove it. Removed.
load-testing/src/main/java/google/registry/client/EppClient.java
line 343 at r1 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Well, that's unfortunate. It looks like GitHub code analysis doesn't understand unusedFuture.... I guess a class-level suppression is the way to go if you want to get rid of all these warnings, or you can just ignore GitHub, as this is not a blocking violation.
I think I would prefer to keep the class-level suppression to avoid these annoying GitHub warnings in the future.
load-testing/src/main/java/google/registry/client/EppClient.java
line 75 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
OK, so it's the same Error Prone rule that checks for this condition. You can just assign the return value to a variable called unusedFuture to signal that it is intentionally unused. It's better than having a class-level suppression.
Done.
load-testing/src/main/java/google/registry/client/EppClient.java
line 279 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I'd imagine a small number like 5 would suffice as writing logs is mostly disk IO-bound. It also depends on the machines that you run on, as you want to leave enough threads to do the network IO work.
Done.
load-testing/src/main/java/google/registry/client/EppClient.java
line 287 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
You still need a list of single thread executors, not a thread pool executor with a certain number of threads. Have you looked at the logs after this change? You should have seen logs from different network activities interwoven with each other.
With a list of single thread executors and each channel getting mapped to the same executor, logging jobs from the same channel submitted to the same executor will be processed in order (as there is only one thread to work).
If you instead use a thread pool, even though network activities are processed in order (and therefore the submission of the corresponding logging jobs is also in order), different threads in the pool might pick up these jobs and write to the same file at the same time.
Done.
load-testing/src/main/java/google/registry/client/EppClient.java
line 316 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
What's the point of calling Duration.ofSeconds().getSeconds()?
It wasn't calling Duration.ofSeconds().getSeconds(); it was calling Duration.ofSeconds(seconds - duration.getSeconds()). However, I changed it to multiply TIMEOUT_SECONDS by 1000 to make it less confusing.
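For context, the two conversions are equivalent and easy to check with the stdlib. This is an illustrative sketch, not the client's code: TimeoutDemo and its constant stand in for the client's TIMEOUT_SECONDS, and a CountDownLatch stands in for Netty's closeFuture().

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the timeout conversion discussed above; the constant
// and the CountDownLatch stand in for the client's TIMEOUT_SECONDS and
// Netty's closeFuture().
class TimeoutDemo {
  static final long TIMEOUT_SECONDS = 3;

  // The explicit multiply that the change settled on...
  static long timeoutMillis() {
    return TIMEOUT_SECONDS * 1000;
  }

  // ...is equivalent to converting through java.time.Duration.
  static long timeoutMillisViaDuration() {
    return Duration.ofSeconds(TIMEOUT_SECONDS).toMillis();
  }

  // Wait for completion, giving up after the timeout; returns true if the
  // latch reached zero in time.
  static boolean awaitWithTimeout(CountDownLatch latch) throws InterruptedException {
    return latch.await(timeoutMillis(), TimeUnit.MILLISECONDS);
  }
}
```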
load-testing/src/main/java/google/registry/client/README.md
line 51 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Oh, I see it's in GoB now, you can probably just copy paste the content and redact the password and hostnames here, so the public README is still useful.
Done.
build.gradle
line 307 at r2 (raw file):
Previously, jianglai (Lai Jiang) wrote…
You can remove the extra whitespace added and revert this file now.
Done.
load-testing/src/main/java/google/registry/client/deploy.sh
line 16 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
The file should be under load-testing.
Done.
load-testing/src/main/java/google/registry/client/deploy.sh
line 17 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
If you know you are going to use a set number of VMs, it is better to have a separate script to set them up once (like installing the JDK, installing rsync, etc.), and leave this script to just do rsync.
Done.
Reviewed 10 of 10 files at r4, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)
load-testing/build.gradle
line 38 at r3 (raw file):
Previously, sarahcaseybot wrote…
Done.
I'd use a Gradle variable like layout.buildDirectory instead of hardcoding build.
https://docs.gradle.org/current/userguide/working_with_files.html
load-testing/build.gradle
line 41 at r3 (raw file):
Previously, sarahcaseybot wrote…
For some reason, the include/exclude pattern did not seem to work when I tried it. I was able to combine the copies into a single task though. Needed to include a specified base directory first https://discuss.gradle.org/t/multiple-copies-in-a-single-copy-task-works-a-bit-weird/23187
I think you can just do {projectDir}, instead of ${rootDir}/load-testing, or use the layout class.
https://cs.opensource.google/search?q=projectdir&sq=&ss=nomulus%2Fnomulus
By hardcoding the directory name, you are making it harder to change it in the future, if you ever want to.
load-testing/build.gradle
line 54 at r4 (raw file):
task deployLoadTestsToInstances (dependsOn: copyFilesToStaging, type: Exec) { executable "sh" workingDir "${rootDir}/"
Working dir can be the project dir, and you just call ./deploy.sh from it.
load-testing/instanceSetUp.sh
line 27 at r4 (raw file):
do for i in {1..60}; do if ssh $host sudo apt-get -y install rsync; then
For newly created VMs, you need to run apt-get update && apt-get upgrade first to make sure your apt database and the installed software are up-to-date, before calling apt-get install.
Also, I'd do everything in one ssh session, instead of connecting every time to execute one command. Something like:
ssh $host "sudo apt-get -y update && sudo apt-get -y upgrade && sudo apt-get -y install rsync && mkdir -p test-client"
Is it common to have to retry? What's the reason for the transient failure?
load-testing/src/main/java/google/registry/client/EppClient.java
line 316 at r3 (raw file):
Previously, sarahcaseybot wrote…
It wasn't calling Duration.ofSecond().getSeconds(), it was calling Duration.ofSeconds(seconds - durution.getSeconds()). However, I changed it to multiply the TIMEOUT_SECONDS by 1000 to make it less confusing.
Oh, duh.
load-testing/src/main/java/google/registry/client/README.md
line 8 at r4 (raw file):
### Setting up the test instances * Download the Java 21 JDK jdk-21_linux-x64_bin.tar.gz (note that if this version is no longer available, you'll need to modify the instanceSetUp.sh script(it
Just use apt-get to install OpenJDK 21. If the instance OS is reasonably up-to-date, there is no reason why this won't work.
load-testing/src/main/java/google/registry/client/README.md
line 61 at r4 (raw file):
```shell $ HOSTS=$(gcloud compute instances list | awk '/^loadtest/ { print $5 }') $ for host in $HOSTS; do ssh $host test-client/run.sh; done
Since all of your scripts are operating in batch mode (connecting to all the VMs and repeating the same operations on each of them), why is run.sh only operating on a single machine? Why not put the loop in run.sh as well?
Reviewable status: 10 of 15 files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)
load-testing/build.gradle
line 38 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I'd use a Gradle variable like layout.buildDirectory instead of hardcoding build.
https://docs.gradle.org/current/userguide/working_with_files.html
Done.
load-testing/build.gradle
line 41 at r3 (raw file):
Previously, jianglai (Lai Jiang) wrote…
I think you can just do {projectDir}, instead of ${rootDir}/load-testing, or use the layout class.
https://cs.opensource.google/search?q=projectdir&sq=&ss=nomulus%2Fnomulus
By hardcoding the directory name, you are making it harder to change it in the future, if you ever want to.
Done.
load-testing/instanceSetUp.sh
line 27 at r4 (raw file):
Previously, jianglai (Lai Jiang) wrote…
For newly created VMs, you need to run apt-get update && apt-get upgrade first to make sure your apt database and the installed software are up-to-date, before calling apt-get install.
Also, I'd do everything in one ssh session, instead of connecting every time to execute one command. Something like:
ssh $host "sudo apt-get -y update && sudo apt-get -y upgrade && sudo apt-get -y install rsync && mkdir -p test-client"
Is it common to have to retry? What's the reason for the transient failure?
The retry was necessary since the instances created on line 18 take a while, so the ssh was failing when the instances were not done being created. Just adding an && to line 18 was still causing failures, since I guess the command returns success before the instances are actually done being created.
load-testing/src/main/java/google/registry/client/README.md
line 8 at r4 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Just use apt-get to install OpenJDK 21. If the instance OS is reasonably up-to-date, there is no reason why this won't work.
apt-get install does not seem to work; error message: 'Package "openjdk-21-jdk" is not available, but is referred to by another package.' Used wget instead to download from the web and install.
load-testing/src/main/java/google/registry/client/README.md
line 61 at r4 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Since all of your scripts are operating in batch mode (connecting to all the VMs and repeating the same operations on each of them), why is run.sh only operating on a single machine? Why not put the loop in run.sh as well?
Done.
Reviewed 5 of 5 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)
load-testing/instanceSetUp.sh
line 27 at r4 (raw file):
Previously, sarahcaseybot wrote…
The retry was necessary since the instances created on line 18 take a while, so the ssh was failing when the instances were not done being created. Just adding an && to line 18 was still causing failures since I guess the command returns a success before the instances are actually done being created
That's too bad. I guess it's not too big of a deal, given that these scripts are only run once.
Perhaps you can sleep for a while before trying to ssh, if you know it's going to take some time for the machines to be provisioned after the gcloud command returns anyway?
load-testing/instanceSetUp.sh
line 27 at r5 (raw file):
do for i in {1..60}; do if ssh $host 'sudo apt-get -y update &&
Does SSH work for you directly? I tried to create a test VM but I can't connect to its external IP directly. I had to use gcloud compute ssh to connect.
load-testing/src/main/java/google/registry/client/README.md
line 8 at r4 (raw file):
Previously, sarahcaseybot wrote…
apt-get install does not seem to work, error message: 'Package "openjdk-21-jdk" is not available, but is referred to by another package." Used Wget instead to download from the web and install.
That's because the default OS image is Debian, and JDK 21 is still in Debian testing. If you use Ubuntu, you can install openjdk-21-jdk.
gcloud compute instances create test-ubuntu --machine-type n4-standard-16 --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud --zone us-east4-a
Also, I'd recommend using a higher-spec machine. g1-small is really slow and running apt-get takes forever. For load testing you'd want to not be limited by your client.
This code is mostly based on what was used for a past EPP load testing client that can be found in Google3 at https://source.corp.google.com/piper///depot/google3/experimental/users/jianglai/proxy/java/google/registry/proxy/client/ I modified the old client to be open-source friendly and to use Gradle. For now, this only performs a login and logout command; I will further expand on this in later PRs to add other EPP commands so that we can truly load test the system.
Reviewable status: 11 of 15 files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)
load-testing/instanceSetUp.sh
line 27 at r4 (raw file):
Previously, jianglai (Lai Jiang) wrote…
That's too bad. I guess it's not too big of a deal, given that these scripts are only run once.
Perhaps you can sleep for a while before trying to ssh, if you know it's going to take some time for the machines to be provisioned after the gcloud command returns anyway?
I don't think it's a good idea to rely on sleep, since how long the instances take to be created can vary. I think retry is the best option here. Added a comment for why retry is needed.
load-testing/instanceSetUp.sh
line 27 at r5 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Does SSH work for you directly? I tried to create a test VM but I can't connect to its external IP directly. I had to use gcloud compute ssh to connect.
SSH works fine for me.
load-testing/src/main/java/google/registry/client/README.md
line 8 at r4 (raw file):
Previously, jianglai (Lai Jiang) wrote…
That's because the default OS image is Debian, and JDK 21 is still in Debian testing. If you use Ubuntu, you can install openjdk-21-jdk.
gcloud compute instances create test-ubuntu --machine-type n4-standard-16 --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud --zone us-east4-a
Also, I'd recommend using a higher-spec machine. g1-small is really slow and running apt-get takes forever. For load testing you'd want to not be limited by your client.
When I tried changing the machine type, I was hit with this error saying I had exceeded the CPU limit.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Quota 'CPUS_PER_VM_FAMILY' exceeded. Limit: 24.0 in region us-east4.
metric name = compute.googleapis.com/cpus_per_vm_family
limit name = CPUS-PER-VM-FAMILY-per-project-region
limit = 24.0
dimensions = region: us-east4
vm_family: N4
Try your request in another zone, or view documentation on how to increase quotas: https://cloud.google.com/compute/quotas.
Since the most pertinent issues we have been experiencing in production show a QPS of only 30 requests/second, hopefully for now we can get enough information from these smaller instances. Later on, it looks like there may be a process for increasing our quota. Also, I was able to successfully create one VM but it failed on the second, so perhaps I could create multiple VMs, each in a different region.
Reviewed 4 of 4 files at r6, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)
load-testing/instanceSetUp.sh
line 27 at r4 (raw file):
Previously, sarahcaseybot wrote…
I don't think it's a good idea to rely on sleep alone, since how long the instances take to be created can vary. I think retry is the best option here. Added a comment explaining why the retry is needed.
Oh, I meant to sleep before trying ssh, in addition to the retry. If you know it's going to take some time for the machines to be ready, there's no point trying it right away.
load-testing/instanceSetUp.sh
line 27 at r5 (raw file):
Previously, sarahcaseybot wrote…
ssh works fine for me
It's likely that you have something configured already. There's no reason why it would work directly without configuring the keys or other authentication mechanisms. Can you try on a different machine, perhaps your cloudtop?
load-testing/src/main/java/google/registry/client/README.md
line 8 at r4 (raw file):
Previously, sarahcaseybot wrote…
When I tried changing the machine type, I was hit with this error saying I had exceeded the CPU limit.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Quota 'CPUS_PER_VM_FAMILY' exceeded. Limit: 24.0 in region us-east4.
metric name = compute.googleapis.com/cpus_per_vm_family
limit name = CPUS-PER-VM-FAMILY-per-project-region
limit = 24.0
dimensions = region: us-east4
vm_family: N4
Try your request in another zone, or view documentation on how to increase quotas: https://cloud.google.com/compute/quotas. Since the most pertinent issues we have been experiencing in production show a QPS of only 30 requests/second, hopefully for now we can get enough information from these smaller instances. Later on, it looks like there may be a process for increasing our quota. Also, I was able to successfully create one VM but it failed on the second, so perhaps I could create multiple VMs, each in a different region.
These annoying quotas are per-zone, not per machine. So it's in your interest to get machines with better CPUs that have more processing power per core. But I agree that 30 QPS is not a tall order.
The process to request a quota increase is quite opaque. You fill in a form and it either gets auto approved or rejected. In order to appeal you have to contact your "sales representative", which we don't have. I think your best bet is to use more zones, should you need more power than one zone can provide.
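That multi-zone approach could look roughly like the following sketch (the instance names, machine type, image, and zone list are illustrative assumptions, not settings from this PR):

```shell
#!/bin/bash
# Spread the load-test client VMs across zones in different regions so
# that no single region's per-family CPU quota is exceeded. Each create
# runs in the background; `wait` blocks until all of them have finished.
for zone in us-east4-a us-central1-a us-west1-a; do
  gcloud compute instances create "epp-load-client-${zone}" \
      --machine-type n4-standard-16 \
      --image-family ubuntu-2204-lts \
      --image-project ubuntu-os-cloud \
      --zone "${zone}" &
done
wait
```

Since quota is checked per region, one n4-standard-16 per region stays well under the 24-CPU limit hit above.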
Reviewable status: 13 of 15 files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)
load-testing/instanceSetUp.sh
line 27 at r4 (raw file):
Previously, jianglai (Lai Jiang) wrote…
Oh, I meant to sleep before trying ssh, in addition to the retry. If you know it's going to take some time for the machines to be ready, there's no point trying it right away.
Done.
load-testing/instanceSetUp.sh
line 27 at r5 (raw file):
Previously, jianglai (Lai Jiang) wrote…
It's likely that you have something configured already. There's no reason why it would work directly without configuring the keys or other authentication mechanisms. Can you try on a different machine, perhaps your cloudtop?
Oh, now I remember: I needed to add my public key here: https://pantheon.corp.google.com/compute/metadata?resourceTab=sshkeys&project=domain-registry-sandbox&scopeTab=projectMetadata
Added this to the README.
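For reference, the same metadata edit can be made from the command line; a hedged sketch (the key file name is an assumption, and note that this replaces the current ssh-keys value, so the file should contain every key you want to keep):

```shell
# Add a public key to the project-wide "ssh-keys" metadata, which is the
# same setting the Pantheon page linked above edits. Each line of
# ssh-keys.txt has the form:
#   USERNAME:ssh-ed25519 AAAA... user@host
gcloud compute project-info add-metadata \
    --project domain-registry-sandbox \
    --metadata-from-file ssh-keys=ssh-keys.txt
```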
A lot of this code came from a past EPP load testing client written by @jianglai that can be found in Google3 at https://source.corp.google.com/piper///depot/google3/experimental/users/jianglai/proxy/java/google/registry/proxy/client/
I modified the old client to get it working again, to be open-source friendly, and to use Gradle.
For now, it only performs login and logout commands; I will expand on this in later PRs to add other EPP commands so that we can truly load test the system.