
Create a load testing EPP client #2415

Merged
9 commits merged on May 23, 2024

Conversation

sarahcaseybot
Collaborator

@sarahcaseybot sarahcaseybot commented Apr 25, 2024

A lot of this code came from a past EPP load testing client written by @jianglai, which can be found in Google3 at https://source.corp.google.com/piper///depot/google3/experimental/users/jianglai/proxy/java/google/registry/proxy/client/

I modified the old client to get it working again, to make it open-source friendly, and to use Gradle.

For now, this only performs login and logout commands; I will expand on this in later PRs to add other EPP commands so that we can truly load test the system.



@sarahcaseybot changed the title from "Create a load testing EPP client." to "Create a load testing EPP client" on Apr 26, 2024
Collaborator

@jianglai jianglai left a comment


Reviewed 5 of 11 files at r1, 6 of 6 files at r2, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @sarahcaseybot)


build.gradle line 307 at r2 (raw file):

  afterEvaluate {
    if (rootProject.enableDependencyLocking.toBoolean()
        && (project.name != 'integration' && project.name != 'load-testing')) {

Any reason we don't want to do dependency locking for the load-testing project?


config/presubmits.py line 103 at r2 (raw file):

        "Source files must end in a newline.",

    # System.(out|err).println should only appear in tools/ or load-testing/

I think we can probably use loggers in the load testing client? There is no need to directly use System.(out|err).


load-testing/build.gradle line 36 at r2 (raw file):

distZip.shouldRunAfter(build)

Not sure why this is necessary? I think if you use createUberJar (mentioned below), you should be able to skip this task.


load-testing/build.gradle line 38 at r2 (raw file):

jar {
    manifest {

You can use the createUberJar function to create tasks that build the uber jar.


load-testing/src/main/java/google/registry/client/EppClient.java line 75 at r2 (raw file):

/** A simple EPP client that can be used for load testing. */
@Parameters(separators = " =")
@SuppressWarnings("FutureReturnValueIgnored")

We probably don't need this annotation (and all the unused return values) anymore. They were there because google3 has a linter that forces you to explicitly deal with return values.


load-testing/src/main/java/google/registry/client/EppClient.java line 183 at r2 (raw file):

  }

  static byte[] readBytesFromFile(String filename) {

Is this function used anywhere?

Also, all the static functions here can be private, right?


load-testing/src/main/java/google/registry/client/EppClient.java line 279 at r2 (raw file):

  public void run() {
    String outputFolder = DateTime.now(UTC).toString();
    int numberOfThreads = Runtime.getRuntime().availableProcessors();

In retrospect, it seems ill-advised to use all available cores for log-writing. We are probably better off using a thread pool here with a modest fixed number of cores here. We want to dedicate as many cores to network IO as possible (which NioEventLoopGroup does if you don't specify the thread pool size, I believe).
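A minimal stdlib sketch of the suggested change, assuming a modest fixed pool of 5 log-writing threads (the class, method, and constant names here are illustrative, not from the client):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LoggingPool {
  // A small fixed number of log-writing threads, leaving the remaining
  // cores free for network IO (which NioEventLoopGroup sizes itself to
  // when no thread count is given).
  static final int LOGGING_THREADS = 5;

  static int runLogJobs(int jobs) {
    ExecutorService loggingExecutor = Executors.newFixedThreadPool(LOGGING_THREADS);
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < jobs; i++) {
      // Stand-in for writing one log line.
      loggingExecutor.submit(() -> { done.incrementAndGet(); });
    }
    loggingExecutor.shutdown();
    try {
      loggingExecutor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return done.get();
  }
}
```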


load-testing/src/main/java/google/registry/client/EppClientHandler.java line 82 at r2 (raw file):

    ctx.channel().attr(REQUEST_SENT).get().add(now);
    ctx.channel().attr(LOGGING_REQUEST_COMPLETE).set(ctx.executor().newPromise());
    if (ctx.channel().attr(LOGGING_REQUEST_COMPLETE).get() == null) {

I don't recall why I did this. Do you know...? We just set this channel attribute in the line above. It doesn't seem possible for it to be null now. Also if it is null, how can we call setSuccess() on it...?


load-testing/src/main/java/google/registry/client/ProxyHeaderHandler.java line 32 at r2 (raw file):

 */
@SuppressWarnings("FutureReturnValueIgnored")
public class ProxyHeaderHandler extends ChannelInboundHandlerAdapter {

I don't think this is necessary any more. Have you tried not installing this handler in the channel pipeline?


load-testing/src/main/java/google/registry/client/README.md line 58 at r2 (raw file):

    $ mkdir stage
    $ cp -r load-testing/build/distributions/load-testing/ stage/
    $ cp -r jdk-21_linux-x64_bin.tar.gz stage/

You are probably better off storing these files in GoB and writing Gradle tasks that take care of building an executable jar file that can be used as-is. Even better, maybe you want to create docker images instead of providing jars, which eliminates the need to configure Java on your client machines.

All I'm saying here is that if you want to productionize this client, it's worth it to do the initial heavy-lifting to make it as easy to use as possible, without the need to consult the README file and run a bunch of commands from it.


load-testing/src/main/java/google/registry/client/run.sh line 20 at r2 (raw file):

tar -xvf jdk-21_linux-x64_bin.tar.gz

jdk-21.0.2/bin/java -jar load-testing/load-testing.jar --host epp.registry-sandbox.google --certificate certificate.pem -k key.pem -ft

We certainly do not want to leak the endpoints here. Also, having a re-usable shell script that extracts the JDK installation every time certainly looks sub-optimal. You should either just expect Java to exist on the machines (because you have pre-configured the machines), or use a docker image that brings all of its dependencies with it.

Collaborator Author

@sarahcaseybot sarahcaseybot left a comment


Reviewable status: 4 of 11 files reviewed, 13 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)


build.gradle line 307 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Any reason we don't want to do dependency locking for the load-testing project?

Dependency locking did not work when making a jar how I was doing it previously. Now that I am using createUberJar, I was able to add back dependency locking.


config/presubmits.py line 103 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I think we can probably use loggers in the load testing client? There is no need to directly use System.(out|err).

Agreed that we can use loggers for the output of the EPP commands, but I think we need to print to system output for the summary at the end of the load test; otherwise the summary info would be logged separately in the logs of each individual instance.


load-testing/build.gradle line 36 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Not sure why this is necessary? I think if you use createUberJar (mentioned below), you should be able to skip this task.

Yes, now that I figured out how createUberJar works, I was able to remove this.


load-testing/build.gradle line 38 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

You can use the createUberJar function to create tasks that build the uber jar.

Done.


load-testing/src/main/java/google/registry/client/EppClient.java line 343 at r1 (raw file):

Previously, github-advanced-security[bot] wrote…

Unread local variable

Variable 'Future<?> unusedFuture' is never read.


Done.


load-testing/src/main/java/google/registry/client/EppClient.java line 75 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

We probably don't need this annotation (and all the unused return values) anymore. They were there because google3 has a linter that forces you to explicitly deal with return values.

Nope, this was not from Google3. I added this annotation because it is needed for the Gradle build to pass, since we have -Werror specified.


load-testing/src/main/java/google/registry/client/EppClient.java line 183 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Is this function used anywhere?

Also, all the static functions here can be private, right?

Done and removed; this isn't necessary anymore now that I'm reading the key and certificate files from input.


load-testing/src/main/java/google/registry/client/EppClient.java line 279 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

In retrospect, it seems ill-advised to use all available cores for log-writing. We are probably better off using a thread pool here with a modest fixed number of cores here. We want to dedicate as many cores to network IO as possible (which NioEventLoopGroup does if you don't specify the thread pool size, I believe).

Is 5 a good number of threads?


load-testing/src/main/java/google/registry/client/EppClientHandler.java line 82 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I don't recall why I did this. Do you know...? We just set this channel attribute in the line above. It doesn't seem possible for it to be null now. Also if it is null, how can we call setSuccess() on it...?

Hmm, this does look strange. It seems like these are just unnecessary lines; removed.


load-testing/src/main/java/google/registry/client/README.md line 58 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

You are probably better off storing these files in GoB and writing Gradle tasks that take care of building an executable jar file that can be used as-is. Even better, maybe you want to create docker images instead of providing jars, which eliminates the need to configure Java on your client machines.

All I'm saying here is that if you want to productionize this client, it's worth it to do the initial heavy-lifting to make it as easy to use as possible, without the need to consult the README file and run a bunch of commands from it.

Moved some of the files to GoB and created Gradle tasks to run most of these commands. Updated the README with the new instructions on how to run this.


load-testing/src/main/java/google/registry/client/ProxyHeaderHandler.java line 53 at r1 (raw file):

Previously, github-advanced-security[bot] wrote…

Use of default toString()

Default toString(): ProxyHeaderHandler inherits toString() from Object, and so is not suitable for printing.


Done.


load-testing/src/main/java/google/registry/client/ProxyHeaderHandler.java line 32 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I don't think this is necessary any more. Have you tried not installing this handler in the channel pipeline?

Tried removing it, and things still seem to work.


load-testing/src/main/java/google/registry/client/run.sh line 20 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

We certainly do not want to leak the endpoints here. Also, having a re-usable shell script that extracts the JDK installation every time certainly looks sub-optimal. You should either just expect Java to exist on the machines (because you have pre-configured the machines), or use a docker image that brings all of its dependencies with it.

Moved this script to GoB and moved the JDK extraction into a step in the Gradle build, so it only needs to be done once.

Collaborator

@jianglai jianglai left a comment


Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)


build.gradle line 307 at r2 (raw file):

Previously, sarahcaseybot wrote…

Dependency locking did not work when making a jar how I was doing it previously. Now that I am using createUberJar, I was able to add back dependency locking.

You can remove the extra whitespace added and revert this file now.


load-testing/build.gradle line 38 at r3 (raw file):

task makeStagingDirectory {
    new File('stage').mkdirs()

Better to make a staging directory under build, which will be cleaned up by the clean task.


load-testing/build.gradle line 41 at r3 (raw file):

}

task copyJarToStaging(dependsOn: makeStagingDirectory, type: Copy) {

You don't have to create a separate task for each copy; just use the include/exclude/filter patterns:

https://docs.gradle.org/current/dsl/org.gradle.api.tasks.Copy.html
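A hypothetical single Copy task in the style the linked docs describe (the task name and all paths here are illustrative, not the PR's actual build file):

```groovy
// Illustrative only: one Copy task with include patterns, instead of a
// chain of copy tasks. File names and directories are made up.
task copyFilesToStaging(type: Copy) {
    from(layout.buildDirectory.dir('libs')) {
        include '*.jar'
    }
    from(projectDir) {
        include 'run.sh', '*.pem'
    }
    into layout.buildDirectory.dir('stage')
}
```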


load-testing/build.gradle line 63 at r3 (raw file):

task copyJavaToStaging(dependsOn: copyKeyToStaging, type: Copy) {
    from "${rootDir}/load-testing/jdk-21_linux-x64_bin.tar.gz"
    into "${rootDir}/stage"

I still don't understand why you'd want to package your own JDK (at the very least a JRE should suffice).

If you look at the other uber jars that we make, we never package JDKs with them. I think it is a reasonable assumption that the runtime will be provided by the environment. If you really want to make your client platform-independent, make it a Docker image. See buildToolImage.


load-testing/build.gradle line 79 at r3 (raw file):

task deployLoadTest(dependsOn: deployStagingToInstances) {
    doLast {println 'deploying load tests'}

Why the extra task? It doesn't do anything...?


load-testing/src/main/java/google/registry/client/EppClient.java line 343 at r1 (raw file):

Previously, sarahcaseybot wrote…

Done.

Well, that's unfortunate. It looks like GitHub code analysis doesn't understand unusedFuture.... I guess a class-level suppression is the way to go if you want to get rid of all these warnings, or you can just ignore GitHub, as this is not a blocking violation.


load-testing/src/main/java/google/registry/client/EppClient.java line 75 at r2 (raw file):

Previously, sarahcaseybot wrote…

Nope, this was not from Google3. I added this annotation since it is needed for the Gradle build to pass since we have -Werror specified

OK, so it's the same Error Prone rule that checks for this condition. You can just assign the return value to a variable called unusedFuture to signal that it is intentionally unused. That's better than having a class-level suppression.

https://errorprone.info/bugpattern/FutureReturnValueIgnored
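The unusedFuture pattern from the linked bug page can be sketched with a stdlib future (the class and field names here are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class UnusedFutureExample {
  static final AtomicInteger counter = new AtomicInteger();

  static void fireAndForget() {
    // Assigning the future to a variable whose name starts with "unused"
    // tells Error Prone's FutureReturnValueIgnored check that dropping it
    // is intentional, without a class-level @SuppressWarnings.
    CompletableFuture<Integer> unusedFuture =
        CompletableFuture.supplyAsync(counter::incrementAndGet);
    unusedFuture.join(); // joined here only to make this example deterministic
  }
}
```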


load-testing/src/main/java/google/registry/client/EppClient.java line 279 at r2 (raw file):

Previously, sarahcaseybot wrote…

is 5 a good number of threads?

I'd imagine a small number like 5 would suffice as writing logs is mostly disk IO-bound. It also depends on the machines that you run on, as you want to leave enough threads to do the network IO work.


load-testing/src/main/java/google/registry/client/EppClient.java line 287 at r3 (raw file):

              .group(eventLoopGroup)
              .channel(NioSocketChannel.class)
              .handler(makeChannelInitializer(outputFolder, ImmutableList.of(loggingExecutors)));

You still need a list of single thread executors, not a thread pool executor with a certain number of threads. Have you looked at the logs after this change? You should have seen logs from different network activities interwoven with each other.

With a list of single thread executors and each channel getting mapped to the same executor, logging jobs from the same channel submitted to the same executor will be processed in order (as there is only one thread to work).

If you instead use a thread pool, even though network activities are processed in order (and therefore the submission of the corresponding logging jobs is also in order), different threads in the pool might pick up these jobs and write to the same file at the same time.
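The scheme described here can be sketched with stdlib executors; the names are illustrative, and an integer id stands in for the Netty channel that the real client would map to an executor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OrderedLogging {
  // A list of single-thread executors rather than one shared pool: all
  // logging jobs for a given channel hash to the same executor, and that
  // executor's single thread processes them strictly in submission order.
  static List<ExecutorService> makeLoggingExecutors(int n) {
    List<ExecutorService> executors = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      executors.add(Executors.newSingleThreadExecutor());
    }
    return executors;
  }

  // Deterministic channel-to-executor mapping: the same channel always
  // gets the same executor, so its log file has a single writer.
  static ExecutorService executorFor(List<ExecutorService> executors, int channelId) {
    return executors.get(Math.floorMod(channelId, executors.size()));
  }
}
```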


load-testing/src/main/java/google/registry/client/EppClient.java line 316 at r3 (raw file):

            .closeFuture()
            .awaitUninterruptibly(
                Duration.ofSeconds(

What's the point of calling Duration.ofSeconds().getSeconds()?


load-testing/src/main/java/google/registry/client/README.md line 51 at r3 (raw file):

* Run the load test. Configurations of the load test can be made by configuring 
this `run.sh` file locally and re-deploying.

I don't think you have a run.sh any more?


load-testing/src/main/java/google/registry/client/deploy.sh line 16 at r3 (raw file):

# limitations under the License.

HOSTS=$(gcloud compute instances list | awk '/^loadtest/ { print $5 }')

The file should be under load-testing.


load-testing/src/main/java/google/registry/client/deploy.sh line 17 at r3 (raw file):

HOSTS=$(gcloud compute instances list | awk '/^loadtest/ { print $5 }')
for host in $HOSTS; do ssh $host sudo apt-get -y install rsync; done

If you know you are going to use a set number of VMs, it is better to have a separate script to set them up once (installing the JDK, installing rsync, etc.), and leave this script to just do rsync.

Collaborator

@jianglai jianglai left a comment


Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)


load-testing/src/main/java/google/registry/client/README.md line 51 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I don't think you have a run.sh any more?

Oh, I see it's in GoB now. You can probably just copy-paste the content and redact the password and hostnames here, so the public README is still useful.

@sarahcaseybot force-pushed the addEppClient branch 2 times, most recently from 2cbd484 to e42f7a9 on May 15, 2024 20:03
Collaborator Author

@sarahcaseybot sarahcaseybot left a comment


Reviewable status: 5 of 15 files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)


load-testing/build.gradle line 38 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

better to make a staging directory under build, which will be cleaned up by the clean task.

Done.


load-testing/build.gradle line 41 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

You don't have to create a task to copy everything, just the include/exclude/filter pattern:

https://docs.gradle.org/current/dsl/org.gradle.api.tasks.Copy.html

For some reason, the include/exclude pattern did not seem to work when I tried it. I was able to combine the copies into a single task, though. I needed to specify a base directory first: https://discuss.gradle.org/t/multiple-copies-in-a-single-copy-task-works-a-bit-weird/23187


load-testing/build.gradle line 63 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I still don't understand why you'd want to package your own JDK (at the very least a JRE should suffice).

If you look at the other uber jars that we make, we never package JDKs with them. I think it is a reasonable assumption that the runtime will be provided by the environment. If you really want to make your client platform-independent, make it a Docker image. See buildToolImage.

I modified this as suggested below to instead have a one-time setup script for the instances that configures Java 21.


load-testing/build.gradle line 79 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Why the extra task? It doesn't do anything...?

I figured this would be a nice "trigger" task to kick off the chain of deployment tasks, but if that's unusual I'll remove it. Removed.


load-testing/src/main/java/google/registry/client/EppClient.java line 343 at r1 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Well, that's unfortunate. It looks like GitHub code analysis doesn't understand unusedFuture.... I guess a class-level suppression is the way to go if you want to get rid of all these warnings, or you can just ignore GitHub, as this is not a blocking violation.

I think I would prefer to keep the class-level suppression to avoid these annoying GitHub warnings in the future.


load-testing/src/main/java/google/registry/client/EppClient.java line 75 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

OK, so it's the same Error Prone rule that checks for this condition. You can just assign the return value to a variable called unusedFuture to signal that it is intentionally unused. It's better than having a class-level suppression.

https://errorprone.info/bugpattern/FutureReturnValueIgnored

Done.


load-testing/src/main/java/google/registry/client/EppClient.java line 279 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I'd imagine a small number like 5 would suffice as writing logs is mostly disk IO-bound. It also depends on the machines that you run on, as you want to leave enough threads to do the network IO work.

Done.


load-testing/src/main/java/google/registry/client/EppClient.java line 287 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

You still need a list of single thread executors, not a thread pool executor with a certain number of threads. Have you looked at the logs after this change? You should have seen logs from different network activities interwoven with each other.

With a list of single thread executors and each channel getting mapped to the same executor, logging jobs from the same channel submitted to the same executor will be processed in order (as there is only one thread to work).

If you instead use a thread pool, even though network activities are processed in order (and therefore the submission of the corresponding logging jobs is also in order), different threads in the pool might pick up these jobs and write to the same file at the same time.

Done.


load-testing/src/main/java/google/registry/client/EppClient.java line 316 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

What's the point of calling Duration.ofSeconds().getSeconds()?

It wasn't calling Duration.ofSeconds().getSeconds(); it was calling Duration.ofSeconds(seconds - duration.getSeconds()). However, I changed it to multiply TIMEOUT_SECONDS by 1000 to make it less confusing.
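The simplification can be sketched as follows (TIMEOUT_SECONDS, both method names, and the "before" expression are illustrative rather than the client's exact code):

```java
import java.time.Duration;

class TimeoutMath {
  static final long TIMEOUT_SECONDS = 30; // illustrative value

  // Before: round-tripping through Duration just to recover seconds.
  static long confusingMillis() {
    return Duration.ofSeconds(TIMEOUT_SECONDS).getSeconds() * 1000;
  }

  // After: plain multiplication, as described above.
  static long clearMillis() {
    return TIMEOUT_SECONDS * 1000;
  }
}
```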


load-testing/src/main/java/google/registry/client/README.md line 51 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Oh, I see it's in GoB now, you can probably just copy paste the content and redact the password and hostnames here, so the public README is still useful.

Done.


build.gradle line 307 at r2 (raw file):

Previously, jianglai (Lai Jiang) wrote…

You can remove the extra whitespace added and revert this file now.

Done.


load-testing/src/main/java/google/registry/client/deploy.sh line 16 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

The file should be under load-testing.

Done.


load-testing/src/main/java/google/registry/client/deploy.sh line 17 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

If you know you are going to use a set number of VMs, it is better to have a separate script to set them up once (installing the JDK, installing rsync, etc.), and leave this script to just do rsync.

Done.

Collaborator

@jianglai jianglai left a comment


Reviewed 10 of 10 files at r4, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)


load-testing/build.gradle line 38 at r3 (raw file):

Previously, sarahcaseybot wrote…

Done.

I'd use a Gradle variable like layout.buildDirectory instead of hardcoding build.

https://docs.gradle.org/current/userguide/working_with_files.html
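A sketch of the suggested change, assuming the staging task from this PR (the task and directory names are illustrative):

```groovy
// Illustrative only: resolve the staging dir via layout.buildDirectory
// instead of hardcoding 'build', so it follows the build dir if moved
// and is cleaned up by the clean task.
task makeStagingDirectory {
    def stageDir = layout.buildDirectory.dir('stage')
    doLast {
        stageDir.get().asFile.mkdirs()
    }
}
```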


load-testing/build.gradle line 41 at r3 (raw file):

Previously, sarahcaseybot wrote…

For some reason, the include/exclude pattern did not seem to work when I tried it. I was able to combine the copies into a single task, though. I needed to specify a base directory first: https://discuss.gradle.org/t/multiple-copies-in-a-single-copy-task-works-a-bit-weird/23187

I think you can just do ${projectDir} instead of ${rootDir}/load-testing, or use the layout class.

https://cs.opensource.google/search?q=projectdir&sq=&ss=nomulus%2Fnomulus

By hardcoding the directory name, you are making it harder to change it in the future, if you ever want to.


load-testing/build.gradle line 54 at r4 (raw file):

task deployLoadTestsToInstances (dependsOn: copyFilesToStaging, type: Exec) {
    executable "sh"
    workingDir "${rootDir}/"

The working dir can be the project dir, and you can just call ./deploy.sh from it.


load-testing/instanceSetUp.sh line 27 at r4 (raw file):

  do
    for i in {1..60}; do
      if ssh $host sudo apt-get -y install rsync; then

For newly created VMs, you need to run apt-get update && apt-get upgrade first to make sure your apt database and the installed software are up-to-date, before calling apt-get install.

Also, I'd do everything in one SSH session, instead of connecting every time to execute one command. Something like:

ssh $host "sudo apt-get -y update && sudo apt-get -y upgrade && sudo apt-get -y install rsync && mkdir -p test-client"

Is it common to have to retry? What's the reason for the transient failure?


load-testing/src/main/java/google/registry/client/EppClient.java line 316 at r3 (raw file):

Previously, sarahcaseybot wrote…

It wasn't calling Duration.ofSecond().getSeconds(), it was calling Duration.ofSeconds(seconds - durution.getSeconds()). However, I changed it to multiply the TIMEOUT_SECONDS by 1000 to make it less confusing.

Oh, duh.


load-testing/src/main/java/google/registry/client/README.md line 8 at r4 (raw file):

### Setting up the test instances
* Download the Java 21 JDK jdk-21_linux-x64_bin.tar.gz (note that if this version 
is no longer available, you'll need to modify the instanceSetUp.sh script (it 

Just use apt-get to install OpenJDK 21. If the instance OS is reasonably up-to-date, there is no reason why this won't work.


load-testing/src/main/java/google/registry/client/README.md line 61 at r4 (raw file):

    ```shell
    $ HOSTS=$(gcloud compute instances list | awk '/^loadtest/ { print $5 }')
    $ for host in $HOSTS; do ssh $host test-client/run.sh; done

Since all of your scripts are operating in batch mode (connecting to all the VMs and repeating the same operations on each of them), why is run.sh only operating on a single machine? Why not put the loop in run.sh as well?

Collaborator Author

@sarahcaseybot sarahcaseybot left a comment


Reviewable status: 10 of 15 files reviewed, 14 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)


load-testing/build.gradle line 38 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I'd use a Gradle variable like layout.buildDirectory instead of hardcoding build.

https://docs.gradle.org/current/userguide/working_with_files.html

Done.


load-testing/build.gradle line 41 at r3 (raw file):

Previously, jianglai (Lai Jiang) wrote…

I think you can just do {projectDir}, instead of ${rootDir}/load-testing, or use the layout class.

https://cs.opensource.google/search?q=projectdir&sq=&ss=nomulus%2Fnomulus

By hardcoding the directory name, you are making it harder to change it in the future, if you ever want to.

Done.


load-testing/instanceSetUp.sh line 27 at r4 (raw file):

Previously, jianglai (Lai Jiang) wrote…

For newly created VMs, you need to run apt-get update && apt-get upgrade first to make sure your apt database and the installed software are up-to-date, before calling apt-get install.

Also, I'd do everything in one SSH session, instead of connecting every time to execute one command. Something like:

ssh $host "sudo apt-get -y update && sudo apt-get -y upgrade && sudo apt-get -y install rsync && mkdir -p test-client"

Is it common to have to retry? What's the reason for the transient failure?

The retry was necessary since the instances created on line 18 take a while, so the ssh was failing when the instances were not done being created. Just adding an && to line 18 was still causing failures, since I guess the command returns success before the instances are actually done being created.
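The retry pattern being discussed can be sketched as a generic shell loop; flaky_command here is a stand-in for the real ssh/apt-get call, rigged for illustration to fail twice before succeeding:

```shell
#!/bin/bash
# Generic retry loop: attempt a command up to $1 times, stopping at the
# first success. In the real script the command is the ssh/apt-get call;
# flaky_command below is a stand-in that fails on its first two attempts.
attempts_used=0
flaky_command() {
  attempts_used=$((attempts_used + 1))
  [ "$attempts_used" -ge 3 ]
}

retry() {
  local max=$1
  local cmd=$2
  for i in $(seq 1 "$max"); do
    if "$cmd"; then
      echo "succeeded after $i attempt(s)"
      return 0
    fi
    sleep 0  # in the real script, a longer pause between attempts
  done
  echo "giving up after $max attempts"
  return 1
}

retry 60 flaky_command
```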


load-testing/src/main/java/google/registry/client/README.md line 8 at r4 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Just use apt-get to install OpenJDK 21. If the instance OS is reasonably up-to-date, there is no reason why this won't work.

apt-get install does not seem to work; error message: 'Package "openjdk-21-jdk" is not available, but is referred to by another package.' I used wget instead to download from the web and install.


load-testing/src/main/java/google/registry/client/README.md line 61 at r4 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Since all of your scripts are operating in batch mode (connecting to all the VMs and repeating the same operations on each of them), why is run.sh only operating on a single machine? Why not put the loop in run.sh as well?

Done.

Collaborator

@jianglai jianglai left a comment


Reviewed 5 of 5 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)


load-testing/instanceSetUp.sh line 27 at r4 (raw file):

Previously, sarahcaseybot wrote…

The retry was necessary since the instances created on line 18 take a while, so the ssh was failing when the instances were not done being created. Just adding an && to line 18 was still causing failures, since I guess the command returns success before the instances are actually done being created.

That's too bad. I guess it's not too big of a deal, given that these scripts are only run once.

Perhaps you can sleep for a while before trying to ssh, if you know it's going to take some time for the machines to be provisioned after the gcloud command returns anyway?


load-testing/instanceSetUp.sh line 27 at r5 (raw file):

  do
    for i in {1..60}; do
      if ssh $host 'sudo apt-get -y update &&

Does SSH work for you directly? I tried to create a test VM, but I couldn't connect to its external IP directly. I had to use gcloud compute ssh to connect.


load-testing/src/main/java/google/registry/client/README.md line 8 at r4 (raw file):

Previously, sarahcaseybot wrote…

apt-get install does not seem to work, error message: 'Package "openjdk-21-jdk" is not available, but is referred to by another package." Used Wget instead to download from the web and install.

That's because the default OS image is Debian, and JDK 21 is still in Debian testing. If you use Ubuntu, you can install openjdk-21-jdk.

gcloud compute instances create test-ubuntu --machine-type n4-standard-16 --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud --zone us-east4-a

Also, I'd recommend using a higher-specced machine. g1-small is really slow and running apt-get takes forever. For load testing, you don't want to be limited by your client.

Collaborator Author

@sarahcaseybot sarahcaseybot left a comment


Reviewable status: 11 of 15 files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)


load-testing/instanceSetUp.sh line 27 at r4 (raw file):

Previously, jianglai (Lai Jiang) wrote…

That's too bad. I guess it's not too big of a deal, given that these scripts are only run once.

Perhaps you can sleep for a while before trying to ssh, if you know it's going to take some time for the machines to be provisioned after the gcloud command returns anyway?

I don't think it's a good idea to rely on sleep alone, since how long the instances take to be created can vary. I think retry is the best option here. Added a comment explaining why the retry is needed.


load-testing/instanceSetUp.sh line 27 at r5 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Does SSH work for you directly? I tried to create a test VM, but I couldn't connect to its external IP directly. I had to use gcloud compute ssh to connect.

ssh works fine for me


load-testing/src/main/java/google/registry/client/README.md line 8 at r4 (raw file):

Previously, jianglai (Lai Jiang) wrote…

That's because the default OS image is Debian, and JDK 21 is still in Debian testing. If you use Ubuntu, you can install openjdk-21-jdk.

gcloud compute instances create test-ubuntu --machine-type n4-standard-16 --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud --zone us-east4-a

Also, I'd recommend using a higher-speced machine. g1-small is really slow and running apt-get takes forever. For load testing you'd want to not be limited by your client.

When I tried changing the machine type, I was hit with this error saying I had exceeded the CPU limit.

ERROR: (gcloud.compute.instances.create) Could not fetch resource:

  • Quota 'CPUS_PER_VM_FAMILY' exceeded. Limit: 24.0 in region us-east4.
    metric name = compute.googleapis.com/cpus_per_vm_family
    limit name = CPUS-PER-VM-FAMILY-per-project-region
    limit = 24.0
    dimensions = region: us-east4
    vm_family: N4
    Try your request in another zone, or view documentation on how to increase quotas: https://cloud.google.com/compute/quotas.

The most pertinent issues we have been experiencing in production show a QPS of only about 30 requests/second, so hopefully for now we can get enough information from these smaller instances. Later on, it looks like there may be a process for increasing our quota. Also, I was able to successfully create one VM but it failed on the second, so perhaps I could create multiple VMs, each in a different region.

Copy link
Collaborator

@jianglai jianglai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r6, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @sarahcaseybot)


load-testing/instanceSetUp.sh line 27 at r4 (raw file):

Previously, sarahcaseybot wrote…

I don't think it's a good idea to rely on sleep alone, since how long the instances take to be created can vary. I think retry is the best option here. Added a comment explaining why the retry is needed.

Oh, I meant to sleep before trying ssh, in addition to the retry. If you know it's going to take some time for the machines to be ready, there's no point trying it right away.


load-testing/instanceSetUp.sh line 27 at r5 (raw file):

Previously, sarahcaseybot wrote…

ssh works fine for me

It's likely that you have something configured already. There's no reason why it would work directly without configuring the keys or other authentication mechanisms. Can you try on a different machine, perhaps your cloudtop?


load-testing/src/main/java/google/registry/client/README.md line 8 at r4 (raw file):

Previously, sarahcaseybot wrote…

When I tried changing the machine type, I was hit with this error saying I had exceeded the CPU limit.

ERROR: (gcloud.compute.instances.create) Could not fetch resource:

  • Quota 'CPUS_PER_VM_FAMILY' exceeded. Limit: 24.0 in region us-east4.
    metric name = compute.googleapis.com/cpus_per_vm_family
    limit name = CPUS-PER-VM-FAMILY-per-project-region
    limit = 24.0
    dimensions = region: us-east4
    vm_family: N4
    Try your request in another zone, or view documentation on how to increase quotas: https://cloud.google.com/compute/quotas.

The most pertinent issues we have been experiencing in production show a QPS of only about 30 requests/second, so hopefully for now we can get enough information from these smaller instances. Later on, it looks like there may be a process for increasing our quota. Also, I was able to successfully create one VM but it failed on the second, so perhaps I could create multiple VMs, each in a different region.

These annoying quotas are per-zone, not per machine. So it's in your interest to get machines with better CPUs that have more processing power per core. But I agree that 30 QPS is not a tall order.

The process to request a quota increase is quite opaque. You fill in a form and it either gets auto approved or rejected. In order to appeal you have to contact your "sales representative", which we don't have. I think your best bet is to use more zones, should you need more power than one zone can provide.
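Since the quota is enforced per zone/region rather than per machine, one way to work around it is to spread the client VMs across zones. A hypothetical sketch of that approach follows; the instance name prefix, machine type, image, and zone list are all assumptions, not values from this PR:

```shell
#!/bin/bash
# Build the creation command for one load-test client VM in a given zone.
# Echoing the command makes it easy to inspect; drop the echo (or pipe the
# output to sh) to actually create the instances.
build_create_cmd() {
  local zone="$1" index="$2"
  echo "gcloud compute instances create load-test-client-${index}" \
    "--machine-type n4-standard-16" \
    "--image-family ubuntu-2204-lts --image-project ubuntu-os-cloud" \
    "--zone ${zone}"
}

zones=(us-east4-a us-central1-a us-west1-a)  # assumed zone list
for i in "${!zones[@]}"; do
  build_create_cmd "${zones[$i]}" "$i"
done
```

Each zone draws from its own quota pool, so three zones would roughly triple the available vCPUs without any quota-increase request.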

Copy link
Collaborator Author

@sarahcaseybot sarahcaseybot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 15 files reviewed, 5 unresolved discussions (waiting on @Github-advanced-security[bot] and @jianglai)


load-testing/instanceSetUp.sh line 27 at r4 (raw file):

Previously, jianglai (Lai Jiang) wrote…

Oh, I meant to sleep before trying ssh, in addition to the retry. If you know it's going to take some time for the machines to be ready, there's no point trying it right away.

Done.


load-testing/instanceSetUp.sh line 27 at r5 (raw file):

Previously, jianglai (Lai Jiang) wrote…

It's likely that you have something configured already. There's no reason why it would work directly without configuring the keys or other authentication mechanisms. Can you try on a different machine, perhaps your cloudtop?

Oh, now I remember, I needed to add my public key here: https://pantheon.corp.google.com/compute/metadata?resourceTab=sshkeys&project=domain-registry-sandbox&scopeTab=projectMetadata

added this to the readme
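For reference, each project-metadata ssh-keys entry has the form `USERNAME:PUBLIC_KEY`. A sketch of building such an entry from a key file and uploading it with gcloud; the username and file paths are assumptions:

```shell
#!/bin/bash
# Format one project-metadata ssh-keys entry as "USERNAME:KEY".
format_ssh_key_entry() {
  local username="$1" pubkey_file="$2"
  printf '%s:%s\n' "$username" "$(cat "$pubkey_file")"
}

# Example usage (values are assumptions):
#   format_ssh_key_entry alice ~/.ssh/id_rsa.pub > keys.txt
#   gcloud compute project-info add-metadata \
#       --metadata-from-file ssh-keys=keys.txt
```

This is equivalent to pasting the key into the SSH keys tab of the project metadata page in the Cloud console, but scriptable.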

@sarahcaseybot sarahcaseybot added this pull request to the merge queue May 23, 2024
Merged via the queue into google:master with commit 0781010 May 23, 2024
8 of 9 checks passed
@sarahcaseybot sarahcaseybot deleted the addEppClient branch May 23, 2024 22:27