Slow performance when creating new records #432

Open · fdocr opened this issue Feb 24, 2023 · 7 comments

fdocr commented Feb 24, 2023

I've been running some tests where I insert new records of a Turn model from an endpoint, and I'm seeing degraded performance. I'm opening this issue in the hope that there's a way to work around it, or an opportunity to identify and fix the problem in the shard. I implemented the same DB logic with amberframework/granite to compare, and performance is much better there. I prefer Jennifer overall for a couple of reasons, so it would be great to find a solution to this issue.

class Turn < Jennifer::Model::Base
  with_timestamps

  mapping(
    id: Primary64,
    game_id: String,
    snake_id: String,
    context: String,
    path: String,
    dead: Bool,
    created_at: Time?,
    updated_at: Time?,
  )
end

I added OpenTelemetry tracing to the slow endpoint and it looks like this:

post "/move" do |env|
  OpenTelemetry.trace "move-trace" do |span|
    span["debug-attr"] = "move-trace"
    context = BattleSnake::Context.from_json(env.params.json.to_json)
  
    OpenTelemetry.trace "context persist" do |span|
      span["debug-attr"] = "persist"
      Turn.create(
        game_id: context.game.id,
        snake_id: context.you.id,
        context: env.params.json.to_json,
        path: env.request.path,
        dead: false
      )
    end

    OpenTelemetry.trace "context persist" do |span|
      span["debug-attr"] = "move-logic"

      res = CUSTOM_LOGIC_CALL
      res.to_json
    end
  end
end

The records are being persisted correctly, and I'm seeing this DEBUG log from the database:

2023-02-24T18:50:33.310554Z DEBUG - db: 747.958 μs INSERT INTO "turns"("game_id", "snake_id", "context", "path", "dead", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6, $7)  RETURNING "id" | ["86494598-a6ee-4317-933d-3fb81e2f4cbc", "5d58224d-cc49-4ba9-b65f-cfec2edb32f4", "small", "/end", false, 2023-02-24 18:50:33.309634000 UTC, 2023-02-24 18:50:33.309634000 UTC]
2023-02-24T18:50:33.310561Z DEBUG - db: 0.458 μs COMMIT

This tells me the DB round trip is quick (under 1 ms), but response times are coming back at around 1,000 ms. With tracing I can see these results:

[Screenshot: OpenTelemetry trace results for the /move endpoint]

A simple endpoint with a straightforward query like Turn.all.to_a is much quicker (~6 ms responses with ~2.5 ms DB log times). Based on all of this, my assumption is that there's some overhead in creating the model. I'd love to help identify this further, but I'd need help understanding how to go about it (I'm open to pairing or running more/different tests if needed).
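
For what it's worth, here's the kind of micro-benchmark I can run next to isolate the model overhead from the HTTP stack. It's only a sketch using the standard library's Benchmark module; the field values are placeholders and it will write junk rows to the database.

require "benchmark"

# Compare a single-row insert against the simple read that already
# performs well, outside of any HTTP handling.
Benchmark.ips do |x|
  x.report("Turn.create") do
    Turn.create(
      game_id: "bench-game",
      snake_id: "bench-snake",
      context: "{}",
      path: "/move",
      dead: false
    )
  end

  x.report("Turn.all.to_a") { Turn.all.to_a }
end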

crimson-knight (Contributor) commented:

I find the debug logger imposes a considerable performance hit; first try turning it down to warning or error level and re-run.

The biggest performance hits in cases like this usually come from generating log output and writing it to a console. The output is handy, but it carries a performance cost similar to what you're seeing.
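
If it helps, something like this near the top of the app should quiet the query logging while you re-test. I'm assuming the queries are logged under the "db" source (that's what the DEBUG lines above show); adjust the pattern if yours differs.

require "log"

# Keep everything else at INFO, but only surface WARN and above for the
# database driver's query logging.
Log.setup do |c|
  backend = Log::IOBackend.new
  c.bind "*", :info, backend
  c.bind "db.*", :warn, backend
end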

I just ran a test with Jennifer and Kemal where I created 2,000,000 DB entries, exported them to a file, and saved it to disk; that took under 36 seconds on my M1 Pro.


fdocr commented Feb 25, 2023

Confirmed the log output wasn't causing the issue. More of my tracing also confirms that when the steps are separated (Turn.create vs. .build followed by .save), it's .save, not .build, that takes a long time.

It turns out this is a connection pool size issue. I found a comment on Jennifer::Adapter::Base that could explain what I'm experiencing. The tests I'm running are quick requests sent by a CLI tool, commonly 2 in parallel at a time. Adding conf.pool_size = (ENV["DB_POOL"] ||= "5").to_i to the database configuration solved my problem.
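
For reference, the relevant part of my configuration now looks roughly like this (the DB_POOL env var name is just what I happened to pick):

Jennifer::Config.configure do |conf|
  # adapter/credentials settings omitted
  # The default pool size of 1 serializes concurrent requests, so make it
  # configurable per environment and default to 5.
  conf.pool_size = (ENV["DB_POOL"] ||= "5").to_i
end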

Since I also had a Granite implementation, here's the performance comparison between the two on a similar test.

Jennifer

[Screenshot: Jennifer trace results]

Granite

[Screenshot: Granite trace results]

Turns out Jennifer performs better than Granite 😄

I wonder if it's worth changing the default connection pool size from 1 to something like ActiveRecord's default of 5 (I believe that's the default, but it might be different). Regardless, I think it's important to always configure it via an ENV var anyway, so it stays tunable for web servers in production environments.

crimson-knight (Contributor) commented:

I'll go ahead and make some updates to the docs to clarify the connection pool. I've also run into some challenges with the config initializer and ensuring it uses the test DB when running crystal spec, so I think I'll add the niceties I've come up with for that as well.
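
Roughly along these lines (the env var and database names here are just examples, not the exact snippet that will land in the docs):

Jennifer::Config.configure do |conf|
  conf.adapter = "postgres"
  # Point `crystal spec` at a separate database so specs never touch dev data.
  env = ENV["CRYSTAL_ENV"]? || "development"
  conf.db = env == "test" ? "myapp_test" : "myapp_development"
  conf.pool_size = (ENV["DB_POOL"]? || "5").to_i
end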

crimson-knight (Contributor) commented:

@fdocr I made some changes to the docs to help others avoid the performance bottleneck you experienced. Check out #430


fdocr commented Mar 5, 2023

That's awesome. Thank you @crimson-knight

imdrasil (Owner) commented:

OK, it sounds like technically you no longer have a performance issue, but the docs should be updated to make it clear that you need to configure the connection pool size (@crimson-knight, thank you for this).
Initially the connection pool size was changed to 1 to make it easier to debug potential issues people run into and to make things more reliable (previously I had some issues with concurrent requests). Maybe the time has come and it's worth increasing it to something bigger (like 5). What do you think @crimson-knight @cyangle @fdocr?

crimson-knight (Contributor) commented:

Let's change the default pool size to 5; that makes sense to me.

Let's also add a note saying that if someone needs to debug potential concurrent-request problems, they can adjust the pool size.
