feat(Prover CLI): requeue cmd #1719

Merged May 14, 2024 (49 commits)

Commits
ab2a88d
Add requeue queries
Apr 18, 2024
aa85977
Add requeue cmd implementation
Apr 18, 2024
477e0a2
Support requeue cmd
Apr 18, 2024
c67626d
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri Apr 23, 2024
ea3a97f
Reimplement requeue command logic
ilitteri Apr 25, 2024
c763b62
Add queries for requeuing batches
ilitteri Apr 25, 2024
28b225d
Merge branch 'requeue_witness_generator_jobs' of github.com:matter-la…
ilitteri Apr 25, 2024
49ad8c4
Add missing files
ilitteri Apr 25, 2024
d64522a
Remove unused queries
ilitteri Apr 25, 2024
bf1dd81
Cleanup imports
ilitteri Apr 25, 2024
29ab540
Delete deprecated query
ilitteri Apr 25, 2024
e1ed166
Delete deprecated query
ilitteri Apr 25, 2024
ecd8c59
Delete deprecated query
ilitteri Apr 25, 2024
5b3eb94
Fix query
ilitteri Apr 25, 2024
e7f2438
Merge branch 'requeue_witness_generator_jobs' of github.com:matter-la…
ilitteri Apr 25, 2024
d76dcec
Improve query
ilitteri Apr 25, 2024
53f3871
Fix query
ilitteri Apr 25, 2024
4d510f2
Add requeue batch query in prover_jobs_fri DAL
ilitteri Apr 25, 2024
1aa9af5
Requeue prover jobs
ilitteri Apr 25, 2024
428963d
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri Apr 25, 2024
71beadd
Merge branch 'main' of github.com:matter-labs/zksync-era into requeue…
ilitteri Apr 30, 2024
5e7ad60
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 2, 2024
6d88ba7
Fix merge
ilitteri May 2, 2024
6077487
Merge branch 'requeue_witness_generator_jobs' of github.com:matter-la…
ilitteri May 2, 2024
9afdfb0
Fix typo
ilitteri May 2, 2024
605a2f3
Revert Cargo.lock changes
ilitteri May 2, 2024
b73a9df
Revert Cargo.lock changes
ilitteri May 2, 2024
a4ffb88
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 2, 2024
8aad53d
Handle recursion tip aggregation round
ilitteri May 3, 2024
03f3128
Update queries
ilitteri May 3, 2024
9edad00
Update deps
ilitteri May 3, 2024
167d3fa
Merge branch 'requeue_witness_generator_jobs' of github.com:matter-la…
ilitteri May 3, 2024
ec38dfd
Merge branch 'main' of github.com:matter-labs/zksync-era into requeue…
ilitteri May 7, 2024
014c88d
Add circuit_id to StuckJob struct
ilitteri May 7, 2024
7567e34
Add Cargo.lock
ilitteri May 7, 2024
02a0010
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 8, 2024
595f1a5
Use singleton connection pool
ilitteri May 8, 2024
3a279f0
Merge branch 'main' of github.com:matter-labs/zksync-era into requeue…
ilitteri May 8, 2024
10a076a
Merge branch 'main' into requeue_witness_generator_jobs
ColoCarletti May 9, 2024
7cc5a2b
Restore Cargo.lock
ilitteri May 9, 2024
ba7ecac
Merge branch 'main' into requeue_witness_generator_jobs
ColoCarletti May 10, 2024
10202e5
Fix requeue for stuck scheduler jobs
ilitteri May 10, 2024
71ddbeb
Fix requeue for stuck recursion tip jobs
ilitteri May 10, 2024
8494044
Fix requeue for stuck witness inputs jobs
ilitteri May 10, 2024
fc153c5
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 13, 2024
8bb3223
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 13, 2024
e9e82fc
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 13, 2024
00a7514
Merge branch 'main' of github.com:matter-labs/zksync-era into requeue…
ilitteri May 14, 2024
7c2c02e
Merge branch 'main' into requeue_witness_generator_jobs
ilitteri May 14, 2024
1 change: 1 addition & 0 deletions core/lib/basic_types/src/prover_dal.rs
@@ -51,6 +51,7 @@ pub struct StuckJobs {
    pub id: u64,
    pub status: String,
    pub attempts: u64,
    pub circuit_id: Option<u32>,
}

// TODO (PLA-774): Redundant structure, should be replaced with `std::net::SocketAddr`.
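
To illustrate what the new optional field carries, here is a minimal, std-only sketch. The `StuckJobs` type below is a local mirror of the fields in the hunk above, not the real type from `zksync_types`, and the `describe` helper is hypothetical and not part of this PR.

```rust
/// Local mirror of the `StuckJobs` fields shown in the diff (not the real type).
#[derive(Debug, Clone, PartialEq)]
pub struct StuckJobs {
    pub id: u64,
    pub status: String,
    pub attempts: u64,
    pub circuit_id: Option<u32>,
}

/// Hypothetical helper: renders a job, mentioning the circuit only when it is known.
pub fn describe(job: &StuckJobs) -> String {
    match job.circuit_id {
        Some(circuit_id) => format!(
            "job {} (circuit {}) is {} after {} attempts",
            job.id, circuit_id, job.status, job.attempts
        ),
        None => format!(
            "job {} is {} after {} attempts",
            job.id, job.status, job.attempts
        ),
    }
}
```

Keeping the field as `Option<u32>` lets DALs whose jobs have no circuit, such as the proof compressor DAL changed in this PR, fill in `None`.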
4 changes: 3 additions & 1 deletion prover/prover_cli/src/cli.rs
@@ -1,7 +1,7 @@
use clap::{command, Args, Parser, Subcommand};
use zksync_types::url::SensitiveUrl;

-use crate::commands::{self, delete, get_file_info, restart};
+use crate::commands::{self, delete, get_file_info, requeue, restart};

pub const VERSION_STRING: &str = env!("CARGO_PKG_VERSION");

@@ -32,6 +32,7 @@ enum ProverCommand {
    Delete(delete::Args),
    #[command(subcommand)]
    Status(commands::StatusCommand),
    Requeue(requeue::Args),
    Restart(restart::Args),
}

@@ -41,6 +42,7 @@ pub async fn start() -> anyhow::Result<()> {
        ProverCommand::FileInfo(args) => get_file_info::run(args).await?,
        ProverCommand::Delete(args) => delete::run(args).await?,
        ProverCommand::Status(cmd) => cmd.run(config).await?,
        ProverCommand::Requeue(args) => requeue::run(args, config).await?,
        ProverCommand::Restart(args) => restart::run(args).await?,
    };
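
The new `Requeue` variant takes a required batch number and an optional attempt limit that defaults to 10 (see `requeue::Args` in this PR). As a rough, std-only sketch of that argument contract, the hand-written parser below stands in for what clap derives. `RequeueArgs` and `parse_requeue_args` are hypothetical names, and the flag spellings `-b`, `--batch` and `--max-attempts` assume clap's default naming for `#[clap(short, long)]` and `#[clap(long)]`.

```rust
/// Hypothetical stand-in for the clap-derived `requeue::Args`.
#[derive(Debug, PartialEq)]
pub struct RequeueArgs {
    pub batch: u32,
    pub max_attempts: u32,
}

/// Parses `--batch <N>` (or `-b <N>`) and an optional `--max-attempts <N>`.
pub fn parse_requeue_args(argv: &[&str]) -> Result<RequeueArgs, String> {
    let mut batch = None;
    let mut max_attempts = 10; // mirrors `default_value_t = 10`
    let mut it = argv.iter();
    while let Some(flag) = it.next() {
        // Reject anything that is not one of the three known flags.
        if !matches!(*flag, "--batch" | "-b" | "--max-attempts") {
            return Err(format!("unknown flag {flag}"));
        }
        // Every known flag takes exactly one numeric value.
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        let parsed = value
            .parse::<u32>()
            .map_err(|e| format!("invalid value for {flag}: {e}"))?;
        if *flag == "--max-attempts" {
            max_attempts = parsed;
        } else {
            batch = Some(parsed);
        }
    }
    // The batch number has no default, so it must be supplied.
    let batch = batch.ok_or_else(|| "--batch is required".to_string())?;
    Ok(RequeueArgs { batch, max_attempts })
}
```

Under these assumptions, `parse_requeue_args(&["--batch", "123"])` yields a batch of 123 with the default limit of 10. The real command gets the same behavior from clap's derive macros rather than from manual parsing.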

1 change: 1 addition & 0 deletions prover/prover_cli/src/commands/mod.rs
@@ -1,5 +1,6 @@
pub(crate) mod delete;
pub(crate) mod get_file_info;
pub(crate) mod requeue;
pub(crate) mod restart;
pub(crate) mod status;

87 changes: 87 additions & 0 deletions prover/prover_cli/src/commands/requeue.rs
@@ -0,0 +1,87 @@
use anyhow::Context;
use clap::Args as ClapArgs;
use prover_dal::{ConnectionPool, Prover, ProverDal};
use zksync_types::{basic_fri_types::AggregationRound, prover_dal::StuckJobs, L1BatchNumber};

use crate::cli::ProverCLIConfig;

#[derive(ClapArgs)]
pub struct Args {
    #[clap(short, long)]
    batch: L1BatchNumber,
    /// Maximum number of attempts to re-queue a job.
    /// Default value is 10.
    /// NOTE: this argument is temporary and will be deprecated once the `config` command is implemented.
    #[clap(long, default_value_t = 10)]
    max_attempts: u32,
}

pub async fn run(args: Args, config: ProverCLIConfig) -> anyhow::Result<()> {
    let pool = ConnectionPool::<Prover>::singleton(config.db_url)
        .build()
        .await
        .context("failed to build a prover_connection_pool")?;

    let mut conn = pool
        .connection()
        .await
        .context("failed to acquire a connection")?;

    let mut fri_witness_generator_dal = conn.fri_witness_generator_dal();

    let stuck_witness_input_jobs = fri_witness_generator_dal
        .requeue_stuck_witness_inputs_jobs_for_batch(args.batch, args.max_attempts)
        .await;
    display_requeued_stuck_jobs(stuck_witness_input_jobs, AggregationRound::BasicCircuits);

    let stuck_leaf_aggregations_stuck_jobs = fri_witness_generator_dal
        .requeue_stuck_leaf_aggregation_jobs_for_batch(args.batch, args.max_attempts)
        .await;
    display_requeued_stuck_jobs(
        stuck_leaf_aggregations_stuck_jobs,
        AggregationRound::LeafAggregation,
    );

    let stuck_node_aggregations_jobs = fri_witness_generator_dal
        .requeue_stuck_node_aggregation_jobs_for_batch(args.batch, args.max_attempts)
        .await;
    display_requeued_stuck_jobs(
        stuck_node_aggregations_jobs,
        AggregationRound::NodeAggregation,
    );

    let stuck_recursion_tip_job = fri_witness_generator_dal
        .requeue_stuck_recursion_tip_jobs_for_batch(args.batch, args.max_attempts)
        .await;
    display_requeued_stuck_jobs(stuck_recursion_tip_job, AggregationRound::RecursionTip);

    let stuck_scheduler_jobs = fri_witness_generator_dal
        .requeue_stuck_scheduler_jobs_for_batch(args.batch, args.max_attempts)
        .await;
    display_requeued_stuck_jobs(stuck_scheduler_jobs, AggregationRound::Scheduler);

    let stuck_proof_compressor_jobs = conn
        .fri_proof_compressor_dal()
        .requeue_stuck_jobs_for_batch(args.batch, args.max_attempts)
        .await;
    for stuck_job in stuck_proof_compressor_jobs {
        println!("Re-queuing proof compressor job {stuck_job:?} 🔁",);
    }

    let stuck_prover_jobs = conn
        .fri_prover_jobs_dal()
        .requeue_stuck_jobs_for_batch(args.batch, args.max_attempts)
        .await;

    for stuck_job in stuck_prover_jobs {
        println!("Re-queuing prover job {stuck_job:?} 🔁",);
    }

    Ok(())
}

fn display_requeued_stuck_jobs(stuck_jobs: Vec<StuckJobs>, aggregation_round: AggregationRound) {
    for stuck_job in stuck_jobs {
        println!("Re-queuing {aggregation_round} stuck job {stuck_job:?} 🔁",);
    }
}
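
The control flow of `run` above is a fixed walk over the witness generator rounds, printing one line per requeued job. The sketch below models that loop without a database. `Round` is a hypothetical stand-in for `AggregationRound` (its real `Display` output may differ), and the `requeue` callback is an assumed placeholder for the per-round DAL calls.

```rust
use std::fmt;

/// Hypothetical stand-in for `AggregationRound`; the real `Display` text may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Round {
    BasicCircuits,
    LeafAggregation,
    NodeAggregation,
    RecursionTip,
    Scheduler,
}

impl fmt::Display for Round {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Reuse the derived `Debug` name as the human-readable label.
        write!(f, "{:?}", self)
    }
}

/// The order in which `run` visits the witness generator rounds.
pub const ROUNDS: [Round; 5] = [
    Round::BasicCircuits,
    Round::LeafAggregation,
    Round::NodeAggregation,
    Round::RecursionTip,
    Round::Scheduler,
];

/// Builds one report line per requeued job id, round by round.
/// `requeue` stands in for the per-round DAL call and returns the affected job ids.
pub fn report_lines(mut requeue: impl FnMut(Round) -> Vec<u64>) -> Vec<String> {
    let mut lines = Vec::new();
    for round in ROUNDS.iter().copied() {
        for job_id in requeue(round) {
            lines.push(format!("Re-queuing {round} stuck job {job_id}"));
        }
    }
    lines
}
```

Returning the lines instead of printing them keeps the sketch easy to check; the PR itself prints directly with `println!`, and handles proof compressor and prover jobs in two further loops after the rounds.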

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

41 changes: 41 additions & 0 deletions prover/prover_dal/src/fri_proof_compressor_dal.rs
@@ -315,6 +315,7 @@ impl FriProofCompressorDal<'_, '_> {
                id: row.l1_batch_number as u64,
                status: row.status,
                attempts: row.attempts as u64,
                circuit_id: None,
            })
            .collect()
        }
@@ -374,4 +375,44 @@ impl FriProofCompressorDal<'_, '_> {
        .execute(self.storage.conn())
        .await
    }

    pub async fn requeue_stuck_jobs_for_batch(
        &mut self,
        block_number: L1BatchNumber,
        max_attempts: u32,
    ) -> Vec<StuckJobs> {
        {
            sqlx::query!(
                r#"
                UPDATE proof_compression_jobs_fri
                SET
                    status = 'queued',
                    error = 'Manually requeued',
                    attempts = 2,
                    updated_at = NOW(),
                    processing_started_at = NOW()
                WHERE
                    l1_batch_number = $1
                    AND attempts >= $2
                    AND (status = 'in_progress' OR status = 'failed')
                RETURNING
                    status,
                    attempts
                "#,
                i64::from(block_number.0),
                max_attempts as i32,
            )
            .fetch_all(self.storage.conn())
            .await
            .unwrap()
            .into_iter()
            .map(|row| StuckJobs {
                id: block_number.0 as u64,
                status: row.status,
                attempts: row.attempts as u64,
                circuit_id: None,
            })
            .collect()
        }
    }
}
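
To make the selection rule in the `UPDATE` above easier to follow, here is a small in-memory model of it. This is only a sketch: `JobRow` and `requeue_stuck_rows` are hypothetical names, the timestamp columns are left out, and nothing here touches a database or sqlx.

```rust
/// Hypothetical in-memory row holding only the columns the query reads or writes.
#[derive(Debug, Clone, PartialEq)]
pub struct JobRow {
    pub l1_batch_number: u32,
    pub status: String,
    pub attempts: u32,
    pub error: Option<String>,
}

/// Applies the same rule as the `UPDATE ... RETURNING` statement to a slice of rows
/// and returns the (status, attempts) pairs as they look after the update.
pub fn requeue_stuck_rows(
    rows: &mut [JobRow],
    batch: u32,
    max_attempts: u32,
) -> Vec<(String, u32)> {
    let mut returned = Vec::new();
    for row in rows.iter_mut() {
        // WHERE clause: same batch, attempt limit reached, and in a stuck status.
        let stuck = row.status == "in_progress" || row.status == "failed";
        if row.l1_batch_number == batch && row.attempts >= max_attempts && stuck {
            // SET clause: back to the queue with a reset attempt counter.
            row.status = "queued".to_string();
            row.error = Some("Manually requeued".to_string());
            row.attempts = 2;
            // RETURNING clause: values are read after the update was applied.
            returned.push((row.status.clone(), row.attempts));
        }
    }
    returned
}
```

One consequence worth noting: in PostgreSQL, a plain `RETURNING` list on an `UPDATE` yields the new row values, so the `StuckJobs` entries built from this query would carry the status `queued` and 2 attempts rather than the state the job was stuck in.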