Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The estRows is inaccurate when there is data skew. #53178

Open
yedushusheng opened this issue May 10, 2024 · 2 comments
Open

The estRows is inaccurate when there is data skew. #53178

yedushusheng opened this issue May 10, 2024 · 2 comments

Comments

@yedushusheng
Copy link

yedushusheng commented May 10, 2024

Bug Report

1. Minimal reproduce step (Required)

  1. create table like this:
    CREATE TABLE tb (
    id int(11) NOT NULL,
    f1 int(11) DEFAULT NULL,
    f2 int(11) DEFAULT NULL,
    f3 int(11) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY idx_f1 (f1) ,
    KEY idx_f2 (f2)
    ) partition by hash(id)
    (partition p0,
    partition p1,
    partition p2);
  2. insert data:
    step 1: insert one million records by the following script:
#!/bin/bash    
HOST="127.0.0.1"    
PORT="xxxx"    
USER="xxxx"    
DB="test"    
TABLE="tb"    
    
TOTAL_ROWS=1000000  
     
COUNTER=0    
  
BATCH_SIZE=10000    
  
generate_data() {    
    local sql="INSERT INTO $TABLE (id, f1, f2, f3) VALUES "    
    for i in $(seq 1 $BATCH_SIZE); do    
        local id=$((COUNTER + i))    
        if ((id > TOTAL_ROWS)); then   
            break  
        fi  
        local f1=$((RANDOM % 1000))    
        local f2=$((RANDOM % 1000))    
        local f3=$((RANDOM % 1000))    
      
        if [[ $i -ne 1 ]]; then    
            sql+=","    
        fi    
        sql+="($id, $f1, $f2, $f3)"    
    done
}    
    
while ((COUNTER < TOTAL_ROWS)); do    
    DATA=$(generate_data)    
    
    COUNTER=$((COUNTER + BATCH_SIZE))    
 
    sleep 1
done 

step 2: Insert the following data to construct a data skew
insert into tb (id,f1,f2,f3) values(10000001,10000,102,10000),(20000001,20000,103,20000);
3. Execute analyze table
step 1: analyze table tb;
step 2: analyze table tb update histogram on id,f1,f2;
4. Execute Query
image
The estRows above is incorrect.

However, what's puzzling is that if we insert 1000 records[Still using the previous script, modifyvariable TOTAL_ROWS to 1000], you will get the correct estRows:
image

2. What did you expect to see?

image

3. What did you see instead

image

4. What is your TiDB version?

pd instance:v8.0.0
tikv instance:v8.0.0
tidb instance:v8.0.0
tiflash instance:v8.0.0

@qw4990
Copy link
Contributor

qw4990 commented May 16, 2024

Doesn't affect the query result, degrade this from Major to Moderate.

@hawkingrei
Copy link
Member

it may be related to #52421.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants