is there any way to speed up deletion #544

Open
fenchu opened this issue Oct 18, 2023 · 1 comment
fenchu commented Oct 18, 2023

My TinyDB JSON file grows by 5 GB per week if I do not delete anything.

Currently we just load the TinyDB data.json and delete all internal ids below a given threshold.

But the major problem is that we need to close the TinyDB handle to do this, and that does not work well in a multiprocessing asyncio FastAPI app.

I would like to keep at most 1000 entries in the table and delete everything below the 1000 highest.

Any guidelines on how to do this while keeping the app running would be great.

A suggestion I got was to add a timestamp (epoch) to each entry and delete any entries whose timestamps are below the 1000 highest, but that bloats the table and adds extra logic.
Thanks
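
One way to get "keep the 1000 highest" without an extra timestamp field might be to lean on TinyDB's own doc_ids, which increase with insertion order. This is only a minimal sketch, not something from the issue, assuming TinyDB 4.x where Table.remove() accepts a doc_ids argument; the function name and 'data.json' path are placeholders:

# A sketch: rely on TinyDB's incrementing doc_ids instead of a timestamp field.
from tinydb import TinyDB

def prune_to_newest(db: TinyDB, maxlen: int = 1000) -> list:
    """Remove every document except the maxlen ones with the highest doc_ids."""
    doc_ids = sorted(doc.doc_id for doc in db.all())
    stale = doc_ids[:-maxlen]            # everything below the maxlen highest
    if not stale:
        return []
    return db.remove(doc_ids=stale)      # one batched call instead of one query per document

db = TinyDB('data.json')                 # placeholder path
removed = prune_to_newest(db, maxlen=1000)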

fenchu changed the title from "Feature request: get all the internalids and delete by internalid" to "is there any way to speed up deletion" on Oct 18, 2023

fenchu commented Oct 18, 2023

This can be obtained using db.max(), but it is slow

from typing import List, Optional
from tinydb import TinyDB, where

# db, db_path and log are module-level globals in the app
def keep_newest(key: str = 'jobid', maxlen: int = 1000) -> Optional[List]:
    """ keep the newest maxlen entries in database """
    global db
    if not db:
        db = TinyDB(db_path)
    currlen = len(db.all())
    if currlen <= maxlen:
        #log.warning(f"database size is:{currlen} which is less than {maxlen} - no deletion")
        return None
    ids = []
    # note: the "+ 1" leaves maxlen - 1 entries behind (999 in the run below)
    for d in db.all()[:currlen - maxlen + 1]:
        removed = db.remove(where(key) == d[key])   # one full query and file rewrite per document
        if removed:
            ids.append(removed)
        #log.info(f"removed {d} with index {removed}")
    return ids

number of entries in database: 10000
number of entries in database: 999
deleting 9001 took 450.83sec
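
Most of that time is likely the default JSONStorage rewriting the whole file on every remove() call, plus a full table scan for each where() query. A sketch, not from the issue, of wrapping the storage in CachingMiddleware so the writes stay in memory and hit disk once:

# A sketch: buffer writes in memory so 9000 remove() calls do not each rewrite the file on disk.
from tinydb import TinyDB
from tinydb.storages import JSONStorage
from tinydb.middlewares import CachingMiddleware

db = TinyDB('data.json', storage=CachingMiddleware(JSONStorage))   # placeholder path

# ... run the deletion loop from keep_newest() here ...

db.storage.flush()   # push the cached table back to disk in a single write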

The direct JSON version is way faster, roughly 1875 times faster:

import json
from typing import List, Optional

# log is a logging.Logger global in the app
def keep_newest_json(fname: str, maxlen: int = 1000, table: str = '_default') -> Optional[List]:
    """ keep the newest maxlen entries in database by rewriting the JSON file directly """
    dat = None
    with open(fname, 'r', encoding='utf8') as FR:
        dat = json.load(FR)
    if table not in dat:
        log.fatal(f"table:{table} not found in dat:{list(dat.keys())}")
        return None
    currlen = len(dat[table].keys())
    if currlen <= maxlen:
        log.info(f"table:{table} has {currlen} entries, less than maxlen:{maxlen}")
        return None
    ids = []
    # note: the "+ 1" leaves maxlen - 1 entries behind (999 in the run below)
    for id in list(dat[table].keys())[:currlen - maxlen + 1]:
        del dat[table][id]
        ids.append(id)
        #log.info(f"removed index {id} from {table}")
    with open(fname, 'w', encoding='utf8') as FW:
        FW.write(json.dumps(dat, indent=2, sort_keys=True))
    return ids

number of entries in database: 10000
number of entries in database: 999
deleting 9001 took 0.24sec
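
If the goal is to avoid closing the handle while the FastAPI app is running, another hedged option is to rewrite the table through the open TinyDB object itself. A sketch assuming TinyDB 4.x (where truncate() and insert_multiple() exist); note that doc_ids are reassigned:

# A sketch: drop everything except the newest maxlen documents without closing the db handle.
from tinydb import TinyDB

def compact_keep_newest(db: TinyDB, maxlen: int = 1000) -> int:
    """Keep only the newest maxlen documents; return how many were removed."""
    docs = sorted(db.all(), key=lambda d: d.doc_id)      # oldest first
    if len(docs) <= maxlen:
        return 0
    newest = [dict(d) for d in docs[-maxlen:]]           # plain dicts, doc_ids get reassigned
    db.truncate()                                        # clear the table in one write
    db.insert_multiple(newest)                           # re-insert the newest entries
    return len(docs) - maxlen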
