remove identical item during index merge #6227

Open
caterchong opened this issue May 7, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@caterchong

Is your feature request related to a problem? Please describe

Normally there are no identical index items in a merge set, so mergeset/merge.go does not handle this scenario in (bsm *blockStreamMerger) Merge.

But removing identical items still makes sense for some offline data processing, for example when merging several backups into one.
Removing identical items helps to shrink the size of the index.

Describe the solution you'd like

If the current item equals the next item, just skip it.
The code might look like this:

for bsr.currItemIdx < len(items) {
	item := items[bsr.currItemIdx].Bytes(data)
	itemStr := string(item)
	if compareEveryItem && itemStr > nextItem {
		break
	}

	// if equal to nextItem, skip this item
	if itemStr != nextItem {
		if !bsm.ib.Add(item) {
			// The bsm.ib is full. Flush it to bsw and continue.
			bsm.flushIB(bsw, ph, itemsMerged)
			continue
		}
	}

	bsr.currItemIdx++
}

This change requires one extra string equality comparison per item, and performance can be improved by optimizing the string(item) conversion.
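To illustrate the idea outside of the mergeset package, here is a minimal, self-contained sketch of a k-way merge over sorted byte-slice sources that drops identical items. It is not the VictoriaMetrics implementation: the names cursor, cursorHeap and mergeDedup are hypothetical, and it compares each item with the last emitted item (via bytes.Equal, so no string conversion is needed) instead of with nextItem. Under that assumption, duplicates inside a single source are dropped as well.

```go
package main

import (
	"bytes"
	"container/heap"
	"fmt"
)

// cursor walks one sorted source of items (hypothetical helper type).
type cursor struct {
	items [][]byte
	idx   int
}

func (c *cursor) curr() []byte { return c.items[c.idx] }

// cursorHeap is a min-heap of cursors ordered by their current item.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return bytes.Compare(h[i].curr(), h[j].curr()) < 0 }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeDedup merges already-sorted sources into one sorted slice
// and skips every item that is identical to the last emitted one.
func mergeDedup(sources ...[][]byte) [][]byte {
	h := make(cursorHeap, 0, len(sources))
	for _, s := range sources {
		if len(s) > 0 {
			h = append(h, &cursor{items: s})
		}
	}
	heap.Init(&h)

	var out [][]byte
	for h.Len() > 0 {
		c := h[0]
		item := c.curr()
		// Emit the item only if it differs from the previously emitted item.
		if len(out) == 0 || !bytes.Equal(out[len(out)-1], item) {
			out = append(out, item)
		}
		c.idx++
		if c.idx < len(c.items) {
			// The cursor has more items: restore heap order for the root.
			heap.Fix(&h, 0)
		} else {
			// The cursor is exhausted: remove it from the heap.
			heap.Pop(&h)
		}
	}
	return out
}

func main() {
	a := [][]byte{[]byte("a"), []byte("b"), []byte("d")}
	b := [][]byte{[]byte("b"), []byte("c"), []byte("d"), []byte("d")}
	for _, item := range mergeDedup(a, b) {
		fmt.Printf("%s\n", item)
	}
}
```

Comparing against the last emitted item costs one bytes.Equal call per item. Whether that is acceptable on the hot path of Merge would need to be measured with a benchmark.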

Describe alternatives you've considered

No response

Additional information

No response

@caterchong caterchong added the enhancement New feature or request label May 7, 2024
@caterchong caterchong changed the title remove duplication item during index merge remove indentical item during index merge May 7, 2024
@caterchong caterchong changed the title remove indentical item during index merge remove identical item during index merge May 7, 2024