I hit the same error as #345 when I used the `clip-retrieval inference` command to extract image and text features. My command was as follows:
```
Traceback (most recent call last):
  File "/xxx/anaconda3/envs/it-retrieval/bin/clip-retrieval", line 8, in <module>
    sys.exit(main())
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/cli.py", line 18, in main
    fire.Fire(
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/clip_inference/main.py", line 155, in main
    distributor()
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/clip_inference/distributor.py", line 17, in __call__
    worker(
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/clip_inference/worker.py", line 127, in worker
    runner(task)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/clip_inference/runner.py", line 39, in __call__
    batch = iterator.__next__()
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/clip_inference/reader.py", line 225, in __iter__
    for batch in self.dataloader:
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/site-packages/clip_retrieval/clip_inference/reader.py", line 99, in __getitem__
    image_file = self.image_files[key]
KeyError: 'BoredApeYachtClub_0.txt'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/xxx/anaconda3/envs/it-retrieval/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
```
After some analysis, I think the problem is that the key looked up at this location in the code still carries the ".txt" suffix, so it can never be found in the image dictionary, whose keys end in one of the image extensions the source code recognizes: ".png", ".jpg", ".jpeg", ".bmp", ".webp", ".PNG", ".JPG", ".JPEG", ".BMP", ".WEBP".
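To make the mismatch concrete, here is a minimal sketch using only `pathlib`. It assumes a hypothetical root folder `dataset` and a hypothetical image `BoredApeYachtClub_0.png`; only the caption name comes from the KeyError above. Keys that keep the suffix never match across the two dictionaries, while keys with the suffix stripped via `with_suffix("")` do.

```python
from pathlib import PurePosixPath

# Hypothetical root folder; the caption name is the one from the KeyError above,
# the ".png" image name is an assumption for illustration.
root = PurePosixPath("dataset")
text_file = root / "BoredApeYachtClub_0.txt"
image_file = root / "BoredApeYachtClub_0.png"

# Keys that keep the suffix: the caption key is absent from the image dictionary,
# so looking it up there (image_files[key]) would raise KeyError.
suffixed_images = {image_file.relative_to(root).as_posix(): image_file}
suffixed_key = text_file.relative_to(root).as_posix()
print(suffixed_key, suffixed_key in suffixed_images)  # BoredApeYachtClub_0.txt False

# Keys with the suffix stripped: caption and image share one key.
stem_images = {image_file.relative_to(root).with_suffix("").as_posix(): image_file}
stem_key = text_file.relative_to(root).with_suffix("").as_posix()
print(stem_key, stem_key in stem_images)  # BoredApeYachtClub_0 True
```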
More specifically, the function `folder_to_keys(folder, enable_text=True, enable_image=True, enable_metadata=False)` at this location in the code uses file names *with* their suffixes as keys when it builds the `text_files`, `image_files`, and `metadata_files` dictionaries. It should keep only the name without the suffix, so that, for example, `BoredApeYachtClub_0.txt` and its image both map to the key `BoredApeYachtClub_0`. Here is my modified version of the code:
```python
from pathlib import Path


def folder_to_keys(folder, enable_text=True, enable_image=True, enable_metadata=False):
    """returns a list of keys from a folder of images and text"""
    path = Path(folder)
    text_files = None
    metadata_files = None
    image_files = None
    # Each key is the path relative to the root folder, WITHOUT the file suffix,
    # so "sub/name.txt", "sub/name.jpg" and "sub/name.json" all map to "sub/name".
    if enable_text:
        text_files = [*path.glob("**/*.txt")]
        text_files = {text_file.relative_to(path).with_suffix('').as_posix(): text_file for text_file in text_files}
    if enable_image:
        image_files = [
            *path.glob("**/*.png"),
            *path.glob("**/*.jpg"),
            *path.glob("**/*.jpeg"),
            *path.glob("**/*.bmp"),
            *path.glob("**/*.webp"),
            *path.glob("**/*.PNG"),
            *path.glob("**/*.JPG"),
            *path.glob("**/*.JPEG"),
            *path.glob("**/*.BMP"),
            *path.glob("**/*.WEBP"),
        ]
        image_files = {image_file.relative_to(path).with_suffix('').as_posix(): image_file for image_file in image_files}
    if enable_metadata:
        metadata_files = [*path.glob("**/*.json")]
        metadata_files = {metadata_file.relative_to(path).with_suffix('').as_posix(): metadata_file for metadata_file in metadata_files}

    keys = None

    def join(new_set):
        # Keep only the keys present in every modality collected so far.
        return new_set & keys if keys is not None else new_set

    if enable_text:
        keys = join(text_files.keys())
    if enable_image:
        keys = join(image_files.keys())
    if enable_metadata:
        keys = join(metadata_files.keys())

    keys = list(sorted(keys))

    return keys, text_files, image_files, metadata_files
```
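The `join` helper in this function intersects the key sets of every enabled modality, so only files that have a counterpart in each modality end up as keys. A small standalone sketch of that behavior, with hypothetical key and file names: a caption without an image (`ape_2`) and an image without a caption (`ape_3`) are both dropped, while `ape_1.txt` still pairs with `ape_1.jpg` because the suffix is no longer part of the key.

```python
# Hypothetical suffix-stripped keys for captions and images in one folder.
text_keys = {"ape_0": "ape_0.txt", "ape_1": "ape_1.txt", "ape_2": "ape_2.txt"}
image_keys = {"ape_0": "ape_0.png", "ape_1": "ape_1.jpg", "ape_3": "ape_3.webp"}

keys = None


def join(new_set):
    # Same logic as in folder_to_keys: intersect with the keys collected so far.
    return new_set & keys if keys is not None else new_set


keys = join(text_keys.keys())
keys = join(image_keys.keys())  # dict_keys & dict_keys -> set
keys = sorted(keys)
print(keys)  # ['ape_0', 'ape_1']
```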
After modifying the code, the inference process went smoothly and I successfully obtained the corresponding feature vectors for both images and texts.
I hope this helps anyone who runs into the same error!
Can you read #329 and propose a fix that makes things work without breaking what that PR fixed?
ShuxunoO changed the title from "A possible solution to solve KeyError: 'xxx_0.txt' when using “clip-retrieval inference” command" to "A possible solution to solve KeyError: 'xxx.txt' when using “clip-retrieval inference” command" on Jan 30, 2024.
Sure~
My local folder setup and the command-line output:
The output is:
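This is expected, because the code builds each key from the file's path relative to the root directory, so every dictionary key includes the subdirectories the file sits in, at whatever depth:

```python
text_files = {text_file.relative_to(path).as_posix(): text_file for text_file in text_files}
```

My modified version keeps this relative-path key and only removes the suffix with `.with_suffix('')`, so files in nested subdirectories still get distinct keys while a caption and its image share the same key.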
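To check that suffix-free keys keep the nested-subdirectory behavior #329 was about, here is a sketch that builds a temporary folder with a hypothetical layout (the same file name in two subdirectories at different depths) and computes keys the same way as the modified `folder_to_keys`: path relative to the root, suffix removed. Only `.jpg` and `.png` are globbed here to keep it short.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: "img_0" appears in two subdirectories at different depths.
    for rel in ["a/img_0.txt", "a/img_0.jpg", "b/c/img_0.txt", "b/c/img_0.png"]:
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text("")  # empty placeholder files are enough for key matching

    # Keys as in the modified folder_to_keys: relative path, suffix stripped.
    text_files = {f.relative_to(root).with_suffix("").as_posix(): f for f in root.glob("**/*.txt")}
    image_files = {
        f.relative_to(root).with_suffix("").as_posix(): f
        for pattern in ("**/*.jpg", "**/*.png")
        for f in root.glob(pattern)
    }
    keys = sorted(text_files.keys() & image_files.keys())

print(keys)  # ['a/img_0', 'b/c/img_0']
```

Each caption pairs with the image in its own subdirectory, and the two `img_0` files do not collide.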