AttributeError: type object 'Dataset' has no attribute 'from_list' #36

Datta0 · 2023-04-05T13:24:25Z

I was trying to fine-tune on a raw text file that also contains a few empty lines, and I'm getting this error.
When I looked into the Dataset class, I didn't find a from_list function. There were others, like from_dict and from_text (which reads from a file). Does this line of code need to be changed?

PS: I tried replacing that line with data = datasets.Dataset.from_text(<file path>) and training seems to work fine. But I'm not sure how single newlines and runs of multiple newline characters affect training performance. I'd appreciate some light shed on that.
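On the blank-line question, one way to keep empty lines out of the samples is to split the raw text into paragraphs before building the dataset. The following is only a minimal sketch: the helper name split_paragraphs is hypothetical, and it assumes that paragraphs are separated by one or more blank lines and that each sample is a dict with a "text" key. The thread does not confirm how the repository prepares its `paragraphs` list.

```python
import re


def split_paragraphs(raw_text):
    """Split raw text into non-empty paragraphs separated by blank lines.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Normalize Windows line endings so the pattern below only sees "\n".
    normalized = raw_text.replace("\r\n", "\n")
    # A newline, optional whitespace, then another newline ends a paragraph,
    # so any run of blank or whitespace-only lines acts as one separator.
    chunks = re.split(r"\n\s*\n", normalized)
    # Drop empty chunks so blank lines never become training samples.
    return [{"text": chunk.strip()} for chunk in chunks if chunk.strip()]


sample = "First paragraph.\n\n\nSecond paragraph,\nstill second.\n\n"
rows = split_paragraphs(sample)
```

Here single newlines inside a paragraph are kept, while runs of blank lines only mark paragraph boundaries. Whether that is the right granularity for training depends on the data.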

To create a public link, set `share=True` in `launch()`.
Loading base model...
Number of samples: 28
Traceback (most recent call last):
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/helpers.py", line 588, in tracked_fn
    response = fn(*args)
  File "main.py", line 161, in tokenize_and_train
    data = datasets.Dataset.from_list(paragraphs)
AttributeError: type object 'Dataset' has no attribute 'from_list'
Number of samples: 11
Traceback (most recent call last):
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/helpers.py", line 588, in tracked_fn
    response = fn(*args)
  File "main.py", line 161, in tokenize_and_train
    data = datasets.Dataset.from_list(paragraphs)
AttributeError: type object 'Dataset' has no attribute 'from_list'

lxe · 2023-04-06T23:52:03Z

I just released v2 where I rewrote the whole thing from scratch. Give it a shot!

cnbeining · 2023-04-07T03:01:45Z

Seems that this line still exists in the latest version https://github.com/lxe/simple-llm-finetuner/blob/master/trainer.py#L163 :-(

cnbeining · 2023-04-07T03:18:54Z

@Datta0 I suppose your issue was caused by a stale version of the datasets package pre-installed in your environment.

Running pip install datasets -U should fix your issue.

@lxe - Maybe change the pip install -r requirements.txt to pip install -r requirements.txt -U in README.md for clarification?
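If upgrading is not possible in a given environment, a fallback that works with either version might look like the sketch below. The names rows_to_columns and build_dataset are hypothetical helpers, not part of datasets or of this repository, and the sketch assumes every row is a dict with the same keys. It relies only on from_dict, which the original report says is present in the older install.

```python
def rows_to_columns(rows):
    """Turn [{"text": "a"}, {"text": "b"}] into {"text": ["a", "b"]}."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            # Assumes all rows share the same keys, so columns stay aligned.
            columns.setdefault(key, []).append(value)
    return columns


def build_dataset(dataset_cls, rows):
    """Build a dataset from a list of dicts, with or without from_list."""
    # Prefer from_list when the installed version provides it.
    if hasattr(dataset_cls, "from_list"):
        return dataset_cls.from_list(rows)
    # Older versions: convert to column format and use from_dict instead.
    return dataset_cls.from_dict(rows_to_columns(rows))
```

With datasets imported, the failing line could then read data = build_dataset(datasets.Dataset, paragraphs). Upgrading, as suggested above, is still the simpler fix.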
