AttributeError: type object 'Dataset' has no attribute 'from_list' #36

Open
Datta0 opened this issue Apr 5, 2023 · 3 comments
@Datta0

Datta0 commented Apr 5, 2023

I was trying to finetune on a raw text file. It has a few empty lines too. I'm getting this error.
When I looked into the Dataset class, I didn't find a from_list function. There were others like from_dict and from_text (which reads from a file). I wanted to know if this line of code needs to be changed.

PS: I tried replacing that line with data = datasets.Dataset.from_text(<file path>) and the training seems to be working fine. But I'm not sure how newlines and runs of multiple newline characters affect the training performance. I would appreciate some light shed on that.

To create a public link, set `share=True` in `launch()`.
Loading base model...
Number of samples: 28
Traceback (most recent call last):
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/helpers.py", line 588, in tracked_fn
    response = fn(*args)
  File "main.py", line 161, in tokenize_and_train
    data = datasets.Dataset.from_list(paragraphs)
AttributeError: type object 'Dataset' has no attribute 'from_list'
Number of samples: 11
Traceback (most recent call last):
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/datta0/.pyenv/versions/3.8.10/lib/python3.8/site-packages/gradio/helpers.py", line 588, in tracked_fn
    response = fn(*args)
  File "main.py", line 161, in tokenize_and_train
    data = datasets.Dataset.from_list(paragraphs)
AttributeError: type object 'Dataset' has no attribute 'from_list'
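
On the newline question: as far as I understand, from_text by default turns every line of the file into its own sample, so blank lines may end up as empty samples. A minimal sketch of one way to pre-process the raw text instead is below. It assumes that paragraphs are separated by one or more blank lines and that each sample is a dict with a "text" key; the function name split_paragraphs is hypothetical and this is not necessarily how main.py builds its paragraphs list.

```python
import re
from typing import Dict, List


def split_paragraphs(raw_text: str) -> List[Dict[str, str]]:
    """Split raw text on blank lines and drop empty chunks.

    Hypothetical helper: the "text" key and the blank-line
    separator are assumptions, not taken from main.py.
    """
    # One or more blank (or whitespace-only) lines separate paragraphs.
    chunks = re.split(r"\n\s*\n", raw_text)
    # Strip surrounding whitespace and skip chunks that end up empty.
    return [{"text": chunk.strip()} for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "first paragraph\n\n\nsecond paragraph\nstill second\n\n"
    for row in split_paragraphs(sample):
        print(repr(row["text"]))
```

Single newlines inside a paragraph are kept here, so only the runs of empty lines are removed before tokenization.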

@lxe
Owner

lxe commented Apr 6, 2023

I just released v2 where I rewrote the whole thing from scratch. Give it a shot!

@cnbeining

It seems that this line still exists in the latest version: https://github.com/lxe/simple-llm-finetuner/blob/master/trainer.py#L163 :-(

@cnbeining

@Datta0 I suppose your issue was caused by a stale version of the datasets package pre-installed in your environment.

Running pip install datasets -U should fix your issue.
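
If upgrading is not an option, a fallback along these lines might work. This is only a sketch: I believe from_list was added in datasets 2.5.0, but treat that version number as an assumption. The helper names rows_to_columns and build_dataset are hypothetical, and the code assumes paragraphs is a list of dicts that all share the same keys.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert a list of row dicts into the column dict that from_dict expects.

    Assumes every row has the same keys.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def build_dataset(rows: List[Dict[str, Any]]):
    """Use Dataset.from_list when available, otherwise fall back to from_dict."""
    import datasets  # imported lazily so rows_to_columns works without it

    if hasattr(datasets.Dataset, "from_list"):
        return datasets.Dataset.from_list(rows)
    # Older datasets releases: build the same dataset from columns.
    return datasets.Dataset.from_dict(rows_to_columns(rows))
```

In main.py the failing line would then become data = build_dataset(paragraphs). You can check the installed version with print(datasets.__version__).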

@lxe - Maybe change the pip install -r requirements.txt to pip install -r requirements.txt -U in README.md for clarification?
