New Own Dataset #296
Okeke-Stephen started this conversation in General
Replies: 0 comments
Hi there
Thank you for this great work.
I have been trying to use my own data to fine-tune models with the data loader below:
from datasets import load_dataset
# data = load_dataset("Abirate/english_quotes")
# Load dataset (you can process it here)
data = load_dataset("csv", data_files="train.csv", split="train")
# Tokenize the "text" column (tokenizer is defined elsewhere)
data = data.map(lambda samples: tokenizer(samples["text"]), batched=True)
# Shuffle the dataset
data = data.shuffle(seed=42)
# Keep the first 50 examples
data = data.select(range(50))
Then I pass train_dataset=data["text"] to the trainer.
However, I keep getting errors, and the trainer does not recognize train_dataset=data["text"].
The code sample can only fine-tune models with the sample dataset you provided. Can this be resolved?
Regards,
Srt