Text Processing and Labelling the Dataset

To label the semi-structured data (XML files), the files must first be preprocessed to obtain clean text. The cleaned documents are then labelled so that NLP processing can be carried out on them.

Load the Data and Remove XML/HTML Tags and Numeric Characters

Load the whole XML dataset, using the tqdm library to track progress, and clean all the files with the BeautifulSoup and regex (re) libraries to strip the XML/HTML markup.
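A minimal sketch of the loading and tag-stripping step, assuming all XML files sit in one local folder. The function names `strip_markup` and `load_xml_dir` are hypothetical and not taken from the original notebook. It uses BeautifulSoup's built-in `html.parser`, which avoids an extra `lxml` dependency; tqdm only wraps the loop to display a progress bar.

```python
from pathlib import Path

from bs4 import BeautifulSoup
from tqdm import tqdm


def strip_markup(raw: str) -> str:
    """Remove XML/HTML tags and return only the text content."""
    # "html.parser" ships with Python, so lxml is not required;
    # it is lenient enough to pull text out of simple XML documents.
    soup = BeautifulSoup(raw, "html.parser")
    # separator=" " keeps text from adjacent elements from running together.
    return soup.get_text(separator=" ")


def load_xml_dir(folder: str) -> dict[str, str]:
    """Read every *.xml file in `folder` (hypothetical layout) and strip its markup."""
    paths = sorted(Path(folder).glob("*.xml"))
    texts = {}
    # tqdm only displays progress; it does not read the files itself.
    for path in tqdm(paths, desc="Loading XML"):
        raw = path.read_text(encoding="utf-8", errors="ignore")
        texts[path.name] = strip_markup(raw)
    return texts
```

For example, `docs = load_xml_dir("data/xml")` (a hypothetical path) returns a dictionary mapping each file name to its plain text.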
Replace numeric characters with spaces.
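A short sketch of the numeric replacement using only the standard `re` module. Collapsing the leftover runs of whitespace afterwards is an added assumption, not something the original step states. The function name `replace_numbers` is hypothetical.

```python
import re

_DIGITS = re.compile(r"\d+")   # one or more consecutive digits
_SPACES = re.compile(r"\s+")   # any run of whitespace


def replace_numbers(text: str) -> str:
    """Replace every run of digits with a space, then collapse repeated whitespace."""
    text = _DIGITS.sub(" ", text)
    return _SPACES.sub(" ", text).strip()


print(replace_numbers("Stage 2 tumour, 15 mm in 2019"))  # → Stage tumour, mm in
```

Note that this also strips digits embedded in words, so a gene name such as `BRCA1` becomes `BRCA`; if such tokens matter for classification, a narrower pattern like `r"\b\d+\b"` may be preferable.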

Labelling the Dataset

Use pandas to append each summarized document together with its label.
Load the labelled documents into a DataFrame.
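A minimal sketch of the labelling step, assuming the cleaned documents are already grouped by class. The label names, the column names `document` and `label`, and the helper `build_labelled_frame` are hypothetical. Rows are collected in a plain list and converted once, because `DataFrame.append` was deprecated and then removed in pandas 2.0.

```python
import pandas as pd


def build_labelled_frame(docs_by_label: dict[str, list[str]]) -> pd.DataFrame:
    """Collect (document, label) pairs into a single DataFrame."""
    rows = []
    for label, docs in docs_by_label.items():
        for doc in docs:
            # One row per document, tagged with the class it belongs to.
            rows.append({"document": doc, "label": label})
    # Passing columns explicitly keeps the schema even when there are no rows.
    return pd.DataFrame(rows, columns=["document", "label"])


df = build_labelled_frame({
    "lung_cancer": ["tumour in left lung", "lung nodule biopsy"],  # hypothetical labels
    "breast_cancer": ["mammogram shows mass"],
})
print(df)
```

Building the list first and calling the `DataFrame` constructor once is also faster than growing a DataFrame row by row.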

bellamkondaprakash/Classification_Text_Cancer_Data

Folders and files

Latest commit

History

Repository files navigation

Text Processing and Labelling the Dataset

Load and Remove the XML, HTML tags & Alphanumeric characters

Labelling the Dataset

About

Topics

Resources

License

Stars

Watchers

Forks

Languages