lucasxlu/JiaYuan

JiaYuan Spider and Data Analysis

Introduction

  • Scrape data from shijijiayuan with BeautifulSoup and requests in Python 3.5
  • Apply machine learning algorithms in R
  • Visualize the data and generate a report with MS PowerPoint 2016, R ggplot2, and TAGUL
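
As a rough illustration of the scraping step, here is a minimal sketch using requests and BeautifulSoup. The function names, the `div.profile`, `.name` and `.age` selectors, and the sample HTML are all hypothetical; the real page markup differs and is not reproduced here.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (url is supplied by the caller)."""
    headers = {"User-Agent": "Mozilla/5.0"}  # many sites reject requests without a UA
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_profiles(html):
    """Extract one dict per profile card; the selectors are assumed, not the real ones."""
    soup = BeautifulSoup(html, "html.parser")
    profiles = []
    for card in soup.select("div.profile"):
        name = card.select_one(".name")
        age = card.select_one(".age")
        profiles.append({
            "name": name.get_text(strip=True) if name else None,
            "age": int(age.get_text(strip=True)) if age else None,
        })
    return profiles


# Hypothetical markup, used only to exercise the parser offline.
SAMPLE_HTML = """
<div class="profile"><span class="name">Alice</span><span class="age">28</span></div>
<div class="profile"><span class="name"> Bob </span></div>
"""

if __name__ == "__main__":
    print(parse_profiles(SAMPLE_HTML))
```

Keeping the download and the parsing in separate functions means the parser can be checked against saved HTML without touching the network.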

Prerequisites

  • Python 3.x (Python 3.5 is recommended)
  • Third-party libraries: requests, BeautifulSoup

Note

  • Later research requires a Linux OS (Ubuntu 16.04 or CentOS 7 will do). Windows may cause some trouble.

Results

  • Basic statistics

    cover img1 img2

  • With NLP

    img5 img6 img7 img8
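
The basic statistics in this project were produced in R. Purely as an illustration of the kind of summary involved, here is a small Python sketch. The `age` and `city` fields and the sample records are hypothetical and are not taken from the scraped dataset.

```python
from collections import Counter
from statistics import mean, median


def summarize(profiles):
    """Return simple descriptive statistics for a list of profile dicts."""
    # Skip records where the field is missing or None.
    ages = [p["age"] for p in profiles if p.get("age") is not None]
    cities = Counter(p["city"] for p in profiles if p.get("city"))
    return {
        "count": len(profiles),
        "mean_age": mean(ages) if ages else None,
        "median_age": median(ages) if ages else None,
        "top_cities": cities.most_common(3),
    }


# Made-up records, for demonstration only.
SAMPLE_PROFILES = [
    {"age": 25, "city": "Beijing"},
    {"age": 30, "city": "Shanghai"},
    {"age": 35, "city": "Beijing"},
    {"age": None, "city": None},
]

if __name__ == "__main__":
    print(summarize(SAMPLE_PROFILES))
```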

The Next

Next, I want to train this spider on the avatar image set using computer vision, so that it can rank faces. Anyone interested in computer vision or deep learning is welcome to open an issue.

For more details, please visit my article at Zhihu.

With pleasure!