Change Log

0.1.7 (2016-01-30)

Full Changelog

Closed issues:

  • ImportError: cannot import name 'Image' #183
  • Won't let me import #182
  • Install on Mac - El Capitan Failed - "Operation not permitted" #181
  • Downgrades to old versions of required packages upon installation #174
  • Handling 404, 500, and other non-200 http response codes to prevent scraping error pages #142
  • Libray downgrading in installation #138

Merged pull requests:

  • Don't scrape error pages #190 (yprez)
  • Added Hebrew stop words for language support #188 (alon7)
  • Fix installation and build #187 (yprez)
  • Fix installation docs #184 (yprez)
  • Travis CI integration #180 (yprez)
  • requirements.txt - Use minimal instead of exact versions #179 (yprez)
  • Handle lxml raising ValueError on node.itertext() - Python 3 #178 (yprez)
  • Handle lxml raising ValueError on node.itertext() #144 (yprez)
  • Parse byline fix #132 (davecrumbacher)

0.1.6 (2016-01-10)

Full Changelog

Closed issues:

  • Critical leak in newspaper.mthreading.Worker #177
  • HTMLParseError #165
  • Take local paths to .html files #153
  • Wall Street Journal Full Text is not Correctly Scraped #150
  • Article HTML Returning Null #131
  • No articles #130
  • Loading Pages that use heavy javascript #127
  • Login handling for premium websites #126
  • Installation of nltk is failing #121

Merged pull requests:

0.1.5 (2015-03-04)

Full Changelog

Closed issues:

  • is there any kind of documentation on centos 7? #114
  • Add extraction publishing date from article. #3

Merged pull requests:

  • bumping nltk to 2.0.5 - see #824 in nltk #125 (hexelon)

0.1.4 (2015-02-04)

Full Changelog

Closed issues:

  • Getting rate limiting issue? #116
  • newspaper.build( ) error #111
  • Allow lists in Parser.clean_article_html() #108

Merged pull requests:

  • Fix incorrect log call while generating articles #115 (curita)
  • Allow lists in clean_article_html() - fixes #108 #112 (ecesena)
  • Fixed nodeToString() to return valid HTML #110 (ecesena)
  • Fixed empty return in top_meta_image #109 (ecesena)

0.1.3 (2015-01-15)

Full Changelog

Implemented enhancements:

  • Fulltext extraction improvement #1 #105

Closed issues:

  • Tags h1 in article_html - indented behavior? #107

Merged pull requests:

0.1.2 (2015-01-01)

Full Changelog

Closed issues:

  • Metatags on Vice.com #103
  • Can't extract images from german newspapers #96
  • article_html misses many of the images #89

Merged pull requests:

  • Integrate UnicodeDammit, deprecate parser_class, deprecate encodeValue, refactor, scaffolding for more unit tests #104 (codelucas)

0.1.1 (2014-12-27)

Full Changelog

Closed issues:

  • UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc #99
  • TypeError: Can't convert 'bytes' object to str implicitly #98
  • [Parse lxml ERR] Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. #78
  • UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128) #77
  • article.text and keywords error #47

Merged pull requests:

  • Huge bugfix to aid lxml DOM parsing + remove unhelpful and excess exception messages and added tracebacks to exception logging #102 (codelucas)
  • Decode bytestring returned from lxml's toString early on before sending it out to outer code #101 (codelucas)
  • Fixed #78: Remove encoding tag because lxml won't accept it for unicode #97 (mhall1)

0.1.0 (2014-12-17)

Full Changelog

0.0.9 (2014-12-17)

Full Changelog

Closed issues:

  • object has no attribute clean Error when using parse method #90
  • Questions #85
  • [nltk_data] Error loading brown: <urlopen error [Errno -2] Name or [nltk_data] service not known> #84
  • newspaper unable to find embeded youtube video #82
  • Bound for memory usage #81
  • Hosted demo #80
  • Having issues installing due to lxml #79
  • Add a BeautifulSoup4 parser. #44
  • python 3 support request #36

Merged pull requests:

0.0.8 (2014-10-13)

Full Changelog

Closed issues:

  • Parsing Raw HTML #74
  • Can't install newspaper #72
  • Refactor codebase so newspaper is actually pythonic #70
  • Article.top_node == Article.clean_top_node #65
  • article.movies missing 'http:' #64
  • KeyError when calling newspaper.languages() #62
  • Memoize Articles - Not Printing #61
  • Add URL headers while building a "paper" #60
  • AttributeError: 'module' object has no attribute 'build' #59
  • Typo in newspaper.build argument "memoize_articles" #58
  • issue with stopwords-tr.txt #51
  • Other language support. #34
  • Character encoding detection #2

Merged pull requests:

  • Huge refactor: entire codebase in PEP8, imports alphabetized, bugfixes, core changes #71 (codelucas)
  • Meta tag extraction fixes #69 (karls)
  • Test suite improvements #68 (karls)
  • Test suite fixes #67 (karls)
  • Revert "Added published date to the extractor+article" #66 (codelucas)
  • Added published date to the extractor+article #63 (parhammmm)

0.0.7 (2014-06-17)

Full Changelog

Closed issues:

  • no document on how to add language #57
  • Retain <a> tags in top article node? #56
  • DocumentCleaner is missing clean_body_classes #55
  • You must download and parse an article before parsing it #52
  • Not extracting UL LI text #50
  • article does not release_resources() #42
  • Doesn't work on http://www.le360.ma/fr #40
  • How to assign html content without downloading it? #37
  • Python venv only? #32
  • .nlp() could not work #27
  • Doesn't work with Arabic news sites #23
  • SyntaxError: invalid syntax #19
  • Retain HTML markup for extracted article #18
  • Portuguese is misspelled #14
  • Multi-threading article downloads not working #12
  • Timegm error? #10
  • Problem in Brazilian sites #9
  • Brazilian portuguese support #6

Merged pull requests:

0.0.6 (2014-01-18)

Full Changelog

Closed issues:

  • Port to Ruby #8
  • Huge internationalization / API revamp underway! #7
  • Multithread & gevent framework built into newspaper #4

Merged pull requests:

0.0.5 (2014-01-09)

Full Changelog

0.0.4 (2013-12-31)

Full Changelog

Closed issues:

  • Calling nlp() on an article causes 'tokenizers/punkt/english.pickle' Not Found Error #1

Merged pull requests:

  • Fix for keyword arg usage in print() on Python 2.7 #5 (michaelhood)

0.0.3 (2013-12-22)

Full Changelog

0.0.2 (2013-12-21)

Full Changelog

0.0.1 (2013-12-21)

* This Change Log was automatically generated by github_changelog_generator