integrate meza #7

Open
reubano opened this issue Jul 29, 2016 · 3 comments

@reubano

reubano commented Jul 29, 2016

There is a lot of overlap in functionality between the utils module and meza. Where possible, I should merge redundant functions and move new ones.

reubano modified the milestone: 1.0.0-beta Jul 31, 2016

@COLABORATI

Am I right to hope that the integration with sqlalchemy will bring a nice save-to-database feature to riko? I am currently looking into the best way to store things in a db without breaking a pipeline, so that 'save-to-db' could be done seamlessly at different stages of the data pipeline.

@reubano

reubano commented Aug 2, 2016

Technically, you can already do something similar to that example with riko since it shares a similar data structure.

>>> from riko.modules.fetch import pipe
>>> from meza import fntools as ft
>>> from .models import Table

# Table is a sqlalchemy.Model class
# db is a sqlalchemy database instance
>>> chunk_size = 100  # rows per bulk insert (arbitrary example value)
>>> stream = pipe(conf={'url': 'https://news.ycombinator.com/rss'})
>>> for data in ft.chunk(stream, chunk_size):
...     db.engine.execute(Table.__table__.insert(), data)

But you are correct that this will "break" the pipeline, since the loop consumes the iterator and leaves nothing for downstream pipes. I do have future plans for a pipe that outputs to SQLAlchemy-supported databases. Maybe something like this:

>>> from riko.collections.sync import SyncPipe
>>> flow = (
...     SyncPipe('fetch', conf={'url': 'https://news.ycombinator.com/rss'})
...         .write(conf={'uri': 'sqlite:////tmp/test.db', 'table': 'raw'})
...         .filter(conf={'rule': {'field': 'link', 'op': 'contains', 'value': 'python'}})
...         .write(conf={'uri': 'sqlite:////tmp/test.db', 'table': 'filtered'})
...         .sort(conf={'rule': {'sort_key': 'title'}}))

But this doesn't really have anything to do with meza.
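
In the meantime, one way to save items without consuming the stream is a pass-through generator that writes each chunk and then re-yields it. The following is only a rough sketch, not part of riko or meza: `write_through` is a hypothetical helper, the items are assumed to be plain dicts, and the stdlib `sqlite3` module stands in for SQLAlchemy so the example runs on its own.

```python
import sqlite3
from itertools import islice


def write_through(stream, conn, table, fields, chunk_size=100):
    """Yield every item unchanged while saving it to `table` in chunks.

    Hypothetical helper. `table` and `fields` are interpolated into the
    SQL text, so they must be trusted identifiers, never user input.
    """
    columns = ', '.join(fields)
    placeholders = ', '.join('?' for _ in fields)
    sql = f'INSERT INTO {table} ({columns}) VALUES ({placeholders})'
    stream = iter(stream)

    while True:
        # Pull at most `chunk_size` items; stop when the stream is exhausted.
        chunk = list(islice(stream, chunk_size))
        if not chunk:
            break

        rows = [tuple(item.get(field) for field in fields) for item in chunk]
        with conn:  # one transaction (and commit) per chunk
            conn.executemany(sql, rows)

        # Hand the same items on to the next stage of the pipeline.
        yield from chunk


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE raw (title TEXT, link TEXT)')
conn.execute('CREATE TABLE filtered (title TEXT, link TEXT)')

# Stand-in for a riko stream: any iterable of dicts will do.
items = [
    {'title': 'b', 'link': 'https://example.com/python'},
    {'title': 'a', 'link': 'https://example.com/python-2'},
    {'title': 'c', 'link': 'https://example.com/other'},
]

fields = ['title', 'link']
raw = write_through(items, conn, 'raw', fields)
matches = (item for item in raw if 'python' in item['link'])
saved = write_through(matches, conn, 'filtered', fields)
result = sorted(saved, key=lambda item: item['title'])
```

Because each stage is a generator, nothing is written until something downstream (here `sorted`) pulls items through, and each write stage reads at most `chunk_size` items ahead of its consumer.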

@reubano

reubano commented Aug 2, 2016

And feel free to add this request to #8.
