Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promise please #249

Open
lwyj123 opened this issue Dec 7, 2017 · 14 comments
Open

Promise please #249

lwyj123 opened this issue Dec 7, 2017 · 14 comments
Labels

Comments

@lwyj123
Copy link

lwyj123 commented Dec 7, 2017

No description provided.

@mike442144
Copy link
Collaborator

will add in next version.

@cernadasjuan
Copy link

Hi! There is an estimated release date?

@mike442144
Copy link
Collaborator

@cernadasjuan No, actually. It is hard to support the promise API, because it is quite different from global function callback.

@camrymps
Copy link

If I may make a suggestion: why not just add an event emitter for the response? For example:

crawler.on('response', (err, res) => { // handle response here })

@mike442144
Copy link
Collaborator

@camrymps It is another choice, but has nothing to do with promise, am I right? Event is not so straightforward compare to the callback, for instance if we have two or more different parsers it will be specified by the callback, you can handle them easily, but if you use event how could you do?

@camrymps
Copy link

camrymps commented Mar 10, 2018

@mike442144 You are correct. Callbacks are often easier to deal with, generally. I'm only offering this as a suggestion as events may be a better solution than promises for some people, even though promises are the best way to go. Anyway, an event scenario could look something like this:

var crawler = new Crawler();

function doSomething(res) {
    console.log(res);
}

function doSomethingElse(res) {
    console.log(res);
}

crawler.on('response', (err, res) => {
    [res.options.parser](res);
}); 

crawler.queue({ uri: 'https://google.com', parser: 'doSomething' });
crawler.queue({ uri: 'https://yahoo.com', parser: 'doSomethingElse' });

@mike442144
Copy link
Collaborator

That's great! I think the only problem is the calling scope. it is similar with definition from scrapy, which requires a framework.
We'll have repeated code like:

crawler.on('response', (err, res) => {
    [res.options.parser](res);
});

If not,say, we can force developers to define a class with member method, and crawler call them in callbacks. That's a great idea, let's see if we can do more.

@awmv
Copy link

awmv commented May 16, 2018

exports.fix = new Promise((resolve, reject) => {
    const loop = sources['stuff'].urls.map((url) => {
        c.queue([{
                uri: url,
                userAgent: userAgent,
                referer: referer,
                callback: exec
            }]);
        async function exec(err, res, done) {
            if (err || res.statusCode !== 200) {
                throw new Error(err)
            }
            const $ = res.$;
            done();
        }
    });
    c.once('error', (error) => reject(error));
    c.once('drain', () => resolve(Promise.all(loop)));
});

Snippet from my own project. This code doesn't work, it's just to give you an idea on how to create the wrapper.

@mike442144
Copy link
Collaborator

Thanks @robjane, the problem is not how to create the wrapper, if you try your best to understand the conversations above.

@soakit
Copy link

soakit commented Jul 19, 2018

function getPageAsync(urls) {
	return new Promise((resolve, reject) => {
		const loop = urls.map((url) => {
			return new Promise((resolve, reject) => {
				c.queue([{
					uri: url,
					/* userAgent: userAgent,
					referer: referer, */
					callback: async function (err, res, done) {
						if (err || res.statusCode !== 200) {
							reject('err')
							throw new Error(err)
						}
						const $ = res.$;
						resolve($)
						done();
					}
				}]);
			})
		});
		c.once('error', (error) => reject(error));
		c.once('drain', () => {
			Promise.all(loop).then(results => {
				resolve(results)
			})
		})
	});
}

@bugs181
Copy link

bugs181 commented May 4, 2019

If I may offer my opinion - I don't believe Promises should be added to this library. Far too often, libraries try to do too much and become feature bloated. There's an endless supply of libraries that allow wrapping callbacks which can be used with this beautiful library. I much prefer to use minimal and simple libraries. Instead, you could focus more on Plugin support and extending this library to meet user's needs for that.

@mike442144
Copy link
Collaborator

@bugs181 Can't agree more! what's import in this case is that Promise is breaking the workflow, anyway, I'm looking for another way to get this done. Maybe a higher layer API.

@PhilTheAir
Copy link

Does anyone know how I can utilize the "done" in the callback to execute other functions when the crawler finishes its own work? Is it like done(otherFunc()) or so? Thanks.

@bugs181
Copy link

bugs181 commented Jul 29, 2019

Does anyone know how I can utilize the "done" in the callback to execute other functions when the crawler finishes its own work? Is it like done(otherFunc()) or so? Thanks.

I recommend starting a new Github issue. You're asking an unrelated question which is something generally frowned upon. We refer to this as "thread hijacking".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants