Typescript and types #297

Open
chris74a opened this issue Dec 13, 2018 · 14 comments

@chris74a

Hello,
I am trying to use node-crawler in TypeScript. I ran npm install @types/crawler, but I got a 404.
Is there a way to use this module with TS?
Thanks

@mike442144
Collaborator

mike442144 commented Jan 23, 2019

Actually, I'm not familiar with TS; contributions are welcome. :)

@raynor85

raynor85 commented May 7, 2019

I have added an index.d.ts to my project with the following basic type definitions. They are not exhaustive, but they are a start.

declare module 'crawler' {
  import { IncomingMessage } from 'http';

  export interface CrawlerResponse extends IncomingMessage {
    $: CheerioStatic;
    options: CrawlerOptions;
    request: any;
  }

  export type CrawlerCallback = (
    error: Error,
    res: CrawlerResponse,
    done: () => void
  ) => void;

  interface JQueryObject {
    name: string;
    options: {
      normalizeWhitespace?: boolean;
      xmlMode?: boolean;
      decodeEntities?: boolean;
    };
  }

  type JQuery = boolean | 'cheerio' | JQueryObject;

  export interface CrawlerOptions {
    uri?: string;
    autoWindowClose?: boolean;
    forceUTF8?: boolean;
    gzip?: boolean;
    incomingEncoding?: string;
    jQuery?: JQuery;
    maxConnections?: number;
    method?: string;
    priority?: number;
    priorityRange?: number;
    rateLimit?: number;
    referer?: boolean;
    retries?: number;
    retryTimeout?: number;
    timeout?: number;
    skipDuplicates?: boolean;
    rotateUA?: boolean;
    homogeneous?: boolean;
    callback?: CrawlerCallback;
  }

  class Crawler {
    constructor(options: CrawlerOptions);
    queue(options: string | string[] | CrawlerOptions | CrawlerOptions[]): void;
  }

  export default Crawler;
}
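
For anyone wondering how these declarations would be consumed, here is a minimal sketch. It does not import the real crawler package: StubCrawler and the trimmed-down interfaces are hypothetical stand-ins that only mirror the shapes above, and the stub reports a fake 200 instead of doing any network I/O. Typing error as Error | null and done as () => void are assumptions made so the sketch compiles under strict settings.

```typescript
// Local stand-ins mirroring the declared shapes, so the sketch compiles
// without the real `crawler` package. All names here are hypothetical.
interface CrawlerOptions {
  uri?: string;
  maxConnections?: number;
  callback?: CrawlerCallback;
}

interface CrawlerResponse {
  statusCode: number;
  options: CrawlerOptions;
}

// Assumption: `error` may be null on success and `done` returns nothing.
type CrawlerCallback = (
  error: Error | null,
  res: CrawlerResponse,
  done: () => void
) => void;

class StubCrawler {
  private readonly defaults: CrawlerOptions;

  constructor(defaults: CrawlerOptions) {
    this.defaults = defaults;
  }

  // Accepts the same union that the declaration gives `queue`.
  queue(input: string | string[] | CrawlerOptions | CrawlerOptions[]): void {
    const items = Array.isArray(input) ? input : [input];
    for (const item of items) {
      const perRequest: CrawlerOptions =
        typeof item === "string" ? { uri: item } : item;
      // Per-request options override the constructor defaults.
      const options: CrawlerOptions = { ...this.defaults, ...perRequest };
      // No network: report a fake 200 so the callback path can be exercised.
      options.callback?.(null, { statusCode: 200, options }, () => {});
    }
  }
}

const seen: string[] = [];
const crawler = new StubCrawler({
  maxConnections: 10,
  callback: (error, res, done) => {
    // Parameter types are inferred from CrawlerCallback.
    if (!error && res.options.uri) seen.push(res.options.uri);
    done();
  },
});

crawler.queue("http://example.com/a");
crawler.queue([{ uri: "http://example.com/b" }, { uri: "http://example.com/c" }]);
```

With the real package and the index.d.ts in place, the same callback and queue calls should type-check against the declared module instead of the stub.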

@mike442144
Collaborator

Though I don't know much about ts, I think it is a good start :)

@fanhuahaitang

import request from 'request';
import events from 'events';

export interface CreateCrawlerOptions {
  autoWindowClose?: boolean;
  forceUTF8?: boolean;
  gzip?: boolean;
  incomingEncoding?: string;
  jquery?: boolean | string | CheerioOptionsInterface | any;
  jQuery?: boolean | string | CheerioOptionsInterface | any;
  maxConnections?: number;
  method?: string;
  priority?: number;
  priorityRange?: number;
  rateLimit?: number;
  referer?: false | string;
  retries?: number;
  retryTimeout?: number;
  timeout?: number;
  skipDuplicates?: boolean;
  rotateUA?: boolean;
  userAgent?: string | string[];
  homogeneous?: boolean;
  debug?: boolean;
  logger?: {
    log: (level: string, ...args: any[]) => void;
  };
  seenreq?: any;
  headers?: request.Headers;
  callback?: (err: Error, res: CrawlerRequestResponse, done: () => void) => void;
  [x: string]: any;
}

export declare class Crawler extends events.EventEmitter {
  queueSize: number;
  on(channel: 'schedule', listener: (options: CrawlerRequestOptions) => void): this;
  on(channel: 'limiterChange', listener: (options: CrawlerRequestOptions, limiter: string) => void): this;
  on(channel: 'request', listener: (options: CrawlerRequestOptions) => void): this;
  on(channel: 'drain', listener: () => void): this;
  queue(uri: string): void;
  queue(options: CrawlerRequestOptions): void;
  direct(options: CrawlerRequestOptions & {
    callback: (error: Error, response: CrawlerRequestResponse) => void;
  }): void;
  setLimiterProperty(limiter: string, property?: string, value?: any): void;
}

export interface CrawlerRequestOptions extends request.CoreOptions {
  uri?: string;
  html?: string;
  proxy?: any;
  proxies?: any[];
  limiter?: string;
  encoding?: string;
  priority?: number;
  jQuery?: boolean | string | CheerioOptionsInterface | any;
  preRequest?: (options: CrawlerRequestOptions, doRequest: (err: Error) => void) => void;
  callback?: (err: Error, res: CrawlerRequestResponse, done: () => void) => void;
  [x: string]: any;
}

export interface CrawlerRequestResponse {
  statusMessage?: string;
  statusCode: number;
  body?: Buffer | string;
  headers?: request.Headers;
  request?: request.RequestAsJSON;
  options?: CrawlerRequestOptions;
  $?: CheerioAPI;
  [x: string]: any;
}

@fanhuahaitang

fanhuahaitang commented Aug 6, 2019

The code is too complicated, and its dependencies (Bottleneck, seenreq, whacko) don't support TS.

declare module 'crawler' {
  import { IncomingMessage } from 'http';

  import request from 'request';
  import events from 'events';

  class Crawler extends events.EventEmitter {
    constructor(options: CreateCrawlerOptions);
    queueSize: number;
    on(channel: 'schedule', listener: (options: CrawlerRequestOptions) => void): this;
    on(channel: 'limiterChange', listener: (options: CrawlerRequestOptions, limiter: string) => void): this;
    on(channel: 'request', listener: (options: CrawlerRequestOptions) => void): this;
    on(channel: 'drain', listener: () => void): this;
    queue(uri: string): void;
    queue(uri: string[]): void;
    queue(options: CrawlerRequestOptions | CrawlerRequestOptions[]): void;
    direct(options: CrawlerRequestOptions & {
      callback: (error: Error, response: CrawlerRequestResponse) => void;
    }): void;
    setLimiterProperty(limiter: string, property?: string, value?: any): void;
  }

  export interface CrawlerRequestOptions extends request.CoreOptions {
    uri?: string;
    html?: string;
    proxy?: any;
    proxies?: any[];
    limiter?: string;
    encoding?: string;
    priority?: number;
    jQuery?: boolean | "whacko" | "cheerio" | {
        name: "cheerio";
        options: CheerioOptionsInterface;
      } | {
        jsdom: any;
        [key: string]: any;
      } | any;
    preRequest?: (options: CrawlerRequestOptions, doRequest: (err: Error) => void) => void;
    callback?: (err: Error, res: CrawlerRequestResponse, done: () => void) => void;
    [x: string]: any
  }

  export interface CrawlerRequestResponse extends IncomingMessage {
    body?: Buffer | string;
    request?: request.RequestAsJSON;
    options?: CrawlerRequestOptions;
    $?: CheerioAPI
    [x: string]: any
  }

  export interface CreateCrawlerOptions {
    autoWindowClose?: boolean;
    forceUTF8?: boolean;
    gzip?: boolean;
    incomingEncoding?: string;
    jquery?: boolean | "whacko" | "cheerio" | {
        name: "cheerio";
        options: CheerioOptionsInterface;
      } | {
        jsdom: any;
        [key: string]: any;
      } | any;
    jQuery?: boolean | "whacko" | "cheerio" | {
        name: "cheerio";
        options: CheerioOptionsInterface;
      } | {
        jsdom: any;
        [key: string]: any;
      } | any;
    maxConnections?: number;
    method?: string;
    priority?: number;
    priorityRange?: number;
    rateLimit?: number;
    referer?: false | string;
    retries?: number;
    retryTimeout?: number;
    timeout?: number;
    skipDuplicates?: boolean;
    rotateUA?: boolean;
    userAgent?: string | string[];
    homogeneous?: boolean;
    debug?: boolean;
    logger?: {
      log: (level: string, ...args: any[]) => void;
    };
    seenreq?: any;
    headers?: request.Headers;
    callback?: (err: Error, res: CrawlerRequestResponse, done: () => void) => void;
    [x: string]: any;
  }

  export default Crawler;
}
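
The repeated on(...) signatures above are function overloads keyed on string literal channel names, so the channel picks the listener's parameter types. Here is a minimal self-contained sketch of that technique. TinyCrawlerEvents and RequestOptions are hypothetical stand-ins, not part of crawler, and the tiny emitter replaces Node's events.EventEmitter so that nothing needs to be imported.

```typescript
// Hypothetical stand-in for the request options type used in the thread.
interface RequestOptions {
  uri?: string;
}

type AnyListener = (...args: any[]) => void;

class TinyCrawlerEvents {
  private listeners: { [channel: string]: AnyListener[] } = {};

  // Overload signatures: the literal channel name selects the listener type.
  on(channel: "request", listener: (options: RequestOptions) => void): this;
  on(channel: "limiterChange", listener: (options: RequestOptions, limiter: string) => void): this;
  on(channel: "drain", listener: () => void): this;
  // Implementation signature: callers only see the overloads above.
  on(channel: string, listener: AnyListener): this {
    const existing = this.listeners[channel] || [];
    existing.push(listener);
    this.listeners[channel] = existing;
    return this;
  }

  emit(channel: string, ...args: any[]): void {
    for (const listener of this.listeners[channel] || []) {
      listener(...args);
    }
  }
}

const log: string[] = [];
const emitter = new TinyCrawlerEvents()
  // `options` is inferred as RequestOptions from the "request" overload.
  .on("request", (options) => log.push("request " + options.uri))
  // `limiter` is inferred as string from the "limiterChange" overload.
  .on("limiterChange", (_options, limiter) => log.push("limiter " + limiter))
  .on("drain", () => log.push("drain"));

emitter.emit("request", { uri: "http://example.com" });
emitter.emit("limiterChange", {}, "slow");
emitter.emit("drain");
```

A channel name that matches none of the overloads, such as on("finished", ...), would be rejected at compile time in this sketch, which is the main benefit over a single on(channel: string, ...) signature.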

@mike442144
Collaborator

@raynor85 @wakhh thanks for your great work. I'd be glad to merge your pull request.

@vczh

vczh commented Sep 11, 2019

Has this TypeScript declaration file been published to npm?

@mike442144
Collaborator

@vczh Nope, I don't know much about TypeScript. Are you confident enough to add this feature yourself?

@vczh

vczh commented Sep 18, 2019

@mike442144 Maybe this is a good chance to learn by rewriting the code in TypeScript. You just need to add type annotations; you don't need to change the logic :)

The problem for me is that I am not the author of this library, so how could I know whether the type description is actually correct?

@mike442144
Collaborator

@vczh My issue is that I'm too busy to do this, so do not count on me. If you're using crawler, this is a good chance to do it and help others : ).

If you're not confident in the type definitions, we can discuss them in this issue.

@vczh

vczh commented Sep 21, 2019

@mike442144 I tried this library before with @wakhh's work in this thread and it worked fine. I only used a very small subset of the features, so I am not sure the types are complete. They only need a small update to remove warnings.

I was trying to write a tool to download my own website, which is hosted locally on my computer, but I found that this library doesn't parse the HTML and therefore doesn't make recursive requests. Maybe I was wrong, but I could not find related information in the documentation, so I turned to website-scraper. Unfortunately, I've deleted my update to the types.

@mike442144
Collaborator

@vczh I'm sure you missed something. crawler provides almost all the functions you need, but it takes some effort to compose and organize them. You're welcome to come back : )
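
To make "compose and organize" a bit more concrete, a recursive crawl is usually a queue plus a visited set: each handled page contributes newly discovered links back to the queue. The sketch below shows only that composition and makes no crawler calls. The in-memory pages map, extractLinks, and crawlAll are hypothetical names; with the library, the equivalent step would presumably be reading links from res.$ inside the callback and passing them to queue.

```typescript
// Hypothetical in-memory "site": path -> HTML. A real crawl would fetch each
// page over HTTP and read its links from the response instead.
const pages: Record<string, string> = {
  "/": '<a href="/a">A</a> <a href="/b">B</a>',
  "/a": '<a href="/b">B</a> <a href="/">home</a>',
  "/b": '<a href="/c">C</a>',
  "/c": "no links here",
};

// Naive href extraction, good enough for the sketch; not a real HTML parser.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const pattern = /href="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const href = match[1];
    if (href !== undefined) links.push(href);
  }
  return links;
}

// Breadth-first traversal: every page is handled once, and links found on a
// page are queued only if they have not been queued before.
function crawlAll(start: string): string[] {
  const visited: string[] = [];
  const queued = new Set<string>([start]);
  const pending: string[] = [start];
  while (pending.length > 0) {
    const path = pending.shift() as string;
    visited.push(path);
    for (const link of extractLinks(pages[path] ?? "")) {
      if (!queued.has(link)) {
        queued.add(link);
        pending.push(link);
      }
    }
  }
  return visited;
}

const order = crawlAll("/");
console.log(order.join(" "));
```

The visited-set check is what keeps cycles such as "/a" linking back to "/" from looping forever; the skipDuplicates option listed in the type definitions above looks like the library's counterpart to it.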

@fanhuahaitang

fanhuahaitang commented Oct 11, 2019

I updated the TS code directly in my last reply and fixed the error in the jquery key.
But I don't use it now; I rewrote a crawler API library to crawl files, and I use the npm package promise-queue to control it.
https://www.npmjs.com/package/crawler-api

@pzmarzly
Contributor

pzmarzly commented Oct 12, 2020

I updated the types a bit, and pushed them to DefinitelyTyped/DefinitelyTyped#48692

It would be nice if someone could review or test them in their project (the simplest way is to copy index.d.ts to your project, and change its extension to just .ts).

EDIT: Also, tell me if you want to be added to Definition Owners (by putting your GitHub username in the Authors line of the DefinitelyTyped definitions).

EDIT 2: Now available as @types/crawler.
