
🇷🇴 A performant, battle-tested scraper for dexonline.ro to fetch information about words in the Romanian language.

vxern/dexonline-scraper

A tiny, battle-tested, performant and documented scraper for dexonline.ro.

Usage

To start using the scraper, install it with the following command:

```shell
npm install dexonline-scraper
```

The simplest way of using the scraper is as follows:

```typescript
import * as Dexonline from "dexonline-scraper";

const results = await Dexonline.get("word");
```

Alternatively, you can parse the website's HTML directly, bypassing the fetch step. Note that, unlike `get()`, `parse()` is synchronous:

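```typescript
import * as Dexonline from "dexonline-scraper";

// `html` is a string holding the HTML of a dexonline.ro page you have already fetched.
const results = Dexonline.parse(html);
```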

You can configure the mode the parser uses to match results to the search term. Setting `mode` to `"strict"` ensures that only terms identical to the search term are returned:

```typescript
import * as Dexonline from "dexonline-scraper";

const results = await Dexonline.get("word", { mode: "strict" });
```
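As a rough illustration of what strict matching means, the sketch below keeps only results that are identical to the search term when the mode is `"strict"`. The behavior of the non-strict mode is an assumption made purely for illustration (here, ignoring letter case and diacritics); `matchesSearchTerm`, `stripDiacritics`, `MatchMode` and the `"lenient"` value are hypothetical names, not part of `dexonline-scraper`.

```typescript
// Hypothetical sketch: this is NOT the dexonline-scraper implementation.
type MatchMode = "strict" | "lenient"; // "lenient" is an assumed name for illustration.

// Remove diacritics by decomposing to NFD and dropping combining marks (U+0300 to U+036F).
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Decide whether a result term should be kept for the given search term.
function matchesSearchTerm(term: string, searchTerm: string, mode: MatchMode): boolean {
  if (mode === "strict") {
    // Strict: only terms identical to the search term are kept.
    return term === searchTerm;
  }
  // Assumed lenient behavior: ignore letter case and diacritics.
  return stripDiacritics(term).toLowerCase() === stripDiacritics(searchTerm).toLowerCase();
}

const terms = ["mărul", "marul", "Mărul"];
console.log(terms.filter((term) => matchesSearchTerm(term, "mărul", "strict")));
console.log(terms.filter((term) => matchesSearchTerm(term, "mărul", "lenient")));
```

With these assumptions, strict mode keeps only `"mărul"`, while the lenient variant keeps all three spellings.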

You can modify the results returned by Dexonline using flags:

```typescript
import * as Dexonline from 'dexonline-scraper';
import { DictionaryFlags } from 'dexonline-scraper';

const results = await Dexonline.get('word', {
  flags:
    DictionaryFlags.UseCedillas | // Use 'ş' and 'ţ' instead of 'ș' and 'ț'.
    DictionaryFlags.MatchDiacritics | // Do not return words where the only difference is a diacritic.
    DictionaryFlags.UsePreReformOrthography | // Use 'î' instead of 'â' in all cases except for the word 'român' and its derivatives.
    DictionaryFlags.SearchOnlyNormativeDictionaries, // Return results obtained only from the DEX and/or the DOOM.
});
```
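The flags are combined with the bitwise OR operator (`|`), which suggests that each flag occupies its own bit. The following sketch illustrates that pattern together with simplified versions of two orthographic transformations: each one is applied only when its bit is set, which is checked with `&`. It is not the library's implementation: `ExampleFlags`, its bit values and the helper functions are hypothetical, and the pre-reform rule is reduced to the 'î'/'â' substitution described in the flag comments.

```typescript
// Hypothetical sketch: these flag values and helpers are NOT taken from dexonline-scraper.
const ExampleFlags = {
  None: 0,
  UseCedillas: 1 << 0,
  UsePreReformOrthography: 1 << 1,
} as const;

// Comma-below letters (the modern standard forms) mapped to their cedilla counterparts.
const commaToCedilla: Record<string, string> = {
  "ș": "ş", // U+0219 -> U+015F
  "Ș": "Ş", // U+0218 -> U+015E
  "ț": "ţ", // U+021B -> U+0163
  "Ț": "Ţ", // U+021A -> U+0162
};

function useCedillas(text: string): string {
  return text.replace(/[șȘțȚ]/g, (letter) => commaToCedilla[letter] ?? letter);
}

// Simplified pre-reform rule: write 'î' instead of 'â', except in words containing 'român'.
// Assumes NFC input; other differences of the pre-reform orthography are not handled.
function usePreReformOrthography(text: string): string {
  return text.replace(/\p{L}+/gu, (word) =>
    /român/i.test(word) ? word : word.replace(/â/g, "î").replace(/Â/g, "Î")
  );
}

// Apply each transformation only if its bit is set in `flags`.
function applyOrthography(text: string, flags: number): string {
  let result = text;
  if (flags & ExampleFlags.UseCedillas) result = useCedillas(result);
  if (flags & ExampleFlags.UsePreReformOrthography) result = usePreReformOrthography(result);
  return result;
}

console.log(applyOrthography("mâine știu", ExampleFlags.UseCedillas | ExampleFlags.UsePreReformOrthography));
```

Under these assumptions, combining both example flags turns `"mâine știu"` into `"mîine ştiu"`, while a word such as `"românește"` keeps its 'â'.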