SaLS: Semi-automatic Literature Survey

This project implements SaLS, a semi-automatic tool for surveying research papers based on the systematic methodology proposed by Kitchenham et al. [1, 2]. Its goal is to semi-automate the literature survey process while providing a framework that makes surveys reproducible and able to evolve. A use case of SaLS following this methodology can be found here (under review).

SaLS automatically retrieves paper metadata based on queries that users provide. These queries are sent to the search APIs exposed by the most popular research paper repositories in different domains. Currently, SaLS retrieves paper information from the following repositories:

The retrieved metadata includes the paper identifier (e.g., DOI), publisher, publication date, title, URL, and abstract.
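As an illustration of the fields listed above, one retrieved paper could be represented as a small record. This is only a sketch: the `PaperRecord` class and its field names are hypothetical and are not taken from the SaLS code, and the optional abstract is an assumption. The example values come from reference [1].

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PaperRecord:
    """Hypothetical container for the metadata fields listed above."""
    identifier: str                 # e.g., a DOI
    publisher: str
    publication_date: str           # kept as text; date formats may differ per repository
    title: str
    url: str
    abstract: Optional[str] = None  # assumption: an abstract may be missing


# Example record built from reference [1].
record = PaperRecord(
    identifier="10.1016/j.infsof.2013.07.010",
    publisher="Elsevier",
    publication_date="2013",
    title="A systematic review of systematic review process research in software engineering",
    url="https://doi.org/10.1016/j.infsof.2013.07.010",
)

# A plain dict is convenient for writing rows to CSV or JSON.
row = asdict(record)
```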

SaLS merges paper information from different repositories, and then applies customised syntactic and semantic filters (i.e., semantic search) [3] to reduce the search space of papers according to users' interests.
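The merge and syntactic filtering steps can be sketched roughly as follows. This is a minimal illustration, not the SaLS implementation: the function names, the dictionary keys, the deduplication rule (same DOI or same normalised title), and the sample papers are all assumptions made for the example. The semantic filter is not shown.

```python
import re


def normalise_title(title):
    """Lower-case a title and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_records(*sources):
    """Merge lists of paper dicts, dropping papers already seen by DOI or by title."""
    seen_dois, seen_titles, merged = set(), set(), []
    for source in sources:
        for paper in source:
            doi = (paper.get("doi") or "").lower()
            title = normalise_title(paper["title"])
            if (doi and doi in seen_dois) or title in seen_titles:
                continue  # duplicate of a paper retrieved from another repository
            if doi:
                seen_dois.add(doi)
            seen_titles.add(title)
            merged.append(paper)
    return merged


def syntactic_filter(papers, keywords):
    """Keep papers whose title or abstract contains at least one keyword."""
    kept = []
    for paper in papers:
        text = f"{paper['title']} {paper.get('abstract') or ''}".lower()
        if any(keyword.lower() in text for keyword in keywords):
            kept.append(paper)
    return kept


# Made-up records standing in for results from two repositories.
source_a = [{"doi": None, "title": "Data-Oriented Architectures: A Survey",
             "abstract": "We review machine learning systems."}]
source_b = [{"doi": "10.1000/example.1", "title": "Data-oriented architectures: a survey",
             "abstract": "We review machine learning systems."},
            {"doi": "10.1000/example.2", "title": "Compiler Optimisation Tricks",
             "abstract": "Loop unrolling in practice."}]

merged = merge_records(source_a, source_b)
filtered = syntactic_filter(merged, ["machine learning"])
```

Here `merged` keeps two papers, because the second source repeats the survey under a different capitalisation, and `filtered` keeps only the paper that mentions the keyword.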

Once the automatic filters have been applied, the tool displays the title and abstract of each paper in a centralised interface where users decide whether the paper should be included in the review (i.e., papers filtered by abstract). The URL of each paper that passed the filter by abstract is then displayed in the last filter, which requires the user to skim the full paper and decide whether it is included or not.

Then, the tool applies the snowballing step: it retrieves the metadata of the works that cite the papers selected in the last step (i.e., papers filtered by skimming the full text), and applies the automatic and semi-automatic filters to the citing papers.

The final list of papers is composed of the cited papers that passed the first round of filters and the citing papers that passed the second round of filters (i.e., snowballing).
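The two rounds described above can be summarised in a short sketch. It is an illustration under stated assumptions, not the SaLS code: `survey`, `run_filters`, and `fetch_citing` are hypothetical names, the manual decisions (by abstract and by full text) are collapsed into a single callable, and the citation data is an in-memory stand-in for a repository API.

```python
def run_filters(papers, automatic_filter, manual_filter):
    """Apply the automatic filter first, then the user's decision, to each paper."""
    return [p for p in papers if automatic_filter(p) and manual_filter(p)]


def survey(seed_papers, fetch_citing, automatic_filter, manual_filter):
    """Return first-round papers plus citing papers that pass the second round."""
    first_round = run_filters(seed_papers, automatic_filter, manual_filter)

    # Snowballing: collect the works that cite the selected papers, skipping repeats.
    seen = {p["id"] for p in first_round}
    citing = []
    for paper in first_round:
        for candidate in fetch_citing(paper):
            if candidate["id"] not in seen:
                seen.add(candidate["id"])
                citing.append(candidate)

    second_round = run_filters(citing, automatic_filter, manual_filter)
    return first_round + second_round


# Made-up data: paper A is cited by C and D; B is off-topic.
seeds = [{"id": "A", "title": "ML systems survey"},
         {"id": "B", "title": "Cooking recipes"}]
citations = {"A": [{"id": "C", "title": "ML systems in production"},
                   {"id": "D", "title": "Gardening"}]}

final = survey(
    seeds,
    fetch_citing=lambda p: citations.get(p["id"], []),
    automatic_filter=lambda p: "ml" in p["title"].lower(),
    manual_filter=lambda p: True,  # stands in for the user's include/exclude decisions
)
final_ids = [p["id"] for p in final]
```

With this data, `final_ids` contains `A` from the first round and `C` from the snowballing round.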

Requirements

Some of the APIs provided by the repositories require an access key. You should request a key from each repository you want to include in your search. Each repository has its own steps to apply for a key, as follows:

Alternatively, you can use the tool to request papers from arXiv, which is open and does not need an access key. SaLS does not have control over the maintenance of the APIs. If an API produces an error, you can see the details in the log files. We recommend that you stop using an API that produces errors for a while.
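To show what a key-free request to arXiv looks like, the sketch below builds a query URL for the public arXiv API endpoint. It only constructs the URL and does not send a request. How SaLS itself builds its arXiv queries is not described here, so the helper name `build_arxiv_url` and the way the terms are combined are assumptions.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; no access key is required.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(terms, start=0, max_results=10):
    """Build an arXiv API URL that searches all fields for every given term."""
    search_query = " AND ".join(f'all:"{term}"' for term in terms)
    params = {
        "search_query": search_query,
        "start": start,              # offset of the first result
        "max_results": max_results,  # page size
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url(["machine learning", "data-oriented"])
```

The response to such a request is an Atom feed, which a client would then parse to extract the metadata of each entry.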

How to run it?

The following instructions were tested on Windows PowerShell, Windows Subsystem for Linux (WSL), and an Ubuntu machine with Python 3.8.

Clone this repository

git clone https://github.com/cabrerac/semi-automatic-literature-survey.git

cd semi-automatic-literature-survey/

Create and activate virtual environment

For Linux distributions

python -m venv venv

source venv/bin/activate

For Windows

python -m venv ./venv

./venv/Scripts/activate

Install requirements

pip install -r requirements.txt

Install the language package for spaCy

python -m spacy download en_core_web_sm

Create a file ./config.json that will store the API access keys for the repositories you want to use. The file should have the following format:

{
  "api_access_core": "CORE_API_ACCESS_KEY",
  "api_access_ieee": "IEEE_API_ACCESS_KEY",
  "api_access_springer": "SPRINGER_API_ACCESS_KEY",
  "api_access_elsevier": "ELSEVIER_API_ACCESS_KEY",
  "api_access_semantic_scholar": "SEMANTIC_SCHOLAR_API_ACCESS_KEY"
}

Ignore this step if you are testing the tool with arXiv. Also, you should only add the access keys of the repositories you want to use.
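A loader that tolerates a missing file or a partial set of keys could look like the sketch below. It is not the SaLS implementation: the function name `load_api_keys` is hypothetical, and treating a missing file as "no keys, arXiv only" is an assumption based on the note above. The key names are the ones from the format shown earlier.

```python
import json
import tempfile
from pathlib import Path

# Key names taken from the config.json format above.
KNOWN_KEYS = (
    "api_access_core",
    "api_access_ieee",
    "api_access_springer",
    "api_access_elsevier",
    "api_access_semantic_scholar",
)


def load_api_keys(path="config.json"):
    """Return the access keys present in the file, or {} if the file is missing."""
    config_file = Path(path)
    if not config_file.exists():
        return {}  # assumption: no file means only key-free repositories are used
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Keep only recognised, non-empty entries.
    return {k: v for k, v in config.items() if k in KNOWN_KEYS and v}


# Demonstration with a temporary file that contains a single key.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(
        json.dumps({"api_access_ieee": "IEEE_API_ACCESS_KEY"}), encoding="utf-8"
    )
    keys = load_api_keys(config_path)
    no_keys = load_api_keys(Path(tmp) / "missing.json")
```

A caller could then skip any repository whose key is absent from the returned dictionary.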

Run main.py, passing the search parameters file as an argument. For example:

python main.py parameters_ar.yaml

A simple, self-explanatory example of a search parameters file can be found in ./parameters_ar.yaml. Alternatively, a parameters file that includes syntactic and semantic filters can be found in ./parameters_sys.yaml.

A description of the semi-automatic methodology applied in a survey can be found in the paper "Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective" [4].

References

[1] Barbara Kitchenham and Pearl Brereton. 2013. A systematic review of systematic review process research in software engineering. Information and Software Technology 55, 12 (2013), 2049–2075. https://doi.org/10.1016/j.infsof.2013.07.010

[2] Barbara Kitchenham and Stuart Charters. 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007-001. Keele University and Durham University Joint Report. https://www.elsevier.com/__data/promis_misc/525444systematicreviewsguide.pdf

[3] SBERT.net Sentence Transformers. 2024. Semantic Search. Available online.

[4] Christian Cabrera, Andrei Paleyes, Pierre Thodoroff, and Neil D. Lawrence. 2023. Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective. arXiv preprint arXiv:2302.04810. Available online
