
rock-classification

Rock classification.

Reproducibility

This section presents information and steps for reproducing our work. Unfortunately, we are not able to share the dataset we employed.

Information files

Before running the code, some files describing the dataset must be provided. These files should be located in ./data/ or properly configured for another prefix directory in the envconfig.py module through the DATA_PREFIX variable.
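
As a minimal sketch of that configuration (the path below is illustrative only):

```python
# envconfig.py -- illustrative sketch: point DATA_PREFIX to the directory
# that holds the dataset and the information files described below.
DATA_PREFIX = '/path/to/data/'
```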

./data/CLASSES.txt

This file should contain the names of all the classes in the dataset.
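
For example, assuming one class name per line (the names below are hypothetical):

```
sandstone
shale
limestone
```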

./data/WELLS.txt

This file should contain the names of all wells in the dataset.
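
Analogously, assuming one well identifier per line (hypothetical names):

```
WELL-A
WELL-B
WELL-C
```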

./data/DATASET_VERSIONS.csv

Should be a CSV file specifying two columns: version and dirname. version should be an integer and dirname should be the name of the directory inside ./data/ that contains part of the dataset. This is useful when multiple datasets or multiple parts of the dataset are maintained; these parts are treated as “versions” in this case.
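
A possible layout, assuming a header row and hypothetical directory names:

```csv
version,dirname
1,dataset_part1
2,dataset_part2
```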

./data/LABELS_ASSOCIATION.csv

Should be a CSV file specifying three columns: well, depth, and label. It assumes that every image in the dataset is associated with a well (well, as in ./data/WELLS.txt) and with a depth (depth), so the label (label) is associated with those two pieces of information. label should be specified by name, as in ./data/CLASSES.txt. Note: The code assumes that well and depth can be extracted from the file name of each file in the dataset. See ./data/NUMBER_TO_WELL.csv for more information.
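
A possible layout, assuming a header row (the wells, depths, and labels below are hypothetical):

```csv
well,depth,label
WELL-A,1234.5,sandstone
WELL-A,1236.0,shale
WELL-B,987.0,limestone
```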

./data/NUMBER_TO_WELL.csv

Should be a CSV file specifying two columns: number and well. The file names in our employed dataset contain a number associated with each well instead of the well identifier itself. This file specifies a way to map those numbers to the respective wells (as in ./data/WELLS.txt).
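
A possible layout, assuming a header row and hypothetical values:

```csv
number,well
1,WELL-A
2,WELL-B
3,WELL-C
```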

./data/IGNORED_FILEPATHS.txt

This file should list the file paths of all images that should be excluded from the dataset. In the case of our work, these are the images that contain broken rock drill plugs.
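
For example, one file path per line (the path below is hypothetical):

```
./data/dataset_part1/long/WELL-A/broken_plug_image.png
```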

./data/LABELS_MANUAL.csv

Should be a CSV file specifying two columns: filepath and index. The filepath column should list the files of the test set dedicated to manual classification, and the index column should contain indices ranging from 0 to the number of images minus one.
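
A possible layout, assuming a header row and hypothetical file paths:

```csv
filepath,index
./data/dataset_part1/long/WELL-A/img_0001.png,0
./data/dataset_part1/top/WELL-B/img_0042.png,1
```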

Create environment

This repository contains Dockerfiles that can be employed with Docker to run the software in a pre-configured environment. To open an environment through Docker, run make docker. It will create the environment specified in Dockerfiles/nvidia-pytorch-23-05-py3. Check the MakefileDocker file for details.

Another alternative is to use Conda to create the environment. Make sure the conda command is installed on your system. In that case, run make conda-env-update to create the environment, which reads information from environment.yaml. Then run conda activate rock-classification to activate the environment. Check the MakefileConda file for details.
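
In summary, the two options described above boil down to:

```sh
# Option 1: Docker environment (Dockerfiles/nvidia-pytorch-23-05-py3)
make docker

# Option 2: Conda environment (reads environment.yaml)
make conda-env-update
conda activate rock-classification
```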

Location for dataset

The dataset should be located in ./data/ or properly configured for another prefix directory in the envconfig.py module through the DATA_PREFIX variable. Each subdirectory inside ./data/ should represent a “version” (or part) of the dataset. In the case of our work, each of those subdirectories contains subdirectories for the type of image (long or top view), each of which in turn contains subdirectories associated with the different wells; the images themselves are located inside each well subdirectory.
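
A sketch of the expected layout (directory and file names are illustrative):

```
./data/
├── dataset_part1/          # a "version" listed in DATASET_VERSIONS.csv
│   ├── long/               # image type: long view
│   │   ├── WELL-A/         # one subdirectory per well, containing the images
│   │   └── WELL-B/
│   └── top/                # image type: top view
│       ├── WELL-A/
│       └── WELL-B/
└── dataset_part2/
    └── ...
```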

Masks for the dataset

The masks for each image should be copied to ./data/ in such a way that each mask sits “together” with its original image. Each mask file should have the same “basename” as the respective image file, with _mask.png appended to it.
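
For instance, assuming the basename excludes the extension (the image name below is illustrative):

```
./data/dataset_part1/long/WELL-A/img_0001.png        # original image
./data/dataset_part1/long/WELL-A/img_0001_mask.png   # corresponding mask
```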

Modules for running

The modules that can be run for reproducing the experiments are specified in the Makefile. They are split_dataset.py, generate_patches.py, finetune.py, finetuned_extraction.py, classification.py, and gen_table_of_results.py. Each module depends on the previous one having been run, in that order: split_dataset.py is required by generate_patches.py, which is required by finetune.py, and so on up to gen_table_of_results.py.
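
In terms of the Makefile targets described in the following subsections, the full pipeline therefore runs in this order:

```sh
make split_dataset          # partition the dataset
make generate_patches       # extract patches for the chosen partition
make finetune               # fine-tune the pretrained network
make finetuned_extraction   # extract features with the fine-tuned network
make classification         # run the classifiers
make gen_table_of_results   # print the collected results
```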

The configuration for the experiment to be run, such as patch size, network architecture, etc., should be specified in experiments/main.py.
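
As a rough, hypothetical sketch of the kind of options it controls (apart from raw_dataset and partition, which are discussed below, the names are placeholders and must match the actual module):

```python
# experiments/main.py -- illustrative sketch only; option names other than
# raw_dataset and partition are hypothetical placeholders.
partition = 0              # dataset partition to use
patch_size = 224           # size of the extracted patches
architecture = 'resnet50'  # pretrained PyTorch network to fine-tune
raw_dataset = False        # False: train on pre-extracted patches; True: random patch per loaded image
```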

Prepare 16 partitions of the dataset (module split_dataset.py)

Although 16 partitions are prepared, we currently employ only the first 8 for the experiments. To perform the split into the 16 partitions, run make split_dataset. (It should take less than 3 minutes to run.)

Details: Running this module creates the file records/partitions.db, which stores the information for the partitions.

Generate patches (module generate_patches.py)

Once the previous steps have been accomplished, patches can be extracted for each partition through make generate_patches. The partition should be specified in experiments/main.py. Alternatively, the partition can be specified on the command line by executing ./generate_patches.py --partition <N>, in which <N> is the partition number.

Note: When True is specified for the raw_dataset option in experiments/main.py, these patches are not used for training the networks, but they are still used for validation throughout the training process as well as for testing. When raw_dataset is False, the pre-extracted patches of the training part are used as training examples; otherwise, a random patch is extracted from each loaded image.

Note: Even when the partition is specified through the command line for patch generation, the other options specified in experiments/main.py are maintained.

Details: Running this module creates the files records/patches_*.db, which store the extracted patches.

Fine-tuning (module finetune.py)

Once the patches are extracted, fine-tuning of pretrained PyTorch networks can be run through make finetune.

Details: Running this module saves the trained network in the checkpoints/ directory, creating it if it does not yet exist.

Feature extraction (module finetuned_extraction.py)

Once a network is trained, the features associated with each patch can be extracted with make finetuned_extraction.

Details: Running this module saves the extracted features in features/features_*.db files. There will be one file for each of the train, validation, and test parts.

Classification (module classification.py)

After feature extraction, classifiers can be run with make classification.

Details: Running this module saves the results from the classifiers in results/classification_results.db for the metrics implemented in rockmeasures.py and specified in experiments/main.py. Also, to facilitate further post-processing, it saves the prediction vector and the corresponding ground-truth label vector in results/classification_predicts.db.

Printing results (module gen_table_of_results.py)

Once the classification is performed, the saved results can be read and printed on the screen through make gen_table_of_results.
