Example TFX Pipeline for Text Classification

Who is this for?

While trying to build my own production ML pipeline, I found it hard to understand what was happening under the hood. I found a few examples, but none of them were clear enough, so I decided to build a couple of my own for myself and anybody else who may need them.

The Basics

The fundamentals are fairly simple. Building a model involves several steps:

Find the data
Evaluate the data
Preprocess the data
Train the model with the data
Save the trained model with serving signatures
Push the saved model into production

The code is documented, so reading it should make clear what each component is doing.

This example is illustrative, not exhaustive. Several other components can be attached to the pipeline for completeness, but this is a sufficiently deep overview to get started.

Data

You can use any data in CSV format with two columns: label and text.
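To make the expected layout concrete, the sketch below writes and reads back a tiny CSV with the two columns label and text. The helper names, the file name reviews.csv, the integer labels, and the sample sentences are all made up for illustration; only the column names come from the description above.

```python
import csv
from pathlib import Path


def write_dataset(directory, rows, filename="reviews.csv"):
    """Write (label, text) pairs to a CSV file with a header row."""
    path = Path(directory) / filename
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["label", "text"])  # column order: label, text
        writer.writerows(rows)              # commas in text are quoted
    return path


def read_dataset(path):
    """Read the CSV back as a list of dicts keyed by column name."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Using the csv module rather than joining strings by hand matters here because free text often contains commas and quotes, which the writer escapes automatically.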

If you want to use a different file format, you will need to make the corresponding changes to the ExampleGen component in the file pipeline.py.

If you wish to carry out more preprocessing (e.g. stripping HTML tags, reducing punctuation, one-hot encoding), add the needed steps to the function preprocessing_fn() in the file transform_file.py.
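One way to try out cleaning rules before touching the pipeline is to prototype the regular expressions in plain Python, as in the sketch below. The function clean_text and its patterns are hypothetical and are not part of transform_file.py. Inside preprocessing_fn() the inputs are tensors, so the same patterns would most likely need to be applied with TensorFlow string ops such as tf.strings.regex_replace and tf.strings.lower rather than with the re module.

```python
import re

# Patterns kept simple so they should also be valid in RE2-style engines.
TAG_PATTERN = r"<[^>]+>"                    # any HTML-like tag
REPEATED_PUNCT_PATTERN = r"([!?.,;:])[!?.,;:]+"  # runs of punctuation
WHITESPACE_PATTERN = r"\s+"                 # runs of whitespace


def clean_text(text):
    """Strip tags, collapse punctuation runs and whitespace, lowercase."""
    # Replace tags with a space so adjacent words are not glued together.
    text = re.sub(TAG_PATTERN, " ", text)
    # Keep only the first character of a run of punctuation marks.
    text = re.sub(REPEATED_PUNCT_PATTERN, r"\1", text)
    # Collapse any run of whitespace into a single space.
    text = re.sub(WHITESPACE_PATTERN, " ", text)
    return text.strip().lower()
```

Replacing tags with a space instead of an empty string is a deliberate choice: it avoids merging words separated only by a tag, at the cost of sometimes leaving a space before trailing punctuation.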

Save the file in a directory, and specify the location of that directory in the variable _data_root in the file pipeline.py.

Model

If you want to change the model, go to the file trainer_file.py and modify the function build_keras_model() as required.

How the training session runs, including parallelisation and data ingestion options, can also be changed in the trainer_file.py module.

Execution

After setting all the global variables, downloading the data, and creating the directories the pipeline needs, run the file pipeline.py with the command below. Python 3 is recommended.

>>python3 pipeline.py

This automatically triggers the pipeline execution.
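Before running the command above, the working directories have to exist. The sketch below shows one possible way to create them. The directory names (data, pipeline_root, metadata, serving_model) and the helper prepare_workspace are assumptions for illustration; use whatever locations your global variables in pipeline.py point to.

```python
from pathlib import Path


def prepare_workspace(base_dir,
                      names=("data", "pipeline_root", "metadata",
                             "serving_model")):
    """Create one sub-directory per requirement and return their paths."""
    base = Path(base_dir)
    created = {}
    for name in names:
        path = base / name
        # exist_ok=True makes the call safe to repeat on later runs.
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

The returned mapping can be used to fill in the global variables, for example by pointing _data_root at the path stored under the "data" key.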

Please make sure the following libraries are installed:

TensorFlow: pip3 install tensorflow
TensorFlow Extended: pip3 install tfx
TFX BSL: pip3 install tfx-bsl

Interactive Execution

At times we want to run the pipeline component by component instead of all at once, for example to debug or to test each component separately.

To achieve this, we can use the InteractiveContext class, made available by the following import: from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

Once it is imported, the notebook extension needs to be loaded separately in any IPython notebook with the following command: %load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip

Then, we need to create an InteractiveContext object with context = InteractiveContext()

To run any of the components, use the context.run(Component) command. Note that the order of execution remains the same: if a component depends on another component, you need to run that component first.
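When running components one at a time, it can help to write down which component depends on which and derive a safe order from that. The sketch below does this with the standard library's graphlib (Python 3.9+). The component names and the dependency map are assumptions based on a typical TFX setup, not something read from pipeline.py.

```python
from graphlib import TopologicalSorter

# Hypothetical map: component -> components whose outputs it consumes.
DEPENDENCIES = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Pusher": {"Trainer"},
}


def execution_order(dependencies):
    """Return an order in which every component follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(DEPENDENCIES), start=1):
        print(step, name)
```

In a notebook, each name in the resulting order would correspond to one context.run(...) call on the matching component object.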

microcoder-py/example-tfx-pipeline-text-classifier
