Oracle Cloud Infrastructure Data Science and AI services Examples

The Oracle Cloud Infrastructure (OCI) Data Science service has created this repo to provide demos, tutorials, and code examples that highlight features of the OCI Data Science service and AI services. We welcome your feedback and would like to know which content is useful and which content is missing; open an issue to tell us. We know that many of you are creating great content, and we would like to help you share it. See the contributions document.

Oracle Cloud Infrastructure (OCI) Data Science Services provide a powerful suite of tools for data scientists, enabling faster machine learning model development and deployment. With features like the Accelerated Data Science (ADS) SDK, distributed training, batch processing and machine learning pipelines, OCI Data Science Services offer the scalability and flexibility needed to tackle complex data science and machine learning challenges. Whether you're a beginner or an experienced machine learning practitioner or data scientist, OCI Data Science Services provide the resources you need to build, train, and deploy your models with ease.

Topics

Notebook Examples

The Accelerated Data Science (ADS) SDK is a data scientist-friendly library that speeds up common data science tasks and provides an interface to other OCI services. In this section, we provide JupyterLab notebooks that offer tutorials on how to use ADS. For example, the vault.ipynb notebook demonstrates how easy it is to store your secrets in the OCI Vault service.

Conda Environment Notebooks

The OCI Data Science service uses conda environments to manage available libraries that a notebook can use. OCI Data Science provides several conda environments designed to offer the best libraries for common data science tasks. Each family of conda environments has notebooks that demonstrate how to perform various data science tasks. This section is organized around these conda environment families and provides notebooks to help you get started quickly.

Labs

This section provides examples of how to train machine learning models and deploy them on the OCI Data Science service, making it ideal for anyone looking to walk through an end-to-end problem.

Large Language Models

OCI Data Science supports LLMs in several ways:

Fine-tune, deploy and evaluate without writing code via AI Quick Actions
LangChain integration via ADS
Directly by coding Python in the service. You can find more information here

Model Catalog Examples

The Model Catalog offers a managed and centralized storage space for models. ADS helps you create the artifacts you need to use this service. However, you must provide a score.py file that loads the model and a function that makes predictions. The runtime.yaml provides information about the runtime conda environment if you want to deploy the model. You can also document a comprehensive set of metadata about the provenance of the model. This section provides examples of how to create your score.py and runtime.yaml files for various common machine learning models and configurations.
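As a rough illustration of the score.py shape described above, the sketch below defines one function that loads the model and one that makes predictions. The function names load_model and predict follow the pattern used by the examples in this section, but the file name model.pkl, the dictionary-based "model", and the return format are assumptions made for the sketch; check the examples for your framework before relying on them.

```python
"""Illustrative score.py sketch; not a definitive template."""
import os
import pickle

# Hypothetical artifact name: use whatever file your training code saved.
MODEL_FILE_NAME = "model.pkl"


def load_model(model_dir="."):
    """Load the serialized model from the artifact directory."""
    with open(os.path.join(model_dir, MODEL_FILE_NAME), "rb") as f:
        return pickle.load(f)


def predict(data, model=None):
    """Score a list of feature rows and return a JSON-serializable dict.

    The "model" here is assumed to be a plain dict holding linear
    weights and a bias, standing in for a real trained estimator.
    """
    if model is None:
        model = load_model()
    weights, bias = model["weights"], model["bias"]
    scores = [sum(w * x for w, x in zip(weights, row)) + bias for row in data]
    return {"prediction": scores}
```

With a real estimator, the body of predict would typically call the estimator's own prediction method instead of computing the weighted sum by hand.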

Jobs

Oracle Cloud Infrastructure (OCI) Data Science Jobs is a powerful tool that allows you to define and run repeatable machine learning tasks on a fully managed infrastructure. With Jobs, you have the flexibility to apply custom tasks to meet your specific use cases, such as data preparation, model training, hyperparameter optimization, batch inference, large model training and more.

On-demand jobs and batch processing are especially important for businesses that need to process large volumes of data on a regular basis. By defining jobs and scheduling them to run at specific times, companies can automate their data processing workflows, reduce manual intervention and the errors that come with it, and avoid the cost of keeping compute resources running for extended periods. Because the infrastructure is fully managed, it also helps businesses keep their data processing workflows secure and compliant with industry regulations. Overall, OCI Data Science Jobs helps businesses scale their machine learning workflows and improve their data processing capabilities.
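A repeatable batch task of the kind described above is usually written as a script whose settings come from outside the code. The sketch below assumes the settings arrive as command-line arguments or environment variables; the names INPUT_PATH and BATCH_SIZE, the default values, and the stand-in record list are all hypothetical and only illustrate the pattern.

```python
import argparse
import os


def parse_config(argv=None, env=None):
    """Merge command-line arguments with environment variable defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Example batch task")
    # INPUT_PATH and BATCH_SIZE are hypothetical variable names.
    parser.add_argument("--input", default=env.get("INPUT_PATH", "data.csv"))
    parser.add_argument(
        "--batch-size", type=int, default=int(env.get("BATCH_SIZE", "100"))
    )
    return parser.parse_args(argv)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run(argv=None, env=None):
    """Process all records batch by batch and return the number processed."""
    cfg = parse_config(argv, env)
    records = list(range(250))  # stand-in for data read from cfg.input
    processed = 0
    for batch in batches(records, cfg.batch_size):
        processed += len(batch)  # replace with real per-batch work
    return processed
```

Keeping configuration out of the script body means the same code can be rerun with different inputs without editing it.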

Distributed Training

Jobs supports distributed training for machine learning, which enables faster and more efficient model training on large datasets and makes it possible to handle more complex models and larger workloads. Use distributed training when the size of the dataset or the complexity of the model makes it difficult or impossible to train on a single machine, or when models must be trained faster to keep up with changing data or business requirements. This section describes our support for distributed training with Jobs for the following frameworks: Dask, Horovod, TensorFlow Distributed, and PyTorch Distributed.
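To make the idea concrete, the following plain-Python simulation shows data-parallel training in miniature: each worker computes a gradient on its own shard of the data, and the gradients are averaged before the shared weight is updated. It does not use the Dask, Horovod, TensorFlow, or PyTorch APIs; the function names and the one-parameter model y = w * x are assumptions chosen only to keep the sketch short.

```python
def shard(data, rank, world_size):
    """Return the slice of data that worker `rank` processes."""
    return data[rank::world_size]


def local_gradient(w, samples):
    """Gradient of mean squared error for y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def all_reduce_mean(values):
    """Stand-in for the all-reduce step: average values across workers."""
    return sum(values) / len(values)


def train(data, world_size=4, lr=0.01, steps=200):
    """Simulate synchronous data-parallel gradient descent in one process."""
    w = 0.0
    for _ in range(steps):
        # In a real cluster, each gradient is computed on a separate machine.
        grads = [
            local_gradient(w, shard(data, rank, world_size))
            for rank in range(world_size)
        ]
        w -= lr * all_reduce_mean(grads)
    return w


# Toy dataset generated from y = 3 * x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
```

Calling train(data) should move w close to 3.0. The frameworks listed above handle the sharding and gradient exchange across machines for you.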

Pipelines

Pipelines are essential for complex machine learning and data science tasks because they streamline and automate the model building and deployment process, enabling faster and more consistent results. Use them when you need to build, train, and deploy complex models with multiple components and steps, or when you need to automate the machine learning process to reduce manual labor and errors. The Oracle Cloud Infrastructure Data Science Pipelines service helps automate and streamline the process of building and deploying machine learning pipelines.
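The core idea of a multi-step pipeline can be sketched as a set of named steps plus the dependencies between them, executed in an order that respects those dependencies. The example below is a generic illustration using Python's standard graphlib module (Python 3.9 or later); it is not the Pipelines service API, and the step names and functions are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step after the steps it depends on.

    `steps` maps a step name to a function that receives the results of
    earlier steps; `dependencies` maps a step name to its prerequisites.
    Returns the execution order and the collected results.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    context = {}
    for name in order:
        context[name] = steps[name](context)
    return order, context


# Hypothetical three-step workflow: prepare data, train, then deploy.
steps = {
    "prepare": lambda ctx: [1, 2, 3, 4],
    "train": lambda ctx: sum(ctx["prepare"]) / len(ctx["prepare"]),
    "deploy": lambda ctx: f"deployed model with mean={ctx['train']}",
}
dependencies = {"prepare": set(), "train": {"prepare"}, "deploy": {"train"}}
```

Declaring dependencies explicitly, rather than hard-coding the call order, is what lets a pipeline runner decide which steps can be retried or run independently.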

Data Labeling Examples

The data labeling service helps identify properties (labels) of documents, text, and images (records) and annotates (labels) them with those properties. This section contains Python and Java scripts to annotate records in bulk in the OCI Data Labeling Service (DLS).

Notebook Lifecycle Script Examples

The OCI Data Science service offers managed notebook (JupyterLab) sessions. The notebook lifecycle script feature executes customer-provided scripts during the CREATE, ACTIVATE, DEACTIVATE, and DELETE stages of the notebook session lifecycle. This folder contains example scripts that need little to no editing and are ready to be used as lifecycle script input.

Feature Store

The Feature Store service addresses many data preparation problems by providing a centralized way to transform and access data at both training and serving time. Feature stores help define a standardized pipeline for ingesting and querying data.

Documentation

Check out the following resources for more information about the OCI Data Science and AI services:

Need Help?

Create a GitHub issue.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

The Security Guide contains information about the security vulnerability disclosure process. If you discover a vulnerability, consider filing an issue.

License

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.
