Image Classification with Microsoft Vision Model ResNet-50

The Microsoft Vision Model ResNet-50 is a pretrained vision model created by the Multimedia Group at Microsoft Bing. It is a 50-layer convolutional neural network (CNN) trained on more than 1 million images from ImageNet. The model is trained with multi-task learning, optimizing separately for four datasets (ImageNet-22k, Microsoft COCO, and two web-supervised datasets containing 40 million image-label pairs), and is reported to achieve state-of-the-art performance on image classification tasks.

This project uses the Hono framework to build a Cloudflare Worker that exposes an API endpoint for image classification. It runs the Microsoft Vision Model ResNet-50 through Cloudflare AI and classifies images supplied as either URLs or file uploads.

Technologies Used

Hono: A lightweight web framework for building fast and scalable applications on Cloudflare Workers.
Cloudflare Workers: A serverless execution environment that allows running JavaScript and TypeScript code at the edge, close to users.
Cloudflare AI: A set of APIs and tools provided by Cloudflare for integrating AI capabilities into applications.

Features

Accepts both image URLs and file uploads for classification.
Validates input using Zod schema validation.
Supports CORS and CSRF protection middleware.
Implements JWT authentication middleware for secure access to the API.
Handles errors gracefully and returns appropriate error responses.
Provides an optional model parameter to specify the model for additional analysis.
- Supported models: llama and gemma.
- If the model parameter is not provided or is set to a value other than llama or gemma, only image classification is performed without additional analysis.
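
The fallback rule for the model parameter can be sketched as a small helper. This is an illustration only: the name resolveAnalysisModel is hypothetical and not taken from the project source, and exact, case-sensitive matching is an assumption.

```typescript
// Hypothetical helper illustrating the documented fallback rule.
type AnalysisModel = "llama" | "gemma";

// Returns the analysis model for a route parameter, or null when the
// parameter is missing or unsupported (meaning: classification only).
// Assumption: matching is exact and case-sensitive.
function resolveAnalysisModel(param?: string): AnalysisModel | null {
  return param === "llama" || param === "gemma" ? param : null;
}
```

A null result here corresponds to the classification-only path described above.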

API Endpoint

URL: /api/classify/:model?
- :model (optional): Specifies the model to use for additional analysis. Supported values: llama and gemma.
Method: POST
Authentication: JWT token required in the Authorization header.
Request Body: JSON array of image objects, each containing either a url or file property.
- url: The URL of the image to classify (optional).
- file: The uploaded image file to classify (optional).
Response: JSON object containing an array of responses for each image.
- Each response includes:
  - classification: An array of classification results, each containing a label and a score.
  - analysis (optional): The analysis summary generated by the specified model, if a supported model is provided.
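
To make the request shape concrete, here is a dependency-free sketch of the kind of check the request body has to pass. The project itself validates with Zod, and its actual schema is not shown in this README, so the rules below (at least one of url or file per item, url restricted to http or https) and the name validateImageInputs are assumptions.

```typescript
// Hand-rolled stand-in for the project's Zod schema; rules are assumptions.
// Returns a list of human-readable problems; an empty list means "valid".
function validateImageInputs(body: unknown): string[] {
  if (!Array.isArray(body)) return ["body must be a JSON array"];
  const errors: string[] = [];
  body.forEach((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      errors.push(`item ${i}: must be an object`);
      return;
    }
    const { url, file } = item as { url?: unknown; file?: unknown };
    if (url === undefined && file === undefined) {
      // Each image object needs a url or a file.
      errors.push(`item ${i}: needs a url or a file`);
    } else if (
      url !== undefined &&
      (typeof url !== "string" || !/^https?:\/\//.test(url))
    ) {
      // Assumption: only http(s) URLs are accepted.
      errors.push(`item ${i}: url must be an http(s) URL`);
    }
  });
  return errors;
}
```

Returning a list of problems rather than throwing makes it easy to report every invalid item in a single error response.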

Usage

Set up a Cloudflare Worker and configure the necessary environment variables:
- AI: Your Cloudflare AI API token.
- JWT_SECRET: The secret key used for JWT authentication.
Deploy the worker code to your Cloudflare Worker.
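
Requests must carry a JWT signed with JWT_SECRET. For local testing, a token could be minted along these lines in Node.js. This sketch assumes the worker verifies HS256 tokens and requires no particular claims; neither is confirmed by this README, and the function name and example claims are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Base64url-encode a UTF-8 string (no padding), as used by compact JWTs.
function base64url(input: string): string {
  return Buffer.from(input, "utf8").toString("base64url");
}

// Sign a payload as a compact HS256 JWT.
// Assumption: HS256 matches the algorithm the worker's JWT middleware expects.
function signJwtHS256(payload: Record<string, unknown>, secret: string): string {
  const header = base64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = base64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Example claims are hypothetical; the secret must equal the worker's JWT_SECRET.
const token = signJwtHS256(
  { sub: "local-test", exp: Math.floor(Date.now() / 1000) + 3600 },
  "replace-with-your-JWT_SECRET",
);
console.log(token);
```

This runs on your own machine, not in the worker, and the secret should never be shipped to clients.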

Make a POST request to the /api/classify endpoint with the following payload:

[
	{
		"url": "https://example.com/image1.jpg"
	},
	{
		"file": "<uploaded_file>"
	}
]

Replace <uploaded_file> with the actual file upload.

To request additional analysis, append the optional model parameter (llama or gemma) to the URL path, for example /api/classify/llama. With any other value, or with no parameter, only image classification is performed.

Here are example cURL commands to classify images:

Classify an image using a URL:

curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <your-jwt-token>" -d '[{"url": "https://example.com/image1.jpg"}]' https://your-worker-url.com/api/classify

Classify an image using a file upload:

curl -X POST -H "Authorization: Bearer <your-jwt-token>" -F "file=@/path/to/image.jpg" https://your-worker-url.com/api/classify

Classify an image using a URL with the llama model for analysis:

curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <your-jwt-token>" -d '[{"url": "https://example.com/image1.jpg"}]' https://your-worker-url.com/api/classify/llama

Classify an image using a file upload with the gemma model for analysis:

curl -X POST -H "Authorization: Bearer <your-jwt-token>" -F "file=@/path/to/image.jpg" https://your-worker-url.com/api/classify/gemma

Replace <your-jwt-token> with your actual JWT token and https://your-worker-url.com with the URL of your deployed Cloudflare Worker.
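
The URL-based cURL calls above translate to a fetch-style request along these lines. This is a sketch rather than a client shipped with the project: buildClassifyRequest and the ClassifyRequest type are hypothetical names, and the base URL and token are the same placeholders used in the cURL commands.

```typescript
// Minimal request description, kept dependency-free so it can be handed to fetch().
interface ClassifyRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the JSON request for URL-based classification.
// The model segment is appended only when given, mirroring /api/classify/:model?.
function buildClassifyRequest(
  baseUrl: string,
  token: string,
  imageUrls: string[],
  model?: "llama" | "gemma",
): ClassifyRequest {
  const path = model ? `/api/classify/${model}` : "/api/classify";
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + path,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(imageUrls.map((url) => ({ url }))),
    },
  };
}

// Usage (placeholders as in the cURL commands):
// const { url, init } = buildClassifyRequest(
//   "https://your-worker-url.com", "<your-jwt-token>",
//   ["https://example.com/image1.jpg"], "llama");
// const result = await (await fetch(url, init)).json();
```

Separating request construction from the network call keeps the sketch easy to test without a deployed worker.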

The API will return a JSON response with the classification results and analysis (if applicable) for each image:

{
	"responses": [
		{
			"classification": [
				{
					"label": "dog",
					"score": 0.9
				},
				{
					"label": "animal",
					"score": 0.8
				}
			],
			"analysis": "The image contains a dog, which is a type of animal. The classification scores indicate a high confidence in the presence of a dog in the image."
		},
		{
			"classification": [
				{
					"label": "cat",
					"score": 0.95
				},
				{
					"label": "animal",
					"score": 0.85
				}
			],
			"analysis": "The image depicts a cat, which belongs to the animal category. The high classification scores suggest a strong likelihood of a cat being present in the image."
		}
	]
}

If the model parameter is not provided or is set to a value other than llama or gemma, the analysis field will be absent in the response.
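
On the client side, the response can be consumed as sketched below. The interfaces mirror the JSON shown above; the helper name topLabels is hypothetical, and analysis is typed as optional because it may be absent.

```typescript
// Types mirroring the response JSON above.
interface ClassificationResult {
  label: string;
  score: number;
}

interface ImageResponse {
  classification: ClassificationResult[];
  analysis?: string; // present only when a supported model was requested
}

interface ClassifyResponse {
  responses: ImageResponse[];
}

// Return the highest-scoring label for each image,
// or null for an image with no classification results.
function topLabels(response: ClassifyResponse): (string | null)[] {
  return response.responses.map((r) => {
    let best: ClassificationResult | null = null;
    for (const c of r.classification) {
      if (best === null || c.score > best.score) best = c;
    }
    return best === null ? null : best.label;
  });
}
```

The helper scans for the maximum score rather than taking the first entry, since this README does not state that results are sorted.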

Limitations

The Microsoft Vision Model ResNet-50 is pretrained on a specific set of image categories. It may not perform well on images outside its training domain.
The model accepts only certain image formats, such as JPEG, PNG, and GIF. Other formats may not be supported.
The performance of the model may vary depending on the quality and resolution of the input images.
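
Because unsupported formats may be rejected, a client can check a file before uploading it. The sketch below identifies JPEG, PNG, and GIF by their leading signature bytes. The function name detectImageFormat is hypothetical, and the list covers only the formats named above, which may not match exactly what the model accepts.

```typescript
type ImageFormat = "jpeg" | "png" | "gif";

// Identify an image by its leading magic bytes; returns null for anything else.
function detectImageFormat(bytes: Uint8Array): ImageFormat | null {
  // True when the file begins with the given signature bytes.
  const startsWith = (sig: number[]): boolean =>
    sig.every((b, i) => bytes[i] === b);

  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // GIF files start with the ASCII text "GIF87a" or "GIF89a".
  if (
    startsWith([0x47, 0x49, 0x46, 0x38]) &&
    (bytes[4] === 0x37 || bytes[4] === 0x39) &&
    bytes[5] === 0x61
  ) {
    return "gif";
  }
  return null;
}
```

Checking the bytes rather than the file extension avoids uploading a mislabeled file.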

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.
