Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡️ Speed up _auto_encoder() by 15% in embedchain/helpers/json_serializable.py #1265

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

misrasaurabh1
Copy link
Contributor

Description

📄 _auto_encoder() in embedchain/helpers/json_serializable.py

📈 Performance went up by 15% (0.15x faster)

⏱️ Runtime went down from 163.10μs to 142.40μs

Explanation and details

(click to show)

Here's a more optimized version of the Python program provided. It provides an improved runtime by avoiding unnecessary dictionary copying and using dictionary comprehension instead of a loop to populate the new dictionary.

This version doesn't change the signature or return value of the function and preserves all the original's necessary features. However, it can provide some runtime efficiency because it directly creates the dct dictionary from obj.__dict__, skipping the need to make a copy first, and avoiding an unnecessary dictionary size change during iteration. It also replaces the del dct[key] operation with simply not adding the py-value pair to the dictionary if json.dumps(value) fails.

Type of change

Please delete options that are not relevant.

  • Refactor (does not change functionality, e.g. code style improvements, linting)

How Has This Been Tested?

The new optimized code was tested for correctness. The results are listed below.

  • Test Script (please provide)

✅ 7 Passed − 🌀 Generated Regression Tests

(click to show generated tests)
# imports
import pytest  # used for our unit tests
import json
from typing import Any, Union
from embedchain.helpers.json_serializable import JSONSerializable
# Define a Template class for testing purposes
class Template:
    def __init__(self, template_data):
        self.template = template_data

# Define a subclass of JSONSerializable for testing purposes
class SerializableSubclass(JSONSerializable):
    def serialize(self):
        return json.dumps(self.__dict__)

# Helper function to create a JSONSerializable instance with a given dictionary
def create_json_serializable_instance(attributes):
    instance = JSONSerializable()
    for key, value in attributes.items():
        setattr(instance, key, value)
    return instance

# unit tests

# Test simple data types
def test_simple_data_types():
    obj = create_json_serializable_instance({
        'int': 1,
        'str': 'string',
        'float': 1.5,
        'bool': True
    })
    encoded = JSONSerializable._auto_encoder(obj)
    assert encoded['int'] == 1
    assert encoded['str'] == 'string'
    assert encoded['float'] == 1.5
    assert encoded['bool'] is True
    assert encoded['__class__'] == 'JSONSerializable'

# Test nested JSONSerializable instances
def test_nested_json_serializable():
    nested_obj = create_json_serializable_instance({'nested_int': 2})
    obj = create_json_serializable_instance({'nested': nested_obj})
    encoded = JSONSerializable._auto_encoder(obj)
    assert encoded['nested']['nested_int'] == 2
    assert encoded['__class__'] == 'JSONSerializable'

# Test non-serializable attributes
def test_non_serializable_attributes():
    class NonSerializable:
        pass

    obj = create_json_serializable_instance({'non_serializable': NonSerializable()})
    with pytest.raises(TypeError):
        JSONSerializable._auto_encoder(obj)

# Test Template instances
def test_template_instance():
    template_obj = Template('template_data')
    obj = create_json_serializable_instance({'template': template_obj})
    encoded = JSONSerializable._auto_encoder(obj)
    assert encoded['template']['__type__'] == 'Template'
    assert encoded['template']['data'] == 'template_data'
    assert encoded['__class__'] == 'JSONSerializable'

# Test list, tuple, and dictionary attributes
def test_collection_attributes():
    obj = create_json_serializable_instance({
        'list': [1, 'two', 3.0],
        'tuple': (1, 'two', 3.0),
        'dict': {'key': 'value'}
    })
    encoded = JSONSerializable._auto_encoder(obj)
    assert encoded['list'] == [1, 'two', 3.0]
    assert encoded['tuple'] == [1, 'two', 3.0]  # Tuples should be converted to lists
    assert encoded['dict'] == {'key': 'value'}
    assert encoded['__class__'] == 'JSONSerializable'

# Test objects not instances of JSONSerializable
def test_non_json_serializable_instance():
    obj = object()  # A plain object
    with pytest.raises(TypeError):
        JSONSerializable._auto_encoder(obj)

# Test circular references
def test_circular_references():
    obj = create_json_serializable_instance({})
    obj.circular = obj  # Create a circular reference
    # Circular references are not handled by the function and will cause
    # a RecursionError when trying to serialize with json.dumps.
    with pytest.raises(RecursionError):
        JSONSerializable._auto_encoder(obj)

# Test special methods that could affect serialization
def test_special_methods():
    class CustomRepr(JSONSerializable):
        def __repr__(self):
            return "CustomRepr()"

    obj = CustomRepr()
    encoded = JSONSerializable._auto_encoder(obj)
    assert encoded['__class__'] == 'CustomRepr'

# Test attributes that change during serialization
def test_attributes_change_during_serialization():
    from datetime import datetime
    obj = create_json_serializable_instance({'date': datetime(2021, 1, 1)})
    # The datetime object is not serializable by default, so it should be removed
    encoded = JSONSerializable._auto_encoder(obj)
    assert 'date' not in encoded
    assert encoded['__class__'] == 'JSONSerializable'

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 16, 2024
Copy link

codecov bot commented Feb 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2985b66) 56.60% compared to head (292061f) 56.59%.
Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1265      +/-   ##
==========================================
- Coverage   56.60%   56.59%   -0.02%     
==========================================
  Files         146      146              
  Lines        5923     5956      +33     
==========================================
+ Hits         3353     3371      +18     
- Misses       2570     2585      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant