Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡️ Speed up is_readable by 101% in embedchain/utils/misc.py #1258

Conversation

misrasaurabh1
Copy link
Contributor

@misrasaurabh1 misrasaurabh1 commented Feb 15, 2024

Description

📄 is_readable() in embedchain/utils/misc.py

📈 Performance went up by 101% (1.01x faster)

⏱️ Runtime went down from 1012.51μs to 504.30μs

Explanation and details

(click to show)

The current program is using a try-except block and calling the logging.warning function when the length of the string is zero. However, this does not need to be caught as an exception and can be handled in a simple if condition check. Also, the in operator is inefficient as it scans through all the characters in string.printable for every character in s resulting in a time complexity of O(n*m). A one-time creation of set(string.printable) can greatly improve the execution time. Here is the optimized version of the code:

This version of the program should run faster for larger inputs. Note that this program makes a trade-off between runtime and memory, using additional memory for storing string.printable in a set.

This performance optimization was generated with codeflash.ai

Type of change

  • Refactor (does not change functionality, e.g. code style improvements, linting)

How Has This Been Tested?

Generated and ran 12 tests for regression

Click to show generated tests
# imports
import pytest  # used for our unit tests
import string
import logging
from io import StringIO
from unittest.mock import patch
from embedchain.utils.misc import is_readable

# unit tests

def test_empty_string():
    # Test that an empty string returns False and logs a warning
    with patch('logging.warning') as mock_warning:
        assert not is_readable("")
    mock_warning.assert_called_once_with("Empty string processed as unreadable")

def test_100_percent_printable():
    # Test that a string with 100% printable characters returns True
    assert is_readable("Hello, World!")

def test_less_than_95_percent_printable():
    # Test that a string with less than 95% printable characters returns False
    assert not is_readable("Hello\x00\x01\x02\x03World")

def test_exactly_95_percent_printable():
    # Test that a string with exactly 95% printable characters returns True
    assert is_readable("A" * 19 + "\x00")

def test_very_short_string():
    # Test that a very short string that is readable returns True
    assert is_readable("A")

def test_very_long_string():
    # Test that a very long string that is readable returns True
    assert is_readable("A" * 10000)

def test_high_proportion_non_printable():
    # Test that a string with a high proportion of non-printable characters returns False
    assert not is_readable("\x00\x01\x02" * 10 + "ABC")

def test_mixed_scripts():
    # Test that a string with mixed scripts still returns True if it's readable
    assert is_readable("English中文العربيةрусский")

def test_random_printable_characters():
    # Test that a string of random printable characters returns True
    assert is_readable("!@#$%^&*()_+")

def test_non_standard_whitespace():
    # Test that a string with non-standard whitespace characters returns True
    assert is_readable("\f\vHello World\f\v")

def test_unicode_printable():
    # Test that a string with Unicode characters that are printable returns True
    assert is_readable("你好,世界")

def test_unicode_non_printable():
    # Test that a string with Unicode characters that are not printable returns False
    assert not is_readable("\u200b\u200c\u200d")

def test_non_string_input():
    # Test that non-string inputs raise an appropriate exception
    with pytest.raises(TypeError):
        is_readable(["Hello", "World"])
    with pytest.raises(TypeError):
        is_readable({"message": "Hello World"})

def test_logging_empty_string():
    # Test that passing an empty string triggers a warning log message
    with patch('logging.warning') as mock_warning:
        is_readable("")
    mock_warning.assert_called_with("Empty string processed as unreadable")

def test_emojis():
    # Test that a string with emojis which are printable returns True
    assert is_readable("Hello 👋 World 🌍")

# Note: The test_non_string_input assumes that the is_readable function should raise a TypeError when
# the input is not a string. This behavior is not implemented in the original function, but it would be
# a reasonable expectation and thus is included here as a potential improvement to the function.
  • Test Script (please provide)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 15, 2024
Copy link
Collaborator

@deshraj deshraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for fixing this @misrasaurabh1. 🚀

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 15, 2024
Copy link

codecov bot commented Feb 15, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (2985b66) 56.60% compared to head (56fd5d6) 56.56%.
Report is 19 commits behind head on main.

Files Patch % Lines
embedchain/utils/misc.py 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1258      +/-   ##
==========================================
- Coverage   56.60%   56.56%   -0.05%     
==========================================
  Files         146      146              
  Lines        5923     5944      +21     
==========================================
+ Hits         3353     3362       +9     
- Misses       2570     2582      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@deshraj deshraj merged commit 9a11683 into mem0ai:main Feb 16, 2024
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants