Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add type-tracking flow for class (instance) attributes #16670

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Jun 4, 2024

class MyClass:
    def set_foo(self):
        self.foo = <value>

    def uses(self):
        print(self.foo)

This PR adds flow from <value> to the use in print for type-tracking. It also handles instances, and class-level attributes. (reviewing commit-by-commit is recommended)

Implementation questions

The implementation so far uses loadStoreStep to add flow to the self/cls parameters of normal/classmethods on a class. As highlighted in the comment in the code, compared with a "simple" jump-step or levelStepNoCall, this allows any potential flow-summaries to still work.

(note: Ruby currently uses levelStepNoCall, which is where the inspiration for doing so came from).

However, there's a few things I'm not 100% sure of, so let me call those out:

  1. whether there's an implicit assumption that loadStoreStep is not allowed to cross function borders (if so, we should add that as a consistency check). If that's the case, we can implement this with jumpStep, but will sacrifice some functionality since we will need to target attribute-reads directly.

  2. performance! Since this adds a step from all n writes to every m reads of an attribute, worst case is O(n * m) = O(n²) 😞 We should be able to overcome this by adding an intermediary node that represents class-instance that self.foo = <value> can target, resulting in O(n + m) steps 👍 I haven't done anything about this yet, since I wanted to get the basic functionality in first.

  3. This PR does simple assumption that any self.foo = <value> will end up being available after the function has finished... but we could see code like self.foo = <value1>; self.foo = <value2>, where only <value2> would be available afterwards. I don't expect this will become a huge issue, so I propose we wait until we see this being a problem in real code.

  4. Subclass flow is currently not handled. I think we need proper performance solved first, and also wanted to ensure we could get agreement on basic functionality become making implementation more complex.

  5. Instance flow. This felt like the right thing to do, both so we would be able to handle examples like the one below, but also so instance and self reference within a method would behave in the same way.

    custom_sql_conn.connect() # sets the 'cursor' attribute in this function
    custom_sql_conn.cursor.execute(...)

@RasmusWL RasmusWL marked this pull request as ready for review June 10, 2024 08:31
@RasmusWL RasmusWL requested a review from a team as a code owner June 10, 2024 08:31
@RasmusWL
Copy link
Member Author

haven't written a change-note yet, I'll do that later 👍

@hvitved
Copy link
Contributor

hvitved commented Jun 10, 2024

  1. there's an implicit assumption that loadStoreStep is not allowed to cross function borders

loadStoreSteps are assumed to be local (as are loadSteps and storeSteps).

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me; I think the performance concerns are not so severe, as the reads and writes all target the class expression as an intermediate node? We should have a DCA run for sure, but it feels like it should be OK. Regarding your other questions:

  1. This sounds like a problem given Tom's comment, but I got the impression that you had an off-line conversation sorting this out?
  2. [skipped by initial comment]
  3. I agree that this is likely fine in practice and that we should see it being a problem before complicating the code. (If we had an easy "get the last locally assigned value" we could have used that.)
  4. I wonder if subclass flow will become important; I agree it can wait.
  5. I think it is cool to have instance flow. I notice that our tracking of instances is only local, but I think a proper implementation will require the full type-tracking-recursive-with-call-graph-construction-machinery that we are not using yet (and which comes with its own performance concerns).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants