Python: Add type-tracking flow for class (instance) attributes #16670

RasmusWL · 2024-06-04T12:36:35Z

class MyClass:
    def set_foo(self):
        self.foo = <value>

    def uses(self):
        print(self.foo)

This PR adds type-tracking flow from <value> to its use in print. It also handles instances and class-level attributes. (Reviewing commit-by-commit is recommended.)
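For concreteness, here is a small runnable Python sketch of the kinds of target code this flow is meant to cover: a write to `self` in one method read in another, a class-level attribute read through `cls`, and an attribute read through an instance. The names `Connection`, `Cursor`, and `default_timeout` are hypothetical and chosen only for illustration; they are not taken from the PR's test files.

```python
class Cursor:
    def execute(self, query):
        return f"executed: {query}"


class Connection:
    # Class-level attribute: readable through the class and through instances.
    default_timeout = 30

    def connect(self):
        # Attribute write on `self` inside one method ...
        self.cursor = Cursor()

    def run(self, query):
        # ... and the read of that attribute in a different method.
        return self.cursor.execute(query)

    @classmethod
    def describe(cls):
        # Read of the class-level attribute via the `cls` parameter.
        return cls.default_timeout


conn = Connection()
conn.connect()
result_via_method = conn.run("SELECT 1")                # read via `self`
result_via_instance = conn.cursor.execute("SELECT 2")   # read via the instance
timeout_via_cls = Connection.describe()                 # read via `cls`
```

In each case the value is written in one place and read in another, with no call between the two that an analysis could follow directly.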

Implementation questions

The implementation so far uses loadStoreStep to add flow to the self/cls parameters of normal methods and classmethods on a class. As highlighted in the comment in the code, compared with a "simple" jump-step or levelStepNoCall, this allows any potential flow-summaries to still work.

(note: Ruby currently uses levelStepNoCall, which is where the inspiration for doing so came from).

However, there are a few things I'm not 100% sure of, so let me call those out:

1. Whether there's an implicit assumption that loadStoreStep is not allowed to cross function borders (if so, we should add that as a consistency check). If that's the case, we can implement this with jumpStep, but will sacrifice some functionality since we will need to target attribute-reads directly.
2. Performance! Since this adds a step from each of the n writes to each of the m reads of an attribute, the worst case is O(n * m) = O(n²) 😞 We should be able to overcome this by adding an intermediary node representing the class instance that self.foo = <value> can target, resulting in O(n + m) steps 👍 I haven't done anything about this yet, since I wanted to get the basic functionality in first.
3. This PR makes the simplifying assumption that any self.foo = <value> will still be available after the function has finished... but we could see code like self.foo = <value1>; self.foo = <value2>, where only <value2> would be available afterwards. I don't expect this to become a huge issue, so I propose we wait until we see it being a problem in real code.
4. Subclass flow is currently not handled. I think we need to solve performance properly first, and I also wanted to ensure we could get agreement on the basic functionality before making the implementation more complex.
5. Instance flow. This felt like the right thing to do, both so we would be able to handle examples like the one below, and so that an instance reference and a self reference within a method would behave in the same way.
```
custom_sql_conn.connect() # sets the 'cursor' attribute in this function
custom_sql_conn.cursor.execute(...)
```
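The step-count argument in the performance point can be sketched with a toy graph. This is plain Python, not the QL implementation, and the node names and the `hub` label are hypothetical. It only shows that routing every write through one intermediary node keeps the same write-to-read reachability while reducing the number of steps from n * m to n + m.

```python
from itertools import product


def direct_steps(writes, reads):
    """One step from every write to every read: n * m edges."""
    return set(product(writes, reads))


def steps_via_hub(writes, reads, hub="MyClass.foo"):
    """Route every write through one intermediary node: n + m edges."""
    return {(w, hub) for w in writes} | {(hub, r) for r in reads}


def reachable_reads(edges, write, reads):
    """Reads reachable from `write` by following edges transitively."""
    seen, frontier = set(), {write}
    while frontier:
        node = frontier.pop()
        seen.add(node)
        frontier |= {dst for src, dst in edges if src == node and dst not in seen}
    return seen & set(reads)


writes = [f"write{i}" for i in range(4)]  # n = 4
reads = [f"read{j}" for j in range(5)]    # m = 5

direct = direct_steps(writes, reads)      # 4 * 5 edges
hubbed = steps_via_hub(writes, reads)     # 4 + 5 edges
```

With 4 writes and 5 reads, the direct encoding needs 20 steps and the hub encoding needs 9, and every write still reaches every read.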

RasmusWL · 2024-06-10T08:33:32Z

haven't written a change-note yet, I'll do that later 👍

hvitved · 2024-06-10T09:03:13Z

> there's an implicit assumption that loadStoreStep is not allowed to cross function borders

loadStoreSteps are assumed to be local (as are loadSteps and storeSteps).

yoff

This looks good to me; I think the performance concerns are not so severe, as the reads and writes all target the class expression as an intermediate node? We should have a DCA run for sure, but it feels like it should be OK. Regarding your other questions:

1. This sounds like a problem given Tom's comment, but I got the impression that you had an off-line conversation sorting this out?
2. [skipped by initial comment]
3. I agree that this is likely fine in practice and that we should see it being a problem before complicating the code. (If we had an easy "get the last locally assigned value" we could have used that.)
4. I wonder if subclass flow will become important; I agree it can wait.
5. I think it is cool to have instance flow. I notice that our tracking of instances is only local, but I think a proper implementation will require the full type-tracking-recursive-with-call-graph-construction-machinery that we are not using yet (and which comes with its own performance concerns).
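The two deferred cases (overwritten values and subclass flow) can be made concrete with a short runnable Python sketch. The class names `Base` and `Child` and the string values are hypothetical, chosen only to show the runtime behaviour under discussion.

```python
class Base:
    def setup(self):
        self.foo = "value1"  # overwritten before setup() returns
        self.foo = "value2"  # the only value visible to later reads

    def uses(self):
        return self.foo


class Child(Base):
    def uses_in_subclass(self):
        # Reads an attribute that was written by a method of the base class.
        return self.foo


obj = Child()
obj.setup()
seen_in_base = obj.uses()
seen_in_subclass = obj.uses_in_subclass()
```

At runtime only "value2" is ever observed. Under the simplifying assumption described in the PR, both writes would be treated as reaching later reads, which is an over-approximation. The read in `uses_in_subclass` is the kind of subclass flow that the PR description says is not handled yet.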

RasmusWL added 8 commits June 3, 2024 11:59

Python: Add tests for class/instance attribute flow (type-tracking)

80bce76

Python: Add tt flow from attr-write on cls to class reference

41265dd

Python: Add tt flow from class-attr to self (in methods)

7854ee4

Python: Also flow to instances

2ee7bd6

Python: Also flow from writes to self

c3b4c9f

Python: Add tt flow from class attribute to cls param of classmethods

156fad3

Python: Expand internal comment

2b8e00c

Python: Expand subclass flow example

064c383

RasmusWL mentioned this pull request Jun 4, 2024

Python: Dataflow fails when Class attributes are accessed as Instance attributes. #16501

Open

github-actions bot added the Python label Jun 4, 2024

RasmusWL mentioned this pull request Jun 4, 2024

Python: Add tracking steps for class level attributes #16526

Closed

RasmusWL added 2 commits June 10, 2024 10:25

Python: Accept improvements to other tests

76419d8

Merge branch 'main' into class-self-attributes

6f69e2d

RasmusWL marked this pull request as ready for review June 10, 2024 08:31

RasmusWL requested a review from a team as a code owner June 10, 2024 08:31

yoff approved these changes Dec 10, 2024

RasmusWL mentioned this pull request Jan 6, 2025

General issue: Missing vulnerability reports due to incomplete self variable reference relationships in Python classes #18374

Open
