Adds documentation and example recommending Vec<ArrayRef> over ChunkedArray #6527

Merged (5 commits) on Oct 10, 2024

@efredine efredine commented Oct 8, 2024

Which issue does this PR close?

Closes #5295.

Rationale for this change

As described in the issue.

What changes are included in this PR?

Adds documentation and an example

Are there any user-facing changes?

Only documentation and example.

@github-actions bot added the "arrow" label (Changes to the arrow crate) on Oct 8, 2024

@tustvold tustvold left a comment

I think this example could be drastically simplified by using the iterators fully, e.g.

let values: Vec<f32> = batches.iter().flat_map(|x| x.column(0).as_primitive::<Float32Type>().values().iter().copied()).collect();

let values: Vec<&str> = batches.iter().flat_map(|x| x.column(1).as_string::<i32>()).map(Option::unwrap).collect();

I can't see an obvious reason to collect into separate Vecs; doing so adds overhead and is less concise.
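The flattening idea can be sketched without the arrow crate at all. The block below is a minimal std-only illustration, not arrow's API: `Chunk`, `flatten_floats`, and `flatten_strings` are hypothetical names, with `Arc<[f32]>` and `Vec<Option<String>>` standing in for a Float32 array and a nullable string array.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for one non-null Float32 array chunk.
type Chunk = Arc<[f32]>;

/// Flatten a chunked column (a slice of chunks) into one Vec in a single
/// pass, without first gathering the chunks into an intermediate Vec.
fn flatten_floats(chunks: &[Chunk]) -> Vec<f32> {
    chunks.iter().flat_map(|c| c.iter().copied()).collect()
}

/// Flatten nullable string chunks into borrowed `&str` values.
/// Like `map(Option::unwrap)` in the comment above, this panics on a null,
/// so it is only appropriate when the column is known to contain none.
fn flatten_strings(chunks: &[Vec<Option<String>>]) -> Vec<&str> {
    chunks
        .iter()
        .flat_map(|c| c.iter())
        .map(|v| v.as_deref().expect("column assumed to contain no nulls"))
        .collect()
}
```

The returned `Vec<&str>` borrows from the chunks, so the string data itself is never copied; only the float values are.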

I think it is also worth highlighting that many use cases won't need to downcast from the ArrayRef at all, as most kernels accept &dyn Array.
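To illustrate why downcasting is often unnecessary, here is a std-only sketch of the pattern. Everything in it is a hypothetical stand-in: `Column` plays the role of a type-erased array trait, `ColumnRef` the role of `ArrayRef`, and the three functions the role of kernels. It is not arrow's actual trait or kernel set.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical type-erased column interface (a stand-in, not arrow's `Array`).
trait Column {
    fn len(&self) -> usize;
    fn null_count(&self) -> usize;
    fn as_any(&self) -> &dyn Any;
}

struct FloatColumn(Vec<Option<f32>>);
struct TextColumn(Vec<Option<String>>);

impl Column for FloatColumn {
    fn len(&self) -> usize { self.0.len() }
    fn null_count(&self) -> usize { self.0.iter().filter(|v| v.is_none()).count() }
    fn as_any(&self) -> &dyn Any { self }
}

impl Column for TextColumn {
    fn len(&self) -> usize { self.0.len() }
    fn null_count(&self) -> usize { self.0.iter().filter(|v| v.is_none()).count() }
    fn as_any(&self) -> &dyn Any { self }
}

/// Hypothetical analogue of a reference-counted, type-erased array handle.
type ColumnRef = Arc<dyn Column>;

/// These "kernels" need only the erased interface, so no downcast happens.
fn total_len(chunks: &[ColumnRef]) -> usize {
    chunks.iter().map(|c| c.len()).sum()
}

fn total_nulls(chunks: &[ColumnRef]) -> usize {
    chunks.iter().map(|c| c.null_count()).sum()
}

/// Downcast only where typed values are required. Returns `None` if any
/// chunk is not a `FloatColumn`; nulls are skipped.
fn sum_floats(chunks: &[ColumnRef]) -> Option<f32> {
    let mut total = 0.0;
    for c in chunks {
        let floats = c.as_any().downcast_ref::<FloatColumn>()?;
        total += floats.0.iter().flatten().sum::<f32>();
    }
    Some(total)
}
```

Only `sum_floats` needs to know the concrete type; the other two work on any mix of chunk types.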

arrow/src/lib.rs (outdated)
Comment on lines 51 to 57
// chunked_array_by_index is an array of two Vec<ArrayRef> where each Vec<ArrayRef> is a column
let mut chunked_array_by_index = [Vec::new(), Vec::new()];
for batch in &batches {
    for (i, array) in batch.columns().iter().enumerate() {
        chunked_array_by_index[i].push(array.clone());
    }
}
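For context on what this regrouping costs, the sketch below reproduces it with std-only stand-ins. `ChunkRef`, `Batch`, and `chunks_by_column` are hypothetical names; the only arrow fact relied on is that `ArrayRef` is an `Arc<dyn Array>`, so cloning one bumps a reference count rather than copying data.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ArrayRef`: a shared, immutable chunk of values.
type ChunkRef = Arc<[i64]>;

/// Hypothetical stand-in for a record batch: one chunk per column.
struct Batch {
    columns: Vec<ChunkRef>,
}

/// Regroup a sequence of batches into one Vec of chunks per column.
/// Each push clones an `Arc`, which increments a reference count and
/// allocates no new value storage. Columns beyond `num_columns` are ignored.
fn chunks_by_column(batches: &[Batch], num_columns: usize) -> Vec<Vec<ChunkRef>> {
    let mut by_column: Vec<Vec<ChunkRef>> = vec![Vec::new(); num_columns];
    for batch in batches {
        for (column, chunk) in by_column.iter_mut().zip(&batch.columns) {
            column.push(Arc::clone(chunk));
        }
    }
    by_column
}
```

The per-element cost is small, but the intermediate Vecs are still extra allocations and extra code when the values are only going to be iterated once.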
Collecting into a separate Vec per column like this is unnecessary; see below.

arrow/examples/chunked_arrays.rs (outdated)
arrow/src/lib.rs (outdated)
efredine commented Oct 8, 2024

Thanks @tustvold - I've updated the text, simplified the example as suggested and moved it to arrow_array. (I also deleted the superfluous example I had created).

@@ -21,7 +21,7 @@

- [`builders.rs`](builders.rs): Using the Builder API
- [`collect.rs`](collect.rs): Using the `FromIter` API
- [`dynamic_types.rs`](dynamic_types.rs):
Unrelated change - this example's description was missing from the README.

@tustvold tustvold left a comment

Looks good to me, thank you

arrow-array/src/lib.rs (outdated)
@tustvold tustvold merged commit 44b6ded into apache:master Oct 10, 2024
26 checks passed
Development

Successfully merging this pull request may close these issues.

Document ChunkedArray Abstractions