feat(query): support javascript/python script User Defined Aggregate Function #17054
Signed-off-by: coldWater <[email protected]>
I have reviewed the metadata section of this PR.
For the remaining sections, I recommend having them reviewed by team members with direct expertise in those components, as they would be better positioned to evaluate the specific implementation details.
@sundy-li
Reviewed 7 of 52 files at r1.
Reviewable status: 7 of 52 files reviewed, 7 unresolved discussions (waiting on @forsaken628 and @sundy-li)
src/meta/proto-conv/tests/it/v115_add_udaf_script.rs
line 100 at r1 (raw file):
    common::test_pb_from_to(func_name!(), want())?;
    common::test_load_old(func_name!(), bytes.as_slice(), 115, want())
}
Is this case related to the new type UDAFScript? It looks like a duplicate of the UDF test.
Code quote:
#[test]
fn test_decode_udf_script() -> anyhow::Result<()> {
    let bytes: Vec<u8> = vec![
        10, 5, 109, 121, 95, 102, 110, 18, 21, 84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 100,
        101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 50, 78, 10, 9, 115, 111, 109, 101, 32, 99,
        111, 100, 101, 18, 5, 109, 121, 95, 102, 110, 26, 6, 112, 121, 116, 104, 111, 110, 34, 17,
        154, 2, 8, 58, 0, 160, 6, 115, 168, 6, 24, 160, 6, 115, 168, 6, 24, 42, 17, 154, 2, 8, 74,
        0, 160, 6, 115, 168, 6, 24, 160, 6, 115, 168, 6, 24, 50, 6, 51, 46, 49, 50, 46, 50, 160, 6,
        115, 168, 6, 24, 42, 23, 49, 57, 55, 48, 45, 48, 49, 45, 48, 49, 32, 48, 48, 58, 48, 48,
        58, 48, 48, 32, 85, 84, 67, 160, 6, 115, 168, 6, 24,
    ];
    let want = || UserDefinedFunction {
        name: "my_fn".to_string(),
        description: "This is a description".to_string(),
        definition: UDFDefinition::UDFScript(UDFScript {
            code: "some code".to_string(),
            handler: "my_fn".to_string(),
            language: "python".to_string(),
            arg_types: vec![DataType::Number(NumberDataType::Int32)],
            return_type: DataType::Number(NumberDataType::Float32),
            runtime_version: "3.12.2".to_string(),
        }),
        created_on: DateTime::<Utc>::default(),
    };
    common::test_pb_from_to(func_name!(), want())?;
    common::test_load_old(func_name!(), bytes.as_slice(), 115, want())
}
src/meta/proto-conv/src/udf_from_to_protobuf_impl.rs
line 179 at r1 (raw file):
.arg_types
.into_iter()
.map(|arg_type| Ok(DataType::from(&TableDataType::from_pb(arg_type)?)))
Could this be simplified like this?
Suggestion:
.map(|arg_type| DataType::from(&TableDataType::from_pb(arg_type)))
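A note on the shape of this conversion: the original closure uses `?`, which suggests that `from_pb` is fallible, so the error still has to be propagated somewhere. A minimal sketch of one way to do that without wrapping the closure body in `Ok(...)`, assuming a fallible per-element conversion. The names `from_pb`, `Incompatible`, and `convert_all` below are hypothetical stand-ins used only to keep the sketch self-contained; they are not the databend API.

```rust
/// Hypothetical error type standing in for the real conversion error.
#[derive(Debug, PartialEq)]
pub struct Incompatible(pub String);

/// Hypothetical stand-in for a fallible `from_pb`: rejects negative ids.
fn from_pb(raw: i32) -> Result<u32, Incompatible> {
    u32::try_from(raw).map_err(|_| Incompatible(format!("negative type id: {raw}")))
}

/// Convert every element, stopping at the first error.
/// `Result::map` applies the infallible second step (`u64::from`) to the
/// success value, and collecting into `Result<Vec<_>, _>` propagates the
/// first failure.
pub fn convert_all(raw: Vec<i32>) -> Result<Vec<u64>, Incompatible> {
    raw.into_iter()
        .map(|r| from_pb(r).map(u64::from))
        .collect()
}
```

Calling `convert_all(vec![1, 2, 3])` yields `Ok` with three converted values, while any negative input makes the whole call return `Err`.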
src/meta/proto-conv/src/udf_from_to_protobuf_impl.rs
line 191 at r1 (raw file):
        DataType::from(&TableDataType::from_pb(data_type)?),
    ))
})
Same here:
Code quote:
.map(|(name, data_type)| {
    Ok(DataField::new(
        name,
        DataType::from(&TableDataType::from_pb(data_type)?),
    ))
})
src/meta/app/src/principal/user_defined_function.rs
line 56 at r1 (raw file):
    pub return_type: DataType,
    pub runtime_version: String,
}
Please add comments explaining these essential fields.
Code quote:
pub struct UDAFScript {
    pub code: String,
    pub language: String,
    pub arg_types: Vec<DataType>,
    pub state_fields: Vec<DataField>,
    pub return_type: DataType,
    pub runtime_version: String,
}
src/meta/protos/proto/udf.proto
line 62 at r1 (raw file):
repeated DataType arg_types = 5;
repeated string state_names = 6;
repeated DataType state_types = 7;
Why not use a map of <state_name, state_type>? Does the order matter?
Code quote:
repeated string state_names = 6;
repeated DataType state_types = 7;
src/meta/protos/proto/udf.proto
line 75 at r1 (raw file):
UDFServer udf_server = 4;
UDFScript udf_script = 6;
// UDAFServer udaf_server = 7;
If this field is intentionally reserved, add a comment explaining the purpose of reserving it.
Reviewable status: 7 of 52 files reviewed, 7 unresolved discussions (waiting on @drmingdrmer and @sundy-li)
src/meta/proto-conv/tests/it/v115_add_udaf_script.rs
line 100 at r1 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Is this case related to the new type UDAFScript? It looks like a duplicate of the UDF test.
Tests for databend_common_meta_app::principal::UDFScript are missing.
src/meta/protos/proto/udf.proto
line 62 at r1 (raw file):
Previously, drmingdrmer (张炎泼) wrote…
Why not use a map of <state_name, state_type>? Does the order matter?
Compared to [(a,b)], ([a],[b]) has better compatibility and a smaller size. The cost is that it is less intuitive.
Reviewed 1 of 52 files at r1.
Reviewable status: 8 of 52 files reviewed, 7 unresolved discussions (waiting on @forsaken628 and @sundy-li)
src/meta/proto-conv/tests/it/v115_add_udaf_script.rs
line 100 at r1 (raw file):
Previously, forsaken628 (coldWater) wrote…
Tests for databend_common_meta_app::principal::UDFScript are missing
Oh. Thank you!
UDFScript seems to have been introduced in commit 6777c17, in #14799.
Thus it belongs in v081_udf_script.rs. Please move it there, so that when v081 is no longer supported, removing v081_udf_script.rs alone removes all the associated tests.
src/meta/protos/proto/udf.proto
line 62 at r1 (raw file):
Previously, forsaken628 (coldWater) wrote…
Compared to [(a,b)], ([a],[b]) has better compatibility and a smaller size. The cost is that it is less intuitive.
The cost might be unclear, but my main concern is the relationship between the name and value. If there is a one-to-one mapping between state_name and state_type, it would be more appropriate to use a map instead of two separate vectors. Otherwise, the higher-level logic would need to constantly validate their correspondence before using them.
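To make the validation concern concrete, here is a minimal sketch of the check that a consumer of two parallel repeated fields would need, assuming the names and types are decoded into two `Vec`s. `StateType` and `zip_state_fields` are hypothetical stand-ins, not the types used in this PR. The sketch keeps the pairs in a `Vec`, which preserves declaration order; a protobuf `map` field does not guarantee iteration order, which may be relevant to the "does the order matter?" question.

```rust
/// Hypothetical stand-in for the real `DataType`; only here to keep the
/// sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum StateType {
    Int64,
    Float64,
}

/// Pair up parallel `state_names` / `state_types` vectors once, at decode
/// time, so later code never has to re-check that they correspond.
/// Returns an error if the two vectors differ in length.
pub fn zip_state_fields(
    names: Vec<String>,
    types: Vec<StateType>,
) -> Result<Vec<(String, StateType)>, String> {
    if names.len() != types.len() {
        return Err(format!(
            "state_names has {} entries but state_types has {}",
            names.len(),
            types.len()
        ));
    }
    // Order of the input vectors is preserved in the output.
    Ok(names.into_iter().zip(types).collect())
}
```

Doing this once in the protobuf conversion layer would confine the length check to a single place, so the higher-level logic only ever sees already-paired fields.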
LGTM!
documents about UDAF @soyeric128
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
part of #16729
Tests
Type of change