[SPARK-50755][SQL] Pretty plan display for InsertIntoHiveTable
### What changes were proposed in this pull request?

Add `toString` for `HiveFileFormat` and `HiveTempPath` to make the display of `InsertIntoHiveTable` plan pretty.

### Why are the changes needed?

I found that the current plan-replacement rules do not handle a trailing object hash properly: https://github.com/apache/spark/blob/36d23eff4b4c3a2b8fd301672e532132c96fdd68/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala#L62

Instead of fixing the replacement rule (see #49396; please let me know if any reviewer thinks we should fix that too), we can override `toString` in those classes so that they display cleanly.
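To illustrate the trailing-hash problem, here is a minimal sketch. It assumes the linked rule has roughly the shape `@[0-9a-z]+,`, that is, it strips a hash only when a comma follows; that shape is an assumption for illustration, not a quote from `SQLQueryTestHelper`. `HashRuleSketch`, both method names, and the sample line are hypothetical.

```scala
object HashRuleSketch {
  // Assumed rule shape: removes "@<hash>" only when it is followed by a comma.
  def stripHashBeforeComma(line: String): String =
    line.replaceAll("@[0-9a-z]+,", ",")

  // Hypothetical variant that also removes a hash at the end of the line.
  // This is not necessarily what #49396 proposes.
  def stripHashAnywhere(line: String): String =
    line.replaceAll("@[0-9a-f]+(?=,|$)", "")

  def main(args: Array[String]): Unit = {
    val line = "Cmd a.b.Format@1a2b3c4d, a.b.TempPath@69beda67"
    println(stripHashBeforeComma(line)) // the trailing hash survives
    println(stripHashAnywhere(line))    // both hashes are removed
  }
}
```

Under that assumption, a hash on the last plan argument is never normalized, so the golden file would change on every run.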

This is a minor improvement to the plan display for `InsertIntoHiveTable` that makes it consistent with `DataSource` plans such as `InsertIntoHadoopFsRelationCommand`.

`InsertIntoHadoopFsRelationCommand`:
```
-- !query
insert into t6 values (97)
-- !query analysis
InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t6, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t6], Append, `spark_catalog`.`default`.`t6`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t6), [ascii]
+- Project [cast(col1#x as bigint) AS ascii#xL]
   +- LocalRelation [col1#x]
```

`InsertIntoHiveTable`:
```patch
 -- !query
 insert into table spark_test_json_2021_07_16_01 values(1, 'a')
 -- !query analysis
-InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], org.apache.spark.sql.hive.execution.HiveFileFormat@xxxxxxxx, org.apache.spark.sql.hive.execution.HiveTempPath@69beda67
+InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], Hive, HiveTempPath(file:[not included in comparison]/{warehouse_dir}/spark_test_json_2021_07_16_01)
 +- Project [cast(col1#x as int) AS c1#x, cast(col2#x as string) AS c2#x]
    +- LocalRelation [col1#x, col2#x]
```
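The before/after difference comes from the JVM default `toString`, which prints the class name followed by `@` and a hex hash code. The sketch below reproduces the effect with hypothetical stand-in classes (`PlainFormat`, `NamedFormat`, `NamedTempPath`) and a simplified `render` helper; none of these are Spark classes, and real plan rendering is more involved.

```scala
// Hypothetical stand-ins; these are not the Spark classes.
class PlainFormat // inherits the default toString (ClassName@hash)

class NamedFormat {
  override def toString: String = "Hive"
}

class NamedTempPath(path: String) {
  override def toString: String = s"HiveTempPath($path)"
}

object ToStringSketch {
  // Simplified rendering: node name followed by comma-separated arguments.
  def render(name: String, args: Seq[Any]): String =
    args.mkString(s"$name ", ", ", "")

  def main(args: Array[String]): Unit = {
    // Unstable: the hash after '@' can differ between runs.
    println(render("Insert", Seq("t", new PlainFormat)))
    // Stable: the output depends only on the constructor arguments.
    println(render("Insert", Seq("t", new NamedFormat, new NamedTempPath("file:/tmp/t"))))
  }
}
```

Because the overridden output contains only the path, existing path-normalization rules can handle it without any hash-specific rule.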

### Does this PR introduce _any_ user-facing change?

It affects `EXPLAIN` output and the plan display in the `SQL/DataFrame` tab of the Spark UI.

### How was this patch tested?

See the above examples.

Spark does not have SQL tests related to the `hive` module; I identified this issue when porting internal test cases to 4.0. Since all existing SQL tests live in the `sql` module, adding Hive-related tests is impossible.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49400 from pan3793/SPARK-50755.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
pan3793 authored and dongjoon-hyun committed Jan 8, 2025
1 parent cfb2e40 commit 0b443f4
Showing 2 changed files with 4 additions and 0 deletions.
@@ -55,6 +55,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)

   override def shortName(): String = "hive"

+  override def toString: String = "Hive"
+
   override def inferSchema(
       sparkSession: SparkSession,
       options: Map[String, String],
@@ -165,4 +165,6 @@ class HiveTempPath(session: SparkSession, val hadoopConf: Configuration, path: P
   def deleteIfNotStagingDir(path: Path, fs: FileSystem): Unit = {
     if (Option(path) != stagingDirForCreating) fs.delete(path, true)
   }
+
+  override def toString: String = s"HiveTempPath($path)"
 }
