[SPARK-50755][SQL] Pretty plan display for InsertIntoHiveTable

### What changes were proposed in this pull request?

Add `toString` for `HiveFileFormat` and `HiveTempPath` to make the display of the `InsertIntoHiveTable` plan pretty.

### Why are the changes needed?

I found that the current plan replacing rules do not handle a trailing object hash properly:

https://github.com/apache/spark/blob/36d23eff4b4c3a2b8fd301672e532132c96fdd68/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala#L62

Instead of fixing the replacing rule (see #49396, and please let me know if any reviewer thinks we should fix that too), it seems we can override the `toString` of those classes to make the plan display pretty.

This is a minor improvement to the plan display for `InsertIntoHiveTable`, and it makes the display consistent with `DataSource` plans such as `InsertIntoHadoopFsRelationCommand`.

`InsertIntoHadoopFsRelationCommand`:
```
-- !query
insert into t6 values (97)
-- !query analysis
InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t6, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t6], Append, `spark_catalog`.`default`.`t6`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t6), [ascii]
+- Project [cast(col1#x as bigint) AS ascii#xL]
   +- LocalRelation [col1#x]
```

`InsertIntoHiveTable`:
```patch
 -- !query
 insert into table spark_test_json_2021_07_16_01 values(1, 'a')
 -- !query analysis
-InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], org.apache.spark.sql.hive.execution.HiveFileFormat@xxxxxxxx, org.apache.spark.sql.hive.execution.HiveTempPath@69beda67
+InsertIntoHiveTable `spark_catalog`.`default`.`spark_test_json_2021_07_16_01`, false, false, [c1, c2], Hive, HiveTempPath(file:[not included in comparison]/{warehouse_dir}/spark_test_json_2021_07_16_01)
 +- Project [cast(col1#x as int) AS c1#x, cast(col2#x as string) AS c2#x]
    +- LocalRelation [col1#x, col2#x]
```

### Does this PR introduce _any_ user-facing change?

It affects the `EXPLAIN` output and the plan display in the Spark UI `SQL/DataFrame` tab.

### How was this patch tested?

See the above examples. Spark does not have SQL tests related to the `hive` module; I identified this issue when porting internal test cases to 4.0. Since all existing SQL tests live in the `sql` module, adding hive-related tests is impossible.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49400 from pan3793/SPARK-50755.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
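To illustrate why a trailing object hash can slip past a text-replacement rule, here is a minimal sketch. Both regular expressions and the sample plan line are hypothetical and are not copied from `SQLQueryTestHelper`; the only assumption is that a rule which requires a comma after the hash will miss a hash at the end of the line.

```scala
object HashNormalizationSketch {
  // Hypothetical rule (NOT the actual Spark rule): matches a hash only
  // when it is immediately followed by a comma.
  def normalizeBeforeComma(plan: String): String =
    plan.replaceAll("@[0-9a-f]+,", "@xxxxxxxx,")

  // Hypothetical alternative: matches a hash in any position.
  def normalizeAnywhere(plan: String): String =
    plan.replaceAll("@[0-9a-f]+", "@xxxxxxxx")

  def main(args: Array[String]): Unit = {
    // Made-up plan line with one hash mid-line and one at the end.
    val line = "InsertIntoHiveTable t, false, HiveFileFormat@1a2b3c4d, HiveTempPath@69beda67"
    println(normalizeBeforeComma(line)) // the trailing hash is left untouched
    println(normalizeAnywhere(line))    // both hashes are masked
  }
}
```

Overriding `toString`, as this change does, sidesteps the question entirely because no hash is printed in the first place.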
apache · Jan 8, 2025 · 0b443f4
1 parent cfb2e40
commit 0b443f4
Showing 2 changed files with 4 additions and 0 deletions.
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
@@ -55,6 +55,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
 
   override def shortName(): String = "hive"
 
+  override def toString: String = "Hive"
+
   override def inferSchema(
       sparkSession: SparkSession,
       options: Map[String, String],

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala
@@ -165,4 +165,6 @@ class HiveTempPath(session: SparkSession, val hadoopConf: Configuration, path: P
   def deleteIfNotStagingDir(path: Path, fs: FileSystem): Unit = {
     if (Option(path) != stagingDirForCreating) fs.delete(path, true)
   }
+
+  override def toString: String = s"HiveTempPath($path)"
 }
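
The effect of the two overrides can be sketched outside Spark. The classes and the `render` helper below are stand-ins invented for illustration, not the Spark classes or Spark's plan-string logic; the sketch only assumes that a plan line is built by calling `toString` on each argument.

```scala
object ToStringSketch {
  // Stand-in classes for illustration only; they are not the Spark classes.
  class DefaultFormat // inherits Object.toString: "<class name>@<hex hash>"
  class NamedFormat { override def toString: String = "Hive" }
  class TempPath(path: String) {
    override def toString: String = s"TempPath($path)"
  }

  // Hypothetical helper: joins arguments the way a comma-separated plan line might.
  def render(name: String, args: Any*): String = s"$name ${args.mkString(", ")}"

  def main(args: Array[String]): Unit = {
    // Without an override, the line ends in an identity hash that varies per run.
    println(render("Insert", new DefaultFormat))
    // With overrides, the line is stable and readable.
    println(render("Insert", new NamedFormat, new TempPath("/warehouse/t")))
  }
}
```

A stable `toString` keeps golden-file comparisons deterministic without relying on post-hoc text replacement.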