A quick late-night post about a problem I ran into a few days ago:
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
The error shows up during a join and is accompanied by broadcast-related exceptions. To fix it, you need to change the following two settings:
Property Name | Default | Meaning | Since Version |
---|---|---|---|
spark.sql.broadcastTimeout | 300 | Timeout in seconds for the broadcast wait time in broadcast joins | 1.3.0 |
spark.sql.autoBroadcastJoinThreshold | 10485760 (10 MB) | Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. By setting this value to -1 broadcasting can be disabled. Note that currently statistics are only supported for Hive Metastore tables where the command `ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan` has been run. | 1.1.0 |
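To make the threshold semantics concrete, here is a minimal sketch in plain Python of the size check described in the table. It is a simplified model, not Spark's actual planner code: the function name `would_auto_broadcast` and its parameters are hypothetical, and it assumes the table's size estimate is already known.

```python
def would_auto_broadcast(size_in_bytes: int, threshold: int = 10485760) -> bool:
    """Simplified model: a table is auto-broadcast only if its estimated
    size is non-negative and does not exceed the threshold (in bytes)."""
    # With threshold = -1 no non-negative size can satisfy the check,
    # which is why -1 disables automatic broadcasting.
    return 0 <= size_in_bytes <= threshold


if __name__ == "__main__":
    five_mb = 5 * 1024 * 1024
    print(would_auto_broadcast(five_mb))      # default 10 MB threshold
    print(would_auto_broadcast(five_mb, -1))  # broadcasting disabled
```

Under this model, setting the threshold to -1 means even a tiny table is never picked for automatic broadcast, so the join falls back to a non-broadcast strategy.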
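The values above are the defaults. To fix the problem, I raised the broadcast timeout to 1800 seconds and disabled automatic broadcast joins: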
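    from pyspark.sql import SparkSession

    # Wait up to 1800 s for a broadcast and disable automatic broadcast joins.
    spark = (
        SparkSession.builder
        .appName("test")
        .config("spark.sql.broadcastTimeout", "1800")
        .config("spark.sql.autoBroadcastJoinThreshold", "-1")
        .getOrCreate()
    )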
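If you would rather not hard-code these values in the application, the same two settings can also be passed at submit time through `spark-submit --conf key=value`. The sketch below only builds that argument list; the helper name `spark_submit_conf_args` is hypothetical and nothing here launches Spark.

```python
import shlex


def spark_submit_conf_args(settings: dict) -> list:
    """Turn a dict of Spark settings into repeated `--conf key=value` arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    settings = {
        "spark.sql.broadcastTimeout": "1800",
        "spark.sql.autoBroadcastJoinThreshold": "-1",
    }
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in spark_submit_conf_args(settings)))
```

The printed arguments go between `spark-submit` and the application file, which keeps the tuning out of the code and makes it easy to vary per job.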