关注 spark技术分享,
撸spark源码 玩spark最佳实践

Hint Framework

Hint Framework

Structured queries can be optimized using Hint Framework that allows for specifying query hints.

Query hints allow for annotating a query and give a hint to the query optimizer how to optimize logical plans. This can be very useful when the query optimizer cannot make optimal decision, e.g. with respect to join methods due to conservativeness or the lack of proper statistics.

Spark SQL supports COALESCE and REPARTITION and BROADCAST hints. All remaining unresolved hints are silently removed from a query plan at analysis.

Note
Hint Framework was added in Spark SQL 2.2.

Specifying Query Hints

You can specify query hints using Dataset.hint operator or SELECT SQL statements with hints.

SELECT SQL Statements With Hints

SELECT SQL statement supports query hints as comments in SQL query that Spark SQL translates into a UnresolvedHint unary logical operator in a logical plan.

COALESCE and REPARTITION Hints

Spark SQL 2.4 added support for COALESCE and REPARTITION hints (using SQL comments):

  • SELECT /*+ COALESCE(5) */ …​

  • SELECT /*+ REPARTITION(3) */ …​

Broadcast Hints

Spark SQL 2.2 supports BROADCAST hints using broadcast standard function or SQL comments:

  • SELECT /*+ MAPJOIN(b) */ …​

  • SELECT /*+ BROADCASTJOIN(b) */ …​

  • SELECT /*+ BROADCAST(b) */ …​

broadcast Standard Function

While hint operator allows for attaching any hint to a logical plan broadcast standard function attaches the broadcast hint only (that actually makes it a special case of hint operator).

broadcast standard function is used for broadcast joins (aka map-side joins), i.e. to hint the Spark planner to broadcast a dataset regardless of the size.

Spark Analyzer

There are the following logical rules that Spark Analyzer uses to analyze logical plans with the UnresolvedHint logical operator:

  1. ResolveBroadcastHints resolves UnresolvedHint operators with BROADCAST, BROADCASTJOIN, MAPJOIN hints to a ResolvedHint

  2. ResolveCoalesceHints resolves UnresolvedHint logical operators with COALESCE or REPARTITION hints

  3. RemoveAllHints simply removes all UnresolvedHint operators

The order of executing the above rules matters.

Hint Operator in Catalyst DSL

You can use hint operator from Catalyst DSL to create a UnresolvedHint logical operator, e.g. for testing or Spark SQL internals exploration.

赞(0) 打赏
未经允许不得转载:spark技术分享 » Hint Framework
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏