For Command and BaseRelation there are further implementations in the sql.hive package.
The role of MetastoreRelation is explained in the Hive section.
Command is not covered here.
UnaryNode
BinaryNode
Spark Plan
The SparkPlan class hierarchy is shown in the figure below:
In the basicOperators file of the SQL module's execution package there are many SparkPlan implementations, including
Project, Filter, Sample, Union, StopAfter, TopK, Sort, ExistingRdd
Many of these overlap with the classes in Catalyst's basicOperators. The difference is that SparkPlan is an implementation of QueryPlan which, unlike a logical plan, is actually executed by the Spark-side Strategies; hence the case classes in the SQL module's basicOperators carry an execute() method that their Catalyst counterparts lack.
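As a rough sketch of that difference (toy classes and names, not the actual Spark source): a logical filter node only describes the predicate, while a physical filter node additionally knows how to run it against its child's output RDD.

import org.apache.spark.rdd.RDD

// Toy sketch: the logical node merely records the predicate over a row...
case class ToyLogicalFilter(condition: Seq[Any] => Boolean)

// ...while the physical node also carries execute(), which applies it to the child RDD.
case class ToyPhysicalFilter(condition: Seq[Any] => Boolean, child: RDD[Seq[Any]]) {
  def execute(): RDD[Seq[Any]] = child.filter(condition)
}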
The concrete Spark strategy implementations are covered in the next subsection.
Planning
Query Planner
QueryPlanner is responsible for turning a logical plan into a physical plan; it holds a sequence of Strategy implementations.
abstract class QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]] {
  /** A list of execution strategies that can be used by the planner */
  def strategies: Seq[Strategy]

  /**
   * Given a [[plans.logical.LogicalPlan LogicalPlan]], returns a list of `PhysicalPlan`s that can
   * be used for execution. If this strategy does not apply to the give logical operation then an
   * empty list should be returned.
   */
  abstract protected class Strategy extends Logging {
    def apply(plan: LogicalPlan): Seq[PhysicalPlan]
  }

  /**
   * Returns a placeholder for a physical plan that executes `plan`. This placeholder will be
   * filled in automatically by the QueryPlanner using the other execution strategies that are
   * available.
   */
  protected def planLater(plan: LogicalPlan) = apply(plan).next()

  def apply(plan: LogicalPlan): Iterator[PhysicalPlan] = {
    // Obviously a lot to do here still...
    val iter = strategies.view.flatMap(_(plan)).toIterator
    assert(iter.hasNext, s"No plan for $plan")
    iter
  }
}
QueryPlanner impl
The current implementation is SparkStrategies.
In SQLContext it is used as SparkPlanner:
protected[sql] class SparkPlanner extends SparkStrategies {
  val sparkContext = self.sparkContext

  val strategies: Seq[Strategy] =
    TopK ::
    PartialAggregation ::
    SparkEquiInnerJoin ::
    BasicOperators ::
    CartesianProduct ::
    BroadcastNestedLoopJoin :: Nil
}
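For orientation, this is roughly how the planner is driven from SQLContext's QueryExecution (a hedged sketch; the strategies are tried lazily in order and only the first candidate physical plan is kept):

// Sketch, assuming an optimizedPlan: LogicalPlan is already available in QueryExecution.
lazy val sparkPlan: SparkPlan = planner(optimizedPlan).next()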
In HiveContext it is a SparkPlanner mixed in with the Hive strategies:
val hivePlanner = new SparkPlanner with HiveStrategies {
  val hiveContext = self

  override val strategies: Seq[Strategy] = Seq(
    TopK,
    ColumnPrunings,
    PartitionPrunings,
    HiveTableScans,
    DataSinks,
    Scripts,
    PartialAggregation,
    SparkEquiInnerJoin,
    BasicOperators,
    CartesianProduct,
    BroadcastNestedLoopJoin
  )
}
Strategy & impl
The Strategy implementations fall into two groups: Spark strategies and Hive strategies. The former roughly correspond to the classes in the sql.execution package; the latter are additional strategies layered on top of the Spark ones.
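As a hedged sketch of what such a strategy looks like (modeled loosely on the BasicOperators strategy; planLater and the Filter operators come from the enclosing planner and the logical/execution packages, so this only compiles inside a SparkStrategies subclass):

// Sketch only: a strategy pattern-matches logical nodes and returns candidate physical plans,
// using planLater(...) as a placeholder for children that other strategies will plan.
object SimplifiedBasicOperators extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case logical.Filter(condition, child) =>
      execution.Filter(condition, planLater(child)) :: Nil
    case _ => Nil
  }
}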
Expression
Expression has several properties:
1. It carries a DataType, along with some inline helper methods for converting between dataTypes.
2. It carries references; references is a Set[Attribute], and Attribute is a subclass of NamedExpression.
3. foldable, i.e. the expression can be evaluated statically, without any input row.
Among the Expressions only Literal is foldable. Literal is a LeafExpression whose companion object pattern-matches the value to attach the corresponding DataType (a short usage sketch follows the Literal code below):
object Literal {
  def apply(v: Any): Literal = v match {
    case i: Int => Literal(i, IntegerType)
    case l: Long => Literal(l, LongType)
    case d: Double => Literal(d, DoubleType)
    case f: Float => Literal(f, FloatType)
    case b: Byte => Literal(b, ByteType)
    case s: Short => Literal(s, ShortType)
    case s: String => Literal(s, StringType)
    case b: Boolean => Literal(b, BooleanType)
    case null => Literal(null, NullType)
  }
}
case class Literal(value: Any, dataType: DataType) extends LeafExpression {
  override def foldable = true
  def nullable = value == null
  def references = Set.empty
  override def toString = if (value != null) value.toString else "null"
  type EvaluatedType = Any
  override def apply(input: Row): Any = value // evaluating this leaf expression simply returns value
}
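A small usage sketch of the definitions above (hypothetical REPL-style lines, assuming the Literal object and case class are in scope):

val lit = Literal(42)   // matched as Int, so this becomes Literal(42, IntegerType)
lit.foldable            // true: it can be evaluated with no input row
lit.apply(null)         // evaluation ignores the input Row and simply returns 42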
4. resolved, which by default only checks whether all children are resolved.
children is a concept from TreeNode: there it is a Seq[BaseType], where BaseType is the type parameter of TreeNode[T]. For Expression, which extends TreeNode[Expression], BaseType is therefore Expression.
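As a toy sketch of how this propagation is usually expressed (hypothetical class, paraphrasing the Expression base class rather than quoting it):

// resolved defaults to "all children are resolved"; concrete expressions override it
// when their own resolution needs more than that (e.g. binding an attribute name).
abstract class SketchNode {
  def children: Seq[SketchNode]
  def childrenResolved: Boolean = !children.exists(!_.resolved)
  lazy val resolved: Boolean = childrenResolved
}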
Expression class hierarchy
The abstract subclasses are as follows:
abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression] {
  self: Product =>

  def symbol: String