Data Source Filter Predicate (For Filter Pushdown)
Filter is the contract for filter predicates that can be pushed down to a relation (aka data source).
Filter is used when:
- (Data Source API V1) BaseRelation is requested for unhandled filter predicates (and hence BaseRelation implementations, e.g. JDBCRelation)
- (Data Source API V1) PrunedFilteredScan is requested to build a scan (and hence PrunedFilteredScan implementations, e.g. JDBCRelation)
- FileFormat is requested to buildReader (and hence FileFormat implementations, e.g. OrcFileFormat, CSVFileFormat, JsonFileFormat, TextFileFormat and Spark MLlib's LibSVMFileFormat)
- FileFormat is requested to build a data reader with partition column values appended (and hence FileFormat implementations, e.g. OrcFileFormat, ParquetFileFormat)
- RowDataSourceScanExec is created (for a simple text representation in a query plan tree)
- DataSourceStrategy execution planning strategy is requested to pruneFilterProject (when executed for LogicalRelation logical operators with a PrunedFilteredScan or a PrunedScan)
- DataSourceStrategy execution planning strategy is requested to selectFilters
- (Data Source API V2) SupportsPushDownFilters is requested to pushFilters and for pushedFilters
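To make the pushdown use cases above more concrete, here is a minimal sketch of how a data source might decide which filters it can handle itself. It assumes the spark-sql artifact is on the classpath. The names PushdownSketch, compile, literal and unhandled are hypothetical helpers made up for this illustration, not part of Spark. The predicate syntax is a generic SQL-like string, not what any particular built-in relation produces.

```scala
import org.apache.spark.sql.sources._

// Hypothetical helper object (not part of Spark)
object PushdownSketch {

  // Translate a filter into a SQL-like predicate, or None if this source cannot handle it
  def compile(filter: Filter): Option[String] = filter match {
    case EqualTo(attr, value)     => Some(s"$attr = ${literal(value)}")
    case GreaterThan(attr, value) => Some(s"$attr > ${literal(value)}")
    case IsNotNull(attr)          => Some(s"$attr IS NOT NULL")
    case And(left, right) =>
      // Both sides have to be supported for the conjunction to be pushed down
      for (l <- compile(left); r <- compile(right)) yield s"($l) AND ($r)"
    case _ => None
  }

  // Quote string values (doubling embedded single quotes); everything else uses toString
  private def literal(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // What an override of BaseRelation.unhandledFilters could return:
  // the filters Spark still has to evaluate after the scan
  def unhandled(filters: Array[Filter]): Array[Filter] =
    filters.filter(f => compile(f).isEmpty)
}
```

In this sketch, a filter the source cannot translate (e.g. StringContains) is reported as unhandled, so it would still be applied by Spark on the rows the scan returns.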
```scala
package org.apache.spark.sql.sources

abstract class Filter {
  // only required methods that have no implementation
  // the others follow
  def references: Array[String]
}
```
Method | Description |
---|---|
references | Column references, i.e. the names of the columns that the filter references |
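Filter | Description |
---|---|
And | True if both the left and the right filters are true |
EqualNullSafe | Null-safe equality of an attribute and a value (true if both are null, false if only one is null) |
EqualTo | True if the attribute equals the value |
GreaterThan | True if the attribute is greater than the value |
GreaterThanOrEqual | True if the attribute is greater than or equal to the value |
In | True if the attribute equals one of the values in the array |
IsNotNull | True if the attribute is not null |
IsNull | True if the attribute is null |
LessThan | True if the attribute is less than the value |
LessThanOrEqual | True if the attribute is less than or equal to the value |
Not | True if the child filter is false |
Or | True if at least one of the left and the right filters is true |
StringContains | True if the attribute is a string that contains the value |
StringEndsWith | True if the attribute is a string that ends with the value |
StringStartsWith | True if the attribute is a string that starts with the value |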
Finding Column References in Any Value: findReferences Method
```scala
findReferences(value: Any): Array[String]
```
findReferences returns the column references of the given value if the value is itself a Filter, or an empty array otherwise.
Note
findReferences is used when EqualTo, EqualNullSafe, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual and In filters are requested for their column references.
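The behaviour of findReferences can be illustrated with a small, self-contained model. The classes below are simplified stand-ins written for this example (they only mirror the shape of org.apache.spark.sql.sources.Filter and are not Spark's source code). FilterReferencesSketch is a hypothetical name. The sketch shows how a filter whose value is itself a filter also reports the nested column references.

```scala
// Simplified stand-in for the Filter hierarchy (an assumption for illustration only)
object FilterReferencesSketch {

  sealed abstract class Filter {
    // Names of the columns that the filter references
    def references: Array[String]

    // References of the value if it is a filter, an empty array otherwise
    protected def findReferences(value: Any): Array[String] = value match {
      case f: Filter => f.references
      case _         => Array.empty
    }
  }

  final case class EqualTo(attribute: String, value: Any) extends Filter {
    // The attribute itself plus any references found in the value
    def references: Array[String] = Array(attribute) ++ findReferences(value)
  }

  final case class IsNull(attribute: String) extends Filter {
    def references: Array[String] = Array(attribute)
  }

  final case class And(left: Filter, right: Filter) extends Filter {
    // Composite filters combine the references of their children
    def references: Array[String] = left.references ++ right.references
  }

  def main(args: Array[String]): Unit = {
    val plain  = EqualTo("id", 42)                // value is not a filter
    val nested = EqualTo("flag", IsNull("name"))  // value is a filter

    println(plain.references.mkString(", "))              // id
    println(nested.references.mkString(", "))             // flag, name
    println(And(plain, nested).references.mkString(", ")) // id, flag, name
  }
}
```

Only the value-based filters go through findReferences in this model. Composite filters such as And simply concatenate the references of their child filters.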