Abstraction |
Low level, provides a basic and simple abstraction. |
High level, built on top of RDDs. Provides a structured, tabular view of data organized into named columns. |
High level, extends the DataFrame API (since Spark 2.0, a DataFrame is a Dataset[Row]). Provides a structured and strongly typed view of data. |
Type Safety |
Provides compile-time type safety (in Scala and Java), since its elements are typed objects. |
Doesn't provide compile-time type safety: rows are untyped, and column names and types are only checked at runtime. |
Provides compile-time type safety (in Scala and Java), since each row is bound to a user-defined type. |
Optimization |
No built-in optimizer. The developer has to optimize manually, much as with hand-written MapReduce jobs. |
Makes use of Catalyst Optimizer for optimization of query plans, leading to efficient execution. |
Also makes use of the Catalyst Optimizer for optimization of query plans. |
Processing Speed |
Generally slower, as operations are not optimized automatically. |
Faster than RDDs due to optimization by the Catalyst Optimizer. |
Comparable to DataFrames and faster than RDDs, thanks to the Catalyst Optimizer. |
Ease of Use |
Harder to use, because optimization has to be done manually. |
Easier to use than RDDs due to high-level abstraction and SQL-like syntax. |
Like DataFrames, provides SQL-like syntax, which makes it easier to use than RDDs. |
Interoperability |
Easy to convert to and from other types like DataFrame and Dataset. |
Easy to convert to and from other types like RDD and Dataset. |
Easy to convert to and from other types like DataFrame and RDD. |
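
The type-safety and interoperability rows above can be illustrated with a small Scala sketch. It assumes a local `SparkSession` and a Spark dependency on the classpath; the `Person` case class, the `InteropSketch` object, and the sample data are hypothetical names introduced only for this example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object InteropSketch {
  // Converts RDD -> DataFrame -> Dataset -> RDD and filters along the way.
  def run(spark: SparkSession): (Seq[Person], Long) = {
    import spark.implicits._ // brings encoders, toDF and $"col" into scope

    val people = Seq(Person("Ann", 34), Person("Bob", 19))

    // RDD: low-level, typed collection of JVM objects.
    val rdd: RDD[Person] = spark.sparkContext.parallelize(people)

    // RDD -> DataFrame: columns are derived from the case class fields.
    val df: DataFrame = rdd.toDF()

    // DataFrame -> Dataset: bind each row to the Person type.
    val ds: Dataset[Person] = df.as[Person]

    // Typed access: the compiler checks that p.age exists and is an Int.
    val adults: Dataset[Person] = ds.filter(p => p.age >= 21)

    // Untyped access: the column name is only resolved at runtime, so a
    // misspelling such as $"agee" would compile and then fail when run.
    val adultsDf: DataFrame = df.filter($"age" >= 21)

    // Dataset -> RDD, then collect the results on the driver.
    (adults.rdd.collect().toSeq, adultsDf.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("interop-sketch")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Both filters select the same rows. The difference is when a mistake in the field or column name would be caught: at compile time for the Dataset version, at runtime for the DataFrame version.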