RDD, DataFrame and Dataset

| Criteria | RDD (Resilient Distributed Dataset) | DataFrame | Dataset |
| --- | --- | --- | --- |
| Abstraction | Low level. A distributed collection of arbitrary objects with no schema known to Spark. | High level, executed on top of RDDs. Provides a structured, tabular view of data organized into named columns. | High level. Provides a structured and strongly typed view of data. Since Spark 2.0 a DataFrame is simply a Dataset[Row], so the two share one API. |
| Type Safety | Provides compile-time type safety, since each element is a typed object. | Doesn't provide compile-time type safety: columns are referenced by name, so a wrong column name or type is only detected at runtime. | Provides compile-time type safety, since each record is an instance of a known class. The typed API is available in Scala and Java only. |
| Optimization | No automatic optimization. The developer has to tune the job manually, as with hand-written MapReduce code. | Uses the Catalyst optimizer to optimize query plans, leading to efficient execution. | Uses the Catalyst optimizer as well, although lambda-based operations (for example, map or filter with a function) are opaque to it. |
| Processing Speed | Generally slower, as Spark cannot optimize the functions passed to it. | Generally faster than RDDs thanks to Catalyst optimization. | Comparable to DataFrame for column-based operations. Typed lambda operations can be slower because records must be deserialized into objects. |
| Ease of Use | Harder to use, since optimization is left to the developer. | Easier to use than RDDs thanks to the high-level abstraction and SQL-like syntax. | Similar to DataFrame: the same SQL-like syntax, plus typed operations on domain objects. |
| Interoperability | Easy to convert to a DataFrame or Dataset. | Easy to convert to an RDD or Dataset. | Easy to convert to a DataFrame or RDD. |
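
To make the comparison concrete, here is a minimal Scala sketch that runs the same filter through all three APIs and converts between them. It is an illustration under assumptions, not a reference implementation: the `Person` case class, the `ApiComparison` object, the `adultCounts` helper and the sample data are all made up for this example, and it assumes Spark 2.x or later running in local mode.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object ApiComparison {

  // Applies the same "age >= 21" filter with each API and returns the three row counts.
  def adultCounts(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._ // enables toDF(), as[T] and the $"col" syntax

    // RDD: a distributed collection of JVM objects; Spark knows nothing about the fields.
    val rdd: RDD[Person] = spark.sparkContext.parallelize(
      Seq(Person("Ann", 34), Person("Bob", 19), Person("Cy", 52)))
    // Checked by the compiler, but the lambda is opaque to the Catalyst optimizer.
    val adultsRdd = rdd.filter(_.age >= 21)

    // DataFrame: rows with named columns. A misspelled column name compiles
    // and only fails when Spark analyzes the query at runtime.
    val df: DataFrame = rdd.toDF()
    val adultsDf = df.filter($"age" >= 21)

    // Dataset: the same data viewed as typed Person records.
    val ds: Dataset[Person] = df.as[Person]
    // A misspelled field name here would be a compile error.
    val adultsDs = ds.filter(_.age >= 21)

    // Conversions back to the other two representations.
    val backToRdd: RDD[Person] = adultsDs.rdd
    val backToDf: DataFrame = adultsDs.toDF()

    (adultsRdd.count(), adultsDf.count(), adultsDs.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-dataframe-dataset")
      .master("local[*]") // local mode, assumed for this sketch
      .getOrCreate()
    try {
      val (fromRdd, fromDf, fromDs) = adultCounts(spark)
      println(s"RDD: $fromRdd, DataFrame: $fromDf, Dataset: $fromDs")
    } finally {
      spark.stop()
    }
  }
}
```

The column expression in the DataFrame filter is the form Catalyst can inspect and optimize, whereas the two lambda-based filters are treated as black boxes. Calling `explain()` on `adultsDf` and `adultsDs` is one way to see how the resulting plans differ.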