The dtypes attribute in Spark retrieves the schema of a DataFrame as a list of tuples, where each tuple contains a column name and its corresponding data type. This is particularly useful for inspecting the structure of the data and checking the data type of each column.
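A minimal sketch of the list-of-tuples shape described above. The sample rows, the column names, and the helper names format_dtypes and show_dtypes are illustrative assumptions, and the Spark part needs a local PySpark installation.

```python
def format_dtypes(dtypes):
    """Render a dtypes-style list of (column, type) tuples as 'name: type' lines."""
    return [f"{name}: {dtype}" for name, dtype in dtypes]


def show_dtypes():
    # Imported lazily so format_dtypes can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("dtypes-demo").getOrCreate()
    )
    # Hypothetical sample data, used only for illustration.
    df = spark.createDataFrame(
        [("Alice", 30, 55.5), ("Bob", 25, 72.0)],
        "name string, age int, weight double",
    )
    # dtypes is an attribute, not a method, so it takes no parentheses.
    for line in format_dtypes(df.dtypes):
        print(line)
    spark.stop()
```

Calling show_dtypes() prints one line per column, pairing each column name with its type string.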
PySpark: df.dtypes
Spark SQL: use DESCRIBE table_name to achieve similar results.
Each data type in the returned tuples is a string (e.g., string, int, double).
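Because each entry is a plain (column name, type string) tuple, the list can be filtered with ordinary Python. The sketch below is one possible use, not something prescribed by Spark: the helper name columns_of_type and the sample columns are assumptions, and sample_dtypes stands in for the value of df.dtypes on a real DataFrame.

```python
def columns_of_type(dtypes, *type_names):
    """Return the column names whose dtype string is one of type_names."""
    wanted = set(type_names)
    return [name for name, dtype in dtypes if dtype in wanted]


# Shaped like the value of df.dtypes; these columns are hypothetical.
sample_dtypes = [
    ("name", "string"),
    ("age", "int"),
    ("weight", "double"),
    ("city", "string"),
]

numeric_cols = columns_of_type(sample_dtypes, "int", "bigint", "float", "double")
string_cols = columns_of_type(sample_dtypes, "string")

print(numeric_cols)  # ['age', 'weight']
print(string_cols)   # ['name', 'city']
```

With a real DataFrame, the same helper could feed a projection such as df.select(*columns_of_type(df.dtypes, "string")).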
dtypes is lightweight and does not involve data movement or processing.
The dtypes attribute retrieves the schema of a DataFrame as a list of tuples.
Because dtypes is a metadata operation that does not process any data, it is very efficient.
In Spark SQL, similar results can be achieved with DESCRIBE table_name.
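To relate the two approaches, the following sketch converts DESCRIBE output into the same list-of-tuples shape that dtypes returns. The helper names, the view name people, and the sample row are assumptions. The rule that a blank or '#'-prefixed row ends the column listing is also an assumption about how DESCRIBE reports extra sections such as partition information.

```python
def describe_rows_to_dtypes(rows):
    """Convert DESCRIBE output rows into a dtypes-style list of (column, type) tuples."""
    result = []
    for row in rows:
        col_name = row["col_name"]
        # Assumption: a blank or '#'-prefixed row starts a metadata section
        # (for example partition information), so the column rows end there.
        if not col_name or col_name.startswith("#"):
            break
        result.append((col_name, row["data_type"]))
    return result


def describe_as_dtypes(spark):
    # 'people' is a hypothetical temporary view name.
    df = spark.createDataFrame([("Alice", 30)], "name string, age int")
    df.createOrReplaceTempView("people")
    rows = spark.sql("DESCRIBE people").collect()
    return describe_rows_to_dtypes(rows)
```

Given an active SparkSession, describe_as_dtypes(spark) should yield the same pairs as df.dtypes for the view's underlying DataFrame.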