Data Type | Example | Use Case | When to Use | Range of Values |
---|---|---|---|---|
ByteType | 1, 127, -128 | Very small integers where memory efficiency is key | Small-valued columns such as age, small integer IDs, counters | -128 to 127 |
ShortType | 100, 32767, -100 | Slightly larger integers | Small integer IDs, counters where ByteType’s range is insufficient | -32768 to 32767 |
IntegerType | 1000, 2147483647 | Most common integers | Default for integers unless a larger range is needed or memory efficiency is paramount | -2,147,483,648 to 2,147,483,647 |
LongType | 1000000000, 9223372036854775807 | Very large integers | Timestamps (milliseconds), large IDs, counters | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
FloatType | 3.14, -2.718 | Single-precision (32-bit) floating-point numbers | Floating-point numbers where memory efficiency is a concern, scientific data | Approximately ±1.4 × 10^-45 to ±3.4 × 10^38 |
DoubleType | 3.14159, -2.71828 | Double-precision (64-bit) floating-point numbers | Default for floating-point numbers, higher precision needed | Approximately ±4.9 × 10^-324 to ±1.8 × 10^308 |
DecimalType | 123.45, 0.0001 | Exact decimal numbers with a fixed precision and scale | High precision for financial/scientific calculations | Up to 38 digits of precision; scale (digits after the decimal point) from 0 up to the precision |
StringType | 'hello', 'world' | Text data | Any textual information | Variable length, no declared maximum; bounded in practice by available memory (roughly 2 GB per value because of JVM array limits) |
BooleanType | true, false | True/false values | Flags, indicators, binary choices | true, false |
DateType | 2024-01-27 | Dates | Storing and manipulating dates | YYYY-MM-DD (date only) |
TimestampType | 2024-01-27 10:00:00 | Dates and times | Storing and manipulating timestamps | YYYY-MM-DD HH:mm:ss[.ffffff] (date and time, microsecond precision) |
BinaryType | b'data' | Raw binary data | Images, audio, other non-textual binary data | Variable length, sequence of bytes |
ArrayType | [1,2,3] | Lists/arrays of values | Single column storing multiple values | Variable length, all elements of the same type |
MapType | {'a':1, 'b':2} | Key-value pairs | Storing structured data where each key maps to a value | Variable length; all keys share one type and all values share one type |
StructType | {'name':'Bob'} | Complex data structures | Storing nested/hierarchical data | Fixed set of named fields, each with its own type, as defined by the schema |
YearMonthIntervalType | | | | |
DayTimeIntervalType | | | | |
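
The integer ranges in the table above follow directly from each type's byte width. The sketch below is plain Python, not Spark code: the names `INTEGER_WIDTHS`, `signed_range`, and `smallest_integer_type` are hypothetical helpers introduced here for illustration, and the widths (1, 2, 4, and 8 bytes) are the standard sizes of signed two's-complement integers.

```python
from typing import Tuple

# Byte widths of the signed integer types, narrowest first.
# The type names are used only as labels; no Spark import is needed.
INTEGER_WIDTHS = {
    "ByteType": 1,
    "ShortType": 2,
    "IntegerType": 4,
    "LongType": 8,
}


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Return (min, max) for a signed two's-complement integer of this width."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_integer_type(lo: int, hi: int) -> str:
    """Pick the narrowest type whose range covers every value in [lo, hi]."""
    for name, width in INTEGER_WIDTHS.items():
        type_min, type_max = signed_range(width)
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values exceed LongType; DecimalType may be an option")


for name, width in INTEGER_WIDTHS.items():
    print(name, signed_range(width))
```

For example, `smallest_integer_type(0, 120)` returns `"ByteType"`, while `smallest_integer_type(-1, 40000)` returns `"IntegerType"` because 40000 exceeds the ShortType maximum of 32767.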

Spark Data Type | SQL Server Data Type | Notes |
---|---|---|
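ByteType | tinyint | Closest match by size, but tinyint is unsigned (0 to 255) while ByteType is signed (-128 to 127); use smallint if negative values occur. |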
ShortType | smallint | Direct equivalent. |
IntegerType | int | Direct equivalent. |
LongType | bigint | Direct equivalent. |
FloatType | real | Direct equivalent. |
DoubleType | float | Direct equivalent. |
DecimalType | decimal, numeric | Requires matching precision and scale. |
StringType | varchar, nvarchar, char, nchar | Choose based on length and character encoding (Unicode or not). |
BooleanType | bit | Direct equivalent. |
DateType | date | Direct equivalent. |
TimestampType | datetime2, datetimeoffset | datetime2 is generally preferred for better precision. datetimeoffset handles time zones. |
BinaryType | varbinary, binary | Choose based on length. |
ArrayType | table (with appropriate schema) | Requires creating a separate table to represent the array. No direct equivalent. |
MapType | table (with appropriate schema) | Requires creating a separate table to represent the map. No direct equivalent. |
StructType | table (with appropriate schema) | Requires creating a separate table to represent the struct. No direct equivalent. |
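
The mapping above can be sketched as a small lookup helper. This is an illustrative plain-Python sketch, not part of Spark or any SQL Server driver: `SCALAR_TYPES`, `NESTED_TYPES`, and `sqlserver_column_type` are hypothetical names, the helper picks the first SQL Server type listed in each row, and the choice of `nvarchar` and the `max` fallback for unbounded lengths are assumptions made for the example.

```python
from typing import Optional

# Scalar mappings taken from the table (first SQL Server type listed per row).
SCALAR_TYPES = {
    "ByteType": "tinyint",
    "ShortType": "smallint",
    "IntegerType": "int",
    "LongType": "bigint",
    "FloatType": "real",
    "DoubleType": "float",
    "BooleanType": "bit",
    "DateType": "date",
    "TimestampType": "datetime2",
}

# Types the table says need a separate table instead of a single column.
NESTED_TYPES = {"ArrayType", "MapType", "StructType"}


def sqlserver_column_type(
    spark_type: str,
    precision: Optional[int] = None,
    scale: Optional[int] = None,
    length: Optional[int] = None,
) -> str:
    """Return a SQL Server column type string for a Spark type name."""
    if spark_type in SCALAR_TYPES:
        return SCALAR_TYPES[spark_type]
    if spark_type == "DecimalType":
        if precision is None or scale is None:
            raise ValueError("DecimalType needs both precision and scale")
        return f"decimal({precision},{scale})"
    if spark_type == "StringType":
        # Assumption: Unicode text; nvarchar(n) accepts n from 1 to 4000.
        if length is not None and 1 <= length <= 4000:
            return f"nvarchar({length})"
        return "nvarchar(max)"
    if spark_type == "BinaryType":
        # varbinary(n) accepts n from 1 to 8000.
        if length is not None and 1 <= length <= 8000:
            return f"varbinary({length})"
        return "varbinary(max)"
    if spark_type in NESTED_TYPES:
        raise ValueError(f"{spark_type} has no column equivalent; use a separate table")
    raise KeyError(f"unknown Spark type: {spark_type}")


print(sqlserver_column_type("DecimalType", precision=10, scale=2))
print(sqlserver_column_type("StringType", length=50))
```

Raising an error for the nested types keeps the caller from silently flattening an array, map, or struct into a column that cannot hold it.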
Q: Which numeric data type should be used if you want to avoid rounding error?
Q: What is the maximum number of characters that can be stored in StringType?
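
One way to explore the rounding-error question is to compare binary floating point with exact decimal arithmetic. The sketch below uses Python's built-in `float` (an IEEE 754 double, the same representation behind DoubleType) and the standard-library `Decimal` class as a stand-in for DecimalType; it is an analogy run outside Spark, not Spark code.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum differs slightly from 0.3.
double_sum = 0.1 + 0.2
print(double_sum == 0.3)  # False

# Decimal arithmetic stores base-10 digits exactly,
# so the same sum matches 0.3.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

The same behaviour is why the first table lists DecimalType for financial calculations, where each value must keep its exact decimal digits.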