| Data Type | Example | Use Case | When to Use | Range of Values |
| --- | --- | --- | --- | --- |
| ByteType | 1, 127, -128 | Small integers where memory efficiency is key | Small values such as ages, small integer IDs, counters | -128 to 127 |
| ShortType | 100, 32767, -100 | Slightly larger integers | Small integer IDs, counters where ByteType's range is insufficient | -32,768 to 32,767 |
| IntegerType | 1000, 2147483647 | Most common integers | Default for integers unless a larger range or memory efficiency is paramount | -2,147,483,648 to 2,147,483,647 |
| LongType | 1000000000, 9223372036854775807 | Very large integers | Timestamps (milliseconds), large IDs, counters | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| FloatType | 3.14, -2.718 | Single-precision floating-point numbers | Floating-point data where memory efficiency is a concern, scientific data | Approximately ±1.4 × 10^-45 to ±3.4 × 10^38 |
| DoubleType | 3.14159, -2.71828 | Double-precision floating-point numbers | Default for floating-point numbers, or when higher precision is needed | Approximately ±4.9 × 10^-324 to ±1.8 × 10^308 |
| DecimalType | 123.45, 0.0001 | Fixed-precision decimal numbers | Exact arithmetic for financial/scientific calculations | Defined by precision and scale (precision up to 38 digits) |
| StringType | 'hello', 'world' | Text data | Any textual information | Variable length, limited only by available memory |
| BooleanType | true, false | True/false values | Flags, indicators, binary choices | true, false |
| DateType | 2024-01-27 | Dates | Storing and manipulating dates | YYYY-MM-DD (date only) |
| TimestampType | 2024-01-27 10:00:00 | Dates and times | Storing and manipulating timestamps | YYYY-MM-DD HH:mm:ss[.ffffff] (date and time, microsecond precision) |
| BinaryType | b'data' | Raw binary data | Images, audio, other non-textual binary data | Variable length, sequence of bytes |
| ArrayType | [1,2,3] | Lists/arrays of values | A single column storing multiple values | Variable length, all elements of the same type |
| MapType | {'a':1, 'b':2} | Key-value pairs | Structured data where each key maps to a value | Variable length; all keys share one type and all values share one type |
| StructType | {'name':'Bob'} | Complex data structures | Storing nested/hierarchical data | Fields and their types are defined by the schema |
| YearMonthIntervalType | | | | |
| DayTimeIntervalType | | | | |

Reference: Spark SQL Data Types
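
The integer ranges in the table follow directly from each type's bit width (8, 16, 32, and 64 bits, two's complement), and the FloatType/DoubleType precision gap can be seen by round-tripping a value through a 4-byte IEEE 754 float. The sketch below uses only the Python standard library, not Spark itself; the helper names (`signed_range`, `smallest_integer_type`, `as_float32`) are hypothetical and chosen for illustration.

```python
import struct

# Bit widths of Spark's signed integer types (two's complement).
INTEGER_WIDTHS = {"ByteType": 8, "ShortType": 16, "IntegerType": 32, "LongType": 64}


def signed_range(bits):
    """Return (min, max) for a signed two's-complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_integer_type(value):
    """Return the narrowest type name (hypothetical helper) whose range holds value."""
    for name, bits in INTEGER_WIDTHS.items():  # dict order runs narrowest to widest
        low, high = signed_range(bits)
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in a 64-bit signed integer")


def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for name, bits in INTEGER_WIDTHS.items():
    low, high = signed_range(bits)
    print(f"{name:<12} {low} to {high}")

print(smallest_integer_type(127))    # fits in 8 bits
print(smallest_integer_type(32768))  # one past ShortType's maximum
print(as_float32(3.14159))           # single precision keeps roughly 7 significant digits
```

A check like `smallest_integer_type` is one way to decide whether a column can be narrowed safely: run it over the column's minimum and maximum and take the wider of the two results.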

Compare Data Types in Spark with SQL Server

| Spark Data Type | SQL Server Data Type | Notes |
| --- | --- | --- |
| ByteType | tinyint | Not an exact match: tinyint is unsigned (0 to 255), while ByteType is signed (-128 to 127). Use smallint if negative values can occur. |
| ShortType | smallint | Direct equivalent. |
| IntegerType | int | Direct equivalent. |
| LongType | bigint | Direct equivalent. |
| FloatType | real | Direct equivalent (32-bit). |
| DoubleType | float | Direct equivalent; float defaults to float(53), a 64-bit double. |
| DecimalType | decimal, numeric | Requires matching precision and scale. |
| StringType | varchar, nvarchar, char, nchar | Choose based on length and character encoding (Unicode or not). |
| BooleanType | bit | Closest equivalent (bit stores 0, 1, or NULL). |
| DateType | date | Direct equivalent. |
| TimestampType | datetime2, datetimeoffset | datetime2 is generally preferred for better precision. datetimeoffset handles time zones. |
| BinaryType | varbinary, binary | Choose based on length. |
| ArrayType | table (with appropriate schema) | No direct equivalent. Requires a separate table to represent the array. |
| MapType | table (with appropriate schema) | No direct equivalent. Requires a separate table to represent the map. |
| StructType | table (with appropriate schema) | No direct equivalent. Requires a separate table to represent the struct. |
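
To make the mapping concrete, here is a minimal sketch that turns a list of Spark type names into a SQL Server `CREATE TABLE` statement. It is plain Python and does not call Spark or SQL Server; the dictionary and the functions `sql_server_type` and `create_table_ddl` are hypothetical names invented for this illustration. Two assumptions are baked in: ByteType is mapped to smallint because SQL Server's tinyint is unsigned, and StringType/BinaryType default to the `max` variants because the right length depends on your data.

```python
# Illustrative mapping only; adjust lengths and Unicode choices for your data.
SPARK_TO_SQL_SERVER = {
    "ByteType": "smallint",        # assumption: tinyint (0..255) cannot hold negative bytes
    "ShortType": "smallint",
    "IntegerType": "int",
    "LongType": "bigint",
    "FloatType": "real",
    "DoubleType": "float",
    "StringType": "nvarchar(max)",   # assumption: Unicode text of unknown length
    "BooleanType": "bit",
    "DateType": "date",
    "TimestampType": "datetime2",
    "BinaryType": "varbinary(max)",  # assumption: unknown length
}


def sql_server_type(spark_type, precision=None, scale=None):
    """Return a SQL Server column type for a Spark type name (hypothetical helper)."""
    if spark_type == "DecimalType":
        if precision is None or scale is None:
            raise ValueError("DecimalType needs an explicit precision and scale")
        return f"decimal({precision},{scale})"
    if spark_type not in SPARK_TO_SQL_SERVER:
        # ArrayType, MapType, and StructType need their own tables instead.
        raise ValueError(f"{spark_type} has no single-column SQL Server equivalent")
    return SPARK_TO_SQL_SERVER[spark_type]


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement from (name, spark_type[, precision, scale]) tuples."""
    lines = [
        f"    [{name}] {sql_server_type(spark_type, *args)}"
        for name, spark_type, *args in columns
    ]
    return f"CREATE TABLE [{table}] (\n" + ",\n".join(lines) + "\n);"


ddl = create_table_ddl(
    "orders",
    [("id", "LongType"), ("amount", "DecimalType", 18, 2), ("created", "TimestampType")],
)
print(ddl)
```

Raising an error for ArrayType, MapType, and StructType is a deliberate choice here: those columns need a schema decision (a child table per the notes above) that a simple lookup cannot make.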

QnA