Rajanand
Glossary

ACID Properties

ACID stands for Atomicity, Consistency, Isolation, and Durability, the four properties that make database transactions reliable.
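
As a rough illustration of atomicity, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are hypothetical and exist only for this example. The second transfer fails halfway through, and the credit already applied is rolled back with it.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so a failing debit leaves something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was kept.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, only the first transfer is reflected in `balances`; the failed one leaves no partial update behind.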

Availability

Availability is the degree to which a system remains operational and accessible to users when needed. It is a critical aspect of system design.
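
Availability is often quoted as a fraction of uptime over a period. The sketch below shows that arithmetic; the function names and the sample figures are illustrative assumptions, not part of any standard library.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the observed period during which the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observed period must be positive")
    return uptime_hours / total

def allowed_downtime_minutes(target, period_days=365):
    """Downtime budget in minutes for a target availability such as 0.999."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    return (1 - target) * period_days * 24 * 60

# One hour of downtime in a 30-day month (720 hours).
monthly = availability(uptime_hours=719, downtime_hours=1)

# Downtime budget for a "three nines" target over the same 30 days.
budget = allowed_downtime_minutes(0.999, period_days=30)
```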

CAP Theorem

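The CAP theorem states that a distributed data store can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition Tolerance.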

Data Engineering

Data Engineering is the practice of designing, building, and maintaining systems for collecting, storing, processing, and analyzing large volumes of data.

Data Lake

A data lake is a centralized repository designed to store vast amounts of data in its native, raw format.

Data Lakehouse

A data lakehouse is a unified architecture that combines the scalability and flexibility of a data lake with the reliability and queryability of a data warehouse.

Data Transformation

Data Transformation is the process of converting data from one format, structure, or type into another to make it suitable for analysis, storage, or integration.

Data Warehouse

A data warehouse is a centralized repository designed to store, manage, and analyze large volumes of structured data from various sources.

Distributed System

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Fault Tolerance

Fault tolerance is the ability of a system to continue operating correctly even when some of its components fail.
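
One common way to tolerate a failed component is to fail over to a redundant one. The following is a minimal sketch of that idea; `call_with_failover` and the two stand-in replicas are hypothetical names, and real systems would add timeouts, health checks, and backoff.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # This component failed; remember why and try the next one.
            last_error = exc
    raise RuntimeError(f"all {len(replicas)} replicas failed") from last_error

def broken_replica(request):
    # Stand-in for a node that is down.
    raise ConnectionError("node down")

def healthy_replica(request):
    # Stand-in for a node that answers normally.
    return f"handled {request}"

answer = call_with_failover([broken_replica, healthy_replica], "ping")
```

The caller still gets a correct answer even though the first replica is unavailable; an error surfaces only when every replica fails.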

ELT

ELT (Extract, Load, Transform) is a modern approach to data integration that differs from the traditional ETL process. In ELT, data is first extracted from source systems, loaded into a target system in raw form, and then transformed within the target system.
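
The load-then-transform order can be sketched with an in-memory SQLite database standing in for the target system. The table names, columns, and sample rows here are assumptions made for the example.

```python
import sqlite3

# Extract: rows exactly as they came from a hypothetical source system.
raw_rows = [("1", " Alice ", "10.50"), ("2", "BOB", "7.25"), ("3", "carol", "n/a")]

conn = sqlite3.connect(":memory:")

# Load: raw strings land in a staging table untouched.
conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: cleaning and typing happen inside the target, in SQL.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(id AS INTEGER)      AS id,
           LOWER(TRIM(customer))    AS customer,
           CAST(amount AS REAL)     AS amount
    FROM raw_orders
    WHERE amount GLOB '[0-9]*'      -- keep only rows whose amount starts with a digit
    """
)
conn.commit()

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
orders = conn.execute("SELECT id, customer, amount FROM orders ORDER BY id").fetchall()
```

The raw table keeps every source row, so the transformation can be rerun or changed later without extracting again.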

ETL

ETL (Extract, Transform, Load) is a process used in data integration and data warehousing to collect data from various sources, transform it into a consistent format, and load it into a target system (e.g., a data warehouse or database).
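
The three stages can be sketched in a few lines of standard-library Python. The CSV text, the `orders` table, and the `extract`, `transform`, and `load` helpers are hypothetical and only illustrate the order of operations: data is cleaned before it reaches the target.

```python
import csv
import io
import sqlite3

# Stand-in for a source file; the last row has an unusable amount.
SOURCE_CSV = "id,customer,amount\n1, Alice ,10.50\n2,BOB,7.25\n3,carol,n/a\n"

def extract(text):
    """Read source records as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize names, cast types, and drop rows with unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be converted
        clean.append((int(row["id"]), row["customer"].strip().lower(), amount))
    return clean

def load(conn, rows):
    """Write already-clean rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:  # commit the batch as one transaction
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT id, customer, amount FROM orders ORDER BY id").fetchall()
```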

Lazy Evaluation

In Spark, transformations on RDDs, DataFrames, and Datasets are evaluated lazily: they are not executed when they are defined, only when an action is called.
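
The same define-now, run-later behavior can be seen with plain Python iterators. This is an analogy only and does not use the Spark API; `map` and `filter` play the role of transformations, `list` plays the role of an action, and `traced_double` is a hypothetical helper that records when work actually happens.

```python
log = []

def traced_double(x):
    """Double a value and record that the work was actually performed."""
    log.append(x)
    return x * 2

numbers = range(1, 6)

# "Transformations": these lines only describe the pipeline.
doubled = map(traced_double, numbers)
large = filter(lambda v: v > 4, doubled)
calls_before_action = len(log)  # still 0, nothing has run yet

# "Action": consuming the iterator forces the whole pipeline to execute.
result = list(large)
calls_after_action = len(log)
```

Until `list` consumes the pipeline, `traced_double` is never called, which mirrors how Spark defers work until an action requests a result.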

Online Analytical Processing

OLAP is a type of database system designed to analyze large volumes of historical data from multiple perspectives. It enables users to run complex analytical queries and generate reports in data warehousing and business intelligence (DW/BI).

Online Transaction Processing

OLTP is a type of database system designed to support transactional applications. It focuses on processing large numbers of small, short-lived transactions in real time while ensuring data integrity and consistency.

Operational Data Store

An ODS is a database designed to integrate data from multiple sources for operational reporting and real-time decision-making.

Reliability

Reliability is the ability of a system to perform its required functions under stated conditions for a specified period of time.

Scalability

Scalability is the ability of a system to handle increased load or growth without compromising performance, reliability, or functionality.

Cloud Computing

Cloud Computing is the delivery of computing services (such as servers, storage, databases, networking, software, and analytics) over the internet (“the cloud”).
