Glossary: Overview
Glossary
ACID Properties
Atomicity, Consistency, Isolation and Durability
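As a rough illustration of atomicity and consistency, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are hypothetical names invented for this example. A transfer either applies both updates or neither, and a `CHECK` constraint rejects any state with a negative balance.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the earlier credit was rolled back too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so nothing is applied
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer, the credit to `bob` is undone along with the rejected debit, so the balances reflect only the first transfer.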
Availability
Availability is a critical aspect of system design, ensuring that a system remains operational and accessible to users when needed.
CAP Theorem
Consistency, Availability and Partition Tolerance
Cloud Computing
Cloud Computing is a technology that delivers computing services (such as servers, storage, databases, networking, software, and analytics) over the internet ("the cloud").
Data Engineering
Data Engineering is the practice of designing, building, and maintaining systems for collecting, storing, processing, and analyzing large volumes of data.
Data Lake
A data lake is a centralized repository designed to store vast amounts of data in its native, raw format.
Data Lakehouse
A data lakehouse is a unified architecture that combines the scalability and flexibility of a data lake with the reliability and queryability of a data warehouse.
Data Transformation
Data Transformation is the process of converting data from one format, structure, or type into another to make it suitable for analysis, storage, or integration.
Data Warehouse
A data warehouse is a centralized repository designed to store, manage, and analyze large volumes of structured data from various sources.
Distributed System
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
ELT
ELT is a modern approach to data integration that differs from the traditional ETL process. In ELT, data is first extracted from source systems, loaded into a target system, and then transformed within the target system.
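A minimal sketch of the ELT order of operations, using an in-memory SQLite database as a stand-in for the target system. The `raw_orders` and `orders` tables and the sample CSV are assumptions made for this example. The raw rows are loaded untouched, and the cleanup runs afterwards as SQL inside the target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has a missing amount.
RAW_CSV = """order_id,customer,amount
1, Alice ,10.50
2,bob,
3,Carol,7.25
"""

conn = sqlite3.connect(":memory:")

# Extract + Load: copy the source rows as-is into a staging table of text columns.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
rows = [
    (r["order_id"], r["customer"], r["amount"])
    for r in csv.DictReader(io.StringIO(RAW_CSV))
]
with conn:
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

# Transform: done inside the target system, using its own SQL engine.
with conn:
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               TRIM(customer)            AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE TRIM(amount) <> ''
        """
    )

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
orders = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(raw_count, orders)
```

Because the raw table is kept, the transformation can be rewritten and rerun later without going back to the source.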
ETL
ETL is a process used in data integration and data warehousing to collect data from various sources, transform it into a consistent format, and load it into a target system (e.g., a data warehouse or database).
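The three ETL stages can be sketched with the Python standard library. The `extract`, `transform`, and `load` functions, the `orders` table, and the sample CSV are hypothetical names chosen for this example, and SQLite stands in for the target system. Here the data is cleaned in application code before anything reaches the target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has a missing amount.
RAW_CSV = """order_id,customer,amount
1, Alice ,10.50
2,bob,
3,Carol,7.25
"""

def extract(text):
    """Read rows from the source (a CSV string in this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows without an amount, trim names, and convert types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        clean.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return clean

def load(conn, rows):
    """Write the already-transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```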
Fault Tolerance
Fault tolerance is the ability of a system to continue operating correctly even when some of its components fail.
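One common way to achieve this is redundancy with failover. The sketch below is an illustration only: `call_with_failover` and the two replica functions are hypothetical, and real systems usually add health checks, timeouts, and retries with backoff. The caller still gets a correct answer even though one component fails.

```python
def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move on to the next replica
    raise RuntimeError(f"all {len(replicas)} replicas failed") from last_error

def failing_replica(request):
    # Simulates a component that is down.
    raise ConnectionError("replica down")

def healthy_replica(request):
    return f"handled {request}"

result = call_with_failover([failing_replica, healthy_replica], "GET /status")
print(result)
```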
Lazy Evaluation
Spark transformations on RDDs, DataFrames, and Datasets are not executed immediately when they are defined. Instead, they are only executed when an action is called.
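The same idea can be shown with plain Python generators. This is an analogy, not Spark's API: the `numbers` and `double` functions and the `log` list are invented for the example. Building the pipeline does no work, much like defining transformations, and the work happens only when `list()` consumes the result, much like calling an action.

```python
log = []  # records when each step actually runs

def numbers(n):
    """Lazily produce 0..n-1, logging each read."""
    for i in range(n):
        log.append(f"read {i}")
        yield i

def double(values):
    """Lazily double each incoming value, logging each step."""
    for v in values:
        log.append(f"double {v}")
        yield v * 2

pipeline = double(numbers(3))  # only describes the computation; nothing runs yet
steps_before = len(log)

result = list(pipeline)        # consuming the generator triggers the work
print(steps_before, result, log)
```

The log stays empty until the pipeline is consumed, and each element then flows through both steps before the next one is read.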
Online Analytical Processing
OLAP is a type of database system designed to analyze large volumes of historical data from multiple perspectives. It enables users to run complex analytical queries and generate reports in data warehousing and business intelligence (DW/BI).
Online Transaction Processing
OLTP is a type of database system designed to manage transactional applications. It focuses on processing large numbers of small, short-lived transactions in real time while ensuring data integrity and consistency.
Operational Data Store
ODS is a database designed to integrate data from multiple sources for operational reporting and real-time decision-making.
Reliability
Reliability is the ability of a system to perform its required functions under stated conditions for a specified period of time.
Scalability
Scalability is the ability of a system to handle increased load or growth without compromising performance, reliability, or functionality.