Data Sources

Flat Files, Databases, Storage, Big Data, Notebooks

Introduction

Let's turn our attention toward common data sources:

Flat Files - Simple text files that contain data in a basic format (like CSV or TXT) which can be accessed using text editors, spreadsheet programs, or programming languages like Python.

Databases - Organized collections of structured data stored in tables with rows and columns that can be easily accessed, managed, queried, and updated through database management systems (DBMS).

Storage - Cloud-based or network storage systems that provide scalable data repositories for storing various data formats and types accessible across distributed environments.

Big Data - Large-scale datasets that are too complex or voluminous for traditional processing methods, requiring specialized distributed computing frameworks like Hadoop or Spark for analysis.

Notebooks - Interactive computational environments (like Jupyter notebooks) that combine executable code, visualizations, and documentation in a single interface for exploratory data analysis and development.

Flat Files

Flat files are simple text files that hold data in a basic format such as CSV or TXT, with no database management system behind them. Databases, by contrast, are organized collections of data that can be queried, managed, and updated through a DBMS. You can open a flat file in a text editor or a spreadsheet program like Microsoft Excel, or read and write it from a programming language like Python.
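As a minimal sketch of reading and writing a flat file from Python, the example below uses the standard library's csv module. The file name people.csv, the column names, the sample rows, and the helper functions write_rows and read_rows are all hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(path, header, rows):
    """Write a header line and data rows to a CSV flat file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read a CSV flat file back as a list of dicts keyed by the header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical sample data, written to a temporary directory so the
# example leaves nothing behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people.csv"
    write_rows(path, ["name", "city"], [["Ada", "London"], ["Linus", "Helsinki"]])
    records = read_rows(path)

for record in records:
    print(record["name"], record["city"])
```

Every value comes back as a string, because a flat file carries no type information of its own. Any conversion to numbers or dates is left to the reading program.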

Structured

Structured data is considered the most traditional form of data storage, as early database management systems (DBMS) were designed to handle this format. This type of data relies on a predefined data model, which outlines how the data is stored, processed, and accessed. The model keeps each piece of data, or field, distinct, so a query can target a single field or span many fields at once. This makes structured data exceptionally versatile, allowing information from different parts of a database to be aggregated efficiently.

At its core, structured data follows a specific format, making it easy to analyze. It fits into a tabular structure with clear relationships between rows and columns, similar to what you find in Excel spreadsheets or SQL databases, which makes sorting and manipulation straightforward.
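The ideas above (a predefined model, distinct fields, targeted queries, aggregation, and sorting) can be sketched with Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are assumptions made up for this illustration, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# In-memory SQLite database; the schema below is a hypothetical example
# of a predefined data model with distinct, typed fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer TEXT NOT NULL, "
    "amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# Targeted query: one field, filtered by another, sorted by amount.
targeted = conn.execute(
    "SELECT amount FROM orders WHERE customer = ? ORDER BY amount",
    ("alice",),
).fetchall()

# Aggregation across rows: total amount per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

conn.close()

print(targeted)
print(totals)
```

Because the columns are declared up front, the database can filter, sort, and sum by field name directly, which is the practical benefit of the tabular model described above.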
