# Flat Files

{% hint style="info" %}

#### **Flat Files**

**Structured flat files** are the most common type used in data integration. They organize data in a consistent, predictable format with clearly defined fields and delimiters. Examples include **CSV (Comma-Separated Values)** files, where each row represents a record and columns are separated by commas (e.g., a header row of `CustomerID,Name,Email,Purchase_Date`); **TSV (Tab-Separated Values)** files, which use tabs as delimiters; and **fixed-width files**, where each field occupies a specific number of characters (common in legacy mainframe systems). These files are well suited to Pentaho transformations because their predictable structure makes them easy to parse: each row maps directly to a database record, and each column corresponds to a specific field.

**Unstructured flat files**, by contrast, contain free-form text without any predefined schema or organization. Examples include plain text documents, email bodies, and raw application log files that lack consistent formatting. Extracting meaningful data from them requires more sophisticated text parsing and natural language processing techniques.
{% endhint %}
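
To make the structured formats described above concrete, the following minimal Python sketch parses delimited text (CSV or TSV) and a fixed-width record. It illustrates the idea only and is independent of Pentaho's own input steps. The sample record, the `FIXED_WIDTH_LAYOUT` offsets, and the function names are assumptions made for the example.

```python
import csv
import io

# Hypothetical sample data; the column names come from the CSV example above.
CSV_TEXT = (
    "CustomerID,Name,Email,Purchase_Date\n"
    "1001,Ada Lovelace,ada@example.com,2024-01-15\n"
)

# Assumed fixed-width layout: (field name, start offset, end offset).
FIXED_WIDTH_LAYOUT = [
    ("CustomerID", 0, 6),
    ("Name", 6, 26),
    ("Purchase_Date", 26, 36),
]


def read_delimited(text, delimiter=","):
    """Parse delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def read_fixed_width(text, layout):
    """Slice each non-empty line by character position and strip padding."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rows.append({name: line[start:end].strip() for name, start, end in layout})
    return rows
```

Passing `delimiter="\t"` to `read_delimited` handles TSV input. In both cases every value arrives as a string, so type conversion is a separate concern.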

<figure><img src="/files/6fpACs9xGd9PiMzkxFmc" alt=""><figcaption><p>Flat Files</p></figcaption></figure>
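
Unstructured files such as application logs usually need pattern-based extraction before they can be treated as rows. The sketch below assumes a hypothetical log shape of `<date> <time> <LEVEL> <message>`. Real logs vary widely, so the pattern and function name are illustrative only.

```python
import re

# Assumed log shape: "<YYYY-MM-DD> <HH:MM:SS> <LEVEL> <free-form message>".
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)


def parse_log_lines(lines):
    """Return (parsed, unmatched): dicts for matching lines, raw text for the rest."""
    parsed, unmatched = [], []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            parsed.append(match.groupdict())
        else:
            # Keep lines that do not fit the pattern so nothing is silently dropped.
            unmatched.append(line)
    return parsed, unmatched
```

Returning the unmatched lines separately makes it easier to see how much of the file the pattern actually covers.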

{% hint style="info" %}
**Semi-structured flat files** occupy a middle ground: their data has some organizational structure but lacks the rigid schema of databases or structured files. The most prominent examples are **JSON (JavaScript Object Notation)** files, which use key-value pairs, arrays, and nested objects (e.g., `{"customer": {"id": 123, "orders": [{"item": "laptop", "price": 899}]}}`), and **XML (eXtensible Markup Language)** files, which use hierarchical tags to define data relationships. These formats are self-describing and flexible, making them popular for APIs, web services, and modern application data exchange.

**Metadata in flat files** is descriptive information about the data itself. It can include header rows that define column names in CSV files, schema definitions that specify data types and constraints, file-level documentation about the data source and creation date, and embedded comments that explain field meanings. In Pentaho, understanding and properly handling metadata is crucial for accurate data mapping: it determines how the ETL process interprets field types (string vs. integer vs. date), handles null values, and validates data quality during transformation steps.
{% endhint %}
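
The two ideas above can be sketched in Python. The first function flattens the nested JSON example into one row per order. The second applies field metadata to string values read from a flat file. The `SCHEMA` mapping and both function names are hypothetical and do not reflect how Pentaho represents metadata internally.

```python
import json
from datetime import date

# Sample document taken from the JSON example above.
JSON_TEXT = '{"customer": {"id": 123, "orders": [{"item": "laptop", "price": 899}]}}'

# Hypothetical field metadata: field name -> converter for its declared type.
SCHEMA = {"CustomerID": int, "Name": str, "Purchase_Date": date.fromisoformat}


def flatten_orders(text):
    """Emit one flat row per order, repeating the parent customer id."""
    customer = json.loads(text)["customer"]
    return [
        {"customer_id": customer["id"], "item": order["item"], "price": order["price"]}
        for order in customer.get("orders", [])
    ]


def apply_schema(row, schema):
    """Convert string fields to typed values; missing or empty values become None."""
    typed = {}
    for name, convert in schema.items():
        raw = row.get(name)
        typed[name] = None if raw in (None, "") else convert(raw)
    return typed
```

Treating empty strings as `None` is one possible null-handling policy. A converter that raises on malformed input (for example, a non-ISO date) acts as a basic data-quality check.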

