Flat Files and Text Blobs

Reading and writing delimited and unstructured text files in Savant

Flat files are among the most common formats for exchanging data between systems. They are simple, portable, and widely supported, which makes them a reliable default for most workflows.

Savant supports two modes for working with plain-text files:

  • Flat files – Structured, delimited files where each row is a record and each column is a field (.csv, .tsv, .txt)

  • Text blobs – Unstructured .txt files read as a single block of content, useful for logs, freeform exports, or any file where row-by-row parsing doesn't apply


Supported extensions

Extension   Type                        Read limit   Write limit
.csv        Comma-separated flat file   1 GB         1 GB
.tsv        Tab-separated flat file     1 GB         1 GB
.txt        Flat file or text blob      1 GB         1 GB


Read features

Reading a single file

When you configure a dataset with a File URL, Savant reads exactly that one file every time the dataset runs. Savant does not scan the surrounding folder or consider other files at that location. Each run fetches the latest version of the file, so the dataset always reflects the most current content available. If the file is moved, deleted, or access is revoked, the run fails with an explicit error.

File replacement and stale reads: Savant tracks source files by file ID, not filename. If you update a file by uploading a new version rather than editing the original in place, the original moves to the trash with its ID intact, and Savant silently keeps reading that trashed original until it is permanently deleted.

As a best practice, always edit source files in place. If you do replace a file, update the dataset with the new file URL before the next run.

Delimiter

The delimiter tells Savant how to split each row into fields. If the delimiter matches the file's format, columns align correctly. If it doesn't, you may see a single merged column or malformed structure. The default delimiter in Savant is a comma (,).
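To see why a mismatched delimiter produces a single merged column, here is a minimal sketch using Python's standard csv module rather than Savant itself. The sample content and the parse helper are invented for illustration.

```python
import csv
import io

# Hypothetical sample: a tab-separated file with a header and one record.
sample = "id\tname\tcity\n1\tAda\tLondon\n"

def parse(text, delimiter):
    """Split each row of `text` into fields using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

wrong = parse(sample, ",")   # delimiter does not match the file
right = parse(sample, "\t")  # delimiter matches the file

print(len(wrong[0]))  # number of columns found with the wrong delimiter
print(len(right[0]))  # number of columns found with the right delimiter
```

With the comma setting, the whole header row comes back as one field; with the tab setting, it splits into three.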

Text qualifier

The text qualifier prevents delimiter characters inside quoted values from being treated as column boundaries. For example, with a comma delimiter, the value "New York, NY" stays in a single column rather than being split into two. The default text qualifier in Savant is a double quote (").
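The effect can be reproduced outside Savant with Python's standard csv module. This is only a sketch of the general behavior, with made-up sample data, and not a description of Savant's parser.

```python
import csv
import io

# Hypothetical sample: the second field contains the delimiter inside quotes.
sample = 'name,location\nAcme,"New York, NY"\n'

# Double quote acts as the text qualifier.
qualified = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))

# No text qualifier: quotes are treated as ordinary characters.
unqualified = list(csv.reader(io.StringIO(sample), delimiter=",", quoting=csv.QUOTE_NONE))

print(qualified[1])    # value stays in one column
print(unqualified[1])  # value is split at the embedded comma
```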

Escape character

The escape character allows special characters – such as quotes – to appear inside field values without confusing the parser. This is relevant for values like Toys "R" Us or He said "yes", where internal quotes need to be distinguished from field boundaries. Depending on the file's format, quotes may be escaped by doubling them ("") or by a preceding escape character (\"). The default escape character in Savant is a backslash (\).
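The two escaping conventions can be compared with a short sketch in Python's standard csv module. It illustrates the general idea only; the sample lines are invented and Savant's own parser may differ in details.

```python
import csv
import io

# Hypothetical samples: the same value written with each convention.
doubled = '"Toys ""R"" Us",1\n'         # internal quotes are doubled
backslashed = '"Toys \\"R\\" Us",1\n'   # internal quotes are preceded by a backslash

# Doubled quotes: the csv module's default behavior.
from_doubled = next(csv.reader(io.StringIO(doubled)))

# Backslash escapes: turn off quote doubling and name the escape character.
from_backslashed = next(
    csv.reader(io.StringIO(backslashed), doublequote=False, escapechar="\\")
)

print(from_doubled[0])
print(from_backslashed[0])
```

Both reads recover the same field value; what matters is that the parser setting matches the convention the file actually uses.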

Encoding

Encoding determines how the raw bytes in a file are translated into readable text. The default in Savant is UTF-8, which handles the widest range of characters. If you see unexpected symbols or broken characters – particularly in files with non-English content – verify that the file's encoding matches this setting.
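A quick way to recognize an encoding mismatch is to decode the same bytes two ways. The snippet below is a plain Python illustration with invented sample text; it does not involve Savant.

```python
# Hypothetical sample: text with accented characters, stored as UTF-8 bytes.
raw = "café,Zürich\n".encode("utf-8")

correct = raw.decode("utf-8")    # encoding setting matches the file
garbled = raw.decode("latin-1")  # encoding setting does not match

print(correct, end="")
print(garbled, end="")
```

The second result shows the stray symbols that typically signal a UTF-8 file being read with the wrong encoding.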

Skip rows

Skip Rows instructs Savant to ignore a specified number of rows at the top of the file before reading content. This is useful for files that include titles, notes, or metadata above the actual data table. Savant skips those rows and builds the dataset from the first valid row of content.
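As a rough sketch of the idea in Python (not Savant's implementation), the helper below drops a fixed number of leading lines before treating the next line as the header. The sample file and the read_with_skip name are hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical sample: two lines of notes above the real table.
sample = "Quarterly report\nGenerated by finance\nid,amount\n1,10\n2,20\n"

def read_with_skip(text, skip_rows):
    """Ignore the first `skip_rows` lines, then parse the rest as a table."""
    lines = islice(io.StringIO(text), skip_rows, None)
    return list(csv.DictReader(lines))

records = read_with_skip(sample, skip_rows=2)
print(records)
```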

Headerless

When enabled, Headerless tells Savant not to treat the first row as column names. Instead, Savant generates column names automatically and keeps all rows as data. Use this for files that contain only data rows with no header.

This example file:

Acme 1     Acme 2     Acme 3
Alpha      Beta       Charlie

Is read as:

Column 1   Column 2   Column 3
Acme 1     Acme 2     Acme 3
Alpha      Beta       Charlie
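The same behavior can be approximated in a few lines of Python. This sketch assumes generated names follow the "Column N" pattern shown in the example above, and it uses the standard csv module rather than Savant.

```python
import csv
import io

# The example file from above, written as comma-separated text.
sample = "Acme 1,Acme 2,Acme 3\nAlpha,Beta,Charlie\n"

rows = list(csv.reader(io.StringIO(sample)))

# Generate column names instead of consuming the first row as a header.
names = [f"Column {i}" for i in range(1, len(rows[0]) + 1)]
records = [dict(zip(names, row)) for row in rows]

print(names)
print(len(records))  # every row is kept as data
```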

Common headers

Common Headers lets you define a fixed set of column names for the dataset. When reading a single file, this is optional – it is primarily useful when you want to enforce a known schema rather than relying on whatever headers are present in the file.


Reading from a folder

When you configure a dataset with a Folder URL, Savant treats the folder as the input source. At run time, Savant enumerates the files in the folder and applies the selection rules below to determine which files to ingest.

File pattern

File Pattern restricts ingestion to files whose names match a specified pattern. This prevents unintended files from being picked up in a shared folder. For example, a pattern like billings_*.csv ingests only files matching that naming convention and ignores everything else.
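The sketch below shows how such a pattern filters a folder listing, assuming the pattern follows shell-style wildcard rules (that assumption is not confirmed here for Savant). The file names are invented.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of a shared folder.
names = [
    "billings_2024-01.csv",
    "billings_2024-02.csv",
    "billings_summary.xlsx",
    "notes.txt",
]

# Keep only names matching the pattern; * stands for any run of characters.
matched = [name for name in names if fnmatchcase(name, "billings_*.csv")]
print(matched)
```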

Scan subfolders

When enabled, Scan Subfolders extends the read to include files in nested subdirectories, not just the top-level folder. This is useful when data is organized by date, region, or business unit across multiple folders. Savant supports reading from up to 10 subfolders within any given folder.

Use only most recent file

When enabled, this option instructs Savant to ingest a single file: the newest matching file based on last-modified time. This is the right choice for recurring workflows where a fresh export is delivered to the folder on a schedule and you always want the latest version without updating the dataset configuration.
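Conceptually, this selection amounts to filtering by pattern and then taking the file with the greatest last-modified time. The Python sketch below illustrates that logic on a temporary folder; the newest_match helper, file names, and timestamps are all hypothetical.

```python
import os
import tempfile
from pathlib import Path

def newest_match(folder, pattern):
    """Return the most recently modified file matching `pattern`, or None."""
    candidates = [p for p in Path(folder).glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)

with tempfile.TemporaryDirectory() as folder:
    # Create sample files with explicit modification times (seconds since epoch).
    for name, mtime in [
        ("billings_old.csv", 1_700_000_000),
        ("billings_new.csv", 1_700_086_400),
        ("notes.txt", 1_700_172_800),  # newest overall, but does not match
    ]:
        path = Path(folder) / name
        path.write_text("id\n1\n", encoding="utf-8")
        os.utime(path, (mtime, mtime))

    picked = newest_match(folder, "billings_*.csv").name

print(picked)
```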

Common headers in folder mode

When a folder read brings in multiple files, Common Headers becomes especially important. It aligns all files to a consistent column set, even when headers vary across files or some older files are missing fields.
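One way to picture this alignment is to project every file onto the same fixed list of column names. The Python sketch below does that under two assumptions that are not confirmed for Savant: missing fields become empty values (None here), and columns outside the common set are dropped. The sample files are invented.

```python
import csv
import io

COMMON_HEADERS = ["id", "region", "amount"]

# Hypothetical files: the older one lacks "region", the newer one has an extra column.
older = "id,amount\n1,10\n"
newer = "amount,id,region,note\n20,2,east,rush\n"

def align(text, headers):
    """Read a delimited file and reshape each row to the common column set."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: absent fields are filled with None; extra fields are ignored.
    return [{name: row.get(name) for name in headers} for row in reader]

combined = align(older, COMMON_HEADERS) + align(newer, COMMON_HEADERS)
print(combined)
```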


Write features

Folder path

Folder Path defines the destination where Savant writes the output file. For recurring workflows, keeping this path consistent ensures that downstream systems and users always know where to find the latest output.

Subfolder name

Subfolder Name controls how output files are organized within the destination path. Subfolders can be defined statically (a fixed path) or dynamically, where Savant generates a separate subfolder for each unique value of one or more fields in the dataset – for example, one folder per region or one folder per day. This allows a single workflow to partition output automatically without requiring multiple pipeline configurations.

File name

File Name determines how each output file is named within its folder. As with subfolders, names can be static or dynamic. Dynamic naming generates one file per unique value of a selected field – for example, one file per customer or one file per category – within a single run.
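Dynamic subfolders and dynamic file names both come down to grouping rows by field values and deriving a path from each group. The Python sketch below plans such a partition in memory; the field names, the path layout, and the plan_outputs helper are assumptions made for illustration, not Savant settings.

```python
from collections import defaultdict

# Hypothetical dataset rows.
rows = [
    {"region": "east", "customer": "acme", "amount": "10"},
    {"region": "east", "customer": "zenith", "amount": "20"},
    {"region": "west", "customer": "acme", "amount": "30"},
]

def plan_outputs(records, folder_field, file_field):
    """Group records by the relative output path derived from two fields."""
    planned = defaultdict(list)
    for record in records:
        path = f"{record[folder_field]}/{record[file_field]}.csv"
        planned[path].append(record)
    return dict(planned)

outputs = plan_outputs(rows, "region", "customer")
for path, group in sorted(outputs.items()):
    print(path, len(group))
```

Here one run yields one subfolder per region and one file per customer within each subfolder.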

Write operations

Savant supports three write operations for flat file outputs:

  • Create – Writes a new file at the destination path

  • Update – Overwrites an existing file with fresh output

  • Insert – Appends new rows to an existing file without replacing its contents

Create and Update produce a clean, complete file on each run. Insert is suited for incremental pipelines where new data should accumulate over time.
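The difference between overwriting and appending can be sketched with ordinary file modes in Python. This is an analogy rather than Savant's implementation, and it assumes that an append does not repeat the header row when the file already has content. The write_rows helper and sample rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

HEADER = ["id", "amount"]

def write_rows(path, rows, operation):
    """Mimic create/update (overwrite) and insert (append) with file modes."""
    append = operation == "insert"
    # Assumption: when appending to a non-empty file, the header is not repeated.
    needs_header = not (append and path.exists() and path.stat().st_size > 0)
    with path.open("a" if append else "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if needs_header:
            writer.writerow(HEADER)
        writer.writerows(rows)

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "output.csv"
    write_rows(target, [[1, 10]], "create")
    write_rows(target, [[2, 20]], "insert")   # rows accumulate
    after_insert = target.read_text(encoding="utf-8").splitlines()
    write_rows(target, [[3, 30]], "update")   # previous contents are replaced
    after_update = target.read_text(encoding="utf-8").splitlines()

print(after_insert)
print(after_update)
```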

Delimiter, text qualifier, escape character, and encoding

These settings control how Savant serializes data into the output file. They mirror the read parsing settings and use the same defaults: comma delimiter, double-quote text qualifier, backslash escape character, and UTF-8 encoding. Keeping these consistent between read and write configurations ensures output files are reliably readable by downstream systems.
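A round trip makes the value of matching settings concrete. The sketch below writes one record with Python's standard csv module using the defaults listed above (comma, double quote, backslash) and reads it back with the same settings. It illustrates the principle only, with an invented record.

```python
import csv
import io

# Shared settings mirroring the defaults described above.
SETTINGS = dict(delimiter=",", quotechar='"', escapechar="\\", doublequote=False)

record = ['Toys "R" Us', "New York, NY"]

# Serialize the record.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n", **SETTINGS).writerow(record)
serialized = buffer.getvalue()

# Parse it again with identical settings.
round_trip = next(csv.reader(io.StringIO(serialized), **SETTINGS))

print(serialized, end="")
print(round_trip == record)
```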

Headerless

When enabled, Savant writes data rows only, with no column name header row. Use this when the receiving system expects raw data without headers.

Skip rows

Skip Rows controls how many blank rows Savant leaves at the top of the output file before writing data. This is used when the output must conform to a predefined layout – for example, reserving space for a title or metadata section that will be added separately.

Sorting

Sorting controls the order in which rows are written to the output file. Specifying a sort order ensures consistent, predictable output across runs – particularly useful for reporting workflows where row order affects readability or downstream processing.
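As a small illustration in Python, sorting on an explicit key before writing gives the same row order on every run regardless of input order. The rows and the choice of sort fields are hypothetical.

```python
# Hypothetical rows arriving in arbitrary order.
rows = [
    {"region": "west", "amount": 30},
    {"region": "east", "amount": 20},
    {"region": "east", "amount": 10},
]

# Sort by region first, then by amount within each region.
ordered = sorted(rows, key=lambda row: (row["region"], row["amount"]))

for row in ordered:
    print(row["region"], row["amount"])
```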

