Parquet

Reading and writing Parquet files for large-scale data workloads in Savant

Parquet is a columnar file format designed for large-scale data workloads. Unlike flat files such as CSV, which store data row by row, Parquet stores data column by column. When only a subset of columns is needed, a reader can skip the remaining columns entirely, which makes large datasets significantly faster to read and process.
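
To make the row-versus-column difference concrete, here is a toy sketch in plain Python. It does not read or write real Parquet; the sample records and the `to_columns` helper are made up for illustration. It only shows why a column layout lets a reader touch one column without visiting the others.

```python
# Toy comparison of row-oriented and column-oriented layouts (not real Parquet).
# The records and field names below are illustrative only.
rows = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 310.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited even though only "amount" is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the "amount" list is touched; other columns are never read.
total_from_columns = sum(columns["amount"])
```

Both totals are identical. The difference is how much data has to be visited to compute them, and that gap grows with the number of columns in the dataset.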

Parquet files embed their own schema, meaning data types are defined within the file itself rather than inferred at read time. This makes Parquet the most reliable format for large datasets where consistent column types and predictable structure are important.

A Parquet dataset is typically stored as a folder of files rather than a single file, which allows large datasets to be partitioned and processed in parallel.


Supported extension

Extension    Read limit    Write limit
.parquet     1 GB          1 GB


Read features

Schema-first ingestion

When Savant reads a Parquet file, it uses the schema embedded in the file to determine column names and data types directly. This eliminates the parsing ambiguity common with CSV files – there is no delimiter to configure, no encoding to guess, and no risk of a column being read as the wrong type.
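
For background on where the embedded schema lives: in the Parquet format, the file metadata (which includes the schema) sits in a footer at the end of the file, framed by the 4-byte marker `PAR1` at both the start and the end. The sketch below checks that framing and reads the footer length. It describes the file format rather than Savant's own reader, it assumes an unencrypted footer, and the `footer_metadata_length` helper and the sample bytes are hypothetical.

```python
import struct

MAGIC = b"PAR1"  # marker at the start and end of a Parquet file


def footer_metadata_length(data: bytes) -> int:
    """Return the byte length of the footer metadata, or raise ValueError.

    Layout assumed: MAGIC, column data, metadata, 4-byte little-endian
    metadata length, MAGIC.
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic bytes)")
    (length,) = struct.unpack("<I", data[-8:-4])
    if length > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return length


# Placeholder bytes that mimic the framing only; this is NOT a valid Parquet file.
fake = MAGIC + b"column-data" + b"schema-bytes" + struct.pack("<I", 12) + MAGIC
metadata_size = footer_metadata_length(fake)
```

A real reader would go on to decode the metadata itself to recover column names and types. The point here is that those details travel with the file rather than being guessed from its contents.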

Folder-based reads

Parquet datasets are commonly organized as a folder of multiple files rather than a single file – particularly when data is partitioned by date, region, or another key. Savant reads the full folder as a single dataset, combining all matching files into one unified output.

Savant supports reading from up to 10 subfolders within any given folder, making it possible to work with deeply partitioned datasets using a single dataset configuration.
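
As an illustration of what a folder-based read has to do, the sketch below walks a folder, collects every `.parquet` file, and picks up partition values from folder names. It assumes the common `key=value` folder naming convention (for example `region=EMEA`). That convention and the `discover_parquet_files` helper are assumptions made for this example, not a description of how Savant scans folders.

```python
from pathlib import Path


def discover_parquet_files(root):
    """Yield (path, partition_values) for every .parquet file under root."""
    root = Path(root)
    for path in sorted(root.rglob("*.parquet")):
        if not path.is_file():
            continue
        partitions = {}
        # Folder names between root and the file may carry key=value pairs.
        for part in path.relative_to(root).parts[:-1]:
            key, sep, value = part.partition("=")
            if sep:
                partitions[key] = value
        yield path, partitions
```

Calling `list(discover_parquet_files("sales"))` on a folder that contains `region=EMEA/month=2024-01/part-0.parquet` would return that file paired with `{"region": "EMEA", "month": "2024-01"}`. Files stored directly in the top-level folder come back with an empty dictionary.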


Write features

Output structure

Parquet outputs are written to a folder path rather than a single file. For large datasets, Savant may produce multiple files within that folder. This is expected behavior and consistent with how Parquet is designed to be used – downstream systems that consume Parquet data are built to handle folder-based outputs.

File name

Savant supports both static and dynamic file naming for Parquet outputs. Dynamic naming generates one file per unique value of a selected field, allowing outputs to be automatically partitioned – for example, one file per region or one file per month – within a single run.
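
The idea behind dynamic naming can be sketched as a simple grouping step: records are bucketed by the value of the selected field, and each bucket becomes one output file. The `split_by_field` helper and the `{value}.parquet` name pattern below are hypothetical and only illustrate the concept. They are not Savant's actual naming rules.

```python
from collections import defaultdict


def split_by_field(records, field, pattern="{value}.parquet"):
    """Group records into one bucket per unique value of `field`.

    Returns a dict mapping an output file name to the records for that file.
    """
    groups = defaultdict(list)
    for record in records:
        file_name = pattern.format(value=record[field])
        groups[file_name].append(record)
    return dict(groups)


# Illustrative records only.
orders = [
    {"order_id": 1, "region": "EMEA"},
    {"order_id": 2, "region": "APAC"},
    {"order_id": 3, "region": "EMEA"},
]
outputs = split_by_field(orders, "region")
```

Here `outputs` has two entries, `EMEA.parquet` with two records and `APAC.parquet` with one. In practice the field values need to be safe to use in file names, so a real implementation would likely sanitize them first.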

Write operations

Savant supports Create and Update operations for Parquet outputs. Each run generates a clean, complete dataset at the destination path.

Note: Parquet does not support row-level insert (append) in the traditional sense. Incremental data patterns in Parquet are handled through partitioning and folder organization rather than appending rows to an existing file.
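
One way to picture the partition-based pattern described in the note: each new batch is written as a new file inside a partition folder, and existing files are never modified. The sketch below only computes where such a file would go. The `next_batch_path` helper, the `key=value` folder naming, and the `part-0000.parquet` numbering are assumptions chosen for the example.

```python
from pathlib import Path


def next_batch_path(root, key, value):
    """Return the path for a new file in the partition folder root/key=value.

    Existing files are left untouched; the new batch gets the next part number.
    """
    folder = Path(root) / f"{key}={value}"
    folder.mkdir(parents=True, exist_ok=True)
    existing = len(list(folder.glob("part-*.parquet")))
    return folder / f"part-{existing:04d}.parquet"
```

Numbering by file count is the simplest possible scheme and can collide if files are deleted from the folder. A production setup would more likely use timestamps or unique identifiers in the file names.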


Next steps

Review the file type guides for detailed configuration options, parsing behavior, and known limitations.
