DuckDB
Batch process all your records to store structured outputs in a DuckDB installation.
The requirements are as follows.
- A persistent database, for example by running the DuckDB CLI command duckdb <my-database-filename>.db or duckdb <my-database-filename>.duckdb, replacing <my-database-filename> with the name of the target file.
- The path to the target persistent database file.
- A schema in the target database.
  - You can list available schemas and their parent catalogs by running the following DuckDB CLI command:
  - The DuckDB connector uses the default schema name of main if not otherwise specified.
- A table in the target schema.
  - You can list available tables in a schema by running the following DuckDB CLI commands, replacing the target catalog and schema names:
  - The DuckDB connector uses the default table name of elements if not otherwise specified.
  - For maximum compatibility, Unstructured recommends the following table schema:
  - You can list the schema of a table by running the following DuckDB CLI commands, replacing the target catalog, schema, and table names:
The DuckDB connector dependencies:
You might also need to install additional dependencies, depending on your needs.
The following environment variables:
- DUCKDB_DATABASE - The path to the target DuckDB persistent database file with the extension .db or .duckdb, represented by --database (CLI) or database (Python).
- DUCKDB_DB_SCHEMA - The name of the target schema in the database, represented by --db-schema (CLI) or db_schema (Python).
- DUCKDB_TABLE - The name of the target table in the schema, represented by --table (CLI) or table (Python).
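The schema and table checks in the requirements above can also be run from Python rather than the DuckDB CLI. The following is a minimal sketch, assuming the third-party duckdb Python package is installed and relying on DuckDB's standard information_schema views. The helper names (schemas_query, tables_query, columns_query, run) are hypothetical and are not part of the Unstructured connector; the defaults main and elements mirror the connector defaults described above.

```python
from typing import List, Tuple

# A query is a SQL string plus its positional parameters.
Query = Tuple[str, List[str]]


def schemas_query() -> Query:
    """List every schema together with its parent catalog."""
    return ("SELECT catalog_name, schema_name FROM information_schema.schemata", [])


def tables_query(catalog: str, schema: str = "main") -> Query:
    """List the tables in one schema of one catalog."""
    sql = (
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_catalog = ? AND table_schema = ?"
    )
    return (sql, [catalog, schema])


def columns_query(catalog: str, schema: str = "main", table: str = "elements") -> Query:
    """List the column names and types of one table, in declaration order."""
    sql = (
        "SELECT column_name, data_type FROM information_schema.columns "
        "WHERE table_catalog = ? AND table_schema = ? AND table_name = ? "
        "ORDER BY ordinal_position"
    )
    return (sql, [catalog, schema, table])


def run(database_path: str, query: Query) -> list:
    """Run one query against a persistent DuckDB file and return all rows."""
    # Assumption: the duckdb package is available (pip install duckdb).
    # It is imported lazily so the query builders work without it.
    import duckdb

    sql, params = query
    # Note: connect() creates the database file if it does not exist yet.
    con = duckdb.connect(database_path)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()
```

For example, run(path, schemas_query()) returns (catalog, schema) pairs, and the catalog value from that result can then be passed to tables_query and columns_query.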
These environment variables:
- UNSTRUCTURED_API_KEY - Your Unstructured API key value.
- UNSTRUCTURED_API_URL - Your Unstructured API URL.
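Before running a job, it can help to confirm that the environment variables listed above are set consistently. The sketch below uses only the Python standard library. The function name resolve_settings is hypothetical, and treating UNSTRUCTURED_API_KEY, UNSTRUCTURED_API_URL, and DUCKDB_DATABASE as mandatory, as well as rejecting other file extensions, are assumptions drawn from the descriptions above rather than documented connector behavior.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

# Defaults the DuckDB connector is described as using when nothing is specified.
DEFAULT_SCHEMA = "main"
DEFAULT_TABLE = "elements"
ALLOWED_SUFFIXES = {".db", ".duckdb"}

# Assumption: these three variables must always be present.
REQUIRED = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL", "DUCKDB_DATABASE")


def resolve_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the connector settings from env, applying the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    database = env["DUCKDB_DATABASE"]
    # Assumption: only the .db and .duckdb extensions are accepted.
    if Path(database).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError("DUCKDB_DATABASE must end in .db or .duckdb: " + database)

    return {
        "api_key": env["UNSTRUCTURED_API_KEY"],
        "api_url": env["UNSTRUCTURED_API_URL"],
        "database": database,
        "db_schema": env.get("DUCKDB_DB_SCHEMA") or DEFAULT_SCHEMA,
        "table": env.get("DUCKDB_TABLE") or DEFAULT_TABLE,
    }
```

Calling resolve_settings() with no arguments reads from the current process environment; passing a plain dictionary instead makes the check easy to exercise in isolation.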
Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector: