Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks write to iceberg managed table with pyiceberg

tdata
New Contributor

Hello,

I'm trying to write to a Databricks managed Iceberg table using PyIceberg inside a spark_python_task (serverless compute).

I'm facing this error when writing:

 

Error writing to Iceberg table: When reading information for key '' in bucket '' : AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 35, SSL connect error; Details: Recv failure: Connection reset by peer.
 
I'm running this code:

from pyiceberg.catalog import load_catalog

catalog_config = {
    "type": "rest",
    "uri": f"https://{workspace_url}/api/2.1/unity-catalog/iceberg-rest",
    "warehouse": catalog_name,
    "token": databricks_pat,
}

catalog = load_catalog("catalog_name", **catalog_config)
table = catalog.load_table(table_identifier)
table.append(df)
 
I want to know if there is a special URI to use because I'm inside Databricks compute, or if it's just not possible.
This code runs fine outside Databricks.
 
Thank you 
2 REPLIES

stbjelcevic
Databricks Employee

Hi @tdata ,

The error you're seeing is most likely a networking or permissions issue, not the URI.

Can you double-check that a metastore admin has enabled "External data access" for the metastore? Also, your principal needs EXTERNAL USE SCHEMA on the target schema (and standard USE CATALOG/USE SCHEMA/SELECT on the table).

If all of that looks good, you can check bucket policy/network reachability for serverless: ensure your bucket policy allows access for the storage credential role and does not block requests based on VPC source or IP ranges that exclude Databricks serverless egress.

SteveOstrowski
Databricks Employee

Hi @tdata,

The approach you are using (PyIceberg with the Unity Catalog Iceberg REST catalog) does support writes to managed Iceberg tables, so the pattern is correct. The SSL error you are seeing is almost certainly caused by running this from within Databricks serverless compute rather than from an external machine.

Here is what is happening and how to resolve it.


WHY THE SSL ERROR OCCURS ON SERVERLESS COMPUTE

When PyIceberg writes to a table through the Iceberg REST catalog, the flow is:

1. PyIceberg calls the REST catalog endpoint to get table metadata and temporary cloud credentials (credential vending).
2. PyIceberg then uses those vended credentials to write data files directly to the underlying cloud storage (S3, ADLS, GCS).
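The shape of that two-step flow can be sketched with a couple of plain helpers. This is illustrative only: the endpoint path and the `config` map follow the Iceberg REST catalog spec, and the `s3.` key prefix for vended credentials is an assumption for this example, not something specific to Databricks.

```python
# Illustrative sketch of the credential-vending flow, not a real client.
# Endpoint shape follows the Iceberg REST catalog spec; the "s3." key
# prefix for vended credentials is an assumption for this example.

def load_table_url(base_uri: str, namespace: str, table: str) -> str:
    # Step 1: the client asks the REST catalog for table metadata
    # plus temporary storage credentials (credential vending).
    return f"{base_uri}/v1/namespaces/{namespace}/tables/{table}"

def vended_storage_config(load_table_response: dict) -> dict:
    # Step 2: the vended credentials arrive in the response's "config"
    # map and are then used to write data files directly to storage.
    config = load_table_response.get("config", {})
    return {k: v for k, v in config.items() if k.startswith("s3.")}

example_response = {
    "metadata-location": "s3://bucket/path/metadata.json",
    "config": {
        "s3.access-key-id": "ASIA...",
        "s3.secret-access-key": "...",
        "s3.session-token": "...",
        "client.region": "us-east-1",
    },
}
creds = vended_storage_config(example_response)
```

Step 2 is the part that fails on serverless: the catalog call in step 1 succeeds, but the direct write to storage with the vended credentials gets reset.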

On Databricks serverless compute, outbound network access to cloud storage is restricted. The compute environment is designed so that data access goes through Unity Catalog external locations and managed paths, not through arbitrary S3/ADLS calls from user code. When PyIceberg tries to write directly to the S3 bucket using the vended credentials, the connection gets blocked, which produces the SSL/connection reset error you see.

This is the same reason your code works fine outside of Databricks: from an external machine, PyIceberg can freely reach S3 with the vended credentials.


OPTIONS TO RESOLVE THIS

1. Run PyIceberg from outside Databricks (recommended if possible)

If you can move the PyIceberg write step to an external compute environment (a VM, container, local machine, or a non-Databricks orchestrator), your existing code should work as-is. The REST catalog + credential vending flow is designed for exactly this use case.
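As a minimal sketch of that external-machine pattern (the hostname, catalog name, and token below are placeholders, and the URI path is the same one from your original post):

```python
# Minimal sketch of the external write setup. Values are placeholders;
# the URI path matches the Unity Catalog Iceberg REST endpoint used
# in the original post.
def build_catalog_config(workspace_host: str, catalog_name: str, token: str) -> dict:
    return {
        "type": "rest",
        "uri": f"https://{workspace_host}/api/2.1/unity-catalog/iceberg-rest",
        "warehouse": catalog_name,
        "token": token,
    }

config = build_catalog_config("my-workspace.cloud.databricks.com", "main", "dapi-...")

# On the external machine (with pyiceberg[pyarrow] installed) you would then:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("uc", **config)
#   catalog.load_table("schema_name.table_name").append(arrow_table)
```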

2. Use classic (non-serverless) compute with proper network configuration

If you need to run on Databricks, try using a classic cluster instead of serverless. Classic clusters in a customer-managed VPC have fewer outbound network restrictions, so PyIceberg should be able to reach cloud storage with the vended credentials. You would install PyIceberg as a notebook-scoped library:

%pip install "pyiceberg[pyarrow]"

3. Write using Spark SQL or DataFrame API instead of PyIceberg

Since you are already on Databricks compute, you can write to managed Iceberg tables natively using Spark without PyIceberg at all. Databricks Runtime supports Iceberg tables directly:

df.writeTo("catalog_name.schema_name.table_name").append()

Or with SQL:

INSERT INTO catalog_name.schema_name.table_name SELECT * FROM temp_view

This avoids the credential vending round-trip entirely because Databricks handles storage access internally.


CONFIGURATION NOTES FOR PYICEBERG (WHEN RUNNING EXTERNALLY)

Your catalog configuration looks correct, but make sure:

a) The workspace URL includes the workspace ID to avoid 303 redirects:

catalog_config = {
    "type": "rest",
    "uri": "https://<workspace-host>/?o=<workspace-id>/api/2.1/unity-catalog/iceberg-rest",
    "warehouse": "<uc-catalog-name>",
    "token": "<your-pat-token>",
}

b) The user or service principal has the EXTERNAL USE SCHEMA privilege on the target schema:

GRANT EXTERNAL USE SCHEMA ON SCHEMA catalog_name.schema_name TO `user@example.com`;

c) The table is a managed Iceberg table (created with USING ICEBERG). Write access from external Iceberg clients is only supported for managed Iceberg tables. Delta tables with Iceberg reads (UniForm) enabled are read-only from external clients.

You can verify your table type with:

DESCRIBE EXTENDED catalog_name.schema_name.table_name;

Look for Provider = ICEBERG in the output.
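If you are scripting this check, a tiny helper can scan the DESCRIBE EXTENDED rows for that marker. This sketch assumes the output is available as (col_name, value) pairs, e.g. from `spark.sql("DESCRIBE EXTENDED ...").collect()`:

```python
# Illustrative helper: given DESCRIBE EXTENDED output as (col_name,
# value) pairs, report whether the table's provider is ICEBERG --
# the prerequisite for writes from external Iceberg clients.
def is_iceberg_provider(describe_rows) -> bool:
    for name, value in describe_rows:
        if name.strip().lower() == "provider":
            return value.strip().upper() == "ICEBERG"
    return False

rows = [("Type", "MANAGED"), ("Provider", "ICEBERG"), ("Location", "s3://...")]
```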


REFERENCE DOCUMENTATION

- Accessing Databricks tables from Apache Iceberg clients: https://docs.databricks.com/aws/en/external-access/iceberg.html
- UniForm (Iceberg reads for Delta tables): https://docs.databricks.com/en/delta/uniform.html
- Serverless compute limitations: https://docs.databricks.com/aws/en/compute/serverless/limitations.html
- Creating managed Iceberg tables: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for obvious issues, monitor system reliability, and update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.