Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yzhang
by Contributor
  • 3116 Views
  • 10 replies
  • 3 kudos

iceberg with partitionedBy option

I am able to create a Unity Catalog iceberg-format table: df.writeTo(full_table_name).using("iceberg").create(). However, if I add the partitionedBy option, I get an error: df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_dat...

Latest Reply
LazyGenius
New Contributor III
  • 3 kudos

@Sanjeeb2024 If your question is for me, then I would say it depends on the use case. If you have very large data to be ingested into the table, you would prefer creating the table first and then ingesting data into it using simultaneous jobs.

9 More Replies
Ajay-Pandey
by Databricks MVP
  • 4053 Views
  • 9 replies
  • 2 kudos

Databricks Job cluster for continuous run

Hi All, I have a situation where I want to run a job with a continuous trigger using a job cluster, but the cluster terminates and is re-created on every run within the continuous trigger. I just wanted to know if we have any option where I can use the same job cluster...

Latest Reply
mukul1409
Contributor II
  • 2 kudos

Hi @Ajay-Pandey, the only solution for you: 1. Create an all-purpose cluster called, for example, continuous-job-cluster, and disable auto-termination or set it to a large value. 2. Configure the job to use existing_cluster_id. In the Jobs UI or DAB YAML: exi...
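The reply's two steps could look roughly like this in a Databricks Asset Bundle job definition (a sketch: the job name, cluster-ID variable, and notebook path are placeholders, not from the thread):

```yaml
# Hypothetical DAB snippet; existing_cluster_id points at the
# all-purpose cluster from step 1 (auto-termination disabled),
# so the continuous trigger reuses it instead of re-creating
# a job cluster on every run.
resources:
  jobs:
    continuous_job:
      name: continuous-job
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: main
          existing_cluster_id: ${var.continuous_job_cluster_id}
          notebook_task:
            notebook_path: ./notebooks/main.py
```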

8 More Replies
parth_db
by New Contributor III
  • 980 Views
  • 5 replies
  • 7 kudos

Resolved! AutoLoader Type Widening

I have a few doubts regarding AutoLoader behavior and capabilities. Please check and correct wherever my assumptions or understanding are incorrect; much appreciated. Below is my specific code. Example scenario: Target Managed Delta Table (Type Widenin...

Latest Reply
Sanjeeb2024
Valued Contributor
  • 7 kudos

Thank you @nayan_wylde for the details. This is really useful.

4 More Replies
vamsi_simbus
by Databricks Partner
  • 840 Views
  • 2 replies
  • 3 kudos

Resolved! Databricks Apps - Auto Terminate Option

Hi Everyone, I'm exploring Databricks Apps and have two questions: Is there a way to automatically terminate an app after a certain period of inactivity? Does Databricks provide any scheduling mechanism for apps, similar to how Jobs can be scheduled? Any...

Latest Reply
Sanjeeb2024
Valued Contributor
  • 3 kudos

Hi @vamsi_simbus - One option you can explore is starting and stopping apps using the Databricks API. Have a look at the document link below: https://docs.databricks.com/api/workspace/apps/stop

1 More Replies
slangenborg
by Databricks Partner
  • 536 Views
  • 3 replies
  • 1 kudos

Resolved! DAB Job - Serverless Cluster using configured base environment

I have configured a base serverless environment for my workspace that includes libraries from a private repository. This base environment has been set to default, and it behaves as expected when running notebooks manually in the workspace with Serverless ...

Latest Reply
mukul1409
Contributor II
  • 1 kudos

Hi @slangenborg, according to the official Databricks Jobs REST API documentation, notebook tasks use the notebook environment only implicitly when no environment_key is provided. The API lets you explicitly configure environments only via an environ...

2 More Replies
tonkol
by New Contributor II
  • 341 Views
  • 1 reply
  • 0 kudos

Migrate on-premise delta tables to Databricks (Azure)

Hi There, I have the situation that we've decided to migrate our on-premise delta lake to Azure Databricks. Because of networking, I can only "push" the data from on-prem to the cloud. What would be the best way to replicate all tables: schema + partitioning i...

Latest Reply
mukul1409
Contributor II
  • 0 kudos

The correct solution is not SQL-based. Delta tables are defined by the contents of the delta log directory, not by CREATE TABLE statements. That is why SHOW CREATE TABLE cannot reconstruct partitions, properties, or constraints. The only reliable migrat...
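A minimal local illustration of the point above (hypothetical paths; in practice the copy would go to cloud storage with a tool such as azcopy): because the table definition lives in _delta_log next to the data files, copying the entire table directory carries schema and partitioning with it.

```python
import os
import shutil
import tempfile

# Build a stand-in Delta table directory: data files plus _delta_log.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "_delta_log"))
open(os.path.join(src, "_delta_log", "00000000000000000000.json"), "w").close()
open(os.path.join(src, "part-00000.snappy.parquet"), "w").close()

# Copying the whole directory replicates the log, and with it the
# table's schema, partitioning, properties, and constraints.
dst = os.path.join(tempfile.mkdtemp(), "table")
shutil.copytree(src, dst)
```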

dikla
by New Contributor II
  • 877 Views
  • 4 replies
  • 1 kudos

Resolved! Issues Creating Genie Space via API Join Specs Are Not Persisted

Hi, I'm experimenting with the new API to create a Genie Space. I'm able to successfully create the space, but the join definitions are not created, even though I'm passing a join_specs object in the same format returned by GET /spaces/{id} for an exis...

Latest Reply
mtaran
Databricks Employee
  • 1 kudos

The serialized space JSON is incorrect. It has `join_specs` and `sql_snippets` nested under `data_sources`, but they should be nested under `instructions` instead. There they apply as expected.
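Per the reply, the corrected serialized-space shape would be (a sketch showing only the nesting; all other fields are omitted):

```json
{
  "data_sources": {},
  "instructions": {
    "join_specs": [],
    "sql_snippets": []
  }
}
```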

3 More Replies
Maxrb
by New Contributor III
  • 526 Views
  • 1 reply
  • 1 kudos

Resolved! Import functions in databricks asset bundles using source: WORKSPACE

Hi, we are using Databricks asset bundles, and we create functions which we import in notebooks, for instance: from utils import helpers, where utils is just a folder in our root. When running this with source: WORKSPACE, it will fail to resolve the impo...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

In Git folders, the repo root is auto-added to the Python path, so imports like from utils import helpers work, while in workspace folders, only the notebook’s directory is on the path, which is why it breaks. The quick fix is a tiny bootstrap that a...
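The bootstrap mentioned in the reply might look like this (an assumption about its shape; how the bundle root relates to the notebook's working directory varies by deployment):

```python
import os
import sys

# Hypothetical bootstrap cell: put the bundle root on sys.path so
# `from utils import helpers` resolves in workspace folders too,
# not only in Git folders where the repo root is added automatically.
bundle_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if bundle_root not in sys.path:
    sys.path.insert(0, bundle_root)
```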

ramsai
by New Contributor II
  • 659 Views
  • 3 replies
  • 3 kudos

Resolved! Serverless Compute Access Restriction Not Supported at User Level

The requirement is to disable serverless compute access for specific users while allowing them to use only their assigned clusters, without restricting serverless compute at the workspace level. After reviewing the available configuration options, th...

Latest Reply
Masood_Joukar
Contributor
  • 3 kudos

Hi @ramsai, how about a workaround? Set budget policies at the account level: Attribute usage with serverless budget policies | Databricks on AWS

2 More Replies
RyanHager
by Contributor
  • 725 Views
  • 2 replies
  • 2 kudos

Resolved! Liquid Clustering and S3 Performance

Are there any performance concerns when using liquid clustering with AWS S3? I believe all the parquet files go in the same folder (prefix, in AWS S3 terms) versus folders per partition when using "partition by". And there is this note on S3 performa...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Even though liquid clustering removes Hive-style partition folders, it typically doesn’t cause S3 prefix performance issues on Databricks. Delta tables don’t rely on directory listing for reads; they use the transaction log to locate exact files. In ...

1 More Replies
EdemSeitkh
by New Contributor III
  • 9435 Views
  • 6 replies
  • 0 kudos

Resolved! Pass catalog/schema/table name as a parameter to sql task

Hi, I am trying to pass a catalog name as a parameter into a query for a SQL task, and it pastes it with single quotes, which results in an error. Is there a way to pass a raw value, or are there other possible workarounds? Query: INSERT INTO {{ catalog }}.pas.product_snap...

Latest Reply
detom
New Contributor II
  • 0 kudos

This works: USE CATALOG IDENTIFIER({{ catalog_name }}); USE SCHEMA IDENTIFIER({{ schema_name }});
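Why this works: SQL task parameters are substituted as quoted string literals, which is invalid where an identifier is expected; IDENTIFIER() turns the literal back into a name. A tiny pure-Python model of the substitution (an illustration, not the actual SQL task engine):

```python
def substitute(template, params):
    # Model: SQL task parameters arrive as quoted string literals.
    for name, value in params.items():
        template = template.replace("{{ " + name + " }}", "'" + value + "'")
    return template

# A quoted literal where an identifier belongs is a syntax error:
bad = substitute("INSERT INTO {{ catalog }}.pas.t SELECT 1", {"catalog": "main"})
# IDENTIFIER() accepts the literal and resolves it as a name:
ok = substitute("USE CATALOG IDENTIFIER({{ catalog }})", {"catalog": "main"})
```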

5 More Replies
Gilad-Shai
by New Contributor III
  • 1068 Views
  • 12 replies
  • 12 kudos

Resolved! Creating Serverless Cluster

Hi everyone, I am trying to create a cluster in Databricks Free Edition, but I keep getting the following error: "Cannot create serverless cluster, please try again later." I have attempted this on different days and at different times, but the issue pe...

Latest Reply
Gilad-Shai
New Contributor III
  • 12 kudos

Thank you all (@Sanjeeb2024, @JAHNAVI, @Manoj12421), it works! It was not a Databricks Free Edition, as @Masood_Joukar said.

11 More Replies
Sainath368
by Contributor
  • 557 Views
  • 4 replies
  • 2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, we are currently migrating from a directory-listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in structured streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 2 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream, and 1 uni...

3 More Replies
halsgbs
by New Contributor III
  • 389 Views
  • 3 replies
  • 2 kudos

Alerts V2 Parameters

Hi, I'm working on using the Databricks Python SDK to create an alert from a notebook, but it seems that with V1 there is no way to add subscribers, and with V2 there is no option for adding parameters. Is my understanding correct, or am I missing something? A...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Alerts V2 (Public Preview) do not support query parameters yet. This is a documented limitation. Legacy alerts (V1) do support parameters and will use the default values defined in the SQL editor. For notifications, both legacy alerts and Alerts V2 a...

2 More Replies
lziolkow2
by Databricks Partner
  • 1016 Views
  • 4 replies
  • 5 kudos

Resolved! Strange DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error

I use the Databricks 17.3 runtime. I try to run the following code: CREATE OR REPLACE TABLE default.target_table (key1 INT, key2 INT, key3 INT, val STRING) USING DELTA; INSERT INTO target_table(key1, key2, key3, val) VALUES (1, 1, 1, 'a'); CREATE OR REPLACE TABLE de...

Latest Reply
emma_s
Databricks Employee
  • 5 kudos

Hi, you need to put all of the keys in the ON part of the clause rather than in the WHERE condition. This code works: MERGE INTO target_table AS target USING source_table AS source ON target.key1 = source.key1 AND target.key2 = source.key2 AND target...
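To see where the error comes from, here is a tiny pure-Python model of MERGE matching (an illustration with made-up rows, not Spark's implementation): when only part of the key is in ON, several source rows can match the same target row, which Delta rejects.

```python
def matching_source_rows(target_row, source_rows, on_keys):
    """Source rows that match a target row on the given join keys."""
    return [s for s in source_rows
            if all(s[k] == target_row[k] for k in on_keys)]

target = {"key1": 1, "key2": 1, "key3": 1, "val": "a"}
source = [
    {"key1": 1, "key2": 1, "key3": 1, "val": "b"},
    {"key1": 1, "key2": 1, "key3": 2, "val": "c"},
]

# ON only key1: two source rows match one target row, which is the
# DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE situation.
ambiguous = matching_source_rows(target, source, ["key1"])
# ON all three keys: exactly one match, so the MERGE is well-defined.
unique = matching_source_rows(target, source, ["key1", "key2", "key3"])
```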

3 More Replies