Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

angelop
by New Contributor
  • 35 Views
  • 1 reply
  • 0 kudos

Databricks Clean Rooms creation

I am trying to create a Databricks Clean Rooms instance, following the video from the Databricks YouTube channel. As I only have one workspace, to create a clean room I added my own Clean Room sharing identifier; when I do that I get the ...

Latest Reply
TakuyaOmi
Contributor
  • 0 kudos

@angelop I tried it as well and encountered the same error. A new collaborator needs to be set up. If that’s not feasible, it would be advisable to reach out to Databricks support. By the way, the following video provides a more detailed explanation a...

The_Demigorgan
by New Contributor
  • 1322 Views
  • 1 reply
  • 0 kudos

Autoloader issue

I'm trying to ingest data from Parquet files using Autoloader. I have my own custom schema and don't want to infer the schema from the Parquet files. During readStream everything is fine, but during writeStream it is somehow inferring the schema from...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

In this case, please make sure you specify the schema explicitly when reading the Parquet files and do not specify any inference options. Something like spark.readStream.format("cloudFiles").schema(schema)... If you want to more easily grab the schem...
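
A minimal sketch of the suggested pattern, assuming a hypothetical abfss:// source path, checkpoint location, and target table:

    # Hedged sketch: define the schema up front so Auto Loader never
    # falls back to inference; paths and names are hypothetical.
    from pyspark.sql.types import StructType, StructField, LongType, StringType

    schema = StructType([
        StructField("id", LongType(), True),
        StructField("name", StringType(), True),
    ])

    df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .schema(schema)  # explicit schema, no inference options
        .load("abfss://container@account.dfs.core.windows.net/input/"))

    (df.writeStream
        .option("checkpointLocation", "/tmp/checkpoints/parquet_ingest")
        .toTable("bronze_parquet"))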

vmpmreistad
by New Contributor II
  • 3894 Views
  • 4 replies
  • 0 kudos

How to make structured streaming with Autoloader read files efficiently and incrementally

TL;DR: How do I make a structured streaming job using Autoloader read files using InMemoryFileIndex instead of DeltaFileOperations? I'm running a structured streaming job against an external (ADLS Gen2, abfss://) storage account which has Avro fil...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

@Wundermobility The best way to debug this is to look at the Spark UI to see if a job has been launched. One thing to call out is that Trigger.Once is deprecated - we recommend using Trigger.AvailableNow instead to avoid overwhelming the cluster.
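
A hedged sketch of that swap, assuming a hypothetical Avro source read with Auto Loader (schema location, paths, and table name are made up):

    # Hedged sketch: availableNow processes the backlog in controlled
    # batches and then stops; paths and names are hypothetical.
    df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "avro")
        .option("cloudFiles.schemaLocation", "/tmp/schemas/avro_ingest")
        .load("abfss://container@account.dfs.core.windows.net/landing/"))

    (df.writeStream
        .trigger(availableNow=True)  # instead of the deprecated once=True
        .option("checkpointLocation", "/tmp/checkpoints/avro_ingest")
        .toTable("bronze_avro"))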

3 More Replies
jonathan-dufaul
by Valued Contributor
  • 36 Views
  • 1 reply
  • 0 kudos

where to report a bug in the sql formatter?

I was wondering where to go to report a bug in the SQL formatter. I tried sending an email to the helpdesk but they think I'm asking for support. I'm not. I just want to report a bug in the application because I think they should know about it. I don'...

Latest Reply
TakuyaOmi
Contributor
  • 0 kudos

Hi, @jonathan-dufaul. For such reports, I think it would be appropriate to click on your profile icon in the top-right corner of the workspace and use the "Send Feedback" option.

jonathan-dufaul
by Valued Contributor
  • 1504 Views
  • 2 replies
  • 1 kudos

How do I specify column types when writing to an MSSQL server using the JDBC driver (

I have a PySpark dataframe that I'm writing to an on-prem MSSQL server; it's a stopgap while we convert data warehousing jobs over to Databricks. The processes that use those tables in the on-prem server rely on the tables maintaining the identical s...
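
The replies below don't land on a fix, but one documented option worth checking (an assumption drawn from the Spark JDBC docs, not from this thread) is createTableColumnTypes, which overrides the column types Spark uses when it creates the table:

    # Hedged sketch: createTableColumnTypes only applies when Spark
    # creates the table; URL, credentials, and columns are hypothetical.
    (df.write
        .format("jdbc")
        .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=dw")
        .option("dbtable", "dbo.stopgap_table")
        .option("user", user)
        .option("password", password)
        .option("createTableColumnTypes", "name VARCHAR(200), amount DECIMAL(18,2)")
        .mode("overwrite")
        .save())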

Latest Reply
dasanro
New Contributor II
  • 1 kudos

It's happening to me too! Did you find any solution, @jonathan-dufaul? Thanks!!

1 More Replies
holychs
by New Contributor
  • 40 Views
  • 1 reply
  • 0 kudos

Concurrent Workflow Jobs

Hi Community, I am trying to run a Databricks workflow job using run_job_task under a for_loop. I have set concurrent jobs to 2. I can see 2 iteration jobs getting triggered successfully, but both fail with an error: "ConnectException: Connection ...

Latest Reply
TakuyaOmi
Contributor
  • 0 kudos

Hi, @holychs. Did you encounter any error messages related to an OOM (Out of Memory) error? It’s possible that the driver node of the cluster doesn’t have sufficient resources (CPU, memory) to handle multiple concurrent jobs.

Yaadhudbe
by New Contributor
  • 30 Views
  • 1 reply
  • 0 kudos

AWS Databricks - Out of Memory issue in Delta Live Tables

I have been using Delta Live Tables for more than a year and have implemented a good number of DLT pipelines ingesting data from an S3 bucket using SQS. One of my pipelines processes a large volume of data. The DLT pipeline reads the data using CloudFiles...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Yaadhudbe, we would need to review your DLT setup, cluster settings, and Spark processing to better understand the OOM errors and suggest ways to mitigate the issue. I suggest filing a case with us to conduct a proper investigation. http...

devpdi
by New Contributor
  • 987 Views
  • 2 replies
  • 0 kudos

Re-use jobs as tasks with the same cluster.

Hello, I am facing an issue with my workflow. I have a job (call it the main job) that, among others, runs 5 concurrent tasks, which are defined as jobs (not notebooks). Each of these jobs is identical to the others (call them sub-job-1), with the only diff...

Latest Reply
holychs
New Contributor
  • 0 kudos

Were you able to find any solution to the problem? I have a similar use case where I need to run multiple run_job_task tasks, and every time each one spins up its own new cluster, as defined in the child job. I am not able to find any relevant s...

1 More Replies
susanne
by New Contributor
  • 61 Views
  • 2 replies
  • 0 kudos

Possibilities and Limitations of Delta Live Tables (DLT) with Direct Publish mode

Hi all, I am implementing a DLT table with the new Direct Publish feature, which is still in Private Preview. Is it a limitation of DLT with Direct Publish that you cannot query the event_log of the DLT? When I use this query: SELECT * FROM event_log('pi...

Latest Reply
susanne
New Contributor
  • 0 kudos

@Walter_C thank you so much, that worked perfectly.

1 More Replies
Hoviedo
by New Contributor III
  • 49 Views
  • 3 replies
  • 0 kudos

Apply expectations only if column exists

Hi, is there any way to apply an expectation only if that column exists? I am creating multiple DLT tables with the same Python function, so I would like to create different expectations based on the table name; currently I can only create expectations...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To apply expectations only if a column exists in Delta Live Tables (DLT), you can use the @dlt.expect decorator conditionally within your Python function. Here is a step-by-step approach to achieve this: Check if the Column Exists: Before applying th...
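
A hedged sketch of that pattern, with hypothetical rules, table names, and source paths (the real mapping from tables to expectations would come from your own metadata):

    import dlt

    # Hypothetical column -> expectation rules; only rules whose column
    # actually exists in the source are attached to the table.
    RULES = {
        "customer_id": "customer_id IS NOT NULL",
        "email": "email LIKE '%@%'",
    }

    def create_table(table_name, source_path):
        columns = spark.read.load(source_path).columns  # peek at the schema
        applicable = {f"valid_{col}": rule
                      for col, rule in RULES.items() if col in columns}

        @dlt.table(name=table_name)
        @dlt.expect_all(applicable)
        def build():
            return spark.read.load(source_path)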

2 More Replies
MarkD
by New Contributor II
  • 3311 Views
  • 10 replies
  • 0 kudos

SET configuration in SQL DLT pipeline does not work

Hi, I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work: SET startDate='2020-01-01'; CREATE OR REFRESH LIVE TABLE filtered AS SELECT * FROM my_table WHERE created_at > ${startDate}; It is g...

Data Engineering
Delta Live Tables
dlt
sql
Latest Reply
anardinelli
Databricks Employee
  • 0 kudos

@smit_tw Have you tried setting it on the "Advanced" tab as my previous reply suggests?
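
For reference, a hedged sketch of that approach: with "startDate": "2020-01-01" added under the pipeline's Advanced > Configuration settings, the value can be referenced as ${startDate} in SQL, or read via spark.conf in a Python DLT source (table and column names here are hypothetical):

    import dlt

    @dlt.table
    def filtered():
        # pipeline configuration value set under Advanced > Configuration
        start_date = spark.conf.get("startDate")
        return spark.read.table("my_table").where(f"created_at > '{start_date}'")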

9 More Replies
Maatari
by New Contributor III
  • 58 Views
  • 3 replies
  • 0 kudos

AvailableNow Trigger and failure

Hi, I wonder what the behavior of Spark Structured Streaming is supposed to be when using the AvailableNow trigger and there is a failure during the query. More specifically, what happens to the initial end offset that was set? Does it change? Wh...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The AvailableNow trigger processes all data available at the start of the query (possibly across multiple micro-batches) and then stops. This is different from continuous or micro-batch processing where the system continuously checks for new data. When a query starts with the AvailableNow trigger, ...
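
A minimal sketch of the recovery behavior, assuming hypothetical source and sink tables: the trigger's progress is committed to the checkpoint, so a restart after a failure resumes from the last committed offsets rather than re-reading already-processed data.

    # Hedged sketch; table names and checkpoint path are hypothetical.
    (spark.readStream
        .table("source_table")
        .writeStream
        .trigger(availableNow=True)
        .option("checkpointLocation", "/tmp/checkpoints/availablenow_demo")
        .toTable("sink_table"))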

2 More Replies
Balram-snaplogi
by New Contributor
  • 70 Views
  • 1 reply
  • 0 kudos

How can we customize the access token expiry duration?

Hi, I am using OAuth machine-to-machine (M2M) authentication. I created a service principal and wrote a Java application that allows me to connect to the Databricks warehouse. My question is regarding the code below: String url = "jdbc:databricks://<se...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

I would say that your token should be manually refreshed, as mentioned in the following statement in the docs: Databricks tools and SDKs that implement the Databricks client unified authentication standard will automatically generate, refresh, and use Dat...

chethankumar
by New Contributor III
  • 134 Views
  • 3 replies
  • 0 kudos

How to execute SQL statements using Terraform

Is there a way to execute SQL statements using Terraform? I can see it is possible using the API, as below: https://docs.databricks.com/api/workspace/statementexecution/executestatement but I want to know whether there is a straightforward way to run it like the below code provi...

Latest Reply
chethankumar
New Contributor III
  • 0 kudos

I have used the below provider to run the query: https://registry.terraform.io/providers/hashicorp/http/latest
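
For context, a hedged Python sketch of the call such a provider ends up making against the Statement Execution API linked above (host, warehouse ID, and token are placeholders):

    import requests

    # Hedged sketch; replace the placeholders with real values.
    resp = requests.post(
        "https://<workspace-host>/api/2.0/sql/statements",
        headers={"Authorization": "Bearer <token>"},
        json={
            "statement": "SELECT 1",
            "warehouse_id": "<warehouse-id>",
            "wait_timeout": "30s",
        },
    )
    resp.raise_for_status()
    print(resp.json()["status"])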

2 More Replies
RateVan
by New Contributor II
  • 2446 Views
  • 4 replies
  • 0 kudos

Spark's last window doesn't flush in append mode

The problem is very simple: when you use a TUMBLING window with append mode, the window is closed only when the next message arrives (plus watermark logic). In the current implementation, if you stop incoming streaming data, the last window will NEVER...
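
A minimal sketch of the setup being described, where events is assumed to be a streaming DataFrame and all names are hypothetical: in append mode, a window's aggregate is only emitted after the watermark passes the window end, which requires newer events to arrive.

    from pyspark.sql import functions as F

    # Hedged sketch; "events", columns, and paths are hypothetical.
    agg = (events
        .withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"))
        .count())

    (agg.writeStream
        .outputMode("append")  # results appear only once a window closes
        .option("checkpointLocation", "/tmp/checkpoints/windowed")
        .toTable("windowed_counts"))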

Latest Reply
Dtank
New Contributor
  • 0 kudos

Do you have any solution for this?

3 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group