Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

martkev
by New Contributor
  • 1402 Views
  • 1 reply
  • 0 kudos

Networking Setup in Standard Tier – VNet Integration and Proxy Issues

Hi everyone, We are working on an order forecasting model using Azure Databricks and an ML model from Hugging Face, and we are running into an issue where the connection over SSL (port 443) fails during the handshake (EOF Error SSL 992). We suspect that a...

Latest Reply
arjun_kr
Databricks Employee
  • 0 kudos

It may depend on your UDR setup. If you have a UDR rule routing the traffic to a firewall appliance, it may be related to the traffic not being allowed in the firewall. If there is no UDR, or the UDR rule routes this traffic to the Internet, it wou...
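One quick way to narrow down whether a firewall is cutting the handshake is a TLS probe from a notebook cell. A minimal sketch, assuming the download goes to huggingface.co (substitute the endpoint your job actually hits):

import socket
import ssl

host, port = "huggingface.co", 443  # illustrative endpoint
ctx = ssl.create_default_context()
try:
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            print("Handshake OK:", tls.version())
except ssl.SSLError as err:
    # An EOF mid-handshake often means a middlebox dropped the connection
    print("Handshake failed:", err)

If this fails from the cluster but succeeds from outside the VNet, the firewall or UDR path is the likely culprit.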

Anonymous
by Not applicable
  • 17036 Views
  • 8 replies
  • 14 kudos

Resolved! MetadataChangedException

A Delta Lake table is created with an identity column, and I'm not able to load the data in parallel from four processes; I'm getting the MetadataChangedException error. I don't want to load the data into a temp table. I need to load directly, and in parallel, into the Delta...

Latest Reply
cpc0707
New Contributor II
  • 14 kudos

I'm having the same issue: I need to load a large amount of data from separate files into a Delta table, and I want to do it with a for-each loop so I don't have to run it sequentially, which would take days. There should be a way to handle this.
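For reference, declaring an identity column disables concurrent write transactions on a Delta table, so one commonly suggested workaround is to funnel all files into a single append instead of looping. A minimal sketch, with placeholder paths and table names:

# Read every file in one batch so a single transaction appends to the
# identity-column table; paths and names are illustrative.
df = spark.read.parquet("abfss://landing@account.dfs.core.windows.net/incoming/*.parquet")
df.write.mode("append").saveAsTable("catalog.schema.target_table")

Spark still parallelizes the read and write across the cluster; only the transaction itself is serialized.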

7 More Replies
Ulman
by New Contributor II
  • 4296 Views
  • 9 replies
  • 1 kudos

Switching to File Notification Mode with ADLS Gen2 - Encountering StorageException

Hello, We are currently using Auto Loader with file listing mode for a stream, which is experiencing significant latency due to the non-incremental naming of files in the directory, a condition that cannot be altered. In an effort to mitigate this...
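For context, switching Auto Loader to file notification mode is normally a one-option change on the reader; a minimal sketch, with placeholder paths and an illustrative file format:

# Notification mode consumes queue events instead of listing the directory.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.schemaLocation", "abfss://chk@account.dfs.core.windows.net/schema")
      .load("abfss://landing@account.dfs.core.windows.net/events/"))

On Azure, the stream also needs permission to create the Event Grid subscription and storage queue, which is often where setup-time storage exceptions originate.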

Data Engineering
ADLS gen2
autoloader
file notification mode
Latest Reply
Rah_Cencora
New Contributor II
  • 1 kudos

You should also reevaluate your use of premium storage for your landing-area files. Typically, storage for raw files does not need to be the fastest, most resilient, and most expensive tier. Unless you have a compelling reason for premium storage for la...

8 More Replies
vanverne
by New Contributor II
  • 1049 Views
  • 2 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the Databricks SQL Connector. Here is a simplified version of my current workflow: I create a temporary...

Latest Reply
vanverne
New Contributor II
  • 1 kudos

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", however, I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...
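Until a RETURNING-style clause is available, one workaround sketch is to tag inserted rows with a client-generated batch id and select the generated IDs back. This assumes a batch_id column on the target table and databricks-sql-connector 3.x named parameters; names and connection details are placeholders:

import uuid
from databricks import sql

batch_id = str(uuid.uuid4())
with sql.connect(server_hostname="...", http_path="...", access_token="...") as conn:
    with conn.cursor() as cur:
        # Tag the row so its generated identity value can be read back.
        cur.execute(
            "INSERT INTO target (batch_id, payload) VALUES (:bid, :p)",
            {"bid": batch_id, "p": "row1"},
        )
        cur.execute(
            "SELECT id, payload FROM target WHERE batch_id = :bid",
            {"bid": batch_id},
        )
        generated = cur.fetchall()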

1 More Reply
angelop
by New Contributor
  • 223 Views
  • 1 reply
  • 0 kudos

Databricks Clean Rooms creation

I am trying to create a Databricks Clean Rooms instance, following the video from the Databricks YouTube channel. As I only have one workspace, to create a clean room I added my own Clean Room sharing identifier; when I do that I get the ...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

@angelop I tried it as well and encountered the same error. A new collaborator needs to be set up; if that's not feasible, it would be advisable to reach out to Databricks support. By the way, the following video provides a more detailed explanation a...

The_Demigorgan
by New Contributor
  • 1538 Views
  • 1 reply
  • 0 kudos

Autoloader issue

I'm trying to ingest data from Parquet files using Auto Loader. I have my own custom schema and don't want to infer the schema from the Parquet files. During readStream everything is fine, but during writeStream it is somehow inferring the schema from...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

In this case, please make sure you specify the schema explicitly when reading the Parquet files and do not specify any inference options. Something like spark.readStream.format("cloudFiles").schema(schema)... If you want to more easily grab the schem...
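A minimal sketch of that pattern, with an illustrative schema and placeholder paths:

from pyspark.sql.types import StructType, StructField, LongType, StringType

schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .schema(schema)  # explicit schema; no inference options set
      .load("abfss://src@account.dfs.core.windows.net/parquet/"))

(df.writeStream
   .option("checkpointLocation", "abfss://chk@account.dfs.core.windows.net/ingest")
   .toTable("catalog.schema.bronze"))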

vmpmreistad
by New Contributor II
  • 5913 Views
  • 4 replies
  • 0 kudos

How to make structured streaming with autoloader efficiently and incrementally read files

TL;DR: How do I make a structured streaming job using Auto Loader read files using InMemoryFileIndex instead of DeltaFileOperations? I'm running a structured streaming job from an external (ADLS Gen2, abfss://) storage account which has Avro fil...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

@Wundermobility The best way to debug this is to look at the Spark UI to see whether a job has been launched. One thing to call out is that trigger.Once is deprecated; we recommend using trigger.availableNow instead to avoid overwhelming the cluster.
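A minimal sketch of the recommended trigger, with placeholder paths and table names:

# availableNow processes the existing backlog in rate-limited batches and
# then stops, unlike the deprecated trigger.Once single-batch behavior.
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/tmp/chk/ingest")
   .trigger(availableNow=True)
   .toTable("catalog.schema.target"))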

3 More Replies
jonathan-dufaul
by Valued Contributor
  • 337 Views
  • 1 reply
  • 0 kudos

where to report a bug in the sql formatter?

I was wondering where I go to report a bug in the SQL formatter. I tried sending an email to the helpdesk, but they think I'm asking for support. I'm not. I just want to report a bug in the application because I think they should know about it. I don'...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

Hi @jonathan-dufaul, for such reports, I think it would be appropriate to click on your profile icon in the top-right corner of the workspace and use the "Send Feedback" option.

jonathan-dufaul
by Valued Contributor
  • 2600 Views
  • 2 replies
  • 1 kudos

How do I specify column types when writing to an MSSQL server using the JDBC driver?

I have a PySpark DataFrame that I'm writing to an on-prem MSSQL server; it's a stopgap while we convert data warehousing jobs over to Databricks. The processes that use those tables in the on-prem server rely on the tables maintaining the identical s...

Latest Reply
dasanro
New Contributor II
  • 1 kudos

It's happening to me too! Did you find any solution, @jonathan-dufaul? Thanks!
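For reference, Spark's JDBC writer exposes a createTableColumnTypes option that overrides the column types Spark would otherwise generate when it creates the table. A minimal sketch, with placeholder connection details:

(df.write.format("jdbc")
   .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=dw")
   .option("dbtable", "dbo.target")
   # Types listed here are used in the CREATE TABLE DDL instead of defaults.
   .option("createTableColumnTypes", "name VARCHAR(200), amount DECIMAL(18,2)")
   .option("user", "...")
   .option("password", "...")
   .mode("overwrite")
   .save())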

1 More Reply
Yaadhudbe
by New Contributor II
  • 394 Views
  • 1 reply
  • 0 kudos

AWS Databricks- Out of Memory issue in Delta live tables

I have been using Delta Live Tables for more than a year and have implemented a good number of DLT pipelines ingesting data from an S3 bucket using SQS. One of my pipelines processes a large volume of data. The DLT pipeline reads the data using CloudFiles...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Yaadhudbe, we would need to review your DLT setup, cluster settings, and Spark processing to better understand the OOM errors and suggest possible mitigations. I suggest filing a case with us so we can conduct a proper investigation. http...

Hoviedo
by New Contributor III
  • 665 Views
  • 3 replies
  • 0 kudos

Apply expectations only if column exists

Hi, is there any way to apply an expectation only if a given column exists? I am creating multiple DLT tables with the same Python function, so I would like to create different expectations based on the table name; currently I can only create expectations...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To apply expectations only if a column exists in Delta Live Tables (DLT), you can use the @dlt.expect decorator conditionally within your Python function. Here is a step-by-step approach to achieve this: Check if the column exists: before applying th...
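A minimal sketch of that approach: since dlt.expect is a decorator factory, it can be attached conditionally. Table and column names are illustrative:

import dlt

def define_table(name: str, source: str):
    def build():
        return spark.read.table(source)

    # Attach the expectation only when the source actually has the column.
    if "email" in spark.read.table(source).columns:
        build = dlt.expect("valid_email", "email IS NOT NULL")(build)

    return dlt.table(name=name)(build)

define_table("clean_users", "raw_users")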

2 More Replies
MarkD
by New Contributor II
  • 4742 Views
  • 10 replies
  • 0 kudos

SET configuration in SQL DLT pipeline does not work

Hi, I'm trying to set a dynamic value to use in a DLT query, and the code from the example documentation does not work: SET startDate='2020-01-01'; CREATE OR REFRESH LIVE TABLE filtered AS SELECT * FROM my_table WHERE created_at > ${startDate}; It is g...

Data Engineering
Delta Live Tables
dlt
sql
Latest Reply
anardinelli
Databricks Employee
  • 0 kudos

@smit_tw Have you tried setting it on the "Advanced" tab, as my previous reply suggests?
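For reference, once a key is defined under the pipeline's Advanced > Configuration settings, it can be read from pipeline code via the Spark conf. A minimal sketch in Python (the key name is illustrative):

import dlt

start_date = spark.conf.get("startDate", "2020-01-01")  # set in pipeline configuration

@dlt.table
def filtered():
    return spark.read.table("my_table").where(f"created_at > '{start_date}'")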

9 More Replies
Maatari
by New Contributor III
  • 677 Views
  • 3 replies
  • 0 kudos

AvailableNow Trigger and failure

Hi, I wonder what the behavior of Spark Structured Streaming is supposed to be when using the AvailableNow trigger and there is a failure during the query. More specifically, what happens to the initial end offset? Does it change? Wh...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The AvailableNow trigger processes all available data as a single batch and then stops. This is different from continuous or micro-batch processing where the system continuously checks for new data. When a query starts with the AvailableNow trigger, ...
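Because the processed offsets are recorded in the checkpoint, a failed AvailableNow run that is restarted with the same checkpointLocation resumes from the last committed batch rather than reprocessing everything. A minimal sketch, with placeholder paths:

(stream_df.writeStream  # stream_df is an illustrative streaming DataFrame
    .option("checkpointLocation", "/tmp/chk/orders")
    .trigger(availableNow=True)
    .toTable("catalog.schema.orders"))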

2 More Replies
Balram-snaplogi
by New Contributor II
  • 712 Views
  • 1 reply
  • 1 kudos

How can we customize the access token expiry duration?

Hi, I am using OAuth machine-to-machine (M2M) authentication. I created a service principal and wrote a Java application that allows me to connect to the Databricks warehouse. My question is regarding the code below: String url = "jdbc:databricks://<se...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

I would say that your token should be manually refreshed, as mentioned in the following statement in the docs: Databricks tools and SDKs that implement the Databricks client unified authentication standard will automatically generate, refresh, and use Dat...

RateVan
by New Contributor II
  • 3349 Views
  • 4 replies
  • 0 kudos

Spark last window dont flush in append mode

The problem is very simple: when you use a tumbling window with append mode, the window is closed only when the next message arrives (plus watermark logic). In the current implementation, if you stop incoming streaming data, the last window will NEVER...
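A minimal reproduction sketch of the setup being described (event_time and the source stream are illustrative):

from pyspark.sql.functions import window, col, count

agg = (events  # events is an illustrative streaming DataFrame
       .withWatermark("event_time", "10 minutes")
       .groupBy(window(col("event_time"), "5 minutes"))
       .agg(count("*").alias("n")))

# In append mode a window is emitted only after the watermark passes its
# end, and the watermark only advances when newer events arrive.
query = (agg.writeStream
         .outputMode("append")
         .option("checkpointLocation", "/tmp/chk/windows")
         .toTable("catalog.schema.window_counts"))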

Latest Reply
Dtank
New Contributor II
  • 0 kudos

Do you have any solution for this?

3 More Replies
