Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ksenija
by Contributor
  • 417 Views
  • 5 replies
  • 0 kudos

DLT pipeline - DebeziumJDBCMicroBatchProvider not found

Hi! I created a DLT pipeline and I'm getting this error: [STREAM_FAILED] Query [id = ***, runId = ***] terminated with exception: object com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found. I'm using Serverless. How to verify that the require...

Data Engineering
DebeziumJDBCMicroBatchProvider
dlt
Latest Reply
ksenija
Contributor
  • 0 kudos

@Dnirmania, @jlachniet I didn’t manage to resolve this issue, but I created a regular notebook and I’m using a MERGE statement. If you can’t merge all the data at once, you can use a loop with hourly intervals.
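The loop-with-hourly-intervals fallback can be sketched as below. This is a minimal illustration, not the poster's actual code: the table names (`source_table`, `target_table`), the key column `Id`, and the `event_time` column are hypothetical placeholders you would replace with your own schema.

```python
from datetime import datetime, timedelta

def hourly_windows(start: datetime, end: datetime):
    """Yield consecutive (window_start, window_end) pairs, one per hour."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(hours=1), end)
        yield cur, nxt
        cur = nxt

def merge_sql(window_start: datetime, window_end: datetime) -> str:
    # Hypothetical source/target names and columns; adjust to your schema.
    return f"""
        MERGE INTO target_table AS t
        USING (
          SELECT * FROM source_table
          WHERE event_time >= '{window_start.isoformat()}'
            AND event_time <  '{window_end.isoformat()}'
        ) AS s
        ON t.Id = s.Id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """

# In a Databricks notebook you would execute each window's statement:
# for ws, we in hourly_windows(start, end):
#     spark.sql(merge_sql(ws, we))
```

Merging one bounded window at a time keeps each transaction small, which is the point of the hourly-interval workaround.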

  • 0 kudos
4 More Replies
EDDatabricks
by Contributor
  • 1161 Views
  • 3 replies
  • 0 kudos

Concurrency issue with append-only writes

Dear all, We have a PySpark streaming job (DBR: 14.3) that continuously writes new data to a Delta table (TableA). On this table, there is a PySpark batch job (DBR: 14.3) that operates every 15 minutes and in some cases may delete some records from ...

Data Engineering
Concurrency
DBR 14.3
delta
MERGE
Latest Reply
Dilisha
New Contributor II
  • 0 kudos

Hi @EDDatabricks - were you able to find the fix for this? I am also facing a similar issue. Added more details here - Getting concurrent Append exception after upgradin... - Databricks Community - 76521

2 More Replies
Yulei
by New Contributor III
  • 8329 Views
  • 5 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I have been seeing the issue "Could not reach driver of cluster <some_id>" with my structured streaming job when migrating to Unity Catalog, and found this when checking the traceback: Traceback (most recent call last): File "/databricks/python_shell/...

Latest Reply
Kub4S
New Contributor II
  • 1 kudos

To expand on the same error "Could not reach driver of cluster XX" but a different cause: the reason in my case (an ADF-triggered Databricks job which runs into this error) was a problem with a numpy library version, where the solution is to downgrade the libr...

4 More Replies
FhSpZ
by New Contributor II
  • 165 Views
  • 2 replies
  • 0 kudos

Error AgnosticEncoder.isStruct() in IntelliJ using Scala locally

I've been trying to connect to Azure Databricks from IntelliJ using Scala locally, but I've got the error below: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.AgnosticEncoder.isStruct()Zat o...

Latest Reply
FhSpZ
New Contributor II
  • 0 kudos

Hi @Kaniz_Fatma, I ensured that I was using the correct Spark version, matching my Databricks runtime. But when I tried using Spark version 3.5.1 locally in the .sbt dependencies, it worked, which is kind of strange. Anywa...

1 More Replies
avrm91
by Contributor
  • 500 Views
  • 4 replies
  • 1 kudos

How to load xlsx Files to Delta Live Tables (DLT)?

I want to load a .xlsx file to DLT but am struggling, as it is not available with Autoloader. With the Assistant I tried to load the .xlsx first into a data frame and then send it to DLT. import dlt from pyspark.sql import SparkSession # Load xlsx file in...
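Since Auto Loader has no xlsx format, one common workaround is to read the file with pandas and convert it to a Spark DataFrame. The sketch below assumes pandas (and the openpyxl engine) is available; the path and table names are hypothetical. It also sanitizes column headers, since Delta column names may not contain characters such as spaces, commas, or parentheses.

```python
import re

def sanitize_columns(columns):
    """Replace characters Delta rejects in column names (' ,;{}()\\n\\t=')
    with underscores so the frame can be written to a table."""
    return [re.sub(r"[ ,;{}()\n\t=]", "_", str(c)) for c in columns]

def load_xlsx(path: str):
    # Requires pandas plus the openpyxl engine for .xlsx files.
    import pandas as pd
    df = pd.read_excel(path)
    df.columns = sanitize_columns(df.columns)
    return df

# In a notebook, convert and write outside of Auto Loader (names hypothetical):
# spark_df = spark.createDataFrame(load_xlsx("/Volumes/catalog/schema/vol/file.xlsx"))
# spark_df.write.format("delta").saveAsTable("my_schema.my_table")
```

Because this path bypasses Auto Loader, incremental discovery of new files would need to be handled separately (e.g. by listing the directory yourself).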

Latest Reply
avrm91
Contributor
  • 1 kudos

Added a feature request to the Azure Community Portal: XLSX - DLT Autoloader · Community (azure.com)

3 More Replies
avrm91
by Contributor
  • 201 Views
  • 2 replies
  • 2 kudos

Resolved! XBRL File Format

I was searching for some XBRL documentation with Databricks, as it is a business reporting standard format, especially for DLT and Autoloader. Is there anything in the development pipeline?

Latest Reply
avrm91
Contributor
  • 2 kudos

I added the XBRL request to the Azure community: XBRL · Community (azure.com)

1 More Replies
aothman
by New Contributor
  • 122 Views
  • 1 reply
  • 0 kudos

Databricks on AWS Outposts rack

We need to know if Databricks can be implemented on private cloud infrastructure or on an AWS Outposts rack.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @aothman, Databricks currently supports public cloud providers such as Azure, GCP, and AWS, but it does not offer direct support for private cloud implementations beyond these. While Databricks itself does not directly support Outposts, you ca...

hfyhn
by New Contributor
  • 134 Views
  • 1 reply
  • 0 kudos

DLT, combine LIVE table with data masking and row filter

I need to apply data masking and row filters to my table. At the same time I would like to use DLT live tables. However, as far as I can see, data masking and row filters are not compatible with DLT live tables. What are my options? Move the tables out of the mat...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hfyhn, You’re correct that DLT Live tables don’t inherently support writing data from multiple live tables into a single target Delta table. This means you cannot directly configure two or more DLT pipelines to update the same table simultaneousl...
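For tables managed outside the DLT pipeline, Unity Catalog row filters and column masks are applied with `ALTER TABLE` statements. The helper below only builds those statements as strings (the table and function names are hypothetical); actually running them requires a Unity Catalog table and pre-created filter/mask SQL functions.

```python
def row_filter_sql(table: str, filter_fn: str, columns: list[str]) -> str:
    """Build 'ALTER TABLE ... SET ROW FILTER <fn> ON (<cols>)' (Unity Catalog syntax)."""
    cols = ", ".join(columns)
    return f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({cols})"

def column_mask_sql(table: str, column: str, mask_fn: str) -> str:
    """Build 'ALTER TABLE ... ALTER COLUMN <col> SET MASK <fn>'."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn}"

# In a notebook (names hypothetical):
# spark.sql(row_filter_sql("main.sales.orders", "main.sales.region_filter", ["region"]))
# spark.sql(column_mask_sql("main.sales.orders", "ssn", "main.sales.mask_ssn"))
```

Whether these can be combined with a given DLT-managed table depends on the pipeline setup, per the reply above.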

FrankTa
by New Contributor II
  • 493 Views
  • 2 replies
  • 2 kudos

Resolved! Unstable workflow runs lately

Hi! We have been using Databricks on Azure in production for about 3 months. A big part of what we use Databricks for is processing data using a workflow with various Python notebooks. We run the workflow on a 'Pools' cluster and on an 'All-purpose compute...

Latest Reply
FrankTa
New Contributor II
  • 2 kudos

Hi Holly, Thanks for your reply, good to hear that the 403 errors are on the radar and due to be fixed. I will reach out to support in case of further issues.

1 More Replies
PaulMarsh
by New Contributor II
  • 194 Views
  • 2 replies
  • 0 kudos

Python databricks sql.connect takes exactly 10 minutes to connect to Serverless SQL Warehouse

Have tried this with the SQL Warehouse both up and running and turned off. The connection is eventually made and works without issue, but the initial connect() takes 10 minutes to establish. No other connectivity issues between the box and the Warehou...

Latest Reply
PaulMarsh
New Contributor II
  • 0 kudos

@Kaniz_Fatma thank you for the reply. It seems the issue came down to the script running with Python 3.12 when I should have been using 3.11.9.
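Since the fix here was an interpreter version mismatch, a cheap guard is to fail fast before opening the connection. The sketch below takes the 3.12-vs-3.11 boundary from the reply above; whether newer connector releases behave differently on 3.12 is not confirmed by this thread, so treat the cutoff as an assumption.

```python
import sys

def check_python_for_connector(max_minor: int = 11) -> None:
    """Raise if the interpreter is newer than the version this script was
    validated on (the reporter saw 10-minute connects under 3.12, not 3.11.9)."""
    major, minor = sys.version_info[:2]
    if (major, minor) > (3, max_minor):
        raise RuntimeError(
            f"Python {major}.{minor} detected; validated only up to 3.{max_minor}."
        )

# check_python_for_connector()
# from databricks import sql
# conn = sql.connect(server_hostname=..., http_path=..., access_token=...)
```

Running the check before `sql.connect()` turns a silent 10-minute stall into an immediate, explainable error.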

1 More Replies
lprevost
by New Contributor III
  • 601 Views
  • 7 replies
  • 2 kudos

Incremental Loads from a Catalog/DLT

The Databricks guide outlines several use cases for loading data via Delta Live Tables: https://docs.databricks.com/en/delta-live-tables/load.html This includes Auto Loader from cloudFiles, Kafka messages, small static datasets, etc. But one use case i...

Latest Reply
lucasrocha
New Contributor III
  • 2 kudos

Hello @lprevost, I hope this message finds you well. In order to install this library, you can follow any of the steps below: - First step - search for libraries and versions: Go to the cluster; Libraries; Install new; Maven; Search Packages; Change from Spa...

6 More Replies
ameya
by New Contributor
  • 175 Views
  • 1 reply
  • 0 kudos

Adding new columns to a Delta Live table in a CDC process

Hi, I am new to Databricks and still learning. I am trying to do CDC on a table. APPLY CHANGES INTO LIVE.table1 FROM schema2.table2 KEYS (Id) SEQUENCE BY orderByColumn COLUMNS * EXCEPT (col1, col2) STORED AS SCD TYPE 1; table1 is in schema1 and ...

Latest Reply
raphaelblg
Honored Contributor II
  • 0 kudos

Hi @ameya , Scenario 1: Enabling Delta schema evolution in your table or at DLT pipeline level should suffice for the scenario of new fields being added to the schema.  Scenario 2: The INSERT statement doesn't support schema evolution as described in...

NarenderKumar
by New Contributor III
  • 235 Views
  • 2 replies
  • 2 kudos

Resolved! Can we parameterize the compute in job cluster

I have created a workflow job in Databricks with job parameters. I want to run the same job with different workloads and data volumes. So I want the compute cluster to be parameterized so that I can pass the compute requirements (driver, executor size and...

Latest Reply
brockb
Valued Contributor
  • 2 kudos

Hi @NarenderKumar , Have you considered leveraging autoscaling for the existing cluster?If this does not meet your needs, are the differing volume/workloads known in advance? If so, could different compute be provisioned using Infrastructure as Code ...

1 More Replies
yusufd
by New Contributor III
  • 1621 Views
  • 9 replies
  • 8 kudos

Resolved! Pyspark serialization

Hi, I was looking for comprehensive documentation on implementing serialization in PySpark; most of the places I have seen are all about serialization with Scala. Could you point out where I can get a detailed explanation?

Latest Reply
yusufd
New Contributor III
  • 8 kudos

Thank you @Kaniz_Fatma for the prompt reply. This clears things up and also distinguishes between Spark-Scala and PySpark. Appreciate your explanation. Will apply this and also share any findings based on it that will help the community!

8 More Replies
tseader
by New Contributor III
  • 426 Views
  • 3 replies
  • 1 kudos

Resolved! Python SDK clusters.create_and_wait - Sourcing from cluster-create JSON

I am attempting to create a compute cluster using the Python SDK while sourcing a cluster-create configuration JSON file, which is how it's done with the databricks-cli and what Databricks provides through the GUI. Reading in the JSON as a dict fails...

Latest Reply
tseader
New Contributor III
  • 1 kudos

@Kaniz_Fatma The structure of the `cluster-create.json` is perfectly fine. The issue, as stated above, is that the SDK does not allow nested structures from the JSON file to be used; instead, they need to be cast to spec...
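One way to bridge that gap is to split the raw JSON into flat kwargs and the nested sub-objects that must become typed SDK dataclasses. The helper below is pure dict handling; the commented SDK usage (`WorkspaceClient`, `clusters.create_and_wait`, `compute.AutoScale`) reflects my understanding of the databricks-sdk API and should be checked against your SDK version, and the set of nested keys is an illustrative assumption, not exhaustive.

```python
# Nested fields the SDK expects as dataclasses rather than plain dicts
# (illustrative subset; extend for your cluster spec).
NESTED_KEYS = {"autoscale", "aws_attributes", "azure_attributes", "gcp_attributes"}

def split_cluster_spec(raw: dict):
    """Separate flat kwargs from nested sub-objects in a cluster-create JSON."""
    flat = {k: v for k, v in raw.items() if k not in NESTED_KEYS}
    nested = {k: v for k, v in raw.items() if k in NESTED_KEYS}
    return flat, nested

# With the databricks-sdk installed (API names hedged; verify against your version):
# import json
# from databricks.sdk import WorkspaceClient
# from databricks.sdk.service import compute
# raw = json.loads(open("cluster-create.json").read())
# flat, nested = split_cluster_spec(raw)
# w = WorkspaceClient()
# w.clusters.create_and_wait(
#     **flat,
#     autoscale=compute.AutoScale(**nested["autoscale"]),
# )
```

This keeps the single JSON file as the source of truth while satisfying the SDK's requirement that nested structures be typed objects.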

2 More Replies
