Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by AkifCakir, New Contributor II
  • 18331 Views
  • 4 replies
  • 3 kudos

Resolved! Why does Spark save mode "overwrite" always drop the table even though "truncate" is true?

Hi dear team, I am trying to import data from Databricks to Exasol DB. I am using the following code, with Spark version 3.0.1: dfw.write \ .format("jdbc") \ .option("driver", exa_driver) \ .option("url", exa_url) \ .option("db...

Latest Reply
Gembo
New Contributor II
  • 3 kudos

@AkifCakir, were you able to find a way to truncate without dropping the table using the .write function? I am facing the same issue as well.

3 More Replies
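
For context, a minimal sketch of the pattern under discussion, assuming an existing DataFrame df and the exa_driver/exa_url placeholders from the thread (the target table name is hypothetical). With mode("overwrite"), Spark truncates instead of dropping only when the "truncate" option is set and the target JDBC dialect supports truncation; otherwise it can still fall back to dropping and recreating the table.

# Sketch only: df, exa_driver and exa_url are assumed to exist already.
(df.write
    .format("jdbc")
    .option("driver", exa_driver)
    .option("url", exa_url)
    .option("dbtable", "MY_SCHEMA.MY_TABLE")  # hypothetical target table
    .option("truncate", "true")  # honored only if the JDBC dialect supports TRUNCATE
    .mode("overwrite")
    .save())
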
by feed, New Contributor III
  • 12266 Views
  • 4 replies
  • 2 kudos

OSError: No wkhtmltopdf executable found: "b''"

OSError: No wkhtmltopdf executable found: "b''". If this file exists, please check that this process can read it, or you can pass the path to it manually in the method call; check the README. Otherwise please install wkhtmltopdf - https://github.com/JazzCore/python-...

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, when did you receive this error? While running code inside a notebook, while running a cluster, or in a job? Also, please tag @Debayan in your next response, which will notify me. Thank you!

3 More Replies
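
For context, a possible workaround sketch (not confirmed in this thread): install the wkhtmltopdf binary on the cluster, then point pdfkit at it explicitly instead of relying on PATH discovery.

import pdfkit

# Hypothetical binary location; verify with `which wkhtmltopdf` after installing
# the package on the cluster (e.g. via an init script or %sh apt-get install wkhtmltopdf).
config = pdfkit.configuration(wkhtmltopdf="/usr/bin/wkhtmltopdf")
pdfkit.from_string("<h1>hello</h1>", "/tmp/out.pdf", configuration=config)
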
by raghav99, New Contributor II
  • 3439 Views
  • 4 replies
  • 2 kudos

Resolved! How to stream the change data feed from a Delta table when its schema is changed?

Hi team, I would like to know how we can continue streaming the change data feed from a Delta table when its schema is changed (non-additive schema changes like drop/rename column / schema migration). I came across schemaTrackingLocation in readStream bu...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

3 More Replies
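
For context, a minimal sketch of the schemaTrackingLocation option mentioned in the thread, assuming a recent Databricks Runtime; the table names and paths are hypothetical, and per the docs the schema tracking location should sit under the stream's checkpoint location.

(spark.readStream
    .format("delta")
    .option("readChangeFeed", "true")
    .option("schemaTrackingLocation", "/tmp/_checkpoint/_schema_log")  # hypothetical
    .table("cat.src.events")                                           # hypothetical
    .writeStream
    .option("checkpointLocation", "/tmp/_checkpoint")
    .toTable("cat.tgt.events_cdf"))
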
by ManojReddy, New Contributor II
  • 857 Views
  • 1 reply
  • 0 kudos

Materialized view refresh strategy in triggered vs. continuous DLT mode

Does a materialized view get completely recalculated when we trigger a DLT pipeline? Can't it start from where it left off? In continuous mode of a DLT pipeline, does the materialized view try to optimize the updates as it computes the data?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ManojReddy, certainly! Let's break it down in a way that's easy to understand: Materialized views: imagine materialized views as precomputed results. They store data that you can query efficiently. When you refresh a materialized view, it recal...

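
For context, a minimal DLT sketch with hypothetical names: a table defined over a batch read is a materialized view, and whether a triggered update fully recomputes it or refreshes it incrementally depends on whether DLT can incrementalize the query.

import dlt
from pyspark.sql import functions as F

@dlt.table(name="orders_daily")  # hypothetical
def orders_daily():
    # `spark` is provided by the DLT runtime.
    return (spark.read.table("raw.orders")  # hypothetical source
                 .groupBy("order_date")
                 .agg(F.sum("amount").alias("total_amount")))
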
by george_ognyanov, New Contributor III
  • 3978 Views
  • 7 replies
  • 3 kudos

Resolved! Terraform Azure Databricks Unity Catalog - Failed to check metastore quota limit for region

I am trying to create a metastore via the Terraform Azure databricks_metastore resource, but I keep getting an error. This is the exact code I am using to create the resource. I have tried using both my Databricks account and a service principal appli...

Latest Reply
george_ognyanov
New Contributor III
  • 3 kudos

Hi @Kaniz_Fatma, as far as I understand, one region can have one metastore. I am able to create a metastore in the same region if I log into the Databricks GUI and do it there. Alternatively, if I already have a metastore created and try to execute the ...

6 More Replies
by Nathant93, New Contributor III
  • 1962 Views
  • 2 replies
  • 0 kudos

SQL Server OUTPUT clause alternative

I am looking for a way, after a merge or insert has happened, to get the records in that batch that were inserted via either method, much like the OUTPUT clause in SQL Server. Does anyone have any suggestions? The only thing I can think of is to add a time...

Latest Reply
Nathant93
New Contributor III
  • 0 kudos

I've managed to do it like this:
qry = spark.sql(f"DESCRIBE HISTORY <table_name> LIMIT 1").collect()
current_version = int(qry[0][0])
prev_version = current_version - 1
Then do an EXCEPT statement between the versions.

1 More Replies
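
For context, a fuller sketch of the version-diff approach from the accepted answer, assuming a SparkSession `spark`; the table name is hypothetical.

table_name = "my_schema.my_table"  # hypothetical

latest = spark.sql(f"DESCRIBE HISTORY {table_name} LIMIT 1").collect()
current_version = int(latest[0][0])
prev_version = current_version - 1

# Rows present in the current version but not the previous one, i.e. the
# net-new records of the last write (similar to SQL Server's OUTPUT clause).
inserted = (
    spark.read.option("versionAsOf", current_version).table(table_name)
    .exceptAll(spark.read.option("versionAsOf", prev_version).table(table_name))
)
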
by Anonymous47, New Contributor II
  • 1887 Views
  • 1 reply
  • 0 kudos

Best practices to load single delta table in parallel from multiple processes.

Hi all, a Delta Lake table is created with an identity column, and it is not possible to load data into this table in parallel from multiple processes, as it leads to MetadataChangedException. Based on another post from the community, we can try to repeat ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Anonymous47, let's dive into your questions regarding Delta Lake and parallel writes: Best practices for parallel writes: Partitioning: choose an appropriate partition column for your Delta table. Typically, the most commonly used partition co...

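
For context, a hedged sketch of the retry approach the thread refers to; the helper name is hypothetical, and ideally you would catch Delta's specific concurrent-write exception types rather than matching on strings.

import random
import time

def append_with_retry(df, table_name, max_attempts=5):
    # Identity columns serialize table metadata updates, so concurrent
    # appends can fail with MetadataChangedException; retry with backoff.
    for attempt in range(max_attempts):
        try:
            df.write.mode("append").saveAsTable(table_name)
            return
        except Exception as e:
            if "MetadataChangedException" not in str(e) or attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(1, 2 ** attempt))  # jittered backoff
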
by CaptainJack, New Contributor III
  • 5539 Views
  • 2 replies
  • 0 kudos

Resolved! Workspace API

Hello friends. I am having a problem with the Workspace API. I have many folders (200+) inside my /Workspace, to which I would like to copy my whole Program folder, which includes 20 Spark scripts as Databricks notebooks. I tried the Workspace API and I ...

Latest Reply
CaptainJack
New Contributor III
  • 0 kudos

I am using this API: /api/2.0/workspace/import

1 More Replies
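
For context, a minimal sketch of calling that endpoint for a single notebook; host, token, and paths are placeholders, and for 200+ folders you would loop over the target paths.

import base64
import requests

HOST = "https://<workspace-host>"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

with open("my_notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "/Workspace/some_folder/my_notebook",  # hypothetical target
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()
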
by Chalki, New Contributor III
  • 2587 Views
  • 1 reply
  • 0 kudos

Databricks Asset Bundles - Don't deploy to the workspace, update only the repo

Hello guys, my team and I have a bunch of jobs that point directly to a remote repo - they are not pointing to the workspace of the related environment. Is there a way to update the repo part in our Databricks environment instea...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Chalki, let's address both aspects of your question: Updating the repo in the Databricks environment: if your jobs are currently pointing directly to a remote repository and you want to update the code without deploying it to the workspace, you can...

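
For context, one hedged alternative to a full bundle deploy is updating the workspace repo itself through the Repos API, pulling the latest commit of a branch; repo id, host, and token below are placeholders.

import requests

HOST = "https://<workspace-host>"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder
REPO_ID = 123                      # hypothetical workspace repo id

resp = requests.patch(
    f"{HOST}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": "main"},  # check out the branch's latest commit
)
resp.raise_for_status()
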
by Akshay_127877, New Contributor II
  • 28240 Views
  • 7 replies
  • 1 kudos

How to open Streamlit URL that is hosted by Databricks in local web browser?

I have run this web app code in a Databricks notebook. It works properly without any errors. With Databricks acting as the server, I am unable to open the link for this web app in my browser. But when I run the code in my local IDE, I am able to just open the U...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Akshay Aravinnakshan, thank you for posting your question in our community! Your input matters! Help our community thrive by coming back and marking the most helpful and accurate answers. Together, we can make a difference!

6 More Replies
by KNYSJOA, New Contributor
  • 2292 Views
  • 4 replies
  • 0 kudos

SDK Workspace client HTTP Connection Pool

Hello. Do you know how to solve an issue with HTTPSConnectionPool when using the SDK WorkspaceClient in a notebook via a workflow? I would like to trigger a job when some conditions are met. These conditions are evaluated in Python. I am using the SDK to trigge...

Latest Reply
Dribka
New Contributor III
  • 0 kudos

It seems like the issue you're facing with the HTTPSConnectionPool in the SDK WorkspaceClient when using it within a workflow may be related to the environment variables or credentials not being propagated correctly. When running the notebook manuall...

3 More Replies
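
For context, a minimal sketch of the SDK call in question; inside a workflow task, passing host and token explicitly is one way to rule out credentials not being propagated, which the reply above suspects.

from databricks.sdk import WorkspaceClient

# Ambient notebook auth usually suffices; pass host/token explicitly if the
# HTTPSConnectionPool errors suggest credentials are not being picked up.
w = WorkspaceClient()  # or WorkspaceClient(host="https://...", token="...")
run = w.jobs.run_now(job_id=123).result()  # hypothetical job id; blocks until done
print(run.run_id)
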
by Deexith, New Contributor
  • 3784 Views
  • 3 replies
  • 0 kudos

Getting "ERROR StatusLogger: Unable to locate configured LoggerContextFactory" in logs, though I am able to connect with the Databricks DB and retrieve the data for MuleSoft integration

ERROR StatusLogger Unable to locate configured LoggerContextFactory org.mule.runtime.module.launcher.log4j2.MuleLog4jContextFactory
ERROR StatusLogger Unable to load class org.apache.logging.log4j.core.config.xml.XmlConfigurationFactory java.lang.Class...

Latest Reply
DataBricks1565
New Contributor II
  • 0 kudos

Hi @Uppala Deexith, any update on how you fixed this issue would be greatly appreciated.

2 More Replies
by CKBertrams, New Contributor III
  • 1359 Views
  • 2 replies
  • 2 kudos

Resolved! Stream failure notifications

Hi all, I have a job running three consecutive streams; when just one of them fails, I want to get notified. The notification currently triggers only when all tasks have failed or are skipped/canceled. Does anyone have a suggestion on how to implement this?

Latest Reply
deng_dev
New Contributor III
  • 2 kudos

Hi! You can add notifications directly on tasks.

1 More Replies
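
For context, a hedged sketch of task-level failure notifications via the Jobs 2.1 API; ids and addresses are placeholders, and note that passing "tasks" in new_settings replaces the whole task list, so every task definition must be included.

import requests

HOST = "https://<workspace-host>"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # hypothetical
        "new_settings": {
            "tasks": [
                # ...the full definition of every task goes here; each one can
                # carry its own notifications, e.g.:
                {
                    "task_key": "stream_1",
                    "email_notifications": {"on_failure": ["you@example.com"]},
                },
            ]
        },
    },
)
resp.raise_for_status()
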
by Kayla, Valued Contributor
  • 1740 Views
  • 2 replies
  • 0 kudos

Clusters Suddenly Failing - java.lang.RuntimeException: abort: DriverClient destroyed

Clusters that we've been using without issue for weeks are suddenly failing at random. We're able to run a handful of cells and then get an error about "java.lang.RuntimeException: abort: DriverClient destroyed". Has anyone run into this before? Edit: I ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Kayla, let's explore some potential solutions to address this issue: Cluster configuration: you mentioned that the same code worked before with a smaller 6-node cluster but started failing after upgrading to a 12-node cluster. Consider the f...

1 More Replies
by Trifa, New Contributor II
  • 432 Views
  • 0 replies
  • 0 kudos

Override DLT Full Refresh using a Job parameter

Hello, I have a Job with a DLT pipeline as its first task. From time to time, I want to execute this Job with a full refresh of the DLT pipeline. How could I override my default "full_refresh = false"? This was possible before using the legacy parameters...

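
For context, a possible workaround sketch using the Python SDK: have the caller (or a preceding task) start the pipeline update itself with full_refresh=True instead of relying on a job parameter; the pipeline id is a placeholder.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Kick off a full-refresh update of the DLT pipeline explicitly.
w.pipelines.start_update(pipeline_id="<pipeline-id>", full_refresh=True)
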
