Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mahesh_Yadav
by New Contributor II
  • 2784 Views
  • 1 reply
  • 3 kudos

How to Export lineage data directly from unity catalog without using system tables

I have been trying to check if there is any direct way to export lineage hierarchy data in Databricks. I have tried to build a workaround solution by accessing system tables as per this link: Monitor usage with system tables - Azure Databricks | Micro...

Latest Reply
bturnwald39
New Contributor II
  • 3 kudos

I have a similar use case. The Databricks Lineage Graph is nice but only zooms out enough for the most basic lineages. We have lineages/data flows with hundreds of tables. I'd like more flexibility in showing the entire flow on one screen and expo...

hardeeksharma
by New Contributor II
  • 604 Views
  • 1 reply
  • 1 kudos

Data ingestion issue with THAI data

I have a use case where my file has data in Thai characters. The source location is Azure Blob Storage, where the files are stored in text format. I am using the following code to read the file, but when I download the data from the catalog it encloses ...
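The original snippet is truncated, but the quoting behavior can be reproduced with Python's csv module alone; the sample rows below are made up for illustration, and show that QUOTE_MINIMAL only wraps fields containing the delimiter, while QUOTE_ALL encloses everything:

```python
import csv
import io

# Hypothetical sample rows containing Thai text (UTF-8); values are illustrative.
rows = [["สวัสดี", "Bangkok"], ["ข้าว, ผัด", "Chiang Mai"]]

# QUOTE_MINIMAL: only the field containing a comma gets quoted.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerows(rows)
minimal = buf.getvalue()

# QUOTE_ALL: every field is wrapped in quotes, which can make exported data
# look "enclosed" when inspected downstream.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
quoted = buf.getvalue()

# Reading either form back with csv.reader strips the quotes again.
round_trip = list(csv.reader(io.StringIO(quoted)))
```

If the quotes only appear on export, they are usually a property of the writer's quoting mode rather than of the Thai data itself.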

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Do the quotes exist in the original data?

pradeepvatsvk
by New Contributor III
  • 1946 Views
  • 6 replies
  • 1 kudos

Too many small files from updates

Hi, I am updating some data in a Delta table. Each time I only need to update one row, so after every update statement it creates a new file. How do I tackle this issue? It doesn't make sense to run the optimize command after every upda...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

If you are performing hundreds of update operations on the Delta table, you can opt to run an OPTIMIZE operation after a batch of 100 updates. There should be no significant performance issue for up to 100 such updates.
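The batching suggestion above can be sketched as a small helper that counts updates and triggers a compaction callback every N operations; `run_optimize` is a stand-in for whatever issues `OPTIMIZE my_table` in your environment (names are illustrative):

```python
class BatchedOptimizer:
    """Invoke a compaction callback after every `batch_size` updates."""

    def __init__(self, run_optimize, batch_size=100):
        self.run_optimize = run_optimize  # e.g. lambda: spark.sql("OPTIMIZE my_table")
        self.batch_size = batch_size
        self.pending = 0

    def record_update(self):
        """Call this after each single-row UPDATE statement."""
        self.pending += 1
        if self.pending >= self.batch_size:
            self.run_optimize()
            self.pending = 0

# Example: 250 updates trigger compaction twice, with 50 updates still pending.
calls = []
opt = BatchedOptimizer(lambda: calls.append("OPTIMIZE"), batch_size=100)
for _ in range(250):
    opt.record_update()
```

On recent runtimes, enabling auto compaction and optimized writes on the table is usually the lower-maintenance alternative to hand-rolled batching.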

5 More Replies
dc-rnc
by Contributor
  • 2190 Views
  • 1 reply
  • 1 kudos

Resolved! How to deploy an asset bundle job that triggers another one

Hello everyone. Using DAB, is there a dynamic value reference or something equivalent to get a job_id to be used inside the YAML definition of another Databricks job? I'd like to trigger that job from another one, but if I'm using a CI/CD pipeline to ...

  • 2190 Views
  • 1 replies
  • 1 kudos
Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

resources:
  jobs:
    my-first-job:
      name: my-first-job
      tasks:
        - task_key: my-first-job-task
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
            ...
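For the job_id question itself, bundles can also reference one job's deployed ID from another via variable substitution; a hedged sketch (job keys here are made up, and the `${resources.jobs.<key>.id}` substitution should be checked against the Asset Bundles docs for your CLI version):

```yaml
resources:
  jobs:
    upstream_job:
      name: upstream_job
    trigger_job:
      name: trigger_job
      tasks:
        - task_key: run-upstream
          run_job_task:
            # Resolved at deploy time to the deployed job's numeric ID.
            job_id: ${resources.jobs.upstream_job.id}
```

This avoids hard-coding job IDs in CI/CD, since the reference is resolved against whatever the bundle deploys in each target.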

eballinger
by Contributor
  • 3568 Views
  • 2 replies
  • 1 kudos

Resolved! How to grant all tables in schema except 1

Hi guys, I am trying to grant all tables in a schema to a user group in Databricks. The only catch is that there is one table I do not want granted. I am currently granting schema access to the group, so the benefit is that as tables are added in the fu...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

What you are facing is because of inheritance.  https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/upgrade-privilege-model.html I would say this is by design, but please feel free to suggest it as an idea here - https://do...
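Since Unity Catalog has no DENY and a table-level REVOKE cannot override a schema-level grant, the usual workaround is to grant table by table and skip the exception. A minimal sketch (table and group names are made up; each generated statement would be run via spark.sql, and it must be re-run when new tables appear, losing the auto-grant benefit the question mentions):

```python
def build_grants(tables, group, exclude):
    """Build per-table GRANT statements, skipping excluded tables."""
    return [
        f"GRANT SELECT ON TABLE {t} TO `{group}`"
        for t in tables
        if t not in exclude
    ]

# Example: grant everything in the schema except the sensitive table.
stmts = build_grants(
    ["sales.orders", "sales.customers", "sales.salaries"],
    group="analysts",
    exclude={"sales.salaries"},
)
```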

1 More Replies
jspehar
by New Contributor
  • 739 Views
  • 2 replies
  • 0 kudos

JDBC Error Trying to Connect erwin Data Modeler to Databricks

I am trying to connect erwin Data Modeler to Databricks to reverse engineer a physical data model. I am trying to connect manually per the erwin and Databricks instructions, but I am getting the following error: [Databricks][DatabricksJDBCDriver][500593] C...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I hope you have referred to https://docs.databricks.com/en/partners/data-governance/erwin.html. It is also possible that it is a library issue; make sure you are using the Databricks JDBC driver.

1 More Replies
AlexCancioBedon
by New Contributor II
  • 558 Views
  • 1 reply
  • 1 kudos
Latest Reply
Advika_
Databricks Employee
  • 1 kudos

Congratulations, @AlexCancioBedon! This is a great milestone that showcases your expertise in Data engineering with Databricks. We’d love to have you share your insights with the community, whether by sharing best practices or helping others. Keep up...

Sayeed
by New Contributor II
  • 895 Views
  • 1 reply
  • 0 kudos

Missing dbc for Databricks associate engineer certification

Hi, I am unable to find the dbc for https://customer-academy.databricks.com/learn/courses/2963/data-ingestion-with-delta-lake/lessons/25622/demo-set-up-and-load-delta-tables or anything related to the Databricks associate engineer certification. Any help ...

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @Sayeed! I see that you're currently going through a self-paced course, which does not include hands-on labs (dbc files). To access the labs, you can either purchase the ILT course, which will grant you access to the labs for 7 days, or get the...

SaraCorralLou
by New Contributor III
  • 22435 Views
  • 3 replies
  • 2 kudos

Resolved! Differences between lit(None) or lit(None).cast('string')

I want to define a column with null values in my dataframe using PySpark. This column will later be used for other calculations. What is the difference between creating it in these two ways?
df.withColumn("New_Column", lit(None))
df.withColumn...

Latest Reply
shadowinc
New Contributor III
  • 2 kudos

For me, df.withColumn("New_Column", lit(None).cast(StringType())) didn't work. I used this instead: df.withColumn("New_Column", lit(null).cast(StringType))

2 More Replies
jeremy98
by Honored Contributor
  • 1511 Views
  • 5 replies
  • 1 kudos

Set serverless compute environment for a task of a job

Hi Community, I want to set the environment of a task inside a job using DABs, but I got this error. I could achieve my goal if I manually set the task's environment to environment 2, because I need to use Python 3.11. How can I do it through DABs?

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Hi, it seems this can be set for spark_python_task:
resources:
  jobs:
    New_Job_Jan_29_2025_at_11_48_AM:
      name: New Job Jan 29, 2025 at 11:48 AM
      tasks:
        - task_key: test-py-version2
          spark_python_task: pyth...

4 More Replies
panganibana
by New Contributor II
  • 678 Views
  • 1 reply
  • 0 kudos

Resolved! Inconsistency on Dataframe queried from External Data Source

We have a Catalog pointing to an External Data Source (Google BigQuery).
1) In a notebook, create a cell where it runs a query to populate a Dataframe. Display results.
2) Create another cell below and display the same Dataframe.
3) I get different resu...

Data Engineering
externaldata
Latest Reply
crystal548
New Contributor III
  • 0 kudos

@panganibana wrote:We have a Catalog pointing to an External Data Source (Google BigQuery).1) In a notebook, create a cell where it runs a query to populate a Dataframe. Display results.2) Create another cell below and display the same Dataframe.3) I...

markbaas
by New Contributor III
  • 10826 Views
  • 9 replies
  • 0 kudos

DBFS_DOWN

I have an Azure Databricks workspace with Unity Catalog set up, using VNet and private endpoints. Serverless works great; however, the regular clusters have problems showing large results: Failed to store the result. Try rerunning the command. Failed ...

Latest Reply
markbaas
New Contributor III
  • 0 kudos

The DBFS (dbstorage) resource in the managed Azure resource group needs private endpoints to your virtual network. You can create those manually or through IaC (Bicep/Terraform).

8 More Replies
sdes10
by New Contributor II
  • 1672 Views
  • 3 replies
  • 0 kudos

DLT apply_as_deletes not working on existing data with full refresh

I have an existing DLT pipeline that works on a modified medallion architecture. Data is sent from debezium to kafka and lands into a bronze table. From bronze table, it goes to a silver table where it is schematized. Finally to a good table where I ...

Latest Reply
sdes10
New Contributor II
  • 0 kudos

@Sidhant07 how do I use skipChangeCommits? The idea is that I have bronze, silver and gold tables already built. Now I am enabling deletes on the gold table in the apply_changes API. The silver table has an added operation column (values c, u, r, d). I di...

2 More Replies
Abdurrahman
by New Contributor II
  • 1115 Views
  • 3 replies
  • 0 kudos

How can I save a large spark table (~88.3Mn rows) to a delta lake table

I am trying to add a column to an existing Delta Lake table and save the result as a new table. The Spark driver is getting overloaded. I have a Databricks notebook to work with (and decent compute as well, g5.12xlarge) and have...

Latest Reply
Amit_Dass
New Contributor II
  • 0 kudos

Hi @Abdurrahman, in addition to Sidhant07's answer: I assume you are adding this new column and may be using it in queries. Use both ZORDER and OPTIMIZE. ZORDER (highly recommended): even more important than OPTIMIZE alone for adding columns eff...

2 More Replies
jeremy98
by Honored Contributor
  • 8350 Views
  • 4 replies
  • 0 kudos

Resolved! Concurrent Writes to the same DELTA TABLE

Hi Community, my team and I have written some workflows that write to the same table. One of my workflows performs a MERGE operation on the table, while another workflow performs an append. However, these operations can occur simultaneously, leading t...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

To resolve the issue of concurrent write conflicts, specifically the `ConcurrentAppendException: [DELTA_CONCURRENT_APPEND]`, you can consider the following strategies:
1. **Isolation Levels**:
   - **WriteSerializable**: This is the default isolation lev...
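A common complement to the isolation-level advice above is a simple retry loop with backoff around the conflicting write; `ConcurrentAppendException` below is a stand-in class for illustration, since the real one is raised by the Delta runtime:

```python
import random
import time

class ConcurrentAppendException(Exception):
    """Stand-in for Delta's DELTA_CONCURRENT_APPEND error."""

def with_retries(write_fn, max_attempts=5, base_delay=0.1):
    """Retry a conflicting write with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except ConcurrentAppendException:
            if attempt == max_attempts - 1:
                raise
            # Back off so the competing writer can commit first.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.01)

# Example: a write that conflicts twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky_merge():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConcurrentAppendException()
    return "merged"

result = with_retries(flaky_merge)
```

Retries only help when the two writers touch disjoint data; if the MERGE and the append can modify the same files, partitioning the table so each writer targets its own partitions is the more robust fix.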

3 More Replies

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now