Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 7440 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark vs. Spark on YARN

I am moving my Spark workloads from an EMR/on-premises Spark cluster to Databricks. I understand Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?

Latest Reply
de-qrosh
New Contributor III
  • 2 kudos

What about the disadvantages? How can I cleanly separate multiple jobs running on the same cluster in the logs, and likewise in the Spark UI?
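
A minimal sketch of one way to separate them, using Spark's job-group API (the group names and tables here are illustrative, not from the thread):

    # Tag all jobs submitted from this notebook/thread so they can be
    # filtered in the Spark UI and matched up in the driver logs.
    sc = spark.sparkContext
    sc.setJobGroup("etl-customer-load", "Nightly customer load", interruptOnCancel=True)
    spark.read.table("raw.customers").count()  # appears under that group in the Spark UI
    sc.setJobGroup("etl-orders-load", "Nightly orders load", interruptOnCancel=True)
    spark.read.table("raw.orders").count()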

3 More Replies
MartinB
by Contributor III
  • 15259 Views
  • 5 replies
  • 3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert Spark dataframe to Pandas dataframe via df.toPandas() when it contains a datetime value in the distant future

Hi, I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating the validity of rows. If a row is currently valid, this is indicated by valid_to=9999-12-31 00:00:00. Example: Loading this into a Spark dataframe works fine...
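
The underlying limit is that pandas' datetime64[ns] type cannot represent dates past 2262-04-11, so the 9999-12-31 sentinel overflows during conversion. A hedged workaround sketch (the column name matches the post; clamping to NULL is an assumption, not the accepted answer):

    from pyspark.sql import functions as F

    # Replace the far-future sentinel with NULL before converting, since
    # pandas datetime64[ns] overflows beyond 2262-04-11.
    pdf = (
        df.withColumn(
            "valid_to",
            F.when(F.col("valid_to") >= F.lit("2262-01-01").cast("timestamp"), F.lit(None))
             .otherwise(F.col("valid_to")),
        )
        .toPandas()
    )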

Latest Reply
ThePhil
New Contributor II
  • 3 kudos

Be aware that in Databricks 15.2 LTS this behavior is broken. I cannot find the code, but it is most likely related to the following option: https://github.com/apache/spark/commit/c1c710e7da75b989f4d14e84e85f336bc10920e0#diff-f9ddcc6cba651c6ebfd34e29ef049c3...

4 More Replies
Mahesh_Yadav
by New Contributor II
  • 2786 Views
  • 1 reply
  • 3 kudos

How to export lineage data directly from Unity Catalog without using system tables

I have been trying to check if there is any direct way to export lineage hierarchy data in Databricks. I have tried to build a workaround solution by accessing system tables as per this link: Monitor usage with system tables - Azure Databricks | Micro...

Latest Reply
bturnwald39
New Contributor II
  • 3 kudos

I have a similar use case. The Databricks Lineage Graph is nice but only zooms out enough for the most basic lineages. We have lineages/data flows with hundreds of tables. I'd like more flexibility on showing the entire flow in one screen and expo...
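
For reference, a minimal sketch of the system-table route the original post mentions (assumes the Unity Catalog lineage system tables are enabled; the filter and output path are placeholders):

    # Query Unity Catalog's lineage system table and export it for offline
    # visualization or for building a full-flow graph elsewhere.
    lineage = spark.sql("""
        SELECT source_table_full_name, target_table_full_name, event_time
        FROM system.access.table_lineage
        WHERE target_table_full_name LIKE 'main.gold.%'
    """)
    lineage.toPandas().to_csv("/tmp/table_lineage.csv", index=False)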

hardeeksharma
by New Contributor II
  • 611 Views
  • 1 reply
  • 1 kudos

Data ingestion issue with Thai data

I have a use case where my file has data in Thai characters. The source location is Azure Blob Storage, where files are stored in text format. I am using the following code to read the file, but when I download the data from the catalog it encloses ...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Do the quotes exist in the original data?
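
If the quotes are not in the source, they may come from the encoding or quoting options at read time. A hedged sketch (the path, format, and options are assumptions about the poster's setup):

    # Read the Thai text file with an explicit charset so the characters
    # survive, and an explicit quote character so wrapping quotes are
    # stripped rather than kept as data.
    df = (
        spark.read.format("csv")
        .option("encoding", "UTF-8")  # or "TIS-620" if the file uses the Thai legacy codepage
        .option("header", "true")
        .option("quote", '"')
        .load("abfss://container@account.dfs.core.windows.net/path/thai_file.txt")
    )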

pradeepvatsvk
by New Contributor III
  • 1961 Views
  • 6 replies
  • 1 kudos

Too many small files from updates

Hi, I am updating some data in a Delta table, and each time I only need to update one row, which means every update statement creates a new file. How do I tackle this issue? It doesn't make sense to run the OPTIMIZE command after every upda...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

If you are performing hundreds of update operations on the Delta table, you can opt to run an OPTIMIZE operation after each batch of 100 updates. There should be no significant performance issue with up to 100 such updates.
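
A minimal sketch of that batching pattern (the table name and the update loop are illustrative):

    from delta.tables import DeltaTable

    rows_to_update = [{"id": 1, "status": "done"}, {"id": 2, "status": "failed"}]  # illustrative
    tbl = DeltaTable.forName(spark, "main.default.events")
    for i, row in enumerate(rows_to_update, start=1):
        tbl.update(
            condition=f"id = {row['id']}",
            set={"status": f"'{row['status']}'"},
        )
        if i % 100 == 0:  # compact the small files after every 100 updates
            spark.sql("OPTIMIZE main.default.events")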

5 More Replies
dc-rnc
by Contributor
  • 2201 Views
  • 1 reply
  • 1 kudos

Resolved! How to deploy an asset bundle job that triggers another one

Hello everyone. Using DAB, is there a dynamic value reference or something equivalent to get a job_id to be used inside the YAML definition of another Databricks job? I'd like to trigger that job from another one, but if I'm using a CI/CD pipeline to ...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

    resources:
      jobs:
        my-first-job:
          name: my-first-job
          tasks:
            - task_key: my-first-job-task
              new_cluster:
                spark_version: "13.3.x-scala2.12"
                node_type_id: "i3.xlarge"
                num_workers: 2
              ...
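
To the original question: a sketch building on the snippet above, assuming the dynamic value reference syntax ${resources.jobs.<key>.id}, which resolves the deployed job's id so a second job (names here are illustrative) can trigger the first:

    resources:
      jobs:
        my-second-job:
          name: my-second-job
          tasks:
            - task_key: trigger-first-job
              run_job_task:
                job_id: ${resources.jobs.my-first-job.id}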

eballinger
by Contributor
  • 3589 Views
  • 2 replies
  • 1 kudos

Resolved! How to grant all tables in a schema except one

Hi guys, I am trying to grant all tables in a schema to a user group in Databricks. The only catch is that there is one table I do not want granted. I currently grant schema access to the group, so the benefit is that as tables are added in the fu...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

What you are facing is because of inheritance; see https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/upgrade-privilege-model.html I would say this is by design, but please feel free to suggest it as an idea here - https://do...
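
As a workaround sketch (not official guidance; the catalog, schema, group, and table names are placeholders), you can grant per table and skip the one exception, at the cost of losing automatic coverage of future tables:

    # Grant SELECT on every table in the schema except the restricted one.
    tables = [r.tableName for r in spark.sql("SHOW TABLES IN main.sales").collect()]
    for t in tables:
        if t != "restricted_table":
            spark.sql(f"GRANT SELECT ON TABLE main.sales.{t} TO `analysts`")
    # Unlike a schema-level grant, tables created later are not covered
    # automatically, so this loop needs to be re-run.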

1 More Reply
jspehar
by New Contributor
  • 743 Views
  • 2 replies
  • 0 kudos

JDBC Error Trying to Connect erwin Data Modeler to Databricks

I am trying to connect erwin Data Modeler to Databricks to reverse engineer a physical data model. I am connecting manually per the erwin and Databricks instructions, but I am getting the following error: [Databricks][DatabricksJDBCDriver][500593] C...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I hope you have referred to https://docs.databricks.com/en/partners/data-governance/erwin.html It is also possible that it is a library issue; I hope you are using the Databricks JDBC driver.
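
For reference, the Databricks JDBC driver expects a connection URL of roughly this shape (the placeholder values come from the cluster's JDBC/ODBC tab; PAT-based AuthMech=3 is one of several auth options):

    jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>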

1 More Reply
AlexCancioBedon
by New Contributor II
  • 562 Views
  • 1 reply
  • 1 kudos
Latest Reply
Advika_
Databricks Employee
  • 1 kudos

Congratulations, @AlexCancioBedon! This is a great milestone that showcases your expertise in Data engineering with Databricks. We’d love to have you share your insights with the community, whether by sharing best practices or helping others. Keep up...

Sayeed
by New Contributor II
  • 899 Views
  • 1 reply
  • 0 kudos

Missing dbc for the Databricks Associate Engineer certification

Hi, I am unable to find the dbc for https://customer-academy.databricks.com/learn/courses/2963/data-ingestion-with-delta-lake/lessons/25622/demo-set-up-and-load-delta-tables or anything related to the Databricks Associate Engineer certification. Any help ...

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @Sayeed! I see that you're currently going through a self-paced course, which does not include hands-on labs (dbc files). To access the labs, you can either purchase the ILT course, which will grant you access to the labs for 7 days, or get the...

SaraCorralLou
by New Contributor III
  • 22450 Views
  • 3 replies
  • 2 kudos

Resolved! Differences between lit(None) or lit(None).cast('string')

I want to define a column with null values in my dataframe using PySpark. This column will later be used for other calculations. What is the difference between creating it in these two different ways? df.withColumn("New_Column", lit(None)) df.withColumn...

Latest Reply
shadowinc
New Contributor III
  • 2 kudos

For me, df.withColumn("New_Column", lit(None).cast(StringType())) didn't work. I used this instead: df.withColumn("New_Column", lit(null).cast(StringType))
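
A small sketch that makes the difference from the original question visible (standard PySpark; the printed types are what current Spark versions report):

    from pyspark.sql import functions as F

    df = spark.range(1)
    df.withColumn("New_Column", F.lit(None)).printSchema()
    # New_Column: void - an untyped NullType column; some sinks and
    # operations reject this type
    df.withColumn("New_Column", F.lit(None).cast("string")).printSchema()
    # New_Column: string - a typed column that happens to contain only nulls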

2 More Replies
jeremy98
by Honored Contributor
  • 1515 Views
  • 5 replies
  • 1 kudos

Set serverless compute environment for a task of a job

Hi Community, I want to set the environment of a task inside a job using DABs, but I got this error. I could achieve my goal if I manually set the task's environment to version 2, because I need to use Python 3.11. How can I do it through DABs?

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Hi, it seems this can be set for spark_python_task:

    resources:
      jobs:
        New_Job_Jan_29_2025_at_11_48_AM:
          name: New Job Jan 29, 2025 at 11:48 AM
          tasks:
            - task_key: test-py-version2
              spark_python_task:
                pyth...
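
A hedged sketch of the full shape this implies (the job and key names are illustrative; spec client "2" is the environment version the post says provides Python 3.11):

    resources:
      jobs:
        my_serverless_job:
          name: my_serverless_job
          tasks:
            - task_key: test-py-version2
              spark_python_task:
                python_file: ../src/main.py
              environment_key: py311_env
          environments:
            - environment_key: py311_env
              spec:
                client: "2"  # environment version 2 (Python 3.11)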

4 More Replies
panganibana
by New Contributor II
  • 687 Views
  • 1 reply
  • 0 kudos

Resolved! Inconsistency on Dataframe queried from External Data Source

We have a Catalog pointing to an External Data Source (Google BigQuery). 1) In a notebook, create a cell that runs a query to populate a Dataframe. Display results. 2) Create another cell below and display the same Dataframe. 3) I get different resu...

Latest Reply
crystal548
New Contributor III
  • 0 kudos

@panganibana wrote: We have a Catalog pointing to an External Data Source (Google BigQuery). 1) In a notebook, create a cell that runs a query to populate a Dataframe. Display results. 2) Create another cell below and display the same Dataframe. 3) I...
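
One likely cause, as a hedged note: Spark dataframes are lazy, so each display re-executes the query against BigQuery, and a changing source yields different rows per cell. A sketch of pinning the result (the query is a placeholder):

    df = spark.sql("SELECT * FROM bq_catalog.dataset.events ORDER BY ts DESC LIMIT 100")
    df.cache()
    df.count()   # materializes the cache once
    display(df)  # cell 1
    display(df)  # cell 2 now shows the same cached rows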

markbaas
by New Contributor III
  • 10841 Views
  • 9 replies
  • 0 kudos

DBFS_DOWN

I have an Azure Databricks workspace with Unity Catalog set up, using VNet and private endpoints. Serverless works great; however, the regular clusters have problems showing large results: Failed to store the result. Try rerunning the command. Failed ...

Latest Reply
markbaas
New Contributor III
  • 0 kudos

The dbfs (dbstorage) resource in the managed Azure resource group needs to have private endpoints to your virtual network. You can create those manually or through IaC (Bicep/Terraform).

8 More Replies
sdes10
by New Contributor II
  • 1677 Views
  • 3 replies
  • 0 kudos

DLT apply_as_deletes not working on existing data with full refresh

I have an existing DLT pipeline that works on a modified medallion architecture. Data is sent from Debezium to Kafka and lands in a bronze table. From the bronze table it goes to a silver table where it is schematized, and finally to a gold table where I ...

Latest Reply
sdes10
New Contributor II
  • 0 kudos

@Sidhant07 How do I use skipChangeCommits? The idea is that I have bronze, silver, and gold tables already built. Now I am enabling deletes on the gold table in the apply_changes API. The silver table has an added operation column (values c, u, r, d). I di...
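
For the question above, a minimal sketch of skipChangeCommits on a streaming read (table and function names are placeholders): the option tells the stream to ignore commits that only rewrite existing rows (updates/deletes) instead of failing on them, so those changes are not propagated downstream.

    import dlt

    @dlt.table(name="gold_feed")
    def gold_feed():
        # Skip update/delete commits in the silver table rather than erroring.
        return (
            spark.readStream
            .option("skipChangeCommits", "true")
            .table("my_catalog.my_schema.silver")
        )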

2 More Replies
