Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Paxi
by New Contributor
  • 1065 Views
  • 1 replies
  • 0 kudos

Maven libs often failed during installation

Dear Community, I have a Databricks compute where I added two Maven libs from a custom repository on Nexus (because of a company policy, Databricks cannot communicate with the public internet, so I must use a private Nexus repo behind a firewall). Sin...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

@Paxi Not sure if you are still looking for a solution, but Databricks has a Git server proxy you can configure, which lets you proxy Git commands from Databricks Git folders to your on-premises Git repositories: https://docs.databricks....

Prathik
by New Contributor II
  • 310 Views
  • 2 replies
  • 1 kudos

Exam got suspended in the middle

My Databricks Certified Data Engineer Associate exam was suspended on 31 Jan 2025 and is currently in a "SUSPENDED" state. I remained in front of the camera throughout the exam, and suddenly an alert appeared. The support person asked me to show th...

Data Engineering
@Cert-Team
Latest Reply
Prathik
New Contributor II
  • 1 kudos

@Cert-TeamOPS @Cert-Team, thanks for your quick response. Please find the latest open request number: #00610129

1 More Replies
brickster_2018
by Databricks Employee
  • 5558 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark vs Spark on YARN

I am moving my Spark workloads from an EMR/on-premises Spark cluster to Databricks. I understand Databricks Spark is different from YARN. How is the Databricks architecture different from YARN?

Latest Reply
de-qrosh
New Contributor III
  • 2 kudos

What about the disadvantages? How can I cleanly separate multiple jobs running on the same cluster in the logs, and likewise in the Spark UI?

3 More Replies
MartinB
by Contributor III
  • 11640 Views
  • 5 replies
  • 3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert Spark dataframe to Pandas dataframe via df.toPandas() when it contains datetime value in distant future

Hi, I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating the validity of rows. If a row is currently valid, this is indicated by valid_to = 9999-12-31 00:00:00. Example: loading this into a Spark dataframe works fine...

Latest Reply
ThePhil
New Contributor II
  • 3 kudos

Be aware that in Databricks 15.2 LTS this behavior is broken. I cannot find the code, but it is most likely related to the following option: https://github.com/apache/spark/commit/c1c710e7da75b989f4d14e84e85f336bc10920e0#diff-f9ddcc6cba651c6ebfd34e29ef049c3...

4 More Replies
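The thread above comes down to a range limit in pandas, not in Spark: a minimal, Spark-free sketch of the failure, assuming pandas' default nanosecond datetime64 dtype (which is what toPandas() targets for TimestampType columns):

```python
import pandas as pd

# pandas' default datetime64[ns] dtype only covers roughly
# 1677-09-21 .. 2262-04-11, so a "distant future" sentinel like
# 9999-12-31 cannot be represented at nanosecond resolution.
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

try:
    # This is effectively what the Spark -> pandas conversion attempts
    # for a timestamp column containing 9999-12-31.
    pd.to_datetime(["9999-12-31 00:00:00"])
except ValueError as exc:  # OutOfBoundsDatetime subclasses ValueError
    print("conversion failed:", exc)
```

A common workaround (a sketch, not the thread's confirmed fix) is to clamp or replace the sentinel before calling toPandas(), e.g. capping valid_to at a date inside the representable range; exact behavior can vary across DBR and pandas versions.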
Mahesh_Yadav
by New Contributor II
  • 1680 Views
  • 1 replies
  • 1 kudos

How to export lineage data directly from Unity Catalog without using system tables

I have been trying to check if there is any direct way to export lineage hierarchy data in Databricks. I have tried to build a workaround solution by accessing system tables per this link: Monitor usage with system tables - Azure Databricks | Micro...

Latest Reply
bturnwald39
New Contributor II
  • 1 kudos

I have a similar use case.  The Databricks Lineage Graph is nice but only zooms out enough for the most basic lineages.  We have lineages/data flows with hundreds of tables.  I'd like more flexibility on showing the entire flow in one screen and expo...

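For readers landing here, the system-tables workaround the post refers to can be scripted. A minimal sketch, assuming lineage system tables are enabled in the workspace; the catalog filter main.% is a hypothetical example:

```python
# Build the lineage-export query. In a Databricks notebook you would run:
#   spark.sql(query).toPandas().to_csv("lineage_export.csv", index=False)
query = """
SELECT source_table_full_name,
       target_table_full_name,
       event_time
FROM system.access.table_lineage
WHERE target_table_full_name LIKE 'main.%'
"""
print(query)
```

This exports the raw edges of the lineage graph; rebuilding a multi-hundred-table flow diagram from them would still need an external graph tool.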
Henrik_
by New Contributor III
  • 2351 Views
  • 9 replies
  • 5 kudos

Can't use GraphFrames on DBR 14.3

I get the following error when trying to run GraphFrames on DBR 14.3. Does anyone have an idea of how I can solve this? import pyspark.sql.functions as F; from graphframes import GraphFrame; vertices = spark.createDataFrame([("a", "Alice", 34), ("b"...

Latest Reply
Snag
New Contributor II
  • 5 kudos

Hi guys, even with classic compute on 14.3 LTS, I'm getting the _sc error mentioned above. Please let me know if you were able to fix the issue.

8 More Replies
hardeeksharma
by New Contributor II
  • 148 Views
  • 1 replies
  • 1 kudos

Data ingestion issue with Thai data

I have a use case where my file contains Thai characters. The source location is Azure Blob Storage, where files are stored in text format. I am using the following code to read the file, but when I download the data from the catalog it encloses ...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Do the quotes exist in the original data?

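On the quotes question above: CSV writers quote a field only when it contains the delimiter or quote character, which is a common source of "unexpected" quotes in exported text, Thai or otherwise. A Spark-free sketch with Python's csv module:

```python
import csv
import io

# First field contains the "," delimiter, second does not.
rows = [["สวัสดี, โลก"], ["สวัสดี"]]

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerows(rows)
print(buf.getvalue())
# Only the field containing the delimiter is wrapped in quotes.
```

If the download really is adding quotes to every value, checking the export format's quote/escape options (and that the file is read as UTF-8 so Thai characters survive) would be the first things to try; the exact Spark reader options depend on the file format in use.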
pradeepvatsvk
by New Contributor III
  • 414 Views
  • 6 replies
  • 1 kudos

Too many small files from updates

Hi, I am updating data in a Delta table, and each time I only need to update one row, so every update statement creates a new file. How do I tackle this issue? It doesn't make sense to run the OPTIMIZE command after every upda...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

If you are performing hundreds of update operations on the Delta table, you can run an OPTIMIZE operation after each batch of 100 updates. There should be no significant performance issue for up to 100 such updates.

5 More Replies
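Alongside the batched OPTIMIZE suggested above, Delta's auto-compaction table properties can reduce small files without manual maintenance. A sketch of the statements you would run via spark.sql(...) in a notebook; the table name is hypothetical, and property support can vary by DBR version:

```python
TABLE = "my_catalog.my_schema.my_table"  # hypothetical three-part name

# Let Delta coalesce small files automatically on write.
enable_auto_compaction = f"""
ALTER TABLE {TABLE} SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
)
"""

# Periodic compaction as a fallback, e.g. once per batch of updates.
periodic_compaction = f"OPTIMIZE {TABLE}"

print(enable_auto_compaction)
print(periodic_compaction)
```

With these properties set, single-row updates still write new files, but the engine compacts them in the background instead of requiring OPTIMIZE after every statement.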
erigaud
by Honored Contributor
  • 7660 Views
  • 2 replies
  • 2 kudos

Dynamically specify pivot column in SQL

Hello everyone! I am looking for a way to dynamically specify pivot columns in a SQL query so it can be used in a view. However, we don't want to hard-code the values that need to become columns, and would rather extract them from another table. I've se...

Latest Reply
Wikram
New Contributor II
  • 2 kudos

Did you find an answer? Can you share your thoughts on meeting this use case?

1 More Replies
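Since the IN list of a SQL PIVOT must be a literal list, the usual workaround is to fetch the distinct values first and build the statement programmatically. A hedged sketch; the table and column names (sales, year, amount) are hypothetical:

```python
# In a notebook the values would come from the lookup table, e.g.:
#   pivot_values = [r[0] for r in
#                   spark.sql("SELECT DISTINCT year FROM lookup").collect()]
pivot_values = ["2021", "2022", "2023"]

# Build the IN (...) clause with one aliased literal per pivot column.
in_list = ", ".join(f"'{v}' AS `y{v}`" for v in pivot_values)
query = (
    "SELECT * FROM sales "
    f"PIVOT (SUM(amount) FOR year IN ({in_list}))"
)
print(query)
```

Because the statement is assembled at runtime, it cannot live inside a plain view definition, which is the crux of the thread; one option is a scheduled job that regenerates and re-creates the view whenever the lookup values change.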
dc-rnc
by New Contributor II
  • 320 Views
  • 1 replies
  • 1 kudos

Resolved! How to deploy an asset bundle job that triggers another one

Hello everyone. Using DAB, is there a dynamic value reference or something equivalent to get a job_id to use inside the YAML definition of another Databricks job? I'd like to trigger that job from another one, but if I'm using a CI/CD pipeline to ...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

resources:
  jobs:
    my-first-job:
      name: my-first-job
      tasks:
        - task_key: my-first-job-task
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
          ...

eballinger
by Contributor
  • 381 Views
  • 2 replies
  • 1 kudos

Resolved! How to grant all tables in schema except 1

Hi guys, I am trying to grant access to all tables in a schema to a user group in Databricks. The only catch is that there is one table I do not want granted. I currently grant schema access to the group, so the benefit is that as tables are added in the fu...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

What you are facing is due to privilege inheritance: https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/upgrade-privilege-model.html I would say this is by design, but please feel free to suggest it as an idea here - https://do...

1 More Replies
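Given the inheritance behavior described above, one workaround is to drop the schema-level grant and grant table-by-table, skipping the sensitive table. A sketch with hypothetical names; note this loses the automatic coverage of future tables that the schema-level grant provided:

```python
# Generate one GRANT per allowed table; in a notebook you would run each
# statement via spark.sql(g). Table and principal names are hypothetical.
tables = ["orders", "customers", "salaries"]  # e.g. from SHOW TABLES
excluded = {"salaries"}

grants = [
    f"GRANT SELECT ON TABLE my_catalog.my_schema.{t} TO `analysts`"
    for t in tables
    if t not in excluded
]
for g in grants:
    print(g)
```

To keep new tables covered, the loop would need to be re-run (or scheduled) whenever tables are added, since there is no built-in "grant all except one" primitive.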
jspehar
by New Contributor
  • 221 Views
  • 2 replies
  • 0 kudos

JDBC Error Trying to Connect erwin Data Modeler to Databricks

I am trying to connect erwin Data Modeler to Databricks to reverse-engineer a physical data model. I am connecting manually per the erwin and Databricks instructions, but I am getting the following error: [Databricks][DatabricksJDBCDriver][500593] C...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I hope you have referred to https://docs.databricks.com/en/partners/data-governance/erwin.html It is also possible that this is a library issue; I hope you are using the Databricks JDBC driver.

1 More Replies
AlexCancioBedon
by New Contributor II
  • 166 Views
  • 1 replies
  • 1 kudos
Latest Reply
Advika_
Databricks Employee
  • 1 kudos

Congratulations, @AlexCancioBedon! This is a great milestone that showcases your expertise in Data engineering with Databricks. We’d love to have you share your insights with the community, whether by sharing best practices or helping others. Keep up...

Sayeed
by New Contributor II
  • 242 Views
  • 1 replies
  • 0 kudos

Missing dbc for Databricks Associate Engineer certification

Hi, I am unable to find the dbc for https://customer-academy.databricks.com/learn/courses/2963/data-ingestion-with-delta-lake/lessons/25622/demo-set-up-and-load-delta-tables or anything else related to the Databricks Associate Engineer certification. Any help ...

Latest Reply
Advika_
Databricks Employee
  • 0 kudos

Hello @Sayeed! I see that you're currently going through a self-paced course, which does not include hands-on labs (dbc files). To access the labs, you can either purchase the ILT course, which will grant you access to the labs for 7 days, or get the...

SaraCorralLou
by New Contributor III
  • 17071 Views
  • 3 replies
  • 2 kudos

Resolved! Differences between lit(None) or lit(None).cast('string')

I want to define a column with null values in my dataframe using PySpark. This column will later be used for other calculations. What is the difference between creating it in these two different ways? df.withColumn("New_Column", lit(None)) df.withColumn...

Latest Reply
shadowinc
New Contributor III
  • 2 kudos

For me, df.withColumn("New_Column", lit(None).cast(StringType())) didn't work in PySpark. I used the Scala equivalent instead: df.withColumn("New_Column", lit(null).cast(StringType))

2 More Replies
