Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RiyazAli
by Valued Contributor II
  • 1366 Views
  • 1 reply
  • 0 kudos

Unable to create a record_id column via DLT - Autoloader

Hi Community, I'm trying to load data from the landing zone to the bronze layer via DLT Autoloader. I want to add a record_id column to the bronze table while I fetch my data. I'm also using a file arrival trigger in the workflow to update my table inc...

Latest Reply
RiyazAli
Valued Contributor II

Hey @Retired_mod - could you or anybody from the community team help me here, please? I've been stuck for quite some time now.

  • 0 kudos
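
For readers looking for a starting point on this question: below is a minimal sketch of one way to add a record_id during an Autoloader load in DLT, using a generated UUID. The table name, landing path, and file format are placeholders, not taken from the thread.

    import dlt
    from pyspark.sql import functions as F

    # Hypothetical bronze table: ingest files with Autoloader and stamp
    # each record with a UUID as it arrives. Path and format are assumptions.
    @dlt.table(name="bronze_events")
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/landing/events")
            .withColumn("record_id", F.expr("uuid()"))
        )
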
Aidonis
by New Contributor III
  • 20217 Views
  • 3 replies
  • 2 kudos

Resolved! Flatten Deep Nested Struct

Hi All, I have a deeply nested Spark DataFrame struct, something similar to the below:
|-- id: integer (nullable = true)
|-- lower: struct (nullable = true)
|    |-- field_a: integer (nullable = true)
|    |-- upper: struct (containsNull = true)
|    |    ...

Latest Reply
Praveen-bpk21
New Contributor II

@Aidonis You can try this as well: flatten-spark-dataframe · PyPI. It also allows flattening to a specific level.

  • 2 kudos
2 More Replies
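
As a rough illustration of the recursive approach discussed in this thread (not the PyPI package itself), here is a minimal flattener sketch. It handles struct columns only (not arrays of structs), and the function name and separator are my own:

    from pyspark.sql import DataFrame
    from pyspark.sql.functions import col
    from pyspark.sql.types import StructType

    def flatten(df: DataFrame, sep: str = "_") -> DataFrame:
        # Repeatedly expand struct columns until none remain.
        while True:
            structs = [f for f in df.schema.fields if isinstance(f.dataType, StructType)]
            if not structs:
                return df
            cols = []
            for f in df.schema.fields:
                if isinstance(f.dataType, StructType):
                    # Promote each nested field to a top-level column.
                    cols += [col(f"{f.name}.{c.name}").alias(f"{f.name}{sep}{c.name}")
                             for c in f.dataType.fields]
                else:
                    cols.append(col(f.name))
            df = df.select(cols)
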
SPres
by New Contributor
  • 1325 Views
  • 1 reply
  • 0 kudos

Passing Parameters from Azure Synapse

Hey Community! Just curious if anyone has tried using Azure Synapse for orchestration and passing parameters from Synapse to a Databricks notebook. My team is testing out Databricks, and I'm replacing Synapse notebooks with Databricks notebooks, but I...

Latest Reply
Ajay-Pandey
Esteemed Contributor III

Hi @SPres, you can definitely pass these parameters to a Databricks notebook as well. Please refer to the docs below: Run a Databricks Notebook with the activity - Azure Data Factory | Microsoft Learn

  • 0 kudos
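
On the Databricks side, parameters sent from the ADF/Synapse Notebook activity arrive as notebook widgets. A minimal sketch, with made-up parameter names:

    # Declare widgets with defaults so the notebook also runs standalone.
    dbutils.widgets.text("env", "dev")
    dbutils.widgets.text("run_date", "")

    # Read the values passed in from the pipeline's base parameters.
    env = dbutils.widgets.get("env")
    run_date = dbutils.widgets.get("run_date")
    print(f"env={env}, run_date={run_date}")
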
Chengzhu
by New Contributor
  • 805 Views
  • 0 replies
  • 0 kudos

Databricks Model Registry Notification

Hi community, currently I am training models on a Databricks cluster and use MLflow to log and register models. My goal is to send a notification to me when a new version of a registered model appears (if the new run achieves some model performance baselin...

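
No reply in this thread yet; as a starting point, one hedged approach is to poll the registry with the MLflow client and alert when a new version shows up. The model name and the comparison/alerting logic are placeholders:

    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    # "my-model" is a placeholder; compare the latest version against one
    # stored from the previous check, then notify however you like.
    versions = client.search_model_versions("name = 'my-model'")
    latest = max(versions, key=lambda v: int(v.version))
    print(f"latest version: {latest.version}, stage: {latest.current_stage}")
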
dilkushpatel
by New Contributor II
  • 2092 Views
  • 2 replies
  • 0 kudos

Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into

I see two articles in the Databricks documentation: https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python and https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal. The Polybase one is legacy o...

Data Engineering
azure
Copy
help
Polybase
Synapse
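
For reference, a minimal sketch of writing to Synapse with the Databricks Synapse connector, which stages data in ABFSS and loads it with COPY on supported instances. The server, database, table, and storage paths below are placeholders:

    (df.write
        .format("com.databricks.spark.sqldw")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "dbo.my_table")
        .option("tempDir", "abfss://container@account.dfs.core.windows.net/tmp")
        .mode("append")
        .save())
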
Abhi0607
by New Contributor II
  • 1470 Views
  • 2 replies
  • 0 kudos

Variables passed from ADF to Databricks Notebook Try-Catch are not accessible

Dear Members, I need your help with the below scenario. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), then it works fine. But as s...

Latest Reply
Ajay-Pandey
Esteemed Contributor III

Hi @Abhi0607, could you check whether you are reading or defining these parameter values outside the try-catch block or inside it?

  • 0 kudos
1 More Replies
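
A common pattern that avoids this class of problem is to read the widgets once, outside the try block, so a missing or misnamed parameter fails loudly rather than being swallowed. The widget name and processing function here are illustrative:

    # Read ADF parameters before the try block; a missing widget then
    # raises immediately instead of disappearing into a broad except.
    my_param = dbutils.widgets.get("my_param")

    try:
        result = process(my_param)  # hypothetical processing function
    except Exception as e:
        print(f"processing failed: {e}")
        raise
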
fuselessmatt
by Contributor
  • 8543 Views
  • 4 replies
  • 1 kudos

Accidentally removing the service principal that owns the view seems to put the Unity Catalog in an illegal state. Can you fix this?

I renamed our service principal in Terraform, which forces a replacement where the old service principal is removed and a new principal with the same permissions is recreated. Terraform applies successfully, but when I try to run dbt that creates tab...

Latest Reply
fuselessmatt
Contributor

This is also true for removing groups before unassigning them (removing and unassigning in Terraform):

│ Error: cannot update grants: Could not find principal with name <My Group Name>

  • 1 kudos
3 More Replies
JeanT
by New Contributor
  • 2984 Views
  • 1 reply
  • 0 kudos

Help with Identifying and Parsing Varying Date Formats in Spark DataFrame

Hello Spark Community, I'm encountering an issue with parsing dates in a Spark DataFrame due to inconsistent date formats across my datasets. I need to identify and parse dates correctly, irrespective of their format. Below is a brief outline of my p...

Latest Reply
-werners-
Esteemed Contributor III

How about not specifying the format? This already matches common formats. When you still have nulls, you can use your list of known exotic formats. Another solution is working with regular expressions: looking for 2-digit numbers not larger than...

  • 0 kudos
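
A minimal sketch of the fallback approach described above, assuming default (non-ANSI) parsing where to_date returns null on failure; the column name and the "exotic" formats are placeholders:

    from pyspark.sql import functions as F

    # Try the default parser first, then fall back to known exotic formats.
    df = df.withColumn(
        "parsed_date",
        F.coalesce(
            F.to_date("raw_date"),                 # handles common ISO-style dates
            F.to_date("raw_date", "dd/MM/yyyy"),
            F.to_date("raw_date", "MM-dd-yyyy"),
        ),
    )
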
AnkithP
by New Contributor
  • 2526 Views
  • 1 reply
  • 1 kudos

Infer schema eliminating leading zeros.

Upon reading a CSV file with schema inference enabled, I've noticed that a column originally designated as string datatype contains numeric values with leading zeros. However, upon reading the data into a PySpark DataFrame, it undergoes automatic conver...

Latest Reply
-werners-
Esteemed Contributor III

If you set .option("inferSchema", "false"), all columns will be read as strings. You will have to cast all the other columns to their appropriate types, though, so passing an explicit schema seems easier to me.

  • 1 kudos
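
A short sketch of the explicit-schema option, which keeps the leading zeros by declaring that column as a string; the file path and column names are made up:

    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    schema = StructType([
        StructField("account_code", StringType()),  # "00042" stays "00042"
        StructField("amount", IntegerType()),
    ])
    df = spark.read.csv("/path/to/file.csv", header=True, schema=schema)
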
PrebenOlsen
by New Contributor III
  • 2018 Views
  • 2 replies
  • 0 kudos

Job stuck while utilizing all workers

Hi! I started a job yesterday. It was iterating over data, two months at a time, and writing to a table. It did this successfully for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one failed stage that reads ...

Data Engineering
job failed
Job froze
need help
Latest Reply
-werners-
Esteemed Contributor III

As Spark is lazily evaluated, using only small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (a write, for example). That being said: I have no knowledge of a bug in Databricks on clusters...

  • 0 kudos
1 More Replies
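
To illustrate the lazy-evaluation point: transformations only describe the work, and the read and compute happen when an action runs. Paths and table names below are placeholders:

    # Nothing is read yet: these are lazy transformations.
    df = spark.read.parquet("/path/to/input")
    filtered = df.filter("amount > 0")

    # The write is the action that triggers the actual read and computation.
    filtered.write.mode("overwrite").saveAsTable("my_table")
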
laurenskuiper97
by New Contributor
  • 1599 Views
  • 1 reply
  • 0 kudos

JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

Hi everybody, I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster, this works perfectly fine. However, when this is run on a multi-node cluster, this co...

Data Engineering
clusters
JDBC
spark
SSH
Latest Reply
-werners-
Esteemed Contributor III

I doubt it is possible. The driver runs the program and sends tasks to the executors. But since creating the SSH tunnel is not a Spark task, I don't think it will be established on any executor.

  • 0 kudos
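
A sketch of why this breaks, using the third-party sshtunnel package (my choice for illustration, not from the thread). The tunnel lives only in the driver process, so executors on other nodes cannot reach the forwarded local port; hosts and credentials are placeholders:

    from sshtunnel import SSHTunnelForwarder

    # The tunnel exists only on the driver node.
    tunnel = SSHTunnelForwarder(
        ("bastion.example.com", 22),
        ssh_username="user",
        remote_bind_address=("postgres.internal", 5432),
        local_bind_address=("127.0.0.1", 5433),
    )
    tunnel.start()

    # Executors resolve 127.0.0.1 to themselves, where no tunnel is
    # listening, which is why the multi-node read fails.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://127.0.0.1:5433/mydb")
          .option("dbtable", "public.my_table")
          .option("user", "db_user")
          .option("password", "...")
          .load())
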
Jotav93
by New Contributor II
  • 1675 Views
  • 2 replies
  • 1 kudos

Move a delta table from a non UC metastore to a UC metastore preserving history

Hi, I am using Azure Databricks and we recently enabled UC in our workspace. We have some tables in our non-UC metastore that we want to move to a UC-enabled metastore. Is there any way we can move these tables without losing the delta table history...

Data Engineering
delta
unity
Latest Reply
ThomazRossito
Contributor

Hello, it is possible to get the expected result with dbutils.fs.cp("Origin location", "Destination location", True) and then creating the table with the LOCATION of the destination. Hope this helps.

  • 1 kudos
1 More Replies
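
Expanded into a runnable sketch of the reply above: copy the table directory (including _delta_log, which carries the history) and register it as an external table in UC. Paths and the three-level table name are placeholders:

    src = "dbfs:/mnt/legacy/bronze/events"
    dst = "abfss://container@account.dfs.core.windows.net/bronze/events"

    # Recursive copy brings the _delta_log directory along, so the table
    # history is preserved.
    dbutils.fs.cp(src, dst, True)

    spark.sql(f"""
        CREATE TABLE main.bronze.events
        USING DELTA
        LOCATION '{dst}'
    """)
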
Dp15
by Contributor
  • 1321 Views
  • 1 reply
  • 1 kudos

Using UDF in an insert command

Hi, I am trying to use a UDF to get the last day of the month and use the boolean result of the function in an insert command. Please find herewith the function and my query. Function:

import calendar
from datetime import datetime, date, timedelta

def...

Latest Reply
Dp15
Contributor

Thank you @Retired_mod for your detailed explanation

  • 1 kudos
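
A hypothetical reconstruction of the setup the question describes: register a Python UDF that flags the last day of the month, then use it to gate rows in a SQL INSERT. The table and column names are invented:

    import calendar
    from datetime import date

    from pyspark.sql.types import BooleanType

    def is_last_day_of_month(d: date) -> bool:
        # monthrange returns (first weekday, number of days in the month).
        return d is not None and d.day == calendar.monthrange(d.year, d.month)[1]

    spark.udf.register("is_last_day_of_month", is_last_day_of_month, BooleanType())

    spark.sql("""
        INSERT INTO target_table
        SELECT * FROM source_table
        WHERE is_last_day_of_month(event_date)
    """)
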
Kroy
by Contributor
  • 10388 Views
  • 7 replies
  • 1 kudos

Resolved! What is difference between streaming and streaming live table

Can anyone explain in layman's terms what the difference is between streaming and a streaming live table?

Latest Reply
CharlesReily
New Contributor III

Streaming, in a broad sense, refers to the continuous flow of data over a network. It allows you to watch or listen to content in real time without having to download the entire file first. A "Streaming Live Table" might refer to a specific type of ...

  • 1 kudos
6 More Replies
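
To make the distinction concrete: with plain Structured Streaming you manage the query, checkpoint, and target table yourself, whereas a DLT streaming (live) table wraps the same incremental read in a pipeline that handles orchestration and checkpoints for you. A sketch of the plain version, with placeholder paths and table name:

    # Plain Structured Streaming: you own the checkpoint and the target table.
    (spark.readStream
          .format("delta")
          .load("/bronze/events")
          .writeStream
          .option("checkpointLocation", "/chk/events_silver")
          .toTable("events_silver"))
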

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group