Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RiyazAliM
by Honored Contributor
  • 2097 Views
  • 1 reply
  • 0 kudos

Unable to create a record_id column via DLT - Autoloader

Hi Community, I'm trying to load data from the landing zone to the bronze layer via DLT Autoloader, and I want to add a record_id column to the bronze table while I fetch my data. I'm also using a file arrival trigger in the workflow to update my table inc...
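
For context while this thread waits for an answer, here is a minimal sketch of the shape such a pipeline can take, with an illustrative landing path, table name, and a uuid()-based record_id (none of this is the poster's actual code):

import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_events")  # illustrative table name
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events")  # placeholder landing path
        # uuid() is nondeterministic: a retried micro-batch can assign new
        # ids, so a hash of stable input columns may be a safer record_id
        .withColumn("record_id", F.expr("uuid()"))
    )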

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hey @Retired_mod - could you or anybody from the community team help me here, please? I've been stuck for quite some time now.

Aidonis
by New Contributor III
  • 25593 Views
  • 3 replies
  • 2 kudos

Resolved! Flatten Deep Nested Struct

Hi All, I have a deeply nested Spark DataFrame struct, something similar to the below:
|-- id: integer (nullable = true)
|-- lower: struct (nullable = true)
|    |-- field_a: integer (nullable = true)
|    |-- upper: struct (containsNull = true)
|    |    ...

Latest Reply
Praveen-bpk21
New Contributor II
  • 2 kudos

@Aidonis You can try this as well: flatten-spark-dataframe · PyPI. This also allows for a specific level of flattening.
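
For readers who prefer not to add a package, a hand-rolled flatten is also short. A minimal sketch (the underscore naming scheme here is my own choice, not the package's behavior):

from pyspark.sql import DataFrame, functions as F
from pyspark.sql.types import StructType

def flatten(df: DataFrame, sep: str = "_") -> DataFrame:
    """Repeatedly expand struct columns until none remain."""
    while True:
        structs = [f.name for f in df.schema.fields
                   if isinstance(f.dataType, StructType)]
        if not structs:
            return df
        cols = []
        for f in df.schema.fields:
            if f.name in structs:
                # promote each nested field to a top-level column
                cols += [F.col(f"{f.name}.{c}").alias(f"{f.name}{sep}{c}")
                         for c in df.select(f"{f.name}.*").columns]
            else:
                cols.append(F.col(f.name))
        df = df.select(cols)

Note this only expands structs; arrays of structs would additionally need explode.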

2 More Replies
SPres
by New Contributor
  • 1896 Views
  • 1 reply
  • 0 kudos

Passing Parameters from Azure Synapse

Hey Community! Just curious if anyone has tried using Azure Synapse for orchestration and passing parameters from Synapse to a Databricks Notebook. My team is testing out Databricks, and I'm replacing Synapse Notebooks with Databricks Notebooks, but I...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

Hi @SPres, you can definitely pass these parameters to a Databricks notebook as well. Please refer to the docs below: Run a Databricks Notebook with the activity - Azure Data Factory | Microsoft Learn
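
To sketch the notebook side: base parameters sent by the calling activity surface as widgets. The parameter names below are made up:

# Read parameters passed from the Synapse/ADF Notebook activity's
# baseParameters ("run_date" and "env" are illustrative names).
run_date = dbutils.widgets.get("run_date")
env = dbutils.widgets.get("env")
print(f"Processing {run_date} in {env}")

# Optionally return a value to the calling pipeline:
dbutils.notebook.exit(run_date)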

Chengzhu
by New Contributor
  • 1076 Views
  • 0 replies
  • 0 kudos

Databricks Model Registry Notification

Hi community, currently I am training models on a Databricks cluster and using MLflow to log and register models. My goal is to be notified when a new version of a registered model is created (if the new run achieves some model performance baselin...
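
No replies yet, but for anyone with the same question: the workspace Model Registry has a webhooks REST API that can fire on events such as MODEL_VERSION_CREATED. A rough sketch with placeholder host, token, model name, and target URL (verify the payload against the current docs):

import requests

host = "https://<workspace-url>"      # placeholder
token = "<personal-access-token>"     # placeholder

resp = requests.post(
    f"{host}/api/2.0/mlflow/registry-webhooks/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model_name": "my_model",                 # illustrative name
        "events": ["MODEL_VERSION_CREATED"],
        # POSTs to your own endpoint, which can check the performance
        # baseline and send the actual notification
        "http_url_spec": {"url": "https://example.com/notify"},
    },
)
resp.raise_for_status()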

dilkushpatel
by New Contributor II
  • 3050 Views
  • 2 replies
  • 0 kudos

Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into

I see two articles in the Databricks documentation: https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python and https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal. The Polybase one is legacy o...

Labels: Data Engineering, azure, Copy, help, Polybase, Synapse
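
For anyone comparing the two pages: the second describes the current Azure Synapse connector, which stages data in ADLS and loads it with COPY on recent runtimes. A minimal write sketch where every <...> value is a placeholder for your own environment:

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .mode("append")
   .save())
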
Abhi0607
by New Contributor II
  • 2223 Views
  • 2 replies
  • 0 kudos

Variables passed from ADF to Databricks Notebook Try-Catch are not accessible

Dear Members, I need your help in the below scenario. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), then it works fine. But as s...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

Hi @Abhi0607, can you please clarify whether you are reading or defining these parameter values outside the try-catch or inside it?
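
To make the distinction concrete, a common pattern is to read the ADF parameters outside the try block so a missing widget fails loudly; a sketch with an illustrative parameter name and hypothetical tables:

# dbutils.widgets.get raises if the parameter was never passed,
# which surfaces the real problem instead of hiding it in the catch.
env = dbutils.widgets.get("env")  # "env" is an illustrative name

try:
    df = spark.read.table(f"{env}.sales.orders")          # hypothetical
    df.write.mode("overwrite").saveAsTable(f"{env}.sales.orders_clean")
except Exception as e:
    print(f"Processing failed for env={env}: {e}")
    raise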

1 More Replies
fuselessmatt
by Contributor
  • 9901 Views
  • 4 replies
  • 1 kudos

Accidentally removing the service principal that owns the view seems to put the Unity Catalog in an illegal state. Can you fix this?

I renamed our service principal in Terraform, which forces a replacement where the old service principal is removed and a new principal with the same permissions is recreated. The Terraform apply succeeds, but when I try to run dbt that creates tab...

Latest Reply
fuselessmatt
Contributor
  • 1 kudos

This is also true for removing groups before unassigning them (removing and unassigning in Terraform):
│ Error: cannot update grants: Could not find principal with name <My Group Name>
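
For anyone stuck in this state, one possible repair, assuming a metastore admin and placeholder object/principal names, is to reassign ownership of the orphaned objects before re-running dbt:

# A metastore admin can change ownership even when the old owner is gone.
spark.sql(
    "ALTER VIEW main.analytics.my_view OWNER TO `new-service-principal`"
)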

3 More Replies
JeanT
by New Contributor
  • 5462 Views
  • 1 reply
  • 0 kudos

Help with Identifying and Parsing Varying Date Formats in Spark DataFrame

Hello Spark Community, I'm encountering an issue with parsing dates in a Spark DataFrame due to inconsistent date formats across my datasets. I need to identify and parse dates correctly, irrespective of their format. Below is a brief outline of my p...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

How about not specifying the format? This already matches common formats. When you still have nulls, you can use your list of known exotic formats. Another solution is working with regular expressions, looking for 2-digit numbers not larger than...
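
A sketch of the "list of known formats" approach described above; the column name and formats are illustrative:

from pyspark.sql import functions as F

# Try each known format in order; to_date yields null when the pattern
# does not match (under ANSI mode, try_to_date is the safe variant).
formats = ["yyyy-MM-dd", "MM/dd/yyyy", "dd-MMM-yy"]
df = df.withColumn(
    "parsed_date",
    F.coalesce(*[F.to_date(F.col("date_str"), fmt) for fmt in formats]),
)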

AnkithP
by New Contributor
  • 3495 Views
  • 1 reply
  • 1 kudos

Infer schema eliminating leading zeros.

Upon reading a CSV file with schema inference enabled, I've noticed that a column originally designated as string datatype contains numeric values with leading zeros. However, upon reading the data into a PySpark DataFrame, it undergoes automatic conver...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If you set .option("inferSchema", "false"), all columns will be read as strings. You will have to cast all the other columns to their appropriate types, though, so passing a schema seems easier to me.
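
A minimal sketch of the explicit-schema route (field names and path are illustrative):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("account_code", StringType()),  # keeps leading zeros
    StructField("amount", IntegerType()),
])
df = spark.read.csv("/path/to/file.csv", header=True, schema=schema)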

PrebenOlsen
by New Contributor III
  • 3102 Views
  • 2 replies
  • 0 kudos

Job stuck while utilizing all workers

Hi! I started a job yesterday. It was iterating over data, 2 months at a time, and writing to a table. It was successfully doing this for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one Failed Stage that reads ...

Labels: Data Engineering, job failed, Job froze, need help
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

As Spark is lazily evaluated, using only small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (e.g., a write). That being said: I have no knowledge of a bug in Databricks on clusters...

1 More Replies
laurenskuiper97
by New Contributor
  • 2819 Views
  • 1 reply
  • 0 kudos

JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

Hi everybody, I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster, this works perfectly fine. However, when this is run on a multi-node cluster, this co...

Labels: Data Engineering, clusters, JDBC, spark, SSH
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I doubt it is possible. The driver runs the program and sends tasks to the executors. But since creating the SSH tunnel is not a Spark task, I don't think it will be established on any executor.
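
One driver-only workaround, if the tables are small enough to pull through the driver: open the tunnel in the notebook process and read with a plain client instead of Spark's JDBC source. A sketch with placeholder hosts and credentials, assuming the sshtunnel and psycopg2 packages are installed:

import pandas as pd
import psycopg2
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
    ("bastion.example.com", 22),               # placeholder bastion host
    ssh_username="tunnel_user",
    ssh_pkey="/dbfs/keys/id_rsa",              # placeholder key path
    remote_bind_address=("postgres.internal", 5432),
) as tunnel:
    conn = psycopg2.connect(
        host="127.0.0.1", port=tunnel.local_bind_port,
        dbname="mydb", user="dbuser", password="...",
    )
    # All traffic flows through the driver, so the tunnel only needs
    # to exist there; convert to a Spark DataFrame afterwards.
    pdf = pd.read_sql("SELECT * FROM small_table", conn)
    df = spark.createDataFrame(pdf)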

Jotav93
by New Contributor II
  • 2647 Views
  • 2 replies
  • 1 kudos

Move a delta table from a non UC metastore to a UC metastore preserving history

Hi, I am using Azure Databricks and we recently enabled UC in our workspace. We have some tables in our non-UC metastore that we want to move to a UC-enabled metastore. Is there any way we can move these tables without losing the Delta table history...

Labels: Data Engineering, delta, unity
Latest Reply
ThomazRossito
Contributor
  • 1 kudos

Hello, it is possible to get the expected result with dbutils.fs.cp("Origin location", "Destination location", True) and then creating the table with the LOCATION of the destination location. Hope this helps.
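
A sketch of that copy-then-register approach with placeholder paths and names; copying the whole directory brings the _delta_log along, which is what preserves the history:

# Copy the full table directory, including _delta_log (placeholders).
dbutils.fs.cp(
    "dbfs:/old/warehouse/my_table",
    "abfss://data@myaccount.dfs.core.windows.net/my_table",
    True,  # recursive
)

# Register it in Unity Catalog as an external table at the new location.
spark.sql("""
    CREATE TABLE main.bronze.my_table
    USING DELTA
    LOCATION 'abfss://data@myaccount.dfs.core.windows.net/my_table'
""")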

1 More Replies
Dp15
by Contributor
  • 1831 Views
  • 1 reply
  • 1 kudos

Using UDF in an insert command

Hi, I am trying to use a UDF to get the last day of the month and use the boolean result of the function in an insert command. Please find the function and my query below. Function:
import calendar
from datetime import datetime, date, timedelta
def...
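
The excerpt cuts off the function, so here is an illustrative reconstruction of the general shape rather than the poster's code: a boolean UDF registered for SQL so it can appear inside an INSERT ... SELECT (all table and function names are made up):

import calendar
from datetime import date

from pyspark.sql.types import BooleanType

def is_last_day_of_month(d: date) -> bool:
    # monthrange returns (weekday_of_first_day, days_in_month)
    return d is not None and d.day == calendar.monthrange(d.year, d.month)[1]

spark.udf.register("is_last_day_of_month", is_last_day_of_month, BooleanType())

spark.sql("""
    INSERT INTO reporting.month_end_flags
    SELECT order_id, order_date, is_last_day_of_month(order_date)
    FROM sales.orders
""")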

Latest Reply
Dp15
Contributor
  • 1 kudos

Thank you @Retired_mod for your detailed explanation

Kroy
by Contributor
  • 15775 Views
  • 7 replies
  • 1 kudos

Resolved! What is difference between streaming and streaming live table

Can anyone explain in layman's terms what the difference is between streaming and a streaming live table?

Latest Reply
CharlesReily
New Contributor III
  • 1 kudos

Streaming, in a broad sense, refers to the continuous flow of data over a network. It allows you to watch or listen to content in real time without having to download the entire file first. A "Streaming Live Table" might refer to a specific type of ...
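
In Databricks terms the distinction is narrower than generic media streaming: a streaming (live) table is a Delta Live Tables-managed table that is fed incrementally by a streaming source. A minimal sketch, with an illustrative source table:

import dlt

# Returning a streaming DataFrame from a @dlt.table function defines a
# streaming table that DLT updates incrementally as "raw_events" grows.
@dlt.table(name="events_stream")
def events_stream():
    return spark.readStream.table("raw_events")  # illustrative source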

6 More Replies
