Hey Community! Just curious if anyone has tried using Azure Synapse for orchestration and passing parameters from Synapse to a Databricks Notebook. My team is testing out Databricks, and I'm replacing Synapse Notebooks with Databricks Notebooks, but I...
Hi @SPres, you can definitely pass these parameters to a Databricks notebook as well. Please refer to the docs below: Run a Databricks Notebook with the activity - Azure Data Factory | Microsoft Learn
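As a rough illustration (not taken from the linked doc), parameters set in the ADF Databricks Notebook activity's baseParameters are typically read inside the notebook with dbutils.widgets; the parameter name file_date below is just a placeholder:

```python
# Minimal sketch: reading a parameter passed from the ADF Notebook activity.
# "file_date" is a hypothetical name; it must match the key configured in
# the activity's baseParameters.
file_date = dbutils.widgets.get("file_date")
print(f"Received file_date from ADF: {file_date}")
```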
Hi community, currently I am training models on a Databricks cluster and use MLflow to log and register models. My goal is to send a notification to myself when a new version of a registered model is created (if the new run achieves some model performance baselin...
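One way to approach this, sketched below under assumed names (my_model, an rmse metric, a baseline threshold, and a placeholder webhook URL), is to look up the newest registered version with the MLflow client, compare its run metrics to the baseline, and only then send a notification:

```python
# Rough sketch under assumed names and thresholds: after a new model version is
# registered, compare a logged metric against a baseline and notify via a webhook.
import requests
from mlflow.tracking import MlflowClient

client = MlflowClient()
model_name = "my_model"      # hypothetical registered model name
baseline_rmse = 0.5          # hypothetical performance baseline

latest = max(
    client.search_model_versions(f"name='{model_name}'"),
    key=lambda v: int(v.version),
)
metrics = client.get_run(latest.run_id).data.metrics

if metrics.get("rmse", float("inf")) <= baseline_rmse:
    requests.post(
        "https://example.com/notify",  # placeholder notification endpoint
        json={"model": model_name, "version": latest.version},
    )
```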
I see two articles in the Databricks documentation: https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python and https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal The PolyBase one is legacy o...
Hi @dilkushpatel, Thank you for your question regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse.
PolyBase (Legacy):
PolyBase was previously used for data loading and unloading operations in Azure...
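To make the current (non-PolyBase) path concrete, a minimal sketch of the built-in Azure Synapse connector is shown below; the JDBC URL, storage container, and table name are placeholders, and the connector stages data through the tempDir location:

```python
# Minimal sketch of the Azure Synapse connector (non-PolyBase path); the JDBC URL,
# storage container, and table name are placeholders.
df = (
    spark.read
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
    .option("tempDir", "abfss://<container>@<storage-account>.dfs.core.windows.net/tmp")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.my_table")
    .load()
)
```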
Dear Members, I need your help with the scenario below. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), then it works fine. But as s...
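A common pattern for this, sketched below with a hypothetical parameter name env, is to declare the widget with a default so the notebook also runs standalone, while ADF overrides the value at run time:

```python
# Sketch: declare the widget with a default so the notebook runs interactively,
# and ADF's baseParameters override the value at run time. "env" is a placeholder name.
dbutils.widgets.text("env", "dev")
env = dbutils.widgets.get("env")
print(f"Running against environment: {env}")
```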
I renamed our service principal in Terraform, which forces a replacement where the old service principal is removed and a new principal with the same permissions is recreated. The Terraform apply succeeds, but when I try to run dbt that creates tab...
This is also true for removing groups before unassigning them (removing and unassigning in Terraform):
│ Error: cannot update grants: Could not find principal with name <My Group Name>
We have a data feed with files whose filenames stay the same but the contents change over time (brand_a.csv, brand_b.csv, brand_c.csv ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...
That's the question: short of treating the initial COPY INTO target as a temp table and executing a MERGE statement from it into another table where we can do the add/update-type operations, is there another option - with COPY INTO or Auto Loader or DLT - t...
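For reference, the staging-plus-MERGE pattern described above might look roughly like the sketch below; the table names, source path, and join key are all placeholders:

```python
# Rough sketch of the staging + MERGE pattern; table names, the source path,
# and the join key are placeholders.
spark.sql("""
  COPY INTO staging.brands_raw
  FROM 'abfss://feeds@mystorage.dfs.core.windows.net/brands/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  COPY_OPTIONS ('force' = 'true')
""")

spark.sql("""
  MERGE INTO prod.brands AS t
  USING staging.brands_raw AS s
    ON t.brand_id = s.brand_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```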
Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...
Hi @my_super_name,
Default Schema Inference: By default, Auto Loader schema inference aims to avoid schema evolution issues due to type mismatches. For formats like JSON, CSV, and XML that don’t encode data types explicitly, Auto Loader infers a...
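As a point of reference, a minimal Auto Loader stream with schema hints might look like the sketch below; the paths, format, and column names are placeholders:

```python
# Minimal Auto Loader sketch with schema hints; paths and column names are placeholders.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/checkpoints/schema")
    .option("cloudFiles.schemaHints", "id BIGINT, amount DECIMAL(10,2)")
    .load("/Volumes/main/default/landing/")
)
```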
I want to confirm whether this understanding is correct. To calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...
Hi @manish1987c, Your understanding is almost correct!
Node Configuration:
You have 10 nodes in your Databricks PySpark cluster. Each node has 16 CPU cores and 64 GB RAM.
Executor Size:
Each executor requires 5 CPU cores and 20 GB RAM. Additional...
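Under those numbers the arithmetic works out roughly as in the sketch below, ignoring memory overhead and any cores reserved for the driver or OS, which can reduce the totals:

```python
# Worked calculation under the stated assumptions (no overhead accounted for).
nodes = 10
cores_per_node, mem_per_node_gb = 16, 64
executor_cores, executor_mem_gb = 5, 20

executors_per_node = min(cores_per_node // executor_cores,
                         mem_per_node_gb // executor_mem_gb)   # 3
total_executors = nodes * executors_per_node                   # 30
parallel_tasks = total_executors * executor_cores              # 150
print(executors_per_node, total_executors, parallel_tasks)
```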
We have a table using the timestampNtz type for a timestamp column, which is also a cluster key for this table using liquid clustering. I ran OPTIMIZE <table-name> and it failed with the error: Unsupported datatype 'TimestampNTZType'. But the failed optimization also broke ...
Hi @Jennifer,
Since TimestampNTZType is not currently supported for optimization, you can try a workaround by converting the timestamp column to a different data type before running the OPTIMIZE command. For example, you could convert the timestampNt...
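One possible reading of that workaround is sketched below, with assumed table and column names: derive a plain TIMESTAMP column from the TIMESTAMP_NTZ one, cluster on it instead, and then run OPTIMIZE. Treat this as a sketch only, not a confirmed fix:

```python
# Sketch under assumed names: derive a TIMESTAMP column from the TIMESTAMP_NTZ
# column, switch the liquid clustering key to it, then OPTIMIZE.
spark.sql("ALTER TABLE my_catalog.my_schema.events ADD COLUMN event_ts TIMESTAMP")
spark.sql("UPDATE my_catalog.my_schema.events SET event_ts = CAST(event_ts_ntz AS TIMESTAMP)")
spark.sql("ALTER TABLE my_catalog.my_schema.events CLUSTER BY (event_ts)")
spark.sql("OPTIMIZE my_catalog.my_schema.events")
```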
When trying to set up databricks-connect on WSL2 with a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. The authentication is done via the SPARK_REMOTE env. variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...
Hi @jp_allard,
One approach to resolve this is to disable SSL certificate verification. However, keep in mind that this approach may compromise security. In your Databricks configuration file (usually located at ~/.databrickscfg), add the following l...
Hi! As suggested by Databricks, we are working with Databricks from VSCode using Databricks bundles for our deployment and using the VSCode Databricks Extension and Databricks Connect during development. However, there are some limitations that we are ...
Hi @pernilak, It’s great that you’re using Databricks with Visual Studio Code (VSCode) for your development workflow!
Let’s address the limitations you’ve encountered when working with files from Unity Catalog using native Python.
When running Python...
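For context, a minimal sketch of the difference (with a placeholder volume path) is below: Spark reads of a Unity Catalog volume path work both on the cluster and through Databricks Connect, while a plain Python open() only works where the /Volumes mount actually exists, i.e. on the cluster:

```python
# Sketch (placeholder volume path): Unity Catalog volumes are exposed under /Volumes
# on the cluster; via Databricks Connect, open() runs on the local machine, so the
# native-Python read only works for code executing on the cluster itself.
path = "/Volumes/main/default/raw_files/config.json"

# Spark read works both on the cluster and through Databricks Connect.
df = spark.read.json(path)

# Native Python read only works where the /Volumes path is mounted (on the cluster).
with open(path) as f:
    content = f.read()
```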
When I run a query on Databricks itself from a notebook, it runs fine and gives me results. But the same query, when executed from FastAPI (Python, using the databricks library), gives me "TypeError: 'NoneType' object is not iterable". I can...
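Without seeing the full code it is hard to say where the None comes from, but a minimal sketch of querying a warehouse with the databricks-sql-connector and fetching rows explicitly is below; the hostname, HTTP path, token, and table are placeholders:

```python
# Minimal sketch with the databricks-sql-connector; connection details and the
# queried table are placeholders. Rows are fetched explicitly with fetchall().
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```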
I have been able to perform a selective overwrite using replaceWhere to a hive_metastore table, but when I use the same code for the same table in Unity Catalog, no data is written. Has anyone else had this issue, or are there common mistakes that ar...
Hi @jp_allard ,
The Unity Catalog is a newer feature in Databricks, designed to replace the traditional Hive Metastore. When transitioning from Hive Metastore to Unity Catalog, there might be differences in behavior due to underlying architectural ch...
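For comparison, a selective overwrite against a Unity Catalog table might look like the sketch below; the table, column, and predicate are placeholders, and the three-level catalog.schema.table name is used:

```python
# Sketch of a selective overwrite against a Unity Catalog table; the table name,
# column, and replaceWhere predicate are placeholders. The predicate must cover
# exactly the rows being replaced by the incoming DataFrame.
(
    df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "event_date >= '2024-01-01' AND event_date < '2024-02-01'")
    .saveAsTable("main.analytics.events")
)
```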
Hi guys, I am running my Databricks jobs on a job cluster from Azure Data Factory using a Databricks Python activity. When I monitor my jobs in Workflows -> Job runs, I see that the run name is a concatenation of the ADF pipeline name, the Databricks Python ac...