Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors
by my_super_name (New Contributor II) • 1036 Views • 2 replies • 3 kudos

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...

Latest Reply from my_super_name (New Contributor II) • 3 kudos

Hi @Kaniz_Fatma, thanks for your help! Your solution works for the initial issue, and I've implemented it first in my code, but it creates another problem. When we explicitly define the struct hint as 'bbb STRUCT<ccc: INT>', it works until someone adds mor...
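
For reference, a minimal sketch of the pattern under discussion, assuming a JSON source and placeholder paths; cloudFiles.schemaHints pins the types you already know, while cloudFiles.schemaEvolutionMode lets newly added nested fields surface instead of failing the stream:

    # Sketch only: paths and the source format are assumptions, not from the thread.
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/tmp/_schemas/events")   # hypothetical
        .option("cloudFiles.schemaHints", "bbb STRUCT<ccc: INT>")      # hint from this thread
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")     # pick up new nested fields
        .load("/tmp/landing/events")                                   # hypothetical
    )
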
Unable to create a record_id column via DLT - Autoloader
by RiyazAli (Valued Contributor) • 1031 Views • 1 reply • 0 kudos

Hi Community, I'm trying to load data from the landing zone to the bronze layer via DLT - Autoloader, and I want to add a record_id column to the bronze table while I fetch my data. I'm also using a file arrival trigger in the workflow to update my table inc...

Latest Reply from RiyazAli (Valued Contributor) • 0 kudos

Hey @Kaniz_Fatma, could you or anybody from the community team help me here, please? I've been stuck for quite some time now.

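One way this is commonly sketched (not the poster's exact code; the table name, landing path, and source format are assumptions) is to attach the column inside the DLT table function:

    import dlt  # available inside a Delta Live Tables pipeline
    from pyspark.sql import functions as F

    @dlt.table(name="bronze_events")                      # hypothetical table name
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")          # assumed source format
            .load("/Volumes/landing/zone/raw/")           # hypothetical landing path
            .withColumn("record_id", F.expr("uuid()"))    # one option: a UUID per row
        )

uuid() gives per-row uniqueness; an identity column would be the alternative if monotonically increasing IDs are required.
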
Code Review tools
by Phani1 (Valued Contributor II) • 745 Views • 1 reply • 0 kudos

Could you kindly recommend any code review tools that would be suitable for our Databricks tech stack?

Labels: Data Engineering, code review

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @Phani1, when it comes to code review tools for your Databricks tech stack, here are some options you might find useful: Built-in Interactive Debugger in the Databricks Notebook: the interactive debugger is available exclusively for Python code withi...

Resolved! Flatten Deep Nested Struct
by Aidonis (New Contributor III) • 14734 Views • 3 replies • 2 kudos

Hi All, I have a deeply nested Spark DataFrame struct, something similar to the below: |-- id: integer (nullable = true) |-- lower: struct (nullable = true) | |-- field_a: integer (nullable = true) | |-- upper: struct (containsNull = true) | | ...

Latest Reply from Praveen-bpk21 (New Contributor II) • 2 kudos

@Aidonis You can try this as well: flatten-spark-dataframe · PyPI. It also allows flattening to a specific level.

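Besides the PyPI package mentioned in the reply, a hand-rolled version of the same idea is short. A sketch that expands one struct level per pass and joins names with underscores (the naming scheme is a choice, not something from the thread):

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType

    def flatten(df):
        # Repeatedly expand one level of struct columns until none remain,
        # renaming parent.child to parent_child.
        # Note: arrays of structs would additionally need explode() first.
        while any(isinstance(f.dataType, StructType) for f in df.schema.fields):
            cols = []
            for f in df.schema.fields:
                if isinstance(f.dataType, StructType):
                    cols += [F.col(f"{f.name}.{sub.name}").alias(f"{f.name}_{sub.name}")
                             for sub in f.dataType.fields]
                else:
                    cols.append(F.col(f.name))
            df = df.select(cols)
        return df
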
Passing Parameters from Azure Synapse
by SPres (New Contributor) • 843 Views • 1 reply • 0 kudos

Hey Community! Just curious if anyone has tried using Azure Synapse for orchestration and passing parameters from Synapse to a Databricks notebook. My team is testing out Databricks, and I'm replacing Synapse notebooks with Databricks notebooks, but I...

Latest Reply from Ajay-Pandey (Esteemed Contributor III) • 0 kudos

Hi @SPres, you can definitely pass these parameters to a Databricks notebook as well. Please refer to the docs below: Run a Databricks Notebook with the activity - Azure Data Factory | Microsoft Learn

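On the Databricks side, values sent from the Synapse/ADF Notebook activity's base parameters arrive as notebook widgets. A minimal sketch; the parameter name is hypothetical:

    # Declare the widget with a default so the notebook also runs interactively,
    # then read the value passed by the Synapse/ADF activity.
    dbutils.widgets.text("run_date", "1900-01-01")   # "run_date" is a made-up name
    run_date = dbutils.widgets.get("run_date")
    print(f"run_date = {run_date}")
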
Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into
by dilkushpatel (New Contributor II) • 1435 Views • 4 replies • 0 kudos

I see two articles in the Databricks documentation: https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python and https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal. The Polybase one is legacy o...

Labels: Data Engineering, azure, Copy, help, Polybase, Synapse

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @dilkushpatel, thank you for raising your confusion regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse. PolyBase (Legacy): PolyBase was previously used for data loading and unloading operations in Azure...

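For orientation, a hedged sketch of a write through the dedicated Synapse connector from the second article; every connection value below is a placeholder, and on current runtimes the connector loads data via COPY rather than PolyBase:

    # Sketch only: server, database, storage container, and table are placeholders.
    (df.write
       .format("com.databricks.spark.sqldw")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
       .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
       .option("forwardSparkAzureStorageCredentials", "true")
       .option("dbTable", "dbo.target_table")
       .mode("append")
       .save())
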
Variables passed from ADF to Databricks Notebook Try-Catch are not accessible
by Abhi0607 (New Contributor II) • 967 Views • 2 replies • 0 kudos

Dear Members, I need your help with the scenario below. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my (Python) code, then it works fine. But as s...

Latest Reply from Ajay-Pandey (Esteemed Contributor III) • 0 kudos

Hi @Abhi0607, can you please help me understand whether you are reading or defining these parameter values outside the try/catch or inside it?

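A common gotcha in this scenario (an assumption on my part, since the post is truncated) is reading the widgets inside the try block, so the names never exist on the except path. Reading them up front avoids that:

    # Read ADF base parameters before the try/except so the names are always in
    # scope; "my_param" and do_work() are hypothetical.
    my_param = dbutils.widgets.get("my_param")

    try:
        result = do_work(my_param)
    except Exception as e:
        # my_param is still defined here and can be logged safely.
        print(f"Processing failed for my_param={my_param}: {e}")
        raise
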
Accidentally removing the service principal that owns the view seems to put the Unity Catalog in an illegal state. Can you fix this?
by fuselessmatt (Contributor) • 7085 Views • 4 replies • 1 kudos

I renamed our service principal in Terraform, which forces a replacement where the old service principal is removed and a new principal with the same permissions is recreated. The Terraform apply succeeds, but when I try to run dbt, which creates tab...

Latest Reply from fuselessmatt (Contributor) • 1 kudos

This is also true for removing groups before unassigning them (removing and unassigning in Terraform): Error: cannot update grants: Could not find principal with name <My Group Name>

Calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster
by manish1987c (New Contributor III) • 2586 Views • 1 reply • 2 kudos

I want to confirm whether this understanding is correct: to calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster with the given configuration, we need to consider the number of executors that can run on each node a...

Latest Reply from Kaniz_Fatma (Community Manager) • 2 kudos

Hi @manish1987c, your understanding is almost correct! Node Configuration: You have 10 nodes in your Databricks PySpark cluster. Each node has 16 CPU cores and 64 GB RAM. Executor Size: Each executor requires 5 CPU cores and 20 GB RAM. Additional...

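The arithmetic being confirmed, written out with the numbers from the reply (ignoring any cores reserved for the driver or OS overhead):

    # 10 nodes with 16 cores / 64 GB each; executors sized at 5 cores / 20 GB.
    nodes, cores_per_node, ram_per_node = 10, 16, 64
    exec_cores, exec_ram = 5, 20

    executors_per_node = min(cores_per_node // exec_cores,   # 16 // 5 = 3
                             ram_per_node // exec_ram)       # 64 // 20 = 3
    total_executors = nodes * executors_per_node              # 10 * 3 = 30
    parallel_tasks = total_executors * exec_cores             # 30 * 5 = 150
    print(parallel_tasks)                                     # 150 tasks at once
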
Optimization failed for timestampNtz
by Jennifer (New Contributor III) • 406 Views • 1 reply • 0 kudos

We have a table using the timestampNtz type for a timestamp column, which is also a clustering key for this table using liquid clustering. I ran OPTIMIZE <table-name> and it failed with the error Unsupported datatype 'TimestampNTZType'. But the failed optimization also broke ...

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @Jennifer, since TimestampNTZType is not currently supported for optimization, you can try a workaround by converting the timestamp column to a different data type before running the OPTIMIZE command. For example, you could convert the timestampNt...

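Since the reply is truncated, here is one hedged reading of that workaround: materialize a TIMESTAMP copy of the NTZ column, recluster on it, then OPTIMIZE. Table and column names are placeholders; verify the DDL against your runtime before relying on it:

    # Workaround sketch: cluster on a TIMESTAMP copy instead of the NTZ column.
    spark.sql("ALTER TABLE my_table ADD COLUMN ts_tz TIMESTAMP")
    spark.sql("UPDATE my_table SET ts_tz = CAST(ts_ntz AS TIMESTAMP)")
    spark.sql("ALTER TABLE my_table CLUSTER BY (ts_tz)")   # new liquid clustering key
    spark.sql("OPTIMIZE my_table")
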
Databricks-connect OpenSSL Handshake failed on WSL2
by vpacik (New Contributor) • 1087 Views • 1 reply • 0 kudos

When trying to set up databricks-connect on WSL2 against a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. Authentication is done via the SPARK_REMOTE environment variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @vpacik, one approach to resolve this is to disable SSL certificate verification. However, keep in mind that this approach may compromise security. In your Databricks configuration file (usually located at ~/.databrickscfg), add the following l...

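Besides disabling verification, a less invasive option (my assumption, not something from the truncated reply) is to point gRPC, which databricks-connect 13.x uses as its transport, at the WSL2 CA bundle:

    import os

    # Point gRPC at the system CA bundle before creating the session; the path
    # below is the usual Debian/Ubuntu location under WSL2.
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "/etc/ssl/certs/ca-certificates.crt"

    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()   # auth still comes from SPARK_REMOTE
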
Selective Overwrite to a Unity Catalog Table
by jp_allard (New Contributor) • 676 Views • 1 reply • 0 kudos

I have been able to perform a selective overwrite using replaceWhere to a hive_metastore table, but when I use the same code for the same table in Unity Catalog, no data is written. Has anyone else had this issue, or are there common mistakes that ar...

Latest Reply from Kaniz_Fatma (Community Manager) • 0 kudos

Hi @jp_allard, Unity Catalog is a newer feature in Databricks, designed to replace the traditional Hive Metastore. When transitioning from Hive Metastore to Unity Catalog, there might be differences in behavior due to underlying architectural ch...

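For comparison, the standard replaceWhere shape against a UC table (the three-level table name and the predicate are placeholders). A silent no-op often comes down to the predicate not matching any rows in the incoming DataFrame:

    # Selective overwrite: only rows matching the predicate are replaced, and
    # every row in df must itself satisfy the predicate.
    (df.write.format("delta")
       .mode("overwrite")
       .option("replaceWhere", "event_date >= '2024-01-01'")   # hypothetical predicate
       .saveAsTable("main.bronze.events"))                     # hypothetical UC table
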
Permission denied listing external volume when using the VS Code Databricks extension
by CDICSteph (New Contributor) • 2033 Views • 5 replies • 0 kudos

Hey, I'm using the Databricks extension for VS Code (Databricks Connect v2). When using dbutils to list an external volume defined in UC, like so: dbutils.fs.ls("/Volumes/dev/bronze/rawdatafiles/"), I get this error: "databricks.sdk.errors.mapping.PermissionD...

Latest Reply from lukasjh (New Contributor II) • 0 kudos

We still face the problem (UC-enabled shared cluster). Is there any resolution, @Kaniz_Fatma?

Help with Identifying and Parsing Varying Date Formats in Spark DataFrame
by JeanT (New Contributor) • 1412 Views • 1 reply • 0 kudos

Hello Spark Community, I'm encountering an issue with parsing dates in a Spark DataFrame due to inconsistent date formats across my datasets. I need to identify and parse dates correctly, irrespective of their format. Below is a brief outline of my p...

Latest Reply from -werners- (Esteemed Contributor III) • 0 kudos

How about not specifying the format? This already matches common formats. When you still have nulls, you can use your list of known exotic formats. Another solution is working with regular expressions, looking for 2-digit numbers not larger than...

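A sketch of the first suggestion: let the default parser cover the common forms, then coalesce in the known exotic formats (the column name and format list are assumptions):

    from pyspark.sql import functions as F

    # Try the default parser first; fall back to explicit patterns for the rest.
    # On ANSI-enabled runtimes, F.try_to_date (Spark 3.5+) avoids errors on mismatches.
    parsed = F.coalesce(
        F.to_date("raw_date"),                  # handles common ISO-like forms
        F.to_date("raw_date", "dd/MM/yyyy"),    # assumed exotic format
        F.to_date("raw_date", "MM-dd-yyyy"),    # assumed exotic format
    )
    df = df.withColumn("parsed_date", parsed)
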
Infer schema eliminating leading zeros
by AnkithP (New Contributor) • 1025 Views • 1 reply • 1 kudos

Upon reading a CSV file with schema inference enabled, I've noticed that a column originally designated as string datatype contains numeric values with leading zeros. However, upon reading the data into a PySpark DataFrame, it undergoes automatic conver...

Latest Reply from -werners- (Esteemed Contributor III) • 1 kudos

If you set .option("inferSchema", "false"), all columns will be read as strings. You will have to cast all the other columns to their appropriate types, though, so passing a schema seems easier to me.

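The explicit-schema version of the reply, as a sketch with hypothetical column names; the zero-padded column stays a string while the rest still get proper types:

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Explicit schema: declaring the column as a string preserves leading zeros.
    schema = StructType([
        StructField("account_code", StringType()),   # e.g. "00042" stays "00042"
        StructField("amount", IntegerType()),
    ])
    df = spark.read.csv("/path/to/file.csv", header=True, schema=schema)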
