Data Engineering

Forum Posts

RakeshRakesh_De
by New Contributor II
  • 12 Views
  • 1 reply
  • 0 kudos

Spark CSV file read option to read blank/empty values from a file as empty values instead of null

Hi, I am trying to read a file that has some blank values in a column. We know Spark converts blank values to null during reading; how can I read blank/empty values as empty values instead? Tried DBR 13.2 and 14.3. I have tried every possible way but it's not w...

Data Engineering
csv
EmptyValue
FileRead
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

May I ask why you do not want null?  It is THE way to indicate a value is missing (and gives you filtering possibilities using isNull/isNotNull).

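For readers who do need empty strings rather than nulls downstream, a minimal PySpark sketch (the file path and the post-read backfill are my assumptions, not from the thread) is to replace nulls in the string columns after the read:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark's CSV reader maps empty fields to null by default.
df = (spark.read
      .option("header", "true")
      .csv("/tmp/input.csv"))  # hypothetical path

# Backfill nulls with empty strings, restricted to string columns
# so numeric and date columns keep their nulls.
string_cols = [f.name for f in df.schema.fields
               if f.dataType.simpleString() == "string"]
df_filled = df.na.fill("", subset=string_cols)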
data-grassroots
by New Contributor
  • 104 Views
  • 5 replies
  • 0 kudos

Ingesting Files - Same file name, modified content

We have a data feed with files whose filenames stay the same but whose contents change over time (brand_a.csv, brand_b.csv, brand_c.csv, ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

If you do not have control over the content of the files, I suggest the following: each day you get new files/data (I suppose these are not incremental). These files contain new, updated, and deleted data, and are overwritten. Because of this, Autoloade...

4 More Replies
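A minimal sketch of the overwrite approach -werners- describes, assuming the daily files are complete (non-incremental) snapshots; the path and table name are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The daily files are full snapshots, so re-read them all each run...
df = (spark.read
      .option("header", "true")
      .csv("/landing/brands/"))  # hypothetical landing path

# ...and replace the previous contents, avoiding the duplicates that
# COPY INTO with 'force' = 'true' would produce.
df.write.mode("overwrite").saveAsTable("bronze.brands")  # hypothetical table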
RiyazAli
by Contributor III
  • 295 Views
  • 1 reply
  • 0 kudos

Unable to create a record_id column via DLT - Autoloader

Hi Community, I'm trying to load data from the landing zone to the bronze layer via DLT Autoloader, and I want to add a record_id column to the bronze table while I fetch my data. I'm also using a file arrival trigger in the workflow to update my table inc...

Latest Reply
RiyazAli
Contributor III
  • 0 kudos

Hey @Kaniz - could you or anybody from the community team help me here, please? I've been stuck for quite some time now.

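One way to attach a record_id in a DLT Autoloader table is to generate it per row during ingestion; a minimal sketch, with the table name, path, and file format as assumptions:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="bronze_events")  # hypothetical table name
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumed format
        .load("/landing/events")              # hypothetical path
        # uuid() is evaluated per row, so each ingested record gets its
        # own identifier even across incremental file-arrival runs.
        .withColumn("record_id", F.expr("uuid()"))
    )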
174817
by New Contributor II
  • 13 Views
  • 1 reply
  • 0 kudos

Databricks Rust client and/or OpenAPI spec

Hi, I'm looking for a Databricks client for Rust. I could only find these SDK implementations. Alternatively, I would be very happy with the OpenAPI spec. Clearly one exists: the Go SDK implementation contains code to generate itself from such a spec...

Data Engineering
openapi
rust
sdk
unity
Latest Reply
feiyun0112
Contributor
  • 0 kudos

Databricks REST API reference: This reference contains information about the Databricks application programming interfaces (APIs). Each API reference page is presented primarily from a representational state transfer (REST) perspective. Databricks REST...

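Absent an official Rust SDK, any HTTP client can drive the REST API directly. A minimal sketch in Python (the clusters list endpoint is from the public API docs; the host/token environment variables are assumptions):

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

# List clusters in the workspace via the documented REST endpoint.
resp = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())

The same call translates directly to Rust with an HTTP client such as reqwest.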
cosminsanda
by New Contributor II
  • 84 Views
  • 4 replies
  • 1 kudos

Resolved! Unit Testing with the new Databricks Connect in Python

I would like to create a regular PySpark session in an isolated environment against which I can run my Spark-based tests. I don't see how that's possible with the new Databricks Connect; I'm going in circles here. Is it even possible? I don't want to ...

Latest Reply
cosminsanda
New Contributor II
  • 1 kudos

OK, so the best solution as it stands today (for me personally at least) is this: have pyspark ^3.4 installed with the connect extra feature. My unit tests then don't have to change at all, as they use the regular Spark session created on the fly. For ru...

3 More Replies
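A minimal sketch of what cosminsanda describes: a plain local SparkSession in a pytest fixture, so unit tests never touch Databricks Connect (the test itself is illustrative):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Regular local session created on the fly; no cluster required.
    session = (SparkSession.builder
               .master("local[1]")
               .appName("unit-tests")
               .getOrCreate())
    yield session
    session.stop()

def test_row_count(spark):
    df = spark.createDataFrame([("a",), ("b",)], ["letter"])
    assert df.count() == 2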
Karlo_Kotarac
by New Contributor II
  • 9 Views
  • 0 replies
  • 0 kudos

Run failed with error message ContextNotFound

Hi all! Recently we've been getting lots of these errors when running Databricks notebooks. At that time we observed a DRIVER_NOT_RESPONDING (Driver is up but is not responsive, likely due to GC.) log on the single-user cluster we use. Previously, when thi...

Brammer88
by New Contributor II
  • 130 Views
  • 2 replies
  • 0 kudos

Resolved! Trying to run Databricks Academy labs, but execution fails because the clearCache method is not whitelisted

Hi there, I'm trying to run DE 2.1 - Querying Files Directly on my workspace with the default cluster configuration found below, but I cannot seem to run this file (or any other labs) as it gives me this error message: Resetting the learning environme...

Latest Reply
Brammer88
New Contributor II
  • 0 kudos

works, thanks for the quick response! 

1 More Reply
Phani1
by Valued Contributor
  • 21 Views
  • 1 reply
  • 0 kudos

Code Review tools

Could you kindly recommend any Code Review tools that would be suitable for our Databricks tech stack?

Data Engineering
code review
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, When it comes to code review tools for your Databricks tech stack, here are some options you might find useful: Built-in Interactive Debugger in Databricks Notebook: The interactive debugger is available exclusively for Python code withi...

Aidonis
by New Contributor III
  • 5758 Views
  • 3 replies
  • 2 kudos

Resolved! Flatten Deep Nested Struct

Hi All, I have a deeply nested Spark DataFrame struct, something similar to below:
|-- id: integer (nullable = true)
|-- lower: struct (nullable = true)
|    |-- field_a: integer (nullable = true)
|    |-- upper: struct (containsNull = true)
|    |    ...

Latest Reply
Praveen-bpk21
  • 2 kudos

@Aidonis You can try this as well: flatten-spark-dataframe · PyPI. This also allows flattening to a specific level.

2 More Replies
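For reference, a minimal recursive flattening sketch in PySpark; it handles nested structs like the schema above, but not arrays of structs, which need an explode first:

from pyspark.sql import DataFrame, functions as F
from pyspark.sql.types import StructType

def flatten(df: DataFrame, sep: str = "_") -> DataFrame:
    # Repeatedly expand one level of struct columns until none remain.
    while any(isinstance(f.dataType, StructType) for f in df.schema.fields):
        cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                cols += [F.col(f"{f.name}.{c}").alias(f"{f.name}{sep}{c}")
                         for c in f.dataType.fieldNames()]
            else:
                cols.append(F.col(f.name))
        df = df.select(cols)
    return df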
SPres
by Visitor
  • 67 Views
  • 1 reply
  • 0 kudos

Passing Parameters from Azure Synapse

Hey Community! Just curious if anyone has tried using Azure Synapse for orchestration and passing parameters from Synapse to a Databricks Notebook. My team is testing out Databricks, and I'm replacing Synapse Notebooks with Databricks Notebooks, but I...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @SPres, you can definitely pass these parameters to a Databricks notebook as well. Please refer to the docs below: Run a Databricks Notebook with the activity - Azure Data Factory | Microsoft Learn

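For orientation, parameters sent from the ADF/Synapse Notebook activity's "Base parameters" setting arrive as notebook widgets; a minimal sketch on the Databricks side (the parameter names are assumptions):

# Runs inside a Databricks notebook, where dbutils is predefined.
# Values passed under "Base parameters" are read via the widgets API.
brand = dbutils.widgets.get("brand")        # hypothetical parameter
run_date = dbutils.widgets.get("run_date")  # hypothetical parameter

print(f"Processing {brand} for {run_date}")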
Chengzhu
by Visitor
  • 44 Views
  • 0 replies
  • 0 kudos

Databricks Model Registry Notification

Hi community, currently I am training models on a Databricks cluster and use MLflow to log and register models. My goal is to be notified when a new version of a registered model is created (if the new run achieves some model performance baselin...

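One option (my suggestion, not from the thread) is a model registry webhook that fires on new versions, using the databricks-registry-webhooks package; a minimal sketch with a hypothetical endpoint and model name:

from databricks_registry_webhooks import RegistryWebhooksClient, HttpUrlSpec

# Fire an HTTP POST whenever a new version of the model is registered.
webhook = RegistryWebhooksClient().create_webhook(
    model_name="my_model",                            # hypothetical model
    events=["MODEL_VERSION_CREATED"],
    http_url_spec=HttpUrlSpec(
        url="https://hooks.example.com/model-events", # hypothetical receiver
        secret="shared-secret",
    ),
    description="Notify on new registered model versions",
)

The performance-baseline check would live in the receiving service, which can inspect the new version's run metrics before alerting.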
dilkushpatel
by New Contributor
  • 116 Views
  • 4 replies
  • 0 kudos

Databricks connecting to Azure SQL DW - Confused between PolyBase and COPY INTO

I see two articles in the Databricks documentation: https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python and https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal. The PolyBase one is legacy o...

Data Engineering
azure
Copy
help
Polybase
Synapse
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @dilkushpatel, Thank you for sharing your confusion regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse.  PolyBase (Legacy): PolyBase was previously used for data loading and unloading operations in Azure...

3 More Replies
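For orientation, the non-legacy path is the Azure Synapse connector, which stages data through an ABFSS temp directory; a minimal write sketch (all connection values are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("my_source_table")  # hypothetical source

(df.write
 .format("com.databricks.spark.sqldw")
 .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
 .option("tempDir", "abfss://tmp@<storage>.dfs.core.windows.net/synapse")
 .option("forwardSparkAzureStorageCredentials", "true")
 .option("dbTable", "dbo.target_table")  # hypothetical target
 .mode("append")
 .save())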
Abhi0607
by Visitor
  • 109 Views
  • 2 replies
  • 0 kudos

Variables passed from ADF to Databricks Notebook Try-Catch are not accessible

Dear Members, I need your help with the scenario below. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), then it works fine. But as s...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Abhi0607, can you please help me understand whether you are reading or defining these parameter values outside the try-catch or inside it?

1 More Reply
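A common cause of what Ajay-Pandey is probing: if the widget lookup itself happens inside the try block, a failure there is swallowed and the variable is never bound. A minimal sketch (parameter and function names are hypothetical):

# Runs inside a Databricks notebook, where dbutils is predefined.
# Fetch ADF-supplied parameters once, outside the try/except, so the
# values are bound before any guarded code runs.
my_param = dbutils.widgets.get("my_param")  # hypothetical parameter

try:
    result = process(my_param)  # hypothetical processing function
except Exception as err:
    # my_param is safely in scope for the handler too.
    print(f"Failed while processing {my_param}: {err}")
    raise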
fuselessmatt
by Contributor
  • 1362 Views
  • 4 replies
  • 1 kudos

Accidentally removing the service principal that owns the view seems to put the Unity Catalog in an illegal state. Can you fix this?

I renamed our service principal in Terraform, which forces a replacement where the old service principal is removed and a new principal with the same permissions is recreated. The Terraform apply succeeds, but when I try to run dbt, which creates tab...

Latest Reply
fuselessmatt
Contributor
  • 1 kudos

This is also true for removing groups before unassigning them (removing and unassigning in Terraform):
Error: cannot update grants: Could not find principal with name <My Group Name>

3 More Replies
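If the orphaned objects are the blocker, one possible remediation (my assumption, not confirmed in the thread) is to transfer ownership of each affected securable to a live principal before re-running dbt; a sketch with hypothetical names:

# Runs in a Databricks notebook, where `spark` is predefined.
spark.sql("ALTER VIEW main.reporting.my_view OWNER TO `new-service-principal`")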
my_super_name
by New Contributor
  • 64 Views
  • 1 reply
  • 0 kudos

Auto Loader Schema Hint Behavior: Addressing Nested Field Errors

Hello, I'm using Auto Loader to stream a table of data and have added schema hints to specify field values. I've observed that when my initial data file is missing fields specified in the schema hint, Auto Loader correctly identifies this and ad...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @my_super_name,  Default Schema Inference: By default, Auto Loader schema inference aims to avoid schema evolution issues due to type mismatches. For formats like JSON, CSV, and XML that don’t encode data types explicitly, Auto Loader infers a...

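For context, a minimal Auto Loader sketch with schema hints that pin a nested field's type (paths and field names are assumptions):

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/tmp/schemas/events")    # hypothetical
      # Hints pin types for the named fields; other fields are still inferred.
      .option("cloudFiles.schemaHints",
              "id BIGINT, payload STRUCT<amount: DECIMAL(10,2)>")    # hypothetical fields
      .load("/landing/events"))                                      # hypothetical path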