Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

vivek_purbey
by New Contributor II
  • 444 Views
  • 8 replies
  • 1 kudos

Databricks notebooks error

I want to read a CSV file using the pandas library in Python in a Databricks notebook. I uploaded my CSV file (employee_data) to adfs, but it still says no such file exists. Can anyone help me with this?

Latest Reply
Alok0903
New Contributor II
  • 1 kudos

Load it using PySpark and convert it to a pandas DataFrame. Here is how you do it after uploading the file:

file_path = "/FileStore/tables/your_file_name.csv"
# Load the CSV as a Spark DataFrame
df_spark = spark.read.option("header", "true").option("inferSchema", "true").csv(file_path)
# Convert to pandas
df_pandas = df_spark.toPandas()

7 More Replies
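The read logic in the reply above can also be sanity-checked end to end in plain pandas before pointing it at DBFS. This is a minimal sketch (the file name and columns are placeholders standing in for the uploaded employee_data file, not the poster's actual data):

```python
import os
import tempfile

import pandas as pd

# Create a small CSV to stand in for the uploaded employee_data file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "employee_data.csv")
    pd.DataFrame({"id": [1, 2], "name": ["Ana", "Bo"]}).to_csv(path, index=False)

    # Read it back; the first row is used as the header, mirroring
    # spark.read.option("header", "true")
    df = pd.read_csv(path)
    print(df.shape)  # (2, 2)
```

One common cause of "no such file" errors on Databricks: files uploaded through the UI land under dbfs:/FileStore, which Spark reads directly, while local-file APIs like pandas usually need the /dbfs/ mount prefix (e.g. /dbfs/FileStore/tables/...).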
ankit001mittal
by New Contributor III
  • 161 Views
  • 0 replies
  • 0 kudos

DLT schema evolution/changes in the logs

Hi all, I want to figure out how to find when schema evolution/changes happen for objects in DLT pipelines through the DLT logs. Could you please share some sample DLT logs that explain the schema changes? Thank you for your help.

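For anyone landing here: the DLT event log has an event_type column, and schema information generally surfaces in the details payload of flow_definition events. As a hedged sketch (the exact payload layout varies by release, and the rows here are mocked as plain dicts rather than read from the event_log() table-valued function), filtering for schema entries might look like:

```python
import json

# Mocked event-log rows; in a real pipeline you would read these from
# the DLT event log instead of hard-coding them.
events = [
    {"event_type": "flow_progress",
     "details": json.dumps({"flow_progress": {"status": "COMPLETED"}})},
    {"event_type": "flow_definition",
     "details": json.dumps({"flow_definition": {
         "schema": {"fields": [{"name": "id"}, {"name": "new_col"}]}}})},
]

# Keep only the events that carry a flow (object) definition with its schema
schema_events = [
    json.loads(e["details"])["flow_definition"]
    for e in events
    if e["event_type"] == "flow_definition"
]
for fd in schema_events:
    print([f["name"] for f in fd["schema"]["fields"]])  # ['id', 'new_col']
```

Comparing the field lists of successive flow_definition events over time is one way to spot when a column was added or changed.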
RakeshRakesh_De
by New Contributor III
  • 152 Views
  • 1 reply
  • 0 kudos

Databricks Free Edition - SQL Server connector not working

I am trying to explore the new Databricks Free Edition, but the SQL Server connector ingestion pipeline cannot be set up through the UI. It shows the error "Serverless compute must be enabled for the workspace", but Free Edition only has the serverless option ...

Data Engineering
FreeEdition
LakeFlow
Latest Reply
RakeshRakesh_De
New Contributor III
  • 0 kudos

Can anyone please help?

kmenke-em
by New Contributor II
  • 274 Views
  • 1 reply
  • 1 kudos

Resolved! CHAR/VARCHAR fields sometimes show as STRING in a view

We've found an interesting behavior where `char` and `varchar` fields in a table show as the `string` type in a view. Consider the following table and view:

create or replace table thirty_day_tables.kit_varchar_string (
  str1 string,
  str2 char(10),
  ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

In Spark SQL, string is the canonical type for all textual data. char(n) and varchar(n) are parsed and stored as metadata, but internally treated as string. When you create a view, Spark does not preserve the original char(n) or varchar(n) types; it normalizes them to string ...

Ranga_naik1180
by New Contributor II
  • 7440 Views
  • 7 replies
  • 5 kudos

Resolved! Delta Live table

Hi All, I'm working on a Databricks Delta Live Tables (DLT) pipeline where we receive daily full snapshot CSV files in Azure cloud storage. These files contain HR data (e.g. an employee file), and I'm using Auto Loader to ingest them into a bronze layer DLT table ...

Latest Reply
nikhilj0421
Databricks Employee
  • 5 kudos

Hi @Ranga_naik1180, there is no need to create an intermediate view in SQL. You can read the change data feed from silver directly into the gold table, with code something like this:

CREATE STREAMING LIVE TABLE gold_table AS SELECT * FROM ...

6 More Replies
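To make the change-data-feed idea concrete, here is a hedged pure-Python sketch of applying Delta CDF-style rows to a keyed target. The _change_type marker and its values (insert, update_postimage, delete) are part of the documented CDF schema; the employee rows themselves are illustrative:

```python
# Each change row carries the Delta CDF _change_type marker.
changes = [
    {"emp_id": 1, "name": "Ana",  "_change_type": "insert"},
    {"emp_id": 2, "name": "Bo",   "_change_type": "insert"},
    {"emp_id": 1, "name": "Anna", "_change_type": "update_postimage"},
    {"emp_id": 2, "_change_type": "delete"},
]

target = {}  # stands in for the gold table, keyed by emp_id
for row in changes:
    key = row["emp_id"]
    if row["_change_type"] == "delete":
        target.pop(key, None)
    elif row["_change_type"] in ("insert", "update_postimage"):
        # update_preimage rows (the pre-update values) are ignored here
        target[key] = {k: v for k, v in row.items() if not k.startswith("_")}

print(target)  # {1: {'emp_id': 1, 'name': 'Anna'}}
```

In a DLT pipeline the same fold is what APPLY CHANGES / streaming reads of the change feed do for you; the sketch just shows why no intermediate view is needed.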
carlos_tasayco
by New Contributor III
  • 71 Views
  • 1 reply
  • 0 kudos

Showing masked column when they should not

In my organization we mask some columns because they are PII. I have a DLT pipeline, and I am masking these columns like this:

CASE WHEN is_account_group_member("BDAIM-{environment.upper()}-PII_Unmask") THEN Personshopper.firstName ELSE mask(Personshopper. ...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Hi @carlos_tasayco, this is supported for materialized views. Please check the document to confirm you're using the right syntax: https://docs.databricks.com/aws/en/dlt-ref/dlt-sql-ref-create-materialized-view#examples. Please let me know if you ...

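The branch logic in the CASE expression above can be sanity-checked outside SQL. This hypothetical Python helper (the group name and the star-redaction rule are stand-ins, not Databricks APIs) mirrors the "unmask only for members of the PII group" behavior:

```python
def mask_pii(value: str, user_groups: set[str], unmask_group: str) -> str:
    """Return the raw value only for members of the unmask group,
    otherwise redact it, mirroring the SQL CASE WHEN branch."""
    if unmask_group in user_groups:
        return value
    return "*" * len(value)

print(mask_pii("Alice", {"BDAIM-PROD-PII_Unmask"}, "BDAIM-PROD-PII_Unmask"))  # Alice
print(mask_pii("Alice", {"analysts"}, "BDAIM-PROD-PII_Unmask"))               # *****
```

If masked values show up for users who should see cleartext, the first thing to verify is which group memberships is_account_group_member actually resolves at query time in the target environment.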
Andolina1
by New Contributor III
  • 918 Views
  • 5 replies
  • 1 kudos

How to trigger an Azure Data Factory pipeline through API using parameters

Hello All, I have a use case where I want to trigger an Azure Data Factory pipeline through an API. Right now I am calling the API in Databricks and using a service principal (token-based) to connect to ADF from Databricks. The ADF pipeline has some parameters ...

Latest Reply
Andolina1
New Contributor III
  • 1 kudos

Hello All, thank you for all your suggestions on this thread. We have raised the issue as a product bug to Microsoft. They did not tell us why parameters do not work with the HTTP request; however, they did demo to us that parameters work with Python ...

4 More Replies
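For reference, the ADF REST operation that accepts parameters is Pipelines - Create Run, which takes the pipeline parameters as the JSON request body. A hedged sketch that only builds the request (the subscription, resource group, factory, pipeline, and parameter names are placeholders; actually sending it requires a bearer token from the service principal):

```python
# Build the ADF createRun request; pipeline parameters go in the JSON body.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
factory = "<factory-name>"
pipeline = "<pipeline-name>"

url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.DataFactory/factories/{factory}"
    f"/pipelines/{pipeline}/createRun"
    "?api-version=2018-06-01"
)
payload = {"start_date": "2024-01-01", "end_date": "2024-01-31"}

# With a valid token you would POST it, e.g.:
# requests.post(url, json=payload, headers={"Authorization": f"Bearer {token}"})
print(url.endswith("?api-version=2018-06-01"))  # True
```

Passing the parameters as the POST body (rather than as query-string arguments) is the detail that most often trips people up with this endpoint.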
erigaud
by Honored Contributor
  • 1521 Views
  • 5 replies
  • 5 kudos

Databricks asset bundles and Dashboards - pass parameters depending on bundle target

Hello everyone! Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, the dashboard in stg queries the stg catalog, etc. Is there any ...

Latest Reply
Bram123
New Contributor II
  • 5 kudos

Hi @erigaud, do you perhaps know if it is already possible at the moment?

4 More Replies
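One pattern worth trying, sketched here from the standard asset-bundle variables mechanism rather than dashboard-specific documentation: define a catalog variable with per-target overrides in databricks.yml, then reference it from the bundle resources as ${var.catalog}. The catalog names below are placeholders:

```yaml
# databricks.yml (sketch): per-target catalog via bundle variables
variables:
  catalog:
    description: Catalog the dashboard queries should read from
    default: dev_catalog

targets:
  dev:
    variables:
      catalog: dev_catalog
  stg:
    variables:
      catalog: stg_catalog
```

Whether the dashboard queries themselves can consume ${var.catalog} depends on how the dashboard resource is templated, so treat this as a starting point to validate against the current DAB docs.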
alexbarev
by New Contributor II
  • 258 Views
  • 2 replies
  • 0 kudos

Very Slow UDF Execution on One Cluster Compared to Another with Similar Config

Hi all, I'm experiencing a significant slowdown in Python UDF execution times on a particular cluster. The same code runs much faster on another cluster with very similar hardware and policy settings. This cell takes 2-3 minutes on the problem ...

Latest Reply
SP_6721
Contributor
  • 0 kudos

Hi @alexbarev, the slowdown is likely due to using Python UDFs on a Shared (Standard) access mode cluster with Unity Catalog, which adds extra security and isolation overhead. Using a Dedicated access mode cluster removes that extra isolation overhead ...

1 More Replies
jv_v
by Contributor
  • 2192 Views
  • 9 replies
  • 2 kudos

Resolved! Issue with Installing Remorph Reconcile Tool and Compatibility Clarification

I am currently working on a table migration project from a source Hive Metastore workspace to a target Unity Catalog workspace. After migrating the tables, I intend to write table validation scripts using the Remorph Reconcile tool. However, I am enc...

Latest Reply
Kvant
New Contributor II
  • 2 kudos

I would just like to mention that it might not be due to Remorph or your Python version that you encounter this error. I got a similar error message when trying to apply changes to the metastore grants through Terraform. It worked when I authenticated ...

8 More Replies
Pavankumar7
by New Contributor II
  • 491 Views
  • 1 reply
  • 0 kudos

Difference between Community Edition and Free Edition of the Databricks platform

Recently there was news from the Data + AI Summit mentioning a Free Edition of the Databricks platform. How is it different from Community Edition? Follow-up questions: is there any limitation on compute resources? Will it support other cloud service providers apart ...

Latest Reply
ilir_nuredini
New Contributor III
  • 0 kudos

Hello Pavankumar, regarding your questions: the difference between Free Edition and CE: "Free Edition has been designed and extended to include full access to the Data Intelligence Platform. It provides an easy-to-use environment where you can build AI ...

Parth2692
by New Contributor II
  • 239 Views
  • 6 replies
  • 0 kudos

Experiencing sorting problems with bigint columns

We are experiencing sorting problems with bigint columns across the tables we tested. Example: in the table, projectid (bigint per the schema, which is correct) sorts in SQL as if it were a string value; 1000903 is returned as the minimum projectid ...

Latest Reply
EktaPuri
New Contributor III
  • 0 kudos

Try running the EXPLAIN command to see what's happening in the background. Also, if the table is small, try writing it to another table and check.

5 More Replies
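The symptom in this thread (a large value like 1000903 coming back as the minimum) is exactly what lexicographic ordering produces, which suggests the column is being compared as text somewhere, for example via an implicit cast or a string-typed source. A quick pure-Python illustration of the difference:

```python
values = [99, 1000903, 500]

as_numbers = sorted(values)             # numeric comparison
as_strings = sorted(values, key=str)    # lexicographic, digit by digit

print(as_numbers)  # [99, 500, 1000903]
print(as_strings)  # [1000903, 500, 99]
```

"1000903" sorts first as a string because '1' < '5' < '9'; EXPLAIN, as suggested above, should reveal where the cast to string sneaks in.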
stefan-vulpe
by New Contributor II
  • 342 Views
  • 2 replies
  • 1 kudos

Resolved! Batch Python UDFs in Unity Catalog and Spark SQL

Hello datanauts, I'm encountering a conceptual challenge regarding Batch Python UDFs within Spark SQL in Databricks. My primary question is: can Batch Python UDFs be used directly via Spark SQL? As a Databricks beginner, I'm seeking to understand ...

Data Engineering
spark sql
udf
Unity Catalog
Latest Reply
lingareddy_Alva
Honored Contributor II
  • 1 kudos

Hi @stefan-vulpe, looking at your code and the behavior you're describing, I can identify the core issue and provide some insights about Batch Python UDFs in Databricks. The core problem: the issue you're encountering is related to session isolation and ...

1 More Replies
dbx_user
by New Contributor
  • 576 Views
  • 4 replies
  • 0 kudos

Intermittent error: "Command failed because warehouse <<warehouse id>> was stopped."

The error "Command failed because warehouse <<warehouse id>> was stopped." has started popping up during deployment runs. Sometimes the error correlates with the serverless warehouse cluster count dropping to zero while a query is running; sometimes it ...

Latest Reply
dbx_user2
New Contributor II
  • 0 kudos

We first noticed it this week and have seen it occur intermittently about 10 times. The warehouse is serverless compute, and the queries are being deployed from a dbt connection. We have noticed the error pop up on queries that are being reported as r...

3 More Replies
MauricioS
by New Contributor III
  • 115 Views
  • 1 reply
  • 1 kudos

Is it possible to reprocess only a portion of a streaming table data using DLT?

Hi all, currently I have a standard notebook that takes two dates as parameters, a start date and an end date. It goes to the source, pulls only that portion of the data, then on the target table deletes if necessary (if data within those ranges exists), then u...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 1 kudos

Hi @MauricioS, yes, you can achieve similar reprocessing functionality with DLT streaming tables, but it requires a different approach than your current batch process. Here are the main strategies: 1. CDC pattern with tombstone records: the most common ap...

