Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

EAnthemNHC1
by New Contributor III
  • 13 Views
  • 1 replies
  • 0 kudos

Time Travel Error when selecting from materialized view (Azure Databricks)

Hey - running into an error this morning that was brought to my attention via failed refreshes from PowerBI. We have a materialized view that, when queried with the standard pattern of 'select col1 from {schema}.table_name', returns an error of 'Cann...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @EAnthemNHC1, there could be multiple reasons: 1) The materialized view is backed by a Delta table. If the underlying Delta table has been vacuumed (old versions removed), but the materialized view metadata or refresh logic tries to access a specif...

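To make the vacuum/time-travel diagnosis above concrete, here is a minimal sketch. The schema and view names are placeholders (not from the thread), and the `spark.sql` calls are commented out so the snippet stands alone outside a Databricks notebook:

```python
# Placeholder names -- substitute your own schema and materialized view.
schema, view = "my_schema", "my_materialized_view"

diagnostics = [
    # 1) Inspect available history (for an MV you may need to target the backing
    #    Delta table) to see which versions survived VACUUM.
    f"DESCRIBE HISTORY {schema}.{view}",
    # 2) Check retention-related table properties that bound time travel.
    f"SHOW TBLPROPERTIES {schema}.{view}",
    # 3) A full refresh rebuilds the view so it stops referencing a vacuumed version.
    f"REFRESH MATERIALIZED VIEW {schema}.{view} FULL",
]

for stmt in diagnostics:
    print(stmt)
    # spark.sql(stmt)  # uncomment inside a Databricks notebook
```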
shashankB
by New Contributor III
  • 17 Views
  • 2 replies
  • 0 kudos

Lakebridge analyzer not able to determine DDL.

Databricks analyzer does not show any DDL statement count. I've also tested with just a simple SELECT * query (SELECT * FROM SCHEMA_NAME.TABLE_NAME;). Is there any solution for this? My target was to get a detailed analysis of SnowSQL code. Any h...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @shashankB, SELECT is considered DML, not DDL.

1 More Replies
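A toy illustration of the distinction (this is not Lakebridge's actual implementation, just the general rule analyzers follow when bucketing statements by their leading keyword), which shows why a file containing only SELECT statements yields a DDL count of zero:

```python
# Minimal, illustrative classifier -- real analyzers parse far more carefully.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE"}
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE", "MERGE"}

def classify(stmt: str) -> str:
    """Bucket a SQL statement by its leading keyword."""
    keyword = stmt.lstrip().split(None, 1)[0].upper()
    if keyword in DDL_KEYWORDS:
        return "DDL"
    if keyword in DML_KEYWORDS:
        return "DML"
    return "OTHER"

print(classify("SELECT * FROM SCHEMA_NAME.TABLE_NAME;"))  # DML, so no DDL is counted
print(classify("CREATE TABLE t (id INT)"))                # DDL
```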
zoe_unifeye
by New Contributor II
  • 23 Views
  • 1 replies
  • 1 kudos

Building a Theoretical Solar Flare Intelligence System for the Databricks Free Edition Hackathon

I recently built a Theoretical Solar Flare Grid Impact Intelligence System for the Databricks Free Edition Hackathon 2025, and I wanted to share my journey building an end-to-end data engineering and ML solution on Databricks Free Edition. Finding the...

Latest Reply
Raman_Unifeye
New Contributor III
  • 1 kudos

Fabulous submission @zoe_unifeye, and good luck with the hackathon.

Nidhig
by Contributor
  • 20 Views
  • 2 replies
  • 1 kudos

Databricks One - get an option to see the objects list

Hi, While working on Databricks One, I feel it would be very helpful to have an option that allows users to easily view the list of tables within a schema or database directly from the UI. This would improve navigation and make it easier to explore a...

Latest Reply
Raman_Unifeye
New Contributor III
  • 1 kudos

Short answer is no, and I suppose that is not the purpose or intended usage of Databricks One, as it is meant to be the interface for business users rather than traditional data analysts. Obviously, through Genie you could ask 'Explain Data' to provide ...

1 More Replies
Y_WANG
by New Contributor II
  • 115 Views
  • 2 replies
  • 0 kudos

Resolved! Want to use DataFrame equality functions but also NumPy >= 2.0

In my team, we have a lot of data science workflows using Spark and pandas. To ensure the stability of these workflows, we need to implement unit tests. Recently, I found the DataFrame equality test functions introduced in Spark 3.5, which se...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@Y_WANG The AttributeError you face when importing assertDataFrameEqual from pyspark.testing in Spark 3.5 is caused by Spark's code using the deprecated np.NaN attribute, which was removed in NumPy 2.0 (replaced by np.nan). This break...

1 More Replies
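A commonly used workaround sketch, assuming the failure is exactly the removed `np.NaN` alias: restore the alias before importing `pyspark.testing`. This is a stopgap, not official guidance:

```python
import math

import numpy as np

# NumPy 2.0 removed the uppercase np.NaN alias that Spark 3.5's pyspark.testing
# still references. Re-creating it before the import avoids the AttributeError.
if not hasattr(np, "NaN"):   # only true on NumPy >= 2.0
    np.NaN = np.nan          # restore the removed alias

# The import below should now succeed on NumPy >= 2.0 as well:
# from pyspark.testing import assertDataFrameEqual

print(math.isnan(np.NaN))  # True
```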
pooja_bhumandla
by New Contributor III
  • 17 Views
  • 1 replies
  • 0 kudos

Best Practice for Updating Data Skipping Statistics for Additional Columns

Hi Community, I have a scenario where I've already calculated delta statistics for the first 32 columns after enabling the data skipping property. Now, I need to include 10 more frequently used columns that were not part of the original 32. Goal: I want ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @pooja_bhumandla, updating either of the two options below does not automatically recompute statistics for existing data. Rather, it affects the behavior of future statistics collection when adding or updating data in the table: - delta.dataSkippingNumInd...

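A hedged sketch of the two properties plus an explicit recompute for existing data. The table name is a placeholder, `ANALYZE TABLE ... COMPUTE DELTA STATISTICS` assumes a DBR version that supports it, and the `spark.sql` calls are commented out so the snippet stands alone:

```python
# Placeholder three-level name -- substitute your own table.
table = "my_catalog.my_schema.my_table"

statements = [
    # Option A: widen the leading-column window that gets statistics.
    f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '42')",
    # Option B (finer control): list the exact columns to collect statistics for.
    f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'col1,col2')",
    # Neither property rewrites old files, so recompute stats for existing data explicitly.
    f"ANALYZE TABLE {table} COMPUTE DELTA STATISTICS",
]

for stmt in statements:
    print(stmt)
    # spark.sql(stmt)  # run inside Databricks
```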
Elm8r
by Visitor
  • 23 Views
  • 1 replies
  • 0 kudos

Databricks Connect: use local virtual environment

I have a simple Python script in my local development environment and a related uv virtual env. I am trying to run the script on a Databricks cluster using my venv, but even if I select it in the Python Environment section, it is not actually using it,...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Elm8r, are you using VS Code?

guidotognini
by New Contributor
  • 26 Views
  • 1 replies
  • 0 kudos

Medallion Architecture: do I need a materialized “exploded” layer (raw JSON → exploded → CDC)?

Hi everyone, I’m building a Medallion-style pipeline on Databricks for nested JSON API responses, and I’d like advice on the design of an intermediate “exploded” step: Can I avoid materializing it as a table? If not, how should I name/classify it in the ...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@guidotognini 1. Can you avoid materializing exploded data? Materialized views: if your downstream silver table is itself streaming and supports materialized views, you may be able to collapse the explode+normalize step into a view that directly trans...

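As a plain-Python illustration of the “exploded” step (a real pipeline would use PySpark's `explode()` on the bronze table; the payload and field names here are invented for the example):

```python
import json

# One nested API response, as it might land in a bronze/raw layer.
raw = json.loads("""
{"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}
""")

# Exploding turns one nested record into one flat row per array element --
# the shape the downstream silver/CDC step consumes.
exploded = [
    {"order_id": raw["order_id"], "sku": item["sku"], "qty": item["qty"]}
    for item in raw["items"]
]

for row in exploded:
    print(row)
```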
saicharandeepb
by New Contributor III
  • 10 Views
  • 0 replies
  • 0 kudos

Decision Tree for Selecting the Right VM Types in Databricks – Looking for Feedback & Improvements!

Hi everyone, I’ve been working on an updated VM selection decision tree for Azure Databricks, designed to help teams quickly identify the most suitable worker types based on workload behavior. I’m sharing the latest version (In this updated version I’...

(attached image: VM selection decision tree)
m997al
by Contributor III
  • 3961 Views
  • 3 replies
  • 0 kudos

Errors using Databricks Extension for VS Code on Windows

Hi - I am trying to get my VS Code (running on Windows) to work with the Databricks extension for VS Code. It seems like I can almost get this to work. Here is my setup: 1. Using Databricks Extension v2.4.0; 2. Connecting to Databricks cluster with ru...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Most likely you are already using it, but if not, I would suggest using Git folders directly in Databricks in conjunction with VS Code as you are doing now. There will be times when it is much faster and more straightforward to run noteboo...

2 More Replies
pooja_bhumandla
by New Contributor III
  • 67 Views
  • 3 replies
  • 1 kudos

Seeking Insights on Liquid Clustering (LC) Based on Table Sizes

Hi all, I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables. Specifically, I’m interested in understanding how LC behaves with small, medium, and large tables and the best practices for each, along with size range...

Latest Reply
pooja_bhumandla
New Contributor III
  • 1 kudos

Hi @bianca_unifeye, thank you for your response. My tables range in size from 1 KB to 5 TB. Given this, I’d love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario. Thanks in advance for shari...

2 More Replies
Charansai
by New Contributor III
  • 44 Views
  • 2 replies
  • 1 kudos

Pipelines not included in Databricks Asset Bundles deployment

Hi all, I’m working with Databricks Asset Bundles (DAB) to build and deploy Jobs and pipelines across multiple environments in Azure Databricks. I can successfully deploy Jobs using bundles. However, when I try to deploy pipelines, I notice that the bun...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

As per this documentation https://docs.databricks.com/aws/en/dev-tools/bundles/resources#pipeline you should be able to do it with the latest CLI version. Check that you have the latest version. Here is a sample databricks.yml configuration file -> https://gi...

1 More Replies
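For reference, a minimal pipeline resource in `databricks.yml` might look like the sketch below. All names and paths are placeholders, and field support depends on your CLI version, so check the linked docs:

```yaml
# Placeholder pipeline resource; adjust names, catalog, and paths to your bundle.
resources:
  pipelines:
    my_pipeline:
      name: my_pipeline
      catalog: main          # assumes Unity Catalog
      target: dev            # schema where the pipeline materializes tables
      development: true
      libraries:
        - notebook:
            path: ../src/my_pipeline_notebook.py
```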
Charansai
by New Contributor III
  • 29 Views
  • 1 replies
  • 0 kudos

How to use serverless clusters in DAB deployments with Unity Catalog in private network?

Hi everyone, I’m deploying Jobs and Pipelines using Databricks Asset Bundles (DAB) in an Azure Databricks workspace configured with private networking. I’m trying to use serverless compute for some workloads, but I’m running into issues when Unity Cat...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

A lot of questions! Concerning the usage of serverless clusters in databricks.yml: assuming you're using those clusters in jobs, you must define them in the job definition. Take a look here: https://github.com/databricks/bundle-examples/tree/main/know...

Datalight
by Contributor
  • 28 Views
  • 0 replies
  • 0 kudos

Design Oracle Fusion SCM to Azure Databricks

Hello Techies, I am planning to migrate all modules of Oracle Fusion SCM data to Azure Databricks. Is BICC (Business Intelligence Cloud Connector) the only option, or is any other option available? Can anyone please help me with a reference architecture...

intelliconnectq
by New Contributor II
  • 62 Views
  • 2 replies
  • 0 kudos

Resolved! Loading CSV from private S3 bucket

Trying to load a CSV file from a private S3 bucket. Please clarify the requirements to do this: Can I do it in Community Edition (if yes, then how)? How do I do it in the premium version? I have an IAM role, and I also have an access key & secret.

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Assuming you have these prerequisites: a private S3 bucket (e.g., s3://my-private-bucket/data/file.csv), an IAM user or role with access (list/get) to that bucket, and the AWS Access Key ID and Secret Access Key (client and secret). The most straightforward w...

1 More Replies
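A sketch of that pattern in a notebook. The bucket, path, and key values are placeholders; in practice, read credentials from a secret scope rather than hard-coding them. The `spark` calls are commented out so the snippet is self-contained:

```python
# Placeholder values -- replace with your own; never hard-code real credentials.
bucket = "my-private-bucket"
access_key = "<access-key-id>"      # e.g. dbutils.secrets.get("aws", "access_key")
secret_key = "<secret-access-key>"  # e.g. dbutils.secrets.get("aws", "secret_key")

# Hand the credentials to the S3A filesystem, then read the CSV:
# spark.conf.set("fs.s3a.access.key", access_key)
# spark.conf.set("fs.s3a.secret.key", secret_key)

path = f"s3a://{bucket}/data/file.csv"
# df = spark.read.option("header", "true").csv(path)
# display(df)

print(path)
```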
