Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Devsh_on_point
by New Contributor
  • 1433 Views
  • 2 replies
  • 3 kudos

Liquid Clustering with Partitioning

Hi Team, can we use partitioning and liquid clustering in conjunction? Essentially, partitioning the table first on a specific field and then applying liquid clustering (on other fields)? Alternatively, can we define the order priority of the cluster key ...

Latest Reply
jeffrey-gong
Databricks Employee
  • 3 kudos

Hi @Devsh_on_point, we are in Private Preview for a feature that helps you convert a Partitioned table to Liquid Clustering. Here is the User Guide. Reach out to your account team to get enrolled!

  • 3 kudos
1 More Replies
walgt
by Databricks Partner
  • 5468 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks data engineer associate exam

Hi everyone, I'm preparing for the Databricks Data Engineer Associate certification. On the Databricks website, they list the following self-paced courses available in Databricks Academy for exam preparation: Data Ingestion with Delta Lake, Deploy Worklo...

Latest Reply
jackeis
New Contributor II
  • 1 kudos

Thanks for sharing this useful discussion on the Databricks Data Engineer Associate exam. I’m also preparing for this certification and found many helpful resources and insights here. If anyone else is currently studying or has already cleared it, fe...

  • 1 kudos
1 More Replies
Volker
by Databricks Partner
  • 4361 Views
  • 2 replies
  • 0 kudos

From Partitioning to Liquid Clustering

We had some Delta tables that were previously partitioned on year, month, day, and hour. This resulted in quite small partitions, and we have now switched to liquid clustering. We followed these steps: Remove partitioning by doing REPLACE; ALTER TABLE --- CLU...

Latest Reply
jeffrey-gong
Databricks Employee
  • 0 kudos

Hi @Volker, we are in Private Preview now for a feature that helps you easily convert a table from Partitioning to Liquid Clustering. Here is the User Guide.

  • 0 kudos
1 More Replies
shoumitra
by New Contributor
  • 4837 Views
  • 2 replies
  • 0 kudos

Resolved! Pathway advice on how to Data Engineer Associate

Hi everyone, I am new to this community and I am a BI/Data Engineer by trade in a Microsoft Azure/on-prem context. I want some advice on how to become a certified Data Engineer Associate in Databricks. The training, lesson or courses to be eligible for tak...

Latest Reply
jackeis
New Contributor II
  • 0 kudos

Great post, very helpful insights on the Data Engineer Associate pathway. I’m also preparing for this exam and found similar resources really useful for understanding the core concepts and practice approach. Thanks for sharing! If anyone else has add...

  • 0 kudos
1 More Replies
Chiran-Gajula
by New Contributor III
  • 554 Views
  • 3 replies
  • 0 kudos

Resolved! How to update alias for catalogs

Greetings, is there a way to create an alias for a Databricks catalog? Current catalog name: training. Desired alias: development_training. The goal is that users connecting to either name should see the same schemas, tables, and data.

Latest Reply
Chiran-Gajula
New Contributor III
  • 0 kudos

I have a use case where I need to rename a catalog without impacting existing pipelines and notebooks, as the current catalog name is referenced across multiple applications. Instead of coordinating with multiple teams to update it everywhere, I was ...

  • 0 kudos
2 More Replies
Raj_DB
by Contributor
  • 1603 Views
  • 7 replies
  • 11 kudos

Resolved! Designing Reliable Data Versioning Strategies in Databricks

Hi everyone, I’m working on a use case where I need to retain 30 days of historical data in a Delta table and use it to build trend reports. I’m looking for the best approach to reliably maintain this historical data while also making it suitable for r...

Latest Reply
DivyaandData
Databricks Employee
  • 11 kudos

Hey @Raj_DB, the TL;DR is that time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table. Docs 1,2 explicitly say De...
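For readers wondering what a purpose-built history table could look like in practice, here is a minimal sketch. It is not taken from the reply: the table names, the trailing `snapshot_date` column, and the daily append-then-prune pattern are all assumptions made for illustration. The helper only builds the SQL text, so it can be inspected before being passed to `spark.sql(...)`.

```python
def snapshot_statements(source_table, history_table, retention_days=30):
    """Build the two SQL statements for one daily snapshot run.

    Assumes (hypothetically) that history_table has the same columns as
    source_table, followed by one trailing snapshot_date DATE column.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    # 1) Append today's full copy of the source, stamped with the run date.
    append = (
        f"INSERT INTO {history_table} "
        f"SELECT *, current_date() AS snapshot_date FROM {source_table}"
    )
    # 2) Drop snapshots older than the retention window.
    prune = (
        f"DELETE FROM {history_table} "
        f"WHERE snapshot_date < date_sub(current_date(), {retention_days})"
    )
    return [append, prune]
```

In a scheduled job, each returned statement would be run in order. Trend reports then filter on `snapshot_date` instead of relying on table versions, so retention becomes an explicit choice rather than a side effect of VACUUM settings.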

  • 11 kudos
6 More Replies
bi_123
by New Contributor III
  • 953 Views
  • 4 replies
  • 5 kudos

Valid init script for installing ODBC Driver 18 for SQL Server to a job cluster

I need to execute stored procedures in my notebook. To do that, I created an init script that installs the ODBC driver on my job cluster. But the script stops working after some time and I can't figure out why, so the cluster can't start. Can someone send...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 5 kudos

You don't need an ODBC driver inside a Databricks job cluster to run stored procedures, and init scripts are fragile enough that they easily break cluster startup. Options: Option 1 (Recommended): Use built-in JDBC / connectors instead of ODBC. For S...

  • 5 kudos
3 More Replies
rohan22sri
by New Contributor III
  • 1038 Views
  • 3 replies
  • 2 kudos

Structured Streaming Real-Time Mode Doesn’t Support Delta — What’s the Plan?

Real-time mode doesn’t currently support Delta tables in Structured Streaming. Is there a planned timeline for launching this support? Also, are there any plans for declarative pipelines to support real-time mode or provide an equivalent capability?

Latest Reply
amirabedhiafi
Contributor
  • 2 kudos

Hi again! There usually is not a single detailed public roadmap with dates for Databricks features. I think the best official places to track what is coming are release notes, preview or beta announcements, the blog, and the product feedback and ideas c...

  • 2 kudos
2 More Replies
AnonymousK
by New Contributor II
  • 430 Views
  • 2 replies
  • 2 kudos

Why do you want to migrate from Azure Synapse Analytics or Azure Data Factory to Databricks

It's a simple answer bro. According to our analysis, Azure pipelines and notebooks batch process approximately 40% faster than Synapse Analytics. If you really want to optimise your pipelines and perform cost optimisations in your team please migrate...

Latest Reply
amirabedhiafi
Contributor
  • 2 kudos

Hello! The only thing I can add here is that migrating from ASA or ADF to Databricks can make sense when your workloads need more scalable Spark-based processing, batch and streaming support, advanced transformation logic, lakehouse architecture or ML and AI ca...

  • 2 kudos
1 More Replies
mjedy78
by New Contributor II
  • 3080 Views
  • 5 replies
  • 1 kudos

Transition from partitioned table to Liquid clustered table

Hi all, I have a table called classes, which is already partitioned on three different columns. I want to create a Liquid Clustered Table, but as far as I understand from the documentation, and from Dany Lee and his team, it was not possible as of 2024 ...

Latest Reply
biancaorita
New Contributor II
  • 1 kudos

Is there a plan to implement a way to migrate to liquid clustering for an existing table that has traditional partitioning and that is quite large (over 4 TB)? Re-creating such tables from scratch is not always ideal.

  • 1 kudos
4 More Replies
SahilRana3097
by New Contributor
  • 560 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks not able to create cluster with Amazon free trial version

Error: Cannot launch the cluster because the user specified an invalid argument. Instance ID: failed-2d901c0f-d88d-499a-a Internal error message: The VM launch request to AWS failed, please check your configuration. [details] InvalidParameterCombinati...

Latest Reply
DivyaandData
Databricks Employee
  • 0 kudos

The error is coming from AWS, not Databricks: your AWS account is restricted to Free Tier–eligible instance types, but the node type you picked in Databricks maps to an EC2 instance that is not Free Tier–eligible, so AWS rejects the launch request wi...

  • 0 kudos
AdrianLobacz
by Databricks Partner
  • 220 Views
  • 1 replies
  • 0 kudos

FileNotFoundError: [Errno 2] No such file or directory: '../00_configuration/prd/main_configuration.

Maybe someone has encountered this problem before? I’m running parallel loading for 10 objects using pool.map. Nine of them complete successfully, but one fails when trying to read a configuration file. The problem occurs occasionally and doesn’t foll...

Latest Reply
balajij8
Contributor III
  • 0 kudos

@AdrianLobacz You can read the configuration once and pass the object into your function instead of reading the same file multiple times. It eliminates the IO overhead and avoids hitting the FUSE layer. When the code triggers parallel processes, they...
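A minimal sketch of the "read once, pass the object" idea follows. The JSON file format, the `target_schema` key, the `load_object` worker, and the use of `ThreadPool` are assumptions for illustration; the original post only mentions `pool.map` and a configuration file whose full name is cut off.

```python
import json
import tempfile
from functools import partial
from multiprocessing.pool import ThreadPool
from pathlib import Path


def load_config(path):
    """Read the JSON configuration file a single time."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def load_object(config, object_name):
    """Hypothetical worker: uses the already-loaded config, never opens the file."""
    return f"{object_name} -> {config['target_schema']}"


def run_parallel(config_path, object_names, workers=4):
    config = load_config(config_path)      # one read, before the pool starts
    worker = partial(load_object, config)  # bind the config object to the worker
    with ThreadPool(workers) as pool:
        return pool.map(worker, object_names)


if __name__ == "__main__":
    # Demo with a throwaway config file; the real path and keys will differ.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "main_configuration.json"
        cfg.write_text(json.dumps({"target_schema": "prd"}), encoding="utf-8")
        print(run_parallel(cfg, [f"object_{i}" for i in range(10)]))
```

The same `partial` pattern should also work with a process-based `multiprocessing.Pool`, as long as the config object is picklable. Either way, no worker depends on the relative path resolving correctly at the moment it runs.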

  • 0 kudos
seefoods
by Valued Contributor
  • 571 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks Auto Loader source files

Hello, how can we handle this error when we use Auto Loader with spark.readStream? (com.databricks.sql.cloudfiles.errors.CloudFilesException) [CF_EMPTY_DIR_FOR_SCHEMA_INFERENCE] Cannot infer schema when the input path `/Volumes/default/landing/source/bund...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @seefoods, the error message seems to indicate that there are no files in the source path. You can either define the schema yourself and pass it to schema(...) so Auto Loader doesn’t need to infer anything, and as soon as files arrive, the stream will...
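As a rough illustration of the explicit-schema option, the sketch below wraps the reader setup in a small helper. The helper name, the JSON file format, and the example column names are assumptions, not details from the thread. The schema is passed as a DDL string, which `DataStreamReader.schema` accepts in place of a `StructType`.

```python
def build_autoloader_stream(spark, source_path, schema_ddl, file_format="json"):
    """Return a streaming DataFrame that reads source_path with a fixed schema.

    Because the schema is supplied up front, Auto Loader has nothing to infer,
    so an empty input directory should not trigger the inference error.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)  # assumed format; adjust to the real files
        .schema(schema_ddl)                        # DDL string, e.g. "id BIGINT, payload STRING"
        .load(source_path)
    )
```

On a cluster this would be called with the notebook's `spark` session, for example `build_autoloader_stream(spark, "<your landing path>", "id BIGINT, payload STRING")`, where the path and the columns are placeholders.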

  • 0 kudos
IM_01
by Valued Contributor
  • 1711 Views
  • 8 replies
  • 0 kudos
Latest Reply
IM_01
Valued Contributor
  • 0 kudos

Hi @Ashwin_DSA, thanks for the response. I was thinking that if the results are precomputed using cube and persisted using an MV, that would retrieve results faster than a metric view. Could you please let me know if my understanding is correct?

  • 0 kudos
7 More Replies
Diehl
by New Contributor III
  • 755 Views
  • 1 replies
  • 1 kudos

Resolved! Auto Loader with ignoreMissingFiles and useManagedFileEvents fails on Classic Compute

Hi everyone, I am seeing unexpected behavior with Auto Loader when using Managed File Events on Classic Compute. The error message itself seems inconsistent with the behavior I am seeing: [FAILED_READ_FILE.DBR_FILE_NOT_EXIST] Error while reading file...

Latest Reply
Diehl
New Contributor III
  • 1 kudos

Just sharing a solution in case anyone runs into the same issue. The error was caused by the cluster configuration including spark.master: "local[*]". After removing this setting, the error stopped occurring and the Auto Loader finished correctly. This...
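For anyone who wants to check their own cluster for the same setting, a small helper along these lines could flag it. This is only a sketch: the function name is hypothetical, and it assumes the cluster's Spark config has been copied into a plain dict.

```python
def find_local_master(spark_conf):
    """Return the spark.master value if it pins Spark to local mode, else None."""
    master = spark_conf.get("spark.master", "")
    # "local", "local[4]" and "local[*]" all start with "local".
    return master if master.strip().lower().startswith("local") else None
```

For example, `find_local_master({"spark.master": "local[*]"})` returns `"local[*]"`, while a config without that key returns `None`.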

  • 1 kudos