Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

prakash360
by New Contributor II
  • 176 Views
  • 2 replies
  • 1 kudos

Best practices for Liquid clustering and z-ordering for existing streaming delta tables

Hello, I have been tasked with optimizing some of our existing Delta Lake tables in Databricks. I was able to run the following clause on some of our Delta tables, but I wasn't able to execute the same clause against some of our streaming tables. AL...

Latest Reply
prakash360
New Contributor II

Hi @filipniziol, thank you for your quick response. Not executing the ALTER statement against the DLT table directly, and instead adjusting the pipeline configuration, makes sense. Regarding liquid clustering on the DLT tables, I assume i...

1 More Replies
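
For reference, a minimal sketch of the configuration-side approach discussed in this thread: declaring liquid clustering keys in the DLT table definition itself rather than running ALTER TABLE against the pipeline-managed table. The table and column names below are hypothetical.

    import dlt

    # Hypothetical table and columns; cluster_by declares the liquid
    # clustering keys on the DLT-managed table, replacing a manual
    # ALTER TABLE ... CLUSTER BY statement.
    @dlt.table(
        name="events_clustered",
        cluster_by=["event_date", "customer_id"],
    )
    def events_clustered():
        return spark.readStream.table("events_raw")

Because the pipeline owns the table, redeploying the pipeline with this definition applies the clustering without touching the table directly.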
baert23
by New Contributor II
  • 191 Views
  • 5 replies
  • 0 kudos

Code Optimization

1. I have a lot of data transformations that result in df1, and df1 is then the starting point for 10 different transformation paths. At first I tried to use .cache() and .count() on df1, but it was very slow. I changed from caching to saving df1 as a Delta table and...

Latest Reply
filipniziol
New Contributor III

It is hard to tell why the transformations are slow without any details. You would need to share the file you are trying to process and the code you are running.

4 More Replies
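
As a sketch of the checkpoint-to-Delta pattern described in the question (names and paths are hypothetical): materialize the shared intermediate result once, then branch the ten downstream paths off the materialized table so each path reads stored data instead of recomputing the lineage.

    from pyspark.sql import functions as F

    # Materialize the expensive intermediate once instead of caching it
    df1.write.format("delta").mode("overwrite").saveAsTable("tmp.df1_checkpoint")

    # Each downstream path starts from the stored table, not the lineage
    df1_stored = spark.read.table("tmp.df1_checkpoint")
    path_a = df1_stored.where(F.col("status") == "active")        # hypothetical path
    path_b = df1_stored.groupBy("customer_id").agg(F.count("*"))  # hypothetical path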
L1000
by New Contributor II
  • 245 Views
  • 4 replies
  • 2 kudos

Resolved! Delta Live Tables (Triggered Mode), how and in what order is the data processed?

Hey! I have set up a Delta Live Tables pipeline with bronze and silver tables. I have one bronze table which ingests data from a storage account using Auto Loader. Multiple files are uploaded at once to the storage account. My silver tables read and proce...

Labels: Data Engineering, autoloader, Delta Live Tables, triggered mode
Latest Reply
L1000
New Contributor II

Thanks @szymon_dybczak for your reply! Do you maybe have a link to documentation for this? I didn't find a lot of info about it and would love to read more on how Delta Live Tables works in Triggered vs. Streaming mode.

3 More Replies
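
For context, a minimal sketch (hypothetical paths and column names) of the bronze/silver shape being discussed: an Auto Loader bronze table feeding a streaming silver table. In triggered mode, each pipeline update processes whatever data has arrived since the previous update and then stops.

    import dlt
    from pyspark.sql import functions as F

    @dlt.table
    def bronze_events():
        # Auto Loader incrementally discovers new files in the storage account
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("abfss://landing@myaccount.dfs.core.windows.net/events/")  # hypothetical
        )

    @dlt.table
    def silver_events():
        # Streaming read of bronze; processed incrementally on each run
        return dlt.read_stream("bronze_events").where(F.col("event_id").isNotNull())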
IsmaelHenzel1
by New Contributor
  • 128 Views
  • 0 replies
  • 0 kudos

Delta Live Tables - ForeachBatch

I am wondering how to create complex streaming queries using Delta Live Tables (DLT). I can't find a way to use foreachBatch with it, and this is causing me some difficulty. I need to create a window using a lag without a time range, which is not pos...

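
For reference, this is the kind of unbounded lag window the poster needs, as it would look in a batch job (hypothetical table and columns). A streaming query rejects a row-ordered window with no time range, which is why foreachBatch, where each micro-batch is processed as an ordinary batch DataFrame, is attractive here.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.table("events")  # hypothetical table

    # A lag over a row-ordered window with no time range: fine in batch,
    # unsupported in a streaming query
    w = Window.partitionBy("device_id").orderBy("event_time")
    df_with_prev = df.withColumn("prev_value", F.lag("value").over(w))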
RohitKulkarni
by Contributor II
  • 125 Views
  • 1 reply
  • 0 kudos

Databricks Certified Data Engineer Professional

Hello Team, I have experience with Databricks, but most of my work has been on the analytics side, so I need to get the Databricks Certified Data Engineer Professional certification. Could anyone help me find content to read and prepare for the exam....

Latest Reply
User16756723392
New Contributor III

@RohitKulkarni here is the link with all the details https://www.databricks.com/learn/certification/data-engineer-professional

Ligaya
by New Contributor II
  • 28526 Views
  • 4 replies
  • 2 kudos

ValueError: not enough values to unpack (expected 2, got 1)

Code: Writer.jdbc_writer("Economy", economy, conf=CONF.MSSQL.to_dict(), modified_by=JOB_ID['Economy']). The problem arises when I try to run the code in the specified Databricks notebook: an error of "ValueError: not enough values to unpack (expected 2, ...

Latest Reply
veraelmore
New Contributor II

Hey Databricks Community, the error "ValueError: not enough values to unpack (expected 2, got 1)" typically occurs when Python is trying to unpack a certain number of values, but the data it is processing does not contain the expected number. This err...

3 More Replies
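
For readers hitting the same message, a minimal illustration of the mechanics (not the poster's actual code, which is only partially shown): unpacking requires the right-hand side to yield exactly as many items as there are targets.

    # Works: two targets, two values
    host, port = "localhost:5432".split(":")

    # Fails: the delimiter is absent, so split() yields a single item
    try:
        host, port = "localhost-5432".split(":")
    except ValueError as e:
        print(e)  # not enough values to unpack (expected 2, got 1)

    # Defensive pattern: check before unpacking
    parts = "localhost-5432".split(":")
    if len(parts) == 2:
        host, port = parts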
AlvaroCM
by New Contributor II
  • 113 Views
  • 0 replies
  • 0 kudos

DLT error at validation

Hello, I'm creating a DLT pipeline with Databricks on AWS. After creating an external location for my bucket, I encountered the following error: DataPlaneException: [DLT ERROR CODE: CLUSTER_LAUNCH_FAILURE.CLIENT_ERROR] Failed to launch pipeline cluster...

dhainik
by New Contributor II
  • 324 Views
  • 3 replies
  • 0 kudos

ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3

I have a Databricks job that runs daily at 14:00 IST and typically finishes in about 2 hours. However, yesterday, the job got stuck and continued running indefinitely. After exceeding 5 hours, I canceled it and reran the job, which then completed suc...

Latest Reply
Witold
Contributor III

When you navigate to the corresponding cluster, you'll see "Event log", "Spark UI", and "Driver logs". There you should find all the information you need.

2 More Replies
prashasinghal
by New Contributor II
  • 129 Views
  • 1 reply
  • 0 kudos

Resolved! Compute cluster not working after installing ojdbc

Hi, I have a Databricks 12.2 LTS compute cluster which was working as expected. I need to establish a connection to Oracle, so the ojdbc11 driver JAR was installed. Once it is installed, the cluster does not execute any cell (even a print statement) and gets stuck in a waiting state. In the driver logs it shows: '...

Latest Reply
prashasinghal
New Contributor II

The issue is resolved. Installing drivers directly collides with Databricks internal runtime libraries, so we used an init script to copy the JAR from the workspace to databricks/jars/.

fpmsi
by New Contributor
  • 105 Views
  • 1 reply
  • 0 kudos

Best Approach to Store Data in Azure Gov Cloud Workspace without Unity Catalog

Our team is using a workspace in Azure Gov Cloud. We would like to download files from an external source into our workspace. Since Unity Catalog is not enabled in Azure Gov Cloud, we're looking for the best approach for securely storing data in our wor...

Latest Reply
Marlene495
New Contributor II

Hello! For securely storing sensitive data in your Azure Gov Cloud workspace without Unity Catalog, use Azure Blob Storage with encryption, Azure Data Lake Storage with AAD access control, or Azure Key Vault for secrets. Managed identities can also he...

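
As a concrete sketch of the ADLS-with-AAD option (all account names, scope names, and IDs below are placeholders, with secrets pulled from a Key Vault-backed secret scope):

    # Placeholder values throughout; note that Gov Cloud uses different
    # endpoint domains than the commercial-cloud domains shown here.
    service_credential = dbutils.secrets.get(scope="kv-scope", key="sp-secret")

    account = "<storage-account>"
    spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<application-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", service_credential)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    df = spark.read.parquet(f"abfss://data@{account}.dfs.core.windows.net/raw/")  # hypothetical container/path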
TWib
by New Contributor III
  • 3127 Views
  • 8 replies
  • 3 kudos

DatabricksSession broken for 15.1

This code fails with the exception [NOT_COLUMN_OR_STR]: Argument `col` should be a Column or str, got Column. File <command-4420517954891674>, line 7: spark = DatabricksSession.builder.getOrCreate(); df = spark.read.table("samples.nyctaxi.trips"); ---->...

Latest Reply
977073
New Contributor II

I can see this issue in 13.3 LTS. Production code is still running on 11.3 LTS, but upgrading to a higher LTS DBR version gives this error. I believe you should fix it or provide a migration guide from one DBR to the other.

7 More Replies
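
The "got Column" wording suggests two different Column classes are in play. A hypothetical sketch of that kind of mismatch (an assumption, not a confirmed diagnosis of this bug):

    # If pyspark.sql.functions resolves to a classic PySpark install while
    # the session is Spark Connect (DatabricksSession), the two libraries
    # define different Column classes, so a genuine Column is still
    # rejected with NOT_COLUMN_OR_STR.
    from databricks.connect import DatabricksSession
    import pyspark.sql.functions as F

    spark = DatabricksSession.builder.getOrCreate()
    df = spark.read.table("samples.nyctaxi.trips")
    df.select(F.col("trip_distance")).show()  # fails if F comes from the mismatched install

If that is the cause, keeping a single PySpark installation whose version matches the databricks-connect package usually resolves it.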
ruoyuqian
by New Contributor II
  • 268 Views
  • 3 replies
  • 2 kudos

How to print out logs during DLT pipeline run

I'm trying to debug my DLT pipeline, and I need some log info at runtime. How do I do a print('something') during a DLT run?

Latest Reply
filipniziol
New Contributor III

Hi @ruoyuqian, @kranthi2. Why print() statements won't work in DLT: in Databricks Delta Live Tables (DLT), you do not see print() statements, as what is visible are the events. Alternative solution: use Log4j to log to the driver log. To log information ...

2 More Replies
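
A minimal sketch of the Log4j-to-driver-log approach mentioned above (the logger name is arbitrary). Messages logged this way appear in the pipeline cluster's driver log rather than in the DLT event UI.

    # Reach the JVM's Log4j through the active SparkSession
    log4j = spark._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("my_dlt_pipeline")  # arbitrary name

    logger.info("starting bronze ingestion")
    logger.warn("row count lower than expected")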
Mutharasu
by New Contributor II
  • 3597 Views
  • 7 replies
  • 6 kudos

SAP Business Object(BO) Integration with Databricks

Hi Team, we are analyzing how to connect SAP BusinessObjects to Databricks and build a report on top of the data in the data lakehouse. In our current architecture we have Delta tables on top of S3 storage. Please let us know of any connectors/d...

Latest Reply
bharat4880
New Contributor II

Hi @HB83, may I know which version of BO you are using? We have a similar requirement.

6 More Replies
Dave_Nithio
by Contributor
  • 4735 Views
  • 4 replies
  • 2 kudos

Resolved! How to use autoloader with csv containing spaces in attribute names?

I am attempting to use Auto Loader to add a number of CSV files to a Delta table. The underlying CSV files have spaces in the attribute names, though (e.g. 'Account Number' instead of 'AccountNumber'). When I run my Auto Loader job, I get the following error ...

Latest Reply
Dave_Nithio
Contributor

@Hubert Dudek thanks for your response! I was able to use what you proposed above to generate the schema. The issue is that the schema sets all attributes to STRING values and renames them numerically ('_c0', '_c1', etc.). Although this allows us to...

3 More Replies
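
One common workaround for the spaces issue (a sketch with hypothetical paths): normalize the header names immediately after the Auto Loader read, before the data is written to Delta.

    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")                            # keep real names, not _c0, _c1
        .option("cloudFiles.schemaLocation", "/tmp/schema")  # hypothetical
        .load("/landing/csv/")                               # hypothetical
    )

    # Replace spaces so Delta accepts the column names
    df = df.toDF(*[c.replace(" ", "_") for c in df.columns])

Alternatively, Delta's column mapping mode (delta.columnMapping.mode = 'name') lets the table keep the original names, spaces included.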
suresh1122
by New Contributor III
  • 11658 Views
  • 12 replies
  • 7 kudos

DataFrame takes an unusually long time (around 2 hours) to save as a Delta table using SQL for a very small dataset with 30k rows. Is there a solution for this problem?

I am trying to save a dataframe, after a series of data manipulations using UDFs, to a Delta table. I tried using this code: df.write.format('delta').mode('overwrite').option('overwriteSchema', 'true').saveAsTable('output_table'), but this...

Latest Reply
Lakshay
Esteemed Contributor

You should also look into the SQL plan to check whether the writing phase is indeed the part that is taking time. Since Spark uses lazy evaluation, some other phase might be the one taking the time.

11 More Replies
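
Following up on the lazy-evaluation point, a small sketch for isolating where the time actually goes (the table name is from the question; everything else is illustrative):

    # Inspect the plan Spark will execute for the write
    df.explain(mode="formatted")

    # Force the upstream transformations (including the UDFs) to run first;
    # if this count is slow, the bottleneck is the computation, not the write
    df = df.persist()
    df.count()

    (df.write.format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")
       .saveAsTable("output_table"))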
