Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

TinasheChinyati
by New Contributor III
  • 534 Views
  • 3 replies
  • 1 kudos

Resolved! Retention window from DLT created Delta tables

Hi guys, I am working with data ingested from Azure EventHub using Delta Live Tables in Databricks. Our data architecture follows the medallion approach. Our current requirement is to retain only the most recent 14 days of data in the silver layer. To...

Data Engineering
data engineer
Delta Live Tables
Latest Reply
TinasheChinyati
New Contributor III
  • 1 kudos

Hi @MuthuLakshmi, thank you for sharing the configurations. Here is a bit more clarity on our current workflow.

DELETE and VACUUM Workflow
Our workflow involves the following:
1. DELETE Operation: We delete records matching a specific predicate to mark th...

2 More Replies
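The DELETE-then-VACUUM retention pattern discussed in this thread can be sketched in SQL along these lines (the table and column names are hypothetical; the table properties are the standard Delta retention settings):

```sql
-- Hypothetical silver-layer table with an event-time column.
DELETE FROM silver.events
WHERE event_ts < current_timestamp() - INTERVAL 14 DAYS;

-- Shorten the deleted-file retention so VACUUM can reclaim space sooner.
ALTER TABLE silver.events SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 14 days'
);

-- 336 hours = 14 days; files removed before that window become reclaimable.
VACUUM silver.events RETAIN 336 HOURS;
```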
sathya08
by New Contributor III
  • 993 Views
  • 9 replies
  • 4 kudos

Resolved! Trigger queries to SQL warehouse from Databricks notebook

Hello, I am trying to explore triggering SQL queries from a Databricks notebook to a serverless SQL warehouse, along with the nest-asyncio module. Both of the above are very new to me and I need help with them. For triggering the API from the notebook, I am using...

Latest Reply
sathya08
New Contributor III
  • 4 kudos

Thank you, it really helped.

8 More Replies
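For readers landing on this thread: submitting a statement to a SQL warehouse from a notebook boils down to one POST against the Statement Execution API (`/api/2.0/sql/statements`). A minimal stdlib-only sketch, assuming you supply your own workspace host, personal access token, and warehouse ID:

```python
import json
import urllib.request


def build_request(host, token, warehouse_id, sql, wait="30s"):
    """Build the POST request for the Statement Execution API."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": sql,
        "wait_timeout": wait,  # server-side wait before returning PENDING
    }
    return urllib.request.Request(
        f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_statement(req):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Because `build_request` is a pure function, it can also be wrapped with asyncio/nest-asyncio if you want several statements in flight at once.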
ashap551
by New Contributor II
  • 443 Views
  • 2 replies
  • 1 kudos

Best practices for code organization in large-scale Databricks ETL projects: Modular vs. Scripted

I’m curious about data engineering best practices for a large-scale project using Databricks to build a Lakehouse architecture (Bronze -> Silver -> Gold layers). I’m presently comparing two approaches to writing code to engineer the s...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ashap551, I would vote for the modular approach, which lets you reuse code and write unit tests in a simpler manner. Notebooks are, for me, only "clients" of these shared modules. You can take a look at the official documentation, where they're following a simila...

1 More Replies
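To make the modular point concrete, here is a toy illustration (pure Python, hypothetical function name): the transformation lives in a plain module that can be unit-tested in CI without a cluster, and notebooks merely import and call it.

```python
# transforms.py -- shared module; notebooks act only as "clients" of it.

def dedupe_latest(rows, key, ts):
    """Keep the most recent record per key.

    A pure-Python stand-in for logic you would normally express with a
    Spark window function; being a plain function, it is trivially
    unit-testable without any Databricks runtime.
    """
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return list(latest.values())
```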
aahil824
by New Contributor
  • 360 Views
  • 3 replies
  • 0 kudos

How to read zip folder that contains 4 .csv files

Hello Community, I have uploaded a zip archive to "dbfs:/FileStore/tables/bike_sharing.zip" and was trying to unzip it and read the 4 .csv files inside, but was unable to do it. Any help from your side will be greatly appreciated!

Latest Reply
SenthilRT
New Contributor III
  • 0 kudos

Hope this link helps. You can use a shell (%sh) cell within a notebook to unzip (assuming you have access to the path where you want to unzip the file): https://stackoverflow.com/questions/74196011/databricks-reading-from-a-zip-file

2 More Replies
brickster_2018
by Databricks Employee
  • 9251 Views
  • 2 replies
  • 0 kudos
Latest Reply
lchari
New Contributor II
  • 0 kudos

Is the limit per table/dataframe, or for all tables/dataframes put together? The driver collects the data from all executors (which hold the respective table or dataframe) and distributes it to all executors. When will the memory be released in bo...

1 More Replies
ChristianRRL
by Valued Contributor
  • 1011 Views
  • 4 replies
  • 4 kudos

Resolved! Spark SQL: USING JSON to create VIEWS/TABLES with existing schema file

Hi there, I'm trying to understand if there's an easy way to create VIEWS and TABLES (I'm interested in both) *WITH* a provided schema file. For example, I understand that via dataframes I can accomplish this via something like this: df = spark.read.sc...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @ChristianRRL, (A) You're not missing anything; there's no such option as of today in the SQL API. (B) It would be much better for you to just use PySpark, but if you have to stick to the SQL API you can use the following approach. Define your schema...

3 More Replies
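For anyone with the same question, the SQL-only workaround hinted at in the reply looks roughly like this (path and columns are hypothetical); the inline column list plays the role of the schema file:

```sql
CREATE OR REPLACE TEMPORARY VIEW events_v (
  event_id   BIGINT,
  event_type STRING,
  event_ts   TIMESTAMP
)
USING json
OPTIONS (path '/mnt/raw/events/');
```

The same column-list-plus-`USING` shape works for `CREATE TABLE`; the schema just cannot be loaded from an external file in pure SQL.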
SagarJi
by New Contributor II
  • 187 Views
  • 1 replies
  • 0 kudos

Is there any impact on data ingestion and data extract while REORG TABLE is in progress

While using Delta Lake for an eventing system, with repeated updates, merges, etc., we are using deletion vectors to improve performance. With that comes the "REORG TABLE" maintenance task. My question is: in an ingestion- and extract-heavy system, when we condu...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

It is advisable to schedule REORG TABLE operations during periods of low activity to minimize disruptions to both data ingestion and extraction processes. This can potentially affect ongoing data ingestion processes because the table's underlying file...

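For reference, the maintenance command in question (table name hypothetical) is typically run as:

```sql
-- Rewrites data files so rows soft-deleted via deletion vectors are
-- physically purged; schedule it in a low-traffic window as advised above.
REORG TABLE my_events APPLY (PURGE);
```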
Phani1
by Valued Contributor II
  • 337 Views
  • 1 replies
  • 0 kudos

Cluster idle time and usage details

How can we find out the usage details of the Databricks cluster? Specifically, we need to know how many nodes are in use, how long the cluster is idle, the time it takes to start up, and the jobs it is running along with their durations. Is there a q...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You can try the following query on your system tables: WITH cluster_usage AS ( SELECT u.usage_metadata.cluster_id, u.usage_date, u.usage_start_time, u.usage_end_time, DATEDIFF(second, u.usage_start_time, u.usage_end_time) AS du...

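As a simpler starting point than the full CTE above, per-cluster usage can be pulled from the billing system table along these lines (column names follow the documented `system.billing.usage` schema; adjust filters to your workspace):

```sql
SELECT
  usage_metadata.cluster_id,
  usage_date,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_metadata.cluster_id IS NOT NULL
GROUP BY usage_metadata.cluster_id, usage_date
ORDER BY usage_date DESC;
```

Idle time and startup duration are not in the billing table; those come from cluster event logs (e.g. the Clusters API events endpoint), so the two sources usually need to be joined.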
cristianc
by Contributor
  • 350 Views
  • 1 replies
  • 1 kudos

Resolved! Does Databricks support AWS S3 Express One Zone?

Greetings, I'm writing this message since I learned that AWS has a storage class that is faster than S3 Standard, called "S3 Express One Zone" (https://aws.amazon.com/s3/storage-classes/express-one-zone/). AWS offers support for this storage class with ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Right now there is no support for S3 Express One Zone, but this is already on our radar through idea DB-I-8058. It is currently tagged as "Considered for the future"; there is no ETA, but our teams are working to have this supported in the near future.

dyusuf
by New Contributor
  • 318 Views
  • 1 replies
  • 0 kudos

Unable to install kafka in community edition

Hi, I am trying to install Kafka in Databricks Community Edition after downloading it, using the below commands in a notebook: %sh cd kafka_2.12-3.8.1; ls -ltr; ./bin/zookeeper-server-start.sh config/zookeeper.properties. Below is the error log. Kindly help.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

It seems like the issue might be related to a port conflict, as indicated by the java.net.BindException: Address already in use error. You might want to check whether another instance of ZooKeeper, or another service, is using the same port.

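A quick way to confirm the diagnosis before restarting ZooKeeper (whose default client port is 2181) is a small socket probe; stdlib Python, runnable from a notebook cell on the same node:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port --
    the usual cause of 'java.net.BindException: Address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0
```

If `port_in_use(2181)` returns True, stop the conflicting process or point `clientPort` in zookeeper.properties at a free port.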
ChsAIkrishna
by New Contributor III
  • 448 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Workflow dbt-core job failure with Connection aborted

When we are using a dbt-core task on a Databricks workflow, roughly one job in every 100 workflow executions fails with the below reason; after a reboot it works well. What would be the permanent remediation? ('Connection aborted.', RemoteDisconnected('Remote end ...

Latest Reply
ChsAIkrishna
New Contributor III
  • 1 kudos

@Walter_C, kudos to you, thank you very much. We placed the "connect retries"; let's see. Ref: https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters

1 More Replies
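For others hitting the same intermittent RemoteDisconnected: the retry knobs live in the dbt profile. A hypothetical `profiles.yml` target (all names and paths are placeholders; `connect_retries` and `connect_timeout` are the parameters from the dbt-databricks docs linked above):

```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: adb-1234567890123456.7.azuredatabricks.net
      http_path: /sql/1.0/warehouses/abc123
      token: "{{ env_var('DBT_DATABRICKS_TOKEN') }}"
      schema: analytics
      connect_retries: 5    # retry transient connection failures
      connect_timeout: 60   # seconds to wait per connection attempt
```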
Sans
by New Contributor III
  • 3066 Views
  • 8 replies
  • 3 kudos

Unable to create new compute in community databricks

Hi Team, I am unable to create a compute in Databricks Community Edition due to the below error. Please advise. Bootstrap Timeout: Node daemon ping timeout in 780000 ms for instance i-0ab6798b2c762fb25 @ 10.172.246.217. Please check network connectivity between the ...

Latest Reply
Rakeshkikani
New Contributor II
  • 3 kudos

I am facing the same issue as well.

7 More Replies
sumitdesai
by New Contributor II
  • 5729 Views
  • 3 replies
  • 3 kudos

How to reuse a cluster with Databricks Asset bundles

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs. I cannot find an example of this; the examples I have found all specify individual...

Latest Reply
felix_
New Contributor II
  • 3 kudos

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" Tasks?

2 More Replies
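Within a single job, a shared cluster is expressed with `job_clusters` plus a `job_cluster_key` on each task; a job cluster cannot be shared across separate jobs. A sketch of the single-job case (resource names, node type, and paths are placeholders):

```yaml
# databricks.yml -- one job cluster reused by several tasks in the same job.
resources:
  jobs:
    etl:
      name: etl
      job_clusters:
        - job_cluster_key: shared
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 2
      tasks:
        - task_key: bronze
          job_cluster_key: shared
          notebook_task:
            notebook_path: ./bronze.py
        - task_key: silver
          depends_on:
            - task_key: bronze
          job_cluster_key: shared
          notebook_task:
            notebook_path: ./silver.py
```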
bhanuteja_1
by New Contributor II
  • 318 Views
  • 2 replies
  • 0 kudos

'Illegal group reference'

'Illegal group reference'. The complete error is: eventmessage: {"message":"COSMOS Publish Start","instrumentationkey":"[REDACTED]","type":"Event","properties":{"ServiceOffering":"Commercial Sales and Marketing","ServiceLine":"Recommendations","Service...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

This log does not show any errors or issues related to "Illegal group reference". It appears to be a normal operational log for starting a data publish process. Are you facing any specific error message?

1 More Replies
