Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Valued Contributor II
  • 3727 Views
  • 2 replies
  • 0 kudos

Convert EBCDIC (Binary) file format to ASCII

Hi Team, how can we convert an EBCDIC (binary) file format to ASCII in Databricks? Are there any libraries in Databricks for this?

Latest Reply
Radush
New Contributor II

There is a library for it: https://github.com/AbsaOSS/cobrix
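
As a rough illustration, a minimal PySpark sketch of the Cobrix route (assuming the za.co.absa.cobrix:spark-cobol Maven package is installed on the cluster; the copybook and data paths below are placeholders, not from the thread):

# Minimal sketch; paths are assumptions. Runs in a Databricks notebook where
# `spark` is already defined.
df = (
    spark.read.format("cobol")                    # Cobrix's Spark data source
    .option("copybook", "/path/to/copybook.cpy")  # COBOL copybook describing the record layout
    .load("/path/to/ebcdic_data")                 # EBCDIC (binary) input files
)

# Once loaded, the rows are ordinary Spark data; writing CSV yields ASCII output.
df.write.option("header", True).csv("/path/to/ascii_output")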

1 More Replies
TamD
by Contributor
  • 660 Views
  • 1 reply
  • 1 kudos

TIME data type

Our business does a LOT of reporting and analysis by time-of-day and clock times, independent of day or date. Databricks does not seem to support the TIME data type, as far as I can see. If I attempt to import data recorded as a time (e.g., 02:59:59.000)...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @TamD, basically it's just as you've written. There is no TIME data type, so you have the 2 options you already mentioned:
  • use the TIMESTAMP data type and ignore its date part
  • store it as a string and do the conversion each time you need it
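
A minimal sketch of both options, using the example value from the post (the column name clock_time is assumed):

from pyspark.sql import functions as F

# Minimal sketch; 'clock_time' is an assumed column name.
df = spark.createDataFrame([("02:59:59.000",)], ["clock_time"])

# Option 1: cast to TIMESTAMP and ignore the date part (it defaults to 1970-01-01).
opt1 = df.withColumn("ts", F.to_timestamp("clock_time", "HH:mm:ss.SSS"))

# Option 2: keep the string and convert only when needed,
# e.g. to seconds since midnight for time-of-day analysis.
ts = F.to_timestamp("clock_time", "HH:mm:ss.SSS")
opt2 = df.withColumn(
    "seconds_of_day",
    F.hour(ts) * 3600 + F.minute(ts) * 60 + F.second(ts),
)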

Phani1
by Valued Contributor II
  • 1627 Views
  • 2 replies
  • 0 kudos

Code Review tools

Could you kindly recommend any Code Review tools that would be suitable for our Databricks tech stack?

Data Engineering
code review
Latest Reply
Phani1
Valued Contributor II

You can explore SonarQube.

1 More Replies
TinasheChinyati
by New Contributor III
  • 1272 Views
  • 3 replies
  • 1 kudos

Resolved! Retention window for Delta tables created by DLT

Hi guys, I am working with data ingested from Azure EventHub using Delta Live Tables in Databricks. Our data architecture follows the medallion approach. Our current requirement is to retain only the most recent 14 days of data in the silver layer. To...

Data Engineering
data engineer
Delta Live Tables
Latest Reply
TinasheChinyati
New Contributor III

Hi @MuthuLakshmi, thank you for sharing the configurations. Here is a bit more clarity on our current DELETE and VACUUM workflow. Our workflow involves the following: 1. DELETE operation: we delete records matching a specific predicate to mark th...
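
For context, a minimal sketch of the DELETE-plus-VACUUM pattern described in this thread (the table name silver.events and all retention values are assumptions, not from the post):

# Minimal sketch; 'silver.events', 'event_date', and retention values are assumptions.
# 1. Delete rows older than the 14-day window.
spark.sql("""
    DELETE FROM silver.events
    WHERE event_date < current_date() - INTERVAL 14 DAYS
""")

# 2. Tighten how long deleted files are kept, then vacuum them away.
spark.sql("""
    ALTER TABLE silver.events SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 7 days'
    )
""")
spark.sql("VACUUM silver.events RETAIN 168 HOURS")  # 168 hours = 7 days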

2 More Replies
sathya08
by New Contributor III
  • 1793 Views
  • 9 replies
  • 4 kudos

Resolved! Trigger queries to SQL warehouse from Databricks notebook

Hello, I am trying to explore triggering SQL queries from a Databricks notebook to a serverless SQL warehouse, along with the nest-asyncio module. Both of the above are very new to me and I need help with them. For triggering the API from the notebook, I am using...
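
One common way to trigger a query on a SQL warehouse from a notebook is the SQL Statement Execution API; a minimal synchronous sketch (host, token, and warehouse ID are placeholders, and the post's async/nest-asyncio wrapping is omitted):

import requests

# Minimal sketch; HOST, TOKEN, and WAREHOUSE_ID are placeholders.
HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"
WAREHOUSE_ID = "<warehouse-id>"

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": WAREHOUSE_ID,
        "statement": "SELECT 1 AS x",
        "wait_timeout": "30s",  # block up to 30s for the result
    },
)
resp.raise_for_status()
print(resp.json()["status"]["state"])  # e.g. SUCCEEDED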

Latest Reply
sathya08
New Contributor III

Thank you, it really helped.

8 More Replies
ashap551
by New Contributor II
  • 1151 Views
  • 2 replies
  • 1 kudos

Best practices for code organization in large-scale Databricks ETL projects: Modular vs. Scripted

I’m curious about data engineering best practices for a large-scale data engineering project using Databricks to build a Lakehouse architecture (Bronze -> Silver -> Gold layers). I’m presently comparing two approaches of writing code to engineer the s...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @ashap551, I would vote for the modular approach, which lets you reuse code and write unit tests more easily. For me, notebooks are only "clients" of these shared modules. You can take a look at the official documentation, where they follow a simila...
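
A minimal sketch of that layout, with illustrative names: shared logic lives in a plain Python module, and the notebook is only a thin client of it:

# Shared module, e.g. src/etl/transforms.py (all names are illustrative).
from pyspark.sql import DataFrame, functions as F

def clean_silver(df: DataFrame) -> DataFrame:
    # Deduplicate and stamp ingestion time for the silver layer.
    return df.dropDuplicates(["id"]).withColumn("ingested_at", F.current_timestamp())

# Notebook cell (the "client"):
#   from etl.transforms import clean_silver
#   silver_df = clean_silver(spark.read.table("bronze.events"))
#
# Unit test (e.g. pytest with a local SparkSession, no notebook needed):
#   assert "ingested_at" in clean_silver(sample_df).columns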

1 More Replies
aahil824
by New Contributor
  • 490 Views
  • 3 replies
  • 0 kudos

How to read a zip folder that contains 4 .csv files

Hello Community, I have uploaded one zip folder, "dbfs:/FileStore/tables/bike_sharing.zip". I was trying to unzip the folder and read the 4 .csv files, but was unable to do it. Any help from your side would be greatly appreciated!

Latest Reply
SenthilRT
New Contributor III

Hope this link helps. You can use a shell command within a notebook to unzip (assuming you have access to the path where you want to unzip the file): https://stackoverflow.com/questions/74196011/databricks-reading-from-a-zip-file
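
A minimal sketch of the unzip-and-read flow (the DBFS path comes from the post; the local /tmp paths and output folder are assumptions):

import zipfile

# Copy the archive from DBFS to the driver's local disk, then extract it.
dbutils.fs.cp("dbfs:/FileStore/tables/bike_sharing.zip", "file:/tmp/bike_sharing.zip")
with zipfile.ZipFile("/tmp/bike_sharing.zip") as z:
    z.extractall("/tmp/bike_sharing")

# Copy the extracted CSVs back to DBFS so every executor can read them.
dbutils.fs.cp("file:/tmp/bike_sharing", "dbfs:/FileStore/tables/bike_sharing", recurse=True)

df = spark.read.option("header", True).csv("dbfs:/FileStore/tables/bike_sharing/*.csv")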

2 More Replies
brickster_2018
by Databricks Employee
  • 11055 Views
  • 2 replies
  • 0 kudos
Latest Reply
lchari
New Contributor II

Is the limit per table/dataframe, or for all tables/dataframes put together? The driver collects the data from all executors (which hold the respective table or dataframe) and distributes it to all executors. When will the memory be released in bo...

1 More Replies
ChristianRRL
by Valued Contributor
  • 2813 Views
  • 4 replies
  • 4 kudos

Resolved! Spark SQL: USING JSON to create VIEWS/TABLES with existing schema file

Hi there, I'm trying to understand if there's an easy way to create VIEWS and TABLES (I'm interested in both) *WITH* a provided schema file. For example, I understand that via dataframes I can accomplish this with something like: df = spark.read.sc...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @ChristianRRL, (A) You're not missing anything; there's no such option as of today for the SQL API. (B) It would be much better for you to just use PySpark, but if you have to stick to the SQL API, you can use the following approach. Define your schema...
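
A minimal sketch of the SQL-only route hinted at here: spell the schema out in the DDL and point the table at the JSON files (the table name, columns, and path are assumptions):

# Minimal sketch; table name, columns, and location are assumptions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_schema.events (
        id BIGINT,
        name STRING,
        ts TIMESTAMP
    )
    USING JSON
    LOCATION '/path/to/json/files'
""")

# A view can then be layered on top; views inherit their schema from the query.
spark.sql("""
    CREATE OR REPLACE VIEW my_schema.events_v AS
    SELECT id, name, ts FROM my_schema.events
""")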

3 More Replies
SagarJi
by New Contributor II
  • 347 Views
  • 1 reply
  • 0 kudos

Is there any impact on data ingestion and data extraction while REORG TABLE is in progress?

While using Delta Lake for an eventing system, with repeated updates, merges, etc., we are using deletion vectors to improve performance. With that comes the "REORG TABLE" maintenance task. My question is: in an ingestion- and extraction-heavy system, when we condu...

Latest Reply
Walter_C
Databricks Employee

It is advisable to schedule REORG TABLE operations during periods of low activity to minimize disruptions to both data ingestion and extraction processes. This can potentially affect ongoing data ingestion processes because the table's underlying file...
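
For reference, a minimal sketch of running that maintenance in a quiet window (the table name is an assumption):

# Minimal sketch; 'my_schema.events' is an assumed table name.
# Rewrite files to purge the soft-deleted rows recorded in deletion vectors...
spark.sql("REORG TABLE my_schema.events APPLY (PURGE)")
# ...then vacuum to remove the files left behind by the rewrite.
spark.sql("VACUUM my_schema.events")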

cristianc
by Contributor
  • 594 Views
  • 1 reply
  • 1 kudos

Resolved! Does Databricks support AWS S3 Express One Zone?

Greetings, I'm writing this message since I learned that AWS has a storage class that is faster than S3 Standard, called "S3 Express One Zone" (https://aws.amazon.com/s3/storage-classes/express-one-zone/). AWS offers support for this storage class with ...

Latest Reply
Walter_C
Databricks Employee

Right now there is no support for S3 Express One Zone, but this is already on our radar through idea DB-I-8058. It is currently tagged as "Considered for the future"; there is no ETA, but our teams are working to have this supported in the near future.

dyusuf
by New Contributor II
  • 454 Views
  • 1 reply
  • 0 kudos

Unable to install Kafka in Community Edition

Hi, I am trying to install Kafka in Databricks Community Edition after downloading it, using the below commands in a notebook:
%sh
cd kafka_2.12-3.8.1/
ls -ltr
./bin/zookeeper-server-start.sh config/zookeeper.properties
Below is the error log. Kindly help.

Latest Reply
Walter_C
Databricks Employee

It seems like the issue might be related to a port conflict, as indicated by the java.net.BindException: Address already in use error. You might want to check whether another instance of ZooKeeper, or another service, is using the same port.
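
A quick way to confirm the conflict from a notebook, as a minimal sketch (2181 is ZooKeeper's default client port):

import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when something is already listening on the port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

print("2181 in use:", port_in_use(2181))  # ZooKeeper's default client port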

ChsAIkrishna
by Contributor
  • 773 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Workflow dbt-core job failure with Connection aborted

When we use the dbt-core task in a Databricks Workflow, roughly one job in every 100 workflow executions fails with the reason below; after a reboot it works well. What would be the permanent remediation? ('Connection aborted.', RemoteDisconnected('Remote end ...

Latest Reply
ChsAIkrishna
Contributor

@Walter_C, kudos to you, thank you very much. We have set "connect retries"; let's see. Ref: https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters
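
A minimal profiles.yml sketch of those retry settings, using the connect_retries / connect_timeout parameters from the dbt-databricks docs linked above (the profile name, host, path, and token are placeholders):

# Minimal sketch; all values below are placeholders.
my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: <workspace-host>
      http_path: <warehouse-http-path>
      token: <personal-access-token>
      schema: analytics
      connect_retries: 3    # retry transient disconnects a few times
      connect_timeout: 60   # seconds to wait per connection attempt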

1 More Replies
sumitdesai
by New Contributor II
  • 8068 Views
  • 3 replies
  • 3 kudos

How to reuse a cluster with Databricks Asset bundles

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs. I cannot find an example of this; the examples I have found all specify individual...
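
Worth noting: a job cluster can be shared by multiple tasks within one job via job_cluster_key, but not across separate jobs (those would need an all-purpose cluster referenced by existing_cluster_id). A minimal bundle sketch with illustrative names:

# Minimal sketch; job, task, notebook, and cluster values are illustrative.
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: shared_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 2
      tasks:
        - task_key: task_a
          job_cluster_key: shared_cluster   # both tasks reuse the same cluster
          notebook_task:
            notebook_path: ./src/task_a.py
        - task_key: task_b
          depends_on:
            - task_key: task_a
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./src/task_b.py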

Latest Reply
felix_
New Contributor II

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" Tasks?

2 More Replies
