Data Engineering

Forum Posts

by Phani1 • Valued Contributor II

06-05-2023 10:38:32 PM

3727 Views
2 replies
0 kudos

Convert EBCDIC (Binary) file format to ASCII

Hi Team, how can we convert an EBCDIC (binary) file to ASCII in Databricks? Are there any libraries for this in Databricks?

Latest Reply

Radush
New Contributor II

11-18-2024 5:51:19 AM

0 kudos

There is a library for this: https://github.com/AbsaOSS/cobrix
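Cobrix handles full COBOL copybook layouts. For the simpler case where a file contains only fixed-length character fields, Python's built-in EBCDIC codecs can do the conversion on their own. The sketch below assumes code page 037 (US/Canada EBCDIC) and a known record length; both are assumptions, not details from the thread, and `ebcdic_records_to_text` and `convert_file` are hypothetical helper names. Packed-decimal (COMP-3) and binary fields cannot be decoded this way, which is where a copybook-driven reader such as Cobrix is needed.

```python
from typing import List

# Assumption: code page 037 (US/Canada EBCDIC). Python also ships other EBCDIC
# codecs such as "cp500" and "cp1140"; pick the one that matches the source system.
DEFAULT_CODEPAGE = "cp037"


def ebcdic_records_to_text(data: bytes, record_length: int,
                           codepage: str = DEFAULT_CODEPAGE) -> List[str]:
    """Split raw EBCDIC bytes into fixed-length records and decode each to text.

    Only valid for character (PIC X) fields; packed-decimal (COMP-3) or binary
    fields would be turned into garbage by a plain codec decode.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError("data length is not a multiple of record_length")
    return [
        data[i:i + record_length].decode(codepage).rstrip()  # strip trailing space padding
        for i in range(0, len(data), record_length)
    ]


def convert_file(src_path: str, dst_path: str, record_length: int) -> None:
    """Hypothetical helper: read an EBCDIC file and write one ASCII line per record."""
    with open(src_path, "rb") as src:
        records = ebcdic_records_to_text(src.read(), record_length)
    # Characters with no ASCII equivalent are written as "?".
    with open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        dst.write("\n".join(records) + "\n")
```

For example, `ebcdic_records_to_text("HELLO".ljust(8).encode("cp037"), 8)` returns `["HELLO"]`.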

by TamD • Contributor

11-17-2024 4:48:45 PM

660 Views
1 replies
1 kudos

TIME data type

Our business does a LOT of reporting and analysis by time of day and clock time, independent of day or date. As far as I can see, Databricks does not support a TIME data type. If I attempt to import data recorded as a time (e.g., 02:59:59.000)...

Latest Reply

szymon_dybczak
Esteemed Contributor III

11-18-2024 2:02:09 AM

1 kudos

Hi @TamD, basically, it's just as you've written. There is no TIME data type, so you have two options, which you already mentioned:
- use the TIMESTAMP data type and ignore its date part
- store it as a string and convert it each time you need it
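For the second option, converting stored strings back to clock times when needed could look like the minimal Python sketch below. It assumes values in the HH:MM:SS.fff format from the question; `parse_time_of_day`, `seconds_since_midnight`, and `hour_bucket` are hypothetical helper names. For the first option, Spark SQL's `to_timestamp(col, 'HH:mm:ss.SSS')` yields a TIMESTAMP whose date part can then be ignored.

```python
from datetime import datetime, time

# Assumption: times are stored as strings like "02:59:59.000" (HH:MM:SS.fff).
TIME_FORMAT = "%H:%M:%S.%f"


def parse_time_of_day(value: str) -> time:
    """Convert a stored time-of-day string into a datetime.time (no date part)."""
    return datetime.strptime(value, TIME_FORMAT).time()


def seconds_since_midnight(value: str) -> float:
    """Numeric form that sorts and buckets easily for time-of-day reporting."""
    t = parse_time_of_day(value)
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1_000_000


def hour_bucket(value: str) -> int:
    """Hour of day (0-23), a typical grouping key for clock-time analysis."""
    return parse_time_of_day(value).hour
```

For example, `seconds_since_midnight("02:59:59.000")` returns `10799.0`.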

by Phani1 • Valued Contributor II

04-17-2024 10:31:16 PM

1627 Views
2 replies
0 kudos

Code Review tools

Could you recommend any code review tools that would suit our Databricks tech stack?

code review

Latest Reply

Phani1
Valued Contributor II

11-18-2024 1:23:22 AM

0 kudos

You could explore SonarQube.

by TinasheChinyati • New Contributor III

11-14-2024 12:50:40 AM

1272 Views
3 replies
1 kudos

Resolved! Retention window for DLT-created Delta tables

Hi guys, I am working with data ingested from Azure Event Hubs using Delta Live Tables in Databricks. Our data architecture follows the medallion approach. Our current requirement is to retain only the most recent 14 days of data in the silver layer. To...

data engineer

Delta Live Tables

Latest Reply

TinasheChinyati
New Contributor III

11-17-2024 10:57:47 PM

1 kudos

Hi @MuthuLakshmi, thank you for sharing the configurations. Here is a bit more detail on our current workflow.

DELETE and VACUUM workflow

Our workflow involves the following:

1. DELETE operation: We delete records matching a specific predicate to mark th...
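A DELETE-then-VACUUM pass for a rolling 14-day window can be sketched as below. This is one possible approach, assuming the table has a timestamp column to filter on; the table name `silver.events`, the column `event_time`, and the helper `retention_statements` are hypothetical. The helper only builds the SQL strings, so each statement would still need to be run, for example with `spark.sql(stmt)` in a notebook.

```python
from typing import List

# By default Delta blocks VACUUM with a retention shorter than 7 days (168 hours)
# unless the safety check spark.databricks.delta.retentionDurationCheck.enabled is off.
MIN_VACUUM_HOURS = 168


def retention_statements(table: str, ts_column: str, keep_days: int,
                         vacuum_hours: int = MIN_VACUUM_HOURS) -> List[str]:
    """Build the SQL for one rolling-retention pass: DELETE old rows, then VACUUM.

    DELETE does not physically remove data right away; old data files stay in
    storage for time travel until VACUUM removes unreferenced files older than
    vacuum_hours.
    """
    if keep_days <= 0:
        raise ValueError("keep_days must be positive")
    if vacuum_hours < MIN_VACUUM_HOURS:
        raise ValueError("vacuum_hours below 168 is blocked by Delta's default safety check")
    delete_sql = (
        f"DELETE FROM {table} "
        f"WHERE {ts_column} < current_timestamp() - INTERVAL {keep_days} DAYS"
    )
    vacuum_sql = f"VACUUM {table} RETAIN {vacuum_hours} HOURS"
    return [delete_sql, vacuum_sql]
```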

by sathya08 • New Contributor III

11-10-2024 9:14:59 PM

1793 Views
9 replies
4 kudos

Resolved! Trigger queries to SQL warehouse from Databricks notebook

Hello, I am exploring how to trigger SQL queries from a Databricks notebook on a serverless SQL warehouse, together with the nest-asyncio module. Both are very new to me, and I need help with them. For triggering the API from the notebook, I am using...
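One way to trigger queries on a SQL warehouse from a notebook is the Databricks SQL Statement Execution API (`POST /api/2.0/sql/statements`). The sketch below uses only the Python standard library and polls until the statement leaves the PENDING/RUNNING states; the host, token, and warehouse ID are placeholders you would supply, and the helper names are hypothetical. It is not necessarily the approach used in the thread.

```python
import json
import time
import urllib.request

# Placeholders (assumptions): workspace host without "https://", a personal access
# token, and the ID of the serverless SQL warehouse.
API_PATH = "/api/2.0/sql/statements"


def build_statement_request(host: str, token: str, warehouse_id: str,
                            statement: str, wait_timeout: str = "30s") -> urllib.request.Request:
    """Build the POST request that submits one SQL statement to a warehouse."""
    body = {"warehouse_id": warehouse_id, "statement": statement, "wait_timeout": wait_timeout}
    return urllib.request.Request(
        f"https://{host}{API_PATH}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def _get_json(req: urllib.request.Request) -> dict:
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def execute_statement(host: str, token: str, warehouse_id: str, statement: str,
                      poll_seconds: float = 2.0, max_polls: int = 60) -> dict:
    """Submit a statement, then poll until it is no longer PENDING or RUNNING."""
    result = _get_json(build_statement_request(host, token, warehouse_id, statement))
    polls = 0
    # By default, a query that outlives wait_timeout keeps running after the POST
    # returns; its status is then fetched with GET .../statements/{statement_id}.
    while result["status"]["state"] in ("PENDING", "RUNNING") and polls < max_polls:
        time.sleep(poll_seconds)
        status_req = urllib.request.Request(
            f"https://{host}{API_PATH}/{result['statement_id']}",
            headers={"Authorization": f"Bearer {token}"},
            method="GET",
        )
        result = _get_json(status_req)
        polls += 1
    return result
```

To fire several queries concurrently, wrapping `execute_statement` in a `concurrent.futures.ThreadPoolExecutor` is a simpler alternative to an asyncio event loop patched with nest-asyncio.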

Latest Reply

sathya08
New Contributor III

11-17-2024 4:43:47 PM

4 kudos

Thank you, it really helped.

by ashap551 • New Contributor II

11-17-2024 2:01:07 AM

1151 Views
2 replies
1 kudos

Best practices for code organization in large-scale Databricks ETL projects: Modular vs. Scripted

I’m curious about best practices for a large-scale data engineering project that uses Databricks to build a Lakehouse architecture (Bronze -> Silver -> Gold layers). I’m currently comparing two approaches to writing code to engineer the s...

Latest Reply

szymon_dybczak
Esteemed Contributor III

11-17-2024 9:53:56 AM

1 kudos

Hi @ashap551, I would vote for the modular approach, which lets you reuse code and write unit tests more simply. For me, notebooks are only "clients" of these shared modules. You can take a look at the official documentation, where they're following simila...
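A minimal sketch of that split, with hypothetical names: the transformation lives in a plain module as a pure function, a unit test exercises it without a cluster, and a notebook only imports and calls it. Plain Python lists of dicts stand in for DataFrames here so the example runs anywhere; with PySpark the same pattern applies to functions that take and return a DataFrame.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


# --- transformations.py (hypothetical shared module) ---
def to_silver(record: Record) -> Optional[Record]:
    """Bronze -> Silver rule: drop rows without an id and normalise the email."""
    if record.get("id") is None:
        return None
    email = str(record.get("email") or "").strip().lower()
    return {"id": record["id"], "email": email or None}


def bronze_to_silver(records: List[Record]) -> List[Record]:
    """Apply to_silver to every record and keep only the valid ones."""
    return [row for row in map(to_silver, records) if row is not None]


# --- test_transformations.py (pytest-style unit test, no cluster needed) ---
def test_bronze_to_silver() -> None:
    rows = [{"id": 1, "email": " A@Example.COM "}, {"id": None, "email": "x@y.z"}]
    assert bronze_to_silver(rows) == [{"id": 1, "email": "a@example.com"}]


# --- notebook cell: the notebook is only a "client" of the module ---
# from transformations import bronze_to_silver
# silver_rows = bronze_to_silver(bronze_rows)
```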

by ArulDavid • New Contributor III

06-29-2023 1:07:48 PM

2359 Views
2 replies
2 kudos

Resolved! Any SAS accelerator tools to convert to Spark?

Are there any SAS accelerator tools for converting to Spark?

Latest Reply

protmaks
New Contributor II

11-17-2024 1:27:04 AM

2 kudos

You can try Alchemist https://www.getalchemist.io/

by aahil824 • New Contributor

11-16-2024 2:10:57 AM

490 Views
3 replies
0 kudos

How to read a zip file that contains 4 .csv files

Hello Community, I have uploaded a zip file, "dbfs:/FileStore/tables/bike_sharing.zip". I tried to unzip it and read the 4 .csv files, but I was unable to do it. Any help would be greatly appreciated!

Latest Reply

SenthilRT
New Contributor III

11-16-2024 3:22:27 PM

0 kudos

I hope this link helps. You can use a cell command within a notebook to unzip the file (assuming you have access to the path where you want to unzip it): https://stackoverflow.com/questions/74196011/databricks-reading-from-a-zip-file
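A Python alternative to a shell cell, as a sketch: extract only the CSV members with the standard `zipfile` module, then point Spark at the extracted folder. It assumes the DBFS FUSE mount (`/dbfs/...`) is available on your cluster, which may not be the case on every workspace type; `extract_csvs` and the output folder name are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_csvs(zip_path: str, dest_dir: str) -> List[str]:
    """Extract only the .csv members of zip_path into dest_dir and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Skip macOS metadata entries that some zip tools add.
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX/"):
                # ZipFile.extract keeps the member's folder structure under dest.
                extracted.append(zf.extract(name, path=dest))
    return sorted(extracted)


# Example on Databricks (assumes the /dbfs FUSE mount is available):
# files = extract_csvs("/dbfs/FileStore/tables/bike_sharing.zip",
#                      "/dbfs/FileStore/tables/bike_sharing")
# df = spark.read.option("header", True).csv("dbfs:/FileStore/tables/bike_sharing")
```

If the archive stores the CSVs in a subfolder, adding `.option("recursiveFileLookup", True)` to the read lets Spark pick them up.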

by brickster_2018 • Databricks Employee

06-25-2021 2:51:19 PM

11055 Views
2 replies
0 kudos

Resolved! What is the maximum amount of data that can be broadcast using a broadcast join?

Latest Reply

lchari
New Contributor II

11-16-2024 6:01:19 AM

0 kudos

Is the limit per table/DataFrame, or for all tables/DataFrames put together? The driver collects the data from all the executors that hold the respective table or DataFrame and distributes it to all executors. When will the memory be released in bo...

by ChristianRRL • Valued Contributor

11-15-2024 9:50:16 AM

2813 Views
4 replies
4 kudos

Resolved! Spark SQL: USING JSON to create VIEWS/TABLES with existing schema file

Hi there, I'm trying to understand whether there's an easy way to create VIEWS and TABLES (I'm interested in both) *WITH* a provided schema file. For example, I understand that with DataFrames I can accomplish this with something like: df = spark.read.sc...

Latest Reply

szymon_dybczak
Esteemed Contributor III

11-15-2024 10:49:04 AM

4 kudos

Hi @ChristianRRL, (A) You're not missing anything; as of today, there's no such option in the SQL API. (B) It would be much better for you to just use PySpark, but if you have to stick to the SQL API, you can use the following approach. Define your schema...

by SagarJi • New Contributor II

11-14-2024 11:36:05 PM

347 Views
1 replies
0 kudos

Is there any impact on data ingestion and data extract while REORG TABLE is in progress

We are using Delta Lake for an eventing system with repeated updates, merges, etc., and we use deletion vectors to improve performance. With that comes the "REORG TABLE" maintenance task. My question is: in an ingestion- and extract-heavy system, when we condu...

Latest Reply

Walter_C
Databricks Employee

11-15-2024 8:11:23 AM

0 kudos

It is advisable to schedule REORG TABLE operations during periods of low activity to minimize disruption to both data ingestion and extraction processes. This can potentially affect ongoing data ingestion processes because the table's underlying file...

by cristianc • Contributor

11-15-2024 12:28:23 AM

594 Views
1 replies
1 kudos

Resolved! Does Databricks support AWS S3 Express One Zone?

Greetings, I'm writing because I learned that AWS has a storage class called "S3 Express One Zone" that is faster than S3 Standard (https://aws.amazon.com/s3/storage-classes/express-one-zone/). AWS offers support for this storage class with ...

Latest Reply

Walter_C
Databricks Employee

11-15-2024 7:39:02 AM

1 kudos

Right now there is no support for S3 Express One Zone, but this is already on our radar through idea DB-I-8058. It is currently tagged as "Considered for the future". There is no ETA, but our teams are working to have this supported in the near future.

by dyusuf • New Contributor II

11-15-2024 5:13:21 AM

454 Views
1 replies
0 kudos

Unable to install Kafka in Community Edition

Hi, I am trying to install Kafka in Databricks Community Edition after downloading it. I am using the commands below in a notebook:

%sh
cd kafka_2.12-3.8.1/
ls -ltr
./bin/zookeeper-server-start.sh config/zookeeper.properties

Below is the error log. Kindly help.

Latest Reply

Walter_C
Databricks Employee

11-15-2024 7:33:13 AM

0 kudos

It seems the issue might be related to a port conflict, as indicated by the java.net.BindException: Address already in use error. You might want to check whether another instance of ZooKeeper or another service is using the same port.
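A quick way to test that from a notebook is to try connecting to the port ZooKeeper is configured to use (`clientPort` in `config/zookeeper.properties`, 2181 in the file shipped with Kafka). The sketch below is a minimal check; `port_in_use` is a hypothetical helper, and a successful connection means something is already listening on that port.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0


# Example: port_in_use(2181) before starting ZooKeeper; True points to a conflict.
```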

by ChsAIkrishna • Contributor

11-14-2024 3:26:02 AM

773 Views
2 replies
1 kudos

Resolved! Databricks Workflow dbt-core job failure with Connection aborted

When we use a dbt-core task in a Databricks workflow, roughly one in every 100 workflow executions fails with the error below; after a reboot it works well. What would be the permanent remediation? ('Connection aborted.', RemoteDisconnected('Remote end ...

Latest Reply

ChsAIkrishna
Contributor

11-15-2024 5:39:20 AM

1 kudos

@Walter_C Kudos to you, and thank you very much! We have set "connect retries"; let's see. Ref: https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters
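The "connect retries" setting handles this inside the dbt Databricks adapter (see the linked setup page). For other client code that hits the same transient `RemoteDisconnected` error, the idea can be sketched as a small retry-with-backoff wrapper. This is a generic pattern, not how dbt implements its retries, and `with_retries` is a hypothetical name.

```python
import time
from http.client import RemoteDisconnected
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (RemoteDisconnected, ConnectionError)) -> T:
    """Call fn(); on a transient connection error, back off and try again.

    RemoteDisconnected is listed for readability; it also subclasses
    ConnectionResetError, so ConnectionError alone would catch it.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```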

by sumitdesai • New Contributor II

03-01-2024 6:06:00 AM

8068 Views
3 replies
3 kudos

How to reuse a cluster with Databricks Asset Bundles

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs. I cannot find an example of this. All the examples I found specify individual...

Latest Reply

felix_
New Contributor II

11-14-2024 1:51:53 AM

3 kudos

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" tasks?
