Get Started Discussions

by Phani1 • Valued Contributor II

02-04-2025 7:36:02 AM

418 Views
1 reply
0 kudos

EMR cluster pyspark scripts to databricks

Hi All, the PySpark scripts currently running on the EMR cluster need to be migrated to Databricks. Are there any tools available that can help minimize the time required for code conversion? Your suggestions would be appreciated. Regards, Phan...

Latest Reply

Alberto_Umana
Databricks Employee

02-04-2025 7:26:15 PM

0 kudos

Hello @Phani1, This guide can help you: https://www.databricks.com/resources/guide/emr-databricks-migration-guide

by Phani1 • Valued Contributor II

02-04-2025 7:40:18 AM

659 Views
1 reply
0 kudos

Airflow jobs migration to Databricks Workflows

Hi All, we need to move our Airflow jobs over to Databricks Workflows. Are there any tools that can help with this migration and make the process quicker? If you have any sample code or documents that could assist, I would really appreciate ...

Latest Reply

Alberto_Umana
Databricks Employee

02-04-2025 4:56:37 PM

0 kudos

Hi @Phani1, Please see this post which can help you: https://community.databricks.com/t5/data-engineering/migrating-logic-from-airflow-dags-to-databricks-workflow/td-p/104501

by tarunnagpal • New Contributor III

01-31-2025 4:47:40 AM

1268 Views
2 replies
3 kudos

Snowflake to Databricks migration

We are working on a proposal for our existing customer to migrate approximately 500 tables and the associated business logic from Snowflake to Databricks. The business logic is currently implemented using stored procedures, which need to be converted...

Latest Reply

sunnydata
New Contributor II

02-04-2025 5:10:44 AM

3 kudos

Hi @tarunnagpal! Adding to what @MariuszK said: using an LLM to accelerate the translation process is a great approach, but if the code is proprietary, it's best to use a closed model. Implementing a validation process is crucial to ensure that the tr...

1 More Replies

by rgower • New Contributor III

02-03-2025 8:12:32 AM

1551 Views
4 replies
1 kudos

Different JSON Results when Running a Job vs Running a Notebook

I have a regularly scheduled job that runs a PySpark Notebook that GETs semi-structured JSON data from an external API, loads that data into dataframes, and saves those dataframes to delta tables in Databricks. I have the schema for the JSON defined ...

Latest Reply

rgower
New Contributor III

02-03-2025 9:10:48 AM

1 kudos

@Alberto_Umana Sounds good, thank you for looking into it and let me know if there's any additional information I can provide in the meantime!

3 More Replies

by BS_THE_ANALYST • Esteemed Contributor II

01-30-2025 7:20:17 AM

2693 Views
4 replies
9 kudos

Zero to Hero - Databricks

Hi all! In a nutshell, I want to go from zero to hero with Databricks. I'd like to pursue the Databricks Data Engineering pathway; I think that makes sense as I have a background with Alteryx. I'd really like to get hands-on whilst learning. Are the le...

Latest Reply

BS_THE_ANALYST
Esteemed Contributor II

02-03-2025 6:25:33 AM

9 kudos

@MariuszK thanks for the link to your Medium article. There's some great stuff in there! Good point about the 30-day Azure free trial for Databricks.

3 More Replies

by 611124 • New Contributor II

01-22-2025 7:47:07 AM

997 Views
4 replies
0 kudos

dbt error: Data too long for column at row 1

Hi there! We are experiencing a Databricks error we don’t recognise when running one of our event-based dbt models in dbt Core (version 1.6.18). The dbt model uses the ‘insert_by_period’ materialisation that is still experimental for version 1....

Latest Reply

611124
New Contributor II

02-03-2025 1:42:22 AM

0 kudos

We are yet to upgrade dbt core to the latest version but will check again once we have done so.

3 More Replies

by Mantsama4 • Valued Contributor

01-30-2025 10:11:09 PM

2179 Views
4 replies
2 kudos

Resolved! Unity Catalog Migration: External AWS S3 Location Tables vs. Managed Tables in Databricks!

Hey Databricks enthusiasts! Migrating to Unity Catalog? Understanding the difference between External S3 Location Tables and Managed Tables is crucial for optimizing governance, security, and cost efficiency. External S3 Location Tables: Data remains in ...

Latest Reply

Isi
Honored Contributor III

02-02-2025 2:23:58 PM

2 kudos

Hey! I hope I’m not too late, and I’d like to share my opinion. While it’s true that managed tables offer certain advantages over external tables, you should keep in mind that Databricks services often come with an associated cost, such as Predictiv...

3 More Replies

by Lupo123 • New Contributor

02-02-2025 4:23:43 AM

481 Views
1 replies
0 kudos

Terminated cluster on free account

Hi, I mistakenly terminated my cluster. Could you please advise on how I can reactivate the same cluster?

Latest Reply

Alberto_Umana
Databricks Employee

02-02-2025 2:00:10 PM

0 kudos

Hi @Lupo123, unfortunately, once a cluster on a free Databricks account is terminated, it cannot be reactivated. You will need to create a new cluster instead.

by trimethylpurine • New Contributor II

06-20-2024 5:51:16 PM

9521 Views
4 replies
2 kudos

Gathering Data Off Of A PDF File

Hello everyone, I am developing an application that accepts PDF files and inserts the data into my database. The company that distributes this data to us only offers PDF files, which you can see attached below (I hid personal info for priv...

Latest Reply

Mykola_Melnyk
New Contributor III

02-02-2025 9:33:00 AM

2 kudos

You can use the PDF Data Source to read data from PDF files. Examples here: https://stabrise.com/blog/spark-pdf-on-databricks/ After that, use the Scale DP library to extract data from the text in a declarative way using an LLM. Here is an example of extraction ...

3 More Replies

by Nishat • New Contributor

01-31-2025 6:03:28 AM

1414 Views
1 reply
0 kudos

Speaker diarization on databricks with Nemo throwing error

My compute configuration is 15.4 LTS ML (includes Apache Spark 3.5.0, GPU, Scala 2.12) on Standard_NC8as_T4_v3, running on Azure Databricks.

Latest Reply

szymon_dybczak
Esteemed Contributor III

02-02-2025 4:00:25 AM

0 kudos

Hi @Nishat, it looks like there's a problem with GPU compatibility. As mentioned in the error message, FlashAttention only supports Ampere GPUs or newer. According to the following thread, the GPU architecture you've chosen is not supported: RuntimeError: FlashAt...
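
As a rough illustration of the compatibility point in this reply, here is a minimal sketch of a pre-flight check. It assumes that "Ampere or newer" corresponds to CUDA compute capability 8.0 or higher, and that the T4 on the Standard_NC8as_T4_v3 instance is a Turing card reporting 7.5. The helper name `supports_flash_attention` is hypothetical and is not part of NeMo or FlashAttention.

```python
# Assumption: Ampere GPUs start at CUDA compute capability 8.0.
AMPERE_MIN_CAPABILITY = (8, 0)


def supports_flash_attention(capability):
    """Return True if a (major, minor) compute capability is Ampere or newer.

    Hypothetical helper for illustration only.
    """
    return tuple(capability) >= AMPERE_MIN_CAPABILITY


# A T4 (Turing) reports (7, 5); an A100 (Ampere) reports (8, 0).
t4_ok = supports_flash_attention((7, 5))
a100_ok = supports_flash_attention((8, 0))
print(f"T4 supported: {t4_ok}, A100 supported: {a100_ok}")

# On a GPU cluster with PyTorch installed, the capability of the current
# device can be read with torch.cuda.get_device_capability(), which
# returns a (major, minor) tuple that can be passed to the helper above.
```

If the check fails for the chosen instance type, the options are typically to pick a node type with a newer GPU or to turn off the FlashAttention code path, if the model configuration allows it.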

by dk09 • New Contributor

01-31-2025 2:57:01 AM

891 Views
1 reply
0 kudos

DBT RUN Command not working while invoked using subprocess.run

Hi, I am using the code below to run a dbt model from a notebook. I am using parameters to pass the dbt run command (project directory, profile directory, schema name, etc.). The issue is that when I run this code in my local workspace it works fine, but when ...

Latest Reply

Alberto_Umana
Databricks Employee

02-01-2025 1:17:25 PM

0 kudos

Hi @dk09, can you share the path of dbt_project_directory? Also, try entering the folder path manually to debug it. Does it still fail?
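
One way to act on this debugging suggestion is sketched below. The original notebook code is not shown in the thread, so this is only an assumption about its shape: the function names `build_dbt_command` and `run_dbt` are hypothetical, and only the standard `dbt run --project-dir ... --profiles-dir ...` options are used. The sketch checks that both directories exist before invoking dbt and captures stdout and stderr so the actual failure is visible.

```python
import os
import subprocess


def build_dbt_command(project_dir, profiles_dir):
    """Assemble the dbt invocation as an argument list (no shell parsing)."""
    return [
        "dbt", "run",
        "--project-dir", project_dir,
        "--profiles-dir", profiles_dir,
    ]


def run_dbt(project_dir, profiles_dir):
    """Validate the paths, run dbt, and surface its output for debugging."""
    for path in (project_dir, profiles_dir):
        # Fail early with a clear message if a path does not resolve
        # in the environment where the notebook is running.
        if not os.path.isdir(path):
            raise FileNotFoundError(f"Directory not found: {path}")

    result = subprocess.run(
        build_dbt_command(project_dir, profiles_dir),
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    print(result.stderr)
    return result.returncode
```

Passing the command as a list avoids quoting problems with paths that contain spaces, and the early directory check separates "path does not exist here" from failures inside dbt itself.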

by subhadeep • New Contributor II

01-21-2025 6:55:50 AM

1140 Views
2 replies
0 kudos

INSERT OVERWRITE DIRECTORY

I am using this query to create a CSV in a volume named test_volsrr that I created: INSERT OVERWRITE DIRECTORY '/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr' USING CSV OPTIONS ('delimiter' = ',', 'header' = 'true') SELECT * FROM staging.extract1gb DISTR...

Latest Reply

NandiniN
Databricks Employee

01-31-2025 11:48:33 PM

0 kudos

The DISTRIBUTE BY COALESCE(1) clause is intended to reduce the number of output files to one. However, this can lead to inefficiencies and large file sizes because it forces all data to be processed by a single task, which can cause memory and perfor...

1 More Replies

by namankhamesara • New Contributor II

03-25-2024 11:34:56 PM

2449 Views
2 replies
0 kudos

Discrepancy in Performance Reading Delta Tables from S3 in PySpark

Hello Databricks Community, I've encountered a puzzling performance difference while reading Delta tables from S3 using PySpark, particularly when applying filters and projections. I'm seeking insights to understand this variation better. I've attempte...

Latest Reply

NandiniN
Databricks Employee

01-31-2025 11:11:41 PM

0 kudos

Use the explain method to analyze the execution plans for both methods and identify any inefficiencies or differences in the plans. You can also review the metrics to understand this further. https://www.databricks.com/discover/pages/optimize-data-wo...

1 More Replies

by DONGHEE • New Contributor

01-20-2025 9:56:49 PM

1705 Views
1 reply
0 kudos

Error changing connection information of Databricks data source posted on Tableau server

Hello, there is a Databricks data source published on the Tableau server. When I click the 'Edit Data Source' button in the location where the data source is published, go to the Data Source tab, and change the Databricks connection information (HTTP...

Databricks

tableau

tableau server

Latest Reply

NandiniN
Databricks Employee

01-31-2025 10:58:33 PM

0 kudos

1) I wonder whether saved authentication credentials could be causing the issue. 2) If possible, try using different authentication methods (e.g., Personal Access Token) to see if the issue persists. This can help identify if the problem is specific to the aut...

by Lumoura • New Contributor

01-28-2025 1:43:51 PM

1335 Views
2 replies
1 kudos

How to download the results in batches

Hello, how are you? I'm trying to download some of my results from Databricks, and the sheet is around 300 MB. Unfortunately, Google Sheets will not open files larger than 100 MB. Is there any chance that I could download the results in batches to ...

Latest Reply

NandiniN
Databricks Employee

01-31-2025 10:24:23 PM

1 kudos

Hey, here are some more alternatives to repartitioning: 1) Use the LIMIT and OFFSET clauses in your SQL queries to export data in manageable chunks. For example, if you have a table with 100,000 rows and you want to export 10,000 rows at a time, you can us...
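
The chunked-export idea in this reply can be sketched as follows. The rest of the reply is truncated, so this is an assumption about what it goes on to describe and not a reconstruction of it. The table name `my_results` and ordering column `id` are hypothetical placeholders. A stable ORDER BY is included because, without one, consecutive LIMIT/OFFSET queries are not guaranteed to return disjoint rows.

```python
def chunked_export_queries(table, order_by, total_rows, chunk_size):
    """Build one SELECT statement per chunk using LIMIT and OFFSET."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    queries = []
    for offset in range(0, total_rows, chunk_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_by} "
            f"LIMIT {chunk_size} OFFSET {offset}"
        )
    return queries


# 100,000 rows exported 10,000 at a time, as in the example above.
# `my_results` and `id` are placeholder names.
queries = chunked_export_queries("my_results", "id", 100_000, 10_000)
for query in queries:
    print(query)
```

Each generated statement can be run on its own and its result downloaded as a separate file, which keeps every export below the size limit of the target tool.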

1 More Replies
