Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

parzival
by New Contributor III
  • 5103 Views
  • 10 replies
  • 2 kudos

Unable to login to Community Edition

Facing the below issue: "We were not able to find a Community Edition workspace with this email. Please login to accounts.cloud.databricks.com to find the non-community-edition workspaces you may have access to. For help, please see Community Edition Lo...

Latest Reply
pakkufab1998
New Contributor III
  • 2 kudos

Hi All, now I can neither sign up from my account nor log in. No response from them, so good luck to everyone out there who's trying to learn this tool.

9 More Replies
Daan
by New Contributor III
  • 1285 Views
  • 1 reply
  • 0 kudos

Data difference between SQL warehouse and all-purpose compute

Hey everyone, executing the following query on my SQL warehouse does not return any data: select * from acc_bolt.eod.configurationhistory where netarea = '541454827900000139'; However, running the same query using an all-purpose compute does return the e...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Daan, greetings! Can you please confirm which SQL warehouse you are using here? If you are using a serverless warehouse, can you try running the query on a Pro/Classic warehouse? Kind regards, Ayushi

anandreddy23
by New Contributor III
  • 10502 Views
  • 4 replies
  • 0 kudos

Pyspark cast error

Hi All,
hive> create table UK ( a decimal(10,2)) ;
hive> create table IN ( a decimal(10,5)) ;
hive> create view T as select a from UK union all select a from IN ;
All of the above statements execute successfully in Hive and return results when select statement...

Latest Reply
anandreddy23
New Contributor III
  • 0 kudos

Hi Nandini, thanks for sharing the above solution. To be sure my understanding is correct, could you please confirm the below?
hive> create table test.UK ( a decimal(10,2)) ;
hive> create table test.IN ( a decimal(10,5)) ;
hive> create view test.T as select...
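
The two tables in this thread declare column a as decimal(10,2) and decimal(10,5), so the UNION ALL in the view has to settle on one common decimal type. Below is a minimal sketch of the widening rule that, to my knowledge, Spark SQL uses when it reconciles two decimal types: keep the larger scale and the larger count of integer digits. The thread's actual error text is truncated above, so whether this rule is the cause is an assumption; the function name widen_decimal is hypothetical and not part of any library.

```python
# Sketch of decimal widening for a UNION of two decimal columns.
# Assumption: the engine keeps the larger scale and the larger number of
# integer digits, capped at a maximum precision of 38 (Spark's limit).
MAX_PRECISION = 38


def widen_decimal(p1, s1, p2, s2):
    """Return (precision, scale) wide enough to hold both input types."""
    scale = max(s1, s2)                  # digits after the decimal point
    int_digits = max(p1 - s1, p2 - s2)   # digits before the decimal point
    precision = min(int_digits + scale, MAX_PRECISION)
    return precision, min(scale, MAX_PRECISION)


# decimal(10,2) has 8 integer digits, decimal(10,5) has 5, so the common
# type needs 8 integer digits plus 5 fractional digits.
print(widen_decimal(10, 2, 10, 5))
```

If the view was created in Hive with a different resolved type than the one Spark derives, a cast between the two may be where things go wrong, but that is only a guess from the snippet.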

3 More Replies
Phani1
by Databricks MVP
  • 731 Views
  • 1 reply
  • 0 kudos

EMR cluster pyspark scripts to databricks

Hi All, the PySpark scripts currently operating on the EMR cluster need to be migrated to Databricks. Are there any tools available that can assist in minimizing the time required for code conversion? Your suggestions would be appreciated. Regards, Phan...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @Phani1, This guide can help you: https://www.databricks.com/resources/guide/emr-databricks-migration-guide

Phani1
by Databricks MVP
  • 1063 Views
  • 1 reply
  • 0 kudos

Airflow jobs migration to Databricks Workflows

Hi All, we need to move our Airflow jobs over to Databricks Workflows. Are there any tools out there that can help with this migration and make the process quicker? If you have any sample code or documents that could assist, I would really appreciate ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Phani1, Please see this post which can help you: https://community.databricks.com/t5/data-engineering/migrating-logic-from-airflow-dags-to-databricks-workflow/td-p/104501

rgower
by New Contributor III
  • 2361 Views
  • 4 replies
  • 1 kudos

Different JSON Results when Running a Job vs Running a Notebook

I have a regularly scheduled job that runs a PySpark Notebook that GETs semi-structured JSON data from an external API, loads that data into dataframes, and saves those dataframes to delta tables in Databricks. I have the schema for the JSON defined ...

Latest Reply
rgower
New Contributor III
  • 1 kudos

@Alberto_Umana Sounds good, thank you for looking into it and let me know if there's any additional information I can provide in the meantime!

3 More Replies
BS_THE_ANALYST
by Databricks Partner
  • 6134 Views
  • 4 replies
  • 9 kudos

Zero to Hero - Databricks

Hi all! In a nutshell, I want to go from zero to hero with Databricks. I'd like to pursue the Databricks Data Engineering pathway; I think that makes sense as I have a background with Alteryx. I'd really like to get hands-on whilst learning. Are the le...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 9 kudos

@MariuszK thanks for the link to your Medium article. There's some great stuff in there! Good point about the 30 day Azure free trial for Databricks.

3 More Replies
611124
by New Contributor II
  • 1696 Views
  • 4 replies
  • 0 kudos

dbt error: Data too long for column at row 1

Hi there! We are experiencing a Databricks error we don't recognise when we are running one of our event-based dbt models in dbt core (version 1.6.18). The dbt model uses the 'insert_by_period' materialisation that is still experimental for version 1....

Latest Reply
611124
New Contributor II
  • 0 kudos

We are yet to upgrade dbt core to the latest version but will check again once we have done so.

3 More Replies
Mantsama4
by Databricks Partner
  • 3732 Views
  • 4 replies
  • 2 kudos

Resolved! Unity Catalog Migration: External AWS S3 Location Tables vs. Managed Tables in Databricks!

Hey Databricks enthusiasts! Migrating to Unity Catalog? Understanding the difference between External S3 Location Tables and Managed Tables is crucial for optimizing governance, security, and cost efficiency. External S3 Location Tables: Data remains in ...

Latest Reply
Isi
Honored Contributor III
  • 2 kudos

Hey! I hope I'm not too late, and I'd like to share my opinion. While it's true that managed services offer certain advantages over external tables, you should keep in mind that Databricks services often come with an associated cost, such as Predictiv...

3 More Replies
Lupo123
by New Contributor
  • 1102 Views
  • 1 reply
  • 0 kudos

Terminated cluster on free account

Hi, I mistakenly terminated my cluster. Could you please advise on how I can reactivate the same cluster?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Lupo123, unfortunately, once a cluster on a free Databricks account is terminated, it cannot be reactivated. You will need to create a new cluster instead.

trimethylpurine
by New Contributor II
  • 12085 Views
  • 4 replies
  • 2 kudos

Gathering Data Off Of A PDF File

Hello everyone, I am developing an application that accepts PDF files and inserts the data into my database. The company in question that distributes this data to us only offers PDF files, which you can see attached below (I hid personal info for priv...

Latest Reply
Mykola_Melnyk
New Contributor III
  • 2 kudos

You can use the PDF Data Source to read data from PDF files. Examples here: https://stabrise.com/blog/spark-pdf-on-databricks/ After that, use the Scale DP library to extract data from the text in a declarative way using an LLM. Here is an example of extraction ...

3 More Replies
Nishat
by Databricks Partner
  • 2087 Views
  • 1 reply
  • 0 kudos

Speaker diarization on databricks with Nemo throwing error

The configuration of my compute is 15.4 LTS ML (includes Apache Spark 3.5.0, GPU, Scala 2.12), Standard_NC8as_T4_v3, on Azure Databricks.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Nishat, it looks like there's a problem with GPU compatibility. As mentioned in the error message, FlashAttention only supports Ampere GPUs or newer. According to the following thread, the GPU architecture you've chosen is not supported: RuntimeError: FlashAt...
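
To check this before launching the job, one option is to compare the GPU's CUDA compute capability against Ampere's. This is a rough sketch, assuming PyTorch is installed on the cluster (the ML runtimes normally ship it) and that "Ampere or newer" means compute capability 8.0 or higher; the T4 in Standard_NC8as_T4_v3 is a Turing card that reports 7.5. The helper names are hypothetical.

```python
# Sketch: decide whether the visible GPU is Ampere or newer.
# Assumption: Ampere corresponds to CUDA compute capability 8.x.
AMPERE_MAJOR = 8


def supports_flash_attention(capability):
    """capability is a (major, minor) tuple, e.g. (7, 5) for a T4."""
    major, _minor = capability
    return major >= AMPERE_MAJOR


def current_gpu_capability():
    """Return (major, minor) for GPU 0, or None if no CUDA GPU is usable."""
    try:
        import torch  # imported lazily so the check also runs without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


cap = current_gpu_capability()
if cap is None:
    print("No CUDA GPU visible from this Python process")
else:
    print("compute capability:", cap,
          "FlashAttention-capable:", supports_flash_attention(cap))
```

If the check comes back negative, the options are a node type with an Ampere-or-newer GPU or a model configuration that does not require FlashAttention.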

dk09
by New Contributor
  • 1480 Views
  • 1 reply
  • 0 kudos

DBT RUN Command not working while invoked using subprocess.run

Hi, I am using the below code to run a DBT model from a notebook. I am using parameters to pass the DBT run command (project directory, profile directory, schema name etc). The issue is, when I am running this code in my local workspace it is working fine but when ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @dk09, can you share the path of dbt_project_directory? Also, try inputting the folder path manually to debug it. Does it still fail?
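
For debugging along those lines, it can help to verify the directory first and capture the child process output instead of letting it disappear. Here is a minimal sketch, not the original poster's code (that is only in the screenshot): build_dbt_command and run_and_report are hypothetical helper names, and the paths you pass in are your own. The --project-dir and --profiles-dir flags are standard dbt CLI options.

```python
import subprocess
import sys
from pathlib import Path


def build_dbt_command(project_dir, profiles_dir):
    """Build the argument list for `dbt run` with explicit directories."""
    return ["dbt", "run",
            "--project-dir", str(project_dir),
            "--profiles-dir", str(profiles_dir)]


def run_and_report(command, cwd):
    """Run a command in cwd and print its exit code, stdout and stderr."""
    cwd = Path(cwd)
    if not cwd.is_dir():
        # Fail early with a clear message if the path is wrong in this environment.
        raise FileNotFoundError(f"directory does not exist: {cwd}")
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    print("exit code:", result.returncode)
    print("stdout:", result.stdout)
    print("stderr:", result.stderr)
    return result


# Harmless demonstration that does not need dbt installed:
run_and_report([sys.executable, "-c", "print('ok')"], Path.cwd())
```

For the real run, pass build_dbt_command(...) with the hard-coded folder path as the command; the printed stderr usually shows whether dbt could not find the project, the profile, or the executable itself.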

subhadeep
by Databricks Partner
  • 2171 Views
  • 2 replies
  • 0 kudos

INSERT OVERWRITE DIRECTORY

I am using this query to create a CSV in a volume named test_volsrr that I created:
INSERT OVERWRITE DIRECTORY '/Volumes/DATAMAX_DATABRICKS/staging/test_volsrr'
USING CSV
OPTIONS ('delimiter' = ',', 'header' = 'true')
SELECT * FROM staging.extract1gb
DISTR...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The DISTRIBUTE BY COALESCE(1) clause is intended to reduce the number of output files to one. However, this can lead to inefficiencies and large file sizes because it forces all data to be processed by a single task, which can cause memory and perfor...

1 More Replies
namankhamesara
by New Contributor II
  • 3178 Views
  • 2 replies
  • 0 kudos

Discrepancy in Performance Reading Delta Tables from S3 in PySpark

Hello Databricks Community, I've encountered a puzzling performance difference while reading Delta tables from S3 using PySpark, particularly when applying filters and projections. I'm seeking insights to understand this variation better. I've attempte...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Use the explain method to analyze the execution plans for both methods and identify any inefficiencies or differences in the plans. You can also review the metrics to understand this further. https://www.databricks.com/discover/pages/optimize-data-wo...

1 More Replies