Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Arihant
by New Contributor
  • 6805 Views
  • 1 reply
  • 0 kudos

Unable to login to Databricks Community Edition

Hello All, I have successfully created a Databricks account and went to log in to Community Edition with the exact same login credentials as my account, but it tells me that the email/password are invalid. I can log in with these same exact credenti...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Arihant, Please take a look at this link related to Community Edition, which might solve your problem. I appreciate your interest in sharing your Community Edition query with us. However, at this time, we are not entertaining any Community-Edi...

niladri
by New Contributor
  • 720 Views
  • 1 reply
  • 0 kudos

How to connect to AWS Elasticsearch in another AWS account from Databricks

Hi - I have tried my level best to go through both the Elasticsearch documentation and the Databricks documentation to find an answer to my question: is it possible to connect to AWS Elasticsearch in a different AWS account from Databricks? I did no...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @niladri, It is possible to connect to AWS Elasticsearch in a different AWS account from Databricks. The error you are seeing is permissions-related: the user or role lacks the necessary access permissions. To solve this, use the AWS SDK boto3 to assume ...
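The assume-role step the reply points at can be sketched as follows. The account ID, role name, and session name are hypothetical placeholders, and the STS client is passed in so the helper can be exercised without live AWS credentials:

```python
# Sketch of cross-account access via STS AssumeRole. The role must exist in
# the Elasticsearch account and trust the Databricks account's principal.
def assume_cross_account_role(sts_client, account_id, role_name,
                              session_name="databricks-es-access"):
    """Return temporary credentials for an IAM role in another AWS account."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    resp = sts_client.assume_role(RoleArn=role_arn,
                                  RoleSessionName=session_name)
    return resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken

# Usage on a cluster with boto3 installed (role/account values are made up):
#   import boto3
#   creds = assume_cross_account_role(boto3.client("sts"),
#                                     "123456789012", "EsReadRole")
# The temporary credentials can then be supplied to the elasticsearch-hadoop
# connector (or a signed HTTP client) when reading from the other account.
```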

DanBrown
by New Contributor
  • 901 Views
  • 1 reply
  • 0 kudos

Remove WHERE 1=0

I am hoping someone can help me remove the WHERE 1=0 that keeps getting appended to the end of my query (see below). Please let me know if I can provide more info here. This is running in a notebook, in Azure Databricks, against a cluster that has...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @DanBrown, The WHERE 1=0 clause is added to your query by Spark's JDBC reader during the query planning phase. It is a common technique for resolving the schema without transferring data: the database returns an empty result set with the same schema as the original data source. It'...
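To make the mechanism concrete, the zero-row query Spark's JDBC reader sends looks roughly like the one built below; this is an illustration, not Spark source, and the exact subquery alias Spark generates may differ:

```python
# Illustration: the JDBC data source wraps the user query and appends
# WHERE 1=0, so the database returns column metadata and no rows.
def schema_probe_query(user_query: str) -> str:
    """Approximate the zero-row query Spark issues to learn the schema."""
    return f"SELECT * FROM ({user_query}) spark_schema_probe WHERE 1=0"
```

The probe runs during planning and transfers no rows; the subsequent data-reading queries themselves should not carry the WHERE 1=0 filter.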

RiyuLite
by New Contributor III
  • 1225 Views
  • 1 reply
  • 0 kudos

How to retrieve cluster IDs of deleted All Purpose clusters?

I need to retrieve the event logs of deleted All Purpose clusters in a certain workspace. The Databricks list API ({workspace_url}/api/2.0/clusters/list) gives me the list of all active/terminated clusters, but not clusters that have been deleted. I ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RiyuLite, To retrieve the event logs of deleted All Purpose clusters without using the root account details, you can use Databricks audit logs. These logs record the activities in your workspace, allowing you to monitor detailed Databricks usage ...
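As a sketch of the audit-log approach, a query along these lines could surface deleted cluster IDs. The table name system.access.audit is the documented audit log system table for Unity Catalog-enabled workspaces, but the action names used here are assumptions to verify against your own logs:

```python
# Build an audit-log query for cluster deletion events. The action names
# and request_params field are assumptions to check against your schema.
def deleted_cluster_query(action_names=("delete", "permanentDelete")):
    actions = ", ".join(f"'{a}'" for a in action_names)
    return (
        "SELECT event_time, request_params.cluster_id AS cluster_id "
        "FROM system.access.audit "
        f"WHERE service_name = 'clusters' AND action_name IN ({actions}) "
        "ORDER BY event_time DESC"
    )

# Usage in a notebook: display(spark.sql(deleted_cluster_query()))
```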

Divyanshu
by New Contributor
  • 1818 Views
  • 1 reply
  • 0 kudos

java.lang.ArithmeticException: long overflow Exception while writing to table | pyspark

Hey, I am trying to fetch data from Mongo and write it to a Databricks table. I read the data from Mongo using the pymongo library, flattened the nested struct objects and renamed columns (since there were a few duplicates), and am then writing to databrick...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Divyanshu ,  The error message "org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 12.0 failed 4 times, most recent failure: Lost task 2.3 in stage 12.0 (TID 53) (192.168.23.122 executor 0): org.apache.spark.SparkR...

Alex006
by Contributor
  • 705 Views
  • 1 reply
  • 1 kudos

Resolved! Does DLT use one single SparkSession?

Hi! Does DLT use one single SparkSession for all notebooks in a Delta Live Tables Pipeline?

Data Engineering
Delta Live Tables
dlt
SparkSession
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Alex006 , No, a Delta Live Tables (DLT) pipeline does not use a single SparkSession for all notebooks. DLT evaluates and runs all code defined in notebooks but has a different execution model than a notebook 'Run all' command. You cannot rely on ...

Gilg
by Contributor II
  • 658 Views
  • 1 reply
  • 0 kudos

Add data manually to DLT

Hi Team, Is there a way we can manually add data to the tables generated by DLT? We have done a PoC using DLT for Sep 15 to current data. Now that they are happy, they want the previous data from Synapse put into Databricks. I can e...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Gilg, Yes, you can add data manually to the tables generated by DLT (Delta Live Tables). However, be careful not to directly modify, add, or delete the Parquet data files in a Delta table, as this can lead to lost data or table c...

mike_engineer
by New Contributor
  • 719 Views
  • 1 reply
  • 1 kudos

Window functions in Change Data Feed

Hello! I am currently exploring the possibility of implementing incremental changes in our company's ETL pipeline and am looking into the Change Data Feed option. There are a couple of challenges I'm uncertain about. For instance, we have a piece of logic lik...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @mike_engineer,
  • Use the Change Data Feed feature in Databricks to track row-level changes in a Delta table.
  • Change Data Feed records change events for all data written into the table, including row data and metadata.
  • Use case scenarios: 1. ...
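Concretely, once delta.enableChangeDataFeed is set on the table, row-level changes can be read with the documented table_changes table-valued function; the helper below only builds that SQL, and the table name and version are placeholders:

```python
# Build a Change Data Feed read using the table_changes() table-valued
# function documented for Delta tables.
def cdf_query(table_name: str, starting_version: int) -> str:
    """SQL to read row-level changes from a CDF-enabled Delta table."""
    return f"SELECT * FROM table_changes('{table_name}', {starting_version})"

# Usage in a notebook (table name is a placeholder):
#   spark.sql("ALTER TABLE my_table "
#             "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
#   changes = spark.sql(cdf_query('my_table', 0))
# Each returned row carries _change_type (insert / update_preimage /
# update_postimage / delete), _commit_version, and _commit_timestamp.
```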

Gilg
by Contributor II
  • 1600 Views
  • 1 reply
  • 0 kudos

DLT: Waiting for resources took a long time

Hi Team, I have a DLT pipeline that has been running in production for quite some time now. When I check the pipeline, a couple of jobs took longer than expected. Usually a job takes only 10-15 minutes to complete, with 2-3 minutes to provision a resource. Then I ha...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Gilg, The issue you're experiencing with your DLT pipeline could be due to a couple of factors: 1. **Development Optimizations**: As per the Databricks release notes from September 7-13, 2021, new pipelines run in development mode by default. Thi...

AB_MN
by New Contributor III
  • 4308 Views
  • 4 replies
  • 1 kudos

Resolved! Read data from Azure SQL DB

I am trying to read data into a dataframe from Azure SQL DB using JDBC. Here is the code I am using: driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"; database_host = "server.database.windows.net"; database_port = "1433"; database_name = "dat...

Latest Reply
AB_MN
New Contributor III
  • 1 kudos

That did the trick. Thank you!
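For readers landing on this resolved thread, a minimal sketch of the JDBC read follows. Host, database, and credential values are placeholders, and the Spark call is commented out because it needs a live cluster:

```python
# Build a SQL Server JDBC URL of the shape the com.microsoft.sqlserver
# driver expects. Host, port, and database values are placeholders.
def sqlserver_jdbc_url(host: str, port: int, database: str) -> str:
    return f"jdbc:sqlserver://{host}:{port};database={database};encrypt=true"

# Usage on a cluster (user/password are placeholders):
#   df = (spark.read.format("jdbc")
#         .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
#         .option("url", sqlserver_jdbc_url("server.database.windows.net",
#                                           1433, "database_name"))
#         .option("dbtable", "dbo.my_table")
#         .option("user", "...").option("password", "...")
#         .load())
```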

Hubert-Dudek
by Esteemed Contributor III
  • 881 Views
  • 1 reply
  • 1 kudos

Foreign catalogs

With the introduction of Unity Catalog in Databricks, many of us have become familiar with creating catalogs. However, did you know that Unity Catalog also allows you to create foreign catalogs? You can register databases from the following s...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Thank you for sharing @Hubert-Dudek !!!

Hubert-Dudek
by Esteemed Contributor III
  • 823 Views
  • 1 reply
  • 3 kudos

Row-level concurrency

With the introduction of Databricks Runtime 14, you can now enable row-level concurrency using these simple techniques!
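As a sketch of what enabling this typically involves (an assumption based on the Delta Lake documentation, not the attached image): row-level concurrency on Databricks Runtime 14+ relies on deletion vectors, which can be switched on as a table property. The table name below is a placeholder:

```python
# Enable the deletion-vectors table feature that row-level concurrency
# depends on; the property name follows the Delta Lake documentation.
def enable_row_level_concurrency_sql(table_name: str) -> str:
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES "
        "('delta.enableDeletionVectors' = 'true')"
    )

# Usage in a notebook:
#   spark.sql(enable_row_level_concurrency_sql("my_table"))
```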

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Thank you for sharing this @Hubert-Dudek 

Shenstone
by New Contributor
  • 660 Views
  • 1 reply
  • 0 kudos

Debugging options if you are using streaming, RDDs and SparkContext?

Hi all, I've been trying to make use of some of the more recent tools for debugging in Databricks: pdb in the Databricks web interface with the variable explorer described in this article. I've also been trying to debug locally using the VSCode extensi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Shenstone,
  • Limitations exist with pdb and the VSCode extension when using Databricks Connect.
  • Databricks Connect does not support RDDs or the SparkContext object.
  • Use the DatabricksSession object for debugging.
  • Initialize the DatabricksSession class wi...

Lucifer
by New Contributor
  • 559 Views
  • 1 reply
  • 0 kudos

How to get job launch type in notebook

I want to get the job launch status in a notebook: whether it was launched by the scheduler or manually. I tried the JobTriggerType property of the notebook context, but it only gives manual and repair, not scheduled: dbutils.notebook.entry_point.getDbutils().notebo...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Lucifer , Please reach out to Databricks support for more information on this topic.
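As a workaround sketch while waiting on support: the notebook context is available as JSON, and a trigger field can be read from its tags. The exact tag key used below (jobTriggerType) and its possible values are assumptions to verify in your own workspace:

```python
import json

# Pull a trigger-type tag out of the notebook-context JSON returned by
# dbutils.notebook...getContext().toJson(). The tag key is an assumption.
def trigger_type_from_context(context_json: str, key: str = "jobTriggerType"):
    tags = json.loads(context_json).get("tags", {})
    return tags.get(key)  # None if the tag is absent

# Usage in a notebook:
#   ctx = dbutils.notebook.entry_point.getDbutils().notebook() \
#             .getContext().toJson()
#   print(trigger_type_from_context(ctx))
```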

EDDatabricks
by Contributor
  • 760 Views
  • 1 reply
  • 2 kudos

Multiple DLT pipelines same target table

Is it possible to have multiple DLT pipelines write data concurrently and in append mode to the same Delta table? Because of different data sources, with different data volumes and required processing, we would like to have different pipelines stream...

Data Engineering
Delta tables
DLT pipeline
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @EDDatabricks,
  • Multiple DLT pipelines can write data concurrently and in append mode to the same Delta table.
  • Setting "pipelines.tableManagedByMultiplePipelinesCheck.enabled" to "false" allows multiple pipelines to write to the same table.
  • Howe...
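If you do disable that check, the setting goes in each pipeline's configuration block; a sketch of the relevant fragment of the pipeline settings JSON, with all other fields omitted:

```json
{
  "configuration": {
    "pipelines.tableManagedByMultiplePipelinesCheck.enabled": "false"
  }
}
```

The truncated caveat in the reply still applies: keep every writer append-only so the concurrent transactions do not conflict.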
