Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RiyuLite
by New Contributor III
  • 1492 Views
  • 1 reply
  • 0 kudos

How to retrieve cluster IDs of a deleted All Purpose cluster ?

I need to retrieve the event logs of deleted All Purpose clusters in a certain workspace. The Databricks list API ({workspace_url}/api/2.0/clusters/list) provides me with the list of all active/terminated clusters, but not the clusters that have been deleted. I ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @RiyuLite, To retrieve the event logs of deleted All Purpose clusters without using the root account details, you can use Databricks audit logs. These logs record the activities in your workspace, allowing you to monitor detailed Databricks usage ...
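A minimal sketch of that audit-log approach, assuming audit logs are delivered to the system.access.audit system table (the table name, column names, and action values below are assumptions; adjust them to your log delivery setup):

    # Query audit logs for cluster deletion events; the cluster_id of each
    # deleted cluster is recorded in the request parameters.
    deleted = spark.sql("""
        SELECT event_time, request_params['cluster_id'] AS cluster_id
        FROM system.access.audit
        WHERE service_name = 'clusters'
          AND action_name IN ('delete', 'permanentDelete')
    """)
    deleted.show(truncate=False)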

Divyanshu
by New Contributor
  • 2385 Views
  • 1 reply
  • 0 kudos

java.lang.ArithmeticException: long overflow Exception while writing to table | pyspark

Hey, I am trying to fetch data from Mongo and write it to a Databricks table. I have read the data from Mongo using the pymongo library, then flattened nested struct objects and renamed columns (since there were a few duplicates), and am then writing to Databrick...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Divyanshu ,  The error message "org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 12.0 failed 4 times, most recent failure: Lost task 2.3 in stage 12.0 (TID 53) (192.168.23.122 executor 0): org.apache.spark.SparkR...
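For context, a hedged sketch of one common workaround: "long overflow" on write is frequently triggered by out-of-range date/timestamp values (e.g. sentinel dates such as 9999-12-31 in Mongo documents) overflowing Spark's microsecond-based timestamp arithmetic. The DataFrame and column names below are hypothetical:

    from pyspark.sql import functions as F

    # Null out timestamps outside a sane range before writing; 'df' and
    # 'event_ts' are placeholders for the flattened Mongo DataFrame.
    clean_df = df.withColumn(
        "event_ts",
        F.when(F.col("event_ts").between("1900-01-01", "2100-01-01"),
               F.col("event_ts")))  # values outside the range become NULL
    clean_df.write.format("delta").mode("append").saveAsTable("target_table")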

Alex006
by Contributor
  • 851 Views
  • 1 reply
  • 1 kudos

Resolved! Does DLT use one single SparkSession?

Hi! Does DLT use one single SparkSession for all notebooks in a Delta Live Tables Pipeline?

Data Engineering
Delta Live Tables
dlt
SparkSession
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Alex006 , No, a Delta Live Tables (DLT) pipeline does not use a single SparkSession for all notebooks. DLT evaluates and runs all code defined in notebooks but has a different execution model than a notebook 'Run all' command. You cannot rely on ...
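A minimal sketch of that execution model (paths and table names are hypothetical): DLT registers each decorated function into a dependency graph and resolves execution order from that graph, not from notebook cell order or shared session state:

    import dlt

    @dlt.table
    def bronze():
        return spark.read.format("json").load("/mnt/raw/events")  # hypothetical path

    @dlt.table
    def silver():
        # Resolved via the pipeline's dependency graph, not 'Run all' order
        return dlt.read("bronze").dropDuplicates()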

Gilg
by Contributor II
  • 814 Views
  • 1 reply
  • 0 kudos

Add data manually to DLT

Hi Team, Is there a way that we can add data manually to the tables that are generated by DLT? We have done a PoC using DLT for Sep 15 to current data. Now that they are happy, they want the previous data from Synapse put into Databricks. I can e...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Gilg, Yes, you can add data manually to the tables generated by DLT (Delta Live Tables). However, it would be best to be careful not to directly modify, add, or delete Parquet data files in a Delta table, as this can lead to lost data or table c...
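A hedged one-off backfill sketch (table names are hypothetical): append through the Delta table so the transaction log stays consistent, rather than touching the Parquet files; note that a later full refresh of the pipeline may recompute the table and discard manually added rows:

    # Historical data staged from Synapse (hypothetical staging table)
    historical = spark.read.table("staging.synapse_history")

    # Append through the table, never by writing Parquet files directly
    historical.write.format("delta").mode("append").saveAsTable("live.my_dlt_table")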

mike_engineer
by New Contributor
  • 885 Views
  • 1 reply
  • 1 kudos

Window functions in Change Data Feed

Hello! I am currently exploring the possibility of implementing incremental changes in our company's ETL pipeline and am looking into the Change Data Feed option. There are a couple of challenges I'm uncertain about. For instance, we have a piece of logic lik...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @mike_engineer,
  • Use the Change Data Feed feature in Databricks to track row-level changes in a Delta table.
  • Change Data Feed records change events for all data written into the table, including row data and metadata.
  • Use case scenarios: 1. ...
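A minimal sketch of the feature (the table name is a placeholder): enable the feed, then read the row-level changes recorded between versions:

    # Enable Change Data Feed on an existing Delta table
    spark.sql("ALTER TABLE my_schema.orders "
              "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

    # Read changes recorded since version 1; _change_type distinguishes
    # insert / update_preimage / update_postimage / delete
    changes = (spark.read.format("delta")
               .option("readChangeFeed", "true")
               .option("startingVersion", 1)
               .table("my_schema.orders"))
    changes.select("_change_type", "_commit_version").show()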

AB_MN
by New Contributor III
  • 5111 Views
  • 4 replies
  • 1 kudos

Resolved! Read data from Azure SQL DB

I am trying to read data into a dataframe from Azure SQL DB using JDBC. Here is the code I am using:
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
database_host = "server.database.windows.net"
database_port = "1433"
database_name = "dat...
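For reference, a hedged completion of that pattern (the host, table, and secret names are placeholders):

    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url = "jdbc:sqlserver://server.database.windows.net:1433;database=mydb"

    df = (spark.read.format("jdbc")
          .option("driver", driver)
          .option("url", url)
          .option("dbtable", "dbo.mytable")
          .option("user", dbutils.secrets.get(scope="my-scope", key="sql-user"))
          .option("password", dbutils.secrets.get(scope="my-scope", key="sql-password"))
          .load())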

Latest Reply
AB_MN
New Contributor III
  • 1 kudos

That did the trick. Thank you!

Hubert-Dudek
by Esteemed Contributor III
  • 1053 Views
  • 1 reply
  • 1 kudos

Foreign catalogs

With the introduction of the Unity Catalog in Databricks, many of us have become familiar with creating catalogs. However, did you know that the Unity Catalog also allows you to create foreign catalogs? You can register databases from the following s...
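A hedged sketch of the Lakehouse Federation flow (the connection, catalog, and secret names are placeholders): first create a connection, then a foreign catalog on top of it:

    spark.sql("""
        CREATE CONNECTION IF NOT EXISTS my_postgres TYPE postgresql
        OPTIONS (host 'db.example.com', port '5432',
                 user secret('my-scope', 'pg-user'),
                 password secret('my-scope', 'pg-password'))
    """)

    spark.sql("""
        CREATE FOREIGN CATALOG IF NOT EXISTS my_postgres_catalog
        USING CONNECTION my_postgres OPTIONS (database 'sales')
    """)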

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Thank you for sharing @Hubert-Dudek !!!

Hubert-Dudek
by Esteemed Contributor III
  • 955 Views
  • 1 reply
  • 3 kudos

Row-level concurrency

With the introduction of Databricks Runtime 14, you can now enable row-level concurrency using these simple techniques!
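A hedged sketch (the table name is a placeholder): row-level concurrency in Databricks Runtime 14+ applies to tables with deletion vectors enabled, so enabling deletion vectors is the main switch:

    spark.sql("""
        ALTER TABLE main.sales.orders
        SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')
    """)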

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Thank you for sharing this @Hubert-Dudek 

Shenstone
by New Contributor
  • 784 Views
  • 1 reply
  • 0 kudos

Debugging options if you are using streaming, RDDs and SparkContext?

Hi all, I've been trying to make use of some of the more recent tools for debugging in Databricks: pdb in the Databricks web interface with the variable explorer described in this article. I've also been trying to debug locally using the VSCode extensi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Shenstone,
  • Limitations exist with pdb and the VSCode extension when using Databricks Connect.
  • Databricks Connect does not support RDDs or the SparkContext object.
  • Use the DatabricksSession object for debugging.
  • Initialize the DatabricksSession class wi...
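A minimal initialization sketch for Databricks Connect v2 (the host, token, and cluster id are placeholders):

    from databricks.connect import DatabricksSession

    spark = (DatabricksSession.builder
             .remote(host="https://<workspace-url>",
                     token="<personal-access-token>",
                     cluster_id="<cluster-id>")
             .getOrCreate())

    # DataFrame API calls work through the remote session; RDDs and
    # SparkContext do not.
    spark.read.table("samples.nyctaxi.trips").limit(5).show()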

Lucifer
by New Contributor
  • 682 Views
  • 1 reply
  • 0 kudos

How to get job launch type in notebook

I want to get the job launch type in a notebook: whether it was launched by the scheduler or manually. I tried using the JobTriggerType property of the notebook context, but it gives only manual and repair, not scheduled: dbutils.notebook.entry_point.getDbutils().notebo...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Lucifer , Please reach out to Databricks support for more information on this topic.
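One hedged workaround sketch, since the notebook context does not expose a scheduled trigger type: pass the run id into the notebook as a job parameter via the dynamic value reference {{job.run_id}}, then ask the Jobs API for the run's trigger field (PERIODIC for scheduled runs, ONE_TIME for manual ones). The workspace URL and secret names are placeholders:

    import requests

    run_id = dbutils.widgets.get("run_id")  # job parameter set to {{job.run_id}}
    resp = requests.get(
        "https://<workspace-url>/api/2.1/jobs/runs/get",
        headers={"Authorization": f"Bearer {dbutils.secrets.get('my-scope', 'pat')}"},
        params={"run_id": run_id},
    )
    print(resp.json().get("trigger"))  # e.g. PERIODIC, ONE_TIME, RETRY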

EDDatabricks
by Contributor
  • 1135 Views
  • 1 reply
  • 2 kudos

Multiple DLT pipelines same target table

Is it possible to have multiple DLT pipelines write data concurrently and in append mode to the same Delta table? Because of different data sources, with different data volumes and required processing, we would like to have different pipelines stream...

Data Engineering
Delta tables
DLT pipeline
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @EDDatabricks, Multiple DLT pipelines can write data concurrently and in append mode to the same Delta table.
  • Setting "pipelines.tableManagedByMultiplePipelinesCheck.enabled" to "false" allows multiple pipelines to write to the same table.
  • Howe...
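For reference, a hedged sketch of where that setting lives in the pipeline's JSON settings (everything except the configuration key itself is a placeholder):

    {
      "name": "my-append-pipeline",
      "configuration": {
        "pipelines.tableManagedByMultiplePipelinesCheck.enabled": "false"
      }
    }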

lawrence009
by Contributor
  • 1513 Views
  • 1 reply
  • 0 kudos

Updating Table Schema: Renaming and Dropping Columns

Are renaming and dropping columns Databricks proprietary methods? How does it work under the hood, and does enabling the feature render lazy loading ineffective? Ref: Do Delta Lake and Parquet Share Partition Strategy?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @lawrence009, Renaming and dropping columns are not Databricks proprietary methods, but Databricks Delta Lake provides an enhanced implementation of these operations using column mapping. This feature allows metadata-only changes to mark columns a...
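A minimal sketch of the column-mapping flow described above (table and column names are placeholders):

    # Column mapping requires upgraded reader/writer protocol versions
    spark.sql("""
        ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
            'delta.minReaderVersion' = '2',
            'delta.minWriterVersion' = '5',
            'delta.columnMapping.mode' = 'name')
    """)

    # Both operations are then metadata-only changes
    spark.sql("ALTER TABLE my_schema.my_table RENAME COLUMN old_name TO new_name")
    spark.sql("ALTER TABLE my_schema.my_table DROP COLUMN obsolete_col")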

Jozhua
by New Contributor
  • 1516 Views
  • 1 reply
  • 0 kudos

Spark streaming auto loader wildcard not working

Need some help with an issue loading a subdirectory from an S3 bucket using Auto Loader. For example: s3://path1/path2/databases*/paths/. In databases there are various versions of databases. For example: path1/path2/database_v1/sub_path/*.parquet, path1/path...

Data Engineering
autoloader
S3
wildcard
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Jozhua, the wildcard character (*) in the path seems to be causing issues. Auto Loader might not be able to handle wildcards in this way. Since Databricks does not support directory lists of this kind, one possible workaround could be to load each subdirectory...
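A hedged sketch of that workaround (the bucket, directory names, and schema/checkpoint locations are placeholders): enumerate the subdirectories yourself and start one Auto Loader stream per directory:

    for db_dir in ["database_v1", "database_v2"]:  # hypothetical versions
        (spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "parquet")
         .option("cloudFiles.schemaLocation", f"s3://bucket/_schemas/{db_dir}")
         .load(f"s3://path1/path2/{db_dir}/sub_path/")
         .writeStream
         .option("checkpointLocation", f"s3://bucket/_checkpoints/{db_dir}")
         .toTable(f"bronze.{db_dir}"))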

Danh_Hoa
by New Contributor
  • 6893 Views
  • 1 reply
  • 0 kudos

Can't "copy into" new data to delta table after truncate old data

As the title says, I have a Delta table with data from 22/9, and today I want to remove the old data and add the new data from 23/9. I used TRUNCATE and COPY INTO queries, but after TRUNCATE, nothing is added to the table. What has happened to my table? The file of old data st...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Danh_Hoa, The issue you're experiencing might be due to a few reasons:
1. The Delta table you're trying to truncate and copy into might not actually be a Delta table. If the table was not created as a Delta table, you will not be able to perform...
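Another possibility worth noting, as a hedged sketch (the table and path are placeholders): COPY INTO is idempotent and skips files it has already loaded, so after TRUNCATE the same files are not re-ingested unless re-loading is forced:

    spark.sql("""
        COPY INTO my_schema.my_table
        FROM 's3://bucket/landing/2023-09-23/'
        FILEFORMAT = PARQUET
        COPY_OPTIONS ('force' = 'true')  -- re-load files COPY INTO has seen before
    """)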

RYBK
by New Contributor III
  • 9546 Views
  • 5 replies
  • 2 kudos

Resolved! External location + Failure to initialize configuration for storage account

Hello, I created a storage credential and an external location. The test is OK; I'm able to browse it from the portal. I have a notebook to create a table:
%sql
CREATE OR REPLACE TABLE myschema.mytable (data1 string, data2 string)
USING DELTA LOCATION "abf...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @RYBK, The method you're using to set the fs.azure.* variables in the cluster configuration is a common way to handle Azure Data Lake Storage Gen2 configurations in Databricks. However, if you're looking for a more secure and centralized way to m...
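A hedged sketch of the secret-backed, session-level alternative (the storage account, secret scope, and service principal values are placeholders):

    credential = dbutils.secrets.get(scope="my-scope", key="sp-secret")

    cfg = "fs.azure.account.{}.mystorage.dfs.core.windows.net"
    spark.conf.set(cfg.format("auth.type"), "OAuth")
    spark.conf.set(cfg.format("oauth.provider.type"),
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(cfg.format("oauth2.client.id"), "<application-id>")
    spark.conf.set(cfg.format("oauth2.client.secret"), credential)
    spark.conf.set(cfg.format("oauth2.client.endpoint"),
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")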

