Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by lozik (New Contributor II)
  • 826 Views
  • 2 replies
  • 0 kudos

Python callback functions fail to trigger

How can I get sys.excepthook and the atexit module to trigger a callback function on exit of a Python notebook? Both fail to fire when an unhandled exception is encountered (excepthook) or when the program exits (atexit).

Latest Reply
Pieter (New Contributor II)
  • 0 kudos

Hey Lozik, I ran into this myself as well. The reason this doesn't work is that Databricks uses IPython under the hood. The following code snippet creates an exception hook for all exceptions (using the general Exception); it's also possible to spe...
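A minimal sketch of what such a snippet can look like, given that Databricks notebooks run on IPython (which bypasses sys.excepthook); the print is a placeholder for the actual callback logic:

```python
from IPython import get_ipython

def on_exception(shell, etype, evalue, tb, tb_offset=None):
    # Placeholder callback: logging, cleanup, alerting, etc. would go here.
    print(f"Unhandled exception: {etype.__name__}: {evalue}")
    # Fall back to IPython's normal traceback rendering.
    shell.showtraceback((etype, evalue, tb), tb_offset=tb_offset)

# Register the handler for every exception deriving from Exception.
get_ipython().set_custom_exc((Exception,), on_exception)
```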

1 More Replies
by mjedy78 (New Contributor II)
  • 171 Views
  • 1 reply
  • 0 kudos

Can Databricks read CDF by partition for better performance?

I'm working with a large DataFrame in Databricks, processing it in a streaming-batch fashion (I'm reading as a stream, but using .trigger(availableNow=True) for batch-like processing). I'm fetching around 40 GB of CDF updates daily and performing some ...

Latest Reply
cherry54wilder (New Contributor II)
  • 0 kudos

You can indeed leverage your partition column to read and process Change Data Feed (CDF) changes partition by partition. This approach can help you manage the processing load and improve performance. Here's a general outline of how you can achieve this: 1. ...
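A minimal sketch of the partition-wise pattern, assuming CDF is enabled on a Delta table partitioned by a hypothetical event_date column, an arbitrary starting version, and the ambient spark session of a Databricks notebook:

```python
from pyspark.sql import functions as F

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 100)  # assumed starting point
    .table("my_catalog.my_schema.source")
)

# Filtering on the partition column prunes files, so each pass scans one
# slice of the daily ~40 GB of changes instead of the whole feed.
for day in ["2025-01-01", "2025-01-02"]:
    (changes
     .where(F.col("event_date") == day)
     .write.mode("append")
     .saveAsTable("my_catalog.my_schema.target"))
```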

by pra18 (New Contributor II)
  • 233 Views
  • 2 replies
  • 0 kudos

Handling Binary Files Larger than 2GB in Apache Spark

I'm trying to process large binary files (>2 GB) in Apache Spark, but I'm running into the following error (the file format is .mf4, Measurement Data Format): org.apache.spark.SparkException: The length of ... is 14749763360, which exceeds the max length ...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hi @pra18, you can split and then load the binary files using the split command, like this: ret = os.system("split -b 4020000 -a 4 -d large_data.dat large_data.dat_split_")
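A minimal sketch of the full workaround along those lines, with hypothetical paths and a chunk size kept below Spark's 2 GB per-record limit; binaryFile is Spark's standard binary source:

```python
import os

# Split the .mf4 into numbered ~1.9 GB chunks: _part_0000, _part_0001, ...
os.system("split -b 1900000000 -a 4 -d /dbfs/tmp/large_data.mf4 /dbfs/tmp/large_data.mf4_part_")

# Read the chunks back; each becomes one row with its bytes in `content`.
parts = (
    spark.read.format("binaryFile")
    .load("dbfs:/tmp/large_data.mf4_part_*")
    .orderBy("path")  # keep chunks in original order for reassembly
)
```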

1 More Replies
by kivaniutenko (New Contributor)
  • 102 Views
  • 0 replies
  • 0 kudos

HTML Formatting Issue in Databricks Alerts

Hello everyone, I have recently encountered an issue with HTML formatting in custom templates for Databricks Alerts. Previously, the formatting worked correctly, but now the alerts display raw HTML instead of properly rendered content. For example, an ...

by skd217 (New Contributor)
  • 341 Views
  • 3 replies
  • 0 kudos

Is there any way to connect a Polaris catalog from Unity Catalog?

Hi Databricks community, I'd like to access data managed by a Polaris catalog through Unity Catalog, so I can manage all data in one place. Is there any way to connect the two? (I could access the data with an all-purpose cluster without Unity Catalog.)

Latest Reply
chandu402240 (New Contributor II)
  • 0 kudos

Can you provide info on how you access Polaris catalog data from a Databricks cluster (without UC)? Is there any blog?

2 More Replies
by shan-databricks (New Contributor II)
  • 237 Views
  • 2 replies
  • 0 kudos

Databricks Workflow Orchestration

I have 50 tables, and the number will increase gradually, so I want to create a single workflow to orchestrate the job and run it table by table. Is there an option to do this in Databricks Workflows?

Latest Reply
Edthehead (Contributor III)
  • 0 kudos

Break these 50 tables up logically or functionally and place them in their own workflows. A good strategy would be to group dependent tables in the same workflow. Then use a master workflow to trigger each child workflow. So it will be like a...
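A minimal sketch of the master-workflow idea using the Databricks SDK for Python; the child job IDs are hypothetical, and .result() simply blocks until each child run finishes:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # credentials from env vars or ~/.databrickscfg

child_job_ids = [101, 102, 103]  # one child workflow per table group

for job_id in child_job_ids:
    run = w.jobs.run_now(job_id=job_id).result()  # trigger and wait
    print(f"Job {job_id} finished: run_id={run.run_id}, state={run.state.result_state}")
```

Inside a single workflow, the built-in "For each" task type can similarly fan out one parameterized task over a list of table names.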

1 More Replies
by subhas_1729 (New Contributor II)
  • 688 Views
  • 1 reply
  • 0 kudos

Dashboard

Hi, I want to design a dashboard that will show some variables from the Spark UI. Is it possible to access Spark UI variables from my Spark program?

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hi @subhas_1729, you can achieve this by leveraging Spark's monitoring and instrumentation APIs. Spark exposes metrics through the SparkListener interface as well as through a REST API. The SparkListener interface allows you to receiv...
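A minimal sketch of the REST API route, assuming the Spark UI is reachable on the driver's default port 4040 (on Databricks the UI is usually proxied, so the base URL may differ):

```python
import requests

base = "http://localhost:4040/api/v1"

# The monitoring API serves the same data the Spark UI renders.
app_id = requests.get(f"{base}/applications").json()[0]["id"]

# Per-stage metrics: name, executor run time, records read/written, etc.
for stage in requests.get(f"{base}/applications/{app_id}/stages").json():
    print(stage["stageId"], stage["name"], stage["executorRunTime"])
```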

by miki1999 (New Contributor)
  • 241 Views
  • 1 reply
  • 0 kudos

Problem connecting VSCode with Databricks

I have a problem connecting VSCode with Databricks. I am following all the steps to add Databricks in VSCode, but after the last step I get this error in VSCode: "Error connecting to the workspace: "Can't set configuration 'authProfile' without selecting a...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hello @miki1999, try removing any comments (lines starting with ';') from the configuration file. • Open VSCode and go to the Extensions view (Ctrl+Shift+X). • Search for "Databricks" and ensure you have the latest version installed.

by William_Scardua (Valued Contributor)
  • 9801 Views
  • 3 replies
  • 1 kudos

How to read data from Azure Log Analytics?

Hi guys, I need to read data from an Azure Log Analytics workspace directly. Any ideas? Thank you.

Latest Reply
alexott (Databricks Employee)
  • 1 kudos

You can use the Kusto Spark connector for that: https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSource.md#source-read-command It heavily depends on how you access the data; there could be a need to use an ADX cluster for it: https://learn.mi...
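A minimal sketch of a read through that connector, assuming the Log Analytics workspace is exposed via Azure Data Explorer; the cluster URL, database, query, and AAD app values are placeholders:

```python
df = (
    spark.read.format("com.microsoft.kusto.spark.datasource")
    .option("kustoCluster", "https://ade.loganalytics.io/subscriptions/<sub-id>/resourcegroups/<rg>/providers/microsoft.operationalinsights/workspaces/<ws>")
    .option("kustoDatabase", "<workspace-name>")
    .option("kustoQuery", "AppRequests | where TimeGenerated > ago(1d)")
    .option("kustoAadAppId", "<app-id>")
    .option("kustoAadAppSecret", "<app-secret>")
    .option("kustoAadAuthorityID", "<tenant-id>")
    .load()
)
```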

2 More Replies
by dbhavesh (New Contributor II)
  • 280 Views
  • 3 replies
  • 1 kudos

How to apply row_number in DLT

Hi all, how do I use row_number in DLT, or what is the alternative to the row_number function in DLT? We are looking for the same functionality that row_number provides. Thanks in advance.

Latest Reply
Takuya-Omi (Valued Contributor II)
  • 1 kudos

@dbhavesh I apologize for the lack of explanation. The ROW_NUMBER function requires ordering over the entire dataset, making it a non-time-based window function. When applied to streaming data, it results in the "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREA...
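A minimal sketch of one common workaround: compute ROW_NUMBER in a non-streaming (batch) DLT table, where ordering over the whole dataset is allowed; the table and column names are hypothetical:

```python
import dlt
from pyspark.sql import functions as F
from pyspark.sql.window import Window

@dlt.table(name="orders_ranked")
def orders_ranked():
    # dlt.read() is a batch read, so a non-time-based window is allowed here.
    w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    return dlt.read("orders").withColumn("row_num", F.row_number().over(w))
```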

2 More Replies
by sachin_kanchan (New Contributor III)
  • 581 Views
  • 6 replies
  • 0 kudos

Unable to log in to Community Edition

So I just registered for the Databricks Community Edition and received an email for verification. When I click the link, I'm redirected to this website (image attached) where I am asked to input my email. And when I do that, it sends me a verification c...

db_fail.png
Latest Reply
sachin_kanchan (New Contributor III)
  • 0 kudos

What a disappointment this has been

5 More Replies
by prasidataengine (New Contributor II)
  • 348 Views
  • 2 replies
  • 0 kudos

Issue connecting to a Databricks 15.4 cluster without Unity Catalog using databricks-connect

Hi, I have a shared cluster created on Databricks which uses the 15.4 runtime. I don't want to enable Unity Catalog for this cluster. Previously I used Python 3.9.13 to connect to an 11.3 cluster using databricks-connect 11.3. Now my company has restr...

Data Engineering
Databricks
databricks-connect
Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hi @prasidataengine, for DBR 13.3 LTS and above you must have Unity Catalog enabled to be able to use databricks-connect: a Databricks account and workspace that have Unity Catalog enabled. See "Set up and manage Unity Catalog" and "Enable a wo...
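A minimal sketch of a databricks-connect (13.3+) session against a UC-enabled workspace; the profile name is a placeholder for an entry in ~/.databrickscfg that supplies the host, token, and cluster_id:

```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("my-workspace").getOrCreate()

print(spark.range(5).count())  # quick connectivity check
```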

1 More Replies
by vidya_kothavale (New Contributor III)
  • 210 Views
  • 2 replies
  • 0 kudos

MongoDB Streaming Not Receiving Records in Databricks

Batch read (spark.read.format("mongodb")) works fine. Streaming read (spark.readStream.format("mongodb")) runs but receives no records. Batch read (works): df = spark.read.format("mongodb")\.option("database", database)\.option("spark.mongodb.read.conne...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hello @vidya_kothavale, MongoDB requires change streams to enable streaming reads. Change streams allow applications to access real-time data changes without polling the database. Ensure that your MongoDB instance is configured to support change...
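A minimal sketch of a change-stream-backed streaming read with the MongoDB Spark connector (v10+), assuming the source is a replica set or sharded cluster (change streams require one); the URI and names are placeholders:

```python
stream = (
    spark.readStream.format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://host:27017/?replicaSet=rs0")
    .option("spark.mongodb.database", "mydb")
    .option("spark.mongodb.collection", "events")
    # Emit just the changed documents rather than the change-stream envelope.
    .option("spark.mongodb.change.stream.publish.full.document.only", "true")
    .load()
)

query = (
    stream.writeStream.format("console")
    .option("checkpointLocation", "/tmp/mongo_ckpt")
    .start()
)
```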

1 More Replies
