Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Kurtis_R
by New Contributor II
  • 853 Views
  • 2 replies
  • 0 kudos

Excel Formula results

Hi all, just wanted to raise a question regarding Databricks workbooks and viewing the results in the cells. For the example provided in the screenshot, I want to view the results of an Excel formula that has been applied to a cell in our workbooks. Fo...

[Screenshots attached]
Latest Reply
User16756723392
Databricks Employee
  • 0 kudos

@Kurtis_R do you want to display the value of 45, or the formula by which 45 is calculated?
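A minimal sketch of one way to get at the computed value rather than the formula string, assuming the workbook is read with openpyxl (the file path, sheet name, and cell reference are hypothetical):

import openpyxl

# data_only=True returns the value Excel last cached for the cell (e.g. 45);
# data_only=False returns the formula string itself (e.g. "=SUM(A1:A10)").
wb_values = openpyxl.load_workbook("/dbfs/FileStore/example.xlsx", data_only=True)
wb_formulas = openpyxl.load_workbook("/dbfs/FileStore/example.xlsx", data_only=False)

print(wb_values["Sheet1"]["B2"].value)    # computed result, e.g. 45
print(wb_formulas["Sheet1"]["B2"].value)  # the formula, e.g. "=SUM(A1:A10)"

Note that openpyxl does not evaluate formulas itself, so data_only=True only returns a value if the file was last saved by Excel.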

1 More Replies
Sharmila_12
by New Contributor
  • 608 Views
  • 1 reply
  • 0 kudos

I don't have a last name. What should I enter in the mandatory last name field?

Hi, I was about to register for the Databricks Certified Data Engineer Associate exam. While registering for the exam, it asks for a last name, which is a mandatory field. But none of my government ID documents have a last name, only a first name. Wha...

Latest Reply
Anushree_Tatode
Honored Contributor II
  • 0 kudos

Hi, to proceed with the registration, please enter a space or a full stop in the last name field. This should allow you to continue with the process. Feel free to reach out if you need any further assistance. Best regards, Anushree

Enrique1987
by New Contributor III
  • 6154 Views
  • 1 reply
  • 0 kudos

Photon Benchmark

I'm conducting my own comparative study between a cluster with Photon enabled and a cluster without Photon to see what improvements occur. According to Databricks, there should be up to 12x better performance, but I'm only finding about a 20% improve...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Enrique1987, you can find more information about Photon in the whitepaper below: https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf
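For anyone reproducing this kind of comparison, a minimal timing sketch (run unchanged on both the Photon and the non-Photon cluster) might look like the following; the synthetic workload and row count are arbitrary, and the observed speedup will vary a lot with the query shape:

import time
from pyspark.sql import functions as F

# Aggregation over generated data, the kind of scan/aggregate workload Photon targets.
df = (spark.range(0, 100_000_000)
        .withColumn("key", F.col("id") % 1_000)
        .withColumn("value", F.rand()))

start = time.time()
df.groupBy("key").agg(F.count("*").alias("cnt"), F.sum("value").alias("total")).collect()
print(f"Elapsed: {time.time() - start:.1f}s")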

AmineHY
by Contributor
  • 13169 Views
  • 5 replies
  • 6 kudos

Resolved! How to read JSON files embedded in a list of lists?

Hello, I am trying to read this JSON file but didn't succeed. You can see the head of the file: JSON inside a list of lists. Any idea how to read this file?

Latest Reply
adriennn
Valued Contributor
  • 6 kudos

The correct way to do this without using open (which only works with local/mounted files) is to read the files as binaryFile. You will then get the entire JSON string on each row, and from there you can use from_json() and explode() to extract the ...
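A minimal sketch of that approach; the storage path and the inner field names are hypothetical, since the file's actual schema isn't visible here:

from pyspark.sql.functions import col, from_json, explode
from pyspark.sql.types import ArrayType, StructType, StructField, StringType, DoubleType

# 1. Read each file as one binary record and cast the raw bytes to a string.
raw = (spark.read.format("binaryFile")
         .load("abfss://container@account.dfs.core.windows.net/data/*.json")
         .select(col("content").cast("string").alias("json_str")))

# 2. Describe the "list of lists of objects" structure.
schema = ArrayType(ArrayType(StructType([
    StructField("name", StringType()),
    StructField("value", DoubleType()),
])))

# 3. Parse, then explode the outer and inner lists into one row per object.
parsed = (raw.select(from_json("json_str", schema).alias("outer"))
             .select(explode("outer").alias("inner"))
             .select(explode("inner").alias("record"))
             .select("record.*"))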

4 More Replies
guangyi
by Contributor III
  • 3159 Views
  • 4 replies
  • 0 kudos

Resolved! Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be create

Here is how I define the UDF inside the file udf_define.py:
from pyspark.sql.functions import length, udf
from pyspark.sql.types import IntegerType
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
def strlen(s): ret...

Latest Reply
guangyi
Contributor III
  • 0 kudos

I also tried getActiveSession(), but it is not working.
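One pattern that may help here (a sketch only, reusing the strlen name from the original udf_define.py) is to avoid building a SparkSession at module import time and let the notebook that already owns the session do the registration:

# --- udf_define.py (no SparkSession is created at import time) ---
def strlen(s):
    # Plain Python function, safe to serialize to the executors.
    return len(s) if s is not None else 0

# --- notebook / driver side ---
# from udf_define import strlen
# from pyspark.sql.types import IntegerType
# spark.udf.register("strlen", strlen, IntegerType())  # register against the driver's session
# spark.sql("SELECT strlen('databricks')").show()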

3 More Replies
Personal1
by New Contributor II
  • 4168 Views
  • 3 replies
  • 2 kudos

Problems with Azure Databricks

Hi, I want to use Databricks for the first time, and am having many problems and points of confusion. Please help me resolve them. 1. I created a free Databricks Community account on Azure and get an error when creating the cluster/compute: "Azure Quota Exceeded Excepti...

Latest Reply
ThierryBa
New Contributor III
  • 2 kudos

You must have created some resources with public IP addresses in your Azure subscription, e.g. a storage account, etc. Try to avoid using public IPs as much as possible to secure your tenant/subscription. Try to find which of your Azure resources are us...
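If it helps, a small sketch for listing the public IP addresses in a subscription with the Azure SDK for Python (assuming the azure-identity and azure-mgmt-network packages are installed; the subscription ID is a placeholder):

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Print every public IP resource and, where available, the resource it is attached to.
for pip in client.public_ip_addresses.list_all():
    attached_to = pip.ip_configuration.id if pip.ip_configuration else "unattached"
    print(pip.name, pip.ip_address, attached_to)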

2 More Replies
mac08_flo
by New Contributor
  • 1297 Views
  • 1 reply
  • 1 kudos

Creation of logs in a file

Good afternoon. I am trying to add logging to my code. The issue is that I haven't yet found a way to write the logs to a separate file rather than having them output to the terminal; I want them to be stored in a file (example.log). I ha...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @mac08_flo, use the logging library. You can configure it to log to the terminal, to files, etc. https://www.highlight.io/blog/5-best-python-logging-libraries
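A minimal sketch with the standard logging module, writing to the example.log file mentioned in the question (where that file should live is left as an assumption):

import logging

# Route log records to example.log instead of the notebook/terminal output.
logging.basicConfig(
    filename="example.log",
    filemode="a",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)

logger = logging.getLogger(__name__)
logger.info("Pipeline started")
logger.warning("Something worth noting")

To log to both the file and the terminal, attach a FileHandler and a StreamHandler to the same logger instead of relying on a single basicConfig destination.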

ak4
by New Contributor II
  • 1971 Views
  • 2 replies
  • 0 kudos

Failed to read job commit marker error

Recently, we migrated from DBR 11.3 LTS ML to DBR 14.3 LTS ML. We are struggling with one data source where we consume Parquet files. New data is appended every 30 minutes to that data source. The data is generated by a Databricks notebook which runs on...

Latest Reply
ak4
New Contributor II
  • 0 kudos

Thanks @menotron for your reply! Interestingly, we had been using the REFRESH TABLE command even before this issue and it worked well so far. However, with the new runtime it doesn't work anymore. I should specify the code which we use. It actually fai...
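For context, the REFRESH TABLE pattern mentioned above looks roughly like this (the table name is hypothetical); it invalidates Spark's cached file listing before the table is read again:

# Another job appends Parquet files to this table every 30 minutes,
# so drop the cached metadata/file listing before re-reading it.
spark.sql("REFRESH TABLE bronze.sensor_events")  # hypothetical table name
df = spark.table("bronze.sensor_events")
print(df.count())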

1 More Replies
standup1
by Contributor
  • 1956 Views
  • 7 replies
  • 3 kudos

Delta Live Table Path/Directory help

Hello, I am working on a DLT pipeline and I've been facing an issue. I hope someone here can help me find a solution. My files are JSON in Azure storage. These files are stored in a directory like this (blobName/FolderName/xx.csv). The folder name is li...

Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @standup1, I'm glad the example was helpful.

6 More Replies
JR61276126
by New Contributor II
  • 1408 Views
  • 5 replies
  • 1 kudos

Data Engineering with Databricks 3.1.12 - Unable to run Classroom-Setup-01.2

Receiving the following error when attempting to run the classroom setup for lesson 1.2 of Data Engineering with Databricks 3.1.12. This has been tested with multiple accounts, both admins and non-admins. Below is the error message I am receiving....

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @JR61276126, since your workspace is deployed in Azure with VNet injection, I assume it might be a network/firewall-related issue. Could you also check your driver logs?

4 More Replies
ADB0513
by New Contributor III
  • 3827 Views
  • 1 reply
  • 1 kudos

Databricks Asset Bundle "Credential was not sent or was of an unsupported type"

I am working on setting up an asset bundle and it is failing when I try to validate the bundle. I am getting an error saying "Credential was not sent or was of an unsupported type for this API." I have a profile created and am using an access token t...

Latest Reply
mvmiller
New Contributor III
  • 1 kudos

I am having a similar issue when trying to deploy my asset bundle. I ran the following: databricks auth login --host <hostname>. I was then authenticated just fine, without issue. I then pointed to the relevant directory containing the asset bundle and ...

Mathias_Peters
by Contributor II
  • 2000 Views
  • 2 replies
  • 0 kudos

How to properly implement incremental batching from Kinesis Data Streams

Hi, I implemented a job that should incrementally read all the available data from a Kinesis Data Stream and terminate afterwards. I schedule the job daily. The data retention period of the data stream is 7 days, i.e., there should be enough time to ...

Latest Reply
fixhour
New Contributor II
  • 0 kudos

It seems like the issue might be caused by potential data loss in the Kinesis stream. Even though you're using checkpoints and specifying the "earliest" position, data can expire due to the 7-day retention period, especially if there's a delay in job...
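A sketch of the incremental-batch pattern being discussed, assuming the Databricks Kinesis source and a runtime where Trigger.AvailableNow is supported for it; the stream name, region, checkpoint path, and target table are hypothetical:

# Read whatever is currently available in the stream, then terminate; suitable for a daily job.
df = (spark.readStream
        .format("kinesis")
        .option("streamName", "my-stream")
        .option("region", "eu-west-1")
        .option("initialPosition", "earliest")  # only used on the first run; afterwards the checkpoint decides
        .load())

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/mnt/checkpoints/kinesis_daily")
   .trigger(availableNow=True)
   .toTable("bronze.kinesis_events"))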

1 More Replies
MGeiss
by New Contributor III
  • 3111 Views
  • 3 replies
  • 1 kudos

Resolved! Suddenly Getting Timeout Errors Across All Environments while waiting for Python REPL to start.

Hey - we currently have 4 environments spread out across separate workspaces, and as of Monday we've begun to have transient failures in our DLT pipeline runs with the following error: "java.util.concurrent.TimeoutException: Timed out after 60 seconds...

Latest Reply
MGeiss
New Contributor III
  • 1 kudos

For anyone else who may be experiencing this issue - it seems to have been related to serverless compute for notebooks/workflows, which we had enabled for the account, but WERE NOT using for our DLT pipelines. After noticing references to serverless ...

2 More Replies
varshini_reddy
by New Contributor III
  • 3429 Views
  • 14 replies
  • 2 kudos
Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @varshini_reddy, there is no option to stop all the other iterations when For Each is running and one of the iterations has failed. This is why the workaround I shared simply skips/fails all the subsequent iterations without doing anything. You can fai...

13 More Replies
