Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

by T__V__K__Hanuma (New Contributor II)
  • 6791 Views
  • 4 replies
  • 0 kudos

I am struggling to optimize my Spark application code. Is there someone who can help me optimize it? I am using Spark on Hadoop YARN.

Let me elaborate on my problem. I am using a 6-node Spark cluster on Hadoop YARN, in which one node acts as the master and the other 5 act as worker nodes. I am running my Spark application on the cluster. After completion, when I check th...

Latest Reply
Anonymous

Hi @T. V. K. Hanuman, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...
by EDDatabricks (Contributor)
  • 1210 Views
  • 2 replies
  • 3 kudos

SQL endpoint increased response times

We have observed that a SQL endpoint's response times increase after it has been idle for a long time. This endpoint is always running and does not terminate. Are there any checks or overheads associated with being idle that could impact performance?

Latest Reply
Anonymous

Hi @EDDatabricks EDDatabricks, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, pleas...
by 140015 (New Contributor III)
  • 1611 Views
  • 2 replies
  • 1 kudos

PySpark 3.3.0 exceptAll works on 11.3 LTS but not locally

Hello, currently I'm in the process of upgrading the DBR version in my jobs to 11.3 LTS. After upgrading PySpark to 3.3.0 on my local machine, I found that the exceptAll function is broken (it looks like others have a similar problem). It throws ...
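For reference, PySpark documents `DataFrame.exceptAll` as returning the rows in one DataFrame that are not in another while preserving duplicates, equivalent to SQL `EXCEPT ALL`. When comparing a local run against 11.3 LTS, it can help to know what the correct result should be. The sketch below reproduces those multiset semantics in plain Python on lists of tuples; `except_all` is a hypothetical helper, not part of PySpark, and unlike Spark it happens to keep the left-hand row order.

```python
from collections import Counter


def except_all(left, right):
    """Multiset difference (EXCEPT ALL): each row in `right` cancels at most one equal row in `left`."""
    remaining = Counter(right)  # copies of each row still available to cancel
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # cancel one matching copy
        else:
            result.append(row)  # nothing left to cancel it, so the row survives
    return result


left = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
right = [(1, "a"), (3, "c")]
print(except_all(left, right))  # [(1, 'a'), (2, 'b')]
```

Note that one of the two `(1, "a")` rows survives; `subtract` (SQL `EXCEPT DISTINCT`) would drop both.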

Latest Reply
Anonymous

Hi @Jacek Dembowiak, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us ...
by Himanshu_90 (New Contributor III)
  • 4871 Views
  • 8 replies
  • 7 kudos

Databricks SQL not able to evaluate expression current_user

Hi, I have a table as below:

create table default.test_user (
  ID bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1),
  usr1 varchar(255) NOT NULL,
  ts1 timestamp NOT NULL,
  usr2 varchar(255) NOT NULL,
  ts2 timestamp NOT NULL
) USING Del...

Latest Reply
Anonymous

Hi @Himanshu Agrawal, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...
by Gaurav_784295 (New Contributor III)
  • 1615 Views
  • 2 replies
  • 1 kudos

When querying in Delta format I cannot see the previous partition, whereas reading the same data in Parquet format shows the whole partition data column.

Delta format: spark.read.format("delta").load("")
Parquet format: spark.read.parquet("...
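The post is truncated, so the cause is an assumption here, but one common explanation is that a Delta read only scans the files the transaction log currently references, while a plain `spark.read.parquet` lists every Parquet file under the path, including files an overwrite removed from the log but that `VACUUM` has not yet deleted. A toy replay of add/remove actions in plain Python illustrates the difference; the action dictionaries only loosely mirror the `add`/`remove` entries in `_delta_log`, and the file paths are hypothetical.

```python
def active_files(actions):
    """Replay add/remove actions in commit order; the result is what a Delta read would scan."""
    live = set()
    for action in actions:
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])  # logically deleted; the file stays on disk
    return live


# Hypothetical history: two partitions written, then the table overwritten with a new one.
log = [
    {"add": {"path": "date=2023-01-01/part-0.parquet"}},
    {"add": {"path": "date=2023-01-02/part-0.parquet"}},
    {"remove": {"path": "date=2023-01-01/part-0.parquet"}},
    {"remove": {"path": "date=2023-01-02/part-0.parquet"}},
    {"add": {"path": "date=2023-01-03/part-0.parquet"}},
]

# Assuming nothing has been vacuumed, every file ever added is still physically present.
files_on_disk = {a["add"]["path"] for a in log if "add" in a}

print(sorted(active_files(log)))  # what the Delta reader sees
print(sorted(files_on_disk))      # what a directory-listing Parquet reader sees
```

If this is what happened, the Delta result is the intended current state of the table, and the extra rows in the Parquet read come from stale files.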

Latest Reply
Anonymous

Hi @Gaurav Rawat, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...
by gentresh (New Contributor III)
  • 1045 Views
  • 2 replies
  • 0 kudos

Is it possible to generate Databricks tokens using an Azure Service Principal?

Our organization has set up a Databricks service on top of Azure (that is, the Azure-managed service). These are all defined with Terraform. Our intention is to use an Azure service principal (with the correct permissions) to generate tokens, p...

Latest Reply
Anonymous

Hi @Gent Reshtani, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
by 193801 (New Contributor)
  • 3333 Views
  • 2 replies
  • 0 kudos

Auto Loader and JSON

Hello, I am looking for help with Auto Loader. I have a few questions. My goal is to read the files in an S3 location and store the filename, fileDate, and file content in one table, and in another table convert the file content to a JSON struct and read to 1...

Latest Reply
Anonymous

Hi @Neeharika Andavarapu, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...
by Sandy84 (New Contributor II)
  • 3742 Views
  • 3 replies
  • 2 kudos

Need help skipping previously executed cells in a failed Databricks job calling a notebook with multiple SQL cells

In Azure Databricks, I have a job that calls a notebook with multiple cells containing SQL queries. If a cell fails and we restart the Databricks job, how can we skip the cells that already ran and start from the failed cell? ...
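One generic pattern (a sketch, not a built-in Databricks feature) is to give each SQL step a name, record each step in a small state file once it succeeds, and skip recorded steps when the job is rerun. The `run_steps` helper, the step names, and the state-file path are all hypothetical; in a notebook, each step function would wrap a single `spark.sql(...)` call, and the state file would need to live on storage that persists between runs.

```python
import json
from pathlib import Path


def run_steps(steps, state_path):
    """Run (name, fn) steps in order, skipping names already recorded as completed.

    The state file holds a JSON list of completed step names. If a step raises,
    the exception propagates and the completed steps stay recorded, so the next
    call resumes at the failed step.
    """
    path = Path(state_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    for name, fn in steps:
        if name in done:
            continue  # finished during an earlier, failed run
        fn()
        done.add(name)
        path.write_text(json.dumps(sorted(done)))  # persist progress after every step
    path.unlink(missing_ok=True)  # every step succeeded: the next run starts from scratch
```

Deleting the state file after a fully successful run keeps the next scheduled run from skipping everything. The trade-off is that a step which fails after partially writing data must itself be safe to rerun.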

Latest Reply
Anonymous

Hi @Sandip Rath, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
by Kenny92 (New Contributor III)
  • 9135 Views
  • 2 replies
  • 1 kudos

Resolved! How does Auto Loader ingest data?

I have recently completed the Data Engineering with Databricks v3 course on the Partner Academy. Some of the quiz questions have me mixed up. Specifically, I am wondering about this question from the "Build Data Pipelines with Delta Live Tables and Sp...

Which of the following correctly describes how Auto Loader ingests data? Select one response.
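Per the Databricks documentation, Auto Loader incrementally processes new data files as they arrive in cloud storage and records the files it has already discovered in the stream's checkpoint location (in a RocksDB key-value store), so later runs do not re-read old files. The toy model below captures only that bookkeeping idea in plain Python: a directory listing is diffed against a set of already-ingested file names, and the set is updated only after the batch is processed. All names are hypothetical, and the sketch ignores file-notification mode, schema inference, and failure recovery.

```python
def discover_new_files(listing, ingested):
    """Files present in the current listing that have not been ingested yet."""
    return sorted(set(listing) - ingested)


def run_batch(listing, ingested, process):
    """Process only new files, then record them so later batches skip them."""
    new_files = discover_new_files(listing, ingested)
    for name in new_files:
        process(name)
    ingested.update(new_files)  # record only after the whole batch was processed
    return new_files


ingested = set()  # stands in for the checkpoint's record of ingested files
print(run_batch(["a.json", "b.json"], ingested, print))
print(run_batch(["a.json", "b.json", "c.json"], ingested, print))
```

The second batch only touches `c.json`, which is the "incremental" part of the quiz question: already-ingested files are skipped based on recorded state rather than reprocessed.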
Latest Reply
Anonymous

Hi @Kenny Shaevel, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
by Pawelski (New Contributor)
  • 1184 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous

Hi @Paweł Tomczyk, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
by Vindhya (New Contributor II)
  • 1701 Views
  • 2 replies
  • 0 kudos

DataFrame to pandas conversion step is failing with exception "java.lang.IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))"

The DataFrame to pandas conversion step is failing with exception "java.lang.IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))". Please see the attached screenshot for more details.

Latest Reply
Anonymous

Hi @Vindhya D, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...
by h_aloha (New Contributor III)
  • 1720 Views
  • 2 replies
  • 0 kudos

Difference between the V3 and V2 exams for Databricks Certified Data Engineer Associate

Hi, does anyone know the difference between the V3 exam for Databricks Certified Data Engineer Associate and V2? It looks like there is no practice exam for V3? Which version covers more material? Thanks, h_aloha

Latest Reply
Anonymous

Hi @Helen Morgen, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.
by vdp_dlv (New Contributor III)
  • 2040 Views
  • 3 replies
  • 0 kudos

Resolved! When trying to %run a notebook, I'm getting an error.

This error occurs randomly, and sometimes it resolves on its own. I'm not sure what causes it. The notebook I'm sourcing runs flawlessly; I'm only trying to import dates from this notebook.

Latest Reply
Anonymous

Hi @viswa p, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ca...
by 132736 (New Contributor)
  • 1510 Views
  • 2 replies
  • 0 kudos

Can a SQL result display more than 25 records per page?

Hi! I have a result table with 41 rows. What should I do to make all rows available on the same page?

Latest Reply
Anonymous

Hi @wenting_deng wenting_deng, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, pleas...
by alejandrofm (Valued Contributor)
  • 2915 Views
  • 2 replies
  • 2 kudos

Resolved! Lots of shuffle write on OPTIMIZE + ZORDER, is it normal?

Hi! I'm optimizing several TB of partitioned data compressed with ZSTD level 9. The amount of shuffle write surprises me. It could make sense because of ZORDER, but I want to be sure I'm not missing something. Here is some context: Could I be missing something...
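A sizable shuffle is plausible here: `OPTIMIZE ... ZORDER BY` rewrites data so that rows with similar values in the Z-order columns end up in the same files, which requires redistributing rows across files according to their Z-order position. The idea behind Z-ordering is a space-filling curve; the plain-Python sketch below computes a two-column Morton (Z-order) value by interleaving bits. It is an illustration only: Delta's implementation differs in detail, and `interleave_bits` is a hypothetical helper.

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) value of two non-negative ints: bit i of x -> bit 2i, bit i of y -> bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting a 4x4 grid by Z-value visits it in 2x2 blocks, so rows that are close
# in *both* columns land next to each other (and hence in the same output files).
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: interleave_bits(*p))
print(z_sorted[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because the sort key depends on every Z-order column at once, each output file can draw rows from anywhere in the data being optimized, which is one plausible reason shuffle write can be large relative to the input size.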

Latest Reply
Anonymous

Hi @Alejandro Martinez, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
