Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

wojciech_jakubo
by New Contributor III
  • 6029 Views
  • 7 replies
  • 2 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based third-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 2 kudos

Hi @wojciech_jakubo, 1. JVM memory is not used for Python-related activities. 2. In the image we can only see the storage memory. There is also execution memory, which would be the same size. Hence I came up with the executor memory to be of ...
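
Since the reply notes that JVM memory is not used for Python work, the JVM storage and execution figures will not reflect what the pandas code allocates. One possible way to see the Python side is to measure it inside the driver's Python process. The sketch below uses the standard-library `tracemalloc` module; `run_with_peak_memory` and `build_rows` are hypothetical names, and `build_rows` only stands in for the real third-party call. `tracemalloc` reports allocations that go through Python's allocator, so memory taken by native libraries outside it may not show up.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def run_with_peak_memory(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Run fn and return (result, peak bytes traced by tracemalloc while it ran)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_rows(n: int) -> list:
    # Hypothetical stand-in for the pandas-based third-party call.
    return [[i, str(i)] for i in range(n)]


rows, peak_bytes = run_with_peak_memory(build_rows, 10_000)
print(f"rows={len(rows)} peak={peak_bytes / 1024 ** 2:.2f} MiB")
```

Logging this value around each call to the third-party code gives a per-call peak that can be compared against the memory available to the driver.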

6 More Replies
David_K93
by Contributor
  • 2017 Views
  • 1 replies
  • 2 kudos

Resolved! Building a Document Store on Databricks

Hello, I am somewhat new to Databricks and am trying to build a Q&A application based on a collection of documents. I need to move .pdf and .docx files from my local machine to storage in Databricks and eventually a document store. My questions are: Wh...

Latest Reply
David_K93
Contributor
  • 2 kudos

Hi all, I took an initial stab at task one with some success using the Databricks CLI. Here are the steps:
1. Open Command/Anaconda prompt and enter: pip install databricks-cli
2. Go to your Databricks console and under settings find "User Settings" and...

QPeiran
by New Contributor III
  • 1771 Views
  • 3 replies
  • 5 kudos

How to exit the entire job in the orchestration scenario?

Hi, can anybody answer this question I posted on StackOverflow? https://stackoverflow.com/questions/73314048/databricks-how-to-exit-the-entire-job-in-the-notebooks-orchestration-scenario

Latest Reply
CarterM
New Contributor III
  • 5 kudos

@Vidula Khanna We are experiencing the same issue in our Workflows and I was wondering if there has been any update. We need the functionality to call a method similar to `dbutils.notebook.exit` in a notebook that will cancel the exec...
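
For the narrower case where one parent notebook calls its children through `dbutils.notebook.run`, a possible workaround is to have a child return a sentinel value via `dbutils.notebook.exit` and let the parent stop the loop when it sees it. This is only a sketch under that assumption: `STOP_JOB`, `run_until_stop`, and the notebook paths are hypothetical, and it does not cancel a multi-task Workflows run. The runner is passed in as a function so the example runs outside Databricks with a fake.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical sentinel: a child notebook would call dbutils.notebook.exit("STOP_JOB").
STOP_JOB = "STOP_JOB"


def run_until_stop(paths: Iterable[str], run_notebook: Callable[[str], Optional[str]]) -> List[str]:
    """Run child notebooks in order and stop as soon as one returns STOP_JOB."""
    completed = []
    for path in paths:
        result = run_notebook(path)
        if result == STOP_JOB:
            break  # skip every remaining child
        completed.append(path)
    return completed


# On Databricks the runner could wrap the real API, for example:
#   run_notebook = lambda p: dbutils.notebook.run(p, 600, {})
# Here a fake runner (a dict lookup) stands in for it.
fake_results = {"/jobs/extract": "ok", "/jobs/validate": STOP_JOB, "/jobs/load": "ok"}
completed = run_until_stop(fake_results, fake_results.get)
print(completed)
```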

2 More Replies
Gk
by New Contributor III
  • 2026 Views
  • 2 replies
  • 1 kudos

DataFrame

How can we create an empty DataFrame in Databricks, and in how many ways can we create a DataFrame?
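
As a rough illustration of the question, two common ways to get an empty DataFrame in PySpark are `createDataFrame` with an empty list plus an explicit schema, and `range(0)`. The helper names and the `id`/`status` columns below are hypothetical; `spark` is the session object that Databricks notebooks predefine. Non-empty DataFrames are usually built from a list of rows, a pandas DataFrame, a file read through `spark.read`, or a query through `spark.sql`.

```python
from typing import Any


def empty_df_with_schema(spark: Any, ddl: str = "id INT, status STRING") -> Any:
    """Empty DataFrame with explicit columns, declared as a DDL schema string."""
    return spark.createDataFrame([], schema=ddl)


def empty_df_from_range(spark: Any) -> Any:
    """Empty DataFrame with a single `id` column."""
    return spark.range(0)


# In a Databricks notebook, where `spark` already exists:
#   df = empty_df_with_schema(spark)
#   df.printSchema()
```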

Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @Govardhana Reddy, hope everything is going great. Does @Suteja Kanuri's answer help? If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. Cheers!

1 More Replies
William_Scardua
by Valued Contributor
  • 1514 Views
  • 3 replies
  • 1 kudos

Resolved! Upsert When the Origin NOT Exists, but you need to change status in the target

Hi guys, I have a question about upsert/merge. What do you do when the origin does NOT exist, but you need to change the status in the target? For example: 01/03: source dataset [id = 1 and status = Active]; target table [*does not exist*] >> at this time the ...

Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

Hello @William Scardua, just adding to what @Vigneshraja Palaniraj replied. Reference: https://docs.databricks.com/sql/language-manual/delta-merge-into.html
Thanks & Regards, Nandini
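
The question is truncated here, so the following is only a guess at the shape of a solution. Delta Lake's `MERGE INTO` has a `WHEN NOT MATCHED BY SOURCE` clause on newer runtimes (check the linked reference for the supported versions), which can update target rows that no longer appear in the source. The table names and the `'Inactive'` status are assumptions; `id` and `status` come from the example in the post. The helper only builds the statement text, and the identifiers are interpolated without escaping, so they must be trusted values.

```python
def build_status_merge(target: str, source: str, missing_status: str = "Inactive") -> str:
    """Build a MERGE that upserts by id and flags target rows missing from the source."""
    return f"""
MERGE INTO {target} AS t
USING {source} AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN INSERT (id, status) VALUES (s.id, s.status)
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET t.status = '{missing_status}'
""".strip()


# Hypothetical table names; on Databricks this could be run with spark.sql(merge_sql).
merge_sql = build_status_merge("target_table", "source_dataset")
print(merge_sql)
```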

2 More Replies
self-employed
by Contributor
  • 1865 Views
  • 3 replies
  • 6 kudos

Resolved! Can anyone help me to understand one question in PracticeExam-DataEngineerAssociate?

This is from the practice exam for Data Engineer Associate. The question is: A data engineering team has created a series of tables using Parquet data stored in an external system. The team is noticing that after appending new rows to the data in the external ...

Latest Reply
suny
New Contributor II
  • 6 kudos

Not an answer, just asking the Databricks folks to clarify: I would also like to understand this. If there is no event emitted from the external Parquet table (push), and no active pulling or refreshing from the Delta table side (pull), how is the un...

2 More Replies
leos1
by New Contributor II
  • 1051 Views
  • 2 replies
  • 0 kudos

Resolved! Question regarding ZORDER option of OPTIMIZE

Is the order of the columns in ZORDER important? For example, does ZORDER BY (product, site) and ZORDER BY (site, product) produce the same results?

Latest Reply
leos1
New Contributor II
  • 0 kudos

Thanks for the quick reply.

1 More Replies
jerry747847
by New Contributor III
  • 1836 Views
  • 6 replies
  • 1 kudos

Resolved! Databricks Associate Practice Exam -query

Dear Experts, can anyone please let me know how option "C" is the answer to Question 31 of PracticeExam-DataEngineerAssociate? https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf?_ga=2.185796329.11...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Question 17 is even worse: "A data engineer is overwriting data" vs "should simply be overwritten instead". One situation, I assume, is DROP and CREATE and the other is INSERT INTO OVERWRITE, but here both are called the same. A data engineer is overwriting ...

5 More Replies
Jin_Kim
by New Contributor II
  • 819 Views
  • 1 replies
  • 0 kudos

Question on single job with multi task

Say I have a job with 10 parallel tasks. I had to cancel one of the tasks to fix something, and I am unable to restart just that task. Is this by design? Should I restart the job in this case? Q2) If one of the tasks fails, will it auto recover just tha...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Jin Kim, please enable "Task Orchestration in Jobs" in your Admin Console; you can then add as many tasks to your job as you need. You can also specify the dependencies of each task.

RengarLee
by Contributor
  • 4307 Views
  • 15 replies
  • 6 kudos

Resolved! The Databricks-academy question

I'm taking the Data Engineering with Databricks course and I have a question. If I run cmd4, it gives me an error. Course URL: https://customer-academy.databricks.com/learn/course/62/play/4290/providing-options-for-external-sources;lp=10 Chapter: DE 4....

Latest Reply
Panna
New Contributor II
  • 6 kudos

The same issue occurred for me.

14 More Replies
nicole_wong
by New Contributor II
  • 1579 Views
  • 2 replies
  • 1 kudos

Resolved! Best practices for working with Redshift

I have a customer with the following question; I'm posting on their behalf to introduce them to the community. For doing modeling in a Python environment, what is our best practice for getting the data from Redshift? A "load" option seems to leave me...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Nicole Wong, have you checked the docs here? As far as I know, this might be the only way to read/write data to/from Redshift.

1 More Replies
Erik
by Valued Contributor II
  • 3689 Views
  • 6 replies
  • 7 kudos

Databricks query performance when filtering on a column correlated to the partition-column

(This is a copy of a question I asked on Stack Overflow, but maybe this community is a better fit for the question.) Setting: Delta Lake, Databricks SQL compute used by Power BI. I am wondering about the following scenario: We have a column `timest...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

In the query, I would filter first by date (derived from the timestamp we want to query) and then by the exact timestamp, so that the query benefits from partition pruning.
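
The suggestion in this reply can be sketched as a small helper that emits both predicates, so the date partition filter sits next to the exact timestamp filter. The column names `event_ts` and `event_date` are hypothetical (the original post's column name is truncated), and the sketch assumes the table is partitioned by a date column derived from the timestamp.

```python
from datetime import datetime


def partition_aware_filter(start: datetime, end: datetime,
                           ts_col: str = "event_ts", date_col: str = "event_date") -> str:
    """WHERE-clause body: date partition predicate first, then the exact timestamp range."""
    return (
        f"{date_col} BETWEEN DATE '{start.date().isoformat()}' "
        f"AND DATE '{end.date().isoformat()}' "
        f"AND {ts_col} >= TIMESTAMP '{start.isoformat(sep=' ')}' "
        f"AND {ts_col} < TIMESTAMP '{end.isoformat(sep=' ')}'"
    )


clause = partition_aware_filter(datetime(2023, 1, 3, 8, 0), datetime(2023, 1, 4, 12, 30))
print(clause)
```

The date predicate is redundant for correctness, since the timestamp range already implies it; it is there only so the engine can skip partitions before reading the timestamp column.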

5 More Replies
afshinR
by New Contributor III
  • 608 Views
  • 1 replies
  • 1 kudos

Hi, could you please help me with my question? I have not gotten any answers.

Hi, could you please help me with my question? I have not gotten any answers.

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @afshin riahi, yes, I can definitely help you with it. Please wait while I or someone from the community gets back to you with a response. Thank you for your patience.

Van-DuyetLe
by New Contributor
  • 25324 Views
  • 5 replies
  • 1 kudos

What's the difference between Interactive Clusters and Job Cluster?

I am new to Databricks. I would like to know the difference between interactive clusters and job clusters. There is no official documentation on this right now.

4 More Replies