Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HashMan
by New Contributor III
  • 3691 Views
  • 6 replies
  • 4 kudos

Resolved! Learn Apache Spark

I want to learn Apache Spark as a developer. Where do I start, and what materials are recommended?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

If you are a Databricks customer (any paid subscription, like Azure Databricks), please register for the Academy through https://databricks.com/learn/training/home using the email from your subscription. The courses there are the best on the internet. If you will not see ...

5 More Replies
Constantine
by Contributor III
  • 19595 Views
  • 4 replies
  • 5 kudos

Resolved! How to provide UPSERT condition in PySpark

I have a table `demo_table_one` in which I want to upsert the following values: data = [ (11111 , 'CA', '2020-01-26'), (11111 , 'CA', '2020-02-26'), (88888 , 'CA', '2020-06-10'), (88888 , 'CA', '2020-05-10'), (88888 , 'WA', '2020-07-10'), ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@John Constantine, can you additionally share what data is in demo_table_one? We have only df (aliased as update_table) in that example.

3 More Replies
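In Databricks the upsert asked about above is typically written as a Delta Lake MERGE (matched rows are updated, unmatched rows are inserted). As a minimal sketch of those merge semantics in plain Python, with hypothetical column names modeled on the question's data:

```python
# Plain-Python sketch of UPSERT (merge) semantics: rows that match on the key
# columns are updated in place, unmatched rows are inserted. In Databricks this
# is normally done with Delta Lake's MERGE INTO; names here are hypothetical.

def upsert(target, updates, key):
    """target/updates: lists of dict rows; key: tuple of key column names."""
    index = {tuple(row[c] for c in key): i for i, row in enumerate(target)}
    for row in updates:
        k = tuple(row[c] for c in key)
        if k in index:
            target[index[k]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            index[k] = len(target)         # WHEN NOT MATCHED THEN INSERT
            target.append(row)
    return target

target = [{"id": 11111, "state": "CA", "date": "2020-01-26"}]
updates = [
    {"id": 11111, "state": "CA", "date": "2020-02-26"},  # matches -> updated
    {"id": 88888, "state": "WA", "date": "2020-07-10"},  # new -> inserted
]
result = upsert(target, updates, key=("id", "state"))
```

The Delta equivalent would be along the lines of `MERGE INTO demo_table_one USING updates ON <key columns> WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`; the exact ON condition depends on the table's keys, which the thread itself asks the poster to clarify.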
sarvesh242
by Contributor
  • 10524 Views
  • 3 replies
  • 2 kudos

Resolved! java.lang.NoSuchMethodError in databricks

I have created a package in Scala. Now, I am calling a method from that package and using it in my notebook. At run time, it throws a java.lang.NoSuchMethodError. The method exists in the package, but I am still getting this error. Plea...

Latest Reply
sarvesh242
Contributor
  • 2 kudos

Hi @Kaniz Fatma! I am using Scala version 2.11 with Spark 2.4.3. According to the official Apache Spark website https://spark.apache.org/docs/2.4.3/#:~:text=For%20the%20Scala%20API%2C%20Spark,x.), Spark 2.4.3 uses Scala 2.12. (https://spark.apache.org/...

2 More Replies
KC_1205
by New Contributor III
  • 3520 Views
  • 5 replies
  • 3 kudos

Resolved! Update Databricks at notebook level?

Hi all, can we update the Databricks Runtime from the existing version to a newer one at the notebook level? I know we can create a newer cluster and attach the notebook to it to update the version, but can we also update the runtime at the notebook level, the way we update libraries? If we can't, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey there @Kiran Chalasani, just checking in. Glad that you were able to resolve your query. Would you be happy to mark the answer as best so that other members can find the solution more quickly?

4 More Replies
lily1
by New Contributor III
  • 5051 Views
  • 3 replies
  • 2 kudos

Resolved! NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor

When I execute a function in the google-cloud-bigquery:2.7.0 jar, it executes a function in the gax:2.12.2 jar, and that gax jar in turn executes a function in the guava jar. This guava jar is a Databricks default library located at /databrick...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Lily Kim, hope you are doing well! Thank you for posting your question. We are happy that you were able to find the solution. Would you like to mark the answer as best? We'd love to hear from you.

2 More Replies
AJ270990
by Contributor II
  • 7396 Views
  • 8 replies
  • 3 kudos

Resolved! Powerpoint file operations in Databricks

Hi Team, I am writing Python code in Azure Databricks where I have mounted an Azure storage account and am accessing the input dataset from that storage resource. I am accessing the input data from Azure storage and generating charts from that data in Databri...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Abhishek Jain, thanks for sending in your query. We are glad that you found a solution. Would you like to mark the answer as best so the other members can benefit from it too? Cheers!

7 More Replies
MarcoData01
by New Contributor III
  • 3205 Views
  • 6 replies
  • 4 kudos

Resolved! Is there a way to protect the init script folder on DBFS?

Hi everyone, we are looking for a way to protect the folder where the init script is hosted from editing. This is because we have implemented, inside the init script, a parameter that blocks file downloads from the R Studio APP Emulator, and we would like to avoid th...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Marco Data, thank you for sending in your question. It is awesome that you found a solution. Would you like to mark the answer as best so others can find the solution quickly? Cheers!

5 More Replies
ChriChri
by New Contributor II
  • 5173 Views
  • 2 replies
  • 4 kudos

Azure Databricks Delta live table tab is missing

In my Azure Databricks workspace UI, I do not have the "Delta Live Tables" tab. The documentation says there is a tab after clicking Jobs in the main menu. I just created this Databricks resource in Azure, and from my understanding the DL...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Chr Jon, how are you doing? Thanks for posting your question. Just checking in to see if one of the answers helped; would you let us know?

1 More Replies
Mark1
by New Contributor II
  • 2542 Views
  • 2 replies
  • 2 kudos

Resolved! Using Delta Tables without Time Travel features?

Hi everyone / experts, is it possible to use Delta tables without the time travel features? We are primarily interested in using the DML features (DELETE, UPDATE, MERGE INTO, etc.). Thanks, Mark

Latest Reply
Mark1
New Contributor II
  • 2 kudos

Thank you Hubert

1 More Replies
ADB_482115_ADB_
by New Contributor
  • 1098 Views
  • 0 replies
  • 0 kudos

Reading multiple files and writing final results in Databricks: getting the below error

Job aborted due to stage failure: Task 12 in stage 1446.0 failed 4 times, most recent failure: Lost task 12.3 in stage 1446.0 (TID 2922) (10.24.175.143 executor 41): ExecutorLostFailure (executor 41 exited caused by one of the running tasks) Reason: ...

haseebkhan1421
by New Contributor
  • 14888 Views
  • 2 replies
  • 1 kudos

Resolved! How can I access a Python variable in Spark SQL?

I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql? Below is the example: %python RunID_Goal = sqlContext.sql("SELECT CONCAT(SUBSTRING(RunID,...

Latest Reply
Nirupam
New Contributor III
  • 1 kudos

You can use {} in spark.sql() in PySpark/Scala instead of making a SQL cell using %sql. This will result in a DataFrame. If you want, you can create a view on top of it using createOrReplaceTempView(). Below is an example of using a variable: # A variab...

1 More Replies
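The pattern the reply above describes is building the SQL text in Python and passing the variable in, instead of switching to a %sql cell. A minimal sketch of the same interpolation pattern, illustrated here with sqlite3 so it runs anywhere (no Spark session assumed; in a notebook the same string would go to spark.sql()):

```python
# Build a query string around a Python variable, as the reply above suggests
# for spark.sql(). The table and column names here are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id TEXT, goal INTEGER)")
conn.executemany("INSERT INTO runs VALUES (?, ?)",
                 [("R1", 10), ("R2", 20), ("R3", 30)])

run_id_goal = "R2"  # the Python variable to use inside SQL

# f-string interpolation, mirroring spark.sql(f"... WHERE run_id = '{run_id_goal}'")
query = f"SELECT goal FROM runs WHERE run_id = '{run_id_goal}'"
goal = conn.execute(query).fetchone()[0]
```

Note that with a plain DB-API connection a parameterized query (`?` placeholder) is safer than string interpolation; with spark.sql() the f-string form shown in the reply is the common idiom.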
Sugumar_Sriniva
by New Contributor III
  • 8310 Views
  • 11 replies
  • 5 kudos

Resolved! Databricks cluster creation is failing while running the cron job scheduling script through the init script method on Azure Databricks

Dear connections, I'm unable to run a shell script that schedules a cron job through the init script method on Azure Databricks cluster nodes. Error from the Azure Databricks workspace: "databricks_error_message": "Cluster scoped init script dbfs:/...

Latest Reply
User16764241763
Honored Contributor
  • 5 kudos

Hello @Sugumar Srinivasan, could you please enable cluster log delivery and inspect the init script logs under the path dbfs:/cluster-logs/<clusterId>/init_scripts? https://docs.databricks.com/clusters/configure.html#cluster-log-delivery-1

10 More Replies
sannycse
by New Contributor II
  • 4177 Views
  • 4 replies
  • 6 kudos

Resolved! Read the CSV file as shown in the description

Project_Details.csv:
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find the employees working on each project, in the form of a list?
Output:
ProjectNo|employeeNo
100|[1,2]
101|...

Latest Reply
User16764241763
Honored Contributor
  • 6 kudos

@SANJEEV BANDRU, you can simply do this; just change the file path:
CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (path "dbfs:/docs/test.csv", header "true", delimiter "|", mode "FAILFAST");
select ProjectNo, collect_list(EmployeeNo) Employees
from re...

3 More Replies
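The accepted answer reads the pipe-delimited file and aggregates EmployeeNo per project with collect_list. The same aggregation can be sketched in plain Python with the csv module, assuming the file layout shown in the question (the data is inlined here instead of read from DBFS):

```python
# Plain-Python sketch of the collect_list aggregation from the answer above:
# group the pipe-delimited rows by ProjectNo and collect EmployeeNo into lists.
import csv
import io
from collections import defaultdict

# Inlined stand-in for the Project_Details.csv file from the question.
data = io.StringIO(
    "ProjectNo|ProjectName|EmployeeNo\n"
    "100|analytics|1\n"
    "100|analytics|2\n"
    "101|machine learning|3\n"
    "101|machine learning|1\n"
    "101|machine learning|4\n"
)

employees = defaultdict(list)
for row in csv.DictReader(data, delimiter="|"):
    employees[row["ProjectNo"]].append(int(row["EmployeeNo"]))
```

This produces one list per project (100 -> [1, 2], 101 -> [3, 1, 4]), matching what `collect_list(EmployeeNo)` with `GROUP BY ProjectNo` returns in Spark SQL.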
weldermartins
by Honored Contributor
  • 4086 Views
  • 5 replies
  • 13 kudos

Hello everyone, I have a directory with 40 files. File names are divided into prefixes. I need to rename the prefix k3241 according to the name in the...

Hello everyone, I have a directory with 40 files. File names are divided into prefixes. I need to rename the prefix k3241 according to the name in the last prefix. I even managed to insert the .csv extension at the end of the file, but renaming files ba...

Latest Reply
Anonymous
Not applicable
  • 13 kudos

Hi @welder martins, how are you doing? Thank you for posting that question. We are glad you could resolve the issue. Would you want to mark an answer as the best solution? Cheers

4 More Replies
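The renaming described above (swap a fixed k3241 prefix for the file's last name segment and add a .csv extension) can be sketched in plain Python with pathlib. The file names below are hypothetical, modeled on the question's k3241 prefix; on DBFS the same loop would call dbutils.fs.mv instead of Path.rename:

```python
# Sketch: rename files whose names start with the fixed prefix "k3241" to
# "<last segment>.csv", working in a throwaway temp directory.
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
# Hypothetical sample file names with the k3241 prefix from the question.
for name in ["k3241.k03200y0.d10412.estabele", "k3241.k03200y0.d10412.empresa"]:
    (root / name).touch()

renamed = []
for f in root.glob("k3241.*"):
    suffix = f.name.split(".")[-1]          # last segment, e.g. "estabele"
    target = f.with_name(f"{suffix}.csv")   # new name: <segment>.csv
    f.rename(target)                        # dbutils.fs.mv(...) on DBFS
    renamed.append(target.name)
```

If two files ended with the same last segment, the later rename would overwrite the earlier one, so a collision check would be worth adding for real data.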
cristianc
by Contributor
  • 2995 Views
  • 5 replies
  • 3 kudos

Is it required to run OPTIMIZE after doing GDPR DELETEs?

Greetings, I have been reading the excellent article at https://docs.databricks.com/security/privacy/gdpr-delta.html?_ga=2.130942095.1400636634.1649068106-1416403472.1644480995&_gac=1.24792648.1647880283.CjwKCAjwxOCRBhA8EiwA0X8hi4Jsx2PulVs_FGMBdByBk...

Latest Reply
cristianc
Contributor
  • 3 kudos

@Hubert Dudek, thanks for the hint. Exactly as written in the article, VACUUM is required after the GDPR delete operation; however, do we need to run OPTIMIZE ZORDER on the table again, or is the ordering maintained?

4 More Replies