Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
If you are a Databricks customer (any paid subscription, such as Azure Databricks), please register for the Academy through https://databricks.com/learn/training/home using the email address from your subscription. The courses there are the best on the internet. If you will not see ...
I have a table `demo_table_one` in which I want to upsert the following values:
data = [
(11111 , 'CA', '2020-01-26'),
(11111 , 'CA', '2020-02-26'),
(88888 , 'CA', '2020-06-10'),
(88888 , 'CA', '2020-05-10'),
(88888 , 'WA', '2020-07-10'),
...
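A possible preparation step for an upsert like this, sketched in plain Python: Delta Lake's MERGE rejects a source in which several rows match the same target row, so the source is often reduced to one row per key first. The post does not say what the key is or what the columns are called, so the key (first two fields) and the "latest date wins" rule below are assumptions for illustration only, and only the five rows visible in the post are used.

```python
# Rows visible in the post; the remainder of the list is truncated there.
rows = [
    (11111, 'CA', '2020-01-26'),
    (11111, 'CA', '2020-02-26'),
    (88888, 'CA', '2020-06-10'),
    (88888, 'CA', '2020-05-10'),
    (88888, 'WA', '2020-07-10'),
]


def latest_per_key(rows):
    """Keep one row per (id, state) key, preferring the latest date.

    Assumption: the first two fields form the merge key. ISO-formatted
    date strings (YYYY-MM-DD) compare correctly as plain strings.
    """
    latest = {}
    for record_id, state, date in rows:
        key = (record_id, state)
        if key not in latest or date > latest[key][2]:
            latest[key] = (record_id, state, date)
    return sorted(latest.values())


deduplicated = latest_per_key(rows)
```

The deduplicated rows could then serve as the source of the upsert; whether that matches the intended semantics depends on the real key of `demo_table_one`.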
I have created a package in Scala. I am calling a method from that package in my notebook. At runtime, it throws java.lang.NoSuchMethodError. The method exists in the package, but I still get this error. Plea...
Hi! @Kaniz Fatma. I am using Scala version 2.11 with Spark 2.4.3. According to the official Apache Spark website https://spark.apache.org/docs/2.4.3/#:~:text=For%20the%20Scala%20API%2C%20Spark,x.) Spark 2.4.3 uses Scala 2.12. (https://spark.apache.org/...
Hi all, can we update Databricks from the existing version to a newer one at the notebook level? I know we can create a newer cluster and attach the notebook to it to get the newer version, but can we also update DB at the notebook level, the way we update libraries? If we can't,...
Hey there @Kiran Chalasani, just checking in. Glad that you were able to resolve your query. Would you be happy to mark the answer as best so that other members can find the solution more quickly?
When I execute a function in the google-cloud-bigquery:2.7.0 jar, it calls a function in the gax:2.12.2 jar, which in turn calls a function in the guava jar. This guava jar is a Databricks default library, which is located at /databrick...
Hey there @Lily Kim, hope you are doing well! Thank you for posting your question. We are happy that you were able to find the solution. Would you like to mark the answer as best? We'd love to hear from you.
Hi Team, I am writing Python code in Azure Databricks, where I have mounted Azure storage and access the input dataset from that storage resource. I am accessing the input data from Azure storage and generating charts from that data in databri...
Hi @Abhishek Jain, thanks for sending in your query. We are glad that you found a solution. Would you like to mark the answer as best so that other members can benefit from it too? Cheers!
Hi everyone, we are looking for a way to protect the folder where the init script is hosted from editing. This is because we have implemented, inside the init script, a parameter that blocks file downloads from the R Studio APP Emulator, and we would like to avoid th...
Hi @Marco Data, thank you for sending in your question. It is awesome that you found a solution. Would you like to mark the answer as best so others can find the solution quickly? Cheers!
In my Azure Databricks workspace UI I do not have the tab "Delta live tables". In the documentation it says that there is a tab after clicking on Jobs in the main menu. I just created this Databricks resource in Azure and from my understanding the DL...
Hi Everyone / Experts, is it possible to use Delta Tables without the Time Travel features? We are primarily interested in using the DML features (delete, update, merge into, etc.). Thanks, Mark
Job aborted due to stage failure: Task 12 in stage 1446.0 failed 4 times, most recent failure: Lost task 12.3 in stage 1446.0 (TID 2922) (10.24.175.143 executor 41): ExecutorLostFailure (executor 41 exited caused by one of the running tasks) Reason: ...
I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql? Below is the example:
%python
RunID_Goal = sqlContext.sql("SELECT CONCAT(SUBSTRING(RunID,...
You can use {} in spark.sql() of PySpark/Scala instead of making a SQL cell using %sql. This will result in a DataFrame. If you want, you can create a view on top of this using createOrReplaceTempView(). Below is an example that uses a variable:
# A variab...
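A minimal sketch of the idea in the reply above, written so that it runs without a cluster: the Python value is substituted into the SQL text before the query is handed to Spark. The table name `runs`, the helper `build_query`, and the integer `RunID` are hypothetical, chosen only for illustration; the original example is truncated and is not reproduced here.

```python
def build_query(table: str, run_id: int) -> str:
    """Return SQL text with a Python value substituted into it.

    `table` and `run_id` are illustrative names, not taken from the post.
    The value is coerced to int so that arbitrary text cannot end up
    in the query.
    """
    return f"SELECT * FROM {table} WHERE RunID = {int(run_id)}"


query = build_query("runs", 42)

# In a Databricks notebook, where `spark` is predefined, the query string
# would typically be used like this (left commented out so the sketch
# runs anywhere):
# df = spark.sql(query)
# df.createOrReplaceTempView("runs_filtered")  # then queryable from %sql
```

Substituting values into SQL text is convenient for notebook work, but for untrusted input it is safer to validate or coerce the value first, as the `int()` call does here.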
Dear connections, I'm unable to run a shell script that schedules a cron job through the init script method on Azure Databricks cluster nodes. Error from the Azure Databricks workspace: "databricks_error_message": "Cluster scoped init script dbfs:/...
Hello @Sugumar Srinivasan, could you please enable cluster log delivery and inspect the init script logs in the path dbfs:/cluster-logs/<clusterId>/init_scripts?
https://docs.databricks.com/clusters/configure.html#cluster-log-delivery-1
Project_Details.csv
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find each employee in the form of list working on each project?
Output:
ProjectNo|employeeNo
100|[1,2]
101|...
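One way to approach the grouping asked for above, sketched in plain Python on the sample rows from the post. The function name `employees_per_project` is made up for this illustration. In PySpark the same result is usually obtained with `groupBy("ProjectNo")` and `collect_list("EmployeeNo")`, which is not shown here so that the sketch stays runnable without a cluster.

```python
import csv
import io
from collections import defaultdict

# Sample content of Project_Details.csv, copied from the post.
PROJECT_DETAILS = """ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
"""


def employees_per_project(text: str) -> dict:
    """Group employee numbers into a list per project number."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    grouped = defaultdict(list)
    for row in reader:
        grouped[int(row["ProjectNo"])].append(int(row["EmployeeNo"]))
    return dict(grouped)


result = employees_per_project(PROJECT_DETAILS)
for project_no, employees in result.items():
    print(f"{project_no}|{employees}")
```

The lists keep the order in which the rows appear in the file; sort them if a stable ordering is needed.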
Hello everyone, I have a directory with 40 files. The file names are divided into prefixes. I need to rename the prefix k3241 according to the name in the last prefix. I even managed to insert the csv extension at the end of each file name, but renaming files ba...
Hi @welder martins, how are you doing? Thank you for posting that question. We are glad you could resolve the issue. Would you like to mark an answer as the best solution? Cheers
Greetings, I have been reading the excellent article from https://docs.databricks.com/security/privacy/gdpr-delta.html?_ga=2.130942095.1400636634.1649068106-1416403472.1644480995&_gac=1.24792648.1647880283.CjwKCAjwxOCRBhA8EiwA0X8hi4Jsx2PulVs_FGMBdByBk...
@Hubert Dudek thanks for the hint. Exactly as written in the article, VACUUM is required after the GDPR delete operation. However, do we need to run OPTIMIZE with ZORDER on the table again, or is the ordering maintained?