Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nuthan_1994
by New Contributor II
  • 4333 Views
  • 3 replies
  • 3 kudos

Resolved! Installing new libraries on Azure Databricks Clusters

Hi everyone, I was trying to install the newest Python version on a Databricks cluster running runtime version 7.3 LTS, but no matter how many times I try, it keeps installing Python 3.7.5. I know that Runtime version 7.3 LTS co...

Latest Reply
dazfuller
Contributor III
  • 3 kudos

I've done this before using a custom Docker image, but even then the runtime itself continues to use the version of Python 3 that is installed as part of the OS. The easiest way to get a newer version is to use a newer runtime. If you're sticking...

  • 3 kudos
2 More Replies
Anonymous
by Not applicable
  • 1857 Views
  • 1 replies
  • 1 kudos

Resolved! Access to Cluster Logs for non-admins

Suppose I have a DevOps team that needs near real-time access to cluster logs to troubleshoot job failures. What is the best way for me to grant access to view logs without granting them admin access?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please use the logging option in the cluster settings and set the log destination to another Azure Blob or S3 storage location (it needs to be mounted first):
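As a rough illustration of the setting described above, the sketch below adds a log delivery block to a cluster spec of the kind sent to the Clusters API. The `cluster_log_conf` field with a `dbfs.destination` path is the API's log delivery setting; the cluster name, worker count, and mount path are placeholders, and the spec is only a fragment, not a complete cluster definition.

```python
import json


def with_log_delivery(cluster_spec: dict, destination: str) -> dict:
    """Return a copy of a cluster spec with log delivery to a DBFS path enabled."""
    spec = dict(cluster_spec)
    spec["cluster_log_conf"] = {"dbfs": {"destination": destination}}
    return spec


# Hypothetical fragment of a cluster spec; name and size are placeholders.
base_spec = {
    "cluster_name": "devops-jobs",
    "spark_version": "7.3.x-scala2.12",
    "num_workers": 2,
}

# Assumed mount point for the separate Blob/S3 storage mentioned in the reply.
spec = with_log_delivery(base_spec, "dbfs:/mnt/cluster-logs")
print(json.dumps(spec, indent=2))
```

Non-admin users could then be given read access to the storage behind that mount instead of admin rights on the workspace.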

  • 1 kudos
User16857281869
by New Contributor II
  • 2605 Views
  • 1 replies
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my Structured Streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should use the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10ms to log the transaction data...
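A minimal sketch of the trigger setting mentioned in point 1, assuming a streaming DataFrame written to a Delta table. The function name, paths, and the one-minute interval are placeholders chosen for illustration; only the `writeStream` calls are standard PySpark.

```python
def start_delta_stream(stream_df, table_path, checkpoint_path, interval="1 minute"):
    """Start a streaming write that fires one micro-batch per `interval`.

    Without an explicit trigger, the next micro-batch starts as soon as the
    previous one finishes, which means far more frequent storage calls.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)  # where progress is logged
        .trigger(processingTime=interval)               # batch cadence
        .start(table_path)
    )
```

A longer interval trades latency for fewer checkpoint and transaction-log writes, so the right value depends on how fresh the table needs to be.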

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please mount cheaper storage (LRS) to a custom mount and store the checkpoints there. Please clear data regularly. If you are using foreach/foreachBatch in a stream, it will save every DataFrame on DBFS. Please remember not to use display() in production. If on th...

  • 1 kudos
User16857281869
by New Contributor II
  • 2294 Views
  • 1 replies
  • 1 kudos

Resolved! What is the best way to do time series analysis and forecasting with Spark?

We have developed a library on Spark which makes typical operations on time series much simpler. You can check the repo on GitHub for more info. You could also check out one of our blogs, which demos an implementation of a forecasting use case with S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Currently on Databricks there is MLflow with a forecasting option. Please check it.

  • 1 kudos
brickster_2018
by Databricks Employee
  • 1627 Views
  • 1 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

This is a list of configuration keys to enable or alter the blacklist mechanism:
  • spark.blacklist.enabled: set to True
  • spark.blacklist.task.maxTaskAttemptsPerExecutor (1 by default)
  • spark.blacklist.task.maxTaskAttemptsPerNode (2 by default)
  • spark.blacklis...
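For illustration, the keys named in the reply can be collected in a dictionary and rendered as the "key value" lines used in a cluster's Spark config box. The helper name is hypothetical, and only the three keys quoted in full above are included. Note that Spark 3.1 and later rename these settings to `spark.excludeOnFailure.*`, so the names below apply to older runtimes.

```python
# Keys and defaults as listed in the reply; values are strings, as in Spark conf.
blacklist_conf = {
    "spark.blacklist.enabled": "true",
    "spark.blacklist.task.maxTaskAttemptsPerExecutor": "1",
    "spark.blacklist.task.maxTaskAttemptsPerNode": "2",
}


def to_spark_config_lines(conf: dict) -> str:
    """Render settings as one 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


print(to_spark_config_lines(blacklist_conf))
```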

  • 0 kudos
DievanB
by New Contributor
  • 2063 Views
  • 1 replies
  • 0 kudos

pyspark: How to run selenium in UDF

Hi all, I am building a web scraper to get prices of certain EANs from the Amazon website. Therefore I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

UDF functions are serialized and then executed on executors. I don't think it will be possible with Selenium.

  • 0 kudos
Emre
by New Contributor II
  • 1725 Views
  • 1 replies
  • 2 kudos

Resolved! The license of JDBC connector for BI vendors

Hey all, we would like to support Databricks in our BI tool, which is an open-source Java application. (See https://github.com/metriql/metriql) In order to connect to Databricks, we need to use the JDBC connector, similar to other BI tools such as Look...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

It doesn't look so bad after all (I mean the terms and conditions on https://databricks.com/jdbc-odbc-driver-license ), but I think the best solution is to open a ticket via https://databricks.com/company/contact

  • 2 kudos
missyT
by New Contributor III
  • 1840 Views
  • 1 replies
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the reason is that a list can contain objects of types other than integers and floats (nested lists, strings, and any other kind of object), so it doesn't make sense to implement a .sum method, as it would fail in many cases.
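A short sketch of the point above using the built-in `sum()` function, which already covers numeric lists and shows where mixed contents break down. The variable names are only for illustration.

```python
# A list of numbers sums without any method on the list itself.
numbers = [1, 2.5, 3]
total = sum(numbers)

# Nested lists can be concatenated by giving an empty list as the start value.
nested = [[1, 2], [3]]
flattened = sum(nested, [])

# Mixed contents fail, which is the case a list.sum() method could not handle.
mixed = [1, "two", 3.0]
try:
    sum(mixed)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(total, flattened, mixed_ok)
```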

  • 3 kudos
Anonymous
by Not applicable
  • 676 Views
  • 0 replies
  • 0 kudos

www.jamboreeindia.com

Jamboree is the leading institute offering specialized classroom and online test prep solutions for study abroad entrance exams like GMAT, GRE, SAT, TOEFL and IELTS.https://www.jamboreeindia.com/

Rithwik_Malla_2
by New Contributor
  • 3173 Views
  • 3 replies
  • 5 kudos

Resolved! Terraform - Databricks CI/CD pipeline

Can anyone help me configure CI/CD for ADB Terraform code? The problem I am facing is that something goes wrong with authentication. The pipeline is implemented in Azure DevOps.

Latest Reply
Ravi
Databricks Employee
  • 5 kudos

Hi @Rithwik Aditya Manoj Malla, as requested by @Prabakar Ammeappin earlier, could you please share the code block and the error details? Also, you can refer to the doc below to authenticate ADB from TF code. https://registry.terraform.io/providers/...

  • 5 kudos
2 More Replies
mcharl02
by New Contributor III
  • 10992 Views
  • 9 replies
  • 6 kudos

How do I restore auto-close quote & auto-close parentheses functionality?

Over the last two days, my team's Databricks notebooks (using the Python interpreter) have stopped automatically adding a closing single quote (') with the cursor between the two. Same issue with automatically adding closing parentheses. The cluster has been re...

Latest Reply
Sajesh
Databricks Employee
  • 6 kudos

The fix for this issue will most likely be released to all regions/workspaces by 18th Nov 21.

  • 6 kudos
8 More Replies
kjoth
by Contributor II
  • 6906 Views
  • 7 replies
  • 12 kudos

Resolved! Databricks cluster Encryption keystore_password

How to set up this value? Is this any value we can provide or the default value we have to p
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

Hi @karthick J, please refer to this notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html Further, if you will be using the %pip magic command, the post below will be helpful. https://community.dat...

  • 12 kudos
6 More Replies
sarvesh
by Contributor III
  • 4498 Views
  • 0 replies
  • 0 kudos

Can we read an Excel file with many sheets by their indexes?

I am trying to read an Excel file that has 3 sheets with integers as their names: sheet 1 name = 21, sheet 2 name = 24, sheet 3 name = 224. I got this data from a user, so I can't change the sheet names, but reading these with Spark is an issue. code - v...
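One possible approach, offered only as a sketch: the post does not say which reader is used, so this assumes the spark-excel data source, whose `dataAddress` option accepts a quoted sheet name followed by a cell reference. Quoting the name keeps a numeric sheet name such as 21 from being read as anything other than a name. The helper name and file path are hypothetical.

```python
def sheet_address(sheet_name, top_left_cell="A1"):
    """Build a quoted sheet reference such as '21'!A1."""
    return f"'{sheet_name}'!{top_left_cell}"


# Sheet names from the post, kept as strings.
sheet_names = ["21", "24", "224"]
addresses = [sheet_address(name) for name in sheet_names]
print(addresses)

# Hypothetical usage, assuming the spark-excel library is installed on the cluster:
# df = (spark.read.format("com.crealytics.spark.excel")
#       .option("header", "true")
#       .option("dataAddress", addresses[0])
#       .load("/path/to/file.xlsx"))
```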

