Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 2405 Views
  • 1 replies
  • 1 kudos

Resolved! Access to Cluster Logs for non-admins

Suppose I have a DevOps team that needs near real-time access to cluster logs to troubleshoot job failures. What is the best way for me to grant access to view logs without granting them admin access?

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Please use the logging option in the cluster settings and set the log destination to separate Azure Blob or S3 storage (it needs to be mounted first).
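As a rough illustration of the setting described above, the fragment below builds the `cluster_log_conf` block that a cluster definition can carry when log delivery to DBFS is enabled. The mount path `dbfs:/mnt/cluster-logs` is a hypothetical placeholder, not something taken from the thread; read access to that storage location can then be granted to the DevOps team separately.

```python
import json

# Hypothetical mount point for the log storage; replace with your own path.
LOG_DESTINATION = "dbfs:/mnt/cluster-logs"

# Fragment of a cluster definition that enables log delivery to DBFS.
cluster_settings_fragment = {
    "cluster_log_conf": {
        "dbfs": {"destination": LOG_DESTINATION},
    }
}

# Serialize it the way it would appear in a cluster JSON definition.
print(json.dumps(cluster_settings_fragment, indent=2))
```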

  • 1 kudos
User16857281869
by Databricks Employee
  • 3468 Views
  • 1 replies
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my Structured Streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should be using the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10ms to log the transaction data...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Please mount cheaper storage (LRS) to a custom mount and set the checkpoints there. Please clear data regularly. If you are using foreach/foreachBatch in a stream, it will save every DataFrame on DBFS. Please remember not to use display() in production. If on th...

  • 1 kudos
User16857281869
by Databricks Employee
  • 3131 Views
  • 1 replies
  • 1 kudos

Resolved! What is the best way to do time series analysis and forecasting with Spark?

We have developed a library on Spark which makes typical operations on time series much simpler. You can check the repo on GitHub for more info. You could also check out one of our blogs which demos an implementation of a forecasting use case with S...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Databricks currently offers MLflow with a forecasting option; please check it out.

  • 1 kudos
brickster_2018
by Databricks Employee
  • 2425 Views
  • 1 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

This is a list of configuration keys to enable or alter the blacklist mechanism:
  • spark.blacklist.enabled: set to True
  • spark.blacklist.task.maxTaskAttemptsPerExecutor (1 by default)
  • spark.blacklist.task.maxTaskAttemptsPerNode (2 by default)
  • spark.blacklis...
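A minimal sketch of how the three keys named above could be written out as "key value" lines for a cluster's Spark config. The helper name `to_spark_conf_lines` is made up for this example, and the values simply restate the defaults quoted in the reply. Note that Spark 3.1 and later renamed these settings to the `spark.excludeOnFailure.*` prefix, so check which names your runtime expects.

```python
# Keys and defaults as quoted in the reply; values are strings because
# Spark configuration entries are passed as text.
blacklist_conf = {
    "spark.blacklist.enabled": "true",
    "spark.blacklist.task.maxTaskAttemptsPerExecutor": "1",
    "spark.blacklist.task.maxTaskAttemptsPerNode": "2",
}


def to_spark_conf_lines(conf):
    """Render a config dict as one 'key value' pair per line, sorted by key."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


print(to_spark_conf_lines(blacklist_conf))
```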

  • 0 kudos
DievanB
by New Contributor
  • 2583 Views
  • 1 replies
  • 0 kudos

PySpark: How to run Selenium in a UDF

Hi all, I am building a web scraper to get prices of certain EANs from the Amazon website. Therefore I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

UDFs are serialized and then executed on the executors. I don't think it will be possible with Selenium.
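The serialization point can be illustrated with a small sketch. It uses the standard library `pickle` as a stand-in for Spark's own serializer and a `threading.Lock` as a stand-in for a live browser session; both are assumptions made for illustration, not what Spark or Selenium use internally. Plain data survives serialization, while an object holding a live OS-level handle does not.

```python
import pickle
import threading


def can_be_pickled(obj) -> bool:
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


# Plain values are fine to ship to another process.
plain_settings = {"base_url": "https://example.com", "timeout_s": 10}

# A lock stands in for a live handle such as an open browser connection.
live_handle = {"connection_lock": threading.Lock()}

print(can_be_pickled(plain_settings))
print(can_be_pickled(live_handle))
```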

  • 0 kudos
Emre
by New Contributor II
  • 2463 Views
  • 1 replies
  • 2 kudos

Resolved! The license of JDBC connector for BI vendors

Hey all, we would like to support Databricks in our BI tool, which is an open-source Java application. (See https://github.com/metriql/metriql) In order to connect to Databricks, we need to use the JDBC connector similar to the other BI tools such as Look...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

It doesn't look so bad after all (I mean the terms and conditions at https://databricks.com/jdbc-odbc-driver-license ), but I think the best solution is to open a ticket via https://databricks.com/company/contact

  • 2 kudos
missyT
by New Contributor III
  • 2408 Views
  • 1 replies
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

I think the reason is that a list can contain other types of objects than just integers and floats (nested lists, strings, and any other kind of object), so it doesn't make sense to implement a .sum method, as it would fail in many cases.
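A short sketch of that argument, using the built-in sum() function that Python already provides for iterables: it works on a list of numbers and raises a TypeError on a list with mixed element types. The variable names are chosen only for this example.

```python
# A homogeneous numeric list: the built-in sum() handles it directly.
numbers = [1, 2.5, 3]
total = sum(numbers)

# A list with mixed element types: adding an int to a str is undefined,
# so sum() raises TypeError partway through.
mixed = [1, "two", [3]]
try:
    sum(mixed)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(total, mixed_ok)
```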

  • 3 kudos
Rithwik_Malla_2
by New Contributor
  • 4155 Views
  • 3 replies
  • 5 kudos

Resolved! Terraform - Databricks CI/CD pipeline

Can anyone help me configure CI/CD for Azure Databricks (ADB) Terraform code? The problem I am facing is with authentication; something went wrong there. The pipeline is implemented in Azure DevOps.

Latest Reply
Ravi
Databricks Employee
  • 5 kudos

Hi @Rithwik Aditya Manoj Malla, as requested by @Prabakar Ammeappin earlier, could you please share the code block and the error details? Also, you can refer to the doc below to authenticate ADB from TF code. https://registry.terraform.io/providers/...

  • 5 kudos
2 More Replies
mcharl02
by New Contributor III
  • 16378 Views
  • 9 replies
  • 6 kudos

How do I restore auto-close quote & auto-close parentheses functionality?

Over the last two days, my team's Databricks notebooks (using the Python interpreter) have stopped automatically adding a closing single quote (') with the cursor between the two. Same issue with automatically adding closing parentheses. The cluster has been re...

Latest Reply
Sajesh
Databricks Employee
  • 6 kudos

The fix for this issue will most likely be released to all regions/workspaces by 18th Nov 2021.

  • 6 kudos
8 More Replies
kjoth
by Contributor II
  • 9042 Views
  • 7 replies
  • 12 kudos

Resolved! Databricks cluster Encryption keystore_password

How to set up this value? Is this any value we can provide or the default value we have to p
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

Hi @karthick J, please refer to this notebook. https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html Further, if you will be using the %pip magic command, the post below will be helpful. https://community.dat...

  • 12 kudos
6 More Replies
sarvesh
by Contributor III
  • 5271 Views
  • 0 replies
  • 0 kudos

Can we read an Excel file with many sheets by their indexes?

I am trying to read an Excel file which has 3 sheets that have integers as their names: sheet 1 name = 21, sheet 2 name = 24, sheet 3 name = 224. I got this data from a user, so I can't change the sheet names, but reading these with Spark is an issue. code -v...

StephanieAlba
by Databricks Employee
  • 8330 Views
  • 2 replies
  • 3 kudos

Resolved! Best Data Model for moving from DW to Delta Lake

I’m curious how Databricks recommends we model the data. Do they recommend that the data be in 3rd normal form (3NF)? Or should it be dimensionally modeled (facts and dimensions)?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

It all depends on the use case. 3NF is ideal for transactional systems, so for a data warehouse/lakehouse that might not be ideal. However, there certainly are cases where it is interesting. Star schemas are definitely still relevant, BUT with the processing p...

  • 3 kudos
1 More Replies
Junee
by New Contributor III
  • 8876 Views
  • 5 replies
  • 3 kudos

Resolved! What happens to the clusters whose jobs are canceled or terminated due to failures? (Jobs triggered through Jobs API 2.1 using runs/submit)

I am using Databricks Jobs API 2.1 to trigger and run my jobs. The "jobs/runs/submit" API starts the cluster, creates the job, and runs it. This API works great for normal jobs as it also cleans up the cluster once the job is finished suc...
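For readers unfamiliar with the call being discussed, here is a hedged sketch of the kind of request body that "jobs/runs/submit" takes in Jobs API 2.1, with the cluster defined inline through `new_cluster`. The helper name, run name, notebook path, and the placeholder cluster values are all assumptions for illustration; the sketch only builds the JSON and does not send it.

```python
import json


def build_runs_submit_payload(run_name, notebook_path):
    """Build a one-task request body for POST /api/2.1/jobs/runs/submit."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "main",
                # Inline cluster definition; the values are placeholders
                # to fill in for your workspace.
                "new_cluster": {
                    "spark_version": "<spark-version>",
                    "node_type_id": "<node-type-id>",
                    "num_workers": 1,
                },
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


# Hypothetical run name and notebook path.
payload = build_runs_submit_payload("example-run", "/Shared/example-notebook")
print(json.dumps(payload, indent=2))
```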

Latest Reply
User16871418122
Databricks Employee
  • 3 kudos

@Junee, Anytime! It is crisply mentioned in the doc too. https://docs.databricks.com/clusters/index.html

  • 3 kudos
4 More Replies