Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16857281869
by New Contributor II
  • 2226 Views
  • 1 replies
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my Structured Streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should use the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10ms to log the transaction data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please mount cheaper storage (LRS) to a custom mount and put the checkpoints there. Please clear data regularly. If you are using foreach/foreachBatch in the stream, it will save every DataFrame on DBFS. Please remember not to use display() in production. If on th...
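
The trigger and checkpoint advice in this thread could look roughly like the sketch below. It assumes a PySpark streaming DataFrame is passed in; the function name, the Delta sink, the paths and the one-minute interval are illustrative choices, not values from the original posts.

```python
def start_stream(stream_df, table_path, checkpoint_path, interval="1 minute"):
    """Start a streaming write that runs one micro-batch per `interval`.

    `stream_df` is assumed to be a PySpark streaming DataFrame; the paths
    are hypothetical placeholders supplied by the caller.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        # Checkpoints could live on a cheaper mount, as the reply suggests.
        .option("checkpointLocation", checkpoint_path)
        # Without an explicit trigger, the next micro-batch starts as soon
        # as the previous one finishes, which means many storage calls.
        .trigger(processingTime=interval)
        .start(table_path)
    )
```

A longer interval means fewer commits and fewer storage API calls, at the cost of higher latency.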

User16857281869
by New Contributor II
  • 1872 Views
  • 1 replies
  • 1 kudos

Resolved! What is the best way to do time series analysis and forecasting with Spark?

We have developed a library on Spark which makes typical operations on time series much simpler. You can check the repo on GitHub for more info. You could also check out one of our blogs which demos an implementation of a forecasting use case with S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Databricks currently offers MLflow with a forecasting option; please check it out.

brickster_2018
by Databricks Employee
  • 1316 Views
  • 1 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

This is a list of configuration keys to enable or alter the blacklist mechanism:
spark.blacklist.enabled – set to True
spark.blacklist.task.maxTaskAttemptsPerExecutor (1 by default)
spark.blacklist.task.maxTaskAttemptsPerNode (2 by default)
spark.blacklis...
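
As a rough illustration, the three keys that are fully visible in the reply could be collected and turned into `--conf` arguments as below. The helper name `to_submit_args` is made up for this sketch, and the truncated fourth key is deliberately left out.

```python
# Keys and defaults as listed in the reply; only the enabled flag is changed.
BLACKLIST_CONF = {
    "spark.blacklist.enabled": "true",
    "spark.blacklist.task.maxTaskAttemptsPerExecutor": "1",
    "spark.blacklist.task.maxTaskAttemptsPerNode": "2",
}


def to_submit_args(conf):
    """Turn a config dict into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

On Databricks the same key/value pairs would normally go into the cluster's Spark config instead. Newer Spark releases (3.1 onward) rename these settings to spark.excludeOnFailure.*, so it is worth checking the documentation for the runtime in use.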

DievanB
by New Contributor
  • 1768 Views
  • 1 replies
  • 0 kudos

pyspark: How to run selenium in UDF

Hi all, I am building a web scraper to get prices of certain EANs from the Amazon website. Therefore I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

UDF functions are serialized and then executed on executors. I don't think it will be possible with Selenium.

Emre
by New Contributor II
  • 1497 Views
  • 1 replies
  • 2 kudos

Resolved! The license of JDBC connector for BI vendors

Hey all, we would like to support Databricks in our BI tool, which is an open-source Java application (see https://github.com/metriql/metriql). In order to connect to Databricks, we need to use the JDBC connector, similar to other BI tools such as Look...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

It doesn't look so bad after all (I mean the terms and conditions at https://databricks.com/jdbc-odbc-driver-license), but I think the best solution is to open a ticket via https://databricks.com/company/contact

missyT
by New Contributor III
  • 1603 Views
  • 1 replies
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the reason is that a list can contain objects of types other than integers and floats (nested lists, strings, and any other kind of object), so it doesn't make sense to implement a .sum() method, as it would fail in many cases.
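
A small sketch of the point made in this reply, using only Python's built-in `sum()`; the sample lists are made up for illustration.

```python
# The built-in sum() already covers purely numeric lists.
numbers = [1, 2.5, 3]
total = sum(numbers)

# A mixed list fails as soon as two incompatible items are added together.
mixed = [1, "two", [3]]
try:
    sum(mixed)
    error = None
except TypeError as exc:
    error = type(exc).__name__

# With an explicit start value, sum() can also concatenate nested lists.
flat = sum([[1], [2, 3]], [])
```

Because `sum()` is a free function that accepts any iterable, the same call works for lists, tuples and generators without each type needing its own method.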

Rithwik_Malla_2
by New Contributor
  • 2816 Views
  • 3 replies
  • 5 kudos

Resolved! Terraform - Databricks CI/CD pipeline

Can anyone help me configure CI/CD for ADB Terraform code? The problem I am facing is with authentication; something goes wrong there. The pipeline is implemented in Azure DevOps.

Latest Reply
Ravi
Databricks Employee
  • 5 kudos

Hi @Rithwik Aditya Manoj Malla, as requested by @Prabakar Ammeappin earlier, could you please share the code block and the error details? Also, you can refer to the doc below to authenticate ADB from TF code. https://registry.terraform.io/providers/...

2 More Replies
mcharl02
by New Contributor III
  • 9307 Views
  • 9 replies
  • 6 kudos

How do I restore auto-close quote & auto-close parentheses functionality?

Over the last two days, my team's Databricks notebooks (using the Python interpreter) have stopped automatically adding a closing single quote (') with the cursor between the two. Same issue with automatically adding closing parentheses. The cluster has been re...

Latest Reply
Sajesh
Databricks Employee
  • 6 kudos

The fix for this issue will most likely be released to all regions/workspaces by 18 Nov 2021.

8 More Replies
User16790091296
by Contributor II
  • 2785 Views
  • 2 replies
  • 5 kudos

Resolved! How do I use databricks-cli without manual configuration

I want to use databricks cli:
databricks clusters list
but this requires a manual step that requires interactive work with the user:
databricks configure --token
Is there a way to use databricks cli without manual intervention so that you can run it as p...

Latest Reply
alexott
Databricks Employee
  • 5 kudos

You can set two environment variables: DATABRICKS_HOST and DATABRICKS_TOKEN, and databricks-cli will use them. See the example of that in the DevOps pipeline; see the full list of environment variables at the end of the Authentication section of docume...
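
One way to apply this from a script is sketched below. The two variable names and the `databricks clusters list` command come from the thread; the helper names, host and token values are hypothetical, and `list_clusters` assumes databricks-cli is installed and on the PATH.

```python
import os
import subprocess


def databricks_env(host, token, base_env=None):
    """Return a copy of the environment with the two variables the CLI reads."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATABRICKS_HOST"] = host
    env["DATABRICKS_TOKEN"] = token
    return env


def list_clusters(host, token):
    """Run `databricks clusters list` non-interactively and return its output."""
    result = subprocess.run(
        ["databricks", "clusters", "list"],
        env=databricks_env(host, token),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

In a CI pipeline the token would typically come from a secret variable rather than being written into the script.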

1 More Replies
kjoth
by Contributor II
  • 6094 Views
  • 7 replies
  • 12 kudos

Resolved! Databricks cluster Encryption keystore_password

How to set up this value? Is this any value we can provide or the default value we have to p...
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

Hi @karthick J, please refer to this notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html
Further, if you will be using the %pip magic command, the post below will be helpful: https://community.dat...

6 More Replies
sarvesh
by Contributor III
  • 4074 Views
  • 0 replies
  • 0 kudos

Can we read an Excel file with many sheets by their indexes?

I am trying to read an Excel file which has 3 sheets which have integers as their names:
sheet 1 name = 21
sheet 2 name = 24
sheet 3 name = 224
I got this data from a user so I can't change the sheet names, but reading these with Spark is an issue.
code -v...

StephanieAlba
by Databricks Employee
  • 5481 Views
  • 2 replies
  • 3 kudos

Resolved! Best Data Model for moving from DW to Delta lake

I’m curious how Databricks recommends we model the data. Do they recommend that the data be in third normal form (3NF)? Or should it be dimensionally modeled (facts and dimensions)?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

It all depends on the use case. 3NF is ideal for transactional systems, so for a data warehouse/lakehouse it might not be ideal. However, there certainly are cases where it is interesting. Star schemas are definitely still relevant, BUT with the processing p...

1 More Replies
Junee
by New Contributor III
  • 5679 Views
  • 5 replies
  • 3 kudos

Resolved! What happens to the clusters whose jobs are canceled or terminated due to failures? (Jobs triggered through Jobs API 2.1 using runs/submit)

I am using the Databricks Jobs API 2.1 to trigger and run my jobs. The "jobs/runs/submit" API helps in starting the cluster as well as creating the job and running it. This API works great for normal jobs, as it also cleans up the cluster once the job has finished suc...

Latest Reply
User16871418122
Contributor III
  • 3 kudos

@Junee, anytime! It is also clearly stated in the docs: https://docs.databricks.com/clusters/index.html
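
For readers unfamiliar with the calls discussed in this thread, here is a minimal sketch of how requests to the Jobs API 2.1 `runs/submit` and `runs/cancel` endpoints might be built. The helper names, the host, the token and the cluster values are placeholders for illustration; nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_request(host, token, endpoint, payload):
    """Build (but do not send) a POST request to a Jobs API 2.1 endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_payload(run_name, notebook_path):
    """One-time run on a new job cluster; cluster values are placeholders."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "main",
                "new_cluster": {
                    "spark_version": "<spark-version>",
                    "node_type_id": "<node-type>",
                    "num_workers": 1,
                },
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


def cancel_payload(run_id):
    """Body for runs/cancel, which identifies the run by its run_id."""
    return {"run_id": run_id}
```

The response to `runs/submit` contains the `run_id` that a later `runs/cancel` call would need.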

4 More Replies
