Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 1505 Views
  • 1 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

This is a list of configuration keys to enable or alter the blacklist mechanism: spark.blacklist.enabled – set to true; spark.blacklist.task.maxTaskAttemptsPerExecutor (1 by default); spark.blacklist.task.maxTaskAttemptsPerNode (2 by default); spark.blacklis...
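A minimal PySpark sketch (not from the thread) of how these keys could be set when building a SparkSession; the app name and values are illustrative only:

from pyspark.sql import SparkSession

# Illustrative only: enable the blacklist mechanism and tighten the
# per-executor and per-node task-attempt limits mentioned above.
spark = (
    SparkSession.builder
    .appName("blacklist-example")  # hypothetical app name
    .config("spark.blacklist.enabled", "true")
    .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
    .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
    .getOrCreate()
)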

DievanB
by New Contributor
  • 1937 Views
  • 1 replies
  • 0 kudos

pyspark: How to run selenium in UDF

Hi all, I am building a web scraper to get prices of certain EANs from the Amazon website. Therefore I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

UDF functions are serialized and then executed on executors. I don't think it will be possible with Selenium.
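A short sketch (my own illustration, not the poster's code) of why this is the case: a plain Python UDF is pickled together with everything it references and shipped to each executor, so simple, picklable logic works, but an object like a live Selenium WebDriver cannot be captured that way. Column names and sample values below are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

@udf(returnType=StringType())
def normalize_ean(ean):
    # Plain, picklable logic is fine inside a UDF.
    return ean.strip().zfill(13)

df = spark.createDataFrame([("737052",)], ["ean"])
df.select(normalize_ean(col("ean")).alias("ean13")).show()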

Emre
by New Contributor II
  • 1660 Views
  • 1 replies
  • 2 kudos

Resolved! The license of JDBC connector for BI vendors

Hey all, We would like to support Databricks in our BI tool, which is an open-source Java application. (See https://github.com/metriql/metriql) In order to connect to Databricks, we need to use the JDBC connector similar to the other BI tools such as Look...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

It doesn't look so bad after all (meaning the terms and conditions at https://databricks.com/jdbc-odbc-driver-license ), but I think the best solution is to open a ticket via https://databricks.com/company/contact

missyT
by New Contributor III
  • 1767 Views
  • 1 replies
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I think the reason is that a list can contain different types of objects than just integers and floats (nested lists, strings, and all kinds of other objects), so it doesn't make sense to implement a .sum() method, as it would fail in many cases.
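A quick standard-Python illustration of that point (not from the thread): the built-in sum() already covers homogeneous numeric lists, and a hypothetical list.sum() would break as soon as the list mixes types, while third-party arrays do ship their own .sum().

import numpy as np

nums = [1, 2.5, 3]
print(sum(nums))             # 6.5 -- the built-in works for numeric lists
print(np.array(nums).sum())  # numpy arrays provide a .sum() method

mixed = [1, "two", [3]]
try:
    sum(mixed)               # fails: int + str is not defined
except TypeError as e:
    print("sum() fails on mixed types:", e)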

Anonymous
by Not applicable
  • 644 Views
  • 0 replies
  • 0 kudos

www.jamboreeindia.com

Jamboree is the leading institute offering specialized classroom and online test prep solutions for study abroad entrance exams like GMAT, GRE, SAT, TOEFL and IELTS.https://www.jamboreeindia.com/

Rithwik_Malla_2
by New Contributor
  • 3074 Views
  • 3 replies
  • 5 kudos

Resolved! Terraform - Databricks CI/CD pipeline

Can anyone help me configure CI/CD for ADB Terraform code? The problem I am facing is with authentication; something went wrong there. Can anyone help me with this? The pipeline is implemented in Azure DevOps.

Latest Reply
Ravi
Databricks Employee
  • 5 kudos

Hi @Rithwik Aditya Manoj Malla​, as requested by @Prabakar Ammeappin​ earlier, could you please share the code block and the error details? Also, you can refer to the doc below to authenticate ADB from TF code: https://registry.terraform.io/providers/...

2 More Replies
mcharl02
by New Contributor III
  • 10568 Views
  • 9 replies
  • 6 kudos

How do I restore auto-close quote & auto-close parentheses functionality?

Over the last two days, my team's Databricks notebooks (using the Python interpreter) have stopped automatically adding a close single quote (') with a cursor between the two. Same issue with automatically adding close parentheses. The cluster has been re...

Latest Reply
Sajesh
Databricks Employee
  • 6 kudos

The fix for this issue will most likely be released to all regions/workspaces by 18 Nov 2021.

8 More Replies
kjoth
by Contributor II
  • 6662 Views
  • 7 replies
  • 12 kudos

Resolved! Databricks cluster Encryption keystore_password

How to set up this value? Is this any value we can provide or the default value we have to p
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

Hi @karthick J​, please refer to this notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html Further, if you will be using the %pip magic command, the post below will be helpful: https://community.dat...

6 More Replies
sarvesh
by Contributor III
  • 4414 Views
  • 0 replies
  • 0 kudos

Can we read an Excel file with many sheets with their indexes?

I am trying to read an Excel file which has 3 sheets that have integers as their names: sheet 1 name = 21, sheet 2 name = 24, sheet 3 name = 224. I got this data from a user so I can't change the sheet names, but reading these with Spark is an issue. code -v...
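One possible workaround (an assumption on my side, not from the post): read each integer-named sheet with pandas and convert it to a Spark DataFrame; the file path is hypothetical and the openpyxl package is assumed to be installed.

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "/dbfs/FileStore/input.xlsx"   # hypothetical location of the uploaded file
for sheet in ["21", "24", "224"]:     # the sheet names given in the post
    pdf = pd.read_excel(path, sheet_name=sheet, engine="openpyxl")
    sdf = spark.createDataFrame(pdf)
    print(sheet, sdf.count())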

StephanieAlba
by Databricks Employee
  • 7071 Views
  • 2 replies
  • 3 kudos

Resolved! Best Data Model for moving from DW to Delta lake

I'm curious how Databricks recommends we model the data. Do they recommend that the data be in 3rd normal form (3NF), or should it be dimensionally modeled (facts and dimensions)?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

It all depends on the use case. 3NF is ideal for transactional systems, so for a data warehouse/lakehouse that might not be ideal. However, there certainly are cases where it is interesting. Star schemas are definitely still relevant, BUT with the processing p...

1 More Replies
Junee
by New Contributor III
  • 6477 Views
  • 5 replies
  • 3 kudos

Resolved! What happens to the clusters whose jobs are canceled or terminated due to failures? (Jobs triggered through the Jobs API 2.1 using runs/submit)

I am using the Databricks Jobs API 2.1 to trigger and run my jobs. The "jobs/runs/submit" API helps start the cluster, as well as create the job and run it. This API works great for normal jobs as it also cleans up the cluster once the job is finished suc...
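A rough sketch (my own illustration, not the poster's code) of such a runs/submit call with the Jobs API 2.1; the submitted run spins up a job cluster that is torn down when the run ends. The host, token, notebook path, and cluster settings are hypothetical.

import requests

host = "https://<your-workspace>.cloud.databricks.com"   # hypothetical workspace URL
token = "<personal-access-token>"                         # hypothetical token

payload = {
    "run_name": "one-off-run",
    "tasks": [{
        "task_key": "main",
        "new_cluster": {
            "spark_version": "9.1.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "notebook_task": {"notebook_path": "/Shared/my_notebook"},
    }],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.json())  # returns a run_id that can be polled via /api/2.1/jobs/runs/get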

Latest Reply
User16871418122
Contributor III
  • 3 kudos

@Junee, Anytime! It is crisply mentioned in the doc too. https://docs.databricks.com/clusters/index.html

4 More Replies
francescocamuss
by New Contributor III
  • 21440 Views
  • 12 replies
  • 10 kudos

Resolved! Databricks rbase container: RStudio doesn't work

Hello, how are you? I hope you are doing well! I'm trying to use a Databricks image (link: containers/ubuntu/R at master · databricks/containers (github.com)) to run a container when starting a cluster. I need RStudio to be installed on the contain...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

If the issue is resolved, would you be happy to mark the answer as best so that others can quickly find the solution in the future?

11 More Replies
Chris_Shehu
by Valued Contributor III
  • 8960 Views
  • 7 replies
  • 2 kudos

Resolved! Can I disable the workspace directory for specific user groups?

We want to use the REPO directory in our production environment only and have a dev environment with fewer restrictions. If I use the checkbox on the group admin screen to disable workspace access, it locks out the entire Data Engineering section.

Latest Reply
Chris_Shehu
Valued Contributor III
  • 2 kudos

So I found a way to get 85% of the way there: 1) Disable workspace access for the users group. 2) Create a new group or use another group that you created for the next step. 3) Go to the workspace and right-click on whitespace in the root directory. 4) A...

6 More Replies
bdc
by New Contributor III
  • 8702 Views
  • 4 replies
  • 5 kudos

Resolved! Is it possible to show multiple cmd output in a dashboard?

I have a loop that outputs a dataframe for each value in a list. I can create a dashboard if there is only one df, but in the loop I'm only able to see the charts in the notebook if I switch the view to charts, not in the dashboard. In t...
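A sketch of the setup being described (my paraphrase, not the poster's code); the list values, table, and column names are hypothetical, and spark and display are the globals provided inside a Databricks notebook:

values = ["a", "b", "c"]                                       # hypothetical list of values
for v in values:
    df = spark.table("my_table").filter(f"category = '{v}'")  # one dataframe per value
    display(df)                                                # each call is a separate chart output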

Latest Reply
Wanda11
New Contributor II
  • 5 kudos

If you want to be able to easily run and kill multiple processes with ctrl-c, this is my favorite method: spawn multiple background processes in a (…) subshell, and trap SIGINT to execute kill 0, which will kill everything spawned in the subshell group...

3 More Replies
