Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DievanB
by New Contributor
  • 2541 Views
  • 1 replies
  • 0 kudos

PySpark: How to run Selenium in a UDF

Hi all, I am building a web scraper to get prices of certain EANs from the Amazon website. To do this, I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

UDF functions are serialized and then executed on executors. I don't think it will be possible with Selenium.
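The serialization constraint mentioned in this reply can be illustrated with the standard library alone. This is a minimal sketch, not Spark code: `pickle` is used as a stand-in for the serializer Spark applies to Python UDFs, and `FakeBrowserSession` and `can_ship_to_executor` are hypothetical names invented for the illustration. It shows that an object holding a live OS-level resource (as a browser driver does) cannot be serialized, while plain data such as an EAN string can.

```python
import pickle
import threading


class FakeBrowserSession:
    """Hypothetical stand-in for a live browser driver that owns an OS-level resource."""

    def __init__(self):
        # A lock, like a socket or a browser process handle, cannot be pickled.
        self._resource = threading.Lock()


def can_ship_to_executor(obj):
    """Return True if obj survives pickling, a precondition for sending it to a worker."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# A session created on the driver cannot be captured and sent along with the function.
session_ok = can_ship_to_executor(FakeBrowserSession())

# Plain data, such as the EAN to look up, serializes without trouble.
data_ok = can_ship_to_executor({"ean": "0000000000000"})

print(session_ok, data_ok)
```

Under these assumptions, any session would have to be created inside the function body on the executor itself, which in turn requires the browser and its driver to be installed on every worker node.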

  • 0 kudos
Emre
by New Contributor II
  • 2412 Views
  • 1 replies
  • 2 kudos

Resolved! The license of JDBC connector for BI vendors

Hey all, We would like to support Databricks in our BI tool, which is an open-source Java application. (See https://github.com/metriql/metriql) In order to connect to Databricks, we need to use the JDBC connector, similar to other BI tools such as Look...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

It doesn't look so bad after all (I mean the terms and conditions at https://databricks.com/jdbc-odbc-driver-license), but I think the best solution is to open a ticket via https://databricks.com/company/contact

  • 2 kudos
missyT
by New Contributor III
  • 2370 Views
  • 1 replies
  • 3 kudos

Is there a reason lists don't have a .sum() method?

I do a lot of work with numpy arrays and pytorch tensors, but occasionally throw some native lists around. I naturally want to write <list>.sum(), which would work for these other third-party iterables, but doesn't work for native lists. It'd be very ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

I think the reason is that a list can contain objects of types other than integers and floats (nested lists, strings, and any other kind of object), so it doesn't make sense to implement a .sum() method, as it would fail in many cases.
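A short sketch of the point made in this reply, using only Python built-ins: the free function `sum()` already handles numeric lists, and it fails on a list of mixed types in the way described. The variable names are illustrative only.

```python
# Built-in sum() covers the numeric case that <list>.sum() would serve.
numbers = [1, 2.5, 3]
total = sum(numbers)

# A list may hold arbitrary objects, and addition is not defined across all of them.
mixed = [1, "two", [3]]
try:
    sum(mixed)
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__

# With an explicit start value, sum() also concatenates nested lists.
nested = [[1, 2], [3]]
flattened = sum(nested, [])

print(total, mixed_error, flattened)
```

Because `sum()` accepts any iterable, it also works on tuples and generators, which a list-only method would not cover.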

  • 3 kudos
Anonymous
by Not applicable
  • 1092 Views
  • 0 replies
  • 0 kudos

www.jamboreeindia.com

Jamboree is the leading institute offering specialized classroom and online test prep solutions for study abroad entrance exams like GMAT, GRE, SAT, TOEFL and IELTS. https://www.jamboreeindia.com/

Rithwik_Malla_2
by New Contributor
  • 4067 Views
  • 3 replies
  • 5 kudos

Resolved! Terraform - Databricks CI/CD pipeline

Can anyone help me configure CI/CD for ADB Terraform code? The problem I am facing is that something goes wrong during authentication. Can anyone help me with this? The pipeline is implemented in Azure DevOps.

Latest Reply
Ravi
Databricks Employee
  • 5 kudos

Hi @Rithwik Aditya Manoj Malla, as requested by @Prabakar Ammeappin earlier, could you please share the code block and the error details? Also, you can refer to the doc below to authenticate ADB from TF code. https://registry.terraform.io/providers/...

  • 5 kudos
2 More Replies
mcharl02
by New Contributor III
  • 16158 Views
  • 9 replies
  • 6 kudos

How do I restore auto-close quote & auto-close parentheses functionality?

Over the last two days, my team's Databricks notebooks (using the Python interpreter) have stopped automatically adding a closing single quote (') with the cursor between the two. Same issue with automatically adding closing parentheses. The cluster has been re...

Latest Reply
Sajesh
Databricks Employee
  • 6 kudos

The fix for this issue will most likely be released to all regions/workspaces by 18th Nov 21.

  • 6 kudos
8 More Replies
kjoth
by Contributor II
  • 8921 Views
  • 7 replies
  • 12 kudos

Resolved! Databricks cluster Encryption keystore_password

How to set up this value? Is this any value we can provide or the default value we have to p
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

Hi @karthick J, please refer to this notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html Further, if you will be using the %pip magic command, the post below will be helpful. https://community.dat...

  • 12 kudos
6 More Replies
sarvesh
by Contributor III
  • 5241 Views
  • 0 replies
  • 0 kudos

Can we read an Excel file with many sheets with their indexes?

I am trying to read an Excel file which has 3 sheets with integers as their names:
sheet 1 name = 21
sheet 2 name = 24
sheet 3 name = 224
I got this data from a user, so I can't change the sheet names, but reading these with Spark is an issue. code -v...

StephanieAlba
by Databricks Employee
  • 8262 Views
  • 2 replies
  • 3 kudos

Resolved! Best Data Model for moving from DW to Delta lake

I’m curious how Databricks recommends we model the data. Do they recommend that the data be in third normal form (3NF), or should it be dimensionally modeled (facts and dimensions)?

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

It all depends on the use case. 3NF is ideal for transactional systems, so for a data warehouse/lakehouse it might not be ideal. However, there certainly are cases where it is interesting. Star schemas are definitely still relevant, BUT with the processing p...

  • 3 kudos
1 More Replies
Junee
by New Contributor III
  • 8748 Views
  • 5 replies
  • 3 kudos

Resolved! What happens to the clusters whose jobs are canceled or terminated due to failures? (Jobs triggered through Jobs API 2.1 using runs/submit)

I am using the Databricks Jobs API 2.1 to trigger and run my jobs. The "jobs/runs/submit" endpoint starts the cluster, creates the job, and runs it. This API works great for normal jobs, as it also cleans up the cluster once the job has finished suc...
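For readers unfamiliar with the call under discussion, here is a rough sketch of how a one-time run could be submitted and later canceled through the Jobs API 2.1. It only builds the HTTP requests and does not send them. The workspace host, token, cluster settings, notebook path, and run ID are all placeholders, and `build_request` is a hypothetical helper, not part of any Databricks library.

```python
import json
import urllib.request


def build_request(host, token, endpoint, payload):
    """Build (without sending) a POST request for a Jobs API 2.1 endpoint."""
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical one-time run: the cluster spec and notebook path are placeholders.
submit_body = {
    "run_name": "example-run",
    "tasks": [
        {
            "task_key": "main",
            "new_cluster": {
                "spark_version": "<spark-version>",
                "node_type_id": "<node-type>",
                "num_workers": 1,
            },
            "notebook_task": {"notebook_path": "/path/to/notebook"},
        }
    ],
}

HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
submit_req = build_request(HOST, "<token>", "runs/submit", submit_body)

# Canceling uses the run_id returned by runs/submit (placeholder value here).
cancel_req = build_request(HOST, "<token>", "runs/cancel", {"run_id": 123})

print(submit_req.full_url)
print(cancel_req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call. Since the cluster is declared inside the run as `new_cluster`, it belongs to that run rather than being a separately managed all-purpose cluster.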

Latest Reply
User16871418122
Databricks Employee
  • 3 kudos

@Junee, Anytime! It is crisply mentioned in the doc too. https://docs.databricks.com/clusters/index.html

  • 3 kudos
4 More Replies
francescocamuss
by Databricks Partner
  • 30670 Views
  • 12 replies
  • 10 kudos

Resolved! Databricks rbase container: RStudio doesn't work

Hello, how are you? I hope you are doing well! I'm trying to use a Databricks image (link: containers/ubuntu/R at master · databricks/containers (github.com)) to run a container when starting a cluster. I need RStudio to be installed on the contain...

Latest Reply
Prabakar
Databricks Employee
  • 10 kudos

If the issue is resolved, would you be happy to mark the answer as best so that others can quickly find the solution in the future?

  • 10 kudos
11 More Replies
Chris_Shehu
by Valued Contributor III
  • 12797 Views
  • 7 replies
  • 2 kudos

Resolved! Can I disable the workspace directory for specific user groups?

We want to use the REPO directory in our production environment only and have a dev environment with fewer restrictions. If I use the checkbox on the group admin screen to disable workspace access, it locks out the entire Data Engineering section.

Latest Reply
Chris_Shehu
Valued Contributor III
  • 2 kudos

So I found a way to get 85% of the way there:
1) Disable workspace access for the users group.
2) Create a new group or use another group that you created for the next step.
3) Go to the workspace and right click on whitespace in the root directory.
4) A...

  • 2 kudos
6 More Replies
bdc
by New Contributor III
  • 12600 Views
  • 4 replies
  • 5 kudos

Resolved! Is it possible to show multiple cmd output in a dashboard?

I have a loop that outputs a dataframe for each value in a list. I can create a dashboard if there is only one df, but in the loop, I'm only able to see the charts in the notebook if I switch the view to charts, not in the dashboard. In t...

Latest Reply
Wanda11
New Contributor II
  • 5 kudos

If you want to be able to easily run and kill multiple processes with Ctrl-C, this is my favorite method: spawn multiple background processes in a (…) subshell, and trap SIGINT to execute kill 0, which will kill everything spawned in the subshell group...

  • 5 kudos
3 More Replies
Prabakar
by Databricks Employee
  • 9121 Views
  • 2 replies
  • 5 kudos

Resolved! %pip/%conda doesn't work with encrypted clusters starting DBR 9.x

While trying to use the magic command %pip/%conda with DBR 9.x or above, it fails with the following error:
%pip install numpy
org.apache.spark.SparkException: %pip/%conda commands use unencrypted NFS and are disabled by default when SSL encryption ...

Latest Reply
Prabakar
Databricks Employee
  • 5 kudos

If you are not aware of the traffic encryption between cluster worker nodes, you can refer to the link below. https://docs.microsoft.com/en-us/azure/databricks/security/encryption/encrypt-otw

  • 5 kudos
1 More Replies