Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

SepidehEb
by Databricks Employee
  • 3866 Views
  • 6 replies
  • 7 kudos

Resolved! How to get a minor DBR image?

In short, we aim to add a step to a CI job that would run tests in a container, which should mimic the DBR of our clusters – currently we use 7.3. We are considering using one of the databricksruntime images (possibly standard:7.x for now, https://hub...

Latest Reply
Atanu
Databricks Employee
  • 7 kudos

Hi @Sepideh Ebrahimi, since the Databricks Runtime is proprietary, you cannot run it locally. As @Werner Stinckens said, you can build your own image, but it has to run on a cluster. There is also Databricks Connect (https://docs.databricks.com/dev-...
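For reference, a minimal sketch of the databricks-connect approach mentioned in the reply, assuming the legacy client matching DBR 7.3 (installed with pip install "databricks-connect==7.3.*" and set up via databricks-connect configure):

    # With databricks-connect configured, a plain SparkSession forwards Spark
    # jobs to the remote Databricks cluster instead of a local JVM.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(10)   # executed on the remote cluster
    print(df.count())      # 10, computed remotely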

5 More Replies
sunil_smile
by Contributor
  • 6286 Views
  • 5 replies
  • 6 kudos

Apart from notebooks, is it possible to deploy an application (PySpark or R+Spark) as a package or file and execute it in Databricks?

Hi, with the help of databricks-connect I was able to connect the cluster to my local IDE (PyCharm and the RStudio desktop version), develop the application, and commit the code to Git. When I try to add that repo to the Databricks workspac...

Latest Reply
Atanu
Databricks Employee
  • 6 kudos

Maybe you would be interested in our Databricks Connect. I'm not sure if that resolves your issue of connecting with a third-party tool and setting up your supported IDE / notebook server: https://docs.databricks.com/dev-tools/databricks-connect.html
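Beyond databricks-connect, a plain .py file can also be run as a job. A hedged sketch using the Jobs API's spark_python_task; the host, token, node type, and DBFS path below are placeholders, not values from this thread:

    # Create a job that runs an ordinary Python file instead of a notebook.
    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder

    job_spec = {
        "name": "my-pyspark-app",
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "i3.xlarge",   # adjust for your cloud
            "num_workers": 2,
        },
        "spark_python_task": {"python_file": "dbfs:/apps/main.py"},
    }

    resp = requests.post(f"{HOST}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         json=job_spec)
    resp.raise_for_status()
    print(resp.json())   # {"job_id": ...}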

4 More Replies
Abela
by New Contributor III
  • 7042 Views
  • 3 replies
  • 7 kudos

Resolved! Databricks drop and remove s3 storage files safely

After dropping a Delta table using the DROP command in Databricks, is there a way to drop the S3 files in Databricks without using the rm command? Looking for a solution where junior developers can safely drop a table without messing with the rm command where...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @Alina Bella, like @Hubert Dudek mentioned, we have a best-practices guide for dropping managed tables. You can find the docs here.
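The gist of that guide: if the table is created managed (no explicit LOCATION), Databricks owns the underlying files, and DROP TABLE removes the data as well as the metadata, so nobody needs to run rm. A hedged sketch with a hypothetical table name:

    # Managed table: no LOCATION clause, so the metastore owns the files.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_managed (id INT, amount DOUBLE)
        USING DELTA
    """)

    # Dropping a managed table also removes its underlying files; an external
    # table (created with LOCATION) would leave the S3 files in place.
    spark.sql("DROP TABLE sales_managed")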

2 More Replies
itay
by New Contributor II
  • 1839 Views
  • 2 replies
  • 1 kudos

Streaming with runOnce and groupBy window queries

I have a streaming job running a groupBy query with a window of 3 days. The query is searching for different types of events. The stream is configured with runOnce, and a job is scheduled for every hour. Now, I'm not sure what data is processed ea...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @itay k, you will need to take a look at the Progress Reporter. This shows the micro-batch JSON metrics, for example the metric called "numInputRows", which displays the number of input rows processed for the micro-batch. You will...
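A minimal sketch of reading those metrics from a query handle; the rate source and memory sink here are stand-ins for a real pipeline:

    # Run one micro-batch, then inspect the Progress Reporter output.
    query = (spark.readStream.format("rate").load()
             .writeStream.format("memory").queryName("demo")
             .trigger(once=True)
             .start())
    query.awaitTermination()

    progress = query.lastProgress      # JSON-like dict for the last micro-batch
    if progress:
        print(progress["numInputRows"])  # rows processed by that micro-batch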

1 More Replies
kmartin62
by New Contributor III
  • 5383 Views
  • 9 replies
  • 4 kudos

Resolved! Configure Databricks (spark) context from PyCharm

Hello. I'm trying to connect to Databricks from my IDE (PyCharm) and then run delta table queries from there. However, the cluster I'm trying to access has to give me permission. In this case, I'd go to my cluster, run the cell which gives me permiss...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

"I'm trying to connect to Databricks from my IDE (PyCharm) and then run delta table queries from there."If you are going to deploy later your code to databricks the only solutions which I see is to use databricks-connect or just make development envi...

8 More Replies
prasadvaze
by Valued Contributor II
  • 18752 Views
  • 7 replies
  • 3 kudos

Resolved! How to make delta table column values case-insensitive?

We have many Delta tables with string columns as the unique key (PK in a traditional relational DB), and we don't want to insert a new row when the key value differs only in case. It's a lot of code change to use the upper/lower function on column-value comparisons (in ...

Latest Reply
lizou
Contributor II
  • 3 kudos

Well, the unintended benefit is that now I am using int/bigint surrogate keys for all tables (preferred in a DW). All joins are made on integer data types, and query efficiency is also improved. The string matching using upper() is done only in ETL when com...
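A sketch of that upper()-on-compare pattern in a Delta MERGE; the table and column names (target, updates, pk) are hypothetical:

    # Case-insensitive upsert: normalize both sides of the join key.
    spark.sql("""
        MERGE INTO target t
        USING updates u
          ON upper(t.pk) = upper(u.pk)
        WHEN MATCHED THEN UPDATE SET t.amount = u.amount
        WHEN NOT MATCHED THEN INSERT *
    """)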

6 More Replies
Nuthan_1994
by New Contributor II
  • 3511 Views
  • 3 replies
  • 3 kudos

Resolved! Installing new libraries on Azure Databricks Clusters

Hi everyone, I was trying to install the newest Python version on Databricks clusters running runtime version 7.3 LTS, but no matter how many times I try, it keeps installing Python 3.7.5. I know that runtime version 7.3 LTS co...

Latest Reply
dazfuller
Contributor III
  • 3 kudos

I've done this before using a custom docker image, but even then the runtime itself continues to use the version of python 3 which is installed as part of the OS. The easiest way to get to a newer version is to use a newer runtime. If you're sticking...

2 More Replies
Anonymous
by Not applicable
  • 1496 Views
  • 1 reply
  • 1 kudos

Resolved! Access to Cluster Logs for non-admins

Suppose I have a DevOps team that needs near real-time access to cluster logs to troubleshoot job failures. What is the best way for me to grant access to view logs without granting them admin access?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please use the logging option in the cluster settings and set a destination for sending logs to another Azure Blob or S3 storage location (it needs to be mounted first):
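A hedged sketch of that setting as it appears in a Clusters API spec; the names and paths are placeholders. With a destination configured, driver and executor logs are delivered to that path, and non-admins can be granted read access to the storage instead of admin rights:

    cluster_spec = {
        "cluster_name": "jobs-cluster",
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",   # adjust per cloud
        "num_workers": 2,
        "cluster_log_conf": {
            # DBFS destination; on AWS an {"s3": {...}} variant also exists.
            "dbfs": {"destination": "dbfs:/mnt/cluster-logs"}
        },
    }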

User16857281869
by New Contributor II
  • 2059 Views
  • 1 reply
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my Structured Streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should be using the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10 ms to log the transaction data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Please mount cheaper storage (LRS) as a custom mount and put checkpoints there; clear data regularly; if you are using foreach/foreachBatch in the stream, it will save every DataFrame to DBFS; remember not to use display() in production; if on th...
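A sketch combining two of those tips, an explicit trigger interval and a checkpoint directory on the cheaper mounted storage; the paths are hypothetical:

    # Fewer, larger micro-batches plus checkpoints on cheap (LRS) storage keep
    # the storage-API call volume down.
    (spark.readStream.format("delta").load("/mnt/data/events_raw")
         .writeStream
         .format("delta")
         .trigger(processingTime="5 minutes")
         .option("checkpointLocation", "/mnt/cheap-lrs/checkpoints/events")
         .start("/mnt/data/events_clean"))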

User16857281869
by New Contributor II
  • 1662 Views
  • 1 reply
  • 1 kudos

Resolved! What is the best way to do time series analysis and forecasting with Spark?

We have developed a library on Spark which makes typical operations on time series much simpler. You can check the repo on GitHub for more info. You could also check out one of our blogs, which demos an implementation of a forecasting use case with S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Currently on Databricks there is MLflow with a forecasting option; please check it out.

brickster_2018
by Databricks Employee
  • 1187 Views
  • 1 reply
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

This is a list of configuration keys to enable or alter the blacklist mechanism:
spark.blacklist.enabled – set to true
spark.blacklist.task.maxTaskAttemptsPerExecutor (1 by default)
spark.blacklist.task.maxTaskAttemptsPerNode (2 by default)
spark.blacklis...
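These are application-level settings, so they belong in the cluster's Spark config (or a session built at startup) rather than in a running notebook; note that in Spark 3.1+ the keys were renamed to the spark.excludeOnFailure.* family. A minimal sketch:

    from pyspark.sql import SparkSession

    # Enable the blacklist mechanism with the defaults described above.
    spark = (SparkSession.builder
             .config("spark.blacklist.enabled", "true")
             .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
             .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
             .getOrCreate())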

DievanB
by New Contributor
  • 1652 Views
  • 1 reply
  • 0 kudos

pyspark: How to run selenium in UDF

Hi all, I am building a web scraper to get prices for certain EANs from the Amazon website. I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

UDFs are serialized and then executed on the executors. I don't think that will be possible with Selenium.

User16752244127
by Contributor
  • 14588 Views
  • 3 replies
  • 5 kudos
Latest Reply
Atanu
Databricks Employee
  • 5 kudos

Currently supported data sources with Databricks: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/ and maybe this blog will have more insight: https://blogs.sap.com/2019/10/24/your-sap-on-azure-part-22-read-sap-hana-data-from-azu...

2 More Replies
Emre
by New Contributor II
  • 1378 Views
  • 1 reply
  • 2 kudos

Resolved! The license of JDBC connector for BI vendors

Hey all, we would like to support Databricks in our BI tool, which is an open-source Java application (see https://github.com/metriql/metriql). In order to connect to Databricks, we need to use the JDBC connector, similar to other BI tools such as Look...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

It doesn't look so bad after all (meaning the terms and conditions at https://databricks.com/jdbc-odbc-driver-license), but I think the best solution is to open a ticket via https://databricks.com/company/contact


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group