A job is a way of running a notebook either immediately or on a scheduled basis. Here's a quick video (4:04) on how to schedule a job and automate a workflow for Databricks on AWS. To follow along with the video, import this notebook into your workspace.
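If you'd rather automate this outside the UI, here is a minimal sketch using the Jobs 2.1 REST API; the host, token, notebook path, cluster ID, and cron expression are all placeholder assumptions, not from the post:

import requests

# Sketch: create a scheduled notebook job via the Jobs 2.1 API.
resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json={
        "name": "nightly-notebook",
        "tasks": [{
            "task_key": "run_notebook",
            "notebook_task": {"notebook_path": "/Users/<user>/my_notebook"},
            "existing_cluster_id": "<cluster-id>",
        }],
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # every day at 2 AM
            "timezone_id": "UTC",
        },
    },
)
print(resp.json())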
Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databricks...
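As a rough illustration, a minimal Auto Loader pipeline looks like this; the source format, paths, and target table name below are assumptions for the sketch:

# Sketch: stream new files from a folder into a Delta table with Auto Loader.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")                  # assumed source format
      .option("cloudFiles.schemaLocation", "/tmp/schema")   # assumed schema-tracking path
      .load("s3://my-bucket/landing/"))                     # assumed input folder

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoint")         # assumed checkpoint path
   .trigger(once=True)                                      # process available files, then stop
   .toTable("bronze.events"))                               # assumed target Delta table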
Hi all, I'm trying to register a model with Python 3 support, but I keep getting only Python 2. I can see that runtime 6.0 and above get Python 3 by default, but I don't see a way to set either the runtime version or the Python version during model registration...
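One way this is commonly handled (a sketch, not confirmed as the fix for this thread) is to pin the Python version in the conda environment passed when the model is logged; the sklearn flavor and package list here are assumptions:

import mlflow
from sklearn.linear_model import LogisticRegression

# Toy fitted model so the sketch is self-contained.
model = LogisticRegression().fit([[0], [1]], [0, 1])

# Sketch: pin Python 3 in the model's conda environment at log time.
conda_env = {
    "name": "model_env",
    "channels": ["defaults"],
    "dependencies": ["python=3.7", "pip", {"pip": ["mlflow", "scikit-learn"]}],
}
mlflow.sklearn.log_model(model, "model", conda_env=conda_env)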
Hi team, thanks for getting back to me. Let's put this on hold for now; I will update once it's needed again. It was solely for education purposes, and right now I have quite urgent stuff to do. Have a great day.
While Databricks is currently available and integrated into all three major cloud platforms (Azure, AWS, GCP), the following pertinent questions come up in real-world scenarios: 1) Whether Databricks can be cloud-agnostic, i.e., in case workloads need to move from one cloud to another...
You'll be interested in Unity Catalog. The notebooks should be the same across all the clouds, and there are no syntax differences. The key things are going to be just changing paths from S3 to ADLS Gen2 and having different usernames/logins across the clouds...
I have a million rows that I need to update: for each group I look for the value with the highest count from the same source data and replace that value on the other rows. For example, original DF:

sno  Object  Name   shape  rating
1    Fruit   apple  round  ...
Basically you have to create a dataframe (or use a window function; that will also work) which gives you the group combination with the most occurrences. So a window/groupby on Object, Name, shape with a count(). Then you have to determine which shape...
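A rough PySpark sketch of that approach, using `df` for the asker's dataframe; column names are taken from the example above and the tie-breaking rule (first by count descending) is an assumption:

from pyspark.sql import functions as F, Window

# Count occurrences of each (Object, Name, shape) combination.
counts = df.groupBy("Object", "Name", "shape").agg(F.count("*").alias("cnt"))

# Keep the most frequent shape per (Object, Name) group.
w = Window.partitionBy("Object", "Name").orderBy(F.desc("cnt"))
top = (counts.withColumn("rn", F.row_number().over(w))
             .filter("rn = 1")
             .select("Object", "Name", F.col("shape").alias("top_shape")))

# Overwrite each row's shape with its group's most common shape.
result = (df.join(top, ["Object", "Name"])
            .withColumn("shape", F.col("top_shape"))
            .drop("top_shape"))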
Hello, I've noticed that collect_set and collect_list are not pushed down to the database. Runtime: DB 9.1 LTS, Spark 3.1.2. Database: Snowflake. Is there any way to get a distinct set from a group by in a way that will push the query down to the database?
Hm, so collect_set does not get translated to listagg. Can you try the following?
- use a more recent version of DBR
- use Delta Lake as the Spark source
- use the latest version of the Snowflake connector
- check if pushdown to Snowflake is enabled (see the sketch below)
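For the last point, the Snowflake Spark connector exposes a session-level pushdown toggle on the JVM side; a hedged sketch from PySpark (the package path follows the connector docs, but it's worth verifying against your connector version):

# Re-enable query pushdown for the current session via the connector's JVM utility.
sc_utils = spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils
sc_utils.enablePushdownSession(
    spark._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())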
My team and I noticed an issue recently where notebooks exported to HTML format are no longer viewable in a stand-alone state. Older exports have no issues, but newer exports are not viewable. The only way we can view t...
Our team would like to use the Repos functionality, but our security policy prevents outside traffic over public networks. Is there any way we can still use Repos?
I have separate column values defined in 13 different notebooks, and I want to merge them into one Databricks notebook and pass dynamic parameters, so that everything can run in a single Databricks notebook.
Hi @siddhesh Bhavar​ you can use widgets with the %run command to achieve this: https://docs.databricks.com/notebooks/widgets.html#use-widgets-with-run

%run /path/to/notebook $X="10" $Y="1"
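Inside the called notebook, the widgets just need to be declared and read; a minimal sketch (the widget names mirror the %run example, and the int conversion is an assumption):

# In /path/to/notebook: declare the widgets and read the passed values.
dbutils.widgets.text("X", "0")
dbutils.widgets.text("Y", "0")
x = int(dbutils.widgets.get("X"))
y = int(dbutils.widgets.get("Y"))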
Hi guys, I have a trial Databricks account. I realized that when I shut down the cluster, my databases and tables disappear. Is that expected, or is it because my account is a trial?
@William Scardua​ if it's an external Hive metastore or Glue catalog, you might be missing the configuration on the cluster: https://docs.databricks.com/data/metastores/index.html Also, as mentioned by @Hubert Dudek​, if it's a community edition then t...
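For the external-metastore case, the connection has to be set in the cluster's Spark config; a sketch with placeholder values (the keys follow the linked docs, the host, database, and versions are assumptions):

spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<host>:3306/<metastore_db>
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin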
In a sandbox environment, you will find the Designer enabled. You can activate Designer by selecting the Designer icon on a page, or by choosing the Design menu item in the Settings menu.
please check the following code (the driver class belongs in the connection properties rather than in the URL; table name and mode are placeholders):

df.write.jdbc(
    url="jdbc:sqlserver://<host>:1433;database=<db>;user=<user>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30",
    table="<schema>.<table>",
    mode="append",
    properties={"driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"})
Hi folks, I have a requirement to show the week number in 'ww' format. Please see the code below; please also refer to the screenshot for the result.

select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), 35))
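weekofyear() returns an integer, so one way to get a two-digit 'ww' string is to zero-pad it; a sketch keeping the 35-day offset from the question (the alias is made up for illustration):

# Zero-pad weekofyear() to a two-digit string such as '05'.
spark.sql("""
  SELECT lpad(CAST(weekofyear(date_add(current_date, 35)) AS STRING), 2, '0') AS week_ww
""").show()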
I have a dataframe like below with col2 as key-value pairs. I would like to filter col2 to only the rows with a valid schema. There could be many pairs, sometimes fewer, sometimes more, and this is fine as long as the structure is fine. Nulls in col...
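One common way to keep only the rows whose col2 parses against the expected structure is from_json(), which returns NULL on malformed input; a sketch where the map-of-strings schema is an assumption about the pairs:

from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

# Rows where col2 fails to parse against the schema come back NULL and are dropped.
schema = MapType(StringType(), StringType())
valid = (df.withColumn("parsed", F.from_json("col2", schema))
           .filter(F.col("parsed").isNotNull())
           .drop("parsed"))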