Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MadelynM
by Databricks Employee
  • 706 Views
  • 0 replies
  • 0 kudos

A job is a way of running a notebook either immediately or on a scheduled basis. Here's a quick video (4:04) on how to schedule a job and automate a workflow for Databricks on AWS. To follow along with the video, import this notebook into your worksp...

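A minimal sketch of doing the same thing programmatically through the Jobs 2.1 REST API; the workspace URL, token, notebook path, cluster ID, and cron expression below are placeholders, not values from the video:

import requests

resp = requests.post(
    "https://<workspace-url>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "name": "nightly-notebook-run",
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Users/<me>/<my_notebook>"},
            "existing_cluster_id": "<cluster-id>",
        }],
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 02:00
            "timezone_id": "UTC",
        },
    },
)
print(resp.json())  # contains the new job_id on success
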
MadelynM
by Databricks Employee
  • 821 Views
  • 0 replies
  • 1 kudos

Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...

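A minimal PySpark sketch of the directory-listing flavor described above; the paths and table name are assumptions for illustration, not from the video:

# Incrementally ingest new JSON files from a folder into a Delta table.
df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/events")
        .load("/mnt/lake/landing/events"))

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/mnt/lake/_checkpoints/events")
   .trigger(once=True)  # pick up whatever is new, then stop
   .table("events_bronze"))
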
marchello
by New Contributor III
  • 2634 Views
  • 5 replies
  • 6 kudos

Resolved! Register model - need Python 3, but get only Python 2

Hi all, I'm trying to register a model with Python 3 support, but I keep getting only Python 2. I can see that runtime 6.0 and above get Python 3 by default, but I don't see a way to set either the runtime version or the Python version during model regi...

Latest Reply
marchello
New Contributor III
  • 6 kudos

Hi team, thanks for getting back to me. Let's put this on hold for now. I will update once it's needed again. It was solely for education purposes, and right now I have quite urgent stuff to do. Have a great day.

4 More Replies
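
One way to pin the Python version is to log the model with an explicit conda environment. A hedged sketch (the model object and names are placeholders; this records Python 3 in the model's logged environment rather than changing the cluster runtime):

import mlflow

conda_env = {
    "name": "model-env",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.8", "pip", {"pip": ["mlflow", "scikit-learn"]}],
}

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,                    # assumption: a fitted scikit-learn model
        artifact_path="model",
        conda_env=conda_env,               # pins python=3.8 in the logged environment
        registered_model_name="my_model",  # creates or versions the registered model
    )
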
Murugan
by New Contributor II
  • 3487 Views
  • 4 replies
  • 1 kudos

Databricks interoperability between cloud environments

While Databricks is currently available and integrated into all three major cloud platforms (Azure, AWS, GCP), the following are pertinent questions that come up in real-world scenarios: 1) Whether Databricks can be cloud agnostic (i.e.) in ca...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

You'll be interested in Unity Catalog. The notebooks should be the same across all the clouds, and there are no syntax differences. The key things are going to be just changing paths from S3 to ADLS Gen2 and having different usernames/logins across the...

3 More Replies
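
A hypothetical sketch of the "just change the paths" point: keep the cloud-specific storage root in one lookup so the rest of the notebook is identical on AWS, Azure, and GCP (bucket and account names below are invented):

STORAGE_ROOTS = {
    "aws":   "s3://my-bucket/lake",
    "azure": "abfss://lake@myaccount.dfs.core.windows.net",
    "gcp":   "gs://my-bucket/lake",
}
cloud = "aws"  # set once per deployment

# Everything below is cloud-agnostic.
events = spark.read.format("delta").load(f"{STORAGE_ROOTS[cloud]}/bronze/events")
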
as999
by New Contributor III
  • 1648 Views
  • 3 replies
  • 1 kudos

Python dataframe or Hive SQL update based on predecessor value?

I have a million rows that I need to update: find the highest count of the predecessor from the same source data and replace the same value on a different row. For example, the original DF (columns: sno, Object, Name, shape, rating): 1, Fruit, apple, round ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Basically you have to create a dataframe (or use a window function, that will also work) which gives you the group combination with the most occurrences. So a window/groupby on Object, Name, shape with a count(). Then you have to determine which shape...

2 More Replies
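
A sketch of the suggested window/groupBy approach, assuming the column names from the truncated example (sno, Object, Name, shape, rating):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Count occurrences of each (Object, Name, shape) combination.
counts = df.groupBy("Object", "Name", "shape").agg(F.count("*").alias("cnt"))

# Keep the most frequent shape per (Object, Name).
w = Window.partitionBy("Object", "Name").orderBy(F.desc("cnt"))
top_shape = (counts
             .withColumn("rn", F.row_number().over(w))
             .filter("rn = 1")
             .select("Object", "Name", F.col("shape").alias("top_shape")))

# Overwrite shape on every row with the winning value for its group.
result = (df.join(top_shape, ["Object", "Name"], "left")
            .withColumn("shape", F.col("top_shape"))
            .drop("top_shape"))
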
Sam
by New Contributor III
  • 1390 Views
  • 1 reply
  • 4 kudos

collect_set/ collect_list Pushdown

Hello, I've noticed that collect_set and collect_list are not pushed down to the database.
Runtime: DBR 9.1 LTS, Spark 3.1.2. Database: Snowflake.
Is there any way to get a distinct set from a group by in a way that will push down the query to the database?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Hm, so collect_set does not get translated to listagg. Can you try the following?
  • use a more recent version of DBR
  • use Delta Lake as the Spark source
  • use the latest version of the Snowflake connector
  • check if pushdown to Snowflake is enabled

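If the connector still won't push the aggregate down, one workaround (an assumption, not the thread's confirmed fix) is to hand Snowflake the SQL yourself through the connector's query option, doing LISTAGG(DISTINCT ...) server-side; connection values and table/column names are placeholders:

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "autopushdown": "on",  # assumption: supported by the connector version in use
}

q = """SELECT customer_id,
              LISTAGG(DISTINCT product, ',') AS products
       FROM orders
       GROUP BY customer_id"""

# The query executes entirely in Snowflake; Spark only receives the result.
df = spark.read.format("snowflake").options(**sf_options).option("query", q).load()
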
WayneDeleersnyd
by New Contributor III
  • 7391 Views
  • 11 replies
  • 0 kudos

Resolved! Unable to view exported notebooks in HTML format

My team and I noticed an issue lately where notebooks, when exported to HTML format, are not viewable in a stand-alone state anymore. Older notebooks which were exported have no issues, but newer exports are not viewable. The only way we can view t...

Latest Reply
cconnell
Contributor II
  • 0 kudos

I can confirm that the Community Edition now does correct readable HTML export.

10 More Replies
User16826992666
by Valued Contributor
  • 2343 Views
  • 3 replies
  • 0 kudos

If our company has an Enterprise Git server deployed on a private network, can we use Repos?

Our team would like to use the Repos functionality but our security prevents outside traffic through public networks. Is there any way we can still use Repos?

Latest Reply
User16781336501
Databricks Employee
  • 0 kudos

Please contact your account team for some options that are in preview right now.

2 More Replies
Siddhesh2525
by New Contributor III
  • 5764 Views
  • 2 replies
  • 6 kudos

How to pass a dynamic value in Databricks

I have separate column values defined in 13 different notebooks, and I want to merge them into one Databricks notebook and pass dynamic parameters, so everything runs in a single Databricks notebook.

Latest Reply
Prabakar
Databricks Employee
  • 6 kudos

Hi @siddhesh Bhavar​, you can use widgets with the %run command to achieve this: https://docs.databricks.com/notebooks/widgets.html#use-widgets-with-run
%run /path/to/notebook $X="10" $Y="1"

1 More Replies
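
For completeness, the called notebook has to declare the widgets it expects; a minimal sketch matching the X/Y names in the reply above:

# Inside /path/to/notebook: declare widgets with defaults, then read them.
dbutils.widgets.text("X", "0")
dbutils.widgets.text("Y", "0")

x = dbutils.widgets.get("X")  # "10" when invoked via the %run line above
y = dbutils.widgets.get("Y")  # "1"
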
William_Scardua
by Valued Contributor
  • 7189 Views
  • 5 replies
  • 12 kudos

The database and tables disappear when I delete the cluster

Hi guys, I have a trial Databricks account. I realized that when I shut down the cluster, my databases and tables disappear. Is that correct, or is it because my account is a trial?

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

@William Scardua​ if it's an external Hive metastore or Glue catalog you might be missing the configuration on the cluster: https://docs.databricks.com/data/metastores/index.html
Also, as mentioned by @Hubert Dudek​, if it's a community edition then t...

4 More Replies
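
A hedged sketch of the kind of cluster Spark config the reply alludes to for an external Hive metastore (all values are placeholders; check the linked docs for the metastore version and JDBC driver that apply to you):

spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<host>:3306/<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
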
William_Scardua
by Valued Contributor
  • 10025 Views
  • 6 replies
  • 3 kudos

Resolved! How do you create a Sandbox in your data environment?

Hi guys, how do you create a sandbox in your data environment? Any ideas? Azure/AWS + Data Lake + Databricks

Latest Reply
missyT
New Contributor III
  • 3 kudos

In a sandbox environment, you will find the Designer enabled. You can activate Designer by selecting the design icon on a page, or by choosing the Design menu item in the Settings menu.

5 More Replies
Chris_Shehu
by Valued Contributor III
  • 5864 Views
  • 2 replies
  • 10 kudos

Resolved! When trying to use the pyodbc connector to write files to SQL Server, receiving error java.lang.ClassNotFoundException. Any alternatives or ways to fix this?

jdbcUsername = ********
jdbcPassword = ***************
server_name = "jdbc:sqlserver://***********:******"
database_name = "********"
url = server_name + ";" + "databaseName=" + database_name + ";"
table_name = "PatientTEST"
try:
    df.write \ ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

Please check the following code:
df.write.jdbc( url="jdbc:sqlserver://<host>:1433;database=<db>;user=<user>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;driver=com.microsof...

1 More Replies
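
A hedged completion of that suggestion using Spark's built-in JDBC writer with the Microsoft SQL Server driver (host, database, schema, and credentials are placeholders):

url = ("jdbc:sqlserver://<host>:1433;database=<db>;"
       "encrypt=true;trustServerCertificate=false;loginTimeout=30")

(df.write
   .format("jdbc")
   .option("url", url)
   .option("dbtable", "dbo.PatientTEST")  # table name from the original post
   .option("user", "<user>")
   .option("password", "<password>")
   .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
   .mode("append")
   .save())
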
Ericsson
by New Contributor II
  • 3482 Views
  • 2 replies
  • 1 kudos

SQL week format issue: not showing the result as 01 (ww)

Hi folks, I have a requirement to show the week number in 'ww' format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also please refer to the screenshot for the result.

Latest Reply
Lauri
New Contributor III
  • 1 kudos

You can use lpad() to achieve the 'ww' format.

1 More Replies
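
A quick sketch of the lpad() suggestion, zero-padding weekofyear() to two digits:

# Returns e.g. "01" instead of "1" for the first ISO week of the year.
spark.sql("""
    SELECT lpad(cast(weekofyear(date_add(current_date(), 35)) AS string), 2, '0') AS week_ww
""").show()
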
Braxx
by Contributor II
  • 13857 Views
  • 11 replies
  • 2 kudos

Resolved! Validate a schema of json in column

I have a dataframe like the one below with col2 as key-value pairs. I would like to filter col2 to only the rows with a valid schema. There could be many pairs, sometimes fewer, sometimes more, and this is fine as long as the structure is fine. Nulls in col...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Bartosz Wachocki​ - Thank you for sharing your solution and marking it as best.

10 More Replies
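
One common technique for this (an assumption here, not necessarily the solution accepted in the thread): parse col2 against an explicit type with from_json, which yields null for strings that don't parse, and filter on that:

from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

# Treat col2 as arbitrary key-value pairs: any number of pairs is fine,
# but malformed JSON fails to parse and comes back as null.
parsed = df.withColumn("parsed", F.from_json("col2", MapType(StringType(), StringType())))
valid = parsed.filter(F.col("parsed").isNotNull()).drop("parsed")
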

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
