Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by MadelynM, New Contributor III
  • 509 Views
  • 0 replies
  • 0 kudos


A job is a way of running a notebook either immediately or on a scheduled basis. Here's a quick video (4:04) on how to schedule a job and automate a workflow for Databricks on AWS. To follow along with the video, import this notebook into your worksp...
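For readers who'd rather skim code than watch the video, here is a minimal sketch of creating a scheduled notebook job through the Jobs 2.1 REST API (the workspace URL, token, notebook path, and cluster settings below are placeholders, not from the video):

import requests

host = "https://<your-workspace>.cloud.databricks.com"   # placeholder workspace URL
token = "<personal-access-token>"                         # placeholder token

# Create a job that runs a notebook every day at 02:00 UTC (Quartz cron syntax).
resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "nightly-notebook-job",
        "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Users/me@example.com/my_notebook"},
            "new_cluster": {
                "spark_version": "9.1.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }],
    },
)
print(resp.json())   # {"job_id": ...} on success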

by MadelynM, New Contributor III
  • 480 Views
  • 0 replies
  • 1 kudos


Auto Loader provides Python and Scala methods to ingest new data from a folder location into a Delta Lake table by using directory listing or file notifications. Here's a quick video (7:00) on how to use Auto Loader for Databricks on AWS with Databri...
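As a companion to the video, a minimal Auto Loader sketch (the source path, schema location, checkpoint, and table name are placeholders; set cloudFiles.format to match your input files):

# Incrementally ingest new files from a folder into a Delta table.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")                           # format of the incoming files
      .option("cloudFiles.schemaLocation", "s3://bucket/_schemas")   # where the inferred schema is tracked
      .load("s3://bucket/landing/"))                                 # folder to watch

(df.writeStream
   .option("checkpointLocation", "s3://bucket/_checkpoints/ingest")
   .trigger(availableNow=True)   # on older runtimes use .trigger(once=True)
   .toTable("my_delta_table"))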

by marchello, New Contributor III
  • 1780 Views
  • 5 replies
  • 6 kudos

Resolved! Register model - need Python 3, but get only Python 2

Hi all, I'm trying to register a model with Python 3 support, but I keep getting only Python 2. I can see that runtime 6.0 and above get Python 3 by default, but I don't see a way to set either the runtime version or the Python version during model regi...

Latest Reply
marchello
New Contributor III
  • 6 kudos

Hi team, thanks for getting back to me. Let's put this on hold for now; I will update once it's needed again. It was solely for education purposes and right now I have quite urgent stuff to do. Have a great day.
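For anyone who lands on this thread later: one way to control the Python version attached to a registered model is to pin it in the environment logged with the model, rather than relying on the runtime. A hedged sketch with mlflow.sklearn (the model name and environment contents are illustrative):

import mlflow
import mlflow.sklearn

# The environment logged with the model is what serving uses, not the notebook runtime.
conda_env = {
    "channels": ["conda-forge"],
    "dependencies": ["python=3.8", "pip", {"pip": ["mlflow", "scikit-learn"]}],
    "name": "py3_model_env",
}

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,                        # assumes a fitted scikit-learn model in scope
        artifact_path="model",
        conda_env=conda_env,
        registered_model_name="my_py3_model",  # log and register in one step
    )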

4 More Replies
by Murugan, New Contributor II
  • 2494 Views
  • 4 replies
  • 1 kudos

Databricks interoperability between cloud environments

While Databricks is currently available and integrated into all three major cloud platforms (Azure, AWS, GCP), the following are pertinent questions that come up in real-world scenarios: 1) Whether Databricks can be cloud agnostic, i.e., in ca...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

You'll be interested in the Unity Catalog. The notebooks should be the same across all the clouds and there are no syntax differences. The key things are going to be just changing paths from S3 to ADL2 and having different usernames/logins across the...
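A tiny illustration of the "just change the paths" point: keep the storage root in a single parameter so the same notebook runs unchanged on either cloud (both paths below are placeholders):

# One notebook, two clouds: only the storage root is cloud-specific.
dbutils.widgets.text("base_path", "s3://my-bucket/data")
# e.g. "abfss://container@account.dfs.core.windows.net/data" on Azure
base_path = dbutils.widgets.get("base_path")

df = spark.read.format("delta").load(f"{base_path}/events")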

3 More Replies
by as999, New Contributor III
  • 1154 Views
  • 3 replies
  • 1 kudos

Python dataframe or Hive SQL update based on predecessor value?

I have a million rows that I need to update: find the highest count of the predecessor from the same source data and replace the same value on a different row. For example, the original DF (columns sno, Object, Name, shape, rating): 1, Fruit, apple, round ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Basically you have to create a dataframe (or use a window function, that will also work) which gives you the group combination with the most occurrences. So a window/groupby on object, name, shape with a count(). Then you have to determine which shape...
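A sketch of that approach using the question's columns (names adapted from the example; adjust to your schema):

from pyspark.sql import functions as F, Window

# Count occurrences of each (object, name, shape) combination.
counts = df.groupBy("object", "name", "shape").agg(F.count("*").alias("cnt"))

# Keep the most frequent shape per (object, name) group.
w = Window.partitionBy("object", "name").orderBy(F.desc("cnt"))
winners = (counts.withColumn("rn", F.row_number().over(w))
                 .filter("rn = 1")
                 .select("object", "name", F.col("shape").alias("top_shape")))

# Overwrite shape with the winning value on every row of the group.
result = (df.join(winners, ["object", "name"])
            .withColumn("shape", F.col("top_shape"))
            .drop("top_shape"))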

2 More Replies
by Sam, New Contributor III
  • 925 Views
  • 1 reply
  • 4 kudos

collect_set / collect_list Pushdown

Hello, I've noticed that collect_set and collect_list are not pushed down to the database. Runtime: DBR 9.1 LTS, Spark 3.1.2. Database: Snowflake. Is there any way to get a distinct set from a group by in a way that will push the query down to the database?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Hm, so collect_set does not get translated to LISTAGG. Can you try the following?
  • use a more recent version of DBR
  • use Delta Lake as the Spark source
  • use the latest version of the Snowflake connector
  • check if pushdown to Snowflake is enabled
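If the connector won't translate collect_set for you, one workaround is to hand Snowflake the aggregation yourself through the connector's query option, e.g. with LISTAGG(DISTINCT ...). A sketch (the table, columns, and sfOptions are placeholders; sfOptions is assumed to hold your connection settings):

# Push the distinct aggregation into Snowflake by sending explicit SQL.
query = """
    SELECT group_col, LISTAGG(DISTINCT item_col, ',') AS items
    FROM my_table
    GROUP BY group_col
"""

df = (spark.read
      .format("snowflake")
      .options(**sfOptions)      # dict with sfUrl, sfUser, sfDatabase, ...
      .option("query", query)
      .load())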

by WayneDeleersnyd, New Contributor III
  • 4929 Views
  • 11 replies
  • 0 kudos

Resolved! Unable to view exported notebooks in HTML format

My team and I noticed an issue lately where notebooks, when exported to HTML format, are not viewable in a stand-alone state anymore. Older notebooks which were exported have no issues, but newer exports are not viewable. The only way we can view t...

Latest Reply
cconnell
Contributor II
  • 0 kudos

I can confirm that the Community Edition now does correct readable HTML export.

10 More Replies
by User16826992666, Valued Contributor
  • 1679 Views
  • 3 replies
  • 0 kudos

If our company has an Enterprise Git server deployed on a private network, can we use Repos?

Our team would like to use the Repos functionality but our security prevents outside traffic through public networks. Is there any way we can still use Repos?

Latest Reply
User16781336501
New Contributor III
  • 0 kudos

Please contact your account team for some options that are in preview right now.

2 More Replies
by Siddhesh2525, New Contributor III
  • 4968 Views
  • 2 replies
  • 6 kudos

How to pass dynamic values in Databricks

I have separate column values defined in 13 different notebooks, and I want to merge them into one Databricks notebook and pass dynamic parameters, so that everything runs in a single Databricks notebook.

Latest Reply
Prabakar
Esteemed Contributor III
  • 6 kudos

Hi @siddhesh Bhavar​, you can use widgets with the %run command to achieve this: https://docs.databricks.com/notebooks/widgets.html#use-widgets-with-run
%run /path/to/notebook $X="10" $Y="1"
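On the receiving side, the called notebook reads those values through widgets. A minimal sketch (X and Y match the %run example above):

# Inside /path/to/notebook: declare widgets with defaults, then read the passed values.
dbutils.widgets.text("X", "0")
dbutils.widgets.text("Y", "0")

x = dbutils.widgets.get("X")   # "10" when invoked as %run /path/to/notebook $X="10" $Y="1"
y = dbutils.widgets.get("Y")
print(x, y)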

1 More Replies
by William_Scardua, Valued Contributor
  • 5240 Views
  • 5 replies
  • 12 kudos

The database and tables disappear when I delete the cluster

Hi guys, I have a trial Databricks account. I realized that when I shut down the cluster, my databases and tables disappear. Is that expected, or is it because my account is a trial?

Latest Reply
Prabakar
Esteemed Contributor III
  • 12 kudos

@William Scardua​, if it's an external Hive metastore or Glue catalog, you might be missing the configuration on the cluster: https://docs.databricks.com/data/metastores/index.html. Also, as mentioned by @Hubert Dudek​, if it's a Community Edition then t...
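For the external Hive metastore case, the cluster's Spark config needs entries along these lines (a sketch only; the JDBC URL, driver, credentials, and metastore version must match your setup):

spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<host>:3306/<metastore_db>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>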

4 More Replies
by William_Scardua, Valued Contributor
  • 6817 Views
  • 6 replies
  • 3 kudos

Resolved! How do you create a sandbox in your data environment?

Hi guys, how do you create a sandbox in your data environment? Any ideas? Azure/AWS + Data Lake + Databricks.

Latest Reply
missyT
New Contributor III
  • 3 kudos

In a sandbox environment, you will find the Designer enabled. You can activate Designer by selecting the design icon on a page, or by choosing the Design menu item in the Settings menu.

5 More Replies
by Chris_Shehu, Valued Contributor III
  • 5043 Views
  • 2 replies
  • 10 kudos

Resolved! When trying to use the pyodbc connector to write to SQL Server, receiving java.lang.ClassNotFoundException. Any alternatives or ways to fix this?

jdbcUsername = ********
jdbcPassword = ***************
server_name = "jdbc:sqlserver://***********:******"
database_name = "********"
url = server_name + ";" + "databaseName=" + database_name + ";"
table_name = "PatientTEST"
try:
    df.write \
...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

Please check the following code:
df.write.jdbc(
    url="jdbc:sqlserver://<host>:1433;database=<db>;user=<user>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;driver=com.microsof...
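Spelled out with the DataFrameWriter options form, a hedged sketch of the same write (host, database, and credentials are placeholders; the Microsoft SQL Server JDBC driver is bundled with recent runtimes):

# Write a DataFrame to SQL Server over JDBC.
(df.write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://<host>:1433;database=<db>;encrypt=true;trustServerCertificate=false;loginTimeout=30")
   .option("dbtable", "PatientTEST")
   .option("user", "<user>")
   .option("password", "<password>")
   .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
   .mode("append")
   .save())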

1 More Replies
by Ericsson, New Contributor II
  • 2126 Views
  • 2 replies
  • 1 kudos

SQL week format issue: it's not showing the result as 01 (ww)

Hi folks, I have a requirement to show the week number in 'ww' format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also, please refer to the screenshot for the result.

Latest Reply
Lauri
New Contributor III
  • 1 kudos

You can use lpad() to achieve the 'ww' format.
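Concretely, that looks like this (a sketch in Spark SQL run from Python; the +35 offset comes from the question):

# Zero-pad weekofyear() to two digits so week 1 renders as "01".
spark.sql("""
    SELECT lpad(CAST(weekofyear(date_add(current_date(), 35)) AS STRING), 2, '0') AS week_ww
""").show()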

1 More Replies
by Braxx, Contributor II
  • 9095 Views
  • 12 replies
  • 2 kudos

Resolved! Validate the schema of JSON in a column

I have a dataframe like below, with col2 holding key-value pairs. I would like to filter col2 to only the rows with a valid schema. There could be many pairs, sometimes fewer, sometimes more, and this is fine as long as the structure is fine. Nulls in col...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Bartosz Wachocki​ - Thank you for sharing your solution and marking it as best.
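For later readers: the accepted solution lives in the thread above, but one common way to do this kind of filtering is from_json with an explicit schema, since rows whose col2 doesn't parse come back as null. A sketch with an illustrative schema (adjust the type to your structure):

from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

# Parse col2 against the expected structure; invalid JSON yields null.
parsed = df.withColumn("parsed", F.from_json("col2", MapType(StringType(), StringType())))

valid_rows = parsed.filter(F.col("parsed").isNotNull()).drop("parsed")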

11 More Replies