Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sarguido
by New Contributor II
  • 6496 Views
  • 5 replies
  • 2 kudos

Delta Live Tables: bulk import of historical data?

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However, letting the DLT pipeline run forever doesn't work with the database we're trying to import from...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Sarah Guido, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

4 More Replies
bulbur
by New Contributor II
  • 3183 Views
  • 1 reply
  • 0 kudos

Use pandas in DLT pipeline

Hi, I am trying to work with pandas in a Delta Live Table. I have created some example code: import pandas as pd import pyspark.sql.functions as F pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "...

Latest Reply
bulbur
New Contributor II
  • 0 kudos

I have taken the advice given by the documentation (However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase.) and moved the toPandas call to a function...
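For reference, a minimal sketch of the pattern described in the reply, assuming a DLT pipeline notebook where spark is available implicitly (table and column names are illustrative, not from the thread):

import dlt
import pandas as pd

# Runs once during graph initialization, outside any table definition.
pdf = pd.DataFrame({"A": ["foo", "bar"], "B": [1, 2]})
reference_df = spark.createDataFrame(pdf)

@dlt.table
def my_table():
    # Only Spark-native operations inside the table function.
    return reference_df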

Devsh_on_point
by New Contributor
  • 1139 Views
  • 1 reply
  • 1 kudos

Liquid Clustering with Partitioning

Hi Team, can we use partitioning and liquid clustering in conjunction? Essentially, partitioning the table first on a specific field and then applying liquid clustering (on other fields)? Alternatively, can we define the order priority of the cluster key ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Devsh_on_point, no, you can't have both partitioning and liquid clustering on a table. You can treat liquid clustering as a more performant replacement for partitioning. And yes, you are correct, the order of cluster columns doesn't matter: "Databricks recomm...
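A minimal sketch of that recommendation, run from a Python notebook (the table and column names are illustrative):

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_clustered (
        sale_id BIGINT,
        region STRING,
        sale_date DATE
    )
    CLUSTER BY (region, sale_date)  -- cluster column order does not matter
""")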

vannipart
by New Contributor III
  • 2111 Views
  • 1 reply
  • 1 kudos

Resolved! SparkOutOfMemoryError when merging data into a table that already has data

Hello, there is an issue with merging data from a DataFrame into a table (2024, Databricks): Job aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8):...

karthika
by New Contributor II
  • 1496 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks associate certification

I encountered this experience while attempting my 1st Databricks certification. Abruptly, the proctor asked me to show my desk, and after I showed it, he/she asked multiple times. My test got paused multiple times even when I was looking at my screen. I want to ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

@Cert-TeamOPS @Cert-Team Please help this person. For now, @karthika, use this for filing a ticket with our support team. Please allow the support team 24-48 hours for a resolution. In the meantime, you can review the following documentation: Room req...

hari-prasad
by Valued Contributor II
  • 8574 Views
  • 8 replies
  • 2 kudos

Spark reads GZ file as corrupted data when the file extension has .GZ in upper case

If the file is renamed to file_name.sv.gz (lower-case extension) it works fine; with file_name.sv.GZ (upper-case extension) the data is read as corrupted, meaning Spark simply reads the compressed file as-is.

Labels: Data Engineering, gzip files, spark-csv, spark.read.csv
Latest Reply
hari-prasad
Valued Contributor II
  • 2 kudos

Recently I restarted looking at a solution for this issue, and I found out we can add a few exceptions to allow "GZ" in the Hadoop library, as GzipCodec is invoked from there.
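As a hedged workaround sketch based on the observation in the original post (lower-case .gz reads fine), one could rename the files before reading; the directory path is hypothetical:

# Rename .GZ files to .gz so the extension is recognized by GzipCodec.
src_dir = "dbfs:/mnt/raw/incoming/"  # hypothetical location

for f in dbutils.fs.ls(src_dir):
    if f.name.endswith(".GZ"):
        dbutils.fs.mv(f.path, f.path[:-3] + ".gz")

df = spark.read.csv(src_dir, header=True)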

7 More Replies
vjani
by New Contributor III
  • 2623 Views
  • 4 replies
  • 5 kudos

Resolved! Global init script not running

Hello Databricks Community, I am trying to connect Databricks with Datadog and have added the Datadog agent script as a global init script, but it did not work. Just to check whether the init script is working or not, I have added the below two lines of code to the global init...

Latest Reply
vjani
New Contributor III
  • 5 kudos

Thanks Slash for the reply. That seems to be the reason. I was following https://docs.datadoghq.com/integrations/databricks/?tab=driveronly and missed that configuration.

3 More Replies
anand_k
by New Contributor II
  • 979 Views
  • 1 reply
  • 1 kudos

Variant Support in SQL Alchemy

Databricks now supports the VARIANT data type, which works well in the UI and within Spark environments. However, when working with SQLAlchemy, the VARIANT type doesn't seem to be fully implemented in the latest databricks-sql-connector[sqlalchemy]. ...

Latest Reply
Witold
Databricks Partner
  • 1 kudos

This is actually an open source project. Looking at the code, it seems that VARIANT is not yet supported. Depending on your knowledge of the code base, you could create your own PR, or just open an issue there and wait for the devs to add support.
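Until then, a hedged workaround sketch is to cast the VARIANT column server-side so SQLAlchemy only sees a plain string; the connection URL, table, and column names are illustrative:

from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks://token:<token>@<host>?http_path=<http-path>&catalog=main&schema=default"
)

with engine.connect() as conn:
    # CAST(... AS STRING) turns the VARIANT value into JSON text.
    rows = conn.execute(
        text("SELECT id, CAST(payload AS STRING) AS payload FROM events")
    ).fetchall()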

RobCox
by New Contributor II
  • 1251 Views
  • 1 reply
  • 1 kudos

Unable to ANALYZE external Delta tables due to "failed to initialize filesystem"

Hello, I've recently noticed we've never been using ANALYZE TABLE, after doing z-ordering / liquid clustering investigations and noticing that the query plans for our Delta tables were not considering these paths. I'm trying to execute the following command...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 1 kudos

Hi @RobCox, This might be due to incorrect configuration settings or insufficient permissions. Ensure that the fs.azure.account.key configuration is accurate and that the service principal or identity running the command has the necessary permissions...
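A hedged sketch of that configuration check, with a hypothetical storage account and secret scope:

# Make sure the session can authenticate to the external storage first.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

spark.sql("ANALYZE TABLE my_catalog.my_schema.my_table COMPUTE STATISTICS FOR ALL COLUMNS")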

jenshumrich
by Contributor
  • 2925 Views
  • 4 replies
  • 3 kudos

Databricks resets notebook all the time

Whenever I run my script it resets the notebook state: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached. at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:1467)" T...

Latest Reply
jenshumrich
Contributor
  • 3 kudos

To get closer to the error: there seems to be some mystical size limit.

3 More Replies
reachrishav
by New Contributor II
  • 3443 Views
  • 2 replies
  • 0 kudos

XML to Parquet files

I have a requirement where I need to ingest large XML files and flatten the data before saving it as Parquet files. I have created a Python function to flatten the complex types (array & struct) from the ingested XML dataframe. I'm using the spark-xm...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @reachrishav, since DBR 14.3 there is native support for reading and writing XML files. Maybe check if it works faster than the library that you've used: Read and write XML files | Databricks on AWS. And you've mentioned that you wrote a Python function to fl...
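A minimal sketch of the native reader for this use case (the rowTag value and paths are illustrative):

# Read XML natively (DBR 14.3+), then write Parquet.
df = (spark.read
      .format("xml")
      .option("rowTag", "record")
      .load("dbfs:/mnt/raw/input.xml"))

df.write.mode("overwrite").parquet("dbfs:/mnt/curated/output_parquet")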

1 More Replies
YS1
by Contributor
  • 1607 Views
  • 2 replies
  • 0 kudos

DLT - Importing Python Package

Hello, I'm creating a DLT pipeline where I read a Kafka stream, perform transformations using UDFs, and save the data in multiple tables. When I define the functions directly in the same notebook, the code works fine. However, if I move the code into ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @YS1, have you added the Python file in the pipeline settings, in the list of source code?
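If registering the file as pipeline source code is not an option, a hedged alternative sketch is to put the helper module on sys.path inside the pipeline notebook (the path, module, and table names are hypothetical):

import sys
sys.path.append("/Workspace/Users/<user>/dlt_helpers")  # hypothetical directory

import dlt
import my_udfs  # hypothetical module defining the UDFs

@dlt.table
def cleaned_events():
    return my_udfs.transform(spark.readStream.table("raw_events"))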

1 More Replies
skolukmar
by New Contributor
  • 1411 Views
  • 2 replies
  • 0 kudos

Delta Live Tables: control microbatch size

A Delta Live Tables pipeline reads a Delta table on Databricks. Is it possible to limit the size of a microbatch during data transformation? I am thinking about a solution used by Spark Structured Streaming that enables control of batch size using: .optio...

Latest Reply
lprevost
Contributor III
  • 0 kudos

One other thought -- if you are considering using the pandas_udf API, there is a way to control batch size there: see the pandas_udf guide and note the comments there about Arrow batch size params.
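A hedged sketch of that Arrow batch-size control (the limit value is illustrative):

# Spark feeds pandas_udfs in Arrow batches of at most this many rows.
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "5000")

import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def plus_one(v: pd.Series) -> pd.Series:
    # Each call receives at most ~5000 rows.
    return v + 1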

1 More Replies
gpierard
by New Contributor III
  • 24044 Views
  • 3 replies
  • 1 kudos

Resolved! how to list all spark session config variables

In Databricks I can set a config variable at session level, but it is not found in the context variables: spark.conf.set(f"dataset.bookstore", '123') #dataset_bookstore spark.conf.get(f"dataset.bookstore") #123 scf = spark.sparkContext.getConf() allc =...

Latest Reply
RyanHager
Contributor
  • 1 kudos

A while back I think I found a way to get Python to list all the config values, but I was not able to re-create it. Just make one of your notebook code sections Scala (first line %scala) and use the second line: (spark.conf.getAll).foreach(println)
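A Python alternative sketch that should list the same session-level configs, using the SQL SET command:

spark.conf.set("dataset.bookstore", "123")

# SET returns every session config as (key, value) rows.
for row in spark.sql("SET").collect():
    print(row.key, "=", row.value)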

2 More Replies
Twilight
by Contributor
  • 1754 Views
  • 2 replies
  • 3 kudos

web terminal accessing /Workspace/Users under tmux

I found this old post (https://community.databricks.com/t5/data-engineering/databricks-cluster-web-terminal-different-permissions-with-tmux/td-p/26461) that was never really answered. I am having the same problem. If I am in the raw terminal, I can a...

Latest Reply
Retired_mod
Esteemed Contributor III
  • 3 kudos

Hi @Twilight, To resolve this, ensure the `tmux` session runs under the same user context as the raw terminal, verify environment variables are set correctly, initialize `tmux` with the same shell and environment settings, check for any ACLs on the `...

1 More Replies