Data Engineering
Forum Posts

siva_thiru
by Contributor
  • 574 Views
  • 0 replies
  • 6 kudos

Happy to share that #WAVICLE was able to do a hands-on workshop on #[Databricks notebook] #[Databricks SQL] #[Databricks cluster] Fundamentals with KCT College, Coimbatore, India.

Workshop Standee
Deiry
by New Contributor III
  • 445 Views
  • 1 replies
  • 3 kudos

Hi, I'm Deiry. I'm 25 (almost 26) years old. I'm a Databricks expert, or at least that's my goal. I work at Celerik. My goal is to be a certified Machine Learning professional, so here we go!

Latest Reply
NhatHoang
Valued Contributor II
  • 3 kudos

Very confident, go ahead. :D

Mado
by Valued Contributor II
  • 843 Views
  • 3 replies
  • 1 kudos

Resolved! When should I use STREAM() when defining a DLT table?

Hi, I am a little confused about when I should use STREAM() when we define a table based on a DLT table. There is a pattern explained in the documentation: CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze AS SELECT * FROM cloud_files( "s3://p...

Latest Reply
Mado
Valued Contributor II
  • 1 kudos

Thanks @Landan George. Since "streaming_silver" is a streaming live table, I expected the last line of the code to be: AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id. But, as you can see, "live_gold" is defined by: AS SELECT c...

2 More Replies
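The distinction this thread turns on can be sketched from the documented DLT pattern (table names and the ingest path below are illustrative, not taken from the hidden replies): wrap a source in STREAM() only when the table being defined is itself a STREAMING LIVE TABLE that consumes the source incrementally; a plain LIVE TABLE, such as a gold aggregate, is fully recomputed on each update and reads LIVE.<name> without STREAM().

```sql
-- Bronze: streaming ingest with Auto Loader (path illustrative).
CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze
AS SELECT * FROM cloud_files("s3://path/to/raw", "json");

-- Silver: a streaming table reading incrementally from bronze,
-- so the source is wrapped in STREAM().
CREATE OR REFRESH STREAMING LIVE TABLE streaming_silver
AS SELECT * FROM STREAM(LIVE.streaming_bronze);

-- Gold: a complete aggregate recomputed on each update,
-- so it reads the silver table WITHOUT STREAM().
CREATE OR REFRESH LIVE TABLE live_gold
AS SELECT count(*) AS user_count FROM LIVE.streaming_silver GROUP BY user_id;
```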
kkumar
by New Contributor III
  • 807 Views
  • 2 replies
  • 2 kudos

ADLS Gen 2 Delta Tables memory allocation

If I mount one ADLS Gen2 account (ADLS1) to another ADLS Gen2 account (ADLS2) and create a Delta table on ADLS2, will it copy the data or just create something like an external table? I don't want to duplicate the data.

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @keerthi kumar, so basically you can CREATE EXTERNAL TABLES on top of the data stored somewhere - in your case ADLS. Data won't be copied; it will stay where it is. By creating external tables you are actually storing the metadata in your metasto...

1 More Replies
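As a hedged illustration of the reply above (the storage path and table name are placeholders, not from the thread): an external table registers only metadata in the metastore over data that stays where it is.

```sql
-- Registers metadata only; the Delta files remain at LOCATION
-- and are not copied.
CREATE TABLE my_db.sales_external
USING DELTA
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/delta/sales';
```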
Michael42
by New Contributor III
  • 751 Views
  • 2 replies
  • 1 kudos

Would like to start a discussion regarding techniques for joining two relatively large tables of roughly equal size on a daily basis. I realize this may be a bit of a conundrum with Databricks, but please review the details.

Input Data: One batch load of a daily dataset, roughly 10 million transaction items a day. Another daily batch load of roughly the same size. Each row in one dataset should have a corresponding row in the other dataset. Problem to solve: The problem i...

Latest Reply
Lennart
New Contributor II
  • 1 kudos

I've dealt with something similar in the past. There was an order system with order items that were supposed to be matched up against corresponding products in another system that acted as a master and handled invoicing. As for unique considerations...

1 More Replies
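The daily matching of two equally sized batches described above can be sketched as a hash join on the shared key. This is a pure-Python illustration with made-up records and a hypothetical helper name; at 10 million rows a day this would be a Spark join on a well-partitioned key instead.

```python
def match_batches(left, right, key):
    """Hash-join two batches of dicts on `key`; returns matched pairs
    plus the unmatched leftovers from each side."""
    # Build a hash index over the right-hand batch.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    matched, left_only = [], []
    # Probe the index with each left-hand row.
    for row in left:
        hits = index.pop(row[key], None)
        if hits:
            for hit in hits:
                matched.append((row, hit))
        else:
            left_only.append(row)
    # Whatever is still in the index never matched.
    right_only = [r for rows in index.values() for r in rows]
    return matched, left_only, right_only

orders = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
invoices = [{"id": 2, "total": 20}, {"id": 3, "total": 30}]
matched, left_only, right_only = match_batches(orders, invoices, "id")
```

Keeping the unmatched rows from both sides visible each day makes the reconciliation problem in the post explicit rather than silently dropping them.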
noimeta
by Contributor II
  • 3898 Views
  • 6 replies
  • 4 kudos

Resolved! Databricks SQL dashboard refresh

We have scheduled a dashboard to automatically refresh at some specific time.However, some visualizations in the dashboard don't get refreshed at the scheduled time.Checking the query logs, we found the source queries were properly executed, but the ...

Latest Reply
noimeta
Contributor II
  • 4 kudos

Thank you for the answers. I'm using the Databricks SQL environment, not the Data Science & Engineering one. And I scheduled the dashboard following this guideline: https://docs.databricks.com/sql/user/dashboards/index.html#automatically-refresh-a-dashbo...

5 More Replies
Dave_B_
by New Contributor III
  • 1868 Views
  • 5 replies
  • 2 kudos

Resolved! Git Integration Configuration via Command Line or API

I have an Azure service principal that is used for our CI/CD pipelines. We do not have access to the Databricks UI via user logins. Our GitHub repos also require SSO PATs. How can I configure the Git integration for the service principal so that I ca...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @David Benedict​, Please go through this Databricks article and let us know if that helps. best-practices-for-integrating-repos-with-cicd-workflows

4 More Replies
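One way to configure Git credentials for a service principal without the UI is the workspace Git Credentials REST API (POST /api/2.0/git-credentials), calling it with a token belonging to the service principal. A minimal stdlib sketch; the host, tokens, and username below are placeholders, and the field names follow the API as I understand it.

```python
import json
from urllib.request import Request, urlopen  # urlopen unused in this sketch

def build_git_credential_request(host, token, git_pat):
    """Build the REST request that registers a GitHub PAT for the
    calling principal via the Git Credentials API."""
    payload = {
        "git_provider": "gitHub",
        "git_username": "ci-service-principal",  # hypothetical username
        "personal_access_token": git_pat,
    }
    return Request(
        f"{host}/api/2.0/git-credentials",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_git_credential_request(
    "https://adb-123.azuredatabricks.net", "dapi-example-token", "ghp-example-pat")
# urlopen(req) would submit it; not executed in this sketch.
```

Because the credential is attached to whichever principal authenticates the call, running this with the service principal's token is what makes Repos operations work for CI/CD.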
AJ270990
by Contributor II
  • 4508 Views
  • 2 replies
  • 4 kudos

Resolved! Export commands and output of the Databricks Notebook to MS Excel

I have a Databricks notebook with several headers, SQL commands and their output. I am currently copying the output and SQL commands manually to Excel for a report. How can I reduce the manual work of copy-pasting from the notebook to Excel and au...

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Abhishek Jain, we haven't heard from you since my last response, and I was checking back to see if my suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to others. Also, ple...

1 More Replies
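One way to avoid the manual copy-paste is to write each query's header and result rows out programmatically. A hedged stdlib sketch using CSV, which Excel opens directly; the section names and rows are made up, and in a notebook the rows would come from something like `spark.sql(...).collect()` instead.

```python
import csv
import io

def write_report(sections):
    """Write (header, rows) sections into one CSV report Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for header, rows in sections:
        writer.writerow([header])   # section header as its own row
        for row in rows:
            writer.writerow(row)
        writer.writerow([])         # blank row between sections
    return buf.getvalue()

report = write_report([
    ("Daily totals", [["2023-01-01", 10], ["2023-01-02", 12]]),
])
```

For real .xlsx output with formatting, a library such as openpyxl would replace the csv module, but the loop over (header, rows) sections stays the same.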
markdias
by New Contributor II
  • 755 Views
  • 3 replies
  • 2 kudos

Which is quicker: grouping a table that is a join of several others or querying data?

This may be a tricky question, so please bear with me. In a real-life scenario, I have a dataframe (I'm using PySpark) called age, which is a groupBy of 4 other dataframes. I join these 4 so at the end I have a few million rows, but after the groupBy th...

Latest Reply
NhatHoang
Valued Contributor II
  • 2 kudos

Hi @Marcos Dias, frankly, I think we need more detail to answer your question: Do these 4 dataframes have their data updated? How often do you use the groupBy dataframe?

2 More Replies
aarave
by New Contributor III
  • 1658 Views
  • 8 replies
  • 5 kudos

remote database connection error

Hi, I am using Databricks through Azure. I am trying to connect to a remote Oracle database using a JDBC URL. I am getting an error of no suitable driver found: "java.sql.SQLException: No suitable driver". Can somebody help me with this?

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @vikas k, hope all is well! Was @Hubert Dudek's response able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

7 More Replies
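"No suitable driver" from JDBC usually means the Oracle driver jar isn't attached to the cluster or the `driver` option isn't set explicitly. A hedged sketch of the options (the URL, credentials, and table are placeholders; the ojdbc jar must still be installed on the cluster):

```python
def oracle_jdbc_options(url, user, password, table):
    """JDBC options with the driver class named explicitly, which avoids
    'No suitable driver' when the jar is present but not auto-registered."""
    return {
        "url": url,  # e.g. jdbc:oracle:thin:@//host:1521/service
        "driver": "oracle.jdbc.driver.OracleDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }

opts = oracle_jdbc_options(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger", "employees")
# In a notebook: spark.read.format("jdbc").options(**opts).load()
```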
Swapnil1998
by New Contributor III
  • 636 Views
  • 0 replies
  • 2 kudos

Date Formats while extracting data from Cosmos Mongo DB using Azure Databricks.

I have been trying to extract one date field from Cosmos which looks like this:

"lastModifiedDate" : { "$date" : 1668443121840 }

When the above field is extracted using Databricks it gets converted into a date format which looks like this...

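The `$date` value in Mongo's extended JSON is epoch milliseconds, so recovering the timestamp is a divide-by-1000 conversion. A pure-Python illustration using the value from the post; in Spark the equivalent is dividing the column by 1000 and casting to timestamp.

```python
from datetime import datetime, timezone

def mongo_date_to_datetime(millis):
    """Convert a Mongo extended-JSON $date (epoch milliseconds) to UTC."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

ts = mongo_date_to_datetime(1668443121840)
```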
bl12
by New Contributor II
  • 1634 Views
  • 2 replies
  • 1 kudos

Resolved! Use API to Clone Dashboards with Widgets in Databricks SQL?

Hi, I manually created a template dashboard with one widget. I wanted to clone this dashboard using the Create Dashboard API; however, I don't know what to put for the widget object. What I did was use the Retrieve API on the template dashboard, and th...

Latest Reply
Wout
Contributor
  • 1 kudos

@Akash Bhat​ how do I clone a dashboard across workspaces? Being able to deploy dashboards (with widgets!) through the API is essential to set up proper Data Engineering workflows.

1 More Replies
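A common workaround is to retrieve the template dashboard, create an empty dashboard, then replay each widget with the widget-create endpoint against the new dashboard's id. A sketch of the payload reshaping only; the field names follow the legacy Databricks SQL dashboard API as I understand it, so treat them as assumptions to verify against your workspace's API responses.

```python
def widget_create_payloads(template, new_dashboard_id):
    """Turn widgets retrieved from a template dashboard into payloads
    for creating equivalent widgets on a new dashboard."""
    payloads = []
    for w in template.get("widgets", []):
        payloads.append({
            "dashboard_id": new_dashboard_id,
            # Visualization-backed widgets carry a visualization id;
            # text widgets leave it as None.
            "visualization_id": (w.get("visualization") or {}).get("id"),
            "options": w.get("options", {}),   # includes position info
            "width": w.get("width", 1),
            "text": w.get("text", ""),
        })
    return payloads

template = {"widgets": [
    {"visualization": {"id": 7}, "options": {"position": {"col": 0}}, "width": 1},
]}
payloads = widget_create_payloads(template, "abc123")
```

Each payload would then be POSTed to the widget-create endpoint, which is what makes cloning across workspaces scriptable.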
csw77
by New Contributor
  • 331 Views
  • 0 replies
  • 0 kudos

Simple PySpark query very slow in pushing to snowflake

Hi all, I have a question which is likely very fundamental. I am passing data from Hive to Snowflake using PySpark. My query is very simple: "select from table limit 100". The table I am querying is very large, but this query can be shown to the co...

wats0ns
by New Contributor III
  • 9534 Views
  • 9 replies
  • 10 kudos

Resolved! Migrate tables from one azure databricks workspace to another

Hello all, I'm currently trying to move the tables contained in one Azure workspace to another, because of a change in the way we use our resource groups. I have not been able to move more than metadata with the databrickslabs/migrate repo. I was won...

Latest Reply
Kaniz
Community Manager
  • 10 kudos

Hi @Quentin Maire, we haven't heard from you since the last response from @Pat Sienkiewicz, and I was checking back to see if their suggestions helped you. Or else, if you have any solution, please do share that with the community as it can be helpfu...

8 More Replies
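One commonly used approach for moving table data (an assumption here, since the resolved reply is truncated) is Delta's DEEP CLONE into storage both workspaces can reach, then registering the table in the target workspace; the paths and names below are placeholders.

```sql
-- Source workspace: copy data and metadata to shared storage.
CREATE TABLE delta.`abfss://shared@account.dfs.core.windows.net/clones/my_table`
DEEP CLONE source_db.my_table;

-- Target workspace: register a table over the cloned files.
CREATE TABLE target_db.my_table
USING DELTA
LOCATION 'abfss://shared@account.dfs.core.windows.net/clones/my_table';
```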
Orianh
by Valued Contributor II
  • 16304 Views
  • 25 replies
  • 35 kudos

Fatal error: Python kernel is unresponsive

Hey guys, I'm using Petastorm to train a DNN. First I convert a Spark df with make_spark_converter and then open a reader on the materialized dataset. While I start a training session on only a subset of the data everything works fine, but when I'm using all...

Latest Reply
Anonymous
Not applicable
  • 35 kudos

Same error. This started a few days ago on notebooks that used to run fine in the past. Now, I cannot finish a notebook. I have already disabled almost all output being streamed to the result buffer, but the problem persists. I am left with <50 lines ...

24 More Replies