Happy to share that #WAVICLE was able to deliver a hands-on workshop on Databricks Notebook, Databricks SQL, and Databricks Cluster fundamentals with KCT College, Coimbatore, India.
Hi, I'm Deiry. I'm 25 (almost 26) years old, and I'm a Databricks expert, or at least that's my goal. I work at Celerik. My goal is to be a certified Machine Learning professional, so here we go!
Hi, I am a little confused about when I should use STREAM() when defining a table based on a DLT table. There is a pattern explained in the documentation:
CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze
AS SELECT * FROM cloud_files(
"s3://p...
Thanks @Landan George. Since "streaming_silver" is a streaming live table, I expected the last line of the code to be: AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id. But, as you can see, "live_gold" is defined by: AS SELECT c...
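To illustrate the pattern being discussed: STREAM(LIVE.x) is used when a STREAMING LIVE TABLE reads another DLT table incrementally, while a plain LIVE TABLE that recomputes a complete aggregate reads LIVE.x directly. A hedged sketch with hypothetical table and column names, shown as SQL strings as they would appear in a DLT SQL notebook:

```python
# Hedged sketch of the DLT SQL pattern discussed above; table and column
# names are hypothetical. STREAM(LIVE.x) appears only where a STREAMING
# LIVE TABLE consumes its source incrementally; a plain LIVE TABLE that
# recomputes an aggregate reads LIVE.x without STREAM().
silver_sql = """
CREATE OR REFRESH STREAMING LIVE TABLE streaming_silver
AS SELECT * FROM STREAM(LIVE.streaming_bronze) WHERE user_id IS NOT NULL
"""

gold_sql = """
CREATE OR REFRESH LIVE TABLE live_gold
AS SELECT user_id, count(*) AS events FROM LIVE.streaming_silver GROUP BY user_id
"""

# Only the incremental (streaming) reader wraps its source in STREAM(...).
assert "STREAM(LIVE.streaming_bronze)" in silver_sql
assert "STREAM" not in gold_sql
```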
If I mount one ADLS Gen2 account (ADLS1) to another ADLS Gen2 account (ADLS2) and create a Delta table on ADLS2, will it copy the data or just create something like an external table? I don't want to duplicate the data.
Hi @keerthi kumar, so basically you can create EXTERNAL TABLES on top of the data stored somewhere, in your case ADLS. The data won't be copied; it will stay where it is. By creating external tables you are actually storing the metadata in your metasto...
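A minimal sketch of the external-table DDL described above, assuming a hypothetical database, table name, and abfss path; because LOCATION points at existing storage, no data is copied, and DROP TABLE would leave the files in place:

```python
# Hedged sketch: registering an external (unmanaged) table over files that
# already live in ADLS. The path and names below are hypothetical.
ddl = """
CREATE TABLE IF NOT EXISTS my_db.events_external
USING DELTA
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/delta/events'
"""
# In a notebook this would be executed as spark.sql(ddl).
assert "LOCATION" in ddl and "USING DELTA" in ddl
```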
Input data:
- One daily batch load of roughly 10 million transaction items per day.
- Another daily batch load of roughly the same size.
- Each row in one dataset should have a corresponding row in the other dataset.
Problem to solve: The problem i...
I've dealt with something similar in the past. There was an order system whose order items were supposed to be matched against corresponding products in another system that acted as a master and handled invoicing. As for unique considerations...
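The matching step described in this thread can be sketched in plain Python with dicts keyed by a hypothetical transaction_id; at ~10 million rows per side the same idea maps onto a PySpark full outer join on the key, with the unmatched sides carried forward for the next day's run:

```python
# Minimal reconciliation sketch; keys and amounts are hypothetical.
def reconcile(left, right):
    """Return (matched, only_left, only_right) keyed by transaction_id."""
    left_ids, right_ids = set(left), set(right)
    matched = {k: (left[k], right[k]) for k in left_ids & right_ids}
    only_left = {k: left[k] for k in left_ids - right_ids}
    only_right = {k: right[k] for k in right_ids - left_ids}
    return matched, only_left, only_right

a = {"t1": 100, "t2": 250, "t3": 75}
b = {"t2": 250, "t3": 80, "t4": 10}
matched, only_a, only_b = reconcile(a, b)
# t2 and t3 exist on both sides; t1 and t4 are unmatched.
assert set(matched) == {"t2", "t3"}
assert set(only_a) == {"t1"} and set(only_b) == {"t4"}
```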
We have scheduled a dashboard to refresh automatically at a specific time. However, some visualizations in the dashboard don't get refreshed at the scheduled time. Checking the query logs, we found the source queries were properly executed, but the ...
Thank you for the answers. I'm using the Databricks SQL environment, not the Data Science & Engineering one. And I scheduled the dashboard following this guideline: https://docs.databricks.com/sql/user/dashboards/index.html#automatically-refresh-a-dashbo...
I have an Azure service principal that is used for our CI/CD pipelines. We do not have access to the Databricks UI via user logins. Our GitHub repos also require SSO PATs. How can I configure the Git integration for the service principal so that I ca...
Hi @David Benedict, please go through this Databricks article and let us know if it helps: best-practices-for-integrating-repos-with-cicd-workflows
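One way to attach a Git PAT to a service principal without the UI is the Git Credentials REST API (POST /api/2.0/git-credentials), called with a token obtained for the service principal. A hedged sketch; the host, tokens, and username below are hypothetical placeholders:

```python
import json

# Hedged sketch: registering a GitHub PAT for a service principal via the
# Databricks Git Credentials API. All values below are placeholders.
host = "https://adb-1234567890123456.7.azuredatabricks.net"   # hypothetical
sp_token = "<AAD-token-for-the-service-principal>"            # hypothetical
payload = {
    "git_provider": "gitHub",
    "git_username": "my-ci-bot",                 # hypothetical bot account
    "personal_access_token": "<github-sso-pat>", # placeholder, never hardcode
}
# The actual call (requires the requests package and network access):
#   requests.post(f"{host}/api/2.0/git-credentials",
#                 headers={"Authorization": f"Bearer {sp_token}"},
#                 data=json.dumps(payload))
assert set(payload) == {"git_provider", "git_username", "personal_access_token"}
```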
I have a Databricks notebook with several headers, SQL commands, and their output. I am currently copying the output and SQL commands to Excel manually for a report. How can I reduce the manual work of copy-pasting from the notebook to Excel and au...
Hi @Abhishek Jain, we haven't heard from you since my last response, and I was checking back to see if my suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others. Also, ple...
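One stdlib-only way to cut the copy-pasting described above is to write each query's result rows to a CSV that Excel opens directly. The header and rows here are hypothetical; in a notebook you would get them from df.columns and df.collect() (or use df.toPandas().to_excel(...) if pandas and openpyxl are available):

```python
import csv
import io

# Minimal sketch: serialize result rows as CSV text that Excel can open.
def rows_to_csv(header, rows):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)   # column names first
    writer.writerows(rows)    # then the data rows
    return buf.getvalue()

# Hypothetical query output.
report = rows_to_csv(["region", "total"], [["east", 42], ["west", 17]])
assert report.splitlines()[0] == "region,total"
assert report.splitlines()[1] == "east,42"
```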
This may be a tricky question, so please bear with me. In a real-life scenario, I have a dataframe (I'm using PySpark) called age, which is a groupBy of 4 other dataframes. I join these 4, so at the end I have a few million rows, but after the groupBy th...
Hi @Marcos Dias, frankly, I think we need more detail to answer your question: Do these 4 dataframes update their data? How often do you use the groupBy dataframe?
Hi, I am using Databricks through Azure. I am trying to connect to a remote Oracle database using a JDBC URL. I am getting a "no suitable driver found" error: "java.sql.SQLException: No suitable driver". Can somebody help me with this?
Hi @vikas k, hope all is well! Was @Hubert Dudek's response able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
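For readers hitting the same error: "No suitable driver" usually means the Oracle JDBC driver JAR is not installed on the cluster, or spark.read was not told which driver class to use. A hedged sketch of the connection options, with a hypothetical host and table:

```python
# Hypothetical connection settings for reading an Oracle table over JDBC.
# The "driver" option is the key fix: without the Oracle JDBC driver on the
# cluster, Spark raises "java.sql.SQLException: No suitable driver".
jdbc_options = {
    "url": "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1",  # hypothetical
    "dbtable": "SCOTT.EMP",                       # hypothetical table
    "user": "scott",                              # placeholder credentials
    "password": "tiger",
    "driver": "oracle.jdbc.driver.OracleDriver",  # class from ojdbc8.jar
}

# With ojdbc8.jar installed on the cluster (Libraries > Install new > Maven:
# com.oracle.database.jdbc:ojdbc8), the read itself would be:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
assert jdbc_options["driver"] == "oracle.jdbc.driver.OracleDriver"
```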
I have been trying to extract a date field from Cosmos which looks like this: "lastModifiedDate" : { "$date" : 1668443121840 }. When the above field is extracted using Databricks, it gets converted into a date format which looks like this...
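The "$date" value above is an epoch timestamp in milliseconds, so dividing by 1000 before converting (and pinning the zone to UTC) recovers the full timestamp rather than a bare date. A small Python sketch; in Spark SQL the equivalent would be timestamp_millis(...):

```python
from datetime import datetime, timezone

# Convert a Mongo/Cosmos "$date" (epoch milliseconds) to a UTC datetime.
def mongo_date_to_datetime(ms):
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

dt = mongo_date_to_datetime(1668443121840)
assert (dt.year, dt.month, dt.day) == (2022, 11, 14)
assert dt.microsecond == 840000  # the millisecond part survives
```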
Hi, I manually created a template dashboard with one widget. I wanted to clone this dashboard using the Create Dashboard API, but I don't know what to put for the widget object. What I did was use the Retrieve API on the template dashboard, and th...
@Akash Bhat, how do I clone a dashboard across workspaces? Being able to deploy dashboards (with widgets!) through the API is essential for setting up proper data engineering workflows.
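A hedged sketch of one plausible clone flow: retrieve the template dashboard, strip the server-assigned fields from each widget, and re-POST the cleaned payload. The field names below reflect an assumed shape of the retrieved widget object, not a documented contract:

```python
# Hedged sketch: prepare a retrieved widget for re-creation by dropping
# fields the server assigns itself. Field names here are assumptions.
def clean_widget(widget):
    """Drop read-only fields so a retrieved widget can be re-created."""
    read_only = {"id", "created_at", "updated_at", "dashboard_id"}
    return {k: v for k, v in widget.items() if k not in read_only}

retrieved = {
    "id": 17, "dashboard_id": 3, "created_at": "2022-11-01",  # server-assigned
    "width": 1,
    "options": {"position": {"col": 0, "row": 0}},
    "visualization_id": 99,                                    # hypothetical
}
payload = clean_widget(retrieved)
assert "id" not in payload and "dashboard_id" not in payload
assert payload["visualization_id"] == 99
```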
Hi all, I have a question which is likely very fundamental. I am passing data from Hive to Snowflake using PySpark. My query is very simple: "select from table limit 100". The table I am querying is very large, but this query can be shown to the co...
Hello all, I'm currently trying to move the tables contained in one Azure workspace to another, because of a change in the way we use our resource groups. I have not been able to move more than metadata with the databrickslabs/migrate repo. I was won...
Hi @Quentin Maire, we haven't heard from you since the last response from @Pat Sienkiewicz, and I was checking back to see if their suggestions helped you. If you have found a solution, please do share it with the community, as it can be helpfu...
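One technique for copying table data (not just metadata) across storage accounts is a Delta DEEP CLONE into a path the new workspace can reach; this is a different approach from the migrate repo, and the names and abfss path below are hypothetical:

```python
# Hedged sketch: DEEP CLONE copies the Delta table's data files to the
# target LOCATION, unlike a metadata-only migration. Names are hypothetical.
clone_sql = """
CREATE OR REPLACE TABLE new_db.sales
DEEP CLONE old_db.sales
LOCATION 'abfss://data@newstorageaccount.dfs.core.windows.net/delta/sales'
"""
# In the source workspace this would be run as spark.sql(clone_sql); the new
# workspace can then register a table over the cloned LOCATION.
assert "DEEP CLONE" in clone_sql
```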
Hey guys, I'm using Petastorm to train a DNN. First I convert the Spark df with make_spark_converter and then open a reader on the materialized dataset. While I run the training session on only a subset of the data everything works fine, but when I'm using all...
Same error. This started a few days ago on notebooks that used to run fine in the past. Now I cannot finish a notebook. I have already disabled almost all output being streamed to the result buffer, but the problem persists. I am left with <50 lines ...