Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Michael42
by New Contributor III
  • 2377 Views
  • 2 replies
  • 1 kudos

I'd like to start a discussion about techniques for joining two relatively large tables of roughly equal size on a daily basis. I realize this may be a bit of a conundrum with Databricks, but please review the details.

Input data: one daily batch load of roughly 10 million transaction rows, and a second daily batch load of roughly the same size. Each row in one dataset should have a corresponding row in the other dataset. Problem to solve: The problem i...
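To make the matching problem concrete, here is a toy-scale sketch of the reconciliation logic in plain Python (the `txn_id` key and the amounts are hypothetical, not from the post). At 10M rows/day in Spark you would express the same idea as a full outer join on the business key, e.g. `batch_a.join(batch_b, "txn_id", "full_outer")`, and filter for rows where either side is null.

```python
# Hypothetical schema: each daily batch is keyed by a "txn_id" with an amount.
batch_a = {"t1": 100, "t2": 200, "t3": 300}
batch_b = {"t1": 100, "t3": 301, "t4": 400}

matched    = {k for k in batch_a if k in batch_b}          # present in both batches
only_in_a  = set(batch_a) - set(batch_b)                   # missing counterpart in B
only_in_b  = set(batch_b) - set(batch_a)                   # missing counterpart in A
mismatched = {k for k in matched if batch_a[k] != batch_b[k]}  # keys match, values differ

print(sorted(matched), sorted(only_in_a), sorted(only_in_b), sorted(mismatched))
# → ['t1', 't3'] ['t2'] ['t4'] ['t3']
```

Since the two inputs are about the same size, co-partitioning both on the join key (repartitioning or bucketing by `txn_id` before the join) can avoid shuffling both full datasets every day; treat that as a tuning suggestion, not a prescription.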

Latest Reply
Lennart
New Contributor II
  • 1 kudos

I've dealt with something similar in the past. There was an order system whose order items were supposed to be matched up against corresponding products in another system that acted as a master and handled invoicing. As for unique considerations...

1 More Replies
Yaswanth
by New Contributor III
  • 20191 Views
  • 2 replies
  • 12 kudos

Resolved! How can a Delta table's protocol version be downgraded, i.e. the table properties minReader from 2 to 1 and maxWriter from 5 to 3?

Is there a possibility to downgrade the Delta table protocol versions minReader from 2 to 1 and maxWriter from 5 to 3? I have set the TBLPROPERTIES to 2 and 5 and column mapping mode to rename the columns in the Delta table, but the other users are rea...

Latest Reply
youssefmrini
Databricks Employee
  • 12 kudos

Unfortunately, you can't downgrade the version; it's an irreversible operation.
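Since the protocol can't be downgraded in place, one common workaround (a sketch only, with hypothetical table names) is to copy the data into a fresh table created at the lower protocol and then swap names. Note that any feature which forced the higher protocol in the first place, such as column mapping for renamed columns, cannot carry over to the downgraded copy.

```python
# Hypothetical table names; this only assembles the SQL, the spark.sql calls
# are commented out because they need a SparkSession with Delta support.
src, dst = "mydb.events", "mydb.events_downgraded"

ctas = (
    f"CREATE TABLE {dst} "
    "TBLPROPERTIES ('delta.minReaderVersion' = '1', 'delta.minWriterVersion' = '3') "
    f"AS SELECT * FROM {src}"
)
# spark.sql(ctas)                                        # rewrite the data at the lower protocol
# spark.sql(f"ALTER TABLE {src} RENAME TO {src}_old")    # then swap the tables
print(ctas)
```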

1 More Replies
noimeta
by Contributor III
  • 11355 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks SQL dashboard refresh

We have scheduled a dashboard to automatically refresh at a specific time. However, some visualizations in the dashboard don't get refreshed at the scheduled time. Checking the query logs, we found the source queries were properly executed, but the ...

Latest Reply
noimeta
Contributor III
  • 3 kudos

Thank you for the answers. I'm using the Databricks SQL environment, not the Data Science & Engineering one. And I scheduled the dashboard following this guideline: https://docs.databricks.com/sql/user/dashboards/index.html#automatically-refresh-a-dashbo...

2 More Replies
markdias
by New Contributor II
  • 2268 Views
  • 3 replies
  • 2 kudos

Which is quicker: grouping a table that is a join of several others or querying data?

This may be a tricky question, so please bear with me. In a real-life scenario, I have a dataframe (I'm using PySpark) called age, which is a groupBy of 4 other dataframes. I join these 4 so at the end I have a few million rows, but after the groupBy th...

Latest Reply
NhatHoang
Valued Contributor II
  • 2 kudos

Hi @Marcos Dias, frankly, I think we need more detail to answer your question: Do these 4 dataframes have their data updated? How often do you use the groupBy dataframe?

2 More Replies
aarave
by New Contributor III
  • 5114 Views
  • 5 replies
  • 4 kudos

remote database connection error

Hi, I am using Databricks through Azure. I am trying to connect to a remote Oracle database using a JDBC URL. I am getting an error of no suitable driver found: "java.sql.SQLException: No suitable driver". Can somebody help me with this?
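For context, "No suitable driver" from a JDBC read usually means either the Oracle JDBC jar (e.g. ojdbc8) isn't installed on the cluster, or the driver class wasn't passed explicitly. A hedged sketch of the read options follows; the host, service, table, and credentials are all hypothetical placeholders.

```python
# Hypothetical Oracle connection details; only the dict is executed here.
jdbc_options = {
    "url": "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1",  # placeholder host/service
    "dbtable": "MYSCHEMA.MY_TABLE",                                 # placeholder table
    "user": "myuser",
    "password": "<secret>",
    # Passing the driver class explicitly avoids the driver-lookup failure:
    "driver": "oracle.jdbc.OracleDriver",
}
# df = spark.read.format("jdbc").options(**jdbc_options).load()
#   requires a SparkSession and the Oracle JDBC jar attached to the cluster
```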

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @vikas k, hope all is well! Was @Hubert Dudek's response able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

4 More Replies
Swapnil1998
by New Contributor III
  • 1753 Views
  • 0 replies
  • 2 kudos

Date Formats while extracting data from Cosmos Mongo DB using Azure Databricks.

I have been trying to extract one date field from Cosmos, which looks like this: "lastModifiedDate": { "$date": 1668443121840 }. When the above field is extracted using Databricks, it gets converted into a date format which looks like this...
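The `$date` value in Cosmos DB's Mongo API is an epoch timestamp in milliseconds. A minimal sketch of the conversion, using the value from the post (which corresponds to 2022-11-14 16:25:21.840 UTC):

```python
from datetime import datetime, timezone

# Cosmos Mongo API stores dates as epoch milliseconds ({"$date": ...}).
millis = 1668443121840
ts = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(ts.isoformat())

# In PySpark the equivalent idea (sketch, assuming the field was read as a long):
#   df.withColumn("lastModified", (col("lastModifiedDate.$date") / 1000).cast("timestamp"))
```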

bl12
by New Contributor II
  • 3984 Views
  • 2 replies
  • 1 kudos

Resolved! Use API to Clone Dashboards with Widgets in Databricks SQL?

Hi, I manually created a template dashboard with one widget. I wanted to clone this dashboard using the Create Dashboard API; however, I don't know what to put for the widget object. What I did was use the Retrieve API on the template dashboard, and th...
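One way to approach this is to retrieve the template dashboard, strip the server-generated fields (ids, timestamps) from it and its widgets, and re-post the result to the Create endpoint. The sketch below only builds the payload; the exact field names are assumptions based on the legacy Databricks SQL dashboards API, so verify them against an actual Retrieve response.

```python
import copy

def build_clone_payload(retrieved: dict, new_name: str) -> dict:
    """Turn a dashboard object from the Retrieve API into a Create API body by
    dropping server-generated fields. Field names are assumptions, not verified."""
    payload = copy.deepcopy(retrieved)
    for field in ("id", "created_at", "updated_at", "user", "user_id", "slug"):
        payload.pop(field, None)
    payload["name"] = new_name
    # Widgets carry their own server-generated ids that must be stripped too.
    for widget in payload.get("widgets") or []:
        for field in ("id", "created_at", "updated_at", "dashboard_id"):
            widget.pop(field, None)
    return payload

template = {"id": "abc123", "name": "template", "widgets": [{"id": 7, "text": "hello"}]}
clone = build_clone_payload(template, "cloned-dashboard")
# POST `clone` as JSON to the dashboards Create endpoint (requests/urllib),
# authenticated with a workspace PAT.
```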

Latest Reply
Wout
Contributor
  • 1 kudos

@Akash Bhat​ how do I clone a dashboard across workspaces? Being able to deploy dashboards (with widgets!) through the API is essential to set up proper Data Engineering workflows.

1 More Replies
csw77
by New Contributor
  • 962 Views
  • 0 replies
  • 0 kudos

Simple PySpark query very slow when pushing to Snowflake

Hi all, I have a question which is likely very fundamental. I am passing data from Hive to Snowflake using PySpark. My query is very simple: "select from table limit 100". The table I am querying is very large, but this query can be shown to the co...
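A common cause of this kind of slowness is that the LIMIT is applied only after a large scan, so one thing worth checking is that the limiting happens before the write. A hedged sketch (table and option names are hypothetical, and `sf_options` stands in for your Snowflake connection options):

```python
# Build the limited read explicitly so only ~100 rows flow to the writer.
source_table = "mydb.big_table"                         # hypothetical
limited_query = f"SELECT * FROM {source_table} LIMIT 100"

# df = spark.sql(limited_query)          # or: spark.table(source_table).limit(100)
# (df.write.format("snowflake")
#    .options(**sf_options)              # sf_options: your Snowflake connection settings
#    .option("dbtable", "TARGET_TABLE")  # hypothetical target
#    .mode("append")
#    .save())
print(limited_query)
```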

Orianh
by Valued Contributor II
  • 51459 Views
  • 25 replies
  • 37 kudos

Fatal error: Python kernel is unresponsive

Hey guys, I'm using petastorm to train a DNN. First I convert a Spark df with make_spark_converter and then open a reader on the materialized dataset. While I start a training session on only a subset of the data everything works fine, but when I'm using all...

Latest Reply
Anonymous
Not applicable
  • 37 kudos

Same error. This started a few days ago on notebooks that used to run fine in the past. Now, I cannot finish a notebook. I have already disabled almost all output being streamed to the result buffer, but the problem persists. I am left with <50 lines ...

24 More Replies
pawelmitrus
by Contributor
  • 2838 Views
  • 3 replies
  • 3 kudos

log4j vulnerability - action plan for clients

I'm looking for some information regarding the log4j vulnerability: whether any Databricks runtime should be changed manually by the client, or when a specific update will be applied. I know I can go through the docs by myself, finding out which log4j library is...

Latest Reply
Mr_Srinivasa
New Contributor II
  • 3 kudos

Thanks for sharing such important facts. I got the best security service provider website on the internet. They are excellence in the field of security. 

2 More Replies
Cosimo_F_
by Contributor
  • 12755 Views
  • 7 replies
  • 9 kudos

Incorrectly truncated long numbers in DS&E workspaces notebooks

Hello, I am getting an inconsistent representation of long types. 1661817599972 is the unix timestamp in milliseconds for Monday, August 29, 2022 11:59:59.972 PM GMT. When I execute `select 1661817599972 as t`, the result is 166181759997 (last digit truncated...
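Worth noting: this value fits comfortably in a 64-bit integer and even inside the 2**53 window where IEEE-754 doubles are exact, so the truncation has to be happening in the display layer rather than in the engine. A small sketch verifying that, plus the usual display-side workaround:

```python
t = 1661817599972

# Below 2**53, a double represents the integer exactly, so the engine and the
# wire format are not losing the last digit; the notebook rendering is.
assert t < 2**53
assert int(float(t)) == t

# A common display-side workaround in SQL (sketch):
#   SELECT CAST(1661817599972 AS STRING) AS t
print(str(t))  # → '1661817599972'
```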

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hi @Cosimo Felline, hope all is well! Does @Hubert Dudek's response answer your question? If it resolved your issue, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd...

6 More Replies
