Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kkumar
by New Contributor III
  • 1600 Views
  • 2 replies
  • 2 kudos

ADLS Gen 2 Delta Tables memory allocation

If I mount one ADLS Gen2 storage account (ADLS1) to another Gen2 account (ADLS2) and create a Delta table on ADLS2, will it copy the data or just create something like an external table? I don't want to duplicate the data.

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @keerthi kumar​, so basically you can CREATE EXTERNAL TABLES on top of the data stored somewhere - in your case ADLS. Data won't be copied; it will stay where it is. By creating external tables you are actually storing the metadata in your metasto...
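The pattern described in the reply - registering metadata over data that stays in ADLS - can be sketched as follows. The table name and storage path are hypothetical examples, not from the original thread; on Databricks the DDL would be executed with `spark.sql(ddl)`.

```python
# Sketch of registering an external (unmanaged) Delta table over existing
# ADLS data. No data is copied: the statement only records metadata.
# Table name and abfss:// path below are hypothetical placeholders.

def external_table_ddl(table: str, path: str) -> str:
    """Build DDL that registers metadata only -- the data stays in place."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"USING DELTA "
        f"LOCATION '{path}'"
    )

ddl = external_table_ddl(
    "analytics.events",  # hypothetical table name
    "abfss://container@account.dfs.core.windows.net/delta/events",  # hypothetical path
)
print(ddl)
# On Databricks: spark.sql(ddl)
```

Dropping such a table later removes only the metastore entry; the files in ADLS remain untouched.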

1 More Replies
Michael42
by New Contributor III
  • 1549 Views
  • 2 replies
  • 1 kudos

Would like to start a discussion regarding techniques for joining two relatively large tables of roughly equal size on a daily basis. I realize this may be a bit of a conundrum with Databricks, but please review the details.

Input data: One batch load of a daily dataset, roughly 10 million items a day of transactions. Another daily batch load of roughly the same size. Each row in one dataset should have a corresponding row in the other dataset. Problem to solve: The problem i...
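The core of the matching problem above is a key-based match between two batches, separating rows with a counterpart from rows without one. A minimal pure-Python sketch (in Spark this would typically be a full outer join on the key); the field names `txn_id` and `amt` are illustrative, not from the post:

```python
# Sketch: match two daily batches on a shared key and separate matched
# pairs from one-sided leftovers. Field names are illustrative only.

def match_batches(left, right, key="txn_id"):
    """Return (matched pairs, left-only rows, right-only rows)."""
    right_by_key = {row[key]: row for row in right}  # hash index on one side
    matched, left_only = [], []
    for row in left:
        other = right_by_key.pop(row[key], None)     # probe and consume
        if other is None:
            left_only.append(row)
        else:
            matched.append((row, other))
    right_only = list(right_by_key.values())         # keys never probed
    return matched, left_only, right_only

orders = [{"txn_id": 1, "amt": 10}, {"txn_id": 2, "amt": 20}]
invoices = [{"txn_id": 2, "amt": 20}, {"txn_id": 3, "amt": 30}]
matched, left_only, right_only = match_batches(orders, invoices)
```

At 10 million rows a side this fits the hash-join shape Spark would pick anyway; the interesting tuning questions are partitioning both loads on the join key and deciding what to do with the unmatched leftovers each day.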

Latest Reply
Lennart
New Contributor II
  • 1 kudos

I've dealt with something similar in the past. There was an order system with order items that were supposed to be matched up against corresponding products in another system, which acted as a master and handled invoicing. As for unique considerations...

1 More Replies
Yaswanth
by New Contributor III
  • 18243 Views
  • 2 replies
  • 12 kudos

Resolved! How can a Delta table's protocol version be downgraded from a higher version to a lower one, e.g. the table properties minReaderVersion from 2 to 1 and minWriterVersion from 5 to 3?

Is there a possibility to downgrade the Delta table protocol versions, minReaderVersion from 2 to 1 and minWriterVersion from 5 to 3? I have set the table properties to 2 and 5 and enabled column mapping mode to rename the columns in the Delta table, but the other users are rea...

Latest Reply
youssefmrini
Databricks Employee
  • 12 kudos

Unfortunately, you can't downgrade the version; it's an irreversible operation.
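Since the protocol upgrade is one-way, the usual workaround (my assumption, not stated in the reply) is to rewrite the data into a fresh table created at the lower protocol. A sketch of the SQL involved; the table names are hypothetical, and on Databricks each statement would be run via `spark.sql(...)`. Note that if the table was upgraded to get column mapping, the rewrite simply materializes the current column names.

```python
# Sketch: "downgrading" by rewriting into a new table, since the protocol
# itself cannot be lowered in place. Table names are hypothetical; on
# Databricks you would execute each statement with spark.sql(...).

statements = [
    # New table pinned to the lower protocol before any data lands in it.
    """CREATE TABLE catalog.schema.events_v1
       TBLPROPERTIES (
         'delta.minReaderVersion' = '1',
         'delta.minWriterVersion' = '3'
       )
       AS SELECT * FROM catalog.schema.events""",
    # Swap names once the copy is verified.
    "ALTER TABLE catalog.schema.events RENAME TO catalog.schema.events_old",
    "ALTER TABLE catalog.schema.events_v1 RENAME TO catalog.schema.events",
]
```

The old table can be dropped once downstream readers confirm they can read the new one.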

1 More Replies
noimeta
by Contributor III
  • 8097 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks SQL dashboard refresh

We have scheduled a dashboard to automatically refresh at some specific time.However, some visualizations in the dashboard don't get refreshed at the scheduled time.Checking the query logs, we found the source queries were properly executed, but the ...

Latest Reply
noimeta
Contributor III
  • 3 kudos

Thank you for the answers. I'm using the Databricks SQL environment, not the Data Science & Engineering one, and I scheduled the dashboard following this guideline: https://docs.databricks.com/sql/user/dashboards/index.html#automatically-refresh-a-dashbo...

2 More Replies
markdias
by New Contributor II
  • 1596 Views
  • 3 replies
  • 2 kudos

Which is quicker: grouping a table that is a join of several others or querying data?

This may be a tricky question, so please bear with me. In a real-life scenario, I have a dataframe (I'm using PySpark) called age, which is a groupBy of 4 other dataframes. I join these 4, so at the end I have a few million rows, but after the groupBy th...

Latest Reply
NhatHoang
Valued Contributor II
  • 2 kudos

Hi @Marcos Dias​, frankly, I think we need more detail to answer your question: Do these 4 dataframes have their data updated? How often do you use the groupBy dataframe?

2 More Replies
aarave
by New Contributor III
  • 3745 Views
  • 5 replies
  • 4 kudos

remote database connection error

Hi, I am using Databricks through Azure. I am trying to connect to a remote Oracle database using a JDBC URL, and I am getting an error of no suitable driver found: "java.sql.SQLException: No suitable driver". Can somebody help me with this?
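"No suitable driver" typically means the Oracle JDBC driver jar (e.g. ojdbc8) is not installed on the cluster, or the driver class was never named so DriverManager can't resolve the URL. A hedged sketch of the read options; the host, port, service name, and credentials below are placeholders:

```python
# Sketch: reading from Oracle over JDBC. The driver jar (e.g. ojdbc8) must
# be installed as a cluster library first; connection details below are
# placeholders. On Databricks you would then run:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()

jdbc_options = {
    "url": "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1",  # placeholder
    "dbtable": "schema.my_table",                                   # placeholder
    "user": "scott",                                                # placeholder
    "password": "****",
    # Naming the class explicitly avoids "No suitable driver" when the jar
    # is present but the driver was not auto-registered.
    "driver": "oracle.jdbc.driver.OracleDriver",
}
```

If the error persists with the driver option set, the jar is usually missing from the cluster rather than the URL being wrong.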

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @vikas k​, hope all is well! Was @Hubert Dudek​'s response able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

4 More Replies
Swapnil1998
by New Contributor III
  • 1261 Views
  • 0 replies
  • 2 kudos

Date Formats while extracting data from Cosmos Mongo DB using Azure Databricks.

I have been trying to extract one date field from Cosmos which looks like this: "lastModifiedDate" : { "$date" : 1668443121840 }. When the above field is extracted using Databricks it gets converted into a date format which looks like this...
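For reference, the `{"$date": ...}` wrapper in Mongo extended JSON holds epoch *milliseconds*, so the value in the post decodes deterministically; a quick pure-Python check, independent of Databricks:

```python
from datetime import datetime, timezone

# Mongo's extended-JSON {"$date": N} is epoch *milliseconds*; divide by
# 1000 before handing it to datetime, or the year comes out wildly wrong.
raw = {"lastModifiedDate": {"$date": 1668443121840}}

millis = raw["lastModifiedDate"]["$date"]
ts = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(ts.isoformat())  # 2022-11-14T16:25:21.840000+00:00
```

The same millisecond/second distinction applies when casting the field in Spark, e.g. dividing by 1000 before a `timestamp` cast.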

bl12
by New Contributor II
  • 3065 Views
  • 2 replies
  • 1 kudos

Resolved! Use API to Clone Dashboards with Widgets in Databricks SQL?

Hi, I manually created a template dashboard with one widget. I wanted to clone this dashboard using the Create Dashboard API; however, I don't know what to put for the widget object. What I did was use the Retrieve API on the template dashboard, and th...
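One way to approach this is to create the clone dashboard first and then re-create each retrieved widget against it. The sketch below shows only the payload shaping; the field names reflect the legacy `/api/2.0/preview/sql` dashboards/widgets API as I understand it and should be verified against the current API reference, and all IDs here are hypothetical:

```python
# Sketch: building a create-widget body from a widget retrieved off a
# template dashboard. Field names follow the legacy preview SQL API as I
# understand it (verify against the docs); IDs below are hypothetical.
# The body would then be POSTed to the widget-creation endpoint with the
# workspace URL and a PAT.

def widget_payload(dashboard_id, source_widget):
    """Re-target a retrieved widget at a freshly created clone dashboard."""
    return {
        "dashboard_id": dashboard_id,  # the clone, not the template
        "visualization_id": source_widget["visualization"]["id"],
        "options": source_widget["options"],  # position/size live in here
        "width": source_widget.get("width", 1),
    }

retrieved = {  # shape of one widget as returned by the retrieve call
    "visualization": {"id": 42},
    "options": {"position": {"col": 0, "row": 0, "sizeX": 3, "sizeY": 8}},
    "width": 1,
}
body = widget_payload("new-dashboard-id", retrieved)
```

The key point is that the retrieved widget carries a `visualization` object whose `id` is what the create call wants, not the whole nested object.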

Latest Reply
Wout
Contributor
  • 1 kudos

@Akash Bhat​ how do I clone a dashboard across workspaces? Being able to deploy dashboards (with widgets!) through the API is essential for setting up proper data engineering workflows.

1 More Replies
csw77
by New Contributor
  • 665 Views
  • 0 replies
  • 0 kudos

Simple PySpark query very slow in pushing to snowflake

Hi all, I have a question which is likely very fundamental. I am passing data from Hive to Snowflake using PySpark. My query is very simple: "select * from table limit 100". The table I am querying is very large, but this query can be shown to the co...
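A common cause of this kind of slowness is that the limit is applied after the full source table has been materialized. Applying the limit on the Spark side before the write keeps the transfer to 100 rows. A hedged sketch; the option keys match the Spark-Snowflake connector, while the connection values and table names are placeholders:

```python
# Sketch: limit rows *before* the write so the Hive table is not fully
# scanned and shipped. Connector option keys are the Spark-Snowflake
# connector's; connection values and table names are placeholders.
# On Databricks:
#
#   df = spark.sql("SELECT * FROM hive_db.big_table LIMIT 100")  # limit first
#   (df.write
#      .format("snowflake")
#      .options(**sf_options)
#      .option("dbtable", "TARGET_TABLE")
#      .mode("overwrite")
#      .save())

sf_options = {
    "sfURL": "account.snowflakecomputing.com",  # placeholder
    "sfDatabase": "ANALYTICS",                  # placeholder
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",                # placeholder
    "sfUser": "loader",                         # placeholder
    "sfPassword": "****",
}
```

Checking the Spark UI for a full table scan stage would confirm whether the limit is actually being pushed down.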

Orianh
by Valued Contributor II
  • 38788 Views
  • 25 replies
  • 35 kudos

Fatal error: Python kernel is unresponsive

Hey guys, I'm using Petastorm to train a DNN. First I convert a Spark df with make_spark_converter and then open a reader on the materialized dataset. While I start a training session on only a subset of the data everything works fine, but when I'm using all...

Latest Reply
Anonymous
Not applicable
  • 35 kudos

Same error. This started a few days ago on notebooks that used to run fine in the past. Now I cannot finish a notebook. I have already disabled almost all output being streamed to the result buffer, but the problem persists. I am left with <50 lines ...

24 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group
Labels