Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

gbalboa
by New Contributor
  • 16044 Views
  • 1 replies
  • 6 kudos

Resolved! How do temp views actually work?

So I'm querying data from parquet files that have a couple of billion records (table 1, or t1), and then I have to filter and join them with other parquet files holding another couple of billion records (t2). This takes quite a long time to run (like 10h...

Latest Reply
PeteStern
Databricks Employee
  • 6 kudos

Your intuition about views is correct. Views are not materialized, so they are basically just a saved query. Every time you access a view, it has to be recomputed. This is certainly not ideal if it takes a long time (like 10hrs) to materialize a ...

  • 6 kudos
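
The point in the reply above (a view is a saved query that is recomputed on every access, while a written-out result is stored once) can be sketched in a few lines. This is only an illustration under loud assumptions: it uses Python's built-in SQLite rather than Spark, and the table and view names (`t1`, `big_rows_view`, `big_rows_saved`) are made up for the example. Spark temp views are not implemented this way internally, but the saved-query behaviour is analogous.

```python
import sqlite3

# Assumption: SQLite stands in for Spark SQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 10), (2, 200), (3, 300)])

# A view is only a saved query: nothing is computed or stored at this point.
conn.execute("CREATE TEMP VIEW big_rows_view AS SELECT * FROM t1 WHERE amount > 100")

# A table created from the same query stores the result once (materialized).
conn.execute("CREATE TABLE big_rows_saved AS SELECT * FROM t1 WHERE amount > 100")

# Change the source after both objects exist.
conn.execute("INSERT INTO t1 VALUES (4, 400)")


def count_rows(name):
    """Count the rows returned by a table or view."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


view_count = count_rows("big_rows_view")    # the filter runs again against t1
saved_count = count_rows("big_rows_saved")  # reads the stored snapshot

print(view_count, saved_count)
```

The view sees the row inserted afterwards because its query runs again on each access, while the stored table does not. For a 10-hour query, that recomputation is the cost the reply is warning about.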
TheDataDexter
by New Contributor III
  • 4586 Views
  • 3 replies
  • 4 kudos

Resolved! VNET injected Databricks cluster not able to mount - 403 error

I'm mounting a Storage Account to a Databricks cluster in Azure. All the resources are included in a VNET, and a private and a public subnet have been associated with the Databricks resource. Below I've attached the guide we use for mounting the ADLS G2 to...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Derrick Bakhuis, hope you are well. Just wanted to see if you were able to find an answer to your question, and whether you would like to mark an answer as best. It would be really helpful for the other members too. Cheers!

  • 4 kudos
2 More Replies
Gangadhar
by New Contributor
  • 1339 Views
  • 0 replies
  • 0 kudos

GCP Databricks Cluster Start issue - Free trial Account

I have a GCP trial account and started a 14-day Databricks free trial from GCP. I created a workspace and am trying to create and start a cluster, but it stays in a pending state. As an initial step, I tried to increase the quotas mentioned on the page but ...

Lonnie
by New Contributor
  • 1993 Views
  • 0 replies
  • 0 kudos

Recommended Redshift-2-Delta Migration Path

Hello All! My team is previewing Databricks and is contemplating the steps to take to perform one-time migrations of datasets from Redshift to Delta. Based on our understanding of the tool, here are our initial thoughts: Export data from Redshift-2-S...

tom_shaffner
by New Contributor III
  • 9336 Views
  • 1 replies
  • 2 kudos

"Detected a data update", what changed?

In streaming flows I periodically get a "Detected a data update" error. This error generally seems to indicate that something has changed in the source table schema, but it's not immediately apparent what. In one case yesterday I pulled the source tab...

Latest Reply
tom_shaffner
New Contributor III
  • 2 kudos

@Kaniz Fatma, thanks, that helps. I was assuming this warning indicated a schema evolution; based on what you say, it likely didn't, and I just have to turn on ignoreChanges any time I have a stream from a table that receives updates/upserts. To b...

  • 2 kudos
sraj43
by New Contributor II
  • 1179 Views
  • 1 replies
  • 2 kudos

Unable to create account in Databricks Community Edition

Unable to log in to the Community Edition because the verification email is not delivered.

Latest Reply
himi1303
New Contributor II
  • 2 kudos

Hi sraj, I'm facing the same issue. Has yours been resolved? Please guide me as well; that would be a great help.

  • 2 kudos
User16826994223
by Honored Contributor III
  • 4539 Views
  • 2 replies
  • 2 kudos

Multi task - restart of the failed jobs

Hi Team, I am using Multitask and trying to restart only the failed task, but it seems I have to restart the complete workflow again and again. Is there any way or workaround?

Latest Reply
TheOptimizer
Contributor
  • 2 kudos

One way that works is to go to your task definition, click advanced options, and set retry policy. The task will restart per those instructions. Does that work for you?

  • 2 kudos
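
For anyone who defines jobs as JSON instead of clicking through the UI, the retry policy described above could look roughly like the sketch below. It assumes the task-level fields `max_retries`, `min_retry_interval_millis`, and `retry_on_timeout` from the Jobs API task settings. The task key, the notebook path, and the helper `with_retry_policy` are hypothetical placeholders, so check the field names against the Jobs API reference for your workspace before relying on them.

```python
import json


def with_retry_policy(task, max_retries=2, min_retry_interval_millis=60_000):
    """Return a copy of a task definition with retry settings added."""
    updated = dict(task)  # shallow copy so the original definition is untouched
    updated["max_retries"] = max_retries
    updated["min_retry_interval_millis"] = min_retry_interval_millis
    updated["retry_on_timeout"] = False
    return updated


# Hypothetical task: the key and notebook path are placeholders.
task = {
    "task_key": "load_orders",
    "notebook_task": {"notebook_path": "/Jobs/load_orders"},
}

task_with_retries = with_retry_policy(task)
payload = json.dumps(task_with_retries, indent=2)
print(payload)
```

Note that this only retries a task automatically when it fails; it is a workaround for the question asked, not a way to manually re-run a single failed task.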
1 More Replies
rbiddle
by New Contributor
  • 8729 Views
  • 3 replies
  • 0 kudos

Specifying a Managed Resource Group name

Azure Databricks provisions a Managed Resource Group when you create your Workspace. Is there a way to specify the name of the Managed Resource Group and its resources during creation? The defaults created by the Workspace violate my company's standar...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Robert Biddle, hope you are well. Just wanted to see if you were able to find an answer to your question, and whether you would like to mark an answer as best. It would be really helpful for the other members too. Cheers!

  • 0 kudos
2 More Replies
Confused
by New Contributor III
  • 4936 Views
  • 3 replies
  • 3 kudos

Resolved! Dealing with updates to a delta table being used as a streaming source

Hi All, I have a requirement to perform updates on a delta table that is the source for a streaming query. I would like to be able to update the table and have the stream continue to work while also not ending up with duplicates. From my research it se...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey @Mathew Walters, hope you are doing great. Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

  • 3 kudos
2 More Replies
JananiMohan
by New Contributor
  • 6281 Views
  • 4 replies
  • 0 kudos

Resolved! ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

After the new release of numpy 1.22.0 on Dec 31st, Databricks failed with this error for my existing Databricks Notebook Version 10.1 and numpy 1.20.0. Qn: Why did the earlier releases after 1.20.0, up until 1.22.0, not raise the same exception?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Janani Mohan, hope you are doing well. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

  • 0 kudos
3 More Replies
KC_1205
by New Contributor III
  • 4522 Views
  • 9 replies
  • 1 kudos

Resolved! Migrating DB from 7.3 LTS to 9.1 LTS

Hi All, I have code in dev and production using DB 7.3 LTS. Now I would like to update the environment to 9.1 LTS, as support is going to end. I have gone through the documentation given in the following link. https://docs.databricks.com/rele...

Latest Reply
gmondauto
New Contributor II
  • 1 kudos

@Kiran Chalasani Hey, were you ever able to run the 7.3 runtime with multi_gpus before you migrated to 9.1?

  • 1 kudos
8 More Replies
Mr__E
by Contributor II
  • 1935 Views
  • 1 replies
  • 1 kudos

Databricks dashboard removing order, incorrectly sorted.

I created a table that aggregates data by year and week of year and display this in a chart over time. As part of the query, I order by the year, then week columns. In the visualization on the query (in the SQL editor), I disabled the sort, because i...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Mr. E (Customer), did you get a chance to check Kaniz's previous comments? Is this issue resolved, or do you need any further help here?

  • 1 kudos
pantelis_mare
by Contributor III
  • 5257 Views
  • 7 replies
  • 10 kudos

Resolved! Delta Upsert performance on empty table

Hello all, I was just wondering how, performance-wise, a plain write operation compares with a merge operation on an EMPTY delta table. Do we really risk a significant performance drop? The use case would be to have the same pipeline for ini...

Latest Reply
pantelis_mare
Contributor III
  • 10 kudos

Hello @Kaniz Fatma, unfortunately I did not do any further investigation on the subject. Given that the merge on an empty table will only be done once, at the creation of a table, it wouldn't really matter, to be honest.

  • 10 kudos
6 More Replies
720677
by New Contributor III
  • 1894 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks Clusters on GCP stop working "Environment directory not found" issue - waitForEnvironmentFileSystem

Since yesterday (17/5/2022) I have been getting errors while running notebooks or jobs on Databricks clusters on GCP. The error is: SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python The job/noteb...

Latest Reply
720677
New Contributor III
  • 2 kudos

Databricks support detected an issue with the NFS mounts on GCP. It looks like DBR 10.X versions were affected. After several hours they fixed it, and now the same clusters are back to normal.

  • 2 kudos
JohanRex
by New Contributor II
  • 5691 Views
  • 3 replies
  • 5 kudos

Resolved! IllegalArgumentException: requirement failed: Result for RPC Some(e100cace-3836-4461-8902-80b3744fcb6b) lost, please retry your request.

I'm using Databricks Connect to talk to a cluster on Azure. When doing a count on a dataframe I sometimes get this error message. Once I've gotten it, I don't seem to be able to get rid of it, even if I restart my dev environment. ----------------...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Johan Rex, we checked with the Databricks Connect team. This issue can happen when the library is too large to upload. Databricks recommends that you use dbx by Databricks Labs for local development instead of Databricks Connect. Databricks plans no ...

  • 5 kudos
2 More Replies
