Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by ksenija (Contributor)
  • 1153 Views
  • 3 replies
  • 1 kudos

Resolved! DLT pipeline - silver table, joining streaming data

Hello! I'm trying to do my modeling in DLT pipelines. For bronze, I created 3 streaming views. When I try to join them to create a silver table, I get an error that I can't join a stream with a stream without watermarks. I tried adding them, but then I got no...

Latest Reply
Ravivarma
Databricks Employee
  • 1 kudos

Hello @ksenija, greetings! Streaming uses watermarks to control the threshold for how long to continue processing updates for a given state entity. Common examples of state entities include: aggregations over a time window; unique keys in a join b...
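The reply's point, that a watermark bounds how long join state is kept, can be illustrated with a toy simulation. This is plain Python, not Spark or DLT code. It assumes the watermark is the maximum event time seen so far minus a fixed allowed delay, in the spirit of Structured Streaming's `withWatermark`; it keeps only one buffered row per key on each side and omits the event-time range condition real stream-stream joins usually carry. `WatermarkedJoin` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class WatermarkedJoin:
    """Toy inner join of two streams on a key, with watermark-based state eviction.

    Hypothetical simplification: one buffered row per key on each side,
    and no event-time range condition in the join.
    """

    delay: int  # allowed lateness, in the same units as event times
    max_event_time: int = 0
    left_state: dict = field(default_factory=dict)  # key -> buffered event time
    right_state: dict = field(default_factory=dict)

    @property
    def watermark(self) -> int:
        # Watermark = latest event time seen so far minus the allowed delay.
        return self.max_event_time - self.delay

    def _evict(self) -> None:
        # Buffered rows older than the watermark can no longer match; drop them.
        wm = self.watermark
        for state in (self.left_state, self.right_state):
            for key in [k for k, t in state.items() if t < wm]:
                del state[key]

    def process(self, side: str, key: str, event_time: int) -> list:
        """Feed one row from 'left' or 'right'; return (key, left_time, right_time) matches."""
        if event_time < self.watermark:
            return []  # arrived too late: dropped without joining
        self.max_event_time = max(self.max_event_time, event_time)
        if side == "left":
            own, other = self.left_state, self.right_state
        else:
            own, other = self.right_state, self.left_state
        matches = []
        if key in other:
            pair = (event_time, other[key]) if side == "left" else (other[key], event_time)
            matches.append((key, *pair))
        own[key] = event_time
        self._evict()
        return matches
```

Once the watermark passes a buffered row's event time, that row is evicted, and a late row for the same key produces no match. Without a watermark, nothing would ever be evicted and the join state would grow without bound.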

by ShankarM (New Contributor III)
  • 496 Views
  • 1 reply
  • 1 kudos

Resolved! Serverless feature audit in data engineering

As recently announced at the Summit, notebooks, jobs, and workflows will run in serverless mode. How do we track and debug compute cluster metrics in this case, especially when there are performance issues while running jobs or workflows?

Latest Reply
imsabarinath
New Contributor III
  • 1 kudos

Databricks is planning to enable some system tables that capture some of these metrics. In my view, those can be leveraged as a starting point for troubleshooting.

by vkumar (New Contributor)
  • 509 Views
  • 0 replies
  • 0 kudos

Receiving null values from Event Hub streaming

Hi, I am new to PySpark and am facing an issue while consuming data from Azure Event Hubs. I am unable to deserialize the consumed data: I see only null values after deserializing it using the schema. Please find below the schema, Event Hub message, ...

by Oliver_Angelil (Valued Contributor II)
  • 9315 Views
  • 9 replies
  • 6 kudos

Resolved! Confusion about data storage: Data Asset within Databricks vs Hive Metastore vs Delta Lake vs Lakehouse vs DBFS vs Unity Catalog vs Azure Blob

Hi there, it seems there are many different ways to store/manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to relevant pages): in a Lakehouse, in Delta Lake, on Azure Blob storage, in the D...

Latest Reply
Rahul_S
New Contributor II
  • 6 kudos

Informative.

by jwilliam (Contributor)
  • 3781 Views
  • 3 replies
  • 6 kudos

Resolved! Is Unity Catalog available in Azure Gov Cloud?

We are using Databricks on the Premium tier in Azure Gov Cloud. We checked the Data section but don't see any option to create a metastore.

Latest Reply
User16672493709
Databricks Employee
  • 6 kudos

Azure.gov does not have Unity Catalog (as of July 2024). I think previous responses missed the context of government cloud in OP's question. UC has been open sourced since this question was asked, and is a more comprehensive solution in commercial cl...

by bricksdata (New Contributor)
  • 9600 Views
  • 4 replies
  • 0 kudos

Unable to authenticate against https://accounts.cloud.databricks.com as an account admin.

Problem: I'm unable to authenticate against the https://accounts.cloud.databricks.com endpoint even though I'm an account admin. I need it to assign account-level groups to workspaces via the workspace assignment API (https://api-docs.databricks.com/re...

Latest Reply
137292
New Contributor II
  • 0 kudos

From this doc: To automate Databricks account-level functionality, you cannot use Databricks personal access tokens. Instead, you must use either OAuth tokens for Databricks account admin users or service principals. For more information, see: Use a s...
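Following the quoted documentation, a service principal can obtain an account-level OAuth token through the client-credentials flow instead of using a personal access token. The sketch below only builds the token request with the Python standard library and does not send it. It assumes the AWS account endpoint `https://accounts.cloud.databricks.com/oidc/accounts/<account-id>/v1/token` with `grant_type=client_credentials` and `scope=all-apis`, as described in Databricks' OAuth machine-to-machine documentation; `build_token_request` is a hypothetical helper, and the account ID, client ID, and secret are placeholders.

```python
import base64
import urllib.parse
import urllib.request

# Assumed AWS account console host; Azure and GCP use different hosts.
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request.

    Hypothetical helper; assumes the account-level endpoint
    /oidc/accounts/<account-id>/v1/token.
    """
    url = f"{ACCOUNTS_HOST}/oidc/accounts/{account_id}/v1/token"
    # Form-encoded body for the client-credentials grant.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # HTTP Basic auth with the service principal's OAuth client ID and secret.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

To actually use it, you would pass the request to `urllib.request.urlopen`, read `access_token` from the JSON response, and send it as `Authorization: Bearer <token>` to account-level APIs such as the workspace assignment API. Check the current Databricks docs for your cloud before relying on the endpoint path.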

by thiagoawstest (Contributor)
  • 580 Views
  • 0 replies
  • 0 kudos

Change the network/VPC of a workspace

Hello, I have two workspaces, each pointing to a VPC in AWS. In one of the accounts we needed to remove a subnet. After removing it, we get the AWS error InvalidSubnetID.NotFound when starting the cluster. I checked in the Manager Account, and the network is poin...

by Avinash_Narala (Contributor III)
  • 445 Views
  • 0 replies
  • 0 kudos

Tracking Serverless cluster cost

Hi, I just explored the serverless feature in Databricks and am wondering how I can track the cost associated with it. Is it stored in system tables? If yes, where can I find it? Also, how can I prove that its cost is relatively lower compared to classic ...

by Avinash_Narala (Contributor III)
  • 682 Views
  • 0 replies
  • 0 kudos

File trigger vs. Auto Loader

Hi, I recently came across file triggers in Databricks and find them mostly similar to Auto Loader. My first question is: why use a file trigger when we have Auto Loader? In which scenarios should I use file triggers versus Auto Loader? Can you please differentiate them?

by FennVerm_60454 (New Contributor II)
  • 4566 Views
  • 4 replies
  • 1 kudos

Resolved! How to clean up extremely large delta log checkpoints and many small files?

AWS by the way, if that matters. We have an old production table that has been running in the background for a couple of years, always with auto-optimize and auto-compaction turned off. Since then, it has written many small files (like 10,000 an hour...

Latest Reply
siddhathPanchal
Databricks Employee
  • 1 kudos

Sometimes, if a Delta table has only a few commit versions, no checkpoint files are created for it. The checkpoint file is what triggers the log cleanup activities. If you observe that there are no checkpoint files available for th...
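The reply's explanation can be modeled in a few lines. This sketch assumes Delta Lake's default of writing a checkpoint every 10 commits (the `delta.checkpointInterval` table property) and that expired log entries are cleaned up only when a checkpoint is written. It is a simplified model, not Databricks' implementation, and `checkpoint_versions` and `log_cleanup_triggered` are hypothetical names.

```python
def checkpoint_versions(latest_version: int, interval: int = 10) -> list[int]:
    """Versions at which a checkpoint would be written, assuming one every `interval` commits."""
    # In this model, version 0 (table creation) is never checkpointed.
    return [v for v in range(1, latest_version + 1) if v % interval == 0]


def log_cleanup_triggered(latest_version: int, interval: int = 10) -> bool:
    """In this model, expired-log cleanup runs only as a side effect of writing a checkpoint."""
    return bool(checkpoint_versions(latest_version, interval))
```

For a table whose latest version is 7, this model writes no checkpoint, so cleanup never runs; lowering `interval` makes checkpoints, and therefore cleanup, happen more often.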

by Kayla (Valued Contributor)
  • 1250 Views
  • 3 replies
  • 1 kudos

Resolved! Datadog Installation

Is anyone familiar with installing the Datadog agent on clusters? We're not having much luck. Honestly, the init script might not be running at all, since we're not seeing it in the log, but we can get a generic "hello world" init script to run a...

Latest Reply
Kayla
Valued Contributor
  • 1 kudos

Responding here with the solution I found. Hopefully it'll help anyone with similar issues. First, the Datadog install script is practically a matryoshka doll: the script creates another script, which creates a YAML file. One of the consequences of that...

by erigaud (Honored Contributor)
  • 1943 Views
  • 4 replies
  • 0 kudos

Pass Dataframe to child job in "Run Job" task

Hello, I have a Job A that runs a Job B. Job A defines a globalTempView, and I would like to access it in the child job somehow. Is that possible in any way? Can the same cluster be used for both jobs? If it is not possible, does someone know of a...

Latest Reply
rahuja
Contributor
  • 0 kudos

Hi @ranged_coop, yes, we are using the same job compute for different workflows. But I think different tasks are like different Docker containers, which is why it becomes an issue. It would be nice if you could explain a bit about the approach yo...

by tf32 (New Contributor II)
  • 1665 Views
  • 2 replies
  • 1 kudos

Resolved! ERROR com.databricks.common.client.DatabricksServiceHttpClientException: DEADLINE_EXCEEDED

Hi, I got this error "com.databricks.WorkflowException: com.databricks.common.client.DatabricksServiceHttpClientException: DEADLINE_EXCEEDED" at the start of a job workflow run on an interactive cluster. It's a job that has been ...

Latest Reply
tf32
New Contributor II
  • 1 kudos

Yes, subsequent runs have been successful. Thank you for the explanation.

by Avinash_Narala (Contributor III)
  • 1569 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks AI Assistant Cost Implications

I'm worried about how much the Databricks AI Assistant will cost me. I need to understand what I'll be charged for, especially when I give a prompt in the AI Assistant pane, and how it operates in the background.

Latest Reply
Avinash_Narala
Contributor III
  • 2 kudos

Is there any token limit, for example on the response or on the prompt we send?

