Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shan-databricks
by New Contributor III
  • 794 Views
  • 2 replies
  • 0 kudos

Databricks Workflow Orchestration

I have 50 tables, and the count will increase gradually, so I want to create a single workflow to orchestrate the job and run it table by table. Is there an option to do this in Databricks Workflows?

Latest Reply
Edthehead
Contributor III
  • 0 kudos

Break up these 50 tables logically or functionally and place them in their own workflows. A good strategy is to group dependent tables in the same workflow, then use a master workflow to trigger each child workflow. So it will be like a...

1 More Replies
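A minimal sketch of the parent/child pattern Edthehead describes, using the Databricks Python SDK; the child job IDs and group names below are hypothetical placeholders, and a for-each task or the Jobs UI "Run Job" task type would work just as well:

```python
# Hedged sketch: a "master" job that fans out to existing child workflows.
# Assumes databricks-sdk is installed and the child jobs already exist
# (the IDs below are made up).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Hypothetical child jobs, each loading one logical group of tables
child_job_ids = {"sales_tables": 111, "customer_tables": 222, "finance_tables": 333}

tasks = [
    jobs.Task(task_key=group, run_job_task=jobs.RunJobTask(job_id=job_id))
    for group, job_id in child_job_ids.items()
]

parent = w.jobs.create(name="orchestrate-all-tables", tasks=tasks)
print(f"Created parent job {parent.job_id}")
```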
subhas_1729
by New Contributor II
  • 914 Views
  • 1 reply
  • 0 kudos

Dashboard

Hi, I want to design a dashboard that will show some variables from the Spark UI. Is it possible to access Spark UI variables from my Spark program?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @subhas_1729, You can achieve this by leveraging Spark's monitoring and instrumentation APIs. Spark provides metrics that can be accessed through the SparkListener interface as well as the REST API. The SparkListener interface allows you to receiv...

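A rough sketch of the REST-API route Alberto_Umana mentions, run from the driver itself; whether the Spark UI endpoint is reachable this way can depend on the cluster setup, so treat it as an illustration rather than a guaranteed recipe:

```python
# Hedged sketch: pull the same metrics the Spark UI shows via Spark's monitoring REST API.
import requests

sc = spark.sparkContext
base_url = sc.uiWebUrl          # URL of the Spark UI for this application
app_id = sc.applicationId

# Stage-level metrics as JSON (same data as the "Stages" tab)
stages = requests.get(f"{base_url}/api/v1/applications/{app_id}/stages").json()
for s in stages[:5]:
    print(s["stageId"], s["status"], s["numTasks"])

# Lighter-weight alternative that needs no HTTP call
print("active jobs:", sc.statusTracker().getActiveJobsIds())
```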
dbhavesh
by New Contributor II
  • 867 Views
  • 3 replies
  • 1 kudos

How to Apply row_num in DLT

Hi all, how do we use row_number in DLT, or what is the alternative to the row_number function in DLT? We are looking for the same functionality that row_number provides. Thanks in advance.

Latest Reply
Takuya-Omi
Valued Contributor III
  • 1 kudos

@dbhavesh I apologize for the lack of explanation. The ROW_NUMBER function requires ordering over the entire dataset, making it a non-time-based window function. When applied to streaming data, it results in the "NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREA...

2 More Replies
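A hedged sketch of the alternative Takuya-Omi is pointing at: keep ROW_NUMBER for batch/materialized-view logic and avoid it on the streaming path. The table and column names below are hypothetical:

```python
# Hedged sketch: row_number() is fine in a DLT materialized view (batch recompute);
# for streaming deduplication, dropDuplicates / dropDuplicatesWithinWatermark is the
# usual substitute.
import dlt
from pyspark.sql import functions as F
from pyspark.sql.window import Window

@dlt.table(name="orders_latest")           # batch source => materialized view
def orders_latest():
    w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
    return (
        spark.read.table("orders_raw")      # note: read, not readStream
        .withColumn("rn", F.row_number().over(w))
        .filter("rn = 1")
        .drop("rn")
    )
```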
sachin_kanchan
by New Contributor III
  • 1378 Views
  • 6 replies
  • 0 kudos

Unable to log in to Community Edition

So I just registered for the Databricks Community Edition and received an email for verification. When I click the link, I'm redirected to this website (image attached), where I am asked to input my email. And when I do that, it sends me a verification c...

db_fail.png
Latest Reply
sachin_kanchan
New Contributor III
  • 0 kudos

What a disappointment this has been

5 More Replies
prasidataengine
by New Contributor II
  • 1049 Views
  • 2 replies
  • 0 kudos

Issue when connecting to a Databricks 15.4 cluster without Unity Catalog using Databricks Connect

Hi, I have a shared cluster created on Databricks which uses the 15.4 runtime. I don't want to enable Unity Catalog for this cluster. Previously I used Python 3.9.13 to connect to an 11.3 cluster using Databricks Connect 11.3. Now my company has restr...

Data Engineering
Databricks
databricks-connect
Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @prasidataengine, for DBR 13.3 LTS and above you must have Unity Catalog enabled to be able to use databricks-connect: you need a Databricks account and workspace that have Unity Catalog enabled. See Set up and manage Unity Catalog and Enable a wo...

1 More Replies
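For reference, a minimal sketch of how a Databricks Connect 13.3+ session is typically created once the Unity Catalog requirement is met; host, token and cluster ID are placeholders:

```python
# Hedged sketch: databricks-connect >= 13.3 session against a UC-enabled cluster.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://<workspace-url>",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

print(spark.range(5).count())   # quick connectivity check
```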
vidya_kothavale
by Contributor
  • 614 Views
  • 2 replies
  • 0 kudos

MongoDB Streaming Not Receiving Records in Databricks

Batch read (spark.read.format("mongodb")) works fine. Streaming read (spark.readStream.format("mongodb")) runs but receives no records. Batch read (works): df = spark.read.format("mongodb")\.option("database", database)\.option("spark.mongodb.read.conne...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @vidya_kothavale, MongoDB requires the use of change streams to enable streaming. Change streams allow applications to access real-time data changes without polling the database. Ensure that your MongoDB instance is configured to support change...

1 More Replies
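A rough sketch of a change-stream read, mirroring the option style from the post; the change-stream option name follows the MongoDB Spark Connector v10 documentation and should be verified for your connector version, and the schema, URI and target table are placeholders:

```python
# Hedged sketch: MongoDB streaming reads require change streams (replica set or
# sharded cluster) and, unlike the batch read, usually an explicit schema.
stream_df = (
    spark.readStream.format("mongodb")
    .schema(read_schema)                                   # placeholder StructType
    .option("spark.mongodb.read.connection.uri", connection_uri)
    .option("database", database)
    .option("collection", collection)
    # emit the full document for updates instead of only the change delta
    .option("change.stream.publish.full.document.only", "true")
    .load()
)

(stream_df.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/mongo_stream_checkpoint")
    .toTable("bronze.mongo_events"))                       # hypothetical target
```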
Dianagarces8
by New Contributor
  • 260 Views
  • 1 reply
  • 0 kudos

The lifetime of files in DBFS is NOT tied to the lifetime of our cluster

What happens so that the lifetime of files in DBFS is NOT tied to the lifetime of our cluster?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Files in DBFS are typically not linked to a cluster or its lifetime. There are tmp directories in DBFS, so perhaps you are looking at those, but e.g. FileStore can definitely be used. However, I suggest not using DBFS but some data lake (S3/ADLS).

AbishekP
by New Contributor
  • 497 Views
  • 1 reply
  • 0 kudos

Unable to run selected lines in Databricks

I'm using the SQL language in Databricks. Basically I'm a tester, and I'm trying to test the data load on tables by writing various queries. I'm unable to select a particular query and run it; the Ctrl+Shift+Enter shortcut is not working. Currently I need to open...

Latest Reply
Edthehead
Contributor III
  • 0 kudos

You cannot do this from the notebooks. But you can do it via the SQL editor as shown below. 

Radix95
by New Contributor II
  • 1610 Views
  • 3 replies
  • 2 kudos

Resolved! Error updating tables in DLT

I'm working on a Delta Live Tables (DLT) pipeline in Databricks Serverless mode. I receive a stream of data from Event Hubs, where each incoming record contains a unique identifier (uuid) along with some attributes (code1, code2). My goal is to update ...

Latest Reply
Edthehead
Contributor III
  • 2 kudos

All the tables that DLT writes to or updates need to be managed by DLT. The reason is that these tables are streaming tables, and hence DLT needs to manage the checkpointing. It also handles optimization for such tables. So in your scenario, you can...

2 More Replies
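A hedged sketch of what Edthehead describes: let the pipeline own the target streaming table and use apply_changes to upsert by uuid. The source view, column names and the Event Hubs ingestion details are placeholders:

```python
# Hedged sketch: DLT-managed target table updated by key (uuid).
import dlt
from pyspark.sql import functions as F

@dlt.view
def updates():
    # stream of incoming records with uuid, code1, code2, enqueued_time
    return spark.readStream.table("eventhub_bronze")   # placeholder source

dlt.create_streaming_table("dim_codes")

dlt.apply_changes(
    target="dim_codes",
    source="updates",
    keys=["uuid"],
    sequence_by=F.col("enqueued_time"),   # latest record per uuid wins
)
```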
Hariharan49
by New Contributor
  • 1809 Views
  • 4 replies
  • 1 kudos

How can I use multiple schema in DLT?

Hi, I would like to use multiple schemas as destinations in DLT, but currently I can only specify a single Unity Catalog schema. My multi-hop (medallion) tables live in different schemas.

Latest Reply
kuldeep-in
Databricks Employee
  • 1 kudos

@Hariharan49 The 'Direct Publishing Mode' Public Preview is now live in all production regions. This feature allows you to write to multiple schemas and catalogs from the same pipeline.

3 More Replies
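A minimal sketch of what this looks like once Direct Publishing Mode is enabled: each table takes a fully qualified name, so one pipeline can publish to several schemas. Catalog, schema, table and path names below are hypothetical:

```python
# Hedged sketch: one DLT pipeline writing to two different schemas
# (requires Direct Publishing Mode).
import dlt

@dlt.table(name="main.bronze.orders_raw")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders")      # placeholder path
    )

@dlt.table(name="main.silver.orders_clean")
def orders_clean():
    return spark.readStream.table("main.bronze.orders_raw").dropDuplicates(["order_id"])
```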
ADuma
by New Contributor III
  • 2623 Views
  • 0 replies
  • 0 kudos

Structured Streaming with queue in separate storage account

Hello, we are running a Structured Streaming job which consumes zipped JSON files that arrive in our Azure prod storage account. We are using Auto Loader and have set up an Event Grid queue which we pass to the streaming job using cloudFiles.queueName. ...

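The question text is truncated, so here is only a rough sketch of Auto Loader in file-notification mode where the queue lives in a different storage account from the data; the option set and connection-string handling should be checked against the Auto Loader file-notification docs, and all names are placeholders:

```python
# Hedged sketch: Auto Loader consuming from an existing Event Grid queue.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.queueName", "my-eventgrid-queue")
    # connection string of the storage account that hosts the queue,
    # which may differ from the account holding the data files
    .option("cloudFiles.connectionString", queue_account_connection_string)
    .option("cloudFiles.schemaLocation", "/tmp/autoloader_schema")
    .load("abfss://landing@prodaccount.dfs.core.windows.net/events/")
)
```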
turagittech
by Contributor
  • 2635 Views
  • 0 replies
  • 0 kudos

Identify source of data in query

Hi all, I have an issue. I have several databases with the same schemas that I need to source data from. Those databases are going to end up aggregated in a data warehouse. The problem is that the id column in each means different things. Example: a client id i...

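One common pattern for the problem described above, sketched here with hypothetical table and column names: tag every row with its source system and derive a composite key, so identical ids from different databases stay distinguishable:

```python
# Hedged sketch: disambiguate colliding ids when unioning same-schema sources.
from functools import reduce
from pyspark.sql import functions as F

sources = {"emea_db": "emea.clients", "apac_db": "apac.clients"}   # placeholders

frames = [
    spark.read.table(table)
    .withColumn("source_system", F.lit(name))
    .withColumn("client_key", F.concat_ws("-", F.lit(name), F.col("client_id")))
    for name, table in sources.items()
]

clients_all = reduce(lambda a, b: a.unionByName(b), frames)
```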
lauraxyz
by Contributor
  • 887 Views
  • 2 replies
  • 0 kudos

Online Table: create only if it does not exist

I'm following this doc to create an online table using the Databricks SDK. How can I set it to create the table ONLY when it doesn't exist, to avoid a "table already exists" error? Or is there another programmatic way to check the existence of an Onli...

Latest Reply
lauraxyz
Contributor
  • 0 kudos

Thank you @Alberto_Umana, that's a good way to go when there's no built-in create-if-not-exist feature. I also tried a different way using information_schema; I think it should work too: def table_exists(table_name): return spark.sql(f""" ...

1 More Replies
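A hedged sketch along the lines of the information_schema check mentioned in the reply: look the table up first and only create it when it is missing. The fully qualified name is a placeholder, and the actual online-table creation call is left to the SDK doc linked in the original post:

```python
# Hedged sketch: create-if-not-exists guard based on information_schema.
def table_exists(full_name: str) -> bool:
    catalog, schema, table = full_name.split(".")
    return (
        spark.sql(
            f"""SELECT 1 FROM {catalog}.information_schema.tables
                WHERE table_schema = '{schema}' AND table_name = '{table}'"""
        ).count()
        > 0
    )

if not table_exists("main.default.my_online_table"):
    pass   # call the Databricks SDK online-table creation here (see the linked doc)
```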
