Hello there, I am trying to start a serverless Databricks SQL cluster in GCP. I am following this Databricks doc: https://docs.gcp.databricks.com/en/admin/sql/serverless.html I have checked that all my requirements are fulfilled for activating the clus...
Hi @Keyurishah_, We're thrilled to hear that you had a great experience at DAIS 2023! Your feedback is valuable to us, and we appreciate you taking the time to share it on the community platform.
We wanted to let you know that the Databricks Communit...
Hi All, currently I am trying to connect to Databricks Unity Catalog from Power Apps Dataflow by using the Spark connector, specifying the HTTP URL and using a Databricks personal access token as specified in the screenshot below: I am able to connect, but the issue is when...
Thanks for replying @Kaniz. I am using my personal access token to connect, and I have full access on the catalog, schema, and tables. I am able to view the data in the Databricks SQL editor but not via the Spark connector in Power Apps. It still shows me the same. Is there anything else...
When I run a query on Databricks itself from a notebook, it runs fine and gives me results. But the same query, when executed from FastAPI (Python, using the databricks library), gives me "TypeError: 'NoneType' object is not iterable". I can...
Hi @Nastia, The “TypeError: ‘NoneType’ object is not iterable” error typically occurs when you try to iterate over a variable that has a value of None.
Let’s explore some possible solutions to address this issue:
Check for None before Iterating: ...
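The advice above can be sketched as a minimal guard. Here `fetch_rows` is a hypothetical stand-in for whatever call (for example, a cursor fetch) returns None when there is no result set:

```python
# Minimal sketch: guard a possibly-None result before iterating.
# `fetch_rows` is a hypothetical stand-in for a call that can return None.
def fetch_rows():
    return None  # simulate a query that produced no result set

rows = fetch_rows()
if rows is None:
    rows = []  # normalise None to an empty, safely iterable list

collected = [row for row in rows]
```

With the guard in place, the loop simply produces no items instead of raising the TypeError.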
How, and how many, Databricks notebooks should be created to populate multiple silver Delta tables, all having different and complex transformations? What's the best practice?
1. Create a notebook for each silver table?
2. Push SQL transformation logic ...
Hi @jitesh, When organizing your Databricks Notebooks for multiple silver Delta tables with different and complex transformations, it’s essential to follow best practices.
Here are some recommendations:
Separate Notebooks for Each Layer:
Bronze L...
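One common compromise, offered here purely as an illustration (the table and function names are hypothetical, not from the truncated reply above), is a single driver with the per-table transformation logic registered in a mapping, so each silver table gets its own small, testable function without one notebook per table:

```python
# Illustrative sketch: one driver, with the transformation logic for
# each silver table registered in a dict (all names are hypothetical).
def transform_orders(rows):
    # keep only positive-amount orders
    return [r for r in rows if r["amount"] > 0]

def transform_customers(rows):
    # trim whitespace from customer names
    return [dict(r, name=r["name"].strip()) for r in rows]

SILVER_TRANSFORMS = {
    "silver_orders": transform_orders,
    "silver_customers": transform_customers,
}

def build_silver(table_name, rows):
    """Apply the registered transformation for one silver table."""
    return SILVER_TRANSFORMS[table_name](rows)

result = build_silver("silver_orders", [{"amount": 5}, {"amount": -1}])
```

A driver notebook can then loop over `SILVER_TRANSFORMS` to populate every silver table, while the logic for each stays separately reviewable.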
Hi Team, we are currently planning to implement Databricks cell-level code parallel execution through the Python threading library. We are interested in understanding the resource consumption and allocation process on the cluster. Are there any pot...
Hi @Phani1, Implementing Databricks cell-level code parallel execution through the Python threading library can be beneficial for performance, but there are some considerations to keep in mind.
Let’s break it down:
Resource Consumption and Alloca...
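A minimal sketch of the pattern being discussed, assuming the parallel tasks are independent. In a real notebook the worker might call `spark.sql(...)` or `dbutils.notebook.run(...)`; here a plain function stands in:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-table worker; in Databricks this would submit real
# work (e.g. a SQL statement) that the cluster then executes.
def process(table):
    return f"{table}: done"

tables = ["events", "orders", "customers"]

# max_workers bounds how many tasks run at once, which in turn bounds
# the concurrent demand placed on the cluster's resources.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, tables))
```

Because Spark actions mostly release the GIL while waiting on the cluster, threads are usually sufficient here; the cap on workers is the main lever for controlling resource pressure.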
In Azure AD, it shows the users are synced to Databricks. But in Databricks, it shows the user is not a part of the group. The user is missing from only one group; he is part of the remaining groups. All the syncing worked fine until yesterday. I don't know ...
Hi @Fresher, It sounds like you’re experiencing an issue with user synchronization between Azure AD and Databricks.
Let’s troubleshoot this together!
Here are some steps you can take to resolve the issue:
Check SCIM Provisioning Configuration:
En...
In my SQL data transformation pipeline, I'm doing chained/cascading window aggregations: for example, I want to compute the average over the last 5 minutes, then compute the average over the past day on top of the 5-minute average, so that my aggregations are mor...
Hi @chloeh, You’re working with a Spark SQL data transformation pipeline involving chained window aggregations.
Let’s look at your code snippet and see if we can identify the issue.
First, let’s break down the steps you’ve implemented:
You’re read...
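The Spark SQL snippet itself is truncated above, but the cascading idea can be illustrated in plain Python: average raw points into 5-minute buckets first, then aggregate over those bucket averages rather than re-scanning every raw point. The data here is synthetic:

```python
# Plain-Python illustration of cascading aggregation (timestamps in
# minutes, values are synthetic). Stage 1 buckets raw points into
# 5-minute windows; stage 2 aggregates over the stage-1 averages.
points = [(m, float(m % 7)) for m in range(60)]  # (minute, value)

# Stage 1: 5-minute averages
buckets = {}
for minute, value in points:
    buckets.setdefault(minute // 5, []).append(value)
five_min_avgs = {b: sum(v) / len(v) for b, v in buckets.items()}

# Stage 2: "daily" average computed over the 5-minute averages,
# touching 12 bucket values instead of 60 raw points.
daily_avg = sum(five_min_avgs.values()) / len(five_min_avgs)
```

Because every bucket holds the same number of raw points here, the average of averages equals the overall mean; with uneven buckets the two can differ, which is worth keeping in mind when chaining aggregations.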
Hi, I have configured 20 different workflows in Databricks. All of them are configured with a job cluster with a different name. All 20 workflows are scheduled to run at the same time. But even after configuring a different job cluster in each of them, they run sequentially w...
Honestly, you shouldn't have any kind of limitation executing different workflows. I did a test case in my Databricks, and if your workflows run on a job cluster you shouldn't have a limitation. But I did all my tests in Azure, and just for you to kn...
Hi Databricks Community, I am following this course provided by Databricks: https://customer-academy.databricks.com/learn/course/1266/data-engineering-with-databricks?generated_by=575333&hash=6edddab97f2f528922e2d38d8e4440cda4e5302a. In this, when I am ...
Hi @namankhamesara, Thank you for reaching out! It appears there might be an issue with accessing the data for your course. To expedite your request and resolve this issue promptly, please list your concerns on our ticketing portal. Our support staff...
As per the info available, ingestion time clustering makes use of the time a file is written or ingested into Databricks. In a use case where there is a new Delta table and an ETL which runs on a schedule (say daily) inserting records, I am able to ...
Hi @Shazam, Great questions! Let’s break down each scenario:
Initial Data Migration: When migrating data from an existing platform to Databricks, you might have a large initial load of records. In this case, ingestion time clustering can still be...
Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code:

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by...
Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table.
Let’s troubleshoot this together!
Pipeline Update Process: Whe...
I am using a tqdm progress bar to monitor the number of data records I have collected via an API. I am temporarily writing them to a file in DBFS, then loading them into a Spark DataFrame. Each time I write to a file, I get a message like 'Wrote 8873925 ...
Hi @halox6000, To stop the progress bar output from tqdm, you can use the disable argument. Set it to True to silence any tqdm output; it will not only hide the display but also skip the progress-bar bookkeeping entirely. Here’s an examp...
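Since the example in the reply is cut off, here is a minimal sketch of the `disable` flag (the loop body and the `SILENT` name are illustrative, not from the original post):

```python
from tqdm import tqdm

SILENT = True  # flip to False to see the progress bar again

total = 0
# With disable=True, tqdm prints nothing; the loop behaves as if it
# were iterating over the raw range directly.
for record in tqdm(range(10_000), disable=SILENT):
    total += record
```

Tying `SILENT` to an environment variable or job parameter lets the same notebook stay quiet in scheduled runs but show progress interactively.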
Hi All, my job is breaking as the cluster is not able to autoscale. Below is the log. Can it be due to AWS VMs not spinning up, or could it be an issue with the Databricks configuration? Has anyone faced this before?
TERMINATING Compute terminated. Reason:...
As of this morning we started receiving the following error message on a Databricks job with a single PySpark Notebook task. The job has not had any code changes in 2 months. The cluster configuration has also not changed. The last successful run of ...
As advised, I double-confirmed that no code or cluster configuration was changed (I even got a second set of eyes on it that confirmed the same). I was able to find a "fix" which puts a band-aid on the issue: I was able to pinpoint that the issue seems to...