Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

EndreM
by New Contributor III
  • 2112 Views
  • 1 replies
  • 0 kudos

Replay stream to migrate to liquid cluster

The documentation is sparse about how to migrate a partitioned table to a liquid cluster, as the ALTER TABLE suggested in the documentation doesn't work when it's a partitioned table. The comments on this forum suggest replaying the stream. And this is wha...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @EndreM, I did some digging internally and have come up with some helpful tips and tricks to guide you through this issue. Based on your situation, you're encountering several common challenges when migrating a partitioned table to liqu...

soumiknow
by Contributor II
  • 1986 Views
  • 1 replies
  • 0 kudos

Unable to create databricks group and add permission via terraform

I have the following Terraform code to create a Databricks group and add permission to a workflow:
    resource "databricks_group" "dbx_group" {
      display_name = "ENV_MONITORING_TEAM"
    }
    resource "databricks_permissions" "workflow_permission" {
      job_id ...

Data Engineering
databricks groups
Terraform
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @soumiknow, I did some digging internally and found something that may help: Based on the information gathered, I can now draft a comprehensive response to this Databricks Community question about the Terraform authentication issue. ## Dra...

anusha98
by Visitor
  • 16 Views
  • 0 replies
  • 0 kudos

Regarding : How to use Row_number() in dlt pipelines

We have two streaming tables, customer_info and customer_info_history, which we joined using a full join to create a temp table in PySpark. Now we want to eliminate the duplicate records from this temp table. We tried using row_number() but are facing bel...

Marthinus
by New Contributor III
  • 35 Views
  • 2 replies
  • 0 kudos

[Databricks Asset Bundles] Bug: driver_node_type_id not updated

Working with databricks asset bundles (using the new python-based definition), if you have a job_cluster defined using driver_node_type_id, and then update it to no longer have it defined, but only node_type_id, the driver node_type never gets update...

Latest Reply
Marthinus
New Contributor III
  • 0 kudos

@dkushari no, I don't think you understand me. When I unset driver_node_type_id, it does NOT revert to node_type_id. Perhaps an example will make it clear. First define the cluster as:
JobCluster( job_cluster_key="example", new_cluster=ClusterSpec( ...

1 More Replies
Jonathan_
by New Contributor II
  • 250 Views
  • 5 replies
  • 6 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that after many transformations, aggregations, and joins of the data, the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow at some point, even if we have small data (ex...

Latest Reply
Jonathan_
New Contributor II
  • 6 kudos

Hi, we forgot to say that we were using a single-node cluster (E class with 16 cores). Often in our projects we need to use libraries that work mainly with data in memory. We also need to remember that we are not referring to large data here. When we...

4 More Replies
smoortema
by New Contributor III
  • 170 Views
  • 2 replies
  • 2 kudos

Resolved! How to make FOR cycle and dynamic SQL and variables work together

I am working on a testing notebook where the table under test can be given as a widget. I wanted to write it in SQL. The notebook does the following steps in a loop that should run 10 times: 1. Store the starting version of a Delta table in a var...

Latest Reply
smoortema
New Contributor III
  • 2 kudos

Thank you! I realised that the example I gave was bad. However, what I was missing is that I did not know how to set a variable in SQL scripting. Including the SET command within the SQL string does not work; you have to use the EXECUTE IMMEDIATE ......

1 More Replies
AbhishekNakka
by New Contributor II
  • 49 Views
  • 1 replies
  • 0 kudos

Databricks professional data engineer

Hi, I wanted to know if anyone has taken the Databricks Professional Data Engineer exam recently, after Oct 2025. Has the syllabus been updated or not?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @AbhishekNakka, yes, the syllabus has been updated. You can find the current exam objectives at the links below:
Databricks Certified Data Engineer Professional September 2025 - Exam Guide.docx
Databricks Certified Data Engineer Professional | Databricks

DatabricksEngi1
by New Contributor III
  • 123 Views
  • 4 replies
  • 0 kudos

Resolved! Problem in VS Code Extension

Until a few days ago, I was working with Databricks Connect using the VS Code extension, and everything worked perfectly. In my .databrickscfg file, I had authentication configured like this:
[name]
host:
token:
When I ran my code, everything worked fi...

Latest Reply
dkushari
Databricks Employee
  • 0 kudos

Hi @DatabricksEngi1 - Please ensure you have a Python Venv set up for each Python version that you use with Databricks Connect. Also, I have given step-by-step ways to debug the issue, clear the cache, etc [Read the files and instructions carefully b...

3 More Replies
manugarri
by New Contributor II
  • 18728 Views
  • 12 replies
  • 2 kudos

Fuzzy text matching in Spark

I have client-provided data: a list of company names. I have to match those names with an internal database of company names. The client list can fit in memory (it's about 10k elements), but the internal dataset is on HDFS and we use Spark ...

Latest Reply
Shamzaa3Q
New Contributor II
  • 2 kudos

+1 for rapidfuzz, I have used it in production pipelines. It is better than just the levenshtein function, as rapidfuzz provides a couple of other algorithms as well. I will warn you not to do what 2024 me attempted, which is to use an LLM to solve this. It soun...
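To make the matching idea in this thread concrete, here is a minimal sketch of matching a small in-memory client list against a list of internal names. It deliberately uses Python's standard-library difflib as a stand-in, not rapidfuzz, so the scores will differ from what rapidfuzz would give. The sample names, the `normalize` and `best_match` helpers, and the 0.8 threshold are all assumptions for illustration and do not come from the thread.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not lower the score."""
    return " ".join(name.lower().split())


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the closest candidate, or None if nothing reaches the threshold."""
    q = normalize(query)
    scored = [(c, SequenceMatcher(None, q, normalize(c)).ratio()) for c in candidates]
    if not scored:
        return None
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= threshold else None


# Hypothetical sample data, not taken from the thread.
internal_names = ["Acme Corporation", "Globex Inc", "Initech LLC"]
client_names = ["ACME Corporaton", "Globex  Inc", "Unrelated Name"]

# Map each client name to its best internal match (or None).
matches = {c: best_match(c, internal_names) for c in client_names}
```

With a list of about 10k client names, one option would be to broadcast it and run a function like this per partition of the internal dataset, but whether that scales depends on the size of the internal data and is not something this sketch covers.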

11 More Replies
pranaav93
by New Contributor III
  • 66 Views
  • 1 replies
  • 0 kudos

Resolved! TransformWithState is not emitting for live streams

Hi Team, for one piece of my custom logic I went with a transformWithState processor. However, it is not working for live stream inputs. I have a start date filter on my df_base, and when I give a start date that is not current, the processor computes df_loss ...

Data Engineering
apachespark
pyspark
StatefulStreaming
StructuredStreaming
transformWithState
Latest Reply
pranaav93
New Contributor III
  • 0 kudos

I managed to solve this. The issue was with how I handled the value state in the def init method. It was handled as a dataframe, which caused the state to never materialize or update, therefore emitting nulls. I changed them to a tuple of values and th...

drag7ter
by Contributor
  • 2415 Views
  • 1 replies
  • 0 kudos

Delta sharing view and cached data in DSFF

I've created a view with row-level access based on the CURRENT_RECIPIENT() function in the WHERE clause, and I have 100s of clients as recipients that query this view. The problem is, when I modify this view with CREATE OR REPLACE and new SQL code, and reci...

Latest Reply
AbhaySingh
New Contributor
  • 0 kudos

Have you tried something like this already?
Force Cache Invalidation (Recommended)
-- After CREATE OR REPLACE VIEW, execute:
ALTER SHARE <share_name> REMOVE TABLE <schema>.<view_name>;
ALTER SHARE <share_name> ADD TABLE <schema>.<view_name>;
Thi...

abhijit007
by New Contributor
  • 1961 Views
  • 1 replies
  • 0 kudos

Lakebridge code conversion | Permission issue

Hi, I’ve successfully installed the transpile module from Lakebridge and tried the tool to convert Informatica mappings into PySpark code. However, I’m encountering a PermissionError during execution. I’ve provided the relevant environment details and...

Data Engineering
Lakebridge
Warehouse Migration
Latest Reply
dkushari
Databricks Employee
  • 0 kudos

Hi @abhijit007 - I see that this has been resolved in the 0.10.5 release. Can you please retest and confirm?

AlexSantiago
by New Contributor II
  • 14072 Views
  • 20 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token. My code is below. Is it a Databricks bug?
pip install spotipy
from spotipy.oauth2 import SpotifyO...

Latest Reply
Alyceveum25
New Contributor
  • 4 kudos

Thank you 

19 More Replies
raghvendrarm1
by New Contributor
  • 156 Views
  • 2 replies
  • 1 kudos

Results from the spark application to driver

I tried to read many articles but am still not clear on this: The executors complete the execution of tasks and have the results with them. 1. The results (output data) from all executors are transported to the driver in all cases, or executors persist it if tha...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @raghvendrarm1, below are the answers to your questions:
Do executors always send “results” to the driver?
No. Only actions that return values (e.g., collect, take, first, count) bring data back to the driver. collect explicitly “returns al...

1 More Replies
