Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

smoortema
by Contributor
  • 316 Views
  • 2 replies
  • 2 kudos

Resolved! How to make FOR cycle and dynamic SQL and variables work together

I am working on a testing notebook where the table under test is supplied through a widget. I wanted to write it in SQL. The notebook runs the following steps in a loop that should execute 10 times: 1. Store the starting version of a delta table in a var...

Latest Reply
smoortema
Contributor
  • 2 kudos

Thank you! I realised that the example I gave was bad. What I was missing was how to set a variable in SQL scripting. Including the SET command within the SQL string does not work; you have to use the EXECUTE IMMEDIATE ......

  • 2 kudos
1 More Replies
DatabricksEngi1
by Contributor
  • 392 Views
  • 4 replies
  • 1 kudos

Resolved! Problem in VS Code Extension

Until a few days ago, I was working with Databricks Connect using the VS Code extension, and everything worked perfectly. In my .databrickscfg file, I had authentication configured like this: [name] host: token: When I ran my code, everything worked fi...

Latest Reply
dkushari
Databricks Employee
  • 1 kudos

Hi @DatabricksEngi1 - Please ensure you have a Python Venv set up for each Python version that you use with Databricks Connect. Also, I have given step-by-step ways to debug the issue, clear the cache, etc [Read the files and instructions carefully b...

  • 1 kudos
3 More Replies
manugarri
by New Contributor II
  • 19435 Views
  • 12 replies
  • 2 kudos

Fuzzy text matching in Spark

I have a client-provided list of company names that I have to match against an internal database of company names. The client list can fit in memory (it's about 10k elements), but the internal dataset is on HDFS and we use Spark ...

Latest Reply
Shamzaa3Q
New Contributor II
  • 2 kudos

+1 for rapidfuzz; I have used it in production pipelines. It is better than a plain Levenshtein function, as rapidfuzz provides a couple of other algorithms as well. I will warn you not to do what 2024 me attempted, which is to use an LLM to solve this. It soun...
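
As a rough illustration of the matching task in this thread, here is a minimal sketch that uses Python's standard-library `difflib` as a stand-in for rapidfuzz (the library the reply recommends, which is not used here). The sample company names, the `normalize` and `best_match` helpers, and the 0.8 cutoff are all hypothetical. It only shows the scoring step on an in-memory list; distributing that step across a Spark dataset is not attempted.

```python
import difflib


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(name.lower().split())


def best_match(client_name, internal_names, cutoff=0.8):
    """Return the closest internal name, or None if nothing scores at or above cutoff.

    Scoring uses difflib's SequenceMatcher ratio, not Levenshtein distance or rapidfuzz.
    """
    # Map normalized names back to their original spelling.
    lookup = {normalize(n): n for n in internal_names}
    hits = difflib.get_close_matches(
        normalize(client_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Hypothetical sample data, for illustration only.
internal = ["Acme Corporation", "Globex Inc", "Initech LLC"]
print(best_match("acme  corporaton", internal))
print(best_match("Umbrella Group", internal))
```

The cutoff trades false matches against missed ones, so it would need tuning on real names before being trusted.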

  • 2 kudos
11 More Replies
pranaav93
by New Contributor III
  • 354 Views
  • 1 replies
  • 1 kudos

Resolved! TransformWithState is not emitting for live streams

Hi Team, For one of my custom logics I went with a transformWithState processor. However, it is not working for live stream inputs. I have a start date filter on my df_base, and when I give a start date that is not current, the processor computes df_loss ...

Data Engineering
apachespark
pyspark
StatefulStreaming
StructuredStreaming
transformWithState
Latest Reply
pranaav93
New Contributor III
  • 1 kudos

I managed to solve this. The issue was with how I handled the value state in the def init method. It was handled as a DataFrame, which caused the state to never materialize or update, so it emitted nulls. I changed them to a tuple of values and th...

  • 1 kudos
drag7ter
by Contributor
  • 2561 Views
  • 1 replies
  • 0 kudos

Delta sharing view and cached data in DSFF

I've created a view with row-level access based on the CURRENT_RECIPIENT() function in the WHERE clause, and I have hundreds of clients as recipients that query this view. The problem is, when I modify this view with CREATE OR REPLACE and new SQL code, and reci...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Have you tried something like this already?  Force Cache Invalidation (Recommended)  -- After CREATE OR REPLACE VIEW, execute:  ALTER SHARE <share_name> REMOVE TABLE <schema>.<view_name>;  ALTER SHARE <share_name> ADD TABLE <schema>.<view_name>;  Thi...

  • 0 kudos
abhijit007
by New Contributor
  • 2106 Views
  • 1 replies
  • 1 kudos

Resolved! Lakebridge code conversion | Permission issue

Hi, I’ve successfully installed the transpile module from Lakebridge and tried the tool to convert Informatica mappings into PySpark code. However, I’m encountering a PermissionError during execution. I’ve provided the relevant environment details and...

Data Engineering
Lakebridge
Warehouse Migration
Latest Reply
dkushari
Databricks Employee
  • 1 kudos

Hi @abhijit007 - I see that this has been resolved in the 0.10.5 release. Can you please retest and confirm?

  • 1 kudos
AlexSantiago
by New Contributor II
  • 14556 Views
  • 20 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token. My code is below. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

Latest Reply
Alyceveum25
New Contributor III
  • 4 kudos

Thank you 

  • 4 kudos
19 More Replies
raghvendrarm1
by New Contributor
  • 291 Views
  • 2 replies
  • 3 kudos

Resolved! Results from the spark application to driver

I have read many articles but am still not clear on this: The executors complete the execution of tasks and hold the results. 1. Are the results (output data) from all executors transported to the driver in all cases, or do executors persist them if tha...

Latest Reply
K_Anudeep
Databricks Employee
  • 3 kudos

Hello @raghvendrarm1, below are the answers to your questions: Do executors always send “results” to the driver? No. Only actions that return values (e.g., collect, take, first, count) bring data back to the driver. collect explicitly “returns al...

  • 3 kudos
1 More Replies
Saf4Databricks
by New Contributor III
  • 504 Views
  • 3 replies
  • 2 kudos

Resolved! Cannot import pyspark.pipelines module

Question: What could cause the following error in my code in a Databricks notebook, and how can we fix it? I'm using the latest Free Edition of Databricks, which has runtime version 17.2 and PySpark version 4.0.0. Error: ImportError: cannot im...

Latest Reply
dkushari
Databricks Employee
  • 2 kudos

Hi @Saf4Databricks - Are you trying to use it from a standalone Databricks notebook? You should only use it from within a Lakeflow Declarative Pipeline (LDP). The link you shared is about LDP. Here is an example where I used it.

  • 2 kudos
2 More Replies
TalessRocha
by New Contributor II
  • 1663 Views
  • 10 replies
  • 8 kudos

Resolved! Connect to azure data lake storage using databricks free edition

Hello guys, I'm using Databricks Free Edition (serverless) and I am trying to connect to Azure Data Lake Storage. The problem I'm having is that in the Free Edition we can't configure the cluster, so I tried to make the connection via notebook using ...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 8 kudos

@TalessRocha thanks for getting back to us! Glad to hear you got it working, that's awesome. Best of luck with your projects. All the best, BS

  • 8 kudos
9 More Replies
Malthe
by Contributor II
  • 533 Views
  • 4 replies
  • 1 kudos

Resolved! Can't enable "variantType-preview" using DLTs

Using create_streaming_table and passing table properties as follows, I get an error running the pipeline for the first time: > Your table schema requires manually enablement of the following table feature(s): variantType-preview. I'm using this code: c...

Latest Reply
Malthe
Contributor II
  • 1 kudos

There's a workaround available in most situations which is to first create the table without the VARIANT column, run the pipeline at least once, and then add the column in a subsequent refresh.

  • 1 kudos
3 More Replies
Upendra_Dwivedi
by Contributor
  • 2691 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks APP OBO User Authorization

Hi All, We are using the on-behalf-of-user authorization method for our app, and the x-forwarded-access-token expires after some time, so we have to redeploy our app to rectify the issue. I am not sure what the issue is or how we can keep the token aliv...

Latest Reply
jamesl
Databricks Employee
  • 1 kudos

Hi @Upendra_Dwivedi , are you still facing this issue? The x-forwarded-access-token your app receives is the current user’s access token that Databricks forwards in HTTP headers for on‑behalf‑of‑user access. You should read it from the request on eac...
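
Assuming the truncated advice above is to read the forwarded token from each incoming request, that pattern can be sketched as follows. This is a framework-agnostic illustration: `get_user_token`, `handle_request`, and the plain mapping of headers are hypothetical stand-ins for whatever request object your app framework provides. Only the header name `x-forwarded-access-token` comes from the thread.

```python
from typing import Mapping, Optional

HEADER_NAME = "x-forwarded-access-token"  # header name as given in the thread


def get_user_token(headers: Mapping[str, str]) -> Optional[str]:
    """Look up the forwarded user token in this request's headers.

    HTTP header names are case-insensitive, so keys are compared lowercased.
    Returns None when the header is absent or empty.
    """
    for key, value in headers.items():
        if key.lower() == HEADER_NAME:
            return value or None
    return None


def handle_request(headers: Mapping[str, str]) -> str:
    """Hypothetical handler: fetch the token per request, never store it globally."""
    token = get_user_token(headers)
    if token is None:
        return "no user token on this request"
    # Pass `token` to the downstream call here instead of a value saved at startup.
    return "token present"


# Example header values are made up for illustration.
print(handle_request({"X-Forwarded-Access-Token": "example-token"}))
print(handle_request({}))
```

Keeping the lookup inside the handler means each call works with whatever token arrived on that request, not a copy captured when the app was deployed.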

  • 1 kudos
Mous92i
by New Contributor III
  • 375 Views
  • 3 replies
  • 2 kudos

Resolved! Liquid Clustering With Merge

Hello, I’m facing severe performance issues with a MERGE INTO on Databricks: merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
Mous92i
New Contributor III
  • 2 kudos

Thanks for your response

  • 2 kudos
2 More Replies
databricksero
by New Contributor II
  • 714 Views
  • 8 replies
  • 4 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I’m running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor II
  • 4 kudos

@databricksero Explicit Schema Definition: When calling spark.createDataFrame(pdf_cleaned), explicitly provide the schema even if the DataFrame is empty. This gives Spark the column types up front, so it does not have to infer them, and prevents the “cannot infer schema from empty dataset” erro...

  • 4 kudos
7 More Replies
deng_dev
by New Contributor III
  • 11404 Views
  • 1 replies
  • 0 kudos

py4j.protocol.Py4JJavaError: An error occurred while calling o359.sql. : java.util.NoSuchElementExce

Hi! We are creating a table in a streaming job every micro-batch using the spark.sql('create or replace table ... using delta as ...') command. The query combines data from multiple tables. Sometimes our job fails with the error: py4j.Py4JException: An e...

Latest Reply
sahilchavan
New Contributor II
  • 0 kudos

Hi @deng_dev, did you discover any way to raise this error gracefully? I'm facing the same error when running the Kinesis stream. Although I'm aware of what the error is, my intent is to raise and log it gracefully.

  • 0 kudos
