07-23-2024 07:07 AM
Hi,
I'm new to Databricks and I'm trying to use streaming for my incremental data. This data has duplicates, which can be removed using a window function. Can you check where my code goes wrong?
07-23-2024 07:32 AM - edited 07-23-2024 07:33 AM
Hi @zll_0091 ,
Change the output mode to update. Other than that, your code looks fine, but I would rename the variable microdf to windowSpec, because right now it's a little confusing.
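For anyone landing here without the original code, here is a rough sketch of how the pieces usually fit together; it is not the poster's code. The helper names (dedupe_latest, start_stream), the ordering column and the checkpoint path are all made up for illustration. The idea is to do the window-function dedupe inside foreachBatch, where each micro-batch is a normal DataFrame, and to start the stream with the update output mode.

```python
def dedupe_latest(batch_df, keys, order_col):
    """Keep one row per key combination: the row with the highest order_col.

    All three arguments are hypothetical; pass your own key and ordering columns.
    """
    # Imported here so the file can be loaded without a Spark runtime.
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # Descriptive name for the window spec, as suggested in the reply above.
    window_spec = Window.partitionBy(*keys).orderBy(F.col(order_col).desc())
    return (
        batch_df.withColumn("_rn", F.row_number().over(window_spec))
        .filter(F.col("_rn") == 1)
        .drop("_rn")
    )


def start_stream(source_df, handle_batch, checkpoint_path):
    """Call handle_batch(batch_df, batch_id) on every micro-batch."""
    return (
        source_df.writeStream
        .foreachBatch(handle_batch)
        .outputMode("update")  # the change suggested above
        .option("checkpointLocation", checkpoint_path)  # hypothetical path
        .start()
    )
```

Inside your batch function you would call something like dedupe_latest(batch_df, ["key1", "key2"], "last_sync_version") before writing. The ordering column is a guess on my side, so use whichever column tells you which duplicate is the newest.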
07-23-2024 07:21 PM
Thank you for your reply. I have updated the output mode and am now encountering the error below:
"py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last):
File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 617, in _call_proxy
return_value = getattr(self.pool[obj_id], method)(*params)
File "/databricks/spark/python/pyspark/sql/utils.py", line 119, in call
raise e
File "/databricks/spark/python/pyspark/sql/utils.py", line 116, in call
self.func(DataFrame(jdf, wrapped_session_jdf), batch_id)
File "<command-1456054439786523>", line 9, in mergetoDF
(deltadf
File "/databricks/spark/python/delta/tables.py", line 1159, in execute
self._jbuilder.execute()
File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/databricks/spark/python/pyspark/errors/exceptions.py", line 234, in deco
raise converted from None
pyspark.errors.exceptions.AnalysisException: cannot resolve source.key1 in search condition given columns spark_catalog.hive.final_table.key1, spark_catalog.hive.final_table.last_sync_version, spark_catalog.hive.final_table.last_sync_date, spark_catalog.hive.final_table.key2, spark_catalog.hive.final_table.process_key, key1, last_sync_version, last_sync_date, key2, process_key; line 1 pos 0"
07-23-2024 10:43 PM
Hi,
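In the merge you are referring to the source DataFrame as source, but you first need to alias the DataFrame (and alias the target table as target):
(deltadf.alias("target")
    .merge(
        microdf.alias("source"),
        "source.key1 = target.key1 AND source.key2 = target.key2"
    )
    # ...followed by your existing merge clauses and .execute()
)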