Databricks Community

chanansh · 02-08-2023

ERROR:py4j.clientserver:There was an exception while executing the Python Proxy on the Python Side. Traceback (most recent call last): File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 617, in _call_proxy retu...

chanansh · 02-03-2023

What is the best practice for accelerating queries that look like the following? `win = Window.partitionBy('key1','key2').orderBy('timestamp')` `df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))` I have tried to use OP...

chanansh · 02-02-2023

I have a big Delta table with timestamp, key, and metric columns (e.g. m1, m2, ...). I often group by the key (e.g. select max(m1) group by timestamp, key). I cannot partition by `key` because there are too many values (~200K). I have tried ...

chanansh · 01-30-2023

According to the documentation, you can monitor a Spark Structured Streaming job using QueryExecutionListener. However, I cannot find it. https://docs.databricks.com/structured-streaming/stream-monitoring.html#language-python

chanansh · 01-18-2023

I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...
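For readers unfamiliar with the formula above, here is a minimal plain-Python sketch of what v_diff(t) = v(t) - v(t-1) per key means. It is not Spark code and not the poster's implementation; the function name `diff_over_time` and the `(k, t, v)` tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

# Hypothetical row layout for illustration: (key k, timestamp t, value v)
Row = Tuple[Hashable, float, float]


def diff_over_time(rows: List[Row]) -> List[Tuple[Hashable, float, Optional[float]]]:
    """Return (k, t, v_diff), where v_diff = v(t) - v(previous t) within each key k."""
    # Group the rows by key, mirroring a partition-by-key step.
    by_key = defaultdict(list)
    for k, t, v in rows:
        by_key[k].append((t, v))

    out = []
    for k, series in by_key.items():
        # Order each key's rows by timestamp, mirroring an order-by-timestamp step.
        series.sort(key=lambda pair: pair[0])
        prev = None
        for t, v in series:
            # The first row of each key has no predecessor, so its difference is None.
            out.append((k, t, None if prev is None else v - prev))
            prev = v
    return out
```

Each key is processed independently, so the first row of every key yields `None`, in the same way a lag over a per-key window yields null for the first row.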

chanansh · 03-12-2023

Been there, done that. Still super slow for anything interactive.

chanansh · 02-08-2023

I found this but could not make it work: https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

chanansh · 01-22-2023

I also asked this on Stack Overflow since I didn't get an answer here: https://stackoverflow.com/questions/75161849/spark-structure-streaming-differentiate-over-time

chanansh · 01-17-2023

I don't know. I saved the table with Auto Loader as follows. I am saving a structured stream into a table using:
```
.writeStream
.format("delta") # <-----------
.option("checkpointLocation", checkpoint_path)
.option("path", ou...
```

chanansh · 01-17-2023

I have renamed the files, replacing `:` with `-`, as the bug still exists.

User Activity

Running stateful spark streaming example fails https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html

Delta table acceleration for group by on key columns using ZORDER does not work

delta table grouping by key which is not partitioned by is very slow

QueryExecutionListener cannot be found in pyspark

how to compute difference over time of a spark structure streaming?

Re: delta table grouping by key which is not partitioned by is very slow

Re: how to compute difference over time of a spark structure streaming?

Re: how to compute difference over time of a spark structure streaming?

Re: Delta table cannot be previewed in the Data UI

Re: Relative path in absolute URI when reading a folder with files containing ":" colons in filename