chanansh
Contributor
since 01-11-2023
06-26-2023

User Stats

  • 19 Posts
  • 0 Solutions
  • 0 Kudos given
  • 4 Kudos received

User Activity

ERROR:py4j.clientserver:There was an exception while executing the Python Proxy on the Python Side. Traceback (most recent call last): File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 617, in _call_proxy retu...
What is the best practice for accelerating queries that look like the following? `win = Window.partitionBy('key1','key2').orderBy('timestamp')` `df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))` I have tried to use OP...
I have a big Delta table with timestamp, key, and metric columns (e.g. m1, m2, ...). I often group by the key (e.g. select max(m1) group by timestamp, key). I cannot partition by `key` because there are too many values (~200K). I have tried ...
According to the documentation, you can monitor a Spark Structured Streaming job using QueryExecutionListener. However, I cannot find it. https://docs.databricks.com/structured-streaming/stream-monitoring.html#language-python
I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...
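
Several of the posts above revolve around the same per-key lagged difference, v_diff(t) = v(t) - v(t-1). As a minimal sketch of what that expression is expected to return, here is a plain-Python illustration (not Spark code, and not the poster's implementation). The function name `lagged_diff` and the sample rows are hypothetical; the first row of each key yields `None`, mirroring how a lag with no default value has no previous row to read.

```python
from collections import defaultdict


def lagged_diff(rows):
    """Compute v(t) - v(t-1) independently for each key.

    rows: iterable of (k, t, v) tuples (hypothetical layout).
    Returns a list of (k, t, v_diff) sorted by key, then timestamp.
    The first row of every key has v_diff = None (no previous row).
    """
    # Group rows by key, the analogue of partitioning the window by k.
    by_key = defaultdict(list)
    for k, t, v in rows:
        by_key[k].append((t, v))

    out = []
    for k in sorted(by_key):
        prev = None
        # Order by timestamp inside each key, the analogue of orderBy(t).
        for t, v in sorted(by_key[k]):
            out.append((k, t, None if prev is None else v - prev))
            prev = v
    return out


# Illustrative data only: two keys with interleaved, unsorted rows.
rows = [("a", 2, 15), ("b", 1, 7), ("a", 1, 10), ("a", 3, 12), ("b", 2, 4)]
result = lagged_diff(rows)
for row in result:
    print(row)
```

A small reference like this can be handy for checking the output of a windowed query on a sample of the data before trying to tune the query itself.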