cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Erik_L
by Contributor II
  • 1299 Views
  • 3 replies
  • 1 kudos

Resolved! How to keep data in time-based localized clusters after joining?

I have a bunch of data frames from different data sources. They are all time series data in order of a column timestamp, which is an int32 Unix timestamp. I can join them together by this and another column join_idx which is basically an integer inde...

  • 1299 Views
  • 3 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Erik Louie​ :If the data frames have different time zones, you can use Databricks' timezone conversion function to convert them to a common time zone. You can use the from_utc_timestamp or to_utc_timestampfunction to convert the timestamp column to ...

  • 1 kudos
2 More Replies
rubenteixeira
by New Contributor III
  • 1453 Views
  • 2 replies
  • 0 kudos

Can't parallelize model training with sc.parallelize, even tough I can run the same code without parallelizing

I'm training a NeuralProphet for a time series forecasting problem. I'm trying to parallelize my training, but this error is appearingThe folder lightning_logs has a hparams.yaml but it's empty. Is this related to permissions on the cluster? Thanks i...

image image.png
  • 1453 Views
  • 2 replies
  • 0 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi,Please let us know if this was checked already:

  • 0 kudos
1 More Replies
Erik
by Valued Contributor II
  • 8229 Views
  • 15 replies
  • 9 kudos

Grafana + databricks = True?

We have some timeseries in databricks, and we are reading them into powerbi through sql compute endpoints. For timeseries powerbi is ... not optimal. Earlier I have used grafana with various backends, and quite like it, but I cant find any way to con...

  • 8229 Views
  • 15 replies
  • 9 kudos
Latest Reply
cold_river_22
New Contributor II
  • 9 kudos

There is now an Open-Source Grafana Databricks backend plugin available.https://github.com/mullerpeter/databricks-grafana

  • 9 kudos
14 More Replies
RRO
by Contributor
  • 22040 Views
  • 7 replies
  • 7 kudos

Resolved! Performance for pyspark dataframe is very slow after using a @pandas_udf

Hello,I am currently working on a time series forecasting with FBProphet. Since I have data with many time series groups (~3000) I use a @pandas_udf to parallelize the training. @pandas_udf(schema, PandasUDFType.GROUPED_MAP) def forecast_netprofit(pr...

  • 22040 Views
  • 7 replies
  • 7 kudos
Latest Reply
RRO
Contributor
  • 7 kudos

Thank you for the answers. Unfortunately this did not solve the performance issue.What I did now is I saved the results into a table:results.write.mode("overwrite").saveAsTable("db.results") This is probably not the best solution but after I do that ...

  • 7 kudos
6 More Replies
Labels