Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I've got data stored in feature tables, plus in a data lake. The feature tables are expected to lag the data lake by at least a little bit. I want to filter data coming out of the feature store by querying the data lake for lookup keys out of my inde...
I'm facing the same issue as described by @mrcity. There is no easy way to alter the dataframe, which is created inside the score_batch() function. Filtering out rows in the (sklearn) pipeline itself is also not convenient since these transformers ar...
Hi,there was a need to query an older snapshot of a table. Therefore ran:deltaTable = DeltaTable.forPath(spark, 'dbfs:/<path>') display(deltaTable.history())and noticed that every fs.write_table run triggers two operations:Write and CREATE OR REPLACE...
@Direo Direo :When you use deltaTable.write() method to write a DataFrame into a Delta table, it actually triggers the Delta write operation internally. This operation performs two actions:It writes the new data to disk in the Delta format, andIt at...
I'm using databricks feature store == 0.6.1. After I register my feature table with `create_feature_table` and write data with `write_Table` I want to read that feature_table based on filter conditions ( may be on time stamp column ) without calling ...
create_training_set is just a simple Select from delta tables. All feature tables are just registered delta tables. Here is an example code that I used to handle that: customer_features_df = spark.sql("SELECT * FROM recommender_system.customer_fea...
I have a feature table in BQ that I want to ingest into Delta Lake. This feature table in BQ has 100TB of data. This table can be partitioned by DATE.What best practices and approaches can I take to ingest this 100TB? In particular, what can I do to ...