Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs, not on...
I'm trying to find the best strategy for handling big data sets. In this case I have something that is 450 million records. I'm pulling the data from SQL Server very quickly, but when I try to push the data to the Delta table or an Azure container the...
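A pattern that often helps here (a sketch under assumptions, not a definitive fix): parallelize the JDBC read so the pipeline is not bottlenecked on a single connection, then control file counts on the Delta write. Hostname, table name, partition bounds, and the target path below are all placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Without partitioning options, Spark pulls all 450M rows through one JDBC
# connection, which often makes the downstream write look like the problem.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")  # placeholder
    .option("dbtable", "dbo.big_table")    # hypothetical source table
    .option("partitionColumn", "id")       # a numeric, evenly distributed key
    .option("lowerBound", "1")
    .option("upperBound", "450000000")
    .option("numPartitions", "64")         # parallel read slices
    .load()
)

# Append to Delta; repartition first to avoid thousands of tiny files.
df.repartition(64).write.format("delta").mode("append").save(
    "abfss://container@account.dfs.core.windows.net/big_table"  # placeholder
)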
Let's say I create a table like CREATE TABLE IF NOT EXISTS new_db.data_table (
key STRING,
value STRING,
last_updated_time TIMESTAMP
) USING DELTA LOCATION 's3://......'; Now when I insert into this table, I insert data which has, say, 20 columns a...
I tried running "REFRESH TABLE tablename;" but I still do not see the added columns under Columns in the Data Explorer, while I do see the added columns in the sample data.
We are looking into enabling Predictive I/O on our Delta tables. In the ingest process we are using Auto Loader, and I am wondering if Auto Loader will get a flag to enable deletion vectors at table creation? Deletion vectors are a requirement for Predic...
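As far as I can tell, deletion vectors are a Delta table property rather than an Auto Loader option, so one workaround is to create (or alter) the target table yourself before the stream starts. A minimal sketch, with a hypothetical table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable deletion vectors at table creation time.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.ingest.events (
        id STRING,
        payload STRING
    )
    TBLPROPERTIES ('delta.enableDeletionVectors' = true)
""")

# Or turn them on for an existing Auto Loader target table.
spark.sql("""
    ALTER TABLE main.ingest.events
    SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)
""")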
Hi, I'm creating some views to be queried by Power BI. In our Delta tables we have a column called databaseName, which contains the source system's database name. What I'm doing is using this to filter data: WHERE databaseName = current_database(). No...
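For reference, the pattern being described looks roughly like the sketch below (view, table, and column names are hypothetical). Note that current_database() is evaluated in the session that runs the query, so a Power BI connection that lands in the default schema can end up filtering out every row:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The view keeps only rows whose stored source-system name matches the
# querying session's current database.
spark.sql("""
    CREATE OR REPLACE VIEW sales_db.v_orders AS
    SELECT *
    FROM sales_db.orders
    WHERE databaseName = current_database()
""")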
When using Delta tables with DBR jobs or even with DLT pipelines, the upserts (especially updates, on key and timestamp) are taking much longer than expected to update the files/table data (~2 mins even for a 1-record poll) (Inserts are lightni...
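Slow MERGE updates are often dominated by file rewrites across the whole table; one common mitigation (sketched below with hypothetical names) is to add a pruning column such as a date to the join condition so Delta can skip files that cannot match:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forName(spark, "main.db.events")   # hypothetical target
updates = spark.table("main.db.events_updates")        # hypothetical source

# The extra event_date predicate narrows the set of files the merge touches.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.key = s.key AND t.event_date = s.event_date")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)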
Hi @Surya Agarwal, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
Hi team, if we kill clusters every time, will the connection details change? If yes, is there a way we can mask this so that end users are not impacted due to any changes in clusters? Also, if I want to call a Delta table from an API using JDBC - s...
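One way to get stable connection details is to point the API at a SQL warehouse instead of an ephemeral cluster; its hostname and HTTP path survive restarts. A minimal sketch using the Databricks SQL Connector for Python (pip install databricks-sql-connector); the hostname, HTTP path, token, and table name are placeholders:

from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder
    access_token="<personal-access-token>",                        # placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM main.db.my_delta_table LIMIT 10")
        for row in cursor.fetchall():
            print(row)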
Hi @Siddharth Krishna, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...
I know Delta tables support the ACID properties, and my understanding is that MERGE, INSERT, DELETE, etc. run inside a transaction by default, and if any error occurs during these operations, that transaction will be rolled back. I hope this unders...
@Thushar R Yes, you are right. As a Delta table keeps a transaction log and maintains the version history of your data, it can easily roll back your transaction in case of a failure; i.e., only once a transaction is successfully committed, that is when the ...
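The version history is easy to inspect and act on; a short sketch with a hypothetical table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every committed write shows up as a numbered version in the history;
# failed operations never become a committed version.
spark.sql("DESCRIBE HISTORY main.db.orders").show()

# A committed change can also be rolled back explicitly.
spark.sql("RESTORE TABLE main.db.orders TO VERSION AS OF 12")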
Hi, I am trying to understand how to take a backup of Databricks Delta tables/workspace and how to restore in case of any failure, or suggest any alternative solution to revert if data is corrupted. Regards, Sanjay
Hi @Sanjay Jain, here are some of the ways:
Deep Clone: https://www.databricks.com/wp-content/uploads/notebooks/using-deep-clone-disaster-recovery-delta-lake-databricks.html
Repos for notebooks and code: https://docs.databricks.com/repos/index.html
ht...
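For the Deep Clone option, the backup itself is one SQL statement; a sketch with hypothetical database/table names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Creates a full, independent copy of the table's data and metadata.
spark.sql("""
    CREATE OR REPLACE TABLE backup_db.sales_orders
    DEEP CLONE prod_db.sales_orders
""")

Re-running the same statement later copies only incremental changes, which keeps periodic backup jobs cheap.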
Hello @Ovidiu Eremia, to filter which folders on S3 contain Delta tables, you can look for the specific files that are associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...
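A quick way to automate that check from a notebook (a sketch; the bucket path is a placeholder, and dbutils is assumed to be available as in any Databricks notebook):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# List candidate folders, then keep those with a valid _delta_log underneath.
candidates = [f.path for f in dbutils.fs.ls("s3://my-bucket/tables/") if f.isDir()]
delta_paths = [p for p in candidates if DeltaTable.isDeltaTable(spark, p)]
print(delta_paths)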
Hello, I think the main way of handling this is to specify the schema within the job through a schema file; the other way is to restart the job so it infers the new schema automatically.
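One way to implement the schema-file approach (a sketch; the paths and source location are hypothetical, and the file is assumed to hold the JSON produced by df.schema.json()):

import json

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

# Rebuild the schema from the saved JSON and pin the read to it.
with open("/dbfs/schemas/events.json") as f:
    schema = StructType.fromJson(json.load(f))

df = spark.read.schema(schema).json("s3://my-bucket/raw/events/")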
Hi all, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns with the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...
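Instead of row-by-row updates, a single MERGE applies all incoming values in one transaction; a sketch with hypothetical table and column names:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forName(spark, "main.db.records")    # hypothetical
incoming = spark.table("main.db.records_incoming")       # hypothetical

# Update only the columns that change, matched on the unique identifier.
(
    target.alias("t")
    .merge(incoming.alias("s"), "t.record_id = s.record_id")
    .whenMatchedUpdate(set={
        "status": "s.status",
        "updated_at": "s.updated_at",
    })
    .execute()
)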
I am using only an INSERT SQL query to insert the historical load, but the previous data is getting deleted. Tried with a Python query also, but the same issue persists. Reading the data from a GCP bucket (Parquet file), writing the data into a GCP bucket (Delta file). The deleted f...
I would like to connect Power BI to the Delta tables I have created, to use them for reporting. Is it possible to do this with Databricks, or do I have to write my data to some other serving layer?
If you want to read your Delta Lake table directly from storage without the need of having a Databricks cluster up and running, you can also use the official Power BI connector for Delta Lake: https://github.com/delta-io/connectors/tree/m...
I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch
Our code sample (Java):
Dataset<Row> sourceDf = sparkSession
...
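For comparison, the foreachBatch upsert pattern from that guide looks roughly like this in PySpark (a sketch; table names and the checkpoint path are placeholders):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def upsert_to_delta(micro_batch_df, batch_id):
    # Called once per micro-batch; inside, the merge is an ordinary batch MERGE.
    target = DeltaTable.forName(spark, "main.db.target")  # hypothetical
    (
        target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.key = s.key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream.table("main.db.source")              # hypothetical
    .writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/tmp/checkpoints/upsert")  # placeholder
    .start()
)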