Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hey, as previously stated, you could drop the duplicates from the columns that contain them (code for this is easy to find online). I have had this problem myself, and it came up when creating a temporary view from a dataframe; the dataframe ...
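A minimal PySpark sketch of that fix, assuming a hypothetical DataFrame with duplicate rows (the column names here are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame with duplicate rows
df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "b")],
    ["id", "name"],
)

# Drop exact duplicate rows, or deduplicate on a subset of columns
deduped_all = df.dropDuplicates()
deduped_by_id = df.dropDuplicates(["id"])

# Register the cleaned DataFrame as the temporary view
deduped_by_id.createOrReplaceTempView("my_view")
```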
Hi @Prashant Joshi, hope everything is going great. Just wanted to check in on whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
Hi All, I have 22 Postgres tables and I need to implement SCD Type 2 and create an Azure Databricks pipeline. However, my project team doesn't want to use the Delta tables concept. Has anyone implemented this? Below is how I planned to do it: try: df_src = ...
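For what it's worth, a minimal sketch of SCD Type 2 over plain Parquet (no Delta), with hypothetical paths, keys, and columns; handling for brand-new keys and deletes is omitted, and in practice you would loop this over all 22 tables:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical paths and business key
DIM_PATH = "dbfs:/mnt/silver/customer_dim"
STAGE_PATH = DIM_PATH + "_stage"
KEYS = ["customer_id"]

df_src = spark.table("bronze.customer_snapshot")  # stand-in for the JDBC read
dim = spark.read.parquet(DIM_PATH)

hist = dim.filter(~F.col("is_current"))
curr = dim.filter(F.col("is_current"))

# Compare a hash of the tracked attributes to spot changed rows
attr_cols = [c for c in df_src.columns if c not in KEYS]
hash_expr = F.sha2(F.concat_ws("||", *attr_cols), 256)
changed_keys = (df_src.withColumn("h", hash_expr).alias("s")
                .join(curr.withColumn("h", hash_expr).alias("c"), KEYS)
                .where(F.col("s.h") != F.col("c.h"))
                .select(*KEYS))

today = F.current_date()

# Close out superseded versions, keep untouched ones, open new versions
expired = (curr.join(changed_keys, KEYS, "left_semi")
               .withColumn("end_date", today)
               .withColumn("is_current", F.lit(False)))
unchanged = curr.join(changed_keys, KEYS, "left_anti")
opened = (df_src.join(changed_keys, KEYS, "left_semi")
                .withColumn("start_date", today)
                .withColumn("end_date", F.lit(None).cast("date"))
                .withColumn("is_current", F.lit(True)))

# Write to a staging path and swap afterwards; overwriting DIM_PATH directly
# would delete files Spark is still reading
(hist.unionByName(expired).unionByName(unchanged).unionByName(opened)
     .write.mode("overwrite").parquet(STAGE_PATH))
```

Without Delta's MERGE, every run rewrites the whole dimension, which is one reason this is usually done on Delta tables.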
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: The query operator `UpdateCommandEdge` contains one or more unsupported expression types Aggregate, Window or Generate. Invalid expres...
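This error typically appears when an UPDATE's SET or WHERE clause embeds an aggregate, window, or generator expression, which Delta's update command can't evaluate. A common workaround (table and column names here are hypothetical) is to pre-compute the expression and apply it with MERGE:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Pre-compute the aggregate into a view instead of embedding it in UPDATE
spark.sql("""
    CREATE OR REPLACE TEMP VIEW order_totals AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
""")

# Apply the pre-computed values with MERGE, which accepts a joined source
spark.sql("""
    MERGE INTO customers AS c
    USING order_totals AS t
    ON c.customer_id = t.customer_id
    WHEN MATCHED THEN UPDATE SET c.lifetime_value = t.total_amount
""")
```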
Hi @Pradeep Chauhan, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...
Hi, I am trying to see if it is possible to set up a direct connection from a DLT pipeline to a table in Databricks SQL by configuring the Target Schema, with poc being a schema location like "dbfs:/***/***/***/poc.db". The error message was just a...
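In the classic (non-Unity-Catalog) DLT settings, target takes a schema name while storage takes the path, so passing a DBFS location as the target would fail. A hedged sketch of the pipeline settings JSON, with hypothetical name and paths:

```json
{
  "name": "poc_pipeline",
  "storage": "dbfs:/pipelines/poc",
  "target": "poc",
  "libraries": [
    { "notebook": { "path": "/Repos/me/dlt_poc_notebook" } }
  ]
}
```

With target set this way, the pipeline's tables should appear under hive_metastore.poc in Databricks SQL.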
Whenever you store a Delta table in the Hive metastore, the table becomes available in the Databricks SQL workspace (Data Explorer) under the hive_metastore catalog.
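For example, a minimal sketch (the table name is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(5).withColumnRenamed("id", "value")  # toy data

# Registering the Delta table in the Hive metastore makes it queryable
# from Databricks SQL as hive_metastore.default.demo_table
df.write.format("delta").mode("overwrite").saveAsTable("default.demo_table")
```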
Hi, I am using Databricks and want to upgrade to Databricks Runtime 11.3 LTS, which now uses Spark 3.3. Current system environment: Operating System: Ubuntu 20.04.4 LTS; Java: Zulu 8.56.0.21-CA-linux64; Python: 3.8.10; Delta Lake: 1.1.0. Target system ...
I am trying to do a streaming merge between Delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch. Our code sample (Java): Dataset<Row> sourceDf = sparkSession
...
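The same foreachBatch upsert pattern, sketched in PySpark (the question uses Java); the paths and the key column id are hypothetical:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forPath(spark, "dbfs:/mnt/silver/target")  # hypothetical

def upsert_to_delta(micro_batch_df, batch_id):
    # Deduplicate within the micro-batch so MERGE sees one row per key
    latest = micro_batch_df.dropDuplicates(["id"])
    (target.alias("t")
           .merge(latest.alias("s"), "t.id = s.id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

(spark.readStream.format("delta")
      .load("dbfs:/mnt/bronze/source")   # hypothetical source table path
      .writeStream
      .foreachBatch(upsert_to_delta)
      .option("checkpointLocation", "dbfs:/mnt/checkpoints/target_merge")
      .outputMode("update")
      .start())
```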
Hi Team, as I'm performing the Databricks workspace migration, I'm facing the below issue during metastore migration. We found differences in the metastore table count between the legacy and target workspaces, so we checked the error logs. After going through Failed...
We are doing a DBFS migration. We have a folder 'user' in the root DBFS holding 5.8 TB of data in the legacy workspace. We performed an AWS CLI sync/cp from legacy to target, and again from the target bucket to the target DBFS. While implemen...
Thanks for the quick response. Regarding the suggested AWS DataSync approach, we have tried DataSync in multiple ways; it creates folders in the S3 bucket itself, not on DBFS, whereas our task is to copy from the bucket to DBFS. It seems that it only supports ...
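If the copy has to land on DBFS rather than in the bucket, one option is to run it from a notebook, where dbutils.fs can read S3 paths directly; a minimal sketch with hypothetical paths:

```python
# Run inside a Databricks notebook; dbutils is provided by the runtime.
# The cluster's instance profile must grant read access to the bucket.
dbutils.fs.cp("s3a://legacy-bucket/user/", "dbfs:/user/", recurse=True)

# Spot-check the result
display(dbutils.fs.ls("dbfs:/user/"))
```

Note that dbutils.fs.cp is not distributed, so a 5.8 TB copy this way will be slow; it does, however, land the files on DBFS rather than in the bucket.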
Hey there, Community! I have a client that will produce a CSV file daily that needs to be moved from Bronze -> Silver. Unfortunately, this source file will always be a full set of data, not incremental. I was thinking of using Auto Loader/cloudFil...
I "up voted'" all of @werners suggestions b/c they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!). However, turns out I'm going to end up getting incremental data afterall :). So now the flow wi...
I have a task to transform a dataframe: collect all the columns of a row and embed them into a JSON string as a new column. Source DF: ... Target DF: ...
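A minimal sketch of one way to do this in PySpark (the source columns here are hypothetical): pack every column of the row into a struct and serialize it with to_json.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical source DataFrame
df = spark.createDataFrame([(1, "alice", 3.5), (2, "bob", 4.0)],
                           ["id", "name", "score"])

# Embed the whole row as one JSON string column
result = df.withColumn("json_payload", F.to_json(F.struct(*df.columns)))
result.show(truncate=False)
```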