Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I'm reading data into a dataframe with df = spark.read.json("s3://somepath/"). I've tried first creating a Delta table using the DeltaTable API with:
DeltaTable.createIfNotExists(spark)\
    .location(target_path)\
    .addColumns(df.sche...
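For reference, a minimal runnable sketch of that pattern, assuming target_path points at the intended Delta location and that the truncated call continues as addColumns(df.schema):

from delta.tables import DeltaTable

# Create the Delta table from the dataframe's schema if it does not exist yet,
# then append the data to the same location.
df = spark.read.json("s3://somepath/")

(DeltaTable.createIfNotExists(spark)
    .location(target_path)
    .addColumns(df.schema)
    .execute())

df.write.format("delta").mode("append").save(target_path)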
I have a PySpark streaming pipeline which reads data from a Kafka topic; the data goes through various transformations and is finally merged into a Databricks Delta table. In the beginning we were loading data into the Delta table by using the merge ...
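The usual shape of such a pipeline is a foreachBatch upsert. A hedged sketch, assuming a target table named events_target keyed on event_id and a placeholder Kafka broker (none of these names are from the post):

from delta.tables import DeltaTable

def upsert_to_delta(micro_batch_df, batch_id):
    # Merge each micro-batch into the target Delta table on its key.
    target = DeltaTable.forName(micro_batch_df.sparkSession, "events_target")
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    # ... transformations would go here ...
    .writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start())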
@bobbysidhartha: When merging data into a partitioned Delta table in parallel, it is important to ensure that each job only accesses and modifies the files in its own partition to avoid concurrency issues. One way to achieve this is to use partition...
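Concretely, the well-known pattern is to pin the partition column inside the merge condition so Delta can prune files and concurrent merges on other partitions do not conflict. A hedged sketch, assuming a table sales_partitioned partitioned by region (all names are illustrative):

from pyspark.sql import functions as F
from delta.tables import DeltaTable

region = "emea"  # the single partition this job is responsible for

# Pre-filter the source so this job only carries rows for its own partition.
updates_df = spark.read.table("staging.sales_updates").where(F.col("region") == region)

target = DeltaTable.forName(spark, "sales_partitioned")
(target.alias("t")
    .merge(
        updates_df.alias("s"),
        # Fixing the partition column in the condition confines the merge to
        # one partition's files, avoiding ConcurrentAppendException when
        # other jobs merge into other regions at the same time.
        f"t.region = '{region}' AND t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())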
Hello! If you're trying to bring timestamp data or any other SAP table from SAP (SAP HANA) into Databricks, our SAP HANA to Databricks Connector can help streamline this process. The connector enables you to extract data directly from SAP HANA tables ...
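Absent a dedicated connector, a generic JDBC read also works. A hedged sketch, assuming the SAP HANA JDBC driver (ngdbc.jar) is attached to the cluster; host, port, table, and secret-scope names are placeholders:

# Read a SAP HANA table over JDBC; credentials come from a secret scope.
hana_df = (spark.read.format("jdbc")
    .option("url", "jdbc:sap://hana-host:39015")
    .option("driver", "com.sap.db.jdbc.Driver")
    .option("dbtable", "MYSCHEMA.SALES_ORDERS")
    .option("user", dbutils.secrets.get("my-scope", "hana-user"))
    .option("password", dbutils.secrets.get("my-scope", "hana-password"))
    .load())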
Hi Team, the issue: the data lineage graph is not working (16 Feb, 17-18 Feb). I created the tables below, but when I click the lineage graph I am not able to see the upstream or downstream tables. The + sign goes away after a few seconds, and I am not able to clic...
Hi Kaniz, we're encountering the same issue where the lineage is not getting populated for a few tables. Could you let us know if a fix has been implemented in any runtime? We are using a 12.2.x job cluster.
Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach to apply data quality rules on the bronze layer before proceeding further to the silver and gold layers?
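One common non-DLT approach is a filter-and-quarantine step in plain PySpark, optionally backed by Delta CHECK constraints. A hedged sketch with illustrative table names and rules:

from pyspark.sql import functions as F

bronze = spark.read.table("bronze.orders")

# Illustrative rules: key must be present and amounts must be positive.
rules = F.col("order_id").isNotNull() & (F.col("amount") > 0)

# Rows passing the rules continue to silver; failures are quarantined
# for inspection instead of being silently dropped.
bronze.filter(rules).write.mode("append").saveAsTable("silver.orders")
bronze.filter(~rules).write.mode("append").saveAsTable("bronze.orders_quarantine")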
Hi there, it seems there are many different ways to store/manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to the relevant pages): in a Lakehouse; in Delta Lake; on Azure Blob storage; in the D...
What minimum permissions are required to search and view objects in Data Explorer? For example, does a user have to have `USE [SCHEMA|CATALOG]` to search or browse in Data Explorer? Or can anyone with workspace access browse objects and, ...
Circling back to this. With one of the recent releases you can now GRANT BROWSE at the catalog level! Hopefully they will be rolling this feature out at every object level (schemas and tables specifically).
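For anyone landing here, the catalog-level grant looks like this (a hedged sketch; catalog and group names are placeholders):

# Allow a group to see object metadata without querying the data itself.
spark.sql("GRANT BROWSE ON CATALOG main TO `data-consumers`")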
I want to read the last modified datetime of the files in the data lake in a Databricks script. If I could read it efficiently as a column when reading data from the data lake, that would be perfect. Thank you :)
Efficiently reading data lake files involves:
Choosing the right tools: select tools optimized for data lake file formats (e.g., Parquet, ORC) and distributed computing frameworks (e.g., Apache Spark, Apache Flink).
Partitioning and indexing: partition...
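To the original question specifically, Spark exposes per-file details through the hidden _metadata column, so the modification time can come back as a regular column. A hedged sketch (the path is a placeholder; available on recent Spark/Databricks runtimes):

from pyspark.sql import functions as F

# _metadata is hidden, so its fields must be selected explicitly.
df = (spark.read.format("parquet")
    .load("abfss://container@account.dfs.core.windows.net/data/")
    .select(
        "*",
        F.col("_metadata.file_path").alias("source_file"),
        F.col("_metadata.file_modification_time").alias("file_mtime")))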
I was trying to upload data into a table in hive_metastore from SSIS using the Simba ODBC driver. The data set is large (1.2 million records and 20 columns) and it is taking more than 40 minutes to complete. Is there a config change to improve the load time?
This looks like a slow data upload into a table in hive_metastore using SSIS and the Simba ODBC driver. It could be due to a variety of factors, including the size of your dataset and the configuration of your system.
One potential solution could be to ...
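A common alternative to row-by-row ODBC inserts is to stage the export as files in cloud storage and bulk-load them with COPY INTO. A hedged sketch with placeholder paths and table names:

# Bulk-load staged Parquet files; typically far faster than ODBC inserts.
spark.sql("""
    COPY INTO hive_metastore.default.target_table
    FROM 'abfss://staging@myaccount.dfs.core.windows.net/exports/'
    FILEFORMAT = PARQUET
    COPY_OPTIONS ('mergeSchema' = 'true')
""")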
We are facing similar issues while writing to an ADLS location in Delta format; afterwards we created Unity Catalog tables on top of the Delta location. Should it be possible to change the data type lengths in the format below, and is this supported in Spark SQL? Azure SQL Spark ...
I am learning Databricks for the first time, following a book copyrighted in 2020, so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using a shell script, but ...
In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ as a cluster library. In your notebook or script, y...
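If the Python route is preferred, the pydeequ wrapper exposes the same checks. A hedged sketch, assuming the Deequ jar is attached to the cluster and pydeequ is pip-installed (table and column names are illustrative):

from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

df = spark.read.table("bronze.orders")

# Fail the verification if the key column is incomplete or non-unique.
check = Check(spark, CheckLevel.Error, "bronze order checks")
result = (VerificationSuite(spark)
    .onData(df)
    .addCheck(check.isComplete("order_id").isUnique("order_id"))
    .run())

VerificationResult.checkResultsAsDataFrame(spark, result).show()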