cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

prabhjot
by New Contributor III
  • 2975 Views
  • 6 replies
  • 2 kudos

Resolved! Data lineage graph is not working

Hi Team,The issue - Data lineage graph is not working (16-feb, 17-18 Feb) –  I created the below tables but when I click the lineage graph not able to see the upstream or downstream table .... the + sign goes away after a few sec but not able to clic...

  • 2975 Views
  • 6 replies
  • 2 kudos
Latest Reply
Sikki
New Contributor III
  • 2 kudos

 Hi Kaniz,We're encountering the same issue where the lineage is not getting populated for a few tables. Could you let us know if a fix has been implemented in any runtime?"We are uaing job cluster 12.2.x .

  • 2 kudos
5 More Replies
Phani1
by Valued Contributor II
  • 6429 Views
  • 6 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, would like to implement data quality rules in Databricks, apart from DLT do we have any standard approach to perform/ apply data quality rules on bronze layer before further proceeding to silver and gold layer.

  • 6429 Views
  • 6 replies
  • 0 kudos
Latest Reply
joarobles
New Contributor III
  • 0 kudos

Looks nice! However I don't see Databricks support in the docs

  • 0 kudos
5 More Replies
Oliver_Angelil
by Valued Contributor II
  • 7761 Views
  • 9 replies
  • 6 kudos

Resolved! Confusion about Data storage: Data Asset within Databricks vs Hive Metastore vs Delta Lake vs Lakehouse vs DBFS vs Unity Catalogue vs Azure Blob

Hi thereIt seems there are many different ways to store / manage data in Databricks.This is the Data asset in Databricks: However data can also be stored (hyperlinks included to relevant pages):in a Lakehousein Delta Lakeon Azure Blob storagein the D...

Screenshot 2023-05-09 at 17.02.04
  • 7761 Views
  • 9 replies
  • 6 kudos
Latest Reply
Rahul_S
New Contributor II
  • 6 kudos

Informative.

  • 6 kudos
8 More Replies
dvmentalmadess
by Valued Contributor
  • 5441 Views
  • 10 replies
  • 2 kudos

Resolved! Data Explorer minimum permissions

What are the minimum permissions are required to search and view objects in Data Explorer? For example, does a user have to have `USE [SCHEMA|CATALOG]` to search or browse in the Data Explorer? Or can anyone with workspace access browse objects and, ...

  • 5441 Views
  • 10 replies
  • 2 kudos
Latest Reply
bearded_data
New Contributor III
  • 2 kudos

Circling back to this.  With one of the recent releases you can now GRANT BROWSE at the catalog level!  Hopefully they will be rolling this feature out at every object level (schemas and tables specifically).

  • 2 kudos
9 More Replies
User16790091296
by Contributor II
  • 824 Views
  • 1 replies
  • 0 kudos

How to efficiently read the data lake files' metadata?

I want to read the last modified datetime of the files in data lake in a databricks script. If I could read it efficiently as a column when reading data from data lake, it would be perfect.Thank you:)

  • 824 Views
  • 1 replies
  • 0 kudos
Latest Reply
KrunalMedapara
New Contributor II
  • 0 kudos

Efficiently reading data lake files involves:Choosing the Right Tools: Select tools optimized for data lake file formats (e.g., Parquet, ORC) and distributed computing frameworks (e.g., Apache Spark, Apache Flink).Partitioning and Indexing: Partition...

  • 0 kudos
deepu
by New Contributor II
  • 1014 Views
  • 1 replies
  • 1 kudos

performance issue with SIMBA ODBC using SSIS

i was trying to upload data into a table in hive_metastore using SSIS using SIMBA ODBC driver. The data set is huge (1.2 million records and 20 columns) , it is taking more than 40 mins to complete. is there an config change to improve the load time.

  • 1014 Views
  • 1 replies
  • 1 kudos
Latest Reply
NandiniN
Honored Contributor
  • 1 kudos

Looks like a slow data upload into a table in hive_metastore using SSIS and the SIMBA ODBC driver. This could be due to a variety of factors, including the size of your dataset and the configuration of your system. One potential solution could be to ...

  • 1 kudos
User16826987838
by Contributor
  • 1738 Views
  • 2 replies
  • 0 kudos

Convert pdf's is into structured data

Is there anything on Databricks to help read PDF (payment invoices and receipts for example) and convert it to structured data?

  • 1738 Views
  • 2 replies
  • 0 kudos
Latest Reply
SoniaFoster
New Contributor II
  • 0 kudos

Thanks! Converting PDF format is sometimes a difficult task as not all converters provide accuracy. I want to share with you one interesting tool I recently discovered that can make your work even more efficient. I recently came across an amazing onl...

  • 0 kudos
1 More Replies
elgeo
by Valued Contributor II
  • 13703 Views
  • 6 replies
  • 4 kudos

Resolved! Data type length enforcement

Hello. Is there a way to enforce the length of a column in SQL? For example that a column has to be exactly 18 characters? Thank you!

  • 13703 Views
  • 6 replies
  • 4 kudos
Latest Reply
databricks31
New Contributor II
  • 4 kudos

we are facing similar issues while write into adls location delta format, after that we created on top delta location unity catalog tables. below format of data type length should be possible to change spark sql supported ?Azure SQL Spark            ...

  • 4 kudos
5 More Replies
ChrisS
by New Contributor III
  • 4033 Views
  • 7 replies
  • 8 kudos

How to get data scraped from the web into your data storage

I learning data bricks for the first time following the book that is copywrited in 2020 so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using shell script but ...

  • 4033 Views
  • 7 replies
  • 8 kudos
Latest Reply
CharlesReily
New Contributor III
  • 8 kudos

In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ. This represents the best b2b data enrichment services in Databricks.In your notebook or script, y...

  • 8 kudos
6 More Replies
mriccardi
by New Contributor II
  • 2709 Views
  • 4 replies
  • 1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone!We are currently facing an issue with a stream that is not updating new data since the 20 of July.We've validated and bronze table has data that silver doesn't have.Also seeing the logs the silver stream is running but writing 0 files....

  • 2709 Views
  • 4 replies
  • 1 kudos
Latest Reply
mriccardi
New Contributor II
  • 1 kudos

Also the trigger is configured to run once, but when we start the job it never ends, it keeps in an endless loop.

  • 1 kudos
3 More Replies
amruth
by New Contributor
  • 2335 Views
  • 4 replies
  • 0 kudos

How do i retrieve timestamp data from history in databricks sql not using DELTA table,its data is coming from SAP

I am not using delta tables my data is from SAP ..how do i retrieve timestamp(history) dynamically from SAP table using databricks SQL

  • 2335 Views
  • 4 replies
  • 0 kudos
Latest Reply
Dribka
New Contributor III
  • 0 kudos

@amruth If you're working with data from SAP in Databricks and want to retrieve timestamps dynamically from a SAP table, you can utilize Databricks SQL to achieve this. You'll need to identify the specific SAP table that contains the timestamp or his...

  • 0 kudos
3 More Replies
Anonymous
by Not applicable
  • 32534 Views
  • 6 replies
  • 10 kudos

How to connect and extract data from sharepoint using Databricks (AWS) ?

We are using Databricks (on AWS). We need to connect to SharePoint and extract & load data to Databricks Delta table. Any possible solution on this ?

  • 32534 Views
  • 6 replies
  • 10 kudos
Latest Reply
yliu
New Contributor III
  • 10 kudos

Wondering the same.. Can we use Sharepoint REST API to download the file and save to dbfs/external location and read it? 

  • 10 kudos
5 More Replies
Swostiman
by New Contributor II
  • 5066 Views
  • 5 replies
  • 4 kudos

Consuming data from databricks[Hive metastore] sql endpoint using pyspark

I was trying to read some delta data from databricks[Hive metastore] sql endpoint using pyspark, but while doing so I encountered that all the values of the table after fetching are same as the column name.Even when I try to just show the data it giv...

  • 5066 Views
  • 5 replies
  • 4 kudos
Latest Reply
sucan
New Contributor II
  • 4 kudos

Encountered the same issue and downgrading to 2.6.22 helped me resolve this issue.

  • 4 kudos
4 More Replies
Binesh
by New Contributor II
  • 7443 Views
  • 2 replies
  • 0 kudos

Databricks Logs some error messages while trying to read data using databricks-jdbc dependency

I have tried to read data from Databricks using the following java code.String TOKEN = "token..."; String url = "url...";   Properties properties = new Properties(); properties.setProperty("user", "token"); properties.setProperty("PWD", TOKEN);   Con...

Logger Errors
  • 7443 Views
  • 2 replies
  • 0 kudos
Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@Binesh J​ - The issue could be due to the data type of the column is not compatible with getString() method in line#17. use getObject() method to retrieve the value as a generic value and then convert to string.

  • 0 kudos
1 More Replies
Labels