Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 38901 Views
  • 7 replies
  • 12 kudos

How to connect to and extract data from SharePoint using Databricks (AWS)?

We are using Databricks on AWS. We need to connect to SharePoint, extract data, and load it into a Databricks Delta table. Is there a possible solution for this?

Latest Reply
yliu
New Contributor III
  • 12 kudos

Wondering the same. Can we use the SharePoint REST API to download the file, save it to DBFS or an external location, and read it from there?
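
A rough sketch of what that could look like, using only the Python standard library. It assumes the SharePoint REST endpoint `GetFileByServerRelativeUrl(...)/$value` for fetching file contents and an OAuth bearer token that has already been obtained some other way (token acquisition is not shown). The site URL, file path, and destination used below are hypothetical placeholders.

```python
import shutil
import urllib.request
from urllib.parse import quote


def sharepoint_file_url(site_url, server_relative_path):
    """Build the REST URL that returns the raw bytes of one SharePoint file."""
    # Percent-encode spaces and quotes in the path, but keep the slashes.
    encoded_path = quote(server_relative_path, safe="/")
    base = site_url.rstrip("/")
    return f"{base}/_api/web/GetFileByServerRelativeUrl('{encoded_path}')/$value"


def download_file(url, access_token, destination):
    """Stream the file at `url` to `destination` using a bearer token."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Hypothetical site and file, for illustration only.
url = sharepoint_file_url(
    "https://contoso.sharepoint.com/sites/data/",
    "/sites/data/Shared Documents/report.csv",
)
```

Calling `download_file(url, token, destination)` with a path the cluster can read would then let the file be loaded with the usual Spark readers and written to a Delta table. Where that destination lives (DBFS, a volume, or an external location) depends on the workspace setup.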

6 More Replies
bobbysidhartha
by New Contributor
  • 15246 Views
  • 2 replies
  • 0 kudos

How to merge data in parallel into partitions of a Databricks Delta table using PySpark/Spark streaming?

I have a PySpark streaming pipeline which reads data from a Kafka topic; the data goes through various transformations and finally gets merged into a Databricks Delta table. In the beginning we were loading data into the Delta table by using the merge ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@bobbysidhartha: When merging data into a partitioned Delta table in parallel, it is important to ensure that each job only accesses and modifies the files in its own partition to avoid concurrency issues. One way to achieve this is to use partition...

1 More Replies
amruth
by New Contributor
  • 2785 Views
  • 5 replies
  • 0 kudos

How do I retrieve timestamp data from history in Databricks SQL without using a Delta table? The data is coming from SAP

I am not using Delta tables; my data is from SAP. How do I retrieve the timestamp (history) dynamically from an SAP table using Databricks SQL?

Latest Reply
felixdmeshio
New Contributor II
  • 0 kudos

Hello, if you’re trying to bring timestamp data or any other SAP table from SAP (SAP HANA) into Databricks, our SAP HANA to Databricks Connector can help streamline this process. The connector enables you to extract data directly from SAP HANA tables ...

4 More Replies
Kanna1706
by New Contributor III
  • 2801 Views
  • 3 replies
  • 0 kudos

DBFS option

I can't find the DBFS option in my free Databricks Community Edition when I try to see the location of the table.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

It's fixed. You can continue to use Upload.

2 More Replies
prabhjot
by New Contributor III
  • 3452 Views
  • 4 replies
  • 2 kudos

Resolved! Data lineage graph is not working

Hi Team, the issue: the data lineage graph is not working (16 Feb, 17-18 Feb). I created the below tables, but when I click the lineage graph I am not able to see the upstream or downstream table .... the + sign goes away after a few sec but not able to clic...

Latest Reply
Sikki
New Contributor III
  • 2 kudos

Hi Kaniz, we're encountering the same issue where the lineage is not getting populated for a few tables. Could you let us know if a fix has been implemented in any runtime? We are using job cluster 12.2.x.

3 More Replies
Phani1
by Valued Contributor II
  • 8390 Views
  • 5 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach to apply data quality rules on the bronze layer before proceeding to the silver and gold layers?

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Looks nice! However I don't see Databricks support in the docs

4 More Replies
Oliver_Angelil
by Valued Contributor II
  • 9094 Views
  • 9 replies
  • 6 kudos

Resolved! Confusion about Data storage: Data Asset within Databricks vs Hive Metastore vs Delta Lake vs Lakehouse vs DBFS vs Unity Catalog vs Azure Blob

Hi there. It seems there are many different ways to store / manage data in Databricks. This is the Data asset in Databricks: However, data can also be stored (hyperlinks included to relevant pages): in a Lakehouse, in Delta Lake, on Azure Blob storage, in the D...

Latest Reply
Rahul_S
New Contributor II
  • 6 kudos

Informative.

8 More Replies
dvmentalmadess
by Valued Contributor
  • 6373 Views
  • 10 replies
  • 2 kudos

Resolved! Data Explorer minimum permissions

What are the minimum permissions required to search and view objects in Data Explorer? For example, does a user have to have `USE [SCHEMA|CATALOG]` to search or browse in the Data Explorer? Or can anyone with workspace access browse objects and, ...

Latest Reply
bearded_data
New Contributor III
  • 2 kudos

Circling back to this.  With one of the recent releases you can now GRANT BROWSE at the catalog level!  Hopefully they will be rolling this feature out at every object level (schemas and tables specifically).

9 More Replies
User16790091296
by Contributor II
  • 985 Views
  • 1 replies
  • 0 kudos

How to efficiently read the data lake files' metadata?

I want to read the last modified datetime of the files in the data lake in a Databricks script. If I could read it efficiently as a column when reading data from the data lake, it would be perfect. Thank you :)
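
To make the goal concrete, here is a minimal sketch that pairs each file path with its last-modified time, using a local directory as a stand-in for the data lake. The function name `list_files_with_mtime` is made up for illustration. On Databricks, file-based Spark sources should also expose a hidden `_metadata` column (with a `file_modification_time` field) that can be selected while reading; that variant is not shown here because it needs a running Spark session.

```python
import os
from datetime import datetime, timezone


def list_files_with_mtime(root):
    """Return sorted (path, last-modified UTC datetime) pairs for files under root."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_mtime is seconds since the epoch; convert to an aware datetime.
            modified = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
            rows.append((path, modified))
    return sorted(rows)
```

The resulting pairs could be turned into a DataFrame and joined to the data on the file path, although selecting the metadata column during the read avoids a second listing pass.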

Latest Reply
KrunalMedapara
New Contributor II
  • 0 kudos

Efficiently reading data lake files involves: Choosing the Right Tools: Select tools optimized for data lake file formats (e.g., Parquet, ORC) and distributed computing frameworks (e.g., Apache Spark, Apache Flink). Partitioning and Indexing: Partition...

deepu
by New Contributor II
  • 1163 Views
  • 1 replies
  • 1 kudos

Performance issue with Simba ODBC using SSIS

I was trying to upload data into a table in hive_metastore using SSIS with the Simba ODBC driver. The data set is huge (1.2 million records and 20 columns), and it is taking more than 40 mins to complete. Is there a config change to improve the load time?

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Looks like a slow data upload into a table in hive_metastore using SSIS and the SIMBA ODBC driver. This could be due to a variety of factors, including the size of your dataset and the configuration of your system. One potential solution could be to ...

User16826987838
by Contributor
  • 1971 Views
  • 2 replies
  • 0 kudos

Convert PDFs into structured data

Is there anything on Databricks to help read PDFs (payment invoices and receipts, for example) and convert them to structured data?

Latest Reply
SoniaFoster
New Contributor II
  • 0 kudos

Thanks! Converting PDF format is sometimes a difficult task as not all converters provide accuracy. I want to share with you one interesting tool I recently discovered that can make your work even more efficient. I recently came across an amazing onl...

1 More Replies
elgeo
by Valued Contributor II
  • 19465 Views
  • 3 replies
  • 2 kudos

Data type length enforcement

Hello. Is there a way to enforce the length of a column in SQL? For example that a column has to be exactly 18 characters? Thank you!
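
For context, one option on Delta tables is a CHECK constraint on the column's length, since string types in Spark SQL do not enforce an exact length on their own. The sketch below only builds the `ALTER TABLE ... ADD CONSTRAINT ... CHECK (...)` statement; the helper name, the table `main.sales.orders`, and the column `order_id` are hypothetical.

```python
def length_check_constraint_sql(table, column, length):
    """Build a Delta CHECK constraint requiring `column` to be exactly `length` characters."""
    if not isinstance(length, int) or length <= 0:
        raise ValueError("length must be a positive integer")
    constraint_name = f"{column}_length_{length}"
    return (
        f"ALTER TABLE {table} "
        f"ADD CONSTRAINT {constraint_name} "
        f"CHECK (length({column}) = {length})"
    )


# Hypothetical table and column, for illustration only.
statement = length_check_constraint_sql("main.sales.orders", "order_id", 18)
```

In a notebook the statement would be passed to `spark.sql(statement)`. Rows already in the table have to satisfy the condition for the constraint to be added, and later writes that violate it fail.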

Latest Reply
databricks31
New Contributor II
  • 2 kudos

We are facing similar issues when writing to an ADLS location in Delta format, after which we created Unity Catalog tables on top of the Delta location. Is it possible to change the data type lengths below to ones supported by Spark SQL? Azure SQL Spark            ...

2 More Replies
ChrisS
by New Contributor III
  • 4926 Views
  • 7 replies
  • 8 kudos

How to get data scraped from the web into your data storage

I am learning Databricks for the first time, following a book that was copyrighted in 2020, so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using shell script but ...

Latest Reply
CharlesReily
New Contributor III
  • 8 kudos

In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ. This represents the best b2b data enrichment services in Databricks. In your notebook or script, y...

6 More Replies
mriccardi
by New Contributor II
  • 3146 Views
  • 4 replies
  • 1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone! We are currently facing an issue with a stream that has not picked up new data since the 20th of July. We've validated that the bronze table has data that silver doesn't have. Also, looking at the logs, the silver stream is running but writing 0 files....

Latest Reply
mriccardi
New Contributor II
  • 1 kudos

Also, the trigger is configured to run once, but when we start the job it never ends; it stays in an endless loop.

3 More Replies