Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I'm reading data into a dataframe with df = spark.read.json("s3://somepath/"). I've tried first creating a Delta table using the DeltaTable API with:
DeltaTable.createIfNotExists(spark)\
    .location(target_path)\
    .addColumns(df.sche...
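For reference, a minimal runnable sketch of that pattern, assuming target_path points at the intended Delta location and that the truncated call continues as addColumns(df.schema):

from delta.tables import DeltaTable

# Create the Delta table from the dataframe's schema if it does not exist yet,
# then append the data to the same location.
df = spark.read.json("s3://somepath/")

(DeltaTable.createIfNotExists(spark)
    .location(target_path)
    .addColumns(df.schema)
    .execute())

df.write.format("delta").mode("append").save(target_path)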
I have a PySpark streaming pipeline which reads data from a Kafka topic; the data goes through various transformations and is finally merged into a Databricks Delta table. In the beginning we were loading data into the Delta table by using the merge ...
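The usual shape of such a pipeline is a foreachBatch upsert. A hedged sketch, assuming a target table named events_target keyed on event_id and a placeholder Kafka broker (none of these names are from the post):

from delta.tables import DeltaTable

def upsert_to_delta(micro_batch_df, batch_id):
    # Merge each micro-batch into the target Delta table on its key.
    target = DeltaTable.forName(micro_batch_df.sparkSession, "events_target")
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    # ... transformations would go here ...
    .writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start())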
@bobbysidhartha: When merging data into a partitioned Delta table in parallel, it is important to ensure that each job only accesses and modifies the files in its own partition to avoid concurrency issues. One way to achieve this is to use partition...
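Concretely, the well-known pattern is to pin the partition column inside the merge condition so Delta can prune files and concurrent merges on other partitions do not conflict. A hedged sketch, assuming a table sales_partitioned partitioned by region (all names are illustrative):

from pyspark.sql import functions as F
from delta.tables import DeltaTable

region = "emea"  # the single partition this job is responsible for

# Pre-filter the source so this job only carries rows for its own partition.
updates_df = spark.read.table("staging.sales_updates").where(F.col("region") == region)

target = DeltaTable.forName(spark, "sales_partitioned")
(target.alias("t")
    .merge(
        updates_df.alias("s"),
        # Fixing the partition column in the condition confines the merge to
        # one partition's files, avoiding ConcurrentAppendException when
        # other jobs merge into other regions at the same time.
        f"t.region = '{region}' AND t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())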
Hello! If you're trying to bring timestamp data or any other SAP table from SAP (SAP HANA) into Databricks, our SAP HANA to Databricks Connector can help streamline this process. The connector enables you to extract data directly from SAP HANA tables ...
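Absent a dedicated connector, a generic JDBC read also works. A hedged sketch, assuming the SAP HANA JDBC driver (ngdbc.jar) is attached to the cluster; host, port, table, and secret-scope names are placeholders:

# Read a SAP HANA table over JDBC; credentials come from a secret scope.
hana_df = (spark.read.format("jdbc")
    .option("url", "jdbc:sap://hana-host:39015")
    .option("driver", "com.sap.db.jdbc.Driver")
    .option("dbtable", "MYSCHEMA.SALES_ORDERS")
    .option("user", dbutils.secrets.get("my-scope", "hana-user"))
    .option("password", dbutils.secrets.get("my-scope", "hana-password"))
    .load())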
Hi Team, the issue: the data lineage graph is not working (16 Feb, 17-18 Feb). I created the tables below, but when I click the lineage graph I am not able to see the upstream or downstream tables. The + sign goes away after a few seconds, and I am not able to clic...
Hi Kaniz, we're encountering the same issue where the lineage is not getting populated for a few tables. Could you let us know if a fix has been implemented in any runtime? We are using a 12.2.x job cluster.
Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach to apply data quality rules on the bronze layer before proceeding further to the silver and gold layers?
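One common non-DLT approach is a filter-and-quarantine step in plain PySpark, optionally backed by Delta CHECK constraints. A hedged sketch with illustrative table names and rules:

from pyspark.sql import functions as F

bronze = spark.read.table("bronze.orders")

# Illustrative rules: key must be present and amounts must be positive.
rules = F.col("order_id").isNotNull() & (F.col("amount") > 0)

# Rows passing the rules continue to silver; failures are quarantined
# for inspection instead of being silently dropped.
bronze.filter(rules).write.mode("append").saveAsTable("silver.orders")
bronze.filter(~rules).write.mode("append").saveAsTable("bronze.orders_quarantine")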
Hi there, it seems there are many different ways to store/manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to the relevant pages): in a Lakehouse; in Delta Lake; on Azure Blob storage; in the D...
What minimum permissions are required to search and view objects in Data Explorer? For example, does a user have to have `USE [SCHEMA|CATALOG]` to search or browse in Data Explorer? Or can anyone with workspace access browse objects and, ...
Circling back to this. With one of the recent releases you can now GRANT BROWSE at the catalog level! Hopefully they will be rolling this feature out at every object level (schemas and tables specifically).
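For anyone landing here, the catalog-level grant looks like this (a hedged sketch; catalog and group names are placeholders):

# Allow a group to see object metadata without querying the data itself.
spark.sql("GRANT BROWSE ON CATALOG main TO `data-consumers`")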
I want to read the last modified datetime of the files in the data lake in a Databricks script. If I could read it efficiently as a column when reading data from the data lake, that would be perfect. Thank you :)
Efficiently reading data lake files involves:
Choosing the right tools: select tools optimized for data lake file formats (e.g., Parquet, ORC) and distributed computing frameworks (e.g., Apache Spark, Apache Flink).
Partitioning and indexing: partition...
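To the original question specifically, Spark exposes per-file details through the hidden _metadata column, so the modification time can come back as a regular column. A hedged sketch (the path is a placeholder; available on recent Spark/Databricks runtimes):

from pyspark.sql import functions as F

# _metadata is hidden, so its fields must be selected explicitly.
df = (spark.read.format("parquet")
    .load("abfss://container@account.dfs.core.windows.net/data/")
    .select(
        "*",
        F.col("_metadata.file_path").alias("source_file"),
        F.col("_metadata.file_modification_time").alias("file_mtime")))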
I was trying to upload data into a table in hive_metastore from SSIS using the Simba ODBC driver. The data set is large (1.2 million records and 20 columns) and it is taking more than 40 minutes to complete. Is there a config change to improve the load time?
This looks like a slow data upload into a table in hive_metastore using SSIS and the Simba ODBC driver. It could be due to a variety of factors, including the size of your dataset and the configuration of your system.
One potential solution could be to ...
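A common alternative to row-by-row ODBC inserts is to stage the export as files in cloud storage and bulk-load them with COPY INTO. A hedged sketch with placeholder paths and table names:

# Bulk-load staged Parquet files; typically far faster than ODBC inserts.
spark.sql("""
    COPY INTO hive_metastore.default.target_table
    FROM 'abfss://staging@myaccount.dfs.core.windows.net/exports/'
    FILEFORMAT = PARQUET
    COPY_OPTIONS ('mergeSchema' = 'true')
""")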
We are facing similar issues while writing to an ADLS location in Delta format; afterwards we created Unity Catalog tables on top of the Delta location. Should it be possible to change the data type lengths in the format below, and is this supported in Spark SQL? Azure SQL Spark ...
I am learning Databricks for the first time, following a book copyrighted in 2020, so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using a shell script, but ...
In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ as a cluster library. In your notebook or script, y...
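If the Python route is preferred, the pydeequ wrapper exposes the same checks. A hedged sketch, assuming the Deequ jar is attached to the cluster and pydeequ is pip-installed (table and column names are illustrative):

from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

df = spark.read.table("bronze.orders")

# Fail the verification if the key column is incomplete or non-unique.
check = Check(spark, CheckLevel.Error, "bronze order checks")
result = (VerificationSuite(spark)
    .onData(df)
    .addCheck(check.isComplete("order_id").isUnique("order_id"))
    .run())

VerificationResult.checkResultsAsDataFrame(spark, result).show()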