Data Engineering

Forum Posts

Sorted by:

by costi9992 • New Contributor III

01-16-2023 8:33:13 AM

1681 Views
2 replies
0 kudos

Access Databricks API using IDP token

Hello,We have a databricks account & workspace, provided by AWS with SSO enabled. Is there any way to access databricks workspace API ( jobs/clusters, etc ) using a token retrieved from IdentityProvider ? We can access databricks workspace API with A...

Data Engineering

1681 Views
2 replies
0 kudos

01-16-2023 8:33:13 AM

View Replies

Latest Reply

fpopa
New Contributor II

02-28-2024 12:19:28 PM

0 kudos

Hey - Costin and Anonymous user, have you managed to get this working, do you have examples by any chance?I'm also trying something similar but I haven't been able to make it work.> authenticate and access the Databricks REST API by setting the Autho...

0 kudos

02-28-2024 12:19:28 PM

1 More Replies

by srjchoubey2 • New Contributor

02-28-2024 5:03:35 AM

2436 Views
1 replies
0 kudos

How to import excel files xls/xlsx file into Databricks python notebook?

Method 1: Using "com.crealytics.spark.excel" package, how do I import the package?Method 2: Using pandas I tried the possible paths, but file not found it shows, nor while uploading the xls/xlsx file it shows options for importing the dataframe.Help ...

Data Engineering

excel

import

pyspark

python

2436 Views
1 replies
0 kudos

02-28-2024 5:03:35 AM

View Replies

Latest Reply

vishwanath_1
New Contributor III

02-28-2024 9:23:18 AM

0 kudos

import pandas as pd ExcelData = pd.read_excel("/dbfs"+FilePath, sheetName) # make sure you add /dbfs to FilePath

0 kudos

02-28-2024 9:23:18 AM

by AurelioGesino • New Contributor II

02-26-2024 7:00:23 AM

1243 Views
2 replies
0 kudos

Bug in unity catalog registering external database that is case sensitive

I successfully registered in my Unity Catalog an external Database ```dwcore``` that is hosted on SQL server.I first added the connection in "External Data": tested the connection and it was successful.I then added the database on top: tested the con...

Data Engineering

1243 Views
2 replies
0 kudos

02-26-2024 7:00:23 AM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

02-28-2024 5:17:11 AM

0 kudos

Hi @AurelioGesino, It seems you’ve encountered an issue with table names when connecting to an external SQL Server database in Databricks. Let’s break down the situation and explore potential solutions: Table Name Case Sensitivity: You’ve correc...

0 kudos

02-28-2024 5:17:11 AM

1 More Replies

by MoJaMa • Valued Contributor II

06-18-2021 8:28:27 AM

6052 Views
8 replies
2 kudos

Can we have multiple AWS accounts associated with one Databricks account?

Data Engineering

6052 Views
8 replies
2 kudos

06-18-2021 8:28:27 AM

View Replies

Latest Reply

User15848365773
New Contributor II

02-24-2023 3:59:47 PM

2 kudos

Hi @amitca71 @atanu .. yes you can associate as many vpcs(workspace deployment fundamental) across regions and aws accounts to one single databricks aws account infact its one of the super powers of databricks platform and you can even track all thei...

2 kudos

02-24-2023 3:59:47 PM

7 More Replies

by Milliman • New Contributor

02-27-2024 10:38:55 PM

816 Views
1 replies
0 kudos

How could we automatically re run the complete job if any of its associted task fails.?

I need to re run the compete job automatically if any of its associated task gets failed, any help would be appreciable. Thanks

Data Engineering

816 Views
1 replies
0 kudos

02-27-2024 10:38:55 PM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

02-28-2024 5:08:01 AM

0 kudos

Hi @Milliman, In Databricks, you can automate the re-run of a job if any of its associated tasks fail. Here are some steps to achieve this: Conditional Task Execution: You can specify “Run if dependencies” to run a task based on the run status o...

0 kudos

02-28-2024 5:08:01 AM

by creditorwatch • New Contributor

02-27-2024 4:44:17 PM

582 Views
1 replies
0 kudos

Load data from Aurora to Databricks directly

Hi,Does anyone know how to link Aurora to Databricks directly and load data into Databricks automatically on a schedule without any third-party tools in the middle?

Data Engineering

582 Views
1 replies
0 kudos

02-27-2024 4:44:17 PM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

02-28-2024 5:05:04 AM

0 kudos

Hi @creditorwatch, To ingest data into Databricks directly from Amazon Aurora and automate the process on a schedule, you have a few options. Let’s explore them: Auto Loader (Recommended): Auto Loader is a powerful feature in Databricks that eff...

0 kudos

02-28-2024 5:05:04 AM

by Bas1 • New Contributor III

03-15-2022 1:42:48 AM

8784 Views
17 replies
20 kudos

Resolved! network security for DBFS storage account

In Azure Databricks the DBFS storage account is open to all networks. Changing that to use a private endpoint or minimizing access to selected networks is not allowed.Is there any way to add network security to this storage account? Alternatively, is...

Data Engineering

8784 Views
17 replies
20 kudos

03-15-2022 1:42:48 AM

View Replies

Latest Reply

Odee79
New Contributor II

01-24-2024 5:13:47 AM

20 kudos

How can we secure the storage account in the managed resource group which holds the DBFS with restricted network access, since access from all networks is blocked by our Azure storage account policy?

20 kudos

01-24-2024 5:13:47 AM

16 More Replies

by alm • New Contributor III

05-30-2023 5:43:01 AM

5042 Views
6 replies
2 kudos

Resolved! How to grant access to views without granting access to underlying tables

I have a medallion architecture: Bronze layer: Raw data in tablesSilver layer: Refined data in views created from the bronze layerGold layer: Data products as views created from the silver layerCurrently I have a data scientist that needs access to d...

Data Engineering

5042 Views
6 replies
2 kudos

05-30-2023 5:43:01 AM

View Replies

Latest Reply

MoJaMa
Valued Contributor II

02-27-2024 8:39:37 AM

2 kudos

Single-user clusters use a different security mode which is the reason for this difference. On single-user/assigned clusters, you'll need the Fine Grained Access Control service (which is a Serverless service) - that is the solution to this problem (...

2 kudos

02-27-2024 8:39:37 AM

5 More Replies

by Rishitha • New Contributor III

10-24-2023 5:40:05 PM

2716 Views
4 replies
1 kudos

Delta live tables straming

I'm trying to addmonotonicallyIncreasingId() column to a streaming table and I see the following errorFailed to start stream [table_name] in either append mode or complete mode. Append mode error: Expression(s): monotonically_increasing_id() is not s...

Data Engineering

2716 Views
4 replies
1 kudos

10-24-2023 5:40:05 PM

View Replies

Latest Reply

Niro
New Contributor II

02-27-2024 6:06:18 PM

1 kudos

Is aggregations with row_number() combined with a SQL window function and a watermark still supported in Databricks 14.3?

1 kudos

02-27-2024 6:06:18 PM

3 More Replies

by Brad • Contributor

02-22-2024 11:31:17 PM

2152 Views
5 replies
0 kudos

Is there a way to control the cluster runtime version for DLT

Hi team, When I create a DLT job, is there a way to control the cluster runtime version somewhere? E.g. I want to use 14.3 LTS. I tried to add `"spark_version": "14.3.x-scala2.12",` inside cluster default label but not work.Thanks

Data Engineering

2152 Views
5 replies
0 kudos

02-22-2024 11:31:17 PM

View Replies

Latest Reply

Brad
Contributor

02-27-2024 3:20:31 PM

0 kudos

Thanks. Got it.And the cluster has to be share mode. Can different DLT jobs share clusters or when DLT job is running, can other people use the cluster? Seems each DLT job running will start a new cluster. If it is not be able to shared, why it has t...

0 kudos

02-27-2024 3:20:31 PM

4 More Replies

by pjp94 • Contributor

02-22-2024 2:37:11 PM

751 Views
1 replies
0 kudos

pyspark.pandas PandasNotImplementedError

Can someone explain why this below code is throwing an error? My intuition is telling me it's my spark version (3.2.1) but would like confirmation:d = {'key':['a','a','c','d','e','f','g','h'], 'data':[1,2,3,4,5,6,7,8]} x = ps.DataFrame(d) x[x['...

Data Engineering

751 Views
1 replies
0 kudos

02-22-2024 2:37:11 PM

View Replies

Latest Reply

shan_chandra
Esteemed Contributor

02-27-2024 9:47:22 AM

0 kudos

@pjp94 - The error indicates the pandas pyspark implementation does not have the below method implemented. pd.Series.duplicated() Next steps is to use dataframe methods such as distinct, groupBy, dropDuplicates to resolve this.

0 kudos

02-27-2024 9:47:22 AM

by User_1611 • New Contributor

02-26-2024 10:05:27 AM

1051 Views
1 replies
0 kudos

TimeoutException: Stream Execution thread for stream [xxxxxx]failed to stop within 15000 millisecond

TimeoutException: Stream Execution thread for stream [id = xxx runId = xxxx] failed to stop within 15000 milliseconds (specified by spark.sql.streaming.stopTimeout). See the cause on what was being executed in the streaming query thread.I have a data...

Data Engineering

1051 Views
1 replies
0 kudos

02-26-2024 10:05:27 AM

View Replies

Latest Reply

shan_chandra
Esteemed Contributor

02-27-2024 9:05:50 AM

0 kudos

@User_1611 - could you please try the following ? Reduce the number of streaming queries running on the same clusterMake sure your code does not try to re-trigger/start an active streaming queryMake sure to collect the thread dumps if this error hap...

0 kudos

02-27-2024 9:05:50 AM

by Shan1 • New Contributor II

02-26-2024 11:34:40 AM

2084 Views
5 replies
0 kudos

Read large volume of parquet files

I have 50k + parquet files in the in azure datalake and i have mount point as well. I need to read all the files and load into a dataframe. i have around 2 billion records in total and all the files are not having all the columns, column order may di...

Data Engineering

2084 Views
5 replies
0 kudos

02-26-2024 11:34:40 AM

View Replies

Latest Reply

shan_chandra
Esteemed Contributor

02-27-2024 8:37:26 AM

0 kudos

@Shan1 - This could be due to the files have cols that differ by data type. Eg. Integer vs long , Boolean vs integer. can be resolved by schemaMerge=False. Please refer to this code. https://github.com/apache/spark/blob/418bba5ad6053449a141f3c9c31e...

0 kudos

02-27-2024 8:37:26 AM

4 More Replies

by Chandraw • New Contributor III

02-19-2024 3:21:01 AM

1757 Views
3 replies
1 kudos

Resolved! Malformed Input Exception while saving or retreiving Table

Hi everyone,I am using DBR version 13 and Managed tables in a custom catalog location of table is AWS S3.running notebook on single user clusterI am facing MalformedInputException while saving data to Tables or reading it.When I am running my noteboo...

Data Engineering

1757 Views
3 replies
1 kudos

02-19-2024 3:21:01 AM

View Replies

Latest Reply

Chandraw
New Contributor III

02-27-2024 4:15:42 AM

1 kudos

@Kaniz_Fatma The issue is resolved as soon as I deployed it to mutlinode dev cluster.Issue is only occurring in single user clusters. Looks like limitation of running all updates in one node as distributed system.

1 kudos

02-27-2024 4:15:42 AM

2 More Replies

by BerkerKozan • New Contributor III

02-26-2024 12:34:26 PM

930 Views
2 replies
1 kudos

Creating All Purpose Cluster in Data Asset Bundles

There is no resource to create All Purpose Cluster, but I need it, so does it mean I should create it via Terraform or DBX and reference to it, which I dont prefer?

Data Engineering

930 Views
2 replies
1 kudos

02-26-2024 12:34:26 PM

View Replies

Latest Reply

BerkerKozan
New Contributor III

02-27-2024 2:55:37 AM

1 kudos

Hello @Ayushi_Suthar, Thanks for the quick reply! Where can I see these requests?https://ideas.databricks.com/ideas/DB-I-9451 ?

1 kudos

02-27-2024 2:55:37 AM

1 More Replies

User

Count

1603

744

348

285

247

Databricks Community

Forum Posts

Access Databricks API using IDP token

How to import excel files xls/xlsx file into Databricks python notebook?

Bug in unity catalog registering external database that is case sensitive

Can we have multiple AWS accounts associated with one Databricks account?

How could we automatically re run the complete job if any of its associted task fails.?

Load data from Aurora to Databricks directly

Resolved! network security for DBFS storage account

Resolved! How to grant access to views without granting access to underlying tables

Delta live tables straming

Is there a way to control the cluster runtime version for DLT

pyspark.pandas PandasNotImplementedError

TimeoutException: Stream Execution thread for stream [xxxxxx]failed to stop within 15000 millisecond

Read large volume of parquet files

Resolved! Malformed Input Exception while saving or retreiving Table

Creating All Purpose Cluster in Data Asset Bundles

Compute Policy Does Not Install Libraries

Is there a way to let the DLT pipeline retry by it...

Can't create Catalog on Databricks on AWS

Executing Notebooks - Run All Cells vs Run All Bel...

getting Status code: 301 Moved Permanently error