Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Linda22
by New Contributor II
  • 3766 Views
  • 7 replies
  • 5 kudos

Can we execute a single task in isolation from a multi-task Databricks job?

A task may be used to process some data. If we have 10 such tasks in a job and we want to process only a couple of datasets through a couple of those tasks, is that possible?

Latest Reply
slimbnsalah
New Contributor II
  • 5 kudos

Generally available!

6 More Replies
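A minimal sketch of how selective task re-runs can be driven programmatically: the Jobs API repair-run endpoint accepts a `rerun_tasks` list, so only the named task keys of an existing run are re-executed. The run ID and task names below are invented for illustration; check the endpoint details for your workspace.

```python
import json

def build_repair_payload(run_id: int, task_keys: list[str]) -> str:
    """Build the JSON body for POST /api/2.1/jobs/runs/repair,
    re-running only the named tasks of an existing job run."""
    payload = {
        "run_id": run_id,          # the run whose tasks we want to re-execute
        "rerun_tasks": task_keys,  # only these task keys are re-run
    }
    return json.dumps(payload)

# Example: re-run just two of the job's ten tasks (hypothetical names)
body = build_repair_payload(1234, ["ingest_dataset_a", "ingest_dataset_b"])
print(body)
```

The same selection is available interactively via "Repair run" on a job run page.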
Livingstone
by New Contributor II
  • 1669 Views
  • 5 replies
  • 3 kudos

Install a Maven package on a serverless cluster

My task is to export data from CSV/SQL into Excel format with minimal latency. To achieve this, I used a Serverless cluster.Since PySpark does not support saving in XLSX format, it is necessary to install the Maven package spark-excel_2.12. However, ...

Latest Reply
BigRoux
Databricks Employee
  • 3 kudos

As you stated, you cannot install Maven packages on Databricks serverless clusters due to restricted library management capabilities. However, there are alternative approaches to export data to Excel with minimal latency. Solutions to Export Excel Fi...

4 More Replies
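One practical detail for any Excel export path that avoids spark-excel: an xlsx worksheet holds at most 1,048,576 rows, so large results usually need to be split into sheet-sized chunks before being handed to a writer such as openpyxl or pandas `to_excel`. A stdlib-only sketch of the chunking step (the writer call itself is omitted):

```python
EXCEL_MAX_ROWS = 1_048_576  # hard per-worksheet row limit in the xlsx format

def sheet_chunks(rows, max_rows=EXCEL_MAX_ROWS - 1):
    """Yield lists of rows, each small enough for one worksheet
    (one row is reserved for the header)."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Example: 2.5 "sheets" worth of rows, with a tiny limit for illustration
chunks = list(sheet_chunks(range(25), max_rows=10))
print([len(c) for c in chunks])  # [10, 10, 5]
```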
VKe
by New Contributor III
  • 2299 Views
  • 5 replies
  • 5 kudos

Issue with HTML Table Styling in Databricks Alerts

Hi Community,I’m trying to create an alert in Databricks with a custom email notification that includes the results of a SQL query displayed in an HTML table. However, I am facing issues with styling the table, specifically with adding borders and ba...

Latest Reply
skyatall
New Contributor II
  • 5 kudos

I am facing the same issue. When I use {{#QUERY_RESULT_ROWS}} and {{/QUERY_RESULT_ROWS}}, it gives me "Unable to display preview, an invalid template was provided".

4 More Replies
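Separately from the template error, a pattern that tends to survive email rendering: put borders and backgrounds as inline `style` attributes on every cell rather than in a `<style>` block, which many email clients strip. A small Python sketch of generating such rows (in a real alert template, the mustache `{{#QUERY_RESULT_ROWS}}` section would wrap the row markup):

```python
from html import escape

def styled_row(cells, header=False):
    """Render one table row with inline CSS on each cell, since email
    renderers commonly ignore <style> blocks."""
    tag = "th" if header else "td"
    cell_style = "border:1px solid #ccc;padding:4px 8px;"
    inner = "".join(
        f'<{tag} style="{cell_style}">{escape(str(c))}</{tag}>' for c in cells
    )
    return f"<tr>{inner}</tr>"

table = (
    '<table style="border-collapse:collapse;">'
    + styled_row(["name", "count"], header=True)
    + styled_row(["events", 42])
    + "</table>"
)
print(table)
```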
krishnakmr512
by New Contributor
  • 152 Views
  • 1 reply
  • 1 kudos

Resolved! Missed my certification Exam Reschedule is required

Hi Team, @data_help @helpdesk @Cert-Team @Cert-TeamOPS I missed my certification exam yesterday due to an emergency. Is there any possibility it can be rescheduled to any time today or tomorrow? I am not able to reschedule opti...

Data Engineering
@Cert-Team
Latest Reply
Cert-Team
Databricks Employee
  • 1 kudos

@krishnakmr512 Usually the fastest way to get assistance is by filing a ticket with our support team. I was able to reschedule your exam to a future date. Please log into your account and reschedule to a date and time that suits you.

db_eswar
by New Contributor
  • 187 Views
  • 2 replies
  • 1 kudos

What is iowait, and will it impact the performance of my job?

One job is taking more than 7 hrs; when I added the configuration below it took under 2:30, but after deployment with the same parameters it is taking 7+ hrs again. 1) spark.conf.set("spark.sql.shuffle.partitions", 500) --> spark.conf.set("spark.sql.shuffle.parti...

Latest Reply
SP_6721
New Contributor III
  • 1 kudos

Hi @db_eswar, high iowait in your Spark jobs is probably caused by storage or disk bottlenecks, not CPU or memory issues. The slowdown you're seeing could be due to a cold cache, slower disks, or increased resource usage. To troubleshoot, you can use t...

1 More Replies
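For context on what the metric means: iowait is CPU time spent idle while outstanding disk I/O is pending, and on Linux it can be read from the cumulative tick counters in /proc/stat. A stdlib sketch of computing the iowait share from one such line (the tick counts below are invented):

```python
def iowait_fraction(proc_stat_cpu_line: str) -> float:
    """Compute the iowait share of total CPU time from a /proc/stat
    'cpu' line: user nice system idle iowait irq softirq steal ..."""
    fields = proc_stat_cpu_line.split()
    assert fields[0].startswith("cpu")
    times = [int(x) for x in fields[1:]]
    iowait = times[4]            # the 5th counter is iowait (in ticks)
    return iowait / sum(times)

# Sample line with made-up counters: 15% of CPU time spent waiting on I/O
line = "cpu 1000 0 500 7000 1500 0 0 0 0 0"
print(round(iowait_fraction(line), 2))  # 0.15
```

Comparing two snapshots of these counters gives the iowait rate over an interval, which is what tools like iostat report.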
AlessandroM
by New Contributor
  • 191 Views
  • 1 replies
  • 1 kudos

PySpark Structured Streaming job doesn't unpersist DataFrames

Hi community, I am currently developing a PySpark job (running on Runtime 14.3 LTS) using Structured Streaming. Our streaming job uses foreachBatch, and inside it we call persist (and a subsequent unpersist) on two DataFrames. We are noticing fro...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

The issue you’re encountering—where unpersist() does not seem to release memory for persisted DataFrames in your Structured Streaming job—likely relates to nuances of the Spark caching mechanism and how it interacts with the lifecycle of micro-batch ...

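One commonly suggested mitigation (not confirmed as the fix for this specific report) is to make the release synchronous with `unpersist(blocking=True)` after every action that reuses the cached frame has finished, so executors drop the blocks before the next micro-batch starts. A Spark-free sketch of the call pattern, using a stub DataFrame so it runs anywhere:

```python
class StubDF:
    """Minimal stand-in for a Spark DataFrame that records cache calls."""
    def __init__(self):
        self.calls = []
    def persist(self):
        self.calls.append("persist")
        return self
    def count(self):
        self.calls.append("count")
        return 0
    def unpersist(self, blocking=False):
        self.calls.append(f"unpersist(blocking={blocking})")

def process_batch(df, batch_id):
    # Cache once, run all actions that reuse the frame, then release
    # synchronously -- the try/finally guarantees the unpersist happens
    # even if an action in the batch fails.
    df.persist()
    try:
        df.count()  # ...all work that reuses the cached data goes here
    finally:
        df.unpersist(blocking=True)

df = StubDF()
process_batch(df, batch_id=0)
print(df.calls)
```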
RameshChejarla
by New Contributor II
  • 195 Views
  • 2 replies
  • 1 kudos

Databricks

Hi Everyone, I have implemented Auto Loader and it is working as expected. I need to track the files which are loaded into the stage table. Here is the issue: the file tracking table needs to be created in Snowflake, and from there I need to track the files. How to connect data...

Latest Reply
RameshChejarla
New Contributor II
  • 1 kudos

Thanks for your reply, will try and let you know

1 More Replies
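For the Snowflake side of the question: Databricks can write through the Snowflake Spark connector, whose standard options include sfUrl, sfUser, sfDatabase, sfSchema and sfWarehouse. A sketch of assembling those options before a `df.write.format("snowflake")` call; all values below are placeholders, and in practice the credentials should come from a secret scope, not literals.

```python
def snowflake_options(url, user, password, database, schema, warehouse):
    """Option map for the Snowflake Spark connector (format 'snowflake')."""
    return {
        "sfUrl": url,
        "sfUser": user,
        "sfPassword": password,   # in practice: dbutils.secrets.get(...)
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

opts = snowflake_options(
    "myaccount.snowflakecomputing.com", "loader", "***",
    "TRACKING", "PUBLIC", "LOAD_WH",
)
# Usage sketch (on a cluster with the connector installed):
# (df.write.format("snowflake").options(**opts)
#    .option("dbtable", "FILE_TRACKING").mode("append").save())
print(sorted(opts))
```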
ankit001mittal
by New Contributor III
  • 478 Views
  • 8 replies
  • 3 kudos

DLT Query History

Hi guys, I can see that DLT pipelines have a query history section where we can see the duration of each table and the number of rows read. Is this information stored somewhere in the system catalog? Can I query this information?

Latest Reply
aayrm5
Honored Contributor
  • 3 kudos

Hey @ankit001mittal - if any of the above responses answered your question, kindly mark it as the solution. Thanks,

7 More Replies
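Much of what the UI shows is also available in the DLT event log, which can be queried with the `event_log()` table-valued function; per-flow row counts appear in `flow_progress` events. A sketch that only builds the SQL text (the table name is a placeholder, and the exact `details` field paths may vary by runtime, so inspect your own event log's `details` column first):

```python
def dlt_flow_progress_sql(pipeline_table: str) -> str:
    """Build a SQL query over the DLT event log for per-flow row counts.
    Field paths under `details` are an assumption -- verify against your
    own event log before relying on them."""
    return f"""
        SELECT timestamp,
               origin.flow_name,
               details:flow_progress.metrics.num_output_rows AS num_output_rows
        FROM event_log(TABLE({pipeline_table}))
        WHERE event_type = 'flow_progress'
        ORDER BY timestamp DESC
    """

sql = dlt_flow_progress_sql("my_catalog.my_schema.my_table")
print(sql)  # pass to spark.sql(sql) on Databricks
```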
Prashanth24
by New Contributor III
  • 178 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader processing old files

I have implemented Databricks Auto Loader and found that every time I execute the code, it still reads all old existing files plus the new files. As per the concept of Auto Loader, it should read and process only new files. Below is the code. Please hel...

Latest Reply
RameshChejarla
New Contributor II
  • 0 kudos

Hi Prashanth, Auto Loader is reading only new files for me; can you please go through the below script? df = (spark.readStream.format("cloudFiles").option("cloudFiles.format", "csv").option("cloudFiles.schemaLocation", "path").option("recursiveFileLooku...

1 More Replies
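A common cause of this symptom is the checkpoint: Auto Loader only tracks already-ingested files in the stream's checkpoint location, so if that path changes (or is deleted) between runs, every run starts from scratch. The `cloudFiles.includeExistingFiles` option additionally controls whether the first run backfills pre-existing files. A sketch of the relevant options (paths are placeholders):

```python
def autoloader_read_options(fmt: str, schema_location: str,
                            include_existing: bool = False) -> dict:
    """Auto Loader reader options. Incremental behaviour comes from
    reusing the SAME checkpointLocation on the writeStream across runs;
    includeExistingFiles only affects the very first run's backfill."""
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.includeExistingFiles": str(include_existing).lower(),
    }

opts = autoloader_read_options("csv", "/Volumes/ckpt/schema")
print(opts["cloudFiles.includeExistingFiles"])  # false
# Usage sketch:
#   (spark.readStream.format("cloudFiles").options(**opts).load(src_path)
#        .writeStream.option("checkpointLocation", "/Volumes/ckpt/stream")
#        .toTable("bronze.events"))
```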
kaushalshelat
by New Contributor II
  • 273 Views
  • 2 replies
  • 4 kudos

Resolved! I cannot see the output when using pandas_api() on spark dataframe

Hi all, I started learning Spark and Databricks recently, along with Python. While running the below lines of code it did not throw any error and seemed to run OK, but it didn't show me any output either: test = cust_an_inc1.pandas_api(); test.show(), where cust_an_inc1 is...

Latest Reply
aayrm5
Honored Contributor
  • 4 kudos

Hi @kaushalshelat Ideally, `test.show()` should've thrown an error, as test is a pandas-on-Spark DataFrame now. `.show()` is a Spark DataFrame method and won't work with pandas. If you want to see a subset of the data, try `.head()` or `.tail(n)` rather than `.show...

1 More Replies
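The distinction is easy to see with plain pandas standing in for the pandas-API object returned by `pandas_api()` (the column names below are invented): inspection happens through pandas-style methods like `.head()`, and there is no `.show()` at all.

```python
import pandas as pd

# Stand-in for the object you get back from df.pandas_api(): it follows
# the pandas API, so there is no Spark-style .show() method on it.
pdf = pd.DataFrame({"cust_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})

print(pdf.head(2))                 # first rows, pandas style
print(hasattr(pdf, "show"))        # False: .show() belongs to Spark DataFrames
```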
jeremy98
by Contributor III
  • 338 Views
  • 4 replies
  • 0 kudos

how to fallback the entire job in case of failure of the cluster?

Hi community, my team and I are using a job that is triggered based on dynamic scheduling, with the schedule defined within some of the job's tasks. However, this job is attached to a cluster that is always running and never terminated. I understand th...

Latest Reply
aayrm5
Honored Contributor
  • 0 kudos

Hey @jeremy98 Have you had a chance to experiment with the Databricks serverless offering? Serverless spin-up times are around ~1 min, and it has built-in autoscaling based on the workload, which seems a good fit for your use case. Check out more info f...

3 More Replies
suja
by New Contributor
  • 149 Views
  • 1 reply
  • 0 kudos

Exploring parallelism for multiple tables

I am new to Databricks. The app we need to build reads from Hive tables, goes through bronze, silver and gold layers, and stores the results in relational DB tables. There are multiple Hive tables with no dependencies. What is the best way to achieve parallelism? Do w...

Latest Reply
LRALVA
Honored Contributor
  • 0 kudos

Hi @suja Use Databricks Workflows (Jobs) with task parallelism. Instead of using threads within a single notebook, leverage Databricks Jobs to define multiple tasks, each responsible for a table. Tasks can: 1. Run in parallel ...

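If the thread-based route mentioned in the question is ever needed instead of separate job tasks, it can be sketched with the stdlib: Spark actions submitted from separate threads of one driver run as concurrent jobs on the same cluster. The table names and the per-table function below are placeholders for the real bronze-to-gold pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

TABLES = ["orders", "customers", "payments"]  # hypothetical Hive tables

def process_table(name: str) -> str:
    # Placeholder for the real bronze -> silver -> gold work for one table.
    return f"{name}:done"

# Independent tables are processed concurrently; map() preserves input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_table, TABLES))

print(results)  # ['orders:done', 'customers:done', 'payments:done']
```

Separate job tasks remain the more observable option, since each table then gets its own retries, timeouts, and run history.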
ABINASH
by New Contributor
  • 185 Views
  • 1 reply
  • 0 kudos

Flattening VARIANT column.

Hi Team, I am facing an issue: I have a JSON file which is around 700 KB and contains only 1 record, so after reading the data and flattening the file the record count is now 620 million. Now while I am writing the DataFrame into Delta Lake it is taking ...

Latest Reply
samshifflett46
New Contributor II
  • 0 kudos

Hey @ABINASH, the JSON file being flattened to 620 million records suggests the area to optimize is the structure of the JSON file itself. My initial thought is that the JSON file is extremely nested, which is causing a large amount of redundant...

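To make the 700 KB to 620 million jump concrete: flattening sibling arrays takes their cartesian product, so row counts multiply at every exploded level. A stdlib sketch of a single-level version of that explosion (the sample record is invented):

```python
from itertools import product

def explode_record(record: dict) -> list[dict]:
    """Flatten one record by taking the cartesian product of its
    top-level list fields -- sibling arrays multiply, which is why a
    tiny nested document can flatten into millions of rows."""
    scalars = {k: v for k, v in record.items() if not isinstance(v, list)}
    lists = {k: v for k, v in record.items() if isinstance(v, list)}
    rows = []
    for combo in product(*lists.values()):
        row = dict(scalars)                 # repeat the scalar fields
        row.update(zip(lists.keys(), combo))  # one element per array
        rows.append(row)
    return rows

# One record with three sibling arrays of lengths 3, 4 and 5 -> 60 rows.
rec = {"id": 1, "a": [1, 2, 3], "b": list("wxyz"), "c": list(range(5))}
print(len(explode_record(rec)))  # 60
```

Exploding only the arrays that are actually needed downstream, or keeping unrelated arrays in separate output tables, avoids the cross product entirely.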
sondergaard
by New Contributor II
  • 304 Views
  • 2 replies
  • 0 kudos

Simba ODBC driver // .Net Core

Hi, I have been looking into the Simba Spark ODBC driver to see if it can simplify our integration with .NET Core. The first results were promising, but when I started to process larger queries I noticed out-of-memory exceptions in the conta...

Latest Reply
Rjdudley
Honored Contributor
  • 0 kudos

Something we're considering for a similar purpose (.NET Core service pulling data from Databricks) is the ADO.NET connector from CData: Databricks Driver: ADO.NET Provider | Create & integrate .NET apps

1 More Replies
van45678
by New Contributor
  • 702 Views
  • 1 reply
  • 0 kudos

Getting connection reset issue while connecting to a SQL server

Hello All, I am unable to connect from Databricks to a SQL Server instance that is installed in an on-premises network. I am able to successfully reach the server from the notebook using this command [nc -vz <hostname> <port>], which means I am able to e...

Data Engineering
Databricks
sqlserver
timeout
Latest Reply
Kebadu
New Contributor II
  • 0 kudos

Hi, I ran into a similar problem. Were you able to find a solution? Thanks

