Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

thiagoawstest
by Contributor
  • 114 Views
  • 1 reply
  • 0 kudos

change network/vpc workspace

Hello, I have two workspaces, each pointing to a VPC in AWS. In one of the accounts we need to remove a subnet; after removing it, I get the InvalidSubnetID.NotFound AWS error when starting the cluster. I checked in Manage Account; the network is poin...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @thiagoawstest, could you please ensure the following:
  • The specified subnet IDs exist in the correct VPC and AWS region.
  • The subnet IDs are properly formatted as subnet-xxxxxxxxxxxxxxxxx.
  • The subnets are not already in use by other resources.

Avinash_Narala
by Contributor
  • 110 Views
  • 1 reply
  • 1 kudos

Resolved! Tracking Serverless cluster cost

Hi, I just explored the serverless feature in Databricks and am wondering how I can track the cost associated with it. Is it stored in system tables? If yes, where can I find it? And also, how can I prove that its cost is relatively lower compared to classic ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Avinash_Narala, Databricks provides a system table called system.billing.usage (Public Preview) that allows you to monitor the cost of your serverless compute usage. This table includes user and workload attributes related to serverless compute c...
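
As a rough illustration, a query of that table from a notebook might look like the sketch below; the SKU filter and column names are assumptions taken from the documented Public Preview schema and should be verified in your workspace:

```python
# Hedged sketch: aggregate daily serverless DBU usage from system.billing.usage.
# Column names and the SKU filter are assumptions from the documented schema.
serverless_usage = spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE sku_name LIKE '%SERVERLESS%'
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""")
display(serverless_usage)
```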

Avinash_Narala
by Contributor
  • 119 Views
  • 1 reply
  • 1 kudos

Resolved! File Trigger VS Autoloader

Hi, I recently came across File Trigger in Databricks and find it mostly similar to Autoloader. My first question is why we need file triggers when we have Autoloader. In which scenarios should I go with file triggers versus Autoloader? Can you please differentiate?

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Avinash_Narala, the key differences between File Trigger and Autoloader in Databricks are:
Autoloader
  • Autoloader is a tool for ingesting files from storage and doing file discovery.
  • It is designed for incremental data ingestion, processing new fil...
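
For reference, a minimal Auto Loader read/write sketch; the paths, file format, and target table name below are placeholders:

```python
# Auto Loader: incremental file discovery and ingestion from cloud storage.
# All paths and the table name are placeholders.
stream = (spark.readStream
    .format("cloudFiles")                                   # Auto Loader source
    .option("cloudFiles.format", "json")                    # incoming file format
    .option("cloudFiles.schemaLocation", "/mnt/chk/schema") # schema inference state
    .load("/mnt/landing/events"))

(stream.writeStream
    .option("checkpointLocation", "/mnt/chk/events")
    .trigger(availableNow=True)                             # process backlog, then stop
    .toTable("bronze_events"))
```

A file arrival trigger, by contrast, is configured on the job itself to start a run when new files land in a monitored location, rather than being a read API inside the pipeline.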

joshuat
by New Contributor III
  • 345 Views
  • 4 replies
  • 0 kudos

How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?

I'm attempting to fetch an Oracle NetSuite table in parallel via JDBC using the NetSuite Connect JAR, already installed on the cluster and set up correctly. I can do this successfully with a single-threaded approach using the `dbtable` option: table = 'Tran...
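
The parallel variant being attempted looks roughly like the sketch below; the table/column names, bounds, and the TO_DATE usage are placeholders, and SuiteAnalytics' SQL dialect may not accept the same cast syntax as stock Oracle:

```python
# Partitioned JDBC read: Spark issues numPartitions parallel queries bounded on
# partitionColumn. Table/column names and bounds here are placeholders.
jdbc_url = "jdbc:..."  # your NetSuite Connect JDBC URL (placeholder)

subquery = "(SELECT t.*, TO_DATE(t.trandate) AS part_dt FROM Transaction t) s"

df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", subquery)           # cast applied inside the pushed-down subquery
    .option("partitionColumn", "part_dt")  # must be numeric, date, or timestamp
    .option("lowerBound", "2023-01-01")
    .option("upperBound", "2024-01-01")
    .option("numPartitions", 8)
    .load())
```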

Latest Reply
joshuat
New Contributor III
  • 0 kudos

@mtajmouati I appreciate your response. This approach resulted in a generic "bad SQL" error in NetSuite: "java.sql.SQLSyntaxErrorException: [NetSuite][SuiteAnalytics Connect JDBC Driver][OpenAccess SDK SQL Engine]Syntax Error in the SQL statement.[10...

3 More Replies
ShenghaoWu
by New Contributor II
  • 129 Views
  • 2 replies
  • 1 kudos

Java code to read an Azure Storage file in a JAR-type Databricks job

I have a Java application, packaged as a JAR, that will be used as a JAR dbx job. This application needs to: 1. read an Azure Storage file in YAML format; 2. get a passphrase and private key stored in dbx in order to access a Snowflake DB. My questions are: 1. how to ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @ShenghaoWu, To access an Azure Storage file in your Java code, you can use the Azure Storage SDK for Java. This can be done within your Java application packaged as a JAR file that will be used as a dbx job. Here is an example of how to read an ...

1 More Reply
tariq
by New Contributor III
  • 1924 Views
  • 3 replies
  • 0 kudos

SqlContext in DBR 14.3

I have a Databricks workspace in GCP and I am using the cluster with the Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the following command in a notebook: spark.sparkContext.set...

Latest Reply
RamlaS
New Contributor II
  • 0 kudos

Same issue with broadcast too. Do you have a solution?

2 More Replies
Laltu_singh
by New Contributor II
  • 1386 Views
  • 3 replies
  • 1 kudos

Accessing a private API in a Databricks notebook

Hello, I am trying to access an API in a Databricks Python notebook which is available within a restricted network. When I try to access that API, it's not able to find the URL used to access the API and throws an HTTP error (max retries exceeded). d...

Latest Reply
pjv
New Contributor III
  • 1 kudos

Hi! Could you recommend a way to set up a proxy server that can reroute all HTTP traffic according to the above advice? Thank you! Kind regards, Pim
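
Not an authoritative setup guide, but the notebook-side rerouting typically looks like the sketch below, assuming a proxy host that can reach the restricted network (both endpoints are placeholders):

```python
# Route HTTP(S) calls through a proxy that can reach the restricted network.
import os
import requests

PROXY = "http://my-proxy.internal:3128"  # placeholder proxy endpoint

# Per-request routing:
resp = requests.get(
    "https://private-api.internal/health",  # placeholder private URL
    proxies={"http": PROXY, "https": PROXY},
    timeout=30,
)

# Or process-wide, honored by most Python HTTP clients:
os.environ["HTTP_PROXY"] = PROXY
os.environ["HTTPS_PROXY"] = PROXY
```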

2 More Replies
Nathant93
by New Contributor III
  • 149 Views
  • 2 replies
  • 1 kudos

remove empty folders with pyspark

Hi, I am trying to search a mount point for any empty folders and remove them. Does anyone know of a way to do this? I have tried dbutils.fs.walk but this does not seem to work. Thanks

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Nathant93, To find and remove empty folders in a mount point using PySpark, you can follow these steps: 1. List all folders in the mount point. You can use the `dbutils.fs.ls()` function to list all the folders in the mount point: folders = dbutil...
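
Building on that, a minimal recursive sketch, assuming directory entries returned by dbutils.fs.ls() carry a trailing slash in their name (the mount path is a placeholder):

```python
def remove_empty_folders(path: str) -> bool:
    """Depth-first delete of folders containing no files; returns True if `path` was removed."""
    is_empty = True
    for entry in dbutils.fs.ls(path):     # dbutils is available in Databricks notebooks
        if entry.name.endswith("/"):      # sub-folder
            if not remove_empty_folders(entry.path):
                is_empty = False          # sub-folder kept files
        else:
            is_empty = False              # a file lives here
    if is_empty:
        dbutils.fs.rm(path, recurse=True) # safe: the whole tree is empty
    return is_empty

remove_empty_folders("/mnt/my_mount/data")  # placeholder mount path
```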

1 More Reply
yalei
by New Contributor
  • 5203 Views
  • 2 replies
  • 0 kudos

leaflet does not work in notebook (R language)

I saw this notebook: htmlwidgets-azure - Databricks (microsoft.com). However, it is not reproducible. I got a lot of errors: "there is no package called ‘R.utils’" (this is easy to fix, just install the package "R.utils"); "can not be unloaded" (this is not ...

Latest Reply
KAdamatzky
New Contributor II
  • 0 kudos

Hi yalei, Did you have any luck fixing this issue? I am also trying to replicate the htmlwidgets notebook and am running into the same error. Unfortunately, the suggestions provided by Kaniz_Fatma below did not work.

1 More Reply
ksenija
by Contributor
  • 214 Views
  • 3 replies
  • 1 kudos

Resolved! DLT pipeline - silver table, joining streaming data

Hello! I'm trying to do my modeling in DLT pipelines. For bronze, I created 3 streaming views. When I try to join them to create the silver table, I get an error that I can't join a stream with a stream without watermarks. I tried adding them but then I got no...

Latest Reply
Ravivarma
New Contributor III
  • 1 kudos

Hello @ksenija, Greetings! Streaming uses watermarks to control the threshold for how long to continue processing updates for a given state entity. Common examples of state entities include:
  • Aggregations over a time window.
  • Unique keys in a join b...
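
Outside of DLT syntax, a watermarked stream-stream join follows the pattern sketched below; the source tables, key, and event-time column are placeholders, and the time bound is what lets Spark expire join state:

```python
# Stream-stream joins need watermarks on both sides plus a time constraint so
# Spark can bound the join state. All names below are placeholders.
from pyspark.sql.functions import expr

orders = (spark.readStream.table("bronze_orders")
    .withWatermark("event_time", "10 minutes"))
payments = (spark.readStream.table("bronze_payments")
    .withWatermark("event_time", "10 minutes"))

joined = orders.alias("o").join(
    payments.alias("p"),
    expr("""
        o.order_id = p.order_id AND
        p.event_time BETWEEN o.event_time AND o.event_time + INTERVAL 15 MINUTES
    """),
)
```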

2 More Replies
ShankarM
by New Contributor III
  • 121 Views
  • 1 reply
  • 1 kudos

Resolved! Serverless feature audit in data engg.

As recently announced at the summit, notebooks, jobs, and workflows will run in serverless mode. How do we track/debug the compute cluster metrics in this case, especially when there are performance issues while running jobs/workflows?

Latest Reply
imsabarinath
New Contributor II
  • 1 kudos

In my view, Databricks is planning to enable some system tables to capture some of these metrics, and the same can be leveraged as a starting point for troubleshooting.
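
As one concrete starting point, the billing system table already carries per-workload attribution; a hedged sketch, where the usage_metadata field names are assumptions taken from the documented schema:

```python
# Hedged sketch: attribute recorded usage to individual jobs via the
# usage_metadata struct in system.billing.usage (field names assumed from the
# documented schema -- verify them in your workspace).
per_job = spark.sql("""
    SELECT usage_metadata.job_id,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id IS NOT NULL
    GROUP BY usage_metadata.job_id
    ORDER BY dbus DESC
""")
display(per_job)
```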

OliverCadman
by New Contributor III
  • 9993 Views
  • 10 replies
  • 5 kudos

'File not found' error when executing %run magic command

I'm just walking through a simple exercise presented in the Databricks Platform Lab notebook, in which I'm executing a remote notebook from within it using the %run command. The remote notebook resides in the same directory as the Platform Lab notebook,...

Data Engineering
%file_not_found
%magic_commands
%run
Latest Reply
MuthuLakshmi
New Contributor III
  • 5 kudos

The %run command is a specific Jupyter magic command. The ipykernel used in Databricks examines the initial line of code to determine the appropriate compiler or language for execution. To minimize the likelihood of encountering errors, it is advisab...
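
In a Databricks notebook that usually means giving %run its own cell, with nothing preceding it (not even a comment); a minimal sketch, where the relative notebook path is a placeholder:

```python
%run ./Platform-Lab-Setup
```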

9 More Replies
Oliver_Angelil
by Valued Contributor II
  • 6376 Views
  • 9 replies
  • 6 kudos

Resolved! Confusion about Data storage: Data Asset within Databricks vs Hive Metastore vs Delta Lake vs Lakehouse vs DBFS vs Unity Catalogue vs Azure Blob

Hi there. It seems there are many different ways to store/manage data in Databricks. This is the Data asset in Databricks. However, data can also be stored (hyperlinks included to relevant pages): in a Lakehouse, in Delta Lake, on Azure Blob storage, in the D...

Latest Reply
Rahul_S
New Contributor II
  • 6 kudos

Informative.

8 More Replies
jwilliam
by Contributor
  • 2307 Views
  • 4 replies
  • 7 kudos

Resolved! Has Unity Catalog been available in Azure Gov Cloud?

We are using Databricks with Premium Tier in Azure Gov Cloud. We checked the Data section but don't see any option to Create Metastore.

Latest Reply
User16672493709
New Contributor III
  • 7 kudos

Azure.gov does not have Unity Catalog (as of July 2024). I think previous responses missed the context of government cloud in OP's question. UC has been open sourced since this question was asked, and is a more comprehensive solution in commercial cl...

3 More Replies
DmitriyLamzin
by New Contributor
  • 3663 Views
  • 4 replies
  • 0 kudos

applyInPandas started to hang on the runtime 13.3 LTS ML and above

Hello, recently I've tried to upgrade my runtime env to 13.3 LTS ML and found that it breaks my workload during applyInPandas. My job started to hang during the applyInPandas execution. Thread dump shows that it hangs on direct memory allocation: ...

Data Engineering
pandas udf
Latest Reply
Daisy98
New Contributor II
  • 0 kudos

The applyInPandas function may hang on Databricks Runtime 13.3 LTS ML and later versions owing to changes or inefficiencies in how the runtime handles parallel processing. Consider evaluating recent revisions or implementing alternative DataFrame ope...
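
When narrowing this down, a self-contained repro helps separate workload effects from runtime behavior; a minimal applyInPandas sketch with toy data:

```python
# Toy applyInPandas job: de-mean values per group. If this also hangs, the
# issue is runtime-wide rather than specific to the original workload.
import pandas as pd

df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v"])

def demean(pdf: pd.DataFrame) -> pd.DataFrame:
    # Called once per group with that group's rows as a pandas DataFrame.
    pdf["v"] = pdf["v"] - pdf["v"].mean()
    return pdf

df.groupBy("id").applyInPandas(demean, schema="id long, v double").show()
```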

3 More Replies