Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pooja_bhumandla
by New Contributor II
  • 296 Views
  • 3 replies
  • 0 kudos

Auto tuning of file size

Why are maxFileSize and minFileSize different from targetFileSize after optimization? What is the significance of targetFileSize? "numRemovedFiles": "2099","numRemovedBytes": "29658974681","p25FileSize": "29701688","numDeletionVectorsRemoved": "0","m...

Latest Reply
loui_wentzel
New Contributor III
  • 0 kudos

There could be several different reasons, but mainly it's because grouping arbitrary data into some target file size is, well... arbitrary. Imagine I gave you a large container of sand and some empty buckets, and asked you to move the sand from the co...
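For context, delta.targetFileSize is only a target that OPTIMIZE packs files toward, which is why the observed min and max sizes can differ from it. A minimal sketch of setting it explicitly and re-optimizing (the table name and size are placeholders, not from this thread):

```python
# Sketch: pin an explicit target file size (value in bytes, here ~128 MB) and re-run OPTIMIZE.
# Table name and size are illustrative.
spark.sql("""
    ALTER TABLE main.default.my_table
    SET TBLPROPERTIES ('delta.targetFileSize' = '134217728')
""")
spark.sql("OPTIMIZE main.default.my_table")
```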

2 More Replies
SreedharVengala
by New Contributor III
  • 27733 Views
  • 11 replies
  • 7 kudos

PGP Encryption / Decryption in Databricks

Is there a way to decrypt / encrypt Blob files in Databricks using a key stored in Key Vault? What libraries need to be used? Any code snippets? Links?

Latest Reply
Junpei_Liang
New Contributor II
  • 7 kudos

Does anyone have an update on this?
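No complete answer is quoted in this thread; one commonly used approach (an assumption, not confirmed here) is the python-gnupg library with the PGP private key kept in a Key Vault-backed secret scope. The scope, key names, and paths below are placeholders, and the cluster needs the gpg binary plus the python-gnupg package installed.

```python
# Hedged sketch: decrypt a PGP-encrypted blob on a mounted path with python-gnupg.
# Secret scope, key names, and file paths are illustrative placeholders.
import gnupg

gpg = gnupg.GPG()  # assumes the gpg binary is available on the cluster

# Private key and passphrase pulled from a Key Vault-backed secret scope.
private_key = dbutils.secrets.get(scope="kv-scope", key="pgp-private-key")
passphrase = dbutils.secrets.get(scope="kv-scope", key="pgp-passphrase")
gpg.import_keys(private_key)

with open("/dbfs/mnt/raw/input.csv.pgp", "rb") as encrypted:
    result = gpg.decrypt_file(encrypted, passphrase=passphrase,
                              output="/dbfs/mnt/raw/input.csv")
print(result.ok, result.status)
```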

10 More Replies
Ramki
by New Contributor
  • 183 Views
  • 1 reply
  • 0 kudos

Lakeflow clarification

Are there options to modify the streaming table after it has been created by the Lakeflow pipeline? In the use case I'm trying to solve, I need to add delta.enableIcebergCompatV2 and delta.universalFormat.enabledFormats to the target streaming table....

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Ramki Yes, you can modify a streaming table created by a Lakeflow pipeline, especially when the pipeline is in triggered mode (not running continuously). In your case, you want to add the following Delta table properties: TBLPROPERTIES ( 'delta....
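The reply's property list is cut off; the two property names below come from the original question, and the three-level table name is a placeholder. Whether they are applied via ALTER TABLE (with the pipeline stopped) or via the pipeline's table properties, a sketch looks like this:

```python
# Sketch: add the Iceberg/UniForm properties mentioned in the question while the
# pipeline is not running. The table name is illustrative.
spark.sql("""
    ALTER TABLE catalog.schema.my_streaming_table SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```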

michelleliu
by New Contributor III
  • 959 Views
  • 3 replies
  • 2 kudos

Resolved! DLT Performance Issue

I've been seeing patterns in DLT process time in all my pipelines, as in the attached screenshot. Each data point is an "update" that's set to "continuous". The process time keeps increasing until a point and then drops back to the desired level. This w...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 2 kudos

Hi @michelleliu This sawtooth pattern in DLT processing times is actually quite common and typically indicates one of several underlying issues. Here are the most likely causes and solutions. Common causes: 1. Memory pressure & garbage collection: Process...

2 More Replies
alau131
by New Contributor
  • 301 Views
  • 2 replies
  • 2 kudos

How to dynamically have the parent notebook call on a child notebook?

Hi! Could someone please help me with how to dynamically call one notebook from another in Databricks and have the parent notebook get the dataframe results from the child notebook? Some background info: I have a main Python notebook and multiple SQ...

Latest Reply
jameshughes
New Contributor III
  • 2 kudos

What you are looking to do is really not the intent of notebooks and you cannot pass complex data types between notebooks. You would need to persist your data frame from the child notebook so your parent notebook could retrieve the results after the ...
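A minimal sketch of the pattern this reply describes: the child persists its dataframe and hands back the table name via dbutils.notebook.exit, and the parent runs the child with dbutils.notebook.run and reads that table. The notebook path, table name, and parameters are illustrative assumptions.

```python
# Parent notebook (sketch): run the child, then read back the table it persisted.
result_table = dbutils.notebook.run(
    "/Workspace/Users/me/child_notebook",  # illustrative child notebook path
    600,                                   # timeout in seconds
    {"run_date": "2024-01-01"},            # parameters the child reads via widgets
)
df = spark.table(result_table)

# Child notebook (sketch), at the end of its logic:
#   result_df.write.mode("overwrite").saveAsTable("tmp.child_result")
#   dbutils.notebook.exit("tmp.child_result")
```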

1 More Replies
Abel_Martinez
by Contributor
  • 18095 Views
  • 10 replies
  • 10 kudos

Resolved! Why I'm getting connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
ravisharma1024
New Contributor II
  • 10 kudos

I was facing the same issue; it is now resolved, thanks to @Abel_Martinez. I am using code like the below: df = spark.read.format("mongodb") \.option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...
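For readability, here is the same pattern spelled out in full as a sketch; the connection string, database, and collection names are placeholders (the originals are masked above), and the option names are those of the 10.x connector.

```python
# Sketch: reading with the MongoDB Spark Connector 10.x "mongodb" source.
# Connection string, database, and collection are placeholders.
df = (
    spark.read.format("mongodb")
    .option("spark.mongodb.read.connection.uri",
            "mongodb+srv://<user>:<password>@<cluster>/?retryWrites=true&w=majority")
    .option("database", "my_db")
    .option("collection", "my_collection")
    .load()
)
```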

9 More Replies
vanverne
by New Contributor II
  • 1460 Views
  • 3 replies
  • 1 kudos

Assistance with Capturing Auto-Generated IDs in Databricks SQL

Hello, I am currently working on a project where I need to insert multiple rows into a table and capture the auto-generated IDs for each row. I am using the Databricks SQL connector. Here is a simplified version of my current workflow: I create a temporary...

Latest Reply
vanverne
New Contributor II
  • 1 kudos

Thanks for the reply, Alfonso. I noticed you mentioned "Below are a few alternatives...", but I am not seeing those. Please let me know if I am missing something. Also, do you know if Databricks is working on supporting the RETURNING clause soon...
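The alternatives referenced in the earlier reply are not visible in this thread. One hedged workaround sketch (my assumption, not taken from that reply) is to tag each insert batch with a client-generated UUID and then select the identity values back, using the Databricks SQL connector's named parameters. Hostname, HTTP path, token, table, and column names are placeholders.

```python
# Hedged sketch: capture identity values without RETURNING by tagging the batch.
import uuid
from databricks import sql  # databricks-sql-connector

batch_id = str(uuid.uuid4())
names = ["alice", "bob"]

with sql.connect(server_hostname="<host>", http_path="<http-path>",
                 access_token="<token>") as conn:
    with conn.cursor() as cur:
        for name in names:
            cur.execute(
                "INSERT INTO my_table (name, batch_id) VALUES (:name, :batch_id)",
                {"name": name, "batch_id": batch_id},
            )
        # Read back the auto-generated identity column for just this batch.
        cur.execute(
            "SELECT id, name FROM my_table WHERE batch_id = :batch_id",
            {"batch_id": batch_id},
        )
        print(cur.fetchall())
```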

2 More Replies
Yannic
by New Contributor
  • 373 Views
  • 1 reply
  • 0 kudos

Delete a directory in DBFS recursively from Azure

I have Azure storage mounted to DBFS. I want to recursively delete a directory inside it. I tried both dbutils.fs.rm(f"/mnt/data/to/delete", True) and %fs rm -r /mnt/data/to/delete. In both cases I get the following exception: AzureException: hadoop_azur...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Yannic Azure Blob Storage doesn't have true directories; it simulates them through blob naming conventions, which can cause issues with recursive deletion operations. Try the approach below: delete the files first, then the directory. def delete_directory_recursive(...
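The helper in the reply is truncated; a sketch of that kind of bottom-up delete (leaf files first, then the directories) might look like the following, using the path from the original post. This is illustrative, not the reply's exact code.

```python
# Sketch: delete leaf files first, then remove the (now-empty) directories bottom-up.
def delete_directory_recursive(path):
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            delete_directory_recursive(entry.path)
        else:
            dbutils.fs.rm(entry.path)
    dbutils.fs.rm(path, True)  # remove the directory itself

delete_directory_recursive("/mnt/data/to/delete")
```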

Sainath368
by New Contributor III
  • 283 Views
  • 1 reply
  • 0 kudos

Data Skipping- Partitioned tables

Hi all, I have a question: how can we modify delta.dataSkippingStatsColumns and compute statistics for a partitioned Delta table in Databricks? I want to understand the process and best practices for changing this setting and ensuring accurate statist...

Latest Reply
paolajara
Databricks Employee
  • 0 kudos

Hi, delta.dataSkippingStatsColumns specifies a comma-separated list of column names for which Delta Lake collects statistics. It can improve performance by enabling data skipping on exactly those columns, since it supersedes the default behavior of analyzing the first...
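A sketch of the two steps usually involved: change the property, then recompute Delta statistics so files written before the change reflect it. Table and column names are placeholders.

```python
# Sketch: limit stats collection to the columns actually used in filters,
# then recompute Delta statistics for existing data. Names are illustrative.
spark.sql("""
    ALTER TABLE catalog.schema.partitioned_table SET TBLPROPERTIES (
        'delta.dataSkippingStatsColumns' = 'event_date,customer_id,amount'
    )
""")
spark.sql("ANALYZE TABLE catalog.schema.partitioned_table COMPUTE DELTA STATISTICS")
```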

GeKo
by Contributor
  • 1048 Views
  • 8 replies
  • 4 kudos

Resolved! How to specify the runtime version for a serverless job

Hello, if I understood correctly, using a serverless cluster always comes with the latest runtime version by default. Now I need to stick to, e.g., runtime version 15.4 for a certain job, which gets deployed via asset bundles. How do I specify/config...

Data Engineering
assetbundle
serverless
Latest Reply
GeKo
Contributor
  • 4 kudos

7 More Replies
JanAkhi919
by New Contributor
  • 1060 Views
  • 1 reply
  • 1 kudos

How agentic AI is different from AI agents

How is agentic AI different from AI agents?

Latest Reply
Renu_
Contributor III
  • 1 kudos

Hi @JanAkhi919, Agentic AI and AI agents are both types of artificial intelligence, but they work differently and are meant for different purposes. Agentic AI is more like a smart helper that can solve problems on its own. It doesn't need step-by-step...

Avinash_Narala
by Valued Contributor II
  • 1739 Views
  • 9 replies
  • 1 kudos

Redshift Stored Procedure Migration to Databricks

Hi, I want to migrate Redshift SQL stored procedures to Databricks. Since Databricks doesn't support the concept of SQL stored procedures, how can I do so?

Latest Reply
nayan_wylde
Valued Contributor III
  • 1 kudos

The Databricks docs show that procedures are in Public Preview and require Runtime 17.0 and above: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-procedure
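A minimal sketch of what the preview syntax on that page looks like; the catalog, schema, table, procedure name, and body are illustrative assumptions, and this requires a Runtime 17.0+ environment.

```python
# Sketch: create and call a SQL procedure, per the CREATE PROCEDURE preview docs.
# Catalog, schema, table, and procedure names are placeholders.
spark.sql("""
    CREATE OR REPLACE PROCEDURE catalog.schema.log_event(IN p_msg STRING)
    LANGUAGE SQL
    AS BEGIN
        INSERT INTO catalog.schema.event_log (msg, ts) VALUES (p_msg, current_timestamp());
    END
""")
spark.sql("CALL catalog.schema.log_event('migrated from Redshift')")
```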

8 More Replies
JothyGanesan
by New Contributor III
  • 1222 Views
  • 4 replies
  • 1 kudos

Resolved! Streaming data - Merge in Target - DLT

We have streaming inputs coming from streaming tables and also the table from apply_changes. In our target there is only one table, which needs to be merged with all the sources. Each source provides different columns in our target table. Challenge: Ev...

Latest Reply
vd1
New Contributor II
  • 1 kudos

Can this cause concurrent write issues, updating the same table from multiple streams?
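Whether concurrent merges conflict depends on how the writes are issued. Outside DLT, one hedged sketch is a foreachBatch MERGE writer per source; with several sources targeting one table, Delta can raise concurrent-modification errors, so such writers are often kept on disjoint key ranges or run one at a time. Table names, join key, and checkpoint paths below are placeholders.

```python
# Hedged sketch: one foreachBatch MERGE writer per source stream (plain Structured Streaming).
from delta.tables import DeltaTable

def upsert_to_target(batch_df, batch_id):
    target = DeltaTable.forName(spark, "catalog.schema.target_table")
    (target.alias("t")
           .merge(batch_df.alias("s"), "t.id = s.id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

(spark.readStream.table("catalog.schema.source_stream_1")
      .writeStream
      .foreachBatch(upsert_to_target)
      .option("checkpointLocation", "/checkpoints/source_stream_1")
      .start())
```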

3 More Replies
RevathiTiger
by New Contributor II
  • 2395 Views
  • 3 replies
  • 1 kudos

Expectations vs Great Expectations with Databricks DLT pipelines

Hi All, We are working on creating a DQ framework on DLT pipelines in Databricks. Databricks DLT pipelines read incoming data from Kafka / file sources. Once data is ingested, data validation must happen on top of the ingested data. The customer is evalu...

Latest Reply
chanukya-pekala
Contributor
  • 1 kudos

If you have decided to use DLT, it handles micro-batching and checkpointing for you. But you can typically take more control if you rewrite the logic using Auto Loader or Structured Streaming, with custom checkpointing of the file directory, and maintain yo...
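A minimal sketch of the Auto Loader route this reply mentions, with an explicit checkpoint and a simple filter standing in for a data-quality expectation; the paths, rule, and target table are illustrative assumptions.

```python
# Sketch: Auto Loader ingestion with an explicit checkpoint and a basic validity filter.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/checkpoints/events/schema")
    .load("/landing/events/")
)

valid = raw.filter("event_id IS NOT NULL")  # stand-in for a data-quality expectation

(
    valid.writeStream
    .option("checkpointLocation", "/checkpoints/events/stream")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)
```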

2 More Replies
