Data Engineering

Forum Posts

acagatayyilmaz
by New Contributor
  • 545 Views
  • 1 replies
  • 0 kudos

How to find consumed DBU

Hi All, I'm trying to understand my Databricks consumption to purchase a reservation. However, I couldn't find the consumed DBUs in either the Azure Portal or the Databricks workspace. I'm also exporting and processing Azure Cost data daily. When I check the reso...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @acagatayyilmaz, hope you are doing well! You can refer to the billable usage system table to find the records of consumed DBUs. You can go through the document below to understand more about the system tables: https://learn.microsoft.com/en-us/...
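A minimal sketch of pulling daily DBU totals from that system table (assuming the system.billing schema is enabled in your account and the documented usage columns):

# Sketch: aggregate consumed DBUs per day from the billable usage system table.
# Assumes system.billing.usage exposes usage_date, sku_name, usage_quantity
# and usage_unit as documented.
daily_dbu = spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS dbus_consumed
    FROM system.billing.usage
    WHERE usage_unit = 'DBU'
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""")
display(daily_dbu)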

vanepet
by New Contributor II
  • 10103 Views
  • 5 replies
  • 2 kudos

Is it possible to use multiprocessing or threads to submit multiple queries to a database from Databricks in parallel?

We are trying to improve our overall runtime by running queries in parallel using either multiprocessing or threads. What I am seeing though is that when the function that runs this code is run on a separate process it doesn't return a DataFrame with...

Latest Reply
BapsDBS
New Contributor II
  • 2 kudos

Thanks for the links mentioned above, but both of them use raw Python to achieve parallelism. Does this mean Spark (read: PySpark) has no dedicated provision for parallel execution of functions or even notebooks? We used a wrapper notebook with ThreadP...
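For readers landing here, a minimal sketch of the ThreadPoolExecutor approach discussed in this thread (table names are placeholders):

from concurrent.futures import ThreadPoolExecutor

# Sketch: run several Spark SQL queries concurrently from the driver.
# spark.sql is safe to call from multiple driver threads; each query still
# runs as its own distributed Spark job.
queries = {
    "orders": "SELECT COUNT(*) AS n FROM my_catalog.my_schema.orders",        # placeholder
    "customers": "SELECT COUNT(*) AS n FROM my_catalog.my_schema.customers",  # placeholder
}

def run_query(sql_text):
    # Collect inside the thread so the action also executes in parallel.
    return spark.sql(sql_text).collect()

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {name: pool.submit(run_query, q) for name, q in queries.items()}
    results = {name: f.result() for name, f in futures.items()}

print(results)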

4 More Replies
RIDBX
by New Contributor II
  • 985 Views
  • 2 replies
  • 0 kudos

What is the best way to handle a huge gzipped file dropped to S3?

What is the best way to handle a huge gzipped file dropped to S3? I find some interesting suggestions for posted questions. Thanks for reviewing my threads. Here is the situation we have. We are getting dat...

Data Engineering
bulkload
S3
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @RIDBX, one approach is to avoid using DataFrames and instead use RDDs (Resilient Distributed Datasets) for repartitioning. Read the gzipped files as RDDs, repartition them into smaller partitions, and save them in a splittable format (e.g., Snapp...
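A minimal sketch of that idea using the DataFrame API rather than raw RDDs (paths and the partition count are placeholders):

# A single .gz file is not splittable, so Spark reads it with one task.
raw = spark.read.text("s3://my-bucket/landing/huge_file.csv.gz")

# Spread the rows across the cluster before heavy transformations.
raw = raw.repartition(64)

# Persist in a splittable, compressed format so downstream reads parallelize.
(raw.write
    .mode("overwrite")
    .option("compression", "snappy")
    .parquet("s3://my-bucket/staging/huge_file_parquet/"))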

1 More Replies
zerodarkzone
by New Contributor II
  • 421 Views
  • 2 replies
  • 0 kudos

Cannot create vnet peering on Azure Databricks

Hi, I'm trying to create a VNET peering to SAP HANA using the default VNET created by Databricks, but it is not possible. I'm getting the following error: Could not add the virtual network peering "PeeringSAP" to "workers-vnet". Error: The c...

Data Engineering
Azure Databricks
peering
vnet
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @zerodarkzone,  Ensure that the user has the necessary permissions to manage network resources. Specifically, they should have the permission to perform the action "Microsoft.Network/virtualNetworks/virtualNetworkPeerings/write" within the scope o...

1 More Replies
jx1226
by New Contributor II
  • 796 Views
  • 2 replies
  • 0 kudos

Connect Workspace EnableNoPublicIP=No and VnetInject=No to storage account with Private Endpoint.

We know that Databricks with VNET injection (our own VNET) allows us to connect to blob storage / ADLS Gen2 over private endpoints and peering. This is what we typically do. We have a client who created Databricks with EnableNoPublicIP=No (secure clust...

Latest Reply
User16539034020
Contributor II
  • 0 kudos

Hello, thanks for contacting Databricks Support. You need to enable EnableNoPublicIP; otherwise, you will get the error message "cannot be deployed on subnet containing Basic SKU Public IP addresses or Basic SKU Load Balancer. NIC". It was usually t...

1 More Replies
VVM
by New Contributor III
  • 8509 Views
  • 13 replies
  • 3 kudos

Resolved! Databricks SQL - Unable to Escape Dollar Sign ($) in Column Name

It seems that due to how Databricks processes SQL cells, it's impossible to escape the $ when it comes to a column name. I would expect the following to work: %sql SELECT 'hi' `$id` The backticks ought to escape everything. And indeed that's exactly wha...
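A workaround that is often suggested (assuming the query runs in a notebook) is to issue the statement through spark.sql from a Python cell, which bypasses the SQL cell's $-parameter substitution:

# Backticks keep $id as a literal column name when the statement is not
# pre-processed by the %sql cell.
df = spark.sql("SELECT 'hi' AS `$id`")
df.show()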

Latest Reply
Casper-Bang
New Contributor II
  • 3 kudos

What is the status of this bug report? It's been over a year now.

12 More Replies
crankerkor
by New Contributor II
  • 284 Views
  • 2 replies
  • 1 kudos

Databricks JDBC SQL Warehouse Encoding Issue

Hi everyone. I am trying to connect to and read data from a Databricks table using a SQL warehouse and return it via an Azure API. However, non-English characters, for example 'Ä', are present in the response as follows: ��. I am using the databricks...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @crankerkor, JDBC Driver Configuration: Ensure that you are using the correct JDBC driver. You mentioned using the databricks-jdbc driver. Make sure it's the latest version and compatible with your Databricks cluster. The Simba Spark JDBC driv...

1 More Replies
Spenyo
by New Contributor II
  • 200 Views
  • 1 replies
  • 1 kudos

Delta table size not shrinking after Vacuum

Hi team. Once a day we overwrite the last X months of data in our tables, so every day it generates a larger amount of history. We don't use time travel, so we don't need it. What we have done: SET spark.databricks.delta.retentionDurationCheck.enabled = false; ALT...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Spenyo, consider increasing the retention duration if you need to retain historical data for longer periods. If you're not using time travel, you can set a retention interval of at least 7 days to strike a balance between history retention and st...
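A minimal sketch of shortening the retention windows and then vacuuming (the table name is a placeholder; lowering retention below the 7-day default risks breaking time travel and concurrent readers):

spark.sql("""
    ALTER TABLE main.sales.daily_snapshot SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 7 days',
        'delta.logRetentionDuration' = 'interval 7 days'
    )
""")

# Physically remove data files no longer referenced by versions newer than 7 days.
spark.sql("VACUUM main.sales.daily_snapshot RETAIN 168 HOURS")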

Gilg
by Contributor II
  • 231 Views
  • 1 replies
  • 0 kudos

Best Practices Near Real-time Processing

Hi all, we are ingesting 1,000 JSON files of different sizes per minute. DLT is in continuous mode, and the workspace is Unity Catalog enabled. We are using the default Auto Loader setting (directory listing), and Silver has CDC as well. We aim to ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Gilg, Achieving near real-time processing for your data ingestion and processing pipeline is crucial. Here are some best practices to consider: Plan your data isolation model: When using a data platform like Azure Databricks, consider setting up...
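At roughly 1,000 files per minute, one commonly recommended change is switching Auto Loader from directory listing to file notification mode; a minimal DLT sketch (source path and table name are placeholders):

import dlt

@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # File notifications scale better than directory listing at this volume.
        .option("cloudFiles.useNotifications", "true")
        .load("abfss://landing@<storage-account>.dfs.core.windows.net/events/")
    )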

User16752240150
by New Contributor II
  • 613 Views
  • 1 replies
  • 1 kudos
Latest Reply
holly
New Contributor III
  • 1 kudos

Hi there! Appreciate this reply is 3 years later than the question was originally asked, but people might be coming across it still. A few things: Koalas was deprecated in Spark 3.2 (runtime 10.4). Instead, the recommendation is to use pandas on Spark with `im...
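A minimal sketch of the pandas API on Spark that replaced Koalas:

import pyspark.pandas as ps

# Familiar pandas syntax, executed as Spark jobs under the hood.
psdf = ps.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
print(psdf.describe())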

manish1987c
by New Contributor II
  • 390 Views
  • 2 replies
  • 0 kudos

Delta Live table expectations

I am able to use the expectations feature in Delta Live Tables by creating the expectations as below: checks = {} checks["validate circuitId col for null values"] = "(circuitId IS NOT NULL)" checks["validate name col for null values"] = "(name IS NOT ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @manish1987c, To dynamically generate expectations based on different conditions, you can create the dlt table inside an if condition. If you encounter any further issues, feel free to ask for additional assistance!
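A minimal sketch of attaching a dynamically built checks dictionary to a DLT table (the source table is a placeholder; the check definitions mirror the question):

import dlt

checks = {}
checks["validate circuitId col for null values"] = "(circuitId IS NOT NULL)"
checks["validate name col for null values"] = "(name IS NOT NULL)"

@dlt.table(name="circuits_clean")
@dlt.expect_all_or_drop(checks)  # drop rows that violate any expectation
def circuits_clean():
    return spark.read.table("my_catalog.f1.circuits")  # placeholder source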

1 More Replies
NarenderKumar
by New Contributor II
  • 332 Views
  • 1 replies
  • 0 kudos

How to set up relations between tables in unity catalog tables

We are using Unity Catalog. Is there a way to set up relations between Unity Catalog tables, like key column relations, one-to-many, many-to-one? Can we also generate ER diagrams if we are able to set up these relations?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @NarenderKumar,  Unity Catalog allows you to define relationships between tables using key columns. Here are the common types of relationships you can set up: One-to-Many (1:N): In this relationship, one record in the primary table corresponds to ...
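A minimal sketch of declaring informational primary and foreign keys on Unity Catalog tables (catalog, schema, and table names are placeholders; the constraints are not enforced, but tools can use them to draw ER diagrams):

# Key columns must already be declared NOT NULL before adding a primary key.
spark.sql("""
    ALTER TABLE main.sales.customers
    ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id)
""")

spark.sql("""
    ALTER TABLE main.sales.orders
    ADD CONSTRAINT orders_customers_fk
    FOREIGN KEY (customer_id) REFERENCES main.sales.customers (customer_id)
""")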

Brad
by Contributor
  • 963 Views
  • 1 replies
  • 0 kudos

Why "rror: Invalid access to Org: xxx"

Hi team, I installed the Databricks CLI and ran "databricks auth login --profile xxx" successfully. I can also connect from VS Code to Databricks. "databricks clusters list -p xxx" also works. But when I tried to run "databricks bundle validate" I got "Error:...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Brad, ensure that your Databricks CLI configuration is correctly set up with the right access token. Verify that the token is mentioned in both the password field and the Extra field. The Extra field should be configured with a JSON string like t...

Ajay-Pandey
by Esteemed Contributor III
  • 934 Views
  • 5 replies
  • 0 kudos

On-behalf-of token creation for service principals is not enabled for this workspace

Hi all, I just wanted to create a PAT for a Databricks service principal but am getting the error below while hitting the API or using the CLI. Please help me create a PAT for the same. #dataengineering #databricks

Data Engineering
community
Databricks
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Kaniz, have you got any update on this?

4 More Replies
RajeshRK
by Contributor
  • 5133 Views
  • 6 replies
  • 0 kudos

Resolved! Need help to analyze databricks logs for a long-running job.

Hi team, we have a job that completes in 3 minutes on one Databricks cluster; if we run the same job on another Databricks cluster it takes 3 hours to complete. I am quite new to Databricks and need your guidance on how to find out where Databricks s...

Latest Reply
AmitKP
New Contributor II
  • 0 kudos

Hi @Kaniz, I am saving the logs of my Databricks job compute from ADF. How can I open those files that are present in the DBFS location?
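A minimal sketch of browsing and reading delivered log files from a notebook (the log path and file name are placeholders; use the log destination configured on the job cluster):

log_dir = "dbfs:/cluster-logs/<cluster-id>/driver/"

# List the files delivered to the log destination.
for f in dbutils.fs.ls(log_dir):
    print(f.path, f.size)

# Print the first bytes of one of the log files.
print(dbutils.fs.head(log_dir + "log4j-active.log", 10000))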

5 More Replies