Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Neil
by New Contributor
  • 6858 Views
  • 1 replies
  • 0 kudos

Saving a Spark DataFrame to a Delta table is taking too long

While working on a video analytics task, I need to save image bytes, previously extracted into a Spark DataFrame, to a Delta table. I want to overwrite the same Delta table over the course of the complete task, and the size of the input data differs...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Can you check the Spark UI to see where the time is spent? It can be a join, a UDF, ...

FRG96
by New Contributor III
  • 7239 Views
  • 0 replies
  • 0 kudos

How to set the ABFSS URL for Azure Databricks Init Scripts that have spaces in directory names?

I want to use an init script stored in an ADLS Gen2 location for my Azure Databricks 11.3 and 12.2 clusters. The init_script.sh is placed in a directory that has spaces in its name: https://storageaccount1.blob.core.windows.net/container1/directory%20with%20spaces/su...
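One way to experiment with this is to derive both candidate `abfss://` forms from the HTTPS blob URL and try each one in the cluster's init script setting. This is only a sketch: the helper name `https_blob_to_abfss` and the example path ending in `init_script.sh` are hypothetical (the original path is truncated), and the code does not settle whether Databricks expects the spaces raw or percent-encoded.

```python
from urllib.parse import quote, unquote, urlsplit


def https_blob_to_abfss(url: str, encode_spaces: bool = False) -> str:
    """Build an abfss:// URL from an https://<account>.blob.core.windows.net URL.

    encode_spaces=False keeps literal spaces in the path;
    encode_spaces=True percent-encodes them again (%20).
    """
    parts = urlsplit(url)
    account = parts.netloc.split(".")[0]
    # The first path segment is the container, the rest is the file path.
    container, _, blob_path = parts.path.lstrip("/").partition("/")
    decoded = unquote(blob_path)
    path = quote(decoded) if encode_spaces else decoded
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


# Hypothetical example URL, not the poster's full path.
example = (
    "https://storageaccount1.blob.core.windows.net/"
    "container1/directory%20with%20spaces/init_script.sh"
)
raw_form = https_blob_to_abfss(example)
encoded_form = https_blob_to_abfss(example, encode_spaces=True)
print(raw_form)
print(encoded_form)
```

Trying both printed forms in turn is a quick way to find out which one the cluster accepts.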

Chinu
by New Contributor III
  • 6720 Views
  • 1 replies
  • 1 kudos

Resolved! How to create a raw data (with filter_by) to pull query history from now to 5 mins ago

Hi Team, is it possible to use the "query_start_time_range" filter in the API call to get query data only from now to 5 mins ago? I'm using Telegraf to call the Query History API, but it looks like I'm reaching the max return and I can't find how to use...

Latest Reply
mathan_pillai
Databricks Employee
  • 1 kudos

Have you checked this? https://docs.databricks.com/api-explorer/workspace/queryhistory/list You can list the queries based on a time range as well, so you can try passing the fields in the filter_by parameter. Then pass the value as (current time - 5 m...
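A minimal sketch of the suggestion above, assuming the Query History list endpoint accepts a `filter_by.query_start_time_range` object with `start_time_ms` and `end_time_ms` in epoch milliseconds. The helper name `last_five_minutes_filter` is hypothetical, and the snippet only builds the request body; sending it (for example from Telegraf) is left out.

```python
import json
import time

FIVE_MINUTES_MS = 5 * 60 * 1000


def last_five_minutes_filter(now_ms=None):
    """Return a request body covering the window from (now - 5 minutes) to now."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return {
        "filter_by": {
            "query_start_time_range": {
                "start_time_ms": now_ms - FIVE_MINUTES_MS,
                "end_time_ms": now_ms,
            }
        }
    }


# Fixed timestamp so the example is reproducible.
payload = last_five_minutes_filter(now_ms=1_700_000_000_000)
print(json.dumps(payload, indent=2))
```

Because each call only asks for a five-minute window, the result set should stay well below the maximum page size in most workspaces.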

User16783854357
by Databricks Employee
  • 1978 Views
  • 1 replies
  • 0 kudos

Delta Sharing - Who provides the server?

I would like to understand who provides the server when using Delta Sharing. If a customer exposes their Delta table through Delta Sharing, is it the customer who needs to set up a cluster or server to process the incoming requests?

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The producer does need a cluster to set up Delta Sharing. However, once the handoff happens, no cluster is needed; the data is delivered via storage services.

Shankar
by New Contributor III
  • 3334 Views
  • 1 replies
  • 2 kudos

Resolved! Is there a Python API for vacuum with dry run?

I have the SQL command below, where I am doing a dry run with VACUUM: %sql VACUUM <table_name> RETAIN 500 HOURS DRY RUN; I wanted to check if there is a way to achieve this in the Python API. I tried the below. But, not sure if there is a parameter that we...

Latest Reply
venkatcrc
New Contributor III
  • 2 kudos

The equivalent of the SQL command "VACUUM <table_name> RETAIN 500 HOURS DRY RUN;" in Python is spark.sql("VACUUM <table_name> RETAIN 500 HOURS DRY RUN;")
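A small sketch of that approach, wrapping the statement in a helper so the table name and retention can be passed in. The function name `vacuum_dry_run_sql` and the table `my_schema.my_table` are hypothetical, and the `spark.sql` call is shown as a comment because it needs a running Databricks or Delta-enabled Spark session.

```python
def vacuum_dry_run_sql(table_name: str, retain_hours: int) -> str:
    """Build the VACUUM ... DRY RUN statement for the given table."""
    return f"VACUUM {table_name} RETAIN {retain_hours} HOURS DRY RUN"


statement = vacuum_dry_run_sql("my_schema.my_table", 500)
print(statement)

# In a notebook, where `spark` is already defined:
# spark.sql(statement).show(truncate=False)  # lists the files that would be deleted
```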

Retko
by Contributor
  • 15052 Views
  • 3 replies
  • 3 kudos

Resolved! Data Tab is not showing any databases and tables even though cluster is running (Community edition)

Hi, I have a cluster running: But I don't see anything in the Data tab: As you can see, it reports an error, but the error appeared after I deleted clusters that were terminated. Before that, it said something about the cluster not running; I don't remember exactl...

Latest Reply
Rajani
Contributor II
  • 3 kudos

@Retko Okter You need to enable the DBFS File Browser in the Admin Settings. Hope this helps.

2 More Replies
horatiug
by New Contributor III
  • 1324 Views
  • 0 replies
  • 0 kudos

Can the databricks_mount timeout be changed?

I am using Terraform to do Databricks workspace configuration, and while mounting 6 buckets I get a timeout if the mount takes longer than 20 min. Is it possible to change the timeout? Thanks, Horatiu

cblock
by New Contributor III
  • 3574 Views
  • 3 replies
  • 3 kudos

Unable to run jobs with git notebooks

So, in this case our jobs are deployed from our development workspace to our isolated testing workspace via an automated Azure DevOps pipeline. As such, they are created by (and thus run as) a service account user. Recently we made the switch to using gi...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Chris Block, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies
swatish0395
by New Contributor III
  • 1996 Views
  • 0 replies
  • 0 kudos

I am working on Parquet file-level column encryption and decryption based on user-specific permissions

I am able to encrypt and decrypt the data in multiple ways and to save the encrypted Parquet file, but I want to decrypt the data only if the user has a specific permission; otherwise they will get the encrypted data. Is there any permanent solution to de...

grazie
by Contributor
  • 3645 Views
  • 2 replies
  • 2 kudos

How to get dbutils in Runtime 13

We're using the following method (generated by using dbx) to access dbutils, e.g. to retrieve parameters from secret scopes: @staticmethod def _get_dbutils(spark: SparkSession) -> "dbutils": try: from pyspark.dbutils import...

Latest Reply
colt
New Contributor III
  • 2 kudos

We have something similar in our code. This worked on Runtime 13 until last week. The Machine Learning DBR doesn't work either.

1 More Replies
96286
by Contributor
  • 10891 Views
  • 4 replies
  • 3 kudos

Resolved! Autoloader works on compute cluster, but does not work within a task in workflows

I feel like I am going crazy with this. I have tested a data pipeline on my standard compute cluster. I am loading new files as batch from a Google Cloud Storage bucket. Autoloader works exactly as expected from my notebook on my compute cluster. The...

Latest Reply
96286
Contributor
  • 3 kudos

I found the issue. I describe the solution in the following SO post. https://stackoverflow.com/questions/76287095/databricks-autoloader-works-on-compute-cluster-but-does-not-work-within-a-task/76313794#76313794

3 More Replies
g96g
by New Contributor III
  • 1616 Views
  • 1 replies
  • 0 kudos

Function in Databricks

I'm having a hard time converting the function below from SSMS to a Databricks function. Any help would be appreciated! CREATE FUNCTION [dbo].[MaxOf5Values] (@D1 [int],@D2 [int],@D3 [int],@D4 [int],@D5 [int]) RETURNS int AS BEGIN DECLARE @Result int   ...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

Hi @Givi Salu, please refer to this link; it will help you convert this function.
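For reference, a minimal sketch of one possible translation, assuming the truncated T-SQL body simply returns the largest of the five inputs (the original body is cut off, so this is a guess at its intent). The Spark SQL built-in `greatest` skips NULL arguments; the Python function `max_of_5_values` mirrors that behavior so the logic can be checked without a cluster. The unqualified function name and lowercase parameter names in the DDL are illustrative.

```python
from typing import Optional

# Assumed Databricks SQL UDF translation; run it via spark.sql or a SQL cell.
CREATE_MAX_OF_5_VALUES = """
CREATE OR REPLACE FUNCTION MaxOf5Values(d1 INT, d2 INT, d3 INT, d4 INT, d5 INT)
RETURNS INT
RETURN greatest(d1, d2, d3, d4, d5)
"""


def max_of_5_values(*values: Optional[int]) -> Optional[int]:
    """Pure-Python mirror of greatest(): ignore None, return None if all are None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


print(max_of_5_values(1, 5, 3, None, 2))

# In a notebook, where `spark` is already defined:
# spark.sql(CREATE_MAX_OF_5_VALUES)
# spark.sql("SELECT MaxOf5Values(1, 5, 3, NULL, 2)").show()
```

If the original function treats NULL inputs differently (for example, returning NULL when any input is NULL), the `RETURN` expression would need to change accordingly.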

HappySK
by New Contributor II
  • 6397 Views
  • 1 replies
  • 0 kudos

Execute COPY INTO query with source as temp view

We have files that need to be loaded into a Delta table. We now want to perform some transformations on the files and load the result into the table. What we did: Create a Spark DF from that file. Apply transformation on the DF. Create a temp view from t...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

Hi @Sravan Kumar Mohanraj, yes, you can use a COPY query in this case; your temp_view will be the source. For more info, please visit these links.

Ismail1
by New Contributor III
  • 4414 Views
  • 2 replies
  • 3 kudos

Resolved! API Authentication

I am trying to run some API calls to the account console. I tried every possible syntax, encoded and decoded, to get the call to succeed, but it returns a "user not authenticated" error. But when I tried with the Account admin it worked. I need t...

Latest Reply
Ismail1
New Contributor III
  • 3 kudos

Hi Venkat, that sounds like a good idea. Thanks

1 More Replies