Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Neil
by New Contributor
  • 5466 Views
  • 1 replies
  • 0 kudos

Saving a Spark DataFrame to a Delta table is taking too long

While working on a video analytics task, I need to save image bytes, previously extracted into a Spark DataFrame, to a Delta table. I want to overwrite the same Delta table over the course of the task, and the size of the input data differs...

  • 5466 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Can you check the Spark UI to see where the time is spent? It could be a join, a UDF, ...

  • 0 kudos
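A minimal sketch of the overwrite pattern described in this thread (the table name is hypothetical). If the write is slow, the Spark UI's Jobs and SQL tabs show whether a join, a UDF, shuffles, or the write itself dominates the time.

```python
# Hypothetical table name; overwriteSchema lets the input size/schema differ
# between runs of the task.
(
    df.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("video_analytics.image_bytes")
)
```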
FRG96
by New Contributor III
  • 5931 Views
  • 0 replies
  • 0 kudos

How to set the ABFSS URL for Azure Databricks Init Scripts that have spaces in directory names?

I want to use an init script on an ADLS Gen2 location for my Azure Databricks 11.3 and 12.2 clusters. The init_script.sh is placed in a directory that has spaces in it: https://storageaccount1.blob.core.windows.net/container1/directory%20with%20spaces/su...

  • 5931 Views
  • 0 replies
  • 0 kudos
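There is no reply in this thread, but for context, a hedged sketch of how an ABFSS-hosted init script is usually attached through the Clusters API payload (storage account, container, and path are hypothetical). Note the abfss:// scheme on the dfs endpoint rather than the https:// blob URL; how spaces in the directory name should be encoded here is exactly the open question, so renaming the directory to avoid spaces may be the simplest workaround.

```python
# Hypothetical cluster spec; the init_scripts entry points at an ABFSS path.
cluster_spec = {
    "cluster_name": "init-script-demo",
    "spark_version": "12.2.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "init_scripts": [
        {
            "abfss": {
                "destination": (
                    "abfss://container1@storageaccount1.dfs.core.windows.net/"
                    "directory with spaces/init_script.sh"
                )
            }
        }
    ],
}
```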
Chinu
by New Contributor III
  • 5471 Views
  • 1 replies
  • 1 kudos

Resolved! How to build a request (with filter_by) that pulls query history from now back to 5 minutes ago

Hi Team, is it possible to use the "query_start_time_range" filter in the API call to get query data only from now back to 5 minutes ago? I'm using Telegraf to call the Query History API, but it looks like I'm hitting the maximum number of returned results and I can't find how to use...

  • 5471 Views
  • 1 replies
  • 1 kudos
Latest Reply
mathan_pillai
Valued Contributor
  • 1 kudos

Have you checked https://docs.databricks.com/api-explorer/workspace/queryhistory/list? You can list the queries based on a time range as well, so you can try passing the fields in the filter_by parameter. Then pass the value as (current time - 5 m...

  • 1 kudos
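A hedged sketch of the time-range filter described in the reply (the workspace URL and token are placeholders; exact field names may differ slightly by API version):

```python
import time
import requests

HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                    # placeholder

now_ms = int(time.time() * 1000)
payload = {
    "filter_by": {
        "query_start_time_range": {
            "start_time_ms": now_ms - 5 * 60 * 1000,  # five minutes ago
            "end_time_ms": now_ms,
        }
    },
    "max_results": 100,
}

resp = requests.get(
    f"{HOST}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
for query in resp.json().get("res", []):
    print(query.get("query_id"), query.get("status"))
```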
User16783854357
by New Contributor III
  • 1151 Views
  • 1 replies
  • 0 kudos

Delta Sharing - Who provides the server?

I would like to understand who provides the server when using Delta Sharing. If a customer exposes their Delta table through Delta Sharing, is it the customer who needs to set up a cluster or server to process the incoming requests?

  • 1151 Views
  • 1 replies
  • 0 kudos
Latest Reply
BigRoux
New Contributor III
  • 0 kudos

The producer does need a cluster to set up Delta Sharing. However, once the handoff happens, no cluster is needed; the data is delivered via storage services.

  • 0 kudos
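A hedged consumer-side sketch (the share profile file and table path are hypothetical): recipients read shared data with the open-source delta-sharing client, and the provider's cluster is not involved in serving the read itself.

```python
import delta_sharing

# Credentials file distributed by the data provider (hypothetical path).
profile = "/dbfs/FileStore/config.share"
table_url = f"{profile}#my_share.my_schema.my_table"

# Load the shared table into pandas; data is fetched from cloud storage.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```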
Shankar
by New Contributor III
  • 1552 Views
  • 1 replies
  • 2 kudos

Resolved! Is there a Python API for vacuum with dry run?

I have the below SQL command where I am doing a dry run with VACUUM: %sql VACUUM <table_name> RETAIN 500 HOURS DRY RUN; I wanted to check if there is a way to achieve this in the Python API. I tried the below, but I'm not sure if there is a parameter that we...

  • 1552 Views
  • 1 replies
  • 2 kudos
Latest Reply
venkatcrc
New Contributor III
  • 2 kudos

The equivalent of the SQL command "VACUUM <table_name> RETAIN 500 HOURS DRY RUN;" in Python is spark.sql("VACUUM <table_name> RETAIN 500 HOURS DRY RUN").

  • 2 kudos
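A minimal sketch of that approach (the table name is hypothetical); as far as I know, DeltaTable.vacuum() in the Python API does not expose a dry-run flag, so spark.sql is the usual route:

```python
# DRY RUN returns the files that would be removed instead of deleting them.
files_to_delete = spark.sql(
    "VACUUM my_db.my_table RETAIN 500 HOURS DRY RUN"   # hypothetical table
)
files_to_delete.show(truncate=False)
```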
Retko
by Contributor
  • 7835 Views
  • 3 replies
  • 3 kudos

Resolved! Data Tab is not showing any databases and tables even though cluster is running (Community edition)

Hi, I have a cluster running, but I don't see anything in the Data tab. As you can see, it reports an error, but the error appeared after I deleted clusters which were terminated. Before that it said something like the cluster is not running, I don't remember exactl...

[screenshots of the cluster and the Data tab omitted]
  • 7835 Views
  • 3 replies
  • 3 kudos
Latest Reply
Rajani
Contributor II
  • 3 kudos

@Retko Okter You need to enable the DBFS File Browser from the Admin Settings. Hope this helps.

  • 3 kudos
2 More Replies
horatiug
by New Contributor III
  • 722 Views
  • 0 replies
  • 0 kudos

Can the databricks_mount timeout be changed?

I am using Terraform to configure the Databricks workspace, and while mounting 6 buckets I get a timeout if a mount takes longer than 20 minutes. Is it possible to change the timeout? Thanks, Horatiu

  • 722 Views
  • 0 replies
  • 0 kudos
cblock
by New Contributor III
  • 1926 Views
  • 3 replies
  • 3 kudos

Unable to run jobs with git notebooks

So, in this case our jobs are deployed from our development workspace to our isolated testing workspace via an automated Azure DevOps pipeline. As such, they are created (and thus run as) a service account user. Recently we made the switch to using gi...

  • 1926 Views
  • 3 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Chris Block, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

  • 3 kudos
2 More Replies
swatish0395
by New Contributor III
  • 982 Views
  • 0 replies
  • 0 kudos

I am working on Parquet file-level column encryption and decryption based on user-specific permissions

I am able to encrypt and decrypt the data in multiple ways and to save the encrypted Parquet file, but I want to decrypt the data only if the user has a specific permission; otherwise they should get the encrypted data. Is there any permanent solution to de...

  • 982 Views
  • 0 replies
  • 0 kudos
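There is no reply in this thread, but a hedged sketch of one common approach (not necessarily the poster's setup): encrypt the column with the built-in aes_encrypt/aes_decrypt functions and gate decryption on group membership with is_member(). The table, column, group, and secret names are hypothetical, and the availability of the secret() SQL function inside a view is an assumption.

```python
# Hypothetical view: members of 'pii_readers' see plaintext, everyone else
# sees only the stored encrypted value.
spark.sql("""
  CREATE OR REPLACE VIEW analytics.customers_secure AS
  SELECT
    id,
    CASE
      WHEN is_member('pii_readers')
        THEN CAST(aes_decrypt(unbase64(ssn_enc), secret('pii', 'aes_key')) AS STRING)
      ELSE ssn_enc
    END AS ssn
  FROM analytics.customers_encrypted
""")
```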
grazie
by Contributor
  • 2173 Views
  • 2 replies
  • 2 kudos

how to get dbutils in Runtime 13

We're using the following method (generated by using dbx) to access dbutils, e.g. to retrieve parameters from secret scopes: @staticmethod def _get_dbutils(spark: SparkSession) -> "dbutils": try: from pyspark.dbutils import...

  • 2173 Views
  • 2 replies
  • 2 kudos
Latest Reply
colt
New Contributor III
  • 2 kudos

We have something similar in our code. It worked on Runtime 13 until last week. The Machine Learning DBR doesn't work either.

  • 2 kudos
1 More Replies
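For reference, a hedged reconstruction of the common dbx-style helper referenced above (the IPython fallback is an assumption, not necessarily the poster's exact code):

```python
from pyspark.sql import SparkSession

def get_dbutils(spark: SparkSession):
    try:
        # Available when running on a Databricks cluster.
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        # Fall back to the notebook's global dbutils object.
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]
```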
96286
by Contributor
  • 5905 Views
  • 4 replies
  • 3 kudos

Resolved! Autoloader works on compute cluster, but does not work within a task in workflows

I feel like I am going crazy with this. I have tested a data pipeline on my standard compute cluster. I am loading new files in batches from a Google Cloud Storage bucket. Autoloader works exactly as expected from my notebook on my compute cluster. The...

  • 5905 Views
  • 4 replies
  • 3 kudos
Latest Reply
96286
Contributor
  • 3 kudos

I found the issue. I describe the solution in the following SO post. https://stackoverflow.com/questions/76287095/databricks-autoloader-works-on-compute-cluster-but-does-not-work-within-a-task/76313794#76313794

  • 3 kudos
3 More Replies
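For context, a minimal Auto Loader batch-style sketch of the pattern described above (bucket, schema/checkpoint paths, and table name are hypothetical); the actual fix is in the linked Stack Overflow post.

```python
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "gs://my-bucket/_schemas/events")
    .load("gs://my-bucket/raw/events/")
    .writeStream
    .option("checkpointLocation", "gs://my-bucket/_checkpoints/events")
    .trigger(availableNow=True)   # process new files once, then stop
    .toTable("bronze.events")
)
```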
g96g
by New Contributor III
  • 939 Views
  • 1 replies
  • 0 kudos

Function in databricks

I'm having a hard time converting the function below from SSMS to a Databricks function. Any help would be appreciated! CREATE FUNCTION [dbo].[MaxOf5Values] (@D1 [int], @D2 [int], @D3 [int], @D4 [int], @D5 [int]) RETURNS int AS BEGIN DECLARE @Result int ...

  • 939 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Givi Salu, please refer to this link; it will help you convert this function.

  • 0 kudos
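A hedged sketch (not the linked solution) of how the T-SQL function could be expressed as a Databricks SQL UDF built on greatest(); the function name is illustrative.

```python
spark.sql("""
  CREATE OR REPLACE FUNCTION max_of_5_values(d1 INT, d2 INT, d3 INT, d4 INT, d5 INT)
  RETURNS INT
  RETURN greatest(d1, d2, d3, d4, d5)
""")

spark.sql("SELECT max_of_5_values(3, 9, 1, 7, 5) AS max_value").show()  # 9
```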
HappySK
by New Contributor
  • 5342 Views
  • 1 replies
  • 0 kudos

Execute COPY INTO query with source as temp view

We have files that need to be loaded into the Delta table. Now we want to perform some transformation on the files and load the result into the table. What we did: create a Spark DF from the file, apply the transformation on the DF, create a temp view from t...

  • 5342 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Sravan Kumar Mohanraj, yes, you can use a COPY INTO query; in this case your temp view will be the source. For more info, please visit these links.

  • 0 kudos
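A hedged sketch of one documented way to apply transformations during COPY INTO, using an inline SELECT over the source files (paths, columns, and table names are hypothetical); this is shown as an alternative to staging the transformation in a temp view.

```python
spark.sql("""
  COPY INTO main.bronze.orders
  FROM (
    SELECT order_id,
           CAST(amount AS DECIMAL(10, 2)) AS amount,
           to_date(order_ts) AS order_date
    FROM 'abfss://landing@mystorage.dfs.core.windows.net/orders/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true')
""")
```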
Ismail1
by New Contributor III
  • 2187 Views
  • 2 replies
  • 3 kudos

Resolved! API Authentication

I am trying to run some API calls against the account console. I tried every possible syntax, encoded and decoded, to get the call to succeed, but it returns a "user not authenticated" error. When I tried with the Account admin, it worked. I need t...

  • 2187 Views
  • 2 replies
  • 3 kudos
Latest Reply
Ismail1
New Contributor III
  • 3 kudos

Hi Venkat, that sounds like a good idea. Thanks

  • 3 kudos
1 More Replies
