Data Engineering

Forum Posts

Trodenn
by New Contributor III
  • 1774 Views
  • 5 replies
  • 1 kudos

Resolved! approxQuantile does not seem to be working with Delta Live Tables (DLT)

Hi, I am trying to use the approxQuantile() function to populate a list that I made, yet somehow, whenever I run the code it's as if the list is empty and there are no values in it. Code is written as below: @dlt.table(name = "customer_order_silv...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Maybe try to use (and first test in a separate notebook) a standard df = spark.read.table("customer_order_silver") to calculate approxQuantile. Of course, you need to ensure that customer_order_silver has a target location in the catalog, so read us...

4 More Replies
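A minimal sketch of that suggestion, assuming the pipeline publishes customer_order_silver to a catalog target and that order_amount is a hypothetical column being profiled:

```python
# Read the published DLT output as a regular table (outside the @dlt.table function),
# then compute the quantiles eagerly. approxQuantile is an action, so it is easiest
# to test it in a separate notebook before wiring it back into the pipeline code.
df = spark.read.table("customer_order_silver")

# 25th, 50th and 75th percentiles of the (assumed) order_amount column,
# with a 1% relative error tolerance.
quantiles = df.approxQuantile("order_amount", [0.25, 0.5, 0.75], 0.01)
print(quantiles)
```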
guru1
by New Contributor II
  • 2634 Views
  • 2 replies
  • 0 kudos

Resolved! Facing the issue below when connecting Event Hubs with Databricks; followed an earlier discussion on this but found no solution

ERROR: Query termination received for [id=37bada03-131b-4fbb-8992-a427263fef2c, runId=cf3d7c18-780e-43ae-aed0-9daf2939b823], with exception: java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit at java.util.Base64$Decoder...

Latest Reply
Annapurna_Hiriy
New Contributor III

The issue could be due to a mismatch between the Event Hubs jar and the dependencies added, or not all of the required dependencies may have been added. Suggestions: use the azure_eventhubs_spark_2_12_.jar Event Hubs Spark jar along with the following dependencies...

1 More Replies
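As an aside, this particular Base64 error is often reported when the Event Hubs connection string is passed to the connector unencrypted; a hedged PySpark sketch of the usual pattern for the azure-event-hubs-spark connector (secret scope and key names are placeholders):

```python
# Assumes a Databricks notebook where spark, sc and dbutils are available,
# and that the azure-event-hubs-spark connector jar is installed on the cluster.
connection_string = dbutils.secrets.get(scope="my-scope", key="eventhub-connection")  # placeholder secret

# The connector expects the connection string to be encrypted with EventHubsUtils.encrypt;
# passing it as plain text is a common cause of Base64 decoding errors like the one above.
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

df = (spark.readStream
      .format("eventhubs")
      .options(**eh_conf)
      .load())
```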
ravinchi
by New Contributor III
  • 2082 Views
  • 5 replies
  • 9 kudos

I'd like to ingest data into my ADLS from SQL Server in an incremental manner using Delta Live Tables.

I'd like to ingest data into my ADLS from SQL Server in an incremental manner using Delta Live Tables. I do not want to use any staging tables. I was using CDC; when I call dlt.apply_changes, it asks me to specify source and target. Since source ...

Latest Reply
Sandeep
Contributor III

If you have a CDC feed, it looks like you can use this: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cdc.html

4 More Replies
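A rough sketch of the apply_changes pattern from that doc, assuming a CDC feed is already available in the pipeline as a streaming source called customers_cdc with hypothetical customer_id, sequence_num and operation columns (newer runtimes expose dlt.create_streaming_table; older ones use dlt.create_target_table):

```python
import dlt
from pyspark.sql.functions import expr

# Target table that DLT keeps in sync with the CDC feed.
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",        # assumed upstream CDC feed defined elsewhere in the pipeline
    keys=["customer_id"],          # hypothetical primary key column
    sequence_by="sequence_num",    # hypothetical ordering column from the CDC feed
    apply_as_deletes=expr("operation = 'DELETE'"),
    except_column_list=["operation", "sequence_num"],
)
```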
nagini_sitarama
by New Contributor III
  • 1109 Views
  • 3 replies
  • 2 kudos

Error while optimizing the table: failure of InSet.sql for UTF8String collection

Count of the table: 1,125,089 rows for October data, so I am optimizing the table: optimize table where batchday >= "2022-10-01" and batchday <= "2022-10-31". I am getting an error like: GC overhead limit exceeded at org.apache.spark.unsafe.types.UTF8St...

Latest Reply
Priyanka_Biswas
Valued Contributor

Hi @Nagini Sitaraman, to understand the issue better I would like to get some more information. Does the error occur on the driver side or the executor side? Can you please share the full error stack trace? You may need to check the Spark UI to find wher...

2 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 4003 Views
  • 2 replies
  • 13 kudos

Understanding Rename in Databricks: there are multiple ways to rename Spark DataFrame columns or expressions. We can rename columns or expressions...

Understanding Rename in Databricks: there are multiple ways to rename Spark DataFrame columns or expressions. We can rename columns or expressions using alias as part of select. We can add or rename columns or expressions using withColumn on top of t...

Latest Reply
Ajay-Pandey
Esteemed Contributor III

Very informative, thanks for sharing.

1 More Replies
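A short sketch of the three rename options the post describes, using throwaway column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# 1. Rename via alias inside a select.
renamed_1 = df.select(col("id").alias("customer_id"), col("val").alias("value"))

# 2. Add a renamed column with withColumn, then drop the original if desired.
renamed_2 = df.withColumn("customer_id", col("id")).drop("id")

# 3. Rename in place with withColumnRenamed.
renamed_3 = df.withColumnRenamed("id", "customer_id")
```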
AlexDavies
by Contributor
  • 1191 Views
  • 2 replies
  • 2 kudos

Issue connecting to SQL warehouse spark thrift server

We have a library that allows .NET applications to talk to Databricks clusters (https://github.com/clearbank/SparkSqlClient). This communicates with the clusters over the Spark thrift server. Although this works great for clusters in the "data scienc...

Latest Reply
AlexDavies
Contributor

I have tried those connection details; however, they give me 400 errors when trying to connect directly using the Hive thrift server contract (https://github.com/apache/hive/blob/master/service-rpc/if/TCLIService.thrift). I do not get the issues whe...

1 More Replies
cristianc
by Contributor
  • 815 Views
  • 2 replies
  • 1 kudos

Unexpected workspace setup dialog in the account

Greetings, recently we were doing cleanup in AWS and removed some Databricks-related resources that were used only once for setting up our workspace and had not been used since then. Since there is no plan to create any other workspaces, the decision was t...

Latest Reply
cristianc
Contributor

The resources that were cleaned up were just the ones used for the initial setup of the workspace; everything else important for day-to-day operation is in place and we are actively using the workspace, therefore there is no plan to de...

1 More Replies
ftc
by New Contributor II
  • 557 Views
  • 1 replies
  • 2 kudos

Can Databricks Certified Data Engineer Professional exam questions be short and easy to understand?

Most questions in the Databricks Certified Data Engineer Professional exam are too long for those with English as a second language. There is not enough time to read through the questions, and they are sometimes hard to comprehend.

Latest Reply
eimis_pacheco
Contributor

I strongly agree with you. There is no Spanish version of this exam. Those exams are long even for native speakers; just imagine for people with English as a second language. For instance, since Amazon does not have a Spanish version, they took this...

jonathan-dufaul
by Valued Contributor
  • 1416 Views
  • 4 replies
  • 5 kudos

Why is writing to MS SQL Server 12.0 so slow directly from Spark, but nearly instant when I write to a CSV and read it back?

I have a dataframe that inexplicably takes forever to write to an MS SQL Server, even though other dataframes, even much larger ones, write nearly instantly. I'm using this code: my_dataframe.write.format("jdbc").option("url", sqlsUrl).optio...

Latest Reply
yueyue_tang
New Contributor II

I am hitting the same problem and don't know how to write a DataFrame to MS SQL Server quickly.

3 More Replies
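For reference, a hedged sketch of the same JDBC write with the partitioning and batching options that are usually the first things to tune (target table and credentials are placeholders):

```python
# Batched, parallel JDBC write to SQL Server. A single partition and a small
# batch size are common causes of very slow JDBC writes.
(my_dataframe
 .repartition(8)                     # write with several parallel JDBC connections
 .write
 .format("jdbc")
 .option("url", sqlsUrl)             # same JDBC URL as in the post
 .option("dbtable", "dbo.my_table")  # placeholder target table
 .option("user", user)               # placeholder credentials
 .option("password", password)
 .option("batchsize", 10000)         # rows per round trip; the default is small and often the bottleneck
 .mode("append")
 .save())
```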
BF
by New Contributor II
  • 3126 Views
  • 3 replies
  • 2 kudos

Resolved! PySpark - How do I convert a date/timestamp of a format like /Date(1593786688000+0200)/ in PySpark?

Hi all, I have a dataframe with a CreateDate column in this format: /Date(1593786688000+0200)/, /Date(1446032157000+0100)/, /Date(1533904635000+0200)/, /Date(1447839805000+0100)/, /Date(1589451249000+0200)/, and I want to convert that format to date/tim...

Latest Reply
Chaitanya_Raju
Honored Contributor

Hi @Bruno Franco, can you please try the code below; hope it works for you: from pyspark.sql.functions import from_unixtime; from pyspark.sql import functions as F; final_df = df_src.withColumn("Final_Timestamp", from_unixtime((F.regexp_extract(col("Cr...

2 More Replies
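A self-contained, hedged version of that approach: extract the epoch milliseconds with a regex, convert to seconds, and cast the from_unixtime result to a timestamp (the +0200/+0100 offsets are ignored here):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df_src = spark.createDataFrame(
    [("/Date(1593786688000+0200)/",), ("/Date(1446032157000+0100)/",)],
    ["CreateDate"],
)

# Capture the 13-digit epoch-milliseconds value between "(" and the "+"/"-" offset,
# convert it to whole seconds, and cast the result to a proper timestamp.
epoch_seconds = (
    F.regexp_extract(F.col("CreateDate"), r"\((\d+)[+-]", 1).cast("long") / 1000
).cast("long")

final_df = df_src.withColumn(
    "Final_Timestamp", F.from_unixtime(epoch_seconds).cast("timestamp")
)
final_df.show(truncate=False)
```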
whh99
by New Contributor II
  • 974 Views
  • 3 replies
  • 1 kudos

Given user id, what API can we use to find out which cluster the user is connected to?

I want to know the cluster that a user is connected to in Databricks. It would be great if we could also get the duration for which the user has been connected.

Latest Reply
Kaniz
Community Manager

Hi @Hui Hui Wong (Customer), we haven't heard from you since the last response from @Daniel Sahal (Customer), and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as...

2 More Replies
SreedharVengala
by New Contributor III
  • 13481 Views
  • 18 replies
  • 9 kudos

PGP Encryption / Decryption in Databricks

Is there a way to decrypt/encrypt Blob files in Databricks using a key stored in Key Vault? What libraries need to be used? Any code snippets? Links?

Latest Reply
Anonymous
Not applicable

I am looking at similar requirements and want to explore the various options to encrypt/decrypt ADLS data using Azure Databricks PySpark. Please share a list of the available options.

17 More Replies
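One option often suggested for this kind of requirement (an assumption here, not something confirmed in the thread) is the python-gnupg wrapper around GnuPG, with the PGP key material pulled from a Key Vault-backed secret scope; a rough sketch:

```python
# Assumes the python-gnupg package is installed on the cluster (e.g. %pip install python-gnupg)
# and that a Key Vault-backed secret scope "my-scope" holds the armored PGP private key
# and its passphrase (scope, key names and file paths are placeholders).
import gnupg

gpg = gnupg.GPG()
gpg.import_keys(dbutils.secrets.get(scope="my-scope", key="pgp-private-key"))

with open("/dbfs/mnt/raw/data.csv.pgp", "rb") as encrypted_file:
    result = gpg.decrypt_file(
        encrypted_file,
        passphrase=dbutils.secrets.get(scope="my-scope", key="pgp-passphrase"),
        output="/dbfs/mnt/raw/data.csv",   # decrypted output path
    )

print(result.ok, result.status)
```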
190809
by Contributor
  • 436 Views
  • 1 replies
  • 1 kudos

What are the requirements in order for the event log to collect backlog metrics?

I am trying to use the event log to collect metrics on 'flow_progress' under the 'event_type' field. The docs suggest that this information may not be collected depending on the data source and runtime used (see screenshot). Can anyone let ...

Latest Reply
User16539034020
Contributor II

Thanks for contacting Databricks Support! I understand that you're looking for information on unsupported data source types and runtimes for the backlog metrics. Unfortunately, we currently have not documented that information. It's possible that som...

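For context, flow_progress events and their backlog metrics can be inspected straight from the pipeline's event log; a hedged sketch, assuming the pipeline's storage location is known (the path is a placeholder):

```python
# The DLT event log is stored as a Delta table under the pipeline's storage location.
event_log = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events"   # placeholder pipeline storage path
)

# Keep only flow_progress events and pull out the backlog metric; the value may be
# null when the source or runtime does not report it, as the reply above notes.
flow_progress = (event_log
                 .filter("event_type = 'flow_progress'")
                 .selectExpr("timestamp",
                             "origin.flow_name",
                             "details:flow_progress.metrics.backlog_bytes"))
flow_progress.show(truncate=False)
```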
Ak3
by New Contributor III
  • 1759 Views
  • 5 replies
  • 6 kudos

Databricks ADLS vs Azure SQL: which is better for data warehousing, and why?

Databricks ADLS vs Azure SQL: which is better for data warehousing, and why?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Databricks is the data lake / lakehouse and Azure SQL is the database.

4 More Replies
hanish
by New Contributor II
  • 1409 Views
  • 3 replies
  • 2 kudos

Job cluster support in jobs/runs/submit API

We are using the jobs/runs/submit API of Databricks to create and trigger a one-time run with new_cluster and existing_cluster configuration. We would like to check whether there is a provision to pass "job_clusters" in this API to reuse the same cluster across...

Latest Reply
Anonymous
Not applicable

@Hanish Bansal, a shared job cluster for the jobs/runs/submit API is not supported at the moment.

2 More Replies
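For reference, a hedged sketch of a one-time jobs/runs/submit call; since shared job_clusters are not accepted here per the reply above, each task carries its own new_cluster (host, token, notebook path and node type are placeholders):

```python
import requests

host = "https://<workspace-url>"                            # placeholder workspace URL
token = dbutils.secrets.get(scope="my-scope", key="pat")    # placeholder personal access token

payload = {
    "run_name": "one-time-run",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/me/ingest"},  # placeholder notebook
            "new_cluster": {                                         # cluster created just for this task
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(f"{host}/api/2.1/jobs/runs/submit",
                     headers={"Authorization": f"Bearer {token}"},
                     json=payload)
print(resp.json())  # contains the run_id of the submitted one-time run
```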