Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bilel
by New Contributor II
  • 1709 Views
  • 1 reply
  • 2 kudos

Python library not installed when compute is resized

 Hi,I have a python notebook workflow that uses a job cluster. The cluster lost at least a node (due to Spot Instance Termination) and did an upsize. After that I got an error in my job "Module not found", but the python module was being used before ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi @Bilel, how are you doing today? As per my understanding, consider installing the library at the cluster level to ensure it's automatically applied across all nodes when a new one is added. You could also try using init scripts to guarantee the requ...

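The cluster-level install suggested above can be automated through the Libraries API, so the library is reinstalled on every node, including nodes added when the cluster resizes after a spot termination. A minimal sketch of building the request body for `POST /api/2.0/libraries/install` (the cluster ID and package names below are placeholders for illustration):

```python
def build_install_payload(cluster_id, packages):
    """Build the JSON body for POST /api/2.0/libraries/install.

    Cluster-scoped libraries are reinstalled automatically on each
    node, so an upsized cluster gets them without manual steps.
    """
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": p}} for p in packages],
    }

# Hypothetical cluster ID and packages, for illustration only:
payload = build_install_payload("0123-456789-abcde", ["requests", "pandas==2.2.0"])
```

Send the payload with any HTTP client authenticated against your workspace; the same body shape also works in a Terraform or CLI workflow.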
fperry
by New Contributor III
  • 993 Views
  • 1 reply
  • 0 kudos

Question about stateful processing

I'm experiencing an issue that I don't understand. I am using Python's arbitrary stateful processing with structured streaming to calculate metrics for each item/ID. A timeout is set, after which I clear the state for that item/ID and display each ID...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @fperry, how are you doing today? As per my understanding, consider checking for any differences in how the stateful streaming function is writing and persisting data. It's possible that while the state is cleared after the timeout, some state might...

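The timeout behavior described in the question can be modeled outside Spark. The sketch below is plain Python, not the actual arbitrary-stateful-processing API: it keeps per-key state with a last-seen timestamp and evicts keys whose timeout has expired. Note that rows already written to the sink before eviction still exist downstream, which is one way "cleared" state can appear to persist:

```python
class KeyedStateStore:
    """Toy model of per-key state with timeout eviction
    (illustrative only; not the Structured Streaming API)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.state = {}  # key -> (metric_total, last_seen)

    def update(self, key, value, now):
        total, _ = self.state.get(key, (0, now))
        self.state[key] = (total + value, now)

    def evict_expired(self, now):
        # Remove keys whose last update is older than the timeout,
        # returning the final metric for each evicted key.
        expired = {k: v[0] for k, v in self.state.items()
                   if now - v[1] >= self.timeout_s}
        for k in expired:
            del self.state[k]
        return expired

store = KeyedStateStore(timeout_s=60)
store.update("item-1", 5, now=0)
store.update("item-2", 3, now=50)
emitted = store.evict_expired(now=90)  # item-1 expired; item-2 still live
```

When debugging the real pipeline, comparing what `evict_expired` emits against what the sink already holds (as this toy separates the two) is the distinction Brahmareddy's reply is pointing at.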
gabrieleladd
by New Contributor II
  • 3656 Views
  • 3 replies
  • 1 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something I can't really grasp: how to clear all the objects generated or upda...

Data Engineering
Data Pipelines
Delta Live Tables
Latest Reply
ChKing
New Contributor II
  • 1 kudos

To clear all objects generated or updated by the DLT pipeline, you can drop the tables manually using the DROP command as you've mentioned. However, to get a completely clean slate, including metadata like the tracking of already processed files in t...

2 More Replies
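The "completely clean slate" the reply mentions, including metadata such as the tracking of already-processed files, is what a full-refresh update gives you. A hedged sketch of the request for the standard `POST /api/2.0/pipelines/{pipeline_id}/updates` endpoint (the pipeline ID below is a placeholder):

```python
def build_full_refresh_request(pipeline_id):
    """Return the URL path and JSON body to start a full-refresh update.

    A full refresh reprocesses all source data and resets the
    pipeline's internal state, including file-tracking metadata.
    """
    path = f"/api/2.0/pipelines/{pipeline_id}/updates"
    body = {"full_refresh": True}
    return path, body

# Placeholder pipeline ID for illustration:
path, body = build_full_refresh_request("my-pipeline-id")
```

The same action is available from the pipeline UI via the full-refresh option on the start button, which may be simpler while you are experimenting.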
aniruth1000
by New Contributor II
  • 4590 Views
  • 3 replies
  • 2 kudos

Resolved! Delta Live Tables - CDC - Batching - Delta Tables

Hey folks, I'm trying to implement CDC (apply changes) from one delta table to another. The source is a delta table named table_latest and the target is another delta table named table_old. Both are delta tables in Databricks. I'm trying to cascade the incre...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @aniruth1000, when using Delta Live Table pipelines, only the source table can be a delta table. The target table must be fully managed by the DLT pipeline, including its creation and lifecycle. Let's say that you modified the code as suggested by...

2 More Replies
vishwanath_1
by New Contributor III
  • 4735 Views
  • 4 replies
  • 1 kudos

Reading a 130 GB CSV file with multiLine=true takes 4 hours just to read

Reading the 130 GB file without multiLine=true takes 6 minutes, but my file has multi-line data. How can I speed up the read time here? I am using the below command: InputDF=spark.read.option("delimiter","^").option("header",false).option("encoding","UTF-8"...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Hi @vishwanath_1, can you try setting the below config and see if it resolves the issue? set spark.databricks.sql.csv.edgeParserSplittable=true;

3 More Replies
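The slowdown comes from multiLine parsing being quote-aware: the reader cannot split the file at arbitrary newlines, because a newline may sit inside a quoted field, so the read becomes largely sequential (the config suggested in the reply enables a splittable edge parser). Python's own csv module shows the same quote-aware behavior on a tiny ^-delimited sample:

```python
import csv
import io

# A record whose "note" field contains an embedded newline inside quotes.
raw = 'id^note\n1^"line one\nline two"\n2^plain\n'

# A quote-aware parser must read across the newline to finish row 1,
# which is why multi-line CSV files are hard to split across workers.
rows = list(csv.reader(io.StringIO(raw), delimiter="^"))
```

If the data can be exported without embedded newlines (or with an escape scheme), dropping multiLine entirely restores splittable, parallel reads.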
vishu4rall
by New Contributor II
  • 1598 Views
  • 4 replies
  • 0 kudos

copy files from azure file share to s3 bucket

Kindly help us with code to upload a text/CSV file from an Azure file share to an S3 bucket.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Did you try using azcopy?  https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?tabs=dnf

3 More Replies
lprevost
by Contributor II
  • 2349 Views
  • 5 replies
  • 0 kudos

Large/complex Incremental Autoloader Job -- Seeking Experience on approach

I'm experimenting with several approaches to implement an incremental Auto Loader query, either in DLT or in a pipeline job. The complexities: moving approximately 30B records from a nasty set of nested folders on S3 in several thousand CSV files. ...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets....

4 More Replies
lprevost
by Contributor II
  • 692 Views
  • 1 reply
  • 0 kudos

Using GraphFrames on DLT job

I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image.   I am using it successfully in my job compute instances.  Here are my overrides for the standard job compute policy: {"spark_version": {"type": "unlimited","defau...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets ....

lprevost
by Contributor II
  • 1196 Views
  • 2 replies
  • 0 kudos

GraphFrames and DLT

I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image.   I am using it successfully in my job compute instances but I'm running into problems trying to use it in a DLT job.  Here are my overrides for the standard job c...

Latest Reply
lprevost
Contributor II
  • 0 kudos

Crickets .....

1 More Replies
Valentin1
by New Contributor III
  • 12306 Views
  • 6 replies
  • 3 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: incrementally load data from Table A as a batch; if the pipeline has previously...

Latest Reply
lprevost
Contributor II
  • 3 kudos

I totally agree that this is a gap in the Databricks solution. The gap exists between a static read and real-time streaming. My problem (and I suspect there are many use cases like it) is that I have slowly changing data coming into structured folders via ...

5 More Replies
Octavian1
by Contributor
  • 2656 Views
  • 2 replies
  • 1 kudos

Path of artifacts not found error in pyfunc.load_model using pyfunc wrapper

Hi, for a PySpark model that also involves a pipeline, and that I want to register with MLflow, I am using a pyfunc wrapper. Steps I followed: 1. Pipeline and model serialization and logging (using a Volume locally; the logging will be performed in dbfs...

Latest Reply
pikapika
New Contributor II
  • 1 kudos

Stuck with the same issue; however, I managed to load it (I was looking to serve it using model serving as well). One thing I noticed is that we can use mlflow.create_experiment() at the beginning and specify the default artifact location parameter as D...

1 More Replies
KristiLogos
by Contributor
  • 4064 Views
  • 9 replies
  • 4 kudos

Resolved! Load parent columns and not unnest using pyspark? Found invalid character(s) ' ,;{}()\n' in schema

I'm not sure I'm doing this correctly, but I'm having some issues with the column names when I try to load to a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load to a table: df = spark.read.opti...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @KristiLogos, check whether your JSON keys contain any of the characters listed in the error message.

8 More Replies
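When the source JSON keys do contain the characters the error message rejects (' ,;{}()\n' and similar), a common workaround is renaming the columns before writing to the table. A minimal sketch of the renaming rule in plain Python — the invalid-character set is taken from the error message in the post, and you would apply it in Spark with something like `df.toDF(*[sanitize(c) for c in df.columns])`:

```python
import re

# Characters rejected in column names, per the error message in the post.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def sanitize(column_name):
    """Replace characters rejected in column names with '_'."""
    return INVALID_CHARS.sub("_", column_name)

cleaned = [sanitize(c) for c in ["order id", "price;(usd)", "ok_name"]]
```

Keeping a mapping of original-to-sanitized names is worth doing if downstream consumers expect the raw JSON keys.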
wendyl
by New Contributor II
  • 2182 Views
  • 3 replies
  • 0 kudos

Connection Refused: [Databricks][JDBC](11640) Required Connection Key(s): PWD;

Hey, I'm trying to connect to Databricks using a client ID and secret. I'm using JDBC 2.6.38 and the following connection URL: jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @wendyl, could you answer the following questions? - Does your workspace have Private Link? - Do you use a Microsoft Entra ID managed service principal? - If you used an Entra ID managed SP, did you use a secret from Entra ID or from Azure Da...

2 More Replies
Himanshu4
by New Contributor II
  • 3546 Views
  • 5 replies
  • 2 kudos

Inquiry Regarding Enabling Unity Catalog in Databricks Cluster Configuration via API

Dear Databricks Community, I hope this message finds you well. I am currently working on automating cluster configuration updates in Databricks using the API. As part of this automation, I am looking to ensure that Unity Catalog is enabled within ...

Latest Reply
Himanshu4
New Contributor II
  • 2 kudos

Hi Raphael, can we fetch job details from one workspace and create a new job in a new workspace with the same job ID and configuration?

4 More Replies
mayur_05
by New Contributor II
  • 2036 Views
  • 3 replies
  • 0 kudos

access cluster executor logs

Hi Team, I want to get real-time logs for the cluster executor and driver stderr/stdout while performing data operations, and save those logs in a catalog Volume.

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

You can enable cluster log delivery for job cluster compute too. The specific cluster's log folder will be under /dbfs/cluster-logs (or whatever destination you change it to).

2 More Replies
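The log delivery the reply describes corresponds to the `cluster_log_conf` field in the cluster spec. A hedged fragment, with the destination path as an example only (note that logs are delivered periodically, not streamed instantly):

```json
{
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/cluster-logs"
    }
  }
}
```

Driver and executor stdout/stderr then land under that destination in a per-cluster subfolder, from which a scheduled job could copy them into a catalog Volume.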
