Data Engineering

Forum Posts

by weldermartins • Honored Contributor

11-23-2022 9:41:29 AM

23273 Views
7 replies
35 kudos

Resolved! pyspark - regexp_extract

Hello everyone, I'm creating a regex expression to fetch only the numeric value of a string, but some values are negative. I am not able to create the rule to match the negative values. Can you help me? from pyspark.sql.functions import regexp_extract fro...

Latest Reply

ErinArmistead
New Contributor II

03-16-2023 5:50:31 AM

35 kudos

Have you found the answer? If you are a student in college or school searching for free essay examples online, you may want to visit the website https://writinguniverse.com/free-essay-examples/soccer/ here you will find a vast collection of free essa...
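
For the regexp_extract question at the top of this thread, here is a minimal sketch of one possible pattern, assuming the number appears somewhere in the string with an optional leading minus sign and an optional decimal part. The original example is truncated, so the sample strings below are hypothetical. The pattern is checked here with Python's `re`; for a pattern this simple, `regexp_extract` (which uses Java regular expressions) should behave the same way.

```python
import re

# Optional leading "-", one or more digits, then an optional "." plus digits.
NUMBER_PATTERN = r"(-?\d+(?:\.\d+)?)"


def extract_number(text: str) -> str:
    """Return the first (possibly negative) number in text, or "" if none.

    Returning "" when nothing matches mirrors what regexp_extract does in Spark.
    """
    match = re.search(NUMBER_PATTERN, text)
    return match.group(1) if match else ""


# In PySpark the same pattern would be used like this ("raw" is a hypothetical column):
#   from pyspark.sql.functions import col, regexp_extract
#   df = df.withColumn("amount", regexp_extract(col("raw"), NUMBER_PATTERN, 1))

print(extract_number("balance: -12.50 USD"))  # -12.50
print(extract_number("balance: 42 USD"))      # 42
```

The key part is the `-?` in front of `\d+`: it makes the minus sign optional, so positive and negative values are captured by the same group.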

by Baumeister • New Contributor II

03-10-2023 6:00:09 AM

7469 Views
2 replies
0 kudos

Error when importing .dbc of a complete Workspace

I saved the content of an older Databricks Workspace by clicking on the dropdown next to Workspace -> Export -> DBC Archive and saved it on my local machine. In a new Databricks Workspace, I now want to import that .DBC archive to restore the previous...

Latest Reply

Anonymous
Not applicable

03-14-2023 2:01:16 AM

0 kudos

@Sebastian K: It looks like the error you are facing while importing the DBC archive could be due to a version incompatibility between the Databricks instance where you created the DBC archive and the one where you are trying to import it. Can you...

by bluesky111 • New Contributor II

03-14-2023 2:49:49 AM

2812 Views
1 replies
3 kudos

Resolved! I input the wrong schedule time for the exam; can it be rescheduled?

Hello, today I think I was scheduled to take an exam at 2.15 PM, but unfortunately I made a mistake and set the time to 2.15 AM. Could it be rescheduled? I already submitted a ticket to https://help.databricks.com/s/contact-us?ReqType=training but no reply yet...

Latest Reply

APadmanabhan
Databricks Employee

03-16-2023 4:53:00 AM

3 kudos

Hello @heron halim, if the exam time and date have already passed, we cannot help in this situation; we can only change the time/date of the exam if we are notified at least 30 hours before the exam date/time. Test-takers must ensure they check t...

by Harun • Honored Contributor

03-15-2023 8:55:28 AM

6444 Views
5 replies
6 kudos

How to load structured streaming data into a Delta table whose location is in ADLS Gen2

Hi all, I am working on streaming data processing. As an initial step, I have read the data from Azure Event Hubs using readStream. Now I want to writeStream it into a Delta table. My requirement is that the data should be present in an external location (adls g...

Latest Reply

Anonymous
Not applicable

03-15-2023 9:29:52 AM

6 kudos

There are a couple of ways to connect to ADLS Gen2; please refer to the doc below. For instance, if you decide to use the service principal method, you need to add the storage account configuration details below to the cluster or notebooks. Same goes for storag...
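
A minimal sketch of the service principal approach mentioned in the reply, assuming OAuth client credentials (application ID, secret, tenant ID) are available, for example from a secret scope. The Spark configuration keys follow the pattern used for ABFS OAuth access to ADLS Gen2; the storage account, container, paths, scope names, and helper function names are hypothetical placeholders.

```python
def adls_oauth_conf(storage_account: str, client_id: str,
                    client_secret: str, tenant_id: str) -> dict:
    """Spark conf entries for service-principal (OAuth) access to one ADLS Gen2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_path(container: str, storage_account: str, path: str) -> str:
    """Build an abfss:// URI for a folder in ADLS Gen2."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def start_delta_stream(stream_df, table_path: str, checkpoint_path: str):
    """Append a streaming DataFrame to a Delta table stored at an external path."""
    return (stream_df.writeStream
            .format("delta")
            .outputMode("append")
            .option("checkpointLocation", checkpoint_path)
            .start(table_path))


# Usage inside a Databricks notebook (all names hypothetical):
#   secret = dbutils.secrets.get(scope="my-scope", key="sp-secret")
#   for key, value in adls_oauth_conf("mystorage", "<app-id>", secret, "<tenant-id>").items():
#       spark.conf.set(key, value)
#   query = start_delta_stream(eventhub_df,
#                              abfss_path("bronze", "mystorage", "events/delta"),
#                              abfss_path("bronze", "mystorage", "events/_checkpoint"))
```

Keeping the checkpoint next to the table in the same external location is a common choice, but any durable path the cluster can write to would work.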

by KVNARK • Honored Contributor II

02-08-2023 10:08:33 PM

4472 Views
1 replies
4 kudos

Resolved! Deploying global parameters from lower to higher env in ADF

How can we deploy global parameters from dev to higher environments in ADF? Could anyone shed some light on this? I'm using Git in DEV and deploying to PROD using an Azure CI/CD pipeline.

Latest Reply

Anonymous
Not applicable

03-16-2023 1:16:51 AM

4 kudos

@KVNARK: To deploy global parameters from dev to higher environments in Azure Data Factory (ADF), you can follow these steps: In your DEV environment, create the global parameters in ADF and save them. Commit and push the changes to your Git reposi...

by alvaro_databric • New Contributor III

03-08-2023 12:13:40 AM

3476 Views
1 replies
2 kudos

Resolved! Fastest Azure VM for Databricks Big Data workload

Hi all, it is well known that Azure provides a wide variety of VMs for Databricks, some of which provide powerful features such as Photon and Delta caching. I would like to ask the community which cluster you think is the fastest for performing Big...

Latest Reply

Anonymous
Not applicable

03-16-2023 12:56:30 AM

2 kudos

@Alvaro Moure: The performance of a Databricks cluster for big data operations depends on many factors, such as the amount and structure of the data, the nature of the operations being performed, the configuration of the cluster, and the specific re...

by Sujitha • Databricks Employee

03-16-2023 12:54:30 AM

7927 Views
0 replies
2 kudos

Weekly Release Notes Recap

Weekly Release Notes Recap: Here’s a quick recap of the latest release notes updates from the past week. Databricks platform release notes, March 13 - 17, 2023: Execute SQL cells in the notebook in parallel. You can now run SQL cells in Databricks noteboo...

by Arunsundar • New Contributor III

03-12-2023 9:16:49 PM

4488 Views
4 replies
4 kudos

The possibility of determining the workload dynamically and spinning up the cluster based on the workload

Hi Team, good morning. I would like to understand if it is possible to determine the workload automatically through code (a data load from a file to a table, determining the file size, a kind of benchmark that we can check), based on which we can ...

Latest Reply

pvignesh92
Honored Contributor

03-13-2023 10:40:13 AM

4 kudos

Hi @Arunsundar Muthumanickam, when you say workload, I believe you might be handling different volumes of data between the Dev and Prod environments. If you are using a Databricks cluster and do not have much idea of how the volumes might turn out in differ...
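
One way to approach the original question is to measure the input size before the run and derive a cluster size from it. A minimal sketch, assuming a hypothetical sizing rule of roughly one worker per GiB of input, clamped to a fixed range. The returned dictionary uses the `autoscale` / `min_workers` / `max_workers` fields of a Databricks cluster spec, but the thresholds are made up and would need to be benchmarked against real loads.

```python
import math
from typing import Iterable

GIB = 1024 ** 3


def workers_for(total_bytes: int, bytes_per_worker: int = GIB,
                min_workers: int = 1, max_workers: int = 8) -> int:
    """Hypothetical sizing rule: one worker per bytes_per_worker of input, clamped."""
    needed = math.ceil(total_bytes / bytes_per_worker)
    return max(min_workers, min(max_workers, needed))


def autoscale_spec(file_sizes: Iterable[int], min_workers: int = 1,
                   max_workers: int = 8) -> dict:
    """Cluster-spec fragment whose autoscale ceiling follows the input size."""
    upper = workers_for(sum(file_sizes), min_workers=min_workers,
                        max_workers=max_workers)
    return {"autoscale": {"min_workers": min_workers, "max_workers": upper}}


# In Databricks, sizes could come from a (non-recursive) listing, e.g.
#   sizes = [f.size for f in dbutils.fs.ls("/mnt/landing/<folder>/")]
# and the fragment merged into a job's new_cluster definition.
print(autoscale_spec([3 * GIB, GIB // 2]))  # {'autoscale': {'min_workers': 1, 'max_workers': 4}}
```

Setting only the autoscale ceiling from the input size, rather than a fixed worker count, leaves the cluster free to scale down if the estimate turns out to be too generous.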

by DipakBachhav • New Contributor III

10-03-2022 7:25:48 AM

17252 Views
3 replies
3 kudos

Resolved! Getting error Caused by: com.databricks.NotebookExecutionException: FAILED

I am trying to run the notebook below through Databricks but am getting the error below. I have tried updating the notebook timeout and the retry mechanism, but still no luck. NotebookData("/Users/mynotebook",9900, retry=3) ] res = parallelNot...

Latest Reply

sujai_sparks
New Contributor III

11-28-2022 10:47:09 AM

3 kudos

Hi @Dipak Bachhav, not sure if you have fixed the issue, but here are a few things you can check: Is the path "/Users/mynotebook" correct? Maybe you are missing the dot at the beginning. Run the notebook using dbutils.notebook.run("/Users/mynotebook") ...
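
The question's snippet follows a common pattern: run several notebooks in parallel, each with its own timeout and retry count. A minimal sketch of that pattern, assuming the runner has the same call shape as `dbutils.notebook.run(path, timeout_seconds, arguments)`. `NotebookData`, `run_with_retry`, and `run_in_parallel` here are illustrative reimplementations, not the exact helpers from the original post. Note that `dbutils.notebook.run` takes the timeout as its second argument.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# run(path, timeout_seconds, arguments) -> str, e.g. dbutils.notebook.run
Runner = Callable[[str, int, Dict[str, str]], str]


@dataclass
class NotebookData:
    path: str
    timeout: int
    parameters: Dict[str, str] = field(default_factory=dict)
    retry: int = 0


def run_with_retry(run: Runner, nb: NotebookData) -> str:
    """Run one notebook, retrying up to nb.retry extra times before re-raising."""
    attempt = 0
    while True:
        try:
            return run(nb.path, nb.timeout, nb.parameters)
        except Exception:
            if attempt >= nb.retry:
                raise  # out of retries: surface the original failure
            attempt += 1


def run_in_parallel(run: Runner, notebooks: List[NotebookData],
                    max_workers: int = 2) -> List[str]:
    """Run notebooks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_with_retry, run, nb) for nb in notebooks]
        return [f.result() for f in futures]


# In Databricks (path taken from the question):
#   res = run_in_parallel(dbutils.notebook.run,
#                         [NotebookData("/Users/mynotebook", 9900, retry=3)])
```

Because the original exception is re-raised after the last retry, the underlying cause of the `NotebookExecutionException` stays visible instead of being swallowed by the retry loop.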

by ossinova • Contributor II

03-15-2023 6:23:15 AM

2503 Views
1 replies
2 kudos

PIVOT on month and quarter

I want to simplify this query: SELECT year(EntryDate) Year, AccountNumber, sum(CreditBase - DebitBase) FILTER(WHERE month(EntryDate) = 1) AS jan_total, sum(CreditBase - DebitBase) FILTER(WHERE month(EntryDate) = 2) AS feb_total, sum(CreditBase - Debi...

Latest Reply

Lakshay
Databricks Employee

03-15-2023 9:53:59 AM

2 kudos

Hi @Oscar Dyremyhr, PIVOT doesn't support two FOR clauses. You can PIVOT either on month or on quarter. https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-pivot.html
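
Since a single PIVOT accepts only one FOR clause, one workaround is to pivot on the month and derive the quarter totals from the month columns in the outer SELECT. A minimal sketch that builds such a Spark SQL query from Python, assuming the column names from the question and a hypothetical source table name (the original one is cut off). It has not been run against the original data.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_pivot_sql(source_table: str) -> str:
    """Build a month PIVOT query plus quarter totals derived from the month columns."""
    # 1 AS jan_total, 2 AS feb_total, ..., 12 AS dec_total
    in_list = ", ".join(f"{i} AS {m}_total" for i, m in enumerate(MONTHS, start=1))
    # coalesce(...) keeps a quarter total from becoming NULL when a month has no rows.
    quarters = ", ".join(
        "(" + " + ".join(f"coalesce({m}_total, 0)" for m in MONTHS[3 * q:3 * q + 3])
        + f") AS q{q + 1}_total"
        for q in range(4)
    )
    return (
        f"SELECT *, {quarters}\n"
        "FROM (\n"
        "  SELECT year(EntryDate) AS Year, AccountNumber,\n"
        "         month(EntryDate) AS EntryMonth,\n"
        "         CreditBase - DebitBase AS Amount\n"
        f"  FROM {source_table}\n"
        ")\n"
        f"PIVOT (sum(Amount) FOR EntryMonth IN ({in_list}))"
    )


# "ledger_entries" is a hypothetical table name; in a notebook the string
# would be passed to spark.sql(...).
print(month_pivot_sql("ledger_entries"))
```

The PIVOT groups implicitly by the columns it does not consume (`Year` and `AccountNumber`), which matches the grouping of the original FILTER-based query.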

by Dale_Ware • New Contributor III

03-14-2023 11:31:35 AM

5481 Views
2 replies
3 kudos

Resolved! How to query a table with backslashes in the name.

I am trying to query a Snowflake table from a Databricks DataFrame, similar to the following example:
sql_query = "select * from Database.Schema.Table_/Name_/V"
sqlContext.sql(f"{sql_query}")
And I get an error like this: ParseException: [PARSE_SYNTAX_...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

03-14-2023 9:20:17 PM

3 kudos

You can use double quotes around the table name. When using quotes, it is important to write the table names in capital letters. SELECT * FROM "/TABLE/NAME"
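
The double quotes in the reply are Snowflake's identifier quoting: Snowflake stores unquoted identifiers in upper case and treats quoted identifiers as case-sensitive, which is why the capitalization matters. Spark SQL, which parses the string passed to `sqlContext.sql`, quotes identifiers with backticks instead. A minimal sketch of both quoting rules, where each part of the three-part name is quoted separately and an embedded quote character is escaped by doubling it. The table name comes from the question; the helper names and the `sf_options` dictionary in the comments are hypothetical.

```python
def spark_quote(name: str) -> str:
    """Quote one identifier part for Spark SQL: wrap in backticks, double any backtick."""
    return "`" + name.replace("`", "``") + "`"


def snowflake_quote(name: str) -> str:
    """Quote one identifier part for Snowflake: wrap in double quotes, double any quote."""
    return '"' + name.replace('"', '""') + '"'


def qualified(parts, quote) -> str:
    """Join database, schema, and table, quoting each part on its own."""
    return ".".join(quote(p) for p in parts)


parts = ["Database", "Schema", "Table_/Name_/V"]
print(qualified(parts, spark_quote))      # `Database`.`Schema`.`Table_/Name_/V`
print(qualified(parts, snowflake_quote))  # "Database"."Schema"."Table_/Name_/V"

# In a notebook:
#   spark.sql(f"SELECT * FROM {qualified(parts, spark_quote)}")
# or, pushing the query down through the Snowflake connector:
#   spark.read.format("snowflake").options(**sf_options) \
#        .option("query", f"SELECT * FROM {qualified(parts, snowflake_quote)}").load()
```

Quoting each part separately matters: wrapping the whole `Database.Schema.Table` string in one pair of quotes would make the dots part of a single identifier.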

by ramankr48 • Contributor II

10-18-2022 4:08:43 AM

21351 Views
5 replies
8 kudos

Resolved! How to get the names of all tables with a specific column or columns in a database?

Let's say there is a database db with 700 tables, and we need to find the names of all tables in which the column "project_id" is present. This is just an example to help understand the question.

Latest Reply

Anonymous
Not applicable

10-18-2022 4:53:00 AM

8 kudos

databaseName = "db"
desiredColumn = "project_id"
database = spark.sql(f"show tables in {databaseName}").collect()
tablenames = []
for row in database:
    # qualify the name so the lookup does not depend on the current database
    cols = spark.table(f"{databaseName}.{row.tableName}").columns
    if desiredColumn in cols:
        tablenames.append(row....
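
A variant of the reply's loop that also covers the "column or columns" part of the question: collect each table's column names once with `spark.catalog`, then keep the tables that contain every required column. This is a minimal sketch; the function names are hypothetical, and the comparison is case-insensitive on the assumption that Spark's default case-insensitive column resolution applies. On Unity Catalog, querying `information_schema.columns` is another option.

```python
from typing import Dict, Iterable, List


def tables_with_columns(table_columns: Dict[str, Iterable[str]],
                        required: Iterable[str]) -> List[str]:
    """Names of tables whose columns include every required column (case-insensitive)."""
    wanted = {c.lower() for c in required}
    return sorted(
        name for name, cols in table_columns.items()
        if wanted <= {c.lower() for c in cols}
    )


def collect_table_columns(spark, database: str) -> Dict[str, List[str]]:
    """Map each non-temporary table/view in `database` to its column names."""
    return {
        t.name: [c.name for c in spark.catalog.listColumns(t.name, database)]
        for t in spark.catalog.listTables(database)
        if not t.isTemporary
    }


# In a notebook:
#   mapping = collect_table_columns(spark, "db")
#   print(tables_with_columns(mapping, ["project_id"]))
```

Separating the catalog lookup from the filtering means the 700 tables are listed once, and several column searches can reuse the same mapping.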

by William_Scardua • Valued Contributor

03-13-2023 5:54:17 PM

3610 Views
3 replies
1 kudos

Resolved! Upsert when the origin does NOT exist, but you need to change the status in the target

Hi guys, I have a question about upsert/merge... What do you do when the origin does NOT exist, but you need to change the status in the target? For example: 01/03: source dataset [id = 1 and status = Active]; target table [*not exists*] >> at this time the ...

Latest Reply

NandiniN
Databricks Employee

03-14-2023 5:23:23 AM

1 kudos

Hello @William Scardua, just adding to what @Vigneshraja Palaniraj replied. Reference: https://docs.databricks.com/sql/language-manual/delta-merge-into.html Thanks & Regards, Nandini
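
For the scenario in the question (a target row whose id no longer appears in the source should be marked inactive), Delta's MERGE offers a `WHEN NOT MATCHED BY SOURCE` clause on runtimes that support it; check the MERGE INTO reference in the reply for availability. A minimal sketch: the SQL is kept as a string for use with `spark.sql`, and a small pure-Python function models the same three branches so the intended outcome can be checked without a cluster. The table names `target` and `source` and the `'Inactive'` value are hypothetical.

```python
from typing import Dict

MERGE_SQL = """
MERGE INTO target AS t
USING source AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN INSERT (id, status) VALUES (s.id, s.status)
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET t.status = 'Inactive'
"""


def merge_status(target: Dict[int, str], source: Dict[int, str],
                 inactive: str = "Inactive") -> Dict[int, str]:
    """Model of MERGE_SQL on {id: status} dicts."""
    result = dict(target)
    for key, status in source.items():
        result[key] = status          # WHEN MATCHED -> update, WHEN NOT MATCHED -> insert
    for key in target:
        if key not in source:
            result[key] = inactive    # WHEN NOT MATCHED BY SOURCE -> mark inactive
    return result


# 01/03: id 1 arrives as Active; on a later day it is missing from the source.
day1 = merge_status({}, {1: "Active"})
day2 = merge_status(day1, {})
print(day1, day2)  # {1: 'Active'} {1: 'Inactive'}
# In a notebook: spark.sql(MERGE_SQL)
```

Note that `WHEN NOT MATCHED BY SOURCE` only makes sense when the source is a full snapshot; with incremental source batches, every target row absent from the batch would be marked inactive.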

by Ovi • New Contributor III

03-14-2023 3:48:02 AM

6557 Views
5 replies
3 kudos

Resolved! Filter only Delta tables from an S3 folders list

Hello everyone, from a list of folders on S3, how can I filter which ones are Delta tables, without trying to read each one at a time? Thanks, Ovi

Latest Reply

NandiniN
Databricks Employee

03-14-2023 4:13:52 AM

3 kudos

Hello @Ovidiu Eremia, to filter which folders on S3 contain Delta tables, you can look for the specific files that are associated with Delta tables. Delta Lake stores its metadata in a hidden folder named _delta_log, which is located at the root of ...
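
A minimal sketch of the `_delta_log` check described in the reply, assuming the S3 folders are reachable through a local filesystem path, such as a mounted `/dbfs/mnt/...` path (hypothetical). On raw `s3://` URIs the same test could be done by listing the `<folder>/_delta_log/` prefix with `dbutils.fs.ls` or an S3 client, and `DeltaTable.isDeltaTable(spark, path)` from the delta-spark package is a built-in alternative. The presence of `_delta_log` is treated as a heuristic here; the function names are illustrative.

```python
import os
import tempfile
from typing import Iterable, List


def is_delta_folder(folder: str) -> bool:
    """Heuristic: treat a folder as a Delta table if it has a _delta_log subfolder."""
    return os.path.isdir(os.path.join(folder, "_delta_log"))


def delta_folders(folders: Iterable[str]) -> List[str]:
    """Keep only the folders that look like Delta tables, without reading any data."""
    return [f for f in folders if is_delta_folder(f)]


# Demo on a throwaway directory tree: one Delta-like folder, one plain folder.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sales", "_delta_log"))
    os.makedirs(os.path.join(root, "raw_csv"))
    candidates = [os.path.join(root, "sales"), os.path.join(root, "raw_csv")]
    found = [os.path.basename(f) for f in delta_folders(candidates)]

print(found)  # ['sales']
```

Only a directory lookup per folder is needed, so the check stays cheap even for long folder lists.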

by Dataengineer_mm • New Contributor

03-13-2023 5:04:23 PM

3902 Views
1 replies
1 kudos

Surrogate key using identity column.

I want to create a surrogate key in the Delta table, and I used the identity column id (Generated by Default). Can I insert rows into the Delta table using only spark.sql, like an INSERT query? Or can I also use the Delta write format options? If I use the df.write ...

Latest Reply

NandiniN
Databricks Employee

03-14-2023 5:14:08 AM

1 kudos

Hello @Menaka Murugesan, if you are using an identity column, I believe you would have created the table as below (starting with value 1 and step 1): CREATE TABLE my_table ( id BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1), value STRING ) You can insert values i...