Data Engineering

Forum Posts

by antoniosisba96 • New Contributor II

12-27-2022 9:04:51 AM

1313 Views
4 replies
4 kudos

Passed Data Engineer Associate Exam but received the Lakehouse Accreditation twice

Hi all, today (27/12/22, 14:00 Rome time) I passed the Data Engineer Associate exam, but I received the Lakehouse Fundamentals badge for the second time. My email address is: sisbarra@gmail.com. My company address is: antonio.sisbarra@nttdata.com. Can ...

Latest Reply

Nadia1
Honored Contributor

01-04-2023 10:00:45 AM

4 kudos

Hello Antonio, I deleted the badge under antonio.sisbarra@nttdata.com. You are good to go. Thanks!

3 More Replies

by Riddhi • New Contributor III

12-22-2022 4:13:52 AM

2810 Views
9 replies
14 kudos

Resolved! Databricks Lakehouse Fundamentals Accreditation V2 badge/certificate not received.

Hello, this is regarding the Databricks Lakehouse Fundamentals Accreditation V2. I haven't received my badge/certificate. I also raised a ticket but haven't received any response. My request ID is #00248504. Kindly help me out with this.

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-27-2022 8:10:30 PM

14 kudos

@Jose Gonzalez, @Vidula Khanna, I also did not receive my badge in the portal. This is my portal link: https://credentials.databricks.com/profile/aviralbhardwaj143185/wallet and my case number is 00250939. Also, my points are not updating in the reward port...

8 More Replies

by debanjan89 • New Contributor II

08-25-2022 12:03:55 AM

1470 Views
3 replies
2 kudos

How do we concatenate some fixed string with a secret value in Spark Config in Databricks Job Cluster?

Hi Team, I am trying to configure access to ADLS with a service principal through the Spark config of a Databricks job cluster, like: fs.azure.account.oauth2.client.id.<adls_account_name>.dfs.core.windows.net {{secrets/scopeName/clientID}} The above stateme...

Latest Reply

Manimkm08
New Contributor III

01-04-2023 5:16:00 AM

2 kudos

@Kaniz Fatma We are blocked on this issue. Can you please look into the thread and suggest a workaround?
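
The excerpt above is truncated, so the thread's own resolution is not shown here. One possible workaround, if the cluster-level Spark config cannot express "fixed string plus secret", is to assemble the values in notebook code instead. The sketch below only builds the settings dictionary. The function name `build_adls_oauth_conf`, the secret key `tenantID`, and the sample values are assumptions for illustration; the configuration keys are the standard ABFS OAuth settings.

```python
def build_adls_oauth_conf(account: str, client_id: str,
                          client_secret: str, tenant_id: str) -> dict:
    """Return ABFS OAuth (service principal) settings for one storage account."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        # Here a fixed string is concatenated with a secret value (the tenant ID).
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values; in a Databricks notebook they would typically come from
# dbutils.secrets.get(scope="scopeName", key="clientID") and similar calls
# (the key names other than clientID are hypothetical).
conf = build_adls_oauth_conf("myaccount", "my-client-id", "my-secret", "my-tenant")

# In a notebook, each pair could then be applied with spark.conf.set(key, value).
for key in conf:
    print(key)
```

Setting the values at session level keeps the concatenation in Python, where the secret is read at run time rather than interpolated into the cluster config.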

2 More Replies

by Mado • Valued Contributor II

12-15-2022 3:02:41 AM

9733 Views
1 reply
0 kudos

Resolved! How to show all rows by "DataFrame.show()"?

Hi, DataFrame.show() has a parameter n to set "Number of rows to show". Is there any way to show all rows?

Latest Reply

sher
Valued Contributor II

01-03-2023 8:43:29 PM

0 kudos

Hi Mado, this method will work fine: df.show(df.count())
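
The suggestion in this reply can be wrapped in a small helper. This is a minimal sketch, assuming `df` is a PySpark DataFrame; the name `show_all` is hypothetical. It also passes `truncate=False` so long cell values are not cut off, which goes beyond what the reply states. Note that the count runs as its own Spark job before the rows are printed, and printing every row of a large DataFrame brings all of them to the driver.

```python
def show_all(df, truncate: bool = False) -> int:
    """Print every row of a Spark DataFrame and return the row count."""
    n = df.count()                   # first job: count the rows
    df.show(n, truncate=truncate)    # second job: print exactly n rows
    return n
```

In a notebook this would be called as `show_all(df)`.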

by Jyo777 • Contributor

12-28-2022 9:10:58 AM

915 Views
2 replies
3 kudos

Resolved! Can't do "Full screen" while taking Databricks Apache Spark developer course.

Hi, I see the option for "Full screen" at the bottom right, but it's disabled/inactive. Attached is a screenshot of it. Please advise, as it's hard to read or see the contents on half the screen. Thanks

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-28-2022 11:32:45 PM

3 kudos

Press the F11 key and it will become full screen.

1 More Replies

by semi • New Contributor II

12-21-2022 11:06:52 AM

1052 Views
3 replies
3 kudos

Access file location problem

import pandas as pd
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
df = spark.read.json("/FileStore/tables/cert.json")
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FIL...

Latest Reply

-werners-
Esteemed Contributor III

12-22-2022 5:36:07 AM

3 kudos

Looks like it is because oauth2client.service_account does not know about DBFS (whereas Spark does). Is it an option to manage your secrets in Databricks? https://docs.databricks.com/security/secrets/secrets.html
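
To illustrate the point about DBFS: Spark resolves a path such as `/FileStore/tables/cert.json` against DBFS, while plain Python libraries open paths on the driver's local file system. On clusters where DBFS is exposed through the `/dbfs` mount, a path can be translated as sketched below. The helper name `to_local_dbfs_path` is hypothetical, and whether the mount is available depends on the cluster setup.

```python
def to_local_dbfs_path(path: str) -> str:
    """Map a DBFS path to the corresponding path under the /dbfs mount."""
    if path.startswith("dbfs:/"):
        path = path[len("dbfs:"):]      # "dbfs:/FileStore/x" -> "/FileStore/x"
    if path.startswith("/dbfs/"):
        return path                     # already a local mount path
    return "/dbfs/" + path.lstrip("/")


local_key_file = to_local_dbfs_path("/FileStore/tables/cert.json")
print(local_key_file)

# A library that reads local files could then be given local_key_file, e.g.
# ServiceAccountCredentials.from_json_keyfile_name(local_key_file, SCOPES)
```

Keeping the key in a secret scope, as the reply suggests, avoids storing the credential file in DBFS at all.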

2 More Replies

by Spauk • New Contributor II

01-03-2023 5:38:28 AM

7761 Views
5 replies
7 kudos

Resolved! Best Practices for naming Tables and Databases in Databricks

We moved to Databricks a few months ago; before that we were on SQL Server. So all our tables and databases follow the "camel case" rule. Apparently, in Databricks the rule is "lower case with underscore". Where can we find an official doc...

Latest Reply

LandanG
Honored Contributor

01-03-2023 7:09:24 AM

7 kudos

Hi @Salah KHALFALLAH, looking at the documentation, it appears that Databricks' preferred naming convention is lowercase with underscores, as you mentioned. The reason for this is most likely that Databricks uses Hive Metastore, which is case insens...
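
If existing camel-case names need to be mapped to the lowercase-with-underscores style discussed here, a small conversion helper can generate the new names. This is only a sketch: `to_snake_case` and the sample names are hypothetical, and the treatment of acronyms (for example `ID`) is one possible choice, not an official rule.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase or PascalCase identifier to lower_snake_case."""
    # Insert "_" between a lowercase letter or digit and an uppercase letter.
    step1 = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Insert "_" between an acronym and a following capitalized word.
    step2 = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", step1)
    return step2.lower()


for old in ["CustomerOrders", "customerID", "HTTPServerLog"]:
    print(old, "->", to_snake_case(old))
```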

4 More Replies

by jonathan-dufaul • Valued Contributor

12-30-2022 10:56:02 AM

751 Views
3 replies
3 kudos

Resolved! Why does chaining spark.read from one system/driver and .write to another system/driver take so much longer than doing each piece individually?

I am reading data from IBM DB2 and saving it into MS SQL Server (the first step is moving the code itself to Databricks, and then we will move the databases to Databricks itself). The problem I'm running into is that doing something like the below will take > ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-02-2023 7:56:11 AM

3 kudos

Hi, it is related to partitioning optimization. By default, the JDBC driver queries the source database with only a single thread. Because the read created only one partition, the write also ran from that one partition and used a single core. When you used pandas, it d...
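
As a rough illustration of the partitioning point in this reply: Spark's JDBC source can split a read into several parallel queries when it is given a numeric partition column, its bounds, and a partition count. The sketch below only assembles those options. The helper name `jdbc_parallel_read_options`, the URL, the table, and the column are hypothetical; the option keys are the standard Spark JDBC ones.

```python
def jdbc_parallel_read_options(url: str, table: str, partition_column: str,
                               lower_bound: int, upper_bound: int,
                               num_partitions: int) -> dict:
    """Build Spark JDBC options that split a read into parallel range queries."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # numeric, date, or timestamp column
        "lowerBound": str(lower_bound),       # bounds set the stride, not a filter
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


# Hypothetical connection details, for illustration only.
opts = jdbc_parallel_read_options(
    "jdbc:db2://db2-host:50000/SAMPLE", "MYSCHEMA.ORDERS", "ORDER_ID", 1, 1_000_000, 8
)
print(opts)

# In a Spark session: df = spark.read.format("jdbc").options(**opts).load()
```

With several partitions on the read side, the subsequent write can also run in parallel, since each partition is written by its own task.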

2 More Replies

by J15S • New Contributor III

12-21-2022 6:49:08 AM

799 Views
4 replies
4 kudos

RStudio on Databricks user experience

Is anybody actually using the RStudio app integration on Databricks? I'm surprised to find so little discussion in this forum. My team has been using it for about 3 months and it seems under-developed. 1) No automated backup, you have to do it yoursel...

Latest Reply

J15S
New Contributor III

01-03-2023 5:17:58 AM

4 kudos

@Jonathan Dufault Thanks for the response, and glad I'm not alone. My problem (and this is probably just a preference thing) is that the 'reward' of using a full-fledged IDE is huge, compared to bouncing between notebooks in multiple tabs. The integ...

3 More Replies

by Prototype998 • New Contributor III

01-03-2023 4:06:06 AM

519 Views
0 replies
0 kudos

Singleton Design Principle for pyspark database connector A singleton is a design pattern that ensures that a class has only one instance, and provide...

Singleton Design Principle for pyspark database connector. A singleton is a design pattern that ensures that a class has only one instance, and provides a global access point to that instance. Here is an example of how you could implement a singleton d...
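
The post's own example is cut off in this listing, so the following is not the author's code. It is a minimal sketch of the singleton pattern as described above, using a hypothetical `DatabaseConnector` class that just stores JDBC settings. The class name, attributes, and sample URL are assumptions.

```python
import threading


class DatabaseConnector:
    """Hypothetical connector settings holder with a single shared instance."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        # Double-checked locking so concurrent threads create only one instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
                    cls._instance._initialized = False
        return cls._instance

    def __init__(self, url: str = "", properties=None):
        if self._initialized:
            return                      # later constructions keep the first settings
        self.url = url
        self.properties = dict(properties or {})
        self._initialized = True


first = DatabaseConnector("jdbc:sqlserver://example-host:1433", {"user": "etl"})
second = DatabaseConnector("jdbc:sqlserver://other-host:1433")
print(first is second, second.url)
```

The single instance exists per Python process, so code running in Spark executor processes would each get their own copy rather than sharing the driver's instance.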

by Jfoxyyc • Valued Contributor

12-28-2022 6:23:20 PM

883 Views
2 replies
2 kudos

How to use partial_parse.msgpack with workflow dbt task?

I'm looking for direction on how to get the dbt task in workflows to use the partial_parse.msgpack file to skip parsing files that haven't changed. I'm downloading my artifacts after each run and the partial_parse file is being saved back to ADLS. Wha...

Latest Reply

Debayan
Esteemed Contributor III

01-02-2023 10:39:11 AM

2 kudos

Hi, could you please confirm your expectation and the use case? Do you want the file to be saved somewhere else?

1 More Replies

by KVNARK • Honored Contributor II

01-02-2023 9:13:01 PM

1583 Views
4 replies
6 kudos

Resolved! Connecting to Azure Synapse through Databricks notebooks

Hi All, Happy New Year! How can we connect to an Azure Synapse serverless SQL pool through Databricks notebooks and execute DDLs?

Latest Reply

daniel_sahal
Esteemed Contributor

01-02-2023 10:41:19 PM

6 kudos

@KVNARK . https://joeho.xyz/blog-posts/how-to-connect-to-azure-synapse-in-azure-databricks/

3 More Replies

by APol • New Contributor II

09-08-2022 8:16:22 AM

1192 Views
2 replies
2 kudos

Read/Write concurrency issue

Hi. I assume that it could be a concurrency issue (a read thread from Databricks and a write thread from another system). From the start: I read 12-16 CSV files (approximately 250Mb each) into a dataframe. df = spark.read.option("header", "False").opti...

Latest Reply

FerArribas
Contributor

01-02-2023 2:02:49 PM

2 kudos

Hi @Anastasiia Polianska, I agree, it looks like a concurrency issue. Quite possibly this concurrency problem is caused by an erroneous ETag in the HTTP call to the Azure Storage API (https://azure.microsoft.com/de-de/blog/managing-concurrency-in...

1 More Replies

by maddy_081063 • New Contributor II

12-12-2022 10:53:26 AM

2395 Views
2 replies
4 kudos

Is there a way to automate the Azure Databricks dashboard to schedule and send an email output with the dashboard?

Latest Reply

FerArribas
Contributor

01-02-2023 1:09:41 PM

4 kudos

Hi @maddy v, I recommend that you use the Databricks SQL module for this type of report and email alert. It is a very interesting module with multiple options for your use case. https://learn.microsoft.com/en-us/azure/databricks/sql/user/dashboards...

1 More Replies

by pvm26042000 • New Contributor III

12-26-2022 7:17:16 PM

454 Views
1 reply
3 kudos

Spark SQL & Spark ML

I am using Spark SQL to import data into a machine learning pipeline. Once the data is imported, I want to perform machine learning tasks using Spark ML. Which compute tool is best suited for this use case? Please help me! Thank you ...

Latest Reply

Debayan
Esteemed Contributor III

01-02-2023 1:03:11 PM

3 kudos

Hi, please refer to https://docs.databricks.com/machine-learning/index.html and let us know if this helps.
