Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Valued Contributor II
  • 2092 Views
  • 2 replies
  • 1 kudos

Snowflake connector

Hi Team, Databricks recommends storing data in a cloud storage location, but if we connect directly to Snowflake using the Snowflake connector, will we face any performance issues? Could you please suggest the best way to read a large volume of data f...

Latest Reply
Phani1
Valued Contributor II

Thanks!!

1 More Replies
Amit_Garg
by New Contributor
  • 1261 Views
  • 1 reply
  • 1 kudos

Calling a .py Function using DF from another file

I have created a file NBF_TextTranslation:

    spark = SparkSession.builder.getOrCreate()
    df_TextTranslation = spark.read.format('delta').load(textTranslation_path)

    def getMediumText(TextID, PlantName):
        df1 = spark.sql("SELECT TextID, PlantName, Langu...

Latest Reply
Lakshay
Databricks Employee

You should create a UDF on top of the getMediumText function and then use the UDF in the SQL statement.
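
A minimal sketch of that approach, assuming getMediumText can be rewritten to take plain column values and return a string (the table name text_translation is a placeholder):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()

    def getMediumText(text_id, plant_name):
        # Placeholder for the lookup logic from the original post. Note that a
        # UDF body runs on the executors, so it cannot call spark.sql itself.
        return f"{text_id}-{plant_name}"

    # Register the function so Spark SQL can refer to it by name.
    spark.udf.register("getMediumText", getMediumText, StringType())

    spark.sql(
        "SELECT TextID, PlantName, getMediumText(TextID, PlantName) AS MediumText "
        "FROM text_translation"
    ).show()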

Volker
by Contributor
  • 10493 Views
  • 4 replies
  • 2 kudos

Persisting and managing tables and table schemas in Unity Catalog

Hello Databricks Community, we are currently looking for a way to persist and manage our Unity Catalog tables in an IaC manner. That is, we want to trace any changes to a table's schema and properties and ideally be able to roll back those changes sea...

Latest Reply
CharlesReily
New Contributor III

As you mentioned, using notebooks with Data Definition Language (DDL) scripts is a viable option. You can create notebooks that contain the table creation scripts and version control these notebooks along with your application code.
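
As a sketch, such a version-controlled DDL cell could look roughly like this (it assumes the notebook-provided spark session; catalog, schema, and table names are placeholders):

    # Hypothetical DDL cell, versioned in Git alongside the application code.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.analytics.orders (
            order_id BIGINT,
            order_ts TIMESTAMP,
            amount   DECIMAL(18, 2)
        )
        USING DELTA
        COMMENT 'Managed via version-controlled DDL notebook'
    """)

Schema changes then become reviewable diffs in the repository, and on the data side Delta's own history (DESCRIBE HISTORY, RESTORE TABLE) covers rollback.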

3 More Replies
rt-slowth
by Contributor
  • 899 Views
  • 1 reply
  • 0 kudos

Handling files used more than once in a streaming pipeline

I am implementing Structured Streaming using Delta Live Tables. I want to delete the parquet files once they are used. What options should I set so that the files loaded in S3 are not deleted?

Latest Reply
brockb
Databricks Employee

Hi, It sounds like your Structured Streaming source is S3, in which case the easiest solution is likely to manage the stream source using an S3 Lifecycle Configuration (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)...
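
A sketch of such a lifecycle rule via boto3, assuming the stream reads from a landing/ prefix and that seven days comfortably exceeds the stream's maximum lag (bucket name and prefix are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Expire objects under the stream-source prefix after 7 days so
    # consumed files are cleaned up automatically by S3.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-stream-source-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-consumed-stream-files",
                    "Filter": {"Prefix": "landing/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                }
            ]
        },
    )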

Heman2
by Valued Contributor II
  • 16864 Views
  • 4 replies
  • 21 kudos

Resolved! How to export the output data in the Excel format into the dbfs location

Is there any way to export the output data in Excel format into DBFS? I'm only able to do it in CSV format.

Latest Reply
Sobreiro
New Contributor II

The easiest way I found is to create a dashboard and export from there. It enables a context menu with options to export to several file types, including CSV and Excel.
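
If the export needs to happen in code instead of through the dashboard UI, one common alternative (not mentioned in the thread) is converting the result to pandas and writing an .xlsx file through the /dbfs mount. A sketch, assuming the result fits in driver memory, openpyxl is installed, and the table/path placeholders are replaced:

    # %pip install openpyxl   # needed by pandas for .xlsx output
    # Uses the notebook-provided spark session; names/paths are placeholders.
    df = spark.table("main.analytics.report")
    df.toPandas().to_excel("/dbfs/FileStore/exports/report.xlsx", index=False)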

3 More Replies
Esther_Tomi
by New Contributor
  • 1625 Views
  • 0 replies
  • 0 kudos

Unable to Install Cluster-Scoped Libraries on Runtime >13.3

Hello team, I'm trying to upgrade our Databricks runtime from 9.1 to 13.3, but I've been having issues installing libraries on the compute from our internal Artifactory. However, when I tried this on a Unity Catalog-enabled workspace, it works seamless...

pgruetter
by Contributor
  • 7936 Views
  • 7 replies
  • 2 kudos

Run Task as Service Principal with Code in Azure DevOps Repo

Hi all, I have a task of type Notebook whose source is Git (Azure DevOps). This task runs fine with my user, but if I change the owner to a service principal, I get the following error: Run result unavailable: run failed with error message Failed to checkout...

Latest Reply
Anonymous
Not applicable

@pgruetter: To enable a service principal to access a specific Azure DevOps repository, you need to grant it the necessary permissions at both the organization and repository levels. Here are the steps to grant the service principal the necessary per...

6 More Replies
sher
by Valued Contributor II
  • 1569 Views
  • 2 replies
  • 1 kudos

How to read column mapping metadata for Delta tables

I want to read the column mapping metadata: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#column-mapping. In the above link we can find a code block with JSON data, and I want to read the same data in PySpark. Is there any option to read that ...

Latest Reply
brockb
Databricks Employee

Hi, information about a Delta table, such as its history, can be found by running `describe history table_name`. A `rename column` operation shows up in the `operation` column with the value `RENAME COLUMN`. If you then look at the ...
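
A sketch of that lookup in PySpark, using the notebook-provided spark session (the table name is a placeholder):

    from pyspark.sql import functions as F

    # Table history records schema-evolution operations such as column renames.
    history = spark.sql("DESCRIBE HISTORY main.analytics.events")
    (history
        .filter(F.col("operation") == "RENAME COLUMN")
        .select("version", "timestamp", "operation", "operationParameters")
        .show(truncate=False))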

1 More Replies
QQ
by New Contributor III
  • 3874 Views
  • 2 replies
  • 0 kudos

Resolved! How to fix (AWS SSO) Test Connection failed

What did I configure incorrectly in the SSO settings in my Databricks account? What troubleshooting should I do? I don't see any error message. I followed the step-by-step instructions from AWS, linked below: View step-by-step instructions...

Latest Reply
QQ
New Contributor III

I got the solution: I forgot to create SaaS users with the same subject as the AD users. Preprovisioned users means the users must already exist in the downstream SaaS application. For instance, you may need to create SaaS users with the s...

1 More Replies
BobEng
by New Contributor
  • 2455 Views
  • 0 replies
  • 0 kudos

Delta Live Tables are dropped when pipeline is deleted

I created a simplistic DLT pipeline that creates one table. When I delete the pipeline, the table is dropped as well. That's not really desired behavior. As I remember, there was a strong distinction between data (stored in tables) and processing (spa...

dwfchu1
by New Contributor II
  • 2065 Views
  • 1 reply
  • 1 kudos

UC Volume access for spark and other config files

Hi All, wondering if anyone else is getting this problem: we are trying to host krb5.conf and jaas.conf for our compute to be able to connect to Kerberised JDBC sources. We are attempting to store these files in Catalog volumes, but at run time, when initiating th...

Latest Reply
mbendana
New Contributor II

I haven't been able to access the volume path when using the JDBC format.

Sas
by New Contributor II
  • 968 Views
  • 1 reply
  • 0 kudos

Delta Lake performance

Hi, I am new to Databricks and I am trying to understand the use case of the Data Lakehouse. Is it a good idea to build a data warehouse using the Delta Lake architecture? Is it going to give the same performance as an RDBMS cloud data warehouse like Snowflake? Whic...

Latest Reply
Miguel_Suarez
Databricks Employee

Hi @Sas, one of the benefits of the Data Lakehouse architecture is that it combines the best of both Data Warehouses and Data Lakes on one unified platform, helping you reduce costs and deliver on your data and AI initiatives faster. It brings t...

djhs
by New Contributor III
  • 2576 Views
  • 1 reply
  • 0 kudos

Resolved! Installing a private pypi package from Gitlab on a cluster

I have published a PyPI package in a private GitLab repository and I want to install it in my notebook, but I don't know how, and the documentation doesn't help me much either. I have created a GitLab token that I use in the index URL, and I try to inst...

Latest Reply
djhs
New Contributor III

This problem was solved by removing the `python>=3.11` requirement.
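
For anyone landing here, the install pattern for a private GitLab PyPI index in a notebook looks roughly like the following; the package name, project ID, and token are placeholders, and the token is better served from a secret scope than hard-coded:

    # Databricks notebook cell; <project-id> and <deploy-token> are placeholders.
    %pip install my-private-package --index-url https://__token__:<deploy-token>@gitlab.com/api/v4/projects/<project-id>/packages/pypi/simple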

DaveLeach
by New Contributor III
  • 5470 Views
  • 2 replies
  • 0 kudos

Resolved! Remove ZOrdering

Hi, I am trying to demonstrate the effectiveness of Z-ordering, but to do this I would like to remove the existing Z-ordering first. So my plan is:
1. Remove existing Z-ordering
2. Run a query and show the explain plan
3. Add Z-ordering to the column used for Joi...

Latest Reply
shan_chandra
Databricks Employee

@DaveLeach - you can try dropping the table and creating it again instead of #1.
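
A sketch of that reset, using the notebook-provided spark session and placeholder table names; recreating the table rewrites the files, so the effect of any earlier OPTIMIZE ... ZORDER BY is gone:

    # 1. Recreate the table so no Z-ordering remains.
    spark.sql("CREATE OR REPLACE TABLE demo.sales_reset AS SELECT * FROM demo.sales")

    # 2. Inspect the plan of a representative query.
    spark.sql("SELECT * FROM demo.sales_reset WHERE customer_id = 42").explain()

    # 3. Re-apply Z-ordering on the join/filter column and compare.
    spark.sql("OPTIMIZE demo.sales_reset ZORDER BY (customer_id)")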

1 More Replies
rahulmadnawat
by New Contributor II
  • 2896 Views
  • 3 replies
  • 2 kudos

Resolved! Columns tab in Data Explorer doesn't reflect schema changes to table

Hey team, we've noticed that schema changes to a table after creation aren't reflected in the "Columns" tab in the Data Explorer. For example, we added a column called signal_description to a table but its addition isn't reflected in the UI. Is this ...

Latest Reply
claudius_hini
New Contributor II

@Tharun-Kumar Is this the default behavior when a schema change happens on a table registered in Unity Catalog? In that case I would have to run the repair command regularly in order to ensure that the schema displayed is actually the one ...
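
Assuming the repair command referred to here is the SYNC METADATA variant of REPAIR TABLE used on Unity Catalog external tables, the call would look roughly like this (table name is a placeholder):

    # Re-syncs the schema shown in the catalog with the underlying table data.
    spark.sql("MSCK REPAIR TABLE main.analytics.events SYNC METADATA")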

2 More Replies
