Data Engineering

Forum Posts

Sujitha
by Community Manager
  • 733 Views
  • 1 reply
  • 4 kudos

Documentation Update Databricks documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working in the Databricks Data Science & Engineering, Databricks Machine Learning, and Databricks ...

Latest Reply
Harun
Honored Contributor
  • 4 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

  • 4 kudos
gpzz
by New Contributor II
  • 1193 Views
  • 1 reply
  • 3 kudos

pyspark code error

rdd4 = rdd3.reducByKey(lambda x,y: x+y)
AttributeError: 'PipelinedRDD' object has no attribute 'reducByKey'
Pls help me out with this

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Is it a typo or are you really using reducByKey instead of reduceByKey ?
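For reference, the corrected call is `reduceByKey` (with the "e"). Since the poster's RDD isn't available here, this is a minimal plain-Python sketch of the per-key aggregation that `reduceByKey` performs; the `pairs` data is made up for illustration:

```python
# Corrected PySpark call (note the missing "e" in the original):
#   rdd4 = rdd3.reduceByKey(lambda x, y: x + y)
#
# reduceByKey merges the values for each key using the given function.
# A plain-Python sketch of the same per-key aggregation:
def reduce_by_key(pairs, func):
    """Aggregate (key, value) pairs, combining values per key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return sorted(result.items())

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
print(reduce_by_key(pairs, lambda x, y: x + y))  # [('a', 4), ('b', 6)]
```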

  • 3 kudos
Sujitha
by Community Manager
  • 952 Views
  • 6 replies
  • 5 kudos

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting.Thes...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

  • 5 kudos
5 More Replies
alhuelamo
by New Contributor II
  • 4037 Views
  • 4 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerException without traces of our job's code. Does anybody know what would be the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

A NullPointerException occurs when you access an instance method or field, access elements of a null array, or call a method on an object referred to by a null value. To give you a suggestion on how to avoid that, we might ...
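The job in this thread is Scala, and its code isn't shown, so this is only a language-agnostic illustration (in Python, where the analogous failure is an `AttributeError` on `None`) of the failure mode the reply describes and the explicit-guard pattern that avoids it:

```python
# Calling a method through a null/None reference is the crash the reply
# describes. Guarding the reference explicitly turns a crash into a
# handled case. The first_word helper is hypothetical, for illustration.
def first_word(text):
    """Return the first word, or None for missing/empty input."""
    if text is None:          # guard the null reference explicitly
        return None
    words = text.split()
    return words[0] if words else None

print(first_word("hello world"))  # hello
print(first_word(None))           # None instead of a crash
print(first_word(""))             # None
```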

  • 1 kudos
3 More Replies
Smitha1
by Valued Contributor II
  • 3819 Views
  • 10 replies
  • 6 kudos

Resolved! onsite exam center registration Databricks Certified Associate Developer for Apache Spark 3

Dear All @Nadia Elsayed​ @Vidula Khanna​ @Harshjot Singh​ @Jose Gonzalez​ @Joseph Kambourakis​ Hope you are well and had a good weekend. I am still waiting to receive the voucher after redeeming points, which is due this week. My issue is slots are full to ...

Latest Reply
nphau
Valued Contributor
  • 6 kudos

I have the same problem as you. I submitted a ticket to Databricks, "Help to re-schedule assessment day in webassessor", but they responded as below: "Please accept my apologies for the inconvenience caused and the delay in responding. I'm sorry to i...

  • 6 kudos
9 More Replies
fury88
by New Contributor II
  • 989 Views
  • 1 reply
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run based on dynamic python. What I'd like to know is will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury​ Yes... I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...

  • 1 kudos
rishabh4312
by Contributor II
  • 3459 Views
  • 18 replies
  • 56 kudos

Voucher code error

Hi, I received a voucher in Nov 2020 for the 'Databricks Certified Associate Developer for Apache Spark 3.0 exam' with an expiry date of 10th Nov 2022. However, I receive an error stating the promotion code has been used. I have never used the code. Please...

Latest Reply
Anonymous
Not applicable
  • 56 kudos

Hi @Rishabh Jain​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thank...

  • 56 kudos
17 More Replies
BkP
by Contributor
  • 1311 Views
  • 3 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi All, We are developing a new Scala/Java program which needs to read & process the raw data stored in source ADLS (which is a Databricks Environment) in parallel as the volume of the source data is very high (in GBs & TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? Tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can help. @Kaniz Fatma​, @Vartika Nain​, @Bilal Aslam​

  • 3 kudos
2 More Replies
HB
by New Contributor III
  • 1177 Views
  • 4 replies
  • 3 kudos

Resolved! Still missing Badge for Apache Spark 3.0 Associate Dev certification

Hello, I took my exam 2 weeks ago and passed it, but I still have not received my badge. I have contacted the support team twice but still no response. Could you please help? Thank you!

Latest Reply
ashok_k_gupta12
New Contributor III
  • 3 kudos

Databricks should fix the certification platform ASAP; currently a user needs to log in to multiple different sites to get a certification. Each site has its own login, which makes it very difficult to remember. There is no integration or synergy among ...

  • 3 kudos
3 More Replies
Taha_Hussain
by Valued Contributor II
  • 1326 Views
  • 2 replies
  • 6 kudos

Register for Databricks Office Hours, September 28: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us to: • Troubleshoot your technical questions • Learn the ...

Latest Reply
Taha_Hussain
Valued Contributor II
  • 6 kudos

Cont...
Q: Do generated columns in Delta Live Tables include IDENTITY columns?
A: My understanding is that generated columns in Delta Live Tables do not contain IDENTITY columns. Here is more on generated columns in DLT.
Q: We store raw data for each cu...

  • 6 kudos
1 More Replies
KumarShiv
by New Contributor III
  • 1149 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get the percentile values on different splits, but I found that the result of the Databricks PERCENTILE_DISC() function is not accurate. I have run the same query on MS SQL but get a different result set. Here are both result sets for Pyspark ...

Latest Reply
artsheiko
Valued Contributor III
  • 2 kudos

The reason might be that in SQL, PERCENTILE_DISC is nondeterministic.
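The queries from the thread aren't shown, so as background only: the usual SQL semantics for PERCENTILE_DISC is "the smallest value in the ordered set whose cumulative distribution is >= p" (it always returns an actual element, unlike PERCENTILE_CONT, which interpolates). A plain-Python sketch of that definition, which can help check which engine's answer matches the standard semantics:

```python
import math

def percentile_disc(values, p):
    """Discrete percentile: smallest value whose cumulative
    distribution (rank / n) is >= p, per the usual SQL
    PERCENTILE_DISC semantics. Always returns a real element."""
    ordered = sorted(values)
    n = len(ordered)
    # 1-based rank of the first row with cume_dist >= p
    rank = max(1, math.ceil(p * n))
    return ordered[rank - 1]

data = [10, 20, 30, 40]
print(percentile_disc(data, 0.5))   # 20  (cume_dist of 20 is 0.5)
print(percentile_disc(data, 0.75))  # 30
```

Ties and engine-specific ordering of equal values are one place real implementations can legitimately diverge.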

  • 2 kudos
1 More Replies
Taha_Hussain
by Valued Contributor II
  • 819 Views
  • 0 replies
  • 3 kudos

Register for Databricks Office Hours, August 17 & August 31, from 8:00am - 9:00am PT | 3:00pm - 4:00pm GMT. Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us to: • Troubleshoot your technical questions...

Dicer
by Valued Contributor
  • 2596 Views
  • 4 replies
  • 3 kudos

Resolved! Azure Databricks: Failed to extract data which is between two timestamps within those same dates using Pyspark

Data type:
AAPL_Time: timestamp
AAPL_Close: float
Raw Data:
AAPL_Time                        AAPL_Close
2015-05-11T08:00:00.000+0000     29.0344
2015-05-11T08:30:00.000+0000     29.0187
2015-05-11T09:00:00.000+0000     29.0346
2015-05-11T09:3...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Another thing to try: the hour() and minute() functions return integers.
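The original DataFrame isn't available here, so this is a plain-Python sketch of the same idea as filtering on hour()/minute(): compare only the time-of-day component of each timestamp, ignoring the date. The sample rows are made up to match the shape described in the question:

```python
from datetime import datetime, time

# Hypothetical sample rows (AAPL_Time, AAPL_Close) for illustration
rows = [
    (datetime(2015, 5, 11, 8, 0), 29.0344),
    (datetime(2015, 5, 11, 8, 30), 29.0187),
    (datetime(2015, 5, 11, 9, 0), 29.0346),
    (datetime(2015, 5, 11, 16, 0), 29.5000),
]

# Keep rows whose time-of-day is between 08:30 and 09:00 inclusive,
# regardless of the date
start, end = time(8, 30), time(9, 0)
selected = [(ts, px) for ts, px in rows if start <= ts.time() <= end]
print(selected)  # the 08:30 and 09:00 rows
```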

  • 3 kudos
3 More Replies
KumarShiv
by New Contributor III
  • 2901 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks Issue:- assertion failed: Invalid shuffle partition specs:

I have a complex script which consumes more than 100 GB of data and performs some aggregations on it, and at the end I simply try to write/display data from a DataFrame. Then I get this issue (assertion failed: Invalid shuffle partition specs:). Pls hel...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated but "display" is not, so you can debug by displaying each DataFrame at the end of each cell.
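The Spark script from the thread isn't shown; as a language-agnostic illustration of why forcing evaluation per cell localizes the failure, Python generators are lazy in the same way Spark transformations are, so an error surfaces only where the pipeline is materialized:

```python
# Spark transformations are lazy: nothing runs until an action forces it.
# Generators behave the same way, which is why forcing evaluation at each
# stage (like calling display() per cell) helps localize a failure.
def stage1(data):
    return (x * 2 for x in data)          # lazy, not yet evaluated

def stage2(data):
    return (100 // x for x in data)       # will fail on x == 0 -- later!

pipeline = stage2(stage1([1, 2, 0]))      # no error raised yet
try:
    result = list(pipeline)               # forcing evaluation triggers it
except ZeroDivisionError:
    result = "failed only when materialized"
print(result)  # failed only when materialized
```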

  • 11 kudos
4 More Replies
ivanychev
by Contributor
  • 735 Views
  • 0 replies
  • 1 kudos

How to enable remote JMX monitoring in Databricks?

Adding these options

EXTRA_JAVA_OPTIONS = (
    '-Dcom.sun.management.jmxremote.port=9999',
    '-Dcom.sun.management.jmxremote.authenticate=false',
    '-Dcom.sun.management.jmxremote.ssl=false',
)

is enough in vanilla Apache Spark, but apparently it ...
