Data Engineering

Forum Posts

Sujitha
by Community Manager
  • 733 Views
  • 1 reply
  • 4 kudos

Documentation Update Databricks documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working in the Databricks Data Science & Engineering, Databricks Machine Learning, and Databricks ...

Latest Reply
Harun
Honored Contributor
  • 4 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

  • 4 kudos
gpzz
by New Contributor II
  • 1193 Views
  • 1 reply
  • 3 kudos

pyspark code error

rdd4 = rdd3.reducByKey(lambda x,y: x+y)
AttributeError: 'PipelinedRDD' object has no attribute 'reducByKey'
Pls help me out with this

Latest Reply
UmaMahesh1
Honored Contributor III
  • 3 kudos

Is it a typo or are you really using reducByKey instead of reduceByKey ?
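For reference, the corrected call is `reduceByKey` (with the "e"). Since the poster's RDD isn't available here, this is a minimal plain-Python sketch of the per-key aggregation that `reduceByKey` performs; the `pairs` data is made up for illustration:

```python
# Corrected PySpark call (note the missing "e" in the original):
#   rdd4 = rdd3.reduceByKey(lambda x, y: x + y)
#
# reduceByKey merges the values for each key using the given function.
# A plain-Python sketch of the same per-key aggregation:
def reduce_by_key(pairs, func):
    """Aggregate (key, value) pairs, combining values per key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return sorted(result.items())

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
print(reduce_by_key(pairs, lambda x, y: x + y))  # [('a', 4), ('b', 6)]
```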

  • 3 kudos
Sujitha
by Community Manager
  • 952 Views
  • 6 replies
  • 5 kudos

KB Feedback Discussion In addition to the Databricks Community, we have a Support team that maintains a Knowledge Base (KB). The KB contains answers to common questions about Databricks, as well as information on optimisation and troubleshooting.Thes...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Thanks for sharing @Sujitha Ramamoorthy​ 

  • 5 kudos
5 More Replies
alhuelamo
by New Contributor II
  • 4037 Views
  • 4 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerException without traces of our job's code. Does anybody know what would be the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

A NullPointerException occurs when you access an instance method or field, access elements of a null array, or call a method on an object referred to by a null value. To give you a suggestion on how to avoid that, we might ...
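The job in this thread is Scala, and its code isn't shown, so this is only a language-agnostic illustration (in Python, where the analogous failure is an `AttributeError` on `None`) of the failure mode the reply describes and the explicit-guard pattern that avoids it:

```python
# Calling a method through a null/None reference is the crash the reply
# describes. Guarding the reference explicitly turns a crash into a
# handled case. The first_word helper is hypothetical, for illustration.
def first_word(text):
    """Return the first word, or None for missing/empty input."""
    if text is None:          # guard the null reference explicitly
        return None
    words = text.split()
    return words[0] if words else None

print(first_word("hello world"))  # hello
print(first_word(None))           # None instead of a crash
print(first_word(""))             # None
```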

  • 1 kudos
3 More Replies
Smitha1
by Valued Contributor II
  • 3819 Views
  • 10 replies
  • 6 kudos

Resolved! onsite exam center registration Databricks Certified Associate Developer for Apache Spark 3

Dear All @Nadia Elsayed​ @Vidula Khanna​ @Harshjot Singh​ @Jose Gonzalez​ @Joseph Kambourakis​ Hope you are well and had a good weekend. I am still waiting to receive the voucher after redeeming points, which is due this week. My issue is slots are full to ...

Latest Reply
nphau
Valued Contributor
  • 6 kudos

I have the same problem as you. I submitted a ticket to Databricks, "Help to re-schedule assessment day in webassessor", but they responded as below: "Please accept my apologies for the inconvenience caused and the delay in responding. I'm sorry to i...

  • 6 kudos
9 More Replies
fury88
by New Contributor II
  • 989 Views
  • 1 reply
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run based on dynamic python. What I'd like to know is will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury​ Yes... I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...

  • 1 kudos
rishabh4312
by Contributor II
  • 3459 Views
  • 18 replies
  • 56 kudos

Voucher code error

Hi, I received a voucher in Nov 2020 for the 'Databricks Certified Associate Developer for Apache Spark 3.0 exam' with an expiry date of 10th Nov 2022. However, I receive an error stating the promotion code has been used. I have never used the code. Please...

Latest Reply
Anonymous
Not applicable
  • 56 kudos

Hi @Rishabh Jain​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thank...

  • 56 kudos
17 More Replies
BkP
by Contributor
  • 1311 Views
  • 3 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi All, We are developing a new Scala/Java program which needs to read & process the raw data stored in source ADLS (which is a Databricks Environment) in parallel as the volume of the source data is very high (in GBs & TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? Tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can help. @Kaniz Fatma​, @Vartika Nain​, @Bilal Aslam​

  • 3 kudos
2 More Replies
HB
by New Contributor III
  • 1177 Views
  • 4 replies
  • 3 kudos

Resolved! Still missing Badge for Apache Spark 3.0 Associate Dev certification

Hello, I took my exam 2 weeks ago and passed it, but I still have not received my badge. I have contacted the support team twice but still no response. Could you please help? Thank you!

Latest Reply
ashok_k_gupta12
New Contributor III
  • 3 kudos

Databricks should fix the certification platform ASAP; currently a user needs to log in to multiple different sites to get a certification. Each site has its own login, which makes it very difficult to remember. There is no integration or synergy among ...

  • 3 kudos
3 More Replies
Taha_Hussain
by Valued Contributor II
  • 1326 Views
  • 2 replies
  • 6 kudos

Register for Databricks Office Hours, September 28: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us to: • Troubleshoot your technical questions • Learn the ...

Latest Reply
Taha_Hussain
Valued Contributor II
  • 6 kudos

Cont...
Q: Do generated columns in Delta Live Tables include IDENTITY columns?
A: My understanding is that generated columns in Delta Live Tables do not contain IDENTITY columns. Here is more on generated columns in DLT.
Q: We store raw data for each cu...

  • 6 kudos
1 More Replies
KumarShiv
by New Contributor III
  • 1149 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Spark SQL function "PERCENTILE_DISC()" output not accurate.

I am trying to get the percentile values on different splits, but I found that the result of the Databricks PERCENTILE_DISC() function is not accurate. I have run the same query on MS SQL but get a different result set. Here are both result sets for Pyspark ...

Latest Reply
artsheiko
Valued Contributor III
  • 2 kudos

The reason might be that in SQL, PERCENTILE_DISC is nondeterministic.
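The queries from the thread aren't shown, so as background only: the usual SQL semantics for PERCENTILE_DISC is "the smallest value in the ordered set whose cumulative distribution is >= p" (it always returns an actual element, unlike PERCENTILE_CONT, which interpolates). A plain-Python sketch of that definition, which can help check which engine's answer matches the standard semantics:

```python
import math

def percentile_disc(values, p):
    """Discrete percentile: smallest value whose cumulative
    distribution (rank / n) is >= p, per the usual SQL
    PERCENTILE_DISC semantics. Always returns a real element."""
    ordered = sorted(values)
    n = len(ordered)
    # 1-based rank of the first row with cume_dist >= p
    rank = max(1, math.ceil(p * n))
    return ordered[rank - 1]

data = [10, 20, 30, 40]
print(percentile_disc(data, 0.5))   # 20  (cume_dist of 20 is 0.5)
print(percentile_disc(data, 0.75))  # 30
```

Ties and engine-specific ordering of equal values are one place real implementations can legitimately diverge.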

  • 2 kudos
1 More Replies
Taha_Hussain
by Valued Contributor II
  • 819 Views
  • 0 replies
  • 3 kudos

Register for Databricks Office Hours, August 17 & August 31, from 8:00am - 9:00am PT | 3:00pm - 4:00pm GMT. Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us to: • Troubleshoot your technical questions...

Dicer
by Valued Contributor
  • 2596 Views
  • 4 replies
  • 3 kudos

Resolved! Azure Databricks: Failed to extract data which is between two timestamps within those same dates using Pyspark

Data type:
AAPL_Time: timestamp
AAPL_Close: float
Raw Data:
AAPL_Time                        AAPL_Close
2015-05-11T08:00:00.000+0000     29.0344
2015-05-11T08:30:00.000+0000     29.0187
2015-05-11T09:00:00.000+0000     29.0346
2015-05-11T09:3...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Another thing to try: the hour() and minute() functions return integers.
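The original DataFrame isn't available here, so this is a plain-Python sketch of the same idea as filtering on hour()/minute(): compare only the time-of-day component of each timestamp, ignoring the date. The sample rows are made up to match the shape described in the question:

```python
from datetime import datetime, time

# Hypothetical sample rows (AAPL_Time, AAPL_Close) for illustration
rows = [
    (datetime(2015, 5, 11, 8, 0), 29.0344),
    (datetime(2015, 5, 11, 8, 30), 29.0187),
    (datetime(2015, 5, 11, 9, 0), 29.0346),
    (datetime(2015, 5, 11, 16, 0), 29.5000),
]

# Keep rows whose time-of-day is between 08:30 and 09:00 inclusive,
# regardless of the date
start, end = time(8, 30), time(9, 0)
selected = [(ts, px) for ts, px in rows if start <= ts.time() <= end]
print(selected)  # the 08:30 and 09:00 rows
```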

  • 3 kudos
3 More Replies
KumarShiv
by New Contributor III
  • 2901 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks Issue:- assertion failed: Invalid shuffle partition specs:

I have a complex script which consumes more than 100 GB of data and performs some aggregations on it, and at the end I simply try to write/display data from a DataFrame. Then I get this issue (assertion failed: Invalid shuffle partition specs:). Pls hel...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated but "display" is not, so you can debug by displaying each DataFrame at the end of each cell.
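The Spark script from the thread isn't shown; as a language-agnostic illustration of why forcing evaluation per cell localizes the failure, Python generators are lazy in the same way Spark transformations are, so an error surfaces only where the pipeline is materialized:

```python
# Spark transformations are lazy: nothing runs until an action forces it.
# Generators behave the same way, which is why forcing evaluation at each
# stage (like calling display() per cell) helps localize a failure.
def stage1(data):
    return (x * 2 for x in data)          # lazy, not yet evaluated

def stage2(data):
    return (100 // x for x in data)       # will fail on x == 0 -- later!

pipeline = stage2(stage1([1, 2, 0]))      # no error raised yet
try:
    result = list(pipeline)               # forcing evaluation triggers it
except ZeroDivisionError:
    result = "failed only when materialized"
print(result)  # failed only when materialized
```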

  • 11 kudos
4 More Replies
ivanychev
by Contributor
  • 735 Views
  • 0 replies
  • 1 kudos

How to enable remote JMX monitoring in Databricks?

Adding these options

EXTRA_JAVA_OPTIONS = (
    '-Dcom.sun.management.jmxremote.port=9999',
    '-Dcom.sun.management.jmxremote.authenticate=false',
    '-Dcom.sun.management.jmxremote.ssl=false',
)

is enough in vanilla Apache Spark, but apparently it ...
