Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi @wenting_deng Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, pleas...
Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me. It could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...
Hi @Alejandro Martinez Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...
Currently I am using the following cluster. It uses the default Python version, 3.9.5, and I would like to update it to 3.10.1.0. How can I achieve this?
Hi @Ayush Modi Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
Hello, what is the relationship between a Databricks Account (as described in [1]) and Azure resources? Is a Databricks Account created per Azure account? Or per Azure tenant? Or maybe per subscription? [1] https://learn.microsoft.com/en-us/azure/databrick...
Hi @Chris Nawara Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...
Hello. Could someone please explain why iteration over a PySpark dataframe is way slower than over a Pandas dataframe? PySpark: df_list = df.collect() for index in range(0, len(df_list)): ..... Pandas: df_pnd = df.toPandas() for index, row in df_p...
Hi @ELENI GEORGOUSI Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us ...
To give you a little bit of background: We use Terraform to deploy a resource group with multiple Azure services. Terraform leverages an Azure Service Principal that has Owner rights to the Azure subscription. This way, Databricks is also deployed. We al...
Hi @Gent Reshtani Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers...
Hi @youssef ansari Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
Hi @Kevin Kim Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
Hi @Bhupesh Aggarwal Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...
Hi, I am trying the SparkNLP library for the first time. The cluster I'm using is corporate and cannot be connected to the internet. I can only download packages that are provided to us or use a jar file. I have three questions: What jar files do I need to ...
Hi @Samy Syed Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
I was going through Data Engineering with Databricks training, and in DE 3.3L - Databases, Tables & Views Lab section, it says "Defining database directories for groups of users can greatly reduce the chances of accidental data exfiltration." I agree...
Hi @Dilorom A Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
Dear @Jose Gonzalez, hope you're having a great day. This is HIGH priority for me: I have to schedule the exam in December before the slots are full. I took the Databricks Certified Associate Developer for Apache Spark 3.0 exam on 30th Nov but missed by one perc...
Hi @Smitha Nelapati Thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.
I tried a JDBC connection to access the data from RDS. I was able to read the data successfully, but I need to run some update queries. It seems JDBC won't support update operations. I tried to connect to my RDS MySQL with host, user...
Hi @Manikandan Ramachandran Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fro...
Hi team, log files are not getting deleted automatically from the delta log folder after the logRetentionDuration interval, and after analysis, I see checkpoint files are not getting created after 10 commits. Below are the table properties, set using spark.sql( f""" ...
Hi @vinay kumar Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...
@Vidula Khanna @Nadia Elsayed Hi, I passed the Databricks Certified Data Engineer Associate exam 48 hours ago, but I still haven't received the certificate. I also created a ticket (00312849) 6 hours ago, but no one has reached out to me yet regarding this i...
Hi @Urvish Patel Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...