Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Ogawa
by New Contributor III
  • 3875 Views
  • 10 replies
  • 2 kudos

Self-paced course link for "Apache Spark developer associate"

Where can I find the "Apache Spark developer associate" self-paced course? Thanks in advance.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @youssef ansari, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

9 More Replies
ckwan48
by New Contributor III
  • 2398 Views
  • 3 replies
  • 1 kudos

Create a Dockerfile from Cluster

Is there a way to create a Dockerfile from the cluster configuration in Workspace A and deploy it on a different cluster in Workspace B?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kevin Kim, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

2 More Replies
Tannu858
by New Contributor
  • 1088 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Bhupesh Aggarwal, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...

ssy
by New Contributor II
  • 1368 Views
  • 1 replies
  • 1 kudos

How to use the SparkNLP library and JohnSnowLabs Maven coordinates on a cluster that is not connected to the internet

Hi, I am trying the SparkNLP library for the first time. The cluster I'm using is a corporate one and cannot be connected to the internet. I can only download packages that are provided to us or install from a jar file. I have three questions: What jar files do I need to ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Samy Syed, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

Dilorom
by New Contributor
  • 5900 Views
  • 3 replies
  • 4 kudos

What is a recommended directory for creating a database with a specified path?

I was going through the Data Engineering with Databricks training, and the DE 3.3L - Databases, Tables & Views Lab section says "Defining database directories for groups of users can greatly reduce the chances of accidental data exfiltration." I agree...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Dilorom A, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

2 More Replies
Smitha1
by Valued Contributor II
  • 1423 Views
  • 3 replies
  • 1 kudos

December exam voucher for Databricks Certified Associate Developer for Apache Spark 3.0 exam

Dear @Jose Gonzalez, hope you're having a great day. This is of HIGH priority for me: I have to schedule the exam in December before the slots are full. I took the Databricks Certified Associate Developer for Apache Spark 3.0 exam on 30th Nov but missed by one perc...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Smitha Nelapati, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

2 More Replies
Mani1800
by New Contributor
  • 1663 Views
  • 2 replies
  • 0 kudos

I need to run SQL UPDATE/DELETE commands against an AWS RDS system.

I tried a 'jdbc' connection to access the data in RDS. I was able to read the data successfully, but I need to run some update queries. It seems that JDBC won't support update operations. I tried to make a connection to my RDS MySQL with host, user...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Manikandan Ramachandran​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fro...

1 More Replies
vinaykumar
by New Contributor III
  • 4928 Views
  • 6 replies
  • 0 kudos

Log files are not getting deleted automatically after the logRetentionDuration interval

Hi team, log files are not getting deleted automatically from the Delta log folder after the logRetentionDuration interval, and after analysis I see that checkpoint files are not getting created after 10 commits. Below are the table properties, set using spark.sql(    f"""  ...

No checkpoint.parquet
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @vinay kumar, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

5 More Replies
UR
by New Contributor II
  • 1379 Views
  • 3 replies
  • 1 kudos

Didn't receive the certificate for the Databricks Certified Data Engineer Associate exam

@Vidula Khanna @Nadia Elsayed Hi, I passed the Databricks Certified Data Engineer Associate exam 48 hours ago but still haven't received the certificate. I also created a ticket (00312849) 6 hours ago, but no one has reached out to me yet regarding this i...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Urvish Patel, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

2 More Replies
Rami2023
by New Contributor II
  • 8208 Views
  • 1 replies
  • 0 kudos

Read CSV file using SQL

Hi, I am trying to reverse engineer a table to find its source file. Looking at the query history, I came across a SQL string that loads data from a file into a table; however, the code is a bit of a mystery to me. I haven't come across idbfs. Can someb...

Tush
by New Contributor
  • 1115 Views
  • 2 replies
  • 1 kudos

Databricks Certified Associate Developer for Apache Spark Certification

Hi Team, I just wanted to know whether Databricks is planning to sunset the Databricks Certified Associate Developer for Apache Spark certification. I believe Databricks has already informed its partners about this, so I just wanted to reconfirm. Thank you!

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Tushar Bomble, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

1 More Replies
sanjay
by Valued Contributor II
  • 4157 Views
  • 2 replies
  • 1 kudos

Resolved! ImportError: cannot import name dataclass_transform

Hi, I am using Standard Runtime 11.3 LTS and trying to use spaCy -> en_core_web_sm, but I am getting the following error: ImportError: cannot import name dataclass_transform. It was working last week but stopped working recently. Appreciate any help. Regard...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sanjay Jain, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

1 More Replies
Raja_682227
by New Contributor II
  • 2166 Views
  • 2 replies
  • 2 kudos

Databricks Data Cleanroom

I just need to understand the data cleanroom. As per the documentation, Databricks Data Cleanroom provides a secure, governed, and privacy-safe environment. Participants can enable fine-grained access control to data with the help of Unity Catalog. Also...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Rajarampandian Arumugam​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear fro...

1 More Replies
Snowhow1
by New Contributor II
  • 8225 Views
  • 1 replies
  • 1 kudos

Logging when using multiprocessing with joblib

Hi, I'm using joblib for multiprocessing in one of our processes. Logging works well (apart from weird py4j errors, which I suppress) except when it's within multiprocessing. Also, how do I suppress the other errors that I always receive on DB - perha...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Sam G: It seems like the issue is related to the py4j library used by Spark, and not specifically related to joblib or multiprocessing. The error message indicates a network error while sending a command between the Python process and the Java Virt...

jhon341
by New Contributor
  • 4067 Views
  • 1 replies
  • 0 kudos

How can I optimize Spark performance in Databricks for large-scale data processing

I'm using Databricks for processing large-scale data with Apache Spark, but I'm experiencing performance issues. The processing time is taking longer than expected, and I'm encountering memory and CPU usage limitations. I want to optimize the perform...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@jhon marton: Optimizing Spark performance in Databricks for large-scale data processing can involve a combination of techniques, configurations, and best practices. Below are some recommendations that can help improve the performance of your Spark ...

