Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

joao_albuquerqu
by New Contributor II
  • 14326 Views
  • 2 replies
  • 2 kudos

Is it possible to have a cluster with pre-installed dependencies?

I run some jobs in the Databricks environment where some resources need authentication. I do this (and I need to) through the vault-cli in the init script. However, every time in the init script I need to install vault-cli and other libraries. Is ther...

Latest Reply
Anonymous
Not applicable

@João Victor Albuquerque: Yes, there are a few ways to pre-install libraries and tools in the Databricks environment. Cluster-scoped init scripts: You can specify a shell script to be run when a cluster is created or restarted. This script can includ...

  • 2 kudos
1 More Replies
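
A minimal sketch of the cluster-scoped init-script approach described in the reply above, assuming the script is stored at a hypothetical DBFS path (dbfs:/init-scripts/install-vault.sh) that is then referenced in the cluster's init-script settings. The vault-cli download URL and pip packages are illustrative placeholders, not taken from the thread.

```python
# Run once from a notebook: write an init script to DBFS so every cluster
# (re)start installs the Vault CLI and other libraries before jobs run.
# The install commands below are illustrative assumptions.

init_script = """#!/bin/bash
set -e
# Install the HashiCorp Vault CLI (version pinned here only as an example)
curl -fsSL -o /tmp/vault.zip https://releases.hashicorp.com/vault/1.13.3/vault_1.13.3_linux_amd64.zip
unzip -o /tmp/vault.zip -d /usr/local/bin
# Any extra Python libraries the jobs need
/databricks/python/bin/pip install hvac requests
"""

dbutils.fs.put("dbfs:/init-scripts/install-vault.sh", init_script, True)
```

Once the cluster (or job cluster definition) points at this script, the per-run installs the question describes are no longer needed.
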
__Databricks_Su
by Contributor
  • 105864 Views
  • 17 replies
  • 20 kudos
Latest Reply
luis_herrera
Databricks Employee

To pass arguments/variables to a notebook, you can use a JSON file to temporarily store the arguments and then pass it as one argument to the notebook. After passing the JSON file to the notebook, you can parse it with json.loads(). The argument list...

  • 20 kudos
16 More Replies
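
A short sketch of the JSON-argument pattern the reply above describes, with the caller and the called notebook shown as two sections (they live in separate notebooks). The notebook path, timeout, and parameter names are placeholders.

```python
import json

# --- Caller notebook ----------------------------------------------------
# Bundle every argument into one JSON string and pass it as a single
# named argument to the target notebook.
args = {"run_date": "2023-05-01", "env": "dev", "limit": 100}
dbutils.notebook.run("/Shared/target_notebook", 600, {"params": json.dumps(args)})

# --- Called notebook ----------------------------------------------------
# Read the single argument back via a widget and parse it with json.loads().
dbutils.widgets.text("params", "{}")            # default for interactive runs
params = json.loads(dbutils.widgets.get("params"))
run_date = params.get("run_date")
```
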
Data_Analytics1
by Contributor III
  • 21024 Views
  • 17 replies
  • 24 kudos

Fatal error: The Python kernel is unresponsive.

I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes. Here the Python process exited with an unknown exit code. The last 10 KB of the process's...

Latest Reply
luis_herrera
Databricks Employee

Hey, it seems that the issue is related to the driver undergoing a memory bottleneck, which causes it to crash with an out-of-memory (OOM) condition and get restarted, or to become unresponsive due to frequent full garbage collection. The reason for th...

  • 24 kudos
16 More Replies
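
The reply points at driver memory pressure from the eight parallel notebook jobs. One common mitigation is to cap driver-side concurrency with a bounded thread pool and return only small results to the driver; this is a generic sketch, not the thread's accepted fix, and the table names and worker count are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_one(task_id):
    # Do the Spark work for one task; return only a small summary so the
    # driver does not accumulate large objects in memory.
    df = spark.read.table(f"source_db.table_{task_id}")          # placeholder source
    df.write.mode("overwrite").saveAsTable(f"target_db.table_{task_id}")  # placeholder target
    return task_id, df.count()

# Cap concurrency below the 8 parallel jobs described in the question;
# 4 workers is an arbitrary starting point, not a recommendation from the thread.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_one, i) for i in range(8)]
    for f in as_completed(futures):
        print(f.result())
```
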
source2sea
by Contributor
  • 9259 Views
  • 2 replies
  • 0 kudos

Resolved! pass application.conf file into databricks jobs

I copied my question from a very old question/post that I responded to and decided to move it here. Context: I have a jar (Scala) using scala pureconfig (a wrapper of Typesafe Config) and uploaded an application.conf file to a path which is mounted to the wor...

Latest Reply
source2sea
Contributor

We had to put the conf in the root folder of the mounted path, and that works. Maybe the mounted storage account being Blob instead of ADLS Gen2 is causing the issues.

  • 0 kudos
1 More Replies
mbaumga
by New Contributor III
  • 8343 Views
  • 3 replies
  • 2 kudos

Performance issues when loading an Excel file from DBFS using R

I have uploaded small Excel files to my DBFS. I then use the function read_xlsx() from the "readxl" package in R to import the file into R memory. I use a standard cluster (12.1, non-ML). The function works but it takes ages. E.g. a simple Excel tabl...

Latest Reply
Anonymous
Not applicable

Hi @Marcel Baumgartner, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

  • 2 kudos
2 More Replies
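
The reply above is only a moderator follow-up, so no accepted fix is shown. A workaround often tried when small-file reads through the /dbfs mount are slow (an assumption about the cause, not something stated in this thread) is to copy the workbook to the driver's local disk first and read it from there. The thread uses R's readxl; the sketch below shows the same copy-first idea in Python with placeholder paths.

```python
import shutil
import pandas as pd  # used here only to show the read after the copy

# Copy the workbook from the DBFS FUSE mount to fast local driver storage.
src = "/dbfs/FileStore/uploads/small_table.xlsx"   # placeholder DBFS path
dst = "/tmp/small_table.xlsx"
shutil.copyfile(src, dst)

# Reading from local disk avoids repeated small reads over the DBFS mount.
df = pd.read_excel(dst)
print(df.shape)
```
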
Dhiraj_Singh
by New Contributor II
  • 7331 Views
  • 4 replies
  • 0 kudos

Databricks Associate Exam Webinar Survey link absent

Hi Databricks, I completed the Data Engineer Associate Exam Webinar held on the 11th of April 2020 and haven't received any survey link. How can I take the survey for the exam voucher? Please guide me on this.

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Dhiraj Singh, just a friendly follow-up. Do you still need help to get your certificate, or do you no longer need help? Please let us know.

  • 0 kudos
3 More Replies
Anonymous
by Not applicable
  • 2132 Views
  • 3 replies
  • 2 kudos

Hello Everyone, I am thrilled to announce that we have our first winner for the raffle contest - @Vincent Slegers. Please join me in congratulating ...

Hello Everyone, I am thrilled to announce that we have our first winner for the raffle contest - @Vincent Slegers. Please join me in congratulating him on this remarkable achievement! Vincent, your dedication and hard work have paid off, and we are d...

Latest Reply
yogu
Honored Contributor III

congrats @Vsleg

  • 2 kudos
2 More Replies
Hari_Dbrc
by New Contributor II
  • 3429 Views
  • 2 replies
  • 0 kudos

Issue while using community edition

Hello, is anyone facing an issue with their Community Edition? It shows the below error and I can't access the workspace or previously created notebooks. Tried accessing from different devices (not a cache issue). Error popup (screenshots attached): Unable to view...

Latest Reply
Anonymous
Not applicable

Hi @Hari N, thank you for reaching out, and we're sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not resolve the...

  • 0 kudos
1 More Replies
scalasparkdev
by New Contributor
  • 3376 Views
  • 2 replies
  • 0 kudos

Pyspark Structured Streaming Avro integration to Azure Schema Registry with Kafka/Eventhub in Databricks environment.

I am looking for a simple way to have a structured streaming pipeline that would automatically register a schema to Azure schema registry when converting a df col into avro and that would be able to deserialize an avro col based on schema registry ur...

Latest Reply
Anonymous
Not applicable

Hi @Tomas Sedlon, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

  • 0 kudos
1 More Replies
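
The reply here is only the community follow-up template. For the (de)serialization half of the question, Spark ships from_avro/to_avro helpers that work with an explicit schema string; the automatic registration/lookup against Azure Schema Registry that the post asks about is not covered by this sketch and would need the Azure Schema Registry SDK on top. Column, topic, and schema names are placeholders.

```python
from pyspark.sql.avro.functions import from_avro, to_avro
from pyspark.sql.functions import col

# Explicit Avro schema (in the asker's setup this is what would be fetched
# from or registered in Azure Schema Registry; hard-coded here as a placeholder).
value_schema = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id",     "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

# Read from Kafka/Event Hubs (connection options omitted) and deserialize.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "<broker>")   # placeholder
       .option("subscribe", "events")                    # placeholder topic
       .load())

decoded = raw.select(from_avro(col("value"), value_schema).alias("event"))

# Serializing a column back to Avro for a sink works the same way in reverse.
encoded = decoded.select(to_avro(col("event"), value_schema).alias("value"))
```
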
Anonymous
by Not applicable
  • 2927 Views
  • 3 replies
  • 2 kudos

www.dbdemos.ai

Getting started with Databricks is being made very easy now. Presenting dbdemos. If you're looking to get started with Databricks, there's good news: dbdemos makes it easier than ever. This platform offers a range of demos that you can install directl...

Latest Reply
FJ
Contributor III

That's a great share, Suteja. Is that supposed to work with the Databricks Community Edition account? Had a strange error while trying. Any help is appreciated! Thanks, F

  • 2 kudos
2 More Replies
JH
by New Contributor II
  • 4186 Views
  • 5 replies
  • 1 kudos

Does Thrift only exist in the Databricks control plane?

Hi all, I'm a user of Azure Databricks. We recently found there is a Thrift vulnerability (CVE-2020-13949) in Spark Hive. We have tried to fix it on our side. We also found there is an open issue on the Spark Jira board - https://issues.apache.org/j...

Latest Reply
Anonymous
Not applicable

Hi @Jimin Hsieh, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

  • 1 kudos
4 More Replies
Kearon
by New Contributor III
  • 8733 Views
  • 11 replies
  • 0 kudos

Process batches in a streaming pipeline - identifying deletes

OK. So I think I'm probably missing the obvious and tying myself in knots here. Here is the scenario: batch datasets arrive in JSON format in an Azure data lake; each batch is a complete set of "current" records (the complete table); these are processed us...

Latest Reply
Anonymous
Not applicable

Hi @Kearon McNicol, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

  • 0 kudos
10 More Replies
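
Since each incoming batch is a complete snapshot of the "current" records, one way to surface deletes (a sketch of a common pattern, not necessarily what this thread settled on) is an anti-join of the existing target keys against the latest snapshot: anything in the target but missing from the new batch was deleted upstream. Table, path, and key names below are placeholders.

```python
from delta.tables import DeltaTable

# Latest full snapshot that just arrived in the lake (placeholder path and key).
new_batch = spark.read.json("abfss://lake@account.dfs.core.windows.net/batches/latest/")

target = DeltaTable.forName(spark, "curated.customers")   # placeholder target table

# Keys present in the target but absent from the new snapshot were deleted upstream.
deleted_keys = (target.toDF()
                .select("customer_id")
                .join(new_batch.select("customer_id"), "customer_id", "left_anti"))

# Apply the deletes, then upsert the remaining rows with a regular MERGE.
(target.alias("t")
 .merge(deleted_keys.alias("d"), "t.customer_id = d.customer_id")
 .whenMatchedDelete()
 .execute())
```
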
nyehia
by Contributor
  • 16905 Views
  • 19 replies
  • 1 kudos

Can not access SQL files in the Shared workspace

Hey, we have an issue: we can access the SQL files whenever the notebook is in the repo path, but whenever the CI/CD pipeline imports the repo notebooks and SQL files to the shared workspace, we can list the SQL files but cannot read them. We cha...

Latest Reply
karthik_p
Esteemed Contributor

@Nermin Yehia, yes, since you are moving the files to a different location manually, just set Can Manage permissions on the target and that should take care of everything.

  • 1 kudos
18 More Replies
kinsun
by New Contributor II
  • 2868 Views
  • 3 replies
  • 0 kudos

Resolved! Delta Live Table Service Upgrade

Dear experts, Might I know what will happen to the delta live table pipeline which is in a cancelled state, when there is a runtime service upgrade? Thanks!

Latest Reply
Anonymous
Not applicable

@KS LAU: When a runtime service upgrade occurs in Databricks, any running tasks or pipelines may be temporarily interrupted while the upgrade is being applied. In the case of a cancelled Delta Live Tables pipeline, it will not be impacted by the upgr...

  • 0 kudos
2 More Replies
GuMart
by New Contributor III
  • 5985 Views
  • 4 replies
  • 2 kudos

Resolved! DLT target schema - get value during run time

Hi, I would like to know if it is possible to get the target schema programmatically inside a DLT pipeline (in DLT pipeline settings: destination, target schema). I want to run more idempotent pipelines. For example, my target table has the fields: reference_da...

Latest Reply
GuMart
New Contributor III

Thank you @Suteja Kanuri, looks like your solution is working, thank you. Regards,

  • 2 kudos
3 More Replies
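
The accepted solution is truncated above, so this is only one pattern that fits the description (an assumption, not necessarily what was suggested in the thread): put the target schema and run parameters in the DLT pipeline's configuration and read them at runtime with spark.conf.get. The configuration key names and table names are placeholders.

```python
import dlt
from pyspark.sql import functions as F

# Values set in the DLT pipeline settings under Configuration, e.g.
#   {"pipeline.target_schema": "analytics", "pipeline.reference_date": "2023-05-01"}
# Key names here are placeholders chosen for this sketch.
target_schema = spark.conf.get("pipeline.target_schema")
reference_date = spark.conf.get("pipeline.reference_date")

@dlt.table(name="daily_summary")
def daily_summary():
    # Use the runtime parameters to keep the pipeline idempotent per reference_date.
    return (spark.read.table(f"{target_schema}.events")     # placeholder source table
            .where(F.col("event_date") == reference_date)
            .groupBy("event_date")
            .agg(F.count("*").alias("events")))
```
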
