Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

danny_edm
by New Contributor
  • 513 Views
  • 0 replies
  • 0 kudos

collect_set weird result when Photon enabled

Cluster: DBR 10.4 LTS with Photon. Sample schema: seq_no (decimal), type (string). Sample data (seq_no, type): 1 A, 1 A, 2 A, 2 B, 2 B. Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no"))...

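For anyone trying to reproduce this, a minimal PySpark sketch of the reported setup (data and column names are illustrative, taken from the sample above); running it on a Photon-enabled and a non-Photon cluster should make any discrepancy visible:

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data matching the sample: seq_no 1 has one distinct type, seq_no 2 has two.
df = spark.createDataFrame(
    [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")],
    ["seq_no", "type"],
)

# Count of distinct 'type' values per seq_no, computed as a window aggregate.
w = Window.partitionBy("seq_no")
df.withColumn("distinct_types", F.size(F.collect_set(F.col("type")).over(w))).show()
# Expected: 1 for seq_no 1 and 2 for seq_no 2.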
Mamdouh_Dabjan
by New Contributor III
  • 3011 Views
  • 6 replies
  • 2 kudos

Importing a large CSV file into Databricks free

Basically, I have a large CSV file that does not fit in a single worksheet. I can just use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table using that file. But when I saw the table, I...

Latest Reply
weldermartins
Honored Contributor
  • 2 kudos

Hello, if you manually open one of the parts of the CSV file, does the view look different?

5 More Replies
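If the goal is simply to work with a CSV that is too big for a spreadsheet, a sketch along these lines reads it directly with Spark once it has been uploaded (the path and options are placeholders, assuming the file already sits in DBFS):

# Placeholder path; adjust to wherever the file was uploaded.
df = (
    spark.read
    .option("header", "true")       # first row holds column names
    .option("inferSchema", "true")  # let Spark infer column types
    .csv("dbfs:/FileStore/tables/my_large_file.csv")
)

print(df.count())       # row count is not limited by worksheet size
display(df.limit(100))  # preview a sample in the notebook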
yannickmo
by New Contributor III
  • 4937 Views
  • 8 replies
  • 14 kudos

Resolved! Adding JAR from Azure DevOps Artifacts feed to Databricks job

Hello, we have some Scala code which is compiled and published to an Azure DevOps Artifacts feed. The issue is we're now trying to add this JAR to a Databricks job (through Terraform) to automate the creation. To do this I'm trying to authenticate using...

Latest Reply
alexott
Valued Contributor II
  • 14 kudos

As of right now, Databricks can't use non-public Maven repositories, as resolution of the Maven coordinates happens in the control plane. That's different from the R & Python libraries. As a workaround you may try to install libraries via an init script or ...

7 More Replies
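The workaround mentioned above could also be approached by staging the JAR yourself. A hedged Python sketch: download the artifact with a personal access token and copy it to DBFS so a job can attach it as a dbfs:/ library (the feed URL, secret scope, and file names are purely illustrative):

import base64
import requests

# Hypothetical secret scope/key holding an Azure DevOps personal access token.
pat = dbutils.secrets.get(scope="devops", key="artifacts-pat")

# Illustrative Maven-style feed URL; substitute your org, project, feed, and artifact.
jar_url = (
    "https://pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/maven/v1/"
    "com/example/my-lib/1.0.0/my-lib-1.0.0.jar"
)

# Azure DevOps accepts basic auth with an empty user name and the PAT as password.
token = base64.b64encode(f":{pat}".encode()).decode()
resp = requests.get(jar_url, headers={"Authorization": f"Basic {token}"})
resp.raise_for_status()

with open("/tmp/my-lib-1.0.0.jar", "wb") as fh:
    fh.write(resp.content)

# Stage the JAR on DBFS so the job can reference it as a dbfs:/ library.
dbutils.fs.cp("file:/tmp/my-lib-1.0.0.jar", "dbfs:/FileStore/jars/my-lib-1.0.0.jar")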
User16752245312
by New Contributor III
  • 4249 Views
  • 2 replies
  • 2 kudos

How can I automatically capture the heap dump on the driver and executors in the event of an OOM error?

If you have a job that repeatedly runs into out-of-memory (OOM) errors either on the driver or executors, automatically capturing the heap dump on the OOM event will help debug the memory issue and identify the cause of the error. Spark config: spark.execu...

Latest Reply
John_360
New Contributor II
  • 2 kudos

Is it necessary to use exactly that HeapDumpPath? I find I'm unable to get driver heap dumps with a different path but otherwise the same configuration. I'm using spark_version 10.4.x-cpu-ml-scala2.12.

1 More Reply
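The truncated config above appears to be the standard JVM heap-dump flags. A sketch of how they might be expressed in a cluster's Spark config (the dump paths are placeholders, and as the reply notes, the exact path may matter):

# spark_conf block of a cluster spec (Clusters/Jobs API, Terraform, or the UI).
spark_conf = {
    "spark.executor.extraJavaOptions":
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dbfs/heap-dumps/executor",
    "spark.driver.extraJavaOptions":
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dbfs/heap-dumps/driver",
}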
Serhii
by Contributor
  • 2375 Views
  • 1 reply
  • 1 kudos

Resolved! Behaviour of cluster launches in multi-task jobs

We are adapting the multi-task workflow example from the dbx documentation for our pipelines: https://dbx.readthedocs.io/en/latest/examples/python_multitask_deployment_example.html. As part of the configuration we specify the cluster configuration and provide ...

Latest Reply
User16873043099
Contributor
  • 1 kudos

Tasks within the same multi-task job can reuse clusters. A shared job cluster allows multiple tasks in the same job to use the cluster. The cluster is created and started when the first task using the cluster starts and terminates after the last ...

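A sketch of what the reply describes, as a Jobs API 2.1-style payload in which two tasks reuse one shared job cluster (names, notebook paths, and node types are illustrative):

job_spec = {
    "name": "example-multi-task-job",
    "job_clusters": [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Repos/project/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Repos/project/transform"},
        },
    ],
}
# The shared cluster starts with the first task that uses it and terminates after the last one.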
Ashok1
by New Contributor II
  • 1006 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashok ch, hope everything is going great. Does @Ivan Tang's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more hel...

1 More Reply
shubhamb
by New Contributor III
  • 3197 Views
  • 3 replies
  • 3 kudos

How to fetch environment variables saved in one notebook into another notebook in Databricks Repos and Notebooks

I have this config.py file which is used to store environment variables: PUSH_API_ACCOUNT_ID = '*******' PUSH_API_PASSCODE = '***********************'. I am using this to fetch the variables and use them in my file.py: import sys; sys.path.append("..") ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey there @Shubham Biswas, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from ...

2 More Replies
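A minimal sketch of the sys.path approach from the post, assuming config.py lives in a folder of the same repo (the repo path is a placeholder):

import sys

# Folder that contains config.py; placeholder path, adjust to your repo layout.
sys.path.append("/Workspace/Repos/<user>/<repo>/conf")

import config  # noqa: E402  (import after the sys.path manipulation)

account_id = config.PUSH_API_ACCOUNT_ID
passcode = config.PUSH_API_PASSCODE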
BradSheridan
by Valued Contributor
  • 2800 Views
  • 9 replies
  • 4 kudos

Resolved! How to use cloudFiles to completely overwrite the target

Hey there Community!! I have a client that will produce a CSV file daily that needs to be moved from Bronze -> Silver. Unfortunately, this source file will always be a full set of data, not incremental. I was thinking of using Auto Loader/cloudFil...

Latest Reply
BradSheridan
Valued Contributor
  • 4 kudos

I "up voted'" all of @werners suggestions b/c they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!). However, turns out I'm going to end up getting incremental data afterall :). So now the flow wi...

8 More Replies
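Since the thread landed on incremental loads, a minimal Auto Loader sketch for the daily CSV drop might look like this (paths, schema location, and table name are placeholders; a true full-overwrite target would need extra handling, e.g. foreachBatch):

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/bronze/_schemas/daily_feed")
    .load("dbfs:/mnt/bronze/daily_feed/")
)

(
    df.writeStream
    .option("checkpointLocation", "dbfs:/mnt/silver/_checkpoints/daily_feed")
    .trigger(once=True)          # process whatever is new, then stop
    .toTable("silver.daily_feed")
)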
Deepak_Goldwyn
by New Contributor III
  • 618 Views
  • 0 replies
  • 0 kudos

Pass parameter value from Job to DLT pipeline

We are investigating how to pass a parameter from a Databricks Job to a DLT pipeline. Our process orchestrator is Azure Data Factory, from where we trigger the Databricks Job using the Jobs API. As part of the 'run-now' request, we would like to pass a paramete...

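Within the DLT pipeline itself, values from the pipeline's "configuration" map can be read with spark.conf.get, as in this sketch (the key name is illustrative; how to override it per run from the Jobs 'run-now' call is exactly what the post is asking about):

import dlt
from pyspark.sql import functions as F

# Reads a key set in the DLT pipeline's "configuration" map, with a default fallback.
run_date = spark.conf.get("mypipeline.run_date", "1970-01-01")

@dlt.table(name="parameterized_table", comment="Illustrative table that records the run_date parameter")
def parameterized_table():
    return spark.range(1).withColumn("run_date", F.lit(run_date))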
BkP
by Contributor
  • 574 Views
  • 0 replies
  • 0 kudos

Hi, I am getting an error while creating a cluster and trying to open a notebook to run. How do I overcome this error? I have sent an email to databric...

Hi, I am getting an error while creating a cluster and trying to open a notebook to run. How do I overcome this error? I have sent an email to Databricks support but have not received any response so far. Please help and guide.

databricks error in community edition
explore
by New Contributor
  • 1150 Views
  • 0 replies
  • 0 kudos

Hi, can we connect to Teradata Vantage installed in a VM via the Community Edition notebook? I am working on a POC to fetch data from Teradata Vantage (just Teradata, as it uses JDBC) and process it in a Community Edition notebook. I downloaded the terajdbc4.jar.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_data(driver, jdbc_url, sql, user, password):
  return spark.read \
    .format('jdbc') \
    .option('driver', driver) \
    .option('url', jdbc_url) \
    .option('dbt...

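One hedged way to complete the truncated snippet above (the driver class and URL format are the standard Teradata JDBC ones; host, credentials, and query are placeholders, and terajdbc4.jar must be attached to the cluster):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_data(driver, jdbc_url, sql, user, password):
    return (
        spark.read
        .format("jdbc")
        .option("driver", driver)
        .option("url", jdbc_url)
        .option("dbtable", f"({sql}) AS src")  # wrap the query as a derived table
        .option("user", user)
        .option("password", password)
        .load()
    )

df = load_data(
    driver="com.teradata.jdbc.TeraDriver",
    jdbc_url="jdbc:teradata://<vm-host>/DATABASE=<db>",
    sql="SELECT * FROM <db>.<table>",
    user="<user>",
    password="<password>",
)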
youngchef
by New Contributor
  • 1445 Views
  • 3 replies
  • 3 kudos

Resolved! AWS Instance Profiles and DLT Pipelines

Hey everyone! I'm building a DLT pipeline that reads files from S3 (or tries to) and then writes them into different directories in my s3 bucket. The problem is I usually access S3 with an instance profile attached to a cluster, but DLT does not give...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

{ "clusters": [ { "label": "default", "aws_attributes": { "instance_profile_arn": "arn:aws:..." } }, { "label": "maintenance", "aws_attributes": { "instance_profile_arn": "arn:aws:..." ...

2 More Replies
ricard98
by New Contributor II
  • 3207 Views
  • 3 replies
  • 5 kudos

How do you connect a folder path from your desktop to a DB notebook?

I have a folder with multiple Excel files that contain information from different cost centers. These files get updated every week. I'm trying to upload all these files to the DB notebook; is there a way to connect the path directly to DBFS to...

Latest Reply
User16873043099
Contributor
  • 5 kudos

Hello, thanks for your question. You can mount cloud object storage to DBFS and use it in a notebook; please refer here. It is not possible to mount a local folder from the desktop to DBFS, but you should be able to use the Databricks CLI to copy the e...

2 More Replies
gazzyjuruj
by Contributor II
  • 7298 Views
  • 4 replies
  • 9 kudos

Cluster start is currently disabled?

Hi, I'm trying to run the notebooks but nothing happens. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and under 'Compute', pressing play on the cluster gives the e...

Latest Reply
jose_gonzalez
Moderator
  • 9 kudos

Hi @Ghazanfar Uruj, just a friendly follow-up. Did any of the responses help you to resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

3 More Replies