Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Mamdouh_Dabjan
by New Contributor III
  • 3635 Views
  • 6 replies
  • 2 kudos

Importing a large CSV file into Databricks free

Basically, I have a large CSV file that does not fit in a single worksheet; I can just use it in Power Query. I am trying to import this file into my Databricks notebook. I imported it and created a table using that file. But when I saw the table, i...
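As a hedged illustration (not taken from the thread), one way to load a large, multi-part CSV that has already been uploaded to DBFS is to point Spark at the whole upload directory; the path and table name below are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read every part of the uploaded CSV in one pass; Spark is not limited
    # by spreadsheet row counts, so the full file can land in a single table.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("dbfs:/FileStore/tables/large_file/"))   # hypothetical upload path

    df.write.mode("overwrite").saveAsTable("large_file_table")  # hypothetical name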

Latest Reply
weldermartins
Honored Contributor
  • 2 kudos

Hello, if you manually open one of the parts of the CSV file, does the view look different?

5 More Replies
yannickmo
by New Contributor III
  • 5898 Views
  • 8 replies
  • 14 kudos

Resolved! Adding JAR from Azure DevOps Artifacts feed to Databricks job

Hello, we have some Scala code which is compiled and published to an Azure DevOps Artifacts feed. The issue is that we're now trying to add this JAR to a Databricks job (through Terraform) to automate the creation. To do this I'm trying to authenticate using...

Latest Reply
alexott
Valued Contributor II
  • 14 kudos

As of right now, Databricks can't use non-public Maven repositories, because resolution of the Maven coordinates happens in the control plane. That's different from the R & Python libraries. As a workaround you may try to install libraries via an init script or ...

7 More Replies
User16752245312
by New Contributor III
  • 4780 Views
  • 2 replies
  • 2 kudos

How can I automatically capture the heap dump on the driver and executors in the event of an OOM error?

If you have a job that repeatedly runs into out-of-memory (OOM) errors on either the driver or the executors, automatically capturing a heap dump on the OOM event will help in debugging the memory issue and identifying the cause of the error. Spark config: spark.execu...
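The Spark config in the post is cut off; as a sketch of the kind of setting it describes, the JVM heap-dump flags can be supplied through a cluster's spark_conf. The DBFS dump path and cluster sizing below are illustrative assumptions, and the reply further down notes that the exact path can matter.

    # Illustrative cluster fragment (e.g. in a Clusters/Jobs API payload);
    # the heap-dump path is an assumption, not taken from the truncated post.
    new_cluster = {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
        "spark_conf": {
            "spark.executor.extraJavaOptions":
                "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dbfs/heapDumps",
            "spark.driver.extraJavaOptions":
                "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dbfs/heapDumps",
        },
    }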

Latest Reply
John_360
New Contributor II
  • 2 kudos

Is it necessary to use exactly that HeapDumpPath? I find I'm unable to get driver heap dumps with a different path but otherwise the same configuration. I'm using spark_version 10.4.x-cpu-ml-scala2.12.

1 More Replies
Serhii
by Contributor
  • 2723 Views
  • 1 replies
  • 1 kudos

Resolved! Behaviour of cluster launches in multi-task jobs

We are adapting the multi-task workflow example from the dbx documentation for our pipelines: https://dbx.readthedocs.io/en/latest/examples/python_multitask_deployment_example.html. As part of the configuration we specify the cluster configuration and provide ...

Latest Reply
User16873043099
Contributor
  • 1 kudos

Tasks within the same multi-task job can reuse clusters. A shared job cluster allows multiple tasks in the same job to use the same cluster. The cluster is created and started when the first task using the cluster starts, and terminates after the last ...
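As a rough sketch of what the reply describes (field names follow the Jobs 2.1 API; the cluster sizing, notebook paths and task names are made up), a shared job cluster is declared once under job_clusters and referenced by several tasks:

    # Hypothetical job payload illustrating a shared job cluster.
    job_spec = {
        "name": "multi-task-example",
        "job_clusters": [
            {
                "job_cluster_key": "shared_cluster",
                "new_cluster": {
                    "spark_version": "10.4.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
            }
        ],
        "tasks": [
            {"task_key": "ingest", "job_cluster_key": "shared_cluster",
             "notebook_task": {"notebook_path": "/Repos/project/ingest"}},
            {"task_key": "transform", "depends_on": [{"task_key": "ingest"}],
             "job_cluster_key": "shared_cluster",
             "notebook_task": {"notebook_path": "/Repos/project/transform"}},
        ],
    }
    # The shared cluster starts with the first task that uses it and
    # terminates after the last such task finishes.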

Ashok1
by New Contributor II
  • 1229 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashok ch, hope everything is going great. Does @Ivan Tang's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more hel...

1 More Replies
shubhamb
by New Contributor III
  • 3754 Views
  • 3 replies
  • 3 kudos

How to fetch environment variables saved in one notebook into another notebook in Databricks Repos and Notebooks

I have this config.py file which is used to store environment variables:

PUSH_API_ACCOUNT_ID = '*******'
PUSH_API_PASSCODE = '***********************'

I am using the following to fetch the variables and use them in my file.py:

import sys
sys.path.append("..") ...
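A minimal sketch of the pattern the post describes, assuming config.py sits one directory above file.py inside the repo (the variable names are the poster's placeholders):

    # file.py — import shared settings from a config.py one level up in the repo.
    import sys

    sys.path.append("..")          # make the parent folder importable
    import config                  # config.py defines the shared variables

    account_id = config.PUSH_API_ACCOUNT_ID
    passcode = config.PUSH_API_PASSCODE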

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey there @Shubham Biswas, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from ...

2 More Replies
BradSheridan
by Valued Contributor
  • 3634 Views
  • 9 replies
  • 4 kudos

Resolved! How to use cloudFiles to completely overwrite the target

Hey there Community!! I have a client that will produce a CSV file daily that needs to be moved from Bronze -> Silver. Unfortunately, this source file will always be a full set of data....not incremental. I was thinking of using AutoLoader/cloudFil...
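The accepted answer isn't shown above, but as one hedged sketch of the idea in the question: Auto Loader can pick up each daily full-load file, and a foreachBatch sink can overwrite the Silver table on every batch. All paths, options and table names here are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def overwrite_silver(batch_df, batch_id):
        # The daily file is a full snapshot, so replace the Silver table outright.
        batch_df.write.mode("overwrite").saveAsTable("silver.daily_snapshot")

    (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", "s3://bronze-bucket/_schemas/daily/")
        .option("header", "true")
        .load("s3://bronze-bucket/daily/")                    # hypothetical source path
        .writeStream
        .foreachBatch(overwrite_silver)
        .option("checkpointLocation", "s3://bronze-bucket/_chk/daily/")
        .trigger(availableNow=True)                           # or trigger(once=True) on older runtimes
        .start())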

Latest Reply
BradSheridan
Valued Contributor
  • 4 kudos

I "up voted'" all of @werners suggestions b/c they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!). However, turns out I'm going to end up getting incremental data afterall :). So now the flow wi...

8 More Replies
Deepak_Goldwyn
by New Contributor III
  • 748 Views
  • 0 replies
  • 0 kudos

Pass parameter value from Job to DLT pipeline

We are investigating how to pass a parameter from a Databricks Job to a DLT pipeline. Our process orchestrator is Azure Data Factory, from which we trigger the Databricks Job using the Jobs API. As part of the 'run-now' request, we would like to pass a paramete...
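No replies yet; for context only, here is a hedged sketch of the 'run-now' call mentioned in the post (endpoint and field names per the Jobs API; the host, token and IDs are placeholders). Note that values intended for a DLT pipeline are commonly read inside the pipeline with spark.conf.get from the pipeline's configuration, rather than from notebook_params, which reach notebook tasks.

    import requests

    host = "https://<workspace-url>"       # placeholder
    token = "<personal-access-token>"      # placeholder

    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "job_id": 12345,                                  # placeholder job id
            "notebook_params": {"run_date": "2022-09-01"},    # reaches notebook tasks only
        },
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])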

BkP
by Contributor
  • 655 Views
  • 0 replies
  • 0 kudos

Hi, I am getting an error while creating a cluster and trying to open a notebook to run. How to overcome this error ? I have sent an email to databric...

Hi, I am getting an error while creating a cluster and trying to open a notebook to run. How can I overcome this error? I have sent an email to Databricks support but have not received any response so far. Please help and guide.

databricks error in community edition
explore
by New Contributor
  • 1350 Views
  • 0 replies
  • 0 kudos

Hi, can we connect to Teradata Vantage installed in a VM via the Community Edition notebook? I am working on a POC to fetch data from Teradata Vantage (essentially just Teradata, since it uses JDBC) and process it in a Community Edition notebook. I have downloaded terajdbc4.jar.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_data(driver, jdbc_url, sql, user, password):
    return spark.read \
        .format('jdbc') \
        .option('driver', driver) \
        .option('url', jdbc_url) \
        .option('dbtable', sql) \
        .option('user', user) \
        .option('password', password) \
        .load()
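For illustration only, a call to the function above using the Teradata JDBC driver class that ships in terajdbc4.jar (the JAR must be installed on the cluster); the host, credentials and query are placeholders.

    df = load_data(
        driver="com.teradata.jdbc.TeraDriver",
        jdbc_url="jdbc:teradata://<vm-host>/DATABASE=demo",   # placeholder URL
        sql="(SELECT * FROM demo.some_table) AS src",         # placeholder query
        user="<user>",
        password="<password>",
    )
    df.show()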

youngchef
by New Contributor
  • 1796 Views
  • 3 replies
  • 3 kudos

Resolved! AWS Instance Profiles and DLT Pipelines

Hey everyone! I'm building a DLT pipeline that reads files from S3 (or tries to) and then writes them into different directories in my S3 bucket. The problem is I usually access S3 with an instance profile attached to a cluster, but DLT does not give...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

{ "clusters": [ { "label": "default", "aws_attributes": { "instance_profile_arn": "arn:aws:..." } }, { "label": "maintenance", "aws_attributes": { "instance_profile_arn": "arn:aws:..." ...

2 More Replies
ricard98
by New Contributor II
  • 3839 Views
  • 3 replies
  • 5 kudos

How do you connect a folder path from your desktop to DB notebook?

I have a folder with multiple Excel files that contain information from different cost centers. These files get updated every week. I'm trying to upload all these files to the DB notebook. Is there a way to connect the path directly to the DBFS to...

Latest Reply
User16873043099
Contributor
  • 5 kudos

Hello, thanks for your question. You can mount cloud object storage to DBFS and use it in a notebook. Please refer here. It is not possible to mount a local folder from the desktop to DBFS, but you should be able to use the Databricks CLI to copy the e...
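A small sketch of the first suggestion in the reply (mounting cloud object storage so the uploaded files are visible in notebooks). It assumes a Databricks notebook; the storage account, container, secret scope and mount point names are made up.

    # Mount a cloud storage location under /mnt so notebooks can read the files.
    dbutils.fs.mount(
        source="wasbs://cost-centers@mystorageaccount.blob.core.windows.net",
        mount_point="/mnt/cost-centers",
        extra_configs={
            "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
                dbutils.secrets.get("my-scope", "storage-key")   # hypothetical secret
        },
    )

    display(dbutils.fs.ls("/mnt/cost-centers"))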

2 More Replies
StephanieAlba
by Valued Contributor III
  • 1920 Views
  • 3 replies
  • 6 kudos
Latest Reply
jose_gonzalez
Moderator
  • 6 kudos

Hi @Stephanie Rivera, just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
Karl
by New Contributor II
  • 15923 Views
  • 2 replies
  • 3 kudos

PySpark column object not callable using "when otherwise" transformation

The very first "when" function results in the posted error message (see image). The print statement of the count of df_td_amm works. A printSchema of the "df_td_amm" data frame confirms that "AGE" is a column. A select statement is also successful, s...

Error
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The syntax is when(...).otherwise(...), not other(...). And there are some backslashes missing.
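A tiny self-contained illustration of the syntax the reply points to (the AGE column matches the post; the data frame and values are made up):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(15,), (34,), (71,)], ["AGE"])

    df_out = df.withColumn(
        "age_group",
        F.when(F.col("AGE") < 18, "minor")
         .when(F.col("AGE") < 65, "adult")
         .otherwise("senior")        # .otherwise(...), not .other(...)
    )
    df_out.show()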

1 More Replies
antoniodavideca
by New Contributor III
  • 2927 Views
  • 5 replies
  • 1 kudos

Resolved! Jobs REST Api - Run a Job that is connected to a git_source

With the Jobs REST API it is possible to create a new job, specifying a git_source. My question is about triggering the job. Still with the Jobs REST API it is possible to trigger a job using the job_id, but I can't find a way to tell Databricks what the en...
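For orientation only (not the thread's resolution), a hedged sketch of creating a job whose task comes from a Git source and then triggering it by job_id; the repo URL, branch, cluster sizing and credentials are placeholders, and field names follow the Jobs 2.1 API.

    import requests

    host = "https://<workspace-url>"                               # placeholder
    headers = {"Authorization": "Bearer <personal-access-token>"}  # placeholder

    # Create the job with a git_source; the branch (or tag/commit) is stored on the job.
    create = requests.post(f"{host}/api/2.1/jobs/create", headers=headers, json={
        "name": "git-backed-job",
        "git_source": {
            "git_url": "https://github.com/org/repo",   # placeholder repo
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "tasks": [{
            "task_key": "run_notebook",
            "notebook_task": {"notebook_path": "notebooks/entrypoint", "source": "GIT"},
            "new_cluster": {"spark_version": "10.4.x-scala2.12",
                            "node_type_id": "i3.xlarge", "num_workers": 1},
        }],
    })
    job_id = create.json()["job_id"]

    # Trigger it; the run uses the Git reference configured on the job above.
    requests.post(f"{host}/api/2.1/jobs/run-now", headers=headers, json={"job_id": job_id})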

Latest Reply
Prabakar
Esteemed Contributor III
  • 1 kudos

Ah, got it. So is your issue resolved, or are you looking for further information?

4 More Replies
