Data Engineering

Forum Posts

BradSheridan
by Valued Contributor
  • 1962 Views
  • 9 replies
  • 4 kudos

Resolved! How to use cloudFiles to completely overwrite the target

Hey there, Community!! I have a client that will produce a CSV file daily that needs to be moved from Bronze -> Silver. Unfortunately, this source file will always be a full set of data... not incremental. I was thinking of using Auto Loader/cloudFil...

Latest Reply
BradSheridan
Valued Contributor
  • 4 kudos

I "up voted'" all of @werners suggestions b/c they are all very valid ways of addressing my need (the true power/flexibility of the Databricks UDAP!!!). However, turns out I'm going to end up getting incremental data afterall :). So now the flow wi...

8 More Replies
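As a footnote to the thread above: a minimal Auto Loader sketch of that incremental Bronze -> Silver flow. Paths, schema location, and table names are hypothetical, and it assumes a Databricks notebook where spark is predefined.

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/silver/_schemas/daily_feed")  # where Auto Loader tracks the inferred schema
    .option("header", "true")
    .load("/mnt/bronze/daily_feed/")          # Auto Loader picks up only files it has not seen before
    .writeStream
    .option("checkpointLocation", "/mnt/silver/_checkpoints/daily_feed")
    .trigger(once=True)                       # process whatever is new, then stop
    .toTable("silver.daily_feed"))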
Deepak_Goldwyn
by New Contributor III
  • 454 Views
  • 0 replies
  • 0 kudos

Pass parameter value from Job to DLT pipeline

We are investigating how to pass a parameter from a Databricks Job to a DLT pipeline. Our process orchestrator is Azure Data Factory, from which we trigger the Databricks Job using the Jobs API. As part of the 'run-now' request, we would like to pass a paramete...

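No replies were posted to the thread above; one commonly used pattern, sketched below with hypothetical key and table names, is to set the value in the pipeline's configuration map (editable via the pipeline settings or the Pipelines API) before triggering it, and read it inside the pipeline through spark.conf.

import dlt

# Values placed in the pipeline settings under "configuration", e.g.
# {"configuration": {"mypipeline.run_date": "2022-06-01"}}, surface
# through spark.conf inside the pipeline (spark is predefined there).
run_date = spark.conf.get("mypipeline.run_date", "1900-01-01")

@dlt.table
def filtered_events():
    return spark.read.table("bronze.events").where(f"event_date = '{run_date}'")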
BkP
by Contributor
  • 448 Views
  • 0 replies
  • 0 kudos

Hi, I am getting an error while creating a cluster and trying to open a notebook to run. How to overcome this error? I have sent an email to databric...

Hi, I am getting an error while creating a cluster and trying to open a notebook to run. How can I overcome this error? I have sent an email to Databricks support but have not received any response so far. Please help and guide.

databricks error in community edition
explore
by New Contributor
  • 903 Views
  • 0 replies
  • 0 kudos

Hi, can we connect to Teradata Vantage installed in a VM via the Community Edition notebook? I am working on a POC to fetch data from Teradata Vantage (it uses JDBC) and process it in a Community Edition notebook. I downloaded the terajdbc4.jar.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_data(driver, jdbc_url, sql, user, password):
    return spark.read \
        .format('jdbc') \
        .option('driver', driver) \
        .option('url', jdbc_url) \
        .option('dbtable', sql) \
        .option('user', user) \
        .option('password', password) \
        .load()

chandan_a_v
by Valued Contributor
  • 918 Views
  • 1 reply
  • 1 kudos

Can't import local files under repo

I have a YAML file inside one of the subdirectories of a repo in Databricks. I have appended the repo path to sys.path, but I still can't access the file. https://docs.databricks.com/_static/notebooks/files-in-repos.html

Latest Reply
chandan_a_v
Valued Contributor
  • 1 kudos

@Kaniz Fatma, could you please help me out here?

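A minimal sketch for the thread above, with a hypothetical repo path and file name, assuming PyYAML is available on the cluster: sys.path only affects Python module imports, so a data file under a repo is better opened by its full workspace path.

import os
import yaml

repo_root = "/Workspace/Repos/user@example.com/my-repo"   # hypothetical; adjust to your user and repo

# Open the YAML file directly by its workspace path instead of relying on sys.path.
with open(os.path.join(repo_root, "conf/settings.yaml")) as f:
    settings = yaml.safe_load(f)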
youngchef
by New Contributor
  • 1001 Views
  • 3 replies
  • 3 kudos

Resolved! AWS Instance Profiles and DLT Pipelines

Hey everyone! I'm building a DLT pipeline that reads files from S3 (or tries to) and then writes them into different directories in my S3 bucket. The problem is that I usually access S3 with an instance profile attached to a cluster, but DLT does not give...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

{ "clusters": [ { "label": "default", "aws_attributes": { "instance_profile_arn": "arn:aws:..." } }, { "label": "maintenance", "aws_attributes": { "instance_profile_arn": "arn:aws:..." ...

2 More Replies
ricard98
by New Contributor II
  • 2497 Views
  • 3 replies
  • 5 kudos

How do you connect a folder path from your desktop to DB notebook?

I have a folder with multiple Excel files that contain information from different cost centers. These files get updated every week. I'm trying to upload all these files to the DB notebook. Is there a way to connect the path directly to the DBFS to...

Latest Reply
User16873043099
Contributor
  • 5 kudos

Hello, thanks for your question. You can mount cloud object storage to DBFS and use it in a notebook; please refer here. It is not possible to mount a local folder from the desktop to DBFS, but you should be able to use the Databricks CLI to copy the e...

2 More Replies
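A minimal sketch of the CLI route mentioned in the reply above, with hypothetical local and DBFS paths; it assumes the legacy Databricks CLI is installed and configured on the desktop (pip install databricks-cli, then databricks configure --token).

import subprocess

local_dir = "C:/reports/cost_centers"       # folder on the desktop
dbfs_dir = "dbfs:/FileStore/cost_centers"   # target location in DBFS

# "databricks fs cp --recursive" uploads the whole folder to DBFS.
subprocess.run(
    ["databricks", "fs", "cp", "--recursive", local_dir, dbfs_dir],
    check=True,
)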
gazzyjuruj
by Contributor II
  • 5900 Views
  • 4 replies
  • 9 kudos

Cluster start is currently disabled?

Hi, I'm trying to run the notebooks but nothing happens. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and in 'Compute', pressing play on the cluster gives the e...

Latest Reply
jose_gonzalez
Moderator
  • 9 kudos

Hi @Ghazanfar Uruj, just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

3 More Replies
StephanieRivera
by Valued Contributor II
  • 1065 Views
  • 3 replies
  • 6 kudos
Latest Reply
jose_gonzalez
Moderator
  • 6 kudos

Hi @Stephanie Rivera, just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
Karl
by New Contributor II
  • 13216 Views
  • 2 replies
  • 3 kudos

PySpark column object not callable using "when otherwise" transformation

The very first "when" function results in the posted error message (see image). The print statement of the count of df_td_amm works. A printSchema of the "df_td_amm" data frame confirms that "AGE" is a column. A select statement is also successful, s...

[attached image: Error]
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

The syntax is when(...).otherwise(...), not other(...). And there are some backslashes missing.

1 More Replies
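A minimal sketch of the corrected pattern from the reply above; the AGE column comes from the post, while the new column and labels are hypothetical. Wrapping the expression in parentheses (or ending each continued line with a backslash) avoids the missing-backslash problem.

from pyspark.sql import functions as F

df_td_amm = df_td_amm.withColumn(
    "age_group",
    F.when(F.col("AGE") < 18, "minor")
     .when(F.col("AGE") < 65, "adult")
     .otherwise("senior"),   # .otherwise() chains onto when(); there is no other()
)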
antoniodavideca
by New Contributor III
  • 1685 Views
  • 5 replies
  • 1 kudos

Resolved! Jobs REST API - Run a Job that is connected to a git_source

With the Jobs REST API it is possible to create a new job, specifying a git_source. My question is about triggering the job. Still with the Jobs REST API, it is possible to trigger a job using the job_id, but I can't find a way to tell Databricks what the en...

Latest Reply
Prabakar
Esteemed Contributor III
  • 1 kudos

Ah, got it. So is your issue resolved, or are you looking for further information?

4 More Replies
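A sketch of the shape implied by the thread above (URL, IDs, branch, and paths are all hypothetical): the entry point is declared at job-creation time, since notebook_task.path is resolved relative to the repo root given in git_source, so a later run-now call only needs the job_id.

import requests

payload = {
    "name": "git-sourced-job",
    "git_source": {
        "git_url": "https://github.com/my-org/my-repo",
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [{
        "task_key": "main",
        "notebook_task": {"path": "notebooks/etl"},   # relative to the repo root
        "existing_cluster_id": "1234-567890-abcde123",
    }],
}

requests.post(
    "https://<workspace>.cloud.databricks.com/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)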
Gabriel0007
by New Contributor III
  • 1518 Views
  • 0 replies
  • 3 kudos

How to save json data to Delta Table: ParseError on Insert

I'm trying to save the JSON data returned from a requests API call to a Delta table. I get a ParseError when I INSERT the response object, which is in JSON format. The error shows the JSON data and a marker that states a ' or } or ) is missing. I v...

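No replies yet; a minimal sketch of one way around the ParseError (URL and table name are hypothetical): load the response into a DataFrame instead of splicing raw JSON into an INSERT statement, where quotes and braces in the payload break the SQL parser. Assumes a Databricks notebook where spark is predefined.

import requests

records = requests.get("https://api.example.com/data").json()   # assumes a list of flat dicts

df = spark.createDataFrame(records)
df.write.format("delta").mode("append").saveAsTable("my_delta_table")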
Kit
by New Contributor III
  • 2296 Views
  • 7 replies
  • 1 kudos

Resolved! Can't run a job that use GitHub as source

I have a list of jobs that use code in GitHub as their source. Everything worked fine until yesterday, when I noticed that all the jobs using GitHub as source were failing with the following error: ``` Run result unavailable:...

Latest Reply
User16766737456
New Contributor III
  • 1 kudos

Just an update, to round this out. We investigated further internally and found that although we have a cleanup process in place to remove the internal repos that are checked out for workflows, it was failing to catch up due to the sheer volum...

6 More Replies
antoniodavideca
by New Contributor III
  • 1270 Views
  • 2 replies
  • 0 kudos

Jobs REST API - Create a new Job with a new Cluster and install a Maven Library on the Cluster

I need to use the Jobs REST API to create a job on our Databricks cluster. At job creation, it is possible to specify an existing cluster or create a new one. I can forward a lot of information to the cluster, but what I would like to specify is ...

Latest Reply
Prabakar
Esteemed Contributor III
  • 0 kudos

@Antonio Davide Cali, you can reference the existing cluster in your JSON to use it for the job. To update or push libraries to the job, you can use the Jobs Update API. Since you want to push libraries to the cluster, you can push them using the new setting an...

1 More Replies
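A sketch of the jobs/create payload shape implied by the reply above (coordinates, Spark version, node type, and paths are hypothetical): libraries are declared per task, alongside new_cluster.

payload = {
    "name": "job-with-maven-lib",
    "tasks": [{
        "task_key": "main",
        "new_cluster": {
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "libraries": [
            {"maven": {"coordinates": "com.example:my-lib:1.0.0"}}   # installed on the job cluster at start-up
        ],
        "notebook_task": {"path": "/Users/me/my_notebook"},
    }],
}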
Lazloo
by New Contributor III
  • 629 Views
  • 0 replies
  • 2 kudos

Cannot load spark-avro jars with Databricks version 10.4

Currently, I am facing an issue since the `databricks-connect` runtime on our cluster was updated to 10.4. Since then, I cannot load the jars for spark-avro anymore, by running the following code:

from pyspark.sql import SparkSession

spark = SparkSe...

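No replies yet; a sketch of one workaround, assuming the cluster runs Spark 3.2 with Scala 2.12 (match the artifact version to your runtime): pull spark-avro in through spark.jars.packages when the session is built, instead of loading the jar by hand.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
    .getOrCreate()
)

df = spark.read.format("avro").load("/path/to/files")   # hypothetical path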