Data Engineering

Forum Posts

Anonymous
by Not applicable
  • 706 Views
  • 2 replies
  • 0 kudos

Resolved! Best practices to query logs

We currently dump our logs in S3. What are some best practices for making these logs easier to query?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

And if these are generic logs that get landed in S3, it'd be worth taking a look at Auto Loader. Here is a blog post on processing CrowdStrike logs in a similar way.

1 More Replies
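For reference, a minimal Auto Loader sketch for the suggestion above, assuming JSON-formatted logs; the bucket paths and table name are placeholders:

```python
# Incrementally ingest JSON logs from S3 into a Delta table with Auto Loader.
# All paths and the table name below are hypothetical placeholders.
df = (spark.readStream
      .format("cloudFiles")                                          # Auto Loader source
      .option("cloudFiles.format", "json")                           # assuming JSON logs
      .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/logs")
      .load("s3://my-bucket/raw-logs/"))                             # log landing path

(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/_checkpoints/logs")
   .trigger(once=True)          # run as an incremental batch; only new files are processed
   .toTable("logs_bronze"))     # the resulting Delta table is easy to query with SQL
```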
Anonymous
by Not applicable
  • 2537 Views
  • 1 reply
  • 0 kudos

Resolved! Backfill Delta table

What is the recommended way to backfill a Delta table using a series of smaller date-partitioned jobs?

Latest Reply
User16783855117
Contributor II
  • 0 kudos

Another approach you might consider is creating a template notebook that queries a known date range via widgets. For example, two date widgets: start time and end time. From there you could use Databricks Jobs to update these parameters for each run.
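A minimal sketch of that template-notebook pattern, assuming a date-partitioned Delta target; the table names and date column are hypothetical:

```python
# Two date widgets that a Databricks Job can override on each run.
dbutils.widgets.text("start_date", "2021-01-01")
dbutils.widgets.text("end_date", "2021-01-07")

start_date = dbutils.widgets.get("start_date")
end_date = dbutils.widgets.get("end_date")

# Backfill only the requested range; replaceWhere overwrites just those
# partitions of the target Delta table, leaving the rest untouched.
(spark.read.table("source_events")                 # hypothetical source table
    .where(f"event_date >= '{start_date}' AND event_date < '{end_date}'")
    .write.format("delta")
    .mode("overwrite")
    .option("replaceWhere",
            f"event_date >= '{start_date}' AND event_date < '{end_date}'")
    .saveAsTable("target_events"))                 # hypothetical target table
```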

User16790091296
by Contributor II
  • 425 Views
  • 0 replies
  • 5 kudos

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia)

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia): [Note: This list is not exhaustive] Leverage the DataFrame or SparkSQL APIs first. They use the same execution process, resulting in parity in performance, but they also com...
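A quick way to see the parity claim in the first tip for yourself: build the same query with SQL and with the DataFrame API and compare the plans. The events table name here is hypothetical:

```python
# Both queries compile to the same optimized physical plan, which you can
# confirm by comparing the explain() output. "events" is a hypothetical table.
sql_df = spark.sql("SELECT country, count(*) AS n FROM events GROUP BY country")
api_df = spark.table("events").groupBy("country").count().withColumnRenamed("count", "n")

sql_df.explain()   # prints the physical plan for the SQL version
api_df.explain()   # prints an equivalent plan for the DataFrame version
```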

Anonymous
by Not applicable
  • 1793 Views
  • 1 reply
  • 0 kudos

Resolved! Delta vs parquet

When does it make sense to use Delta over Parquet? Are there any instances when you would rather use Parquet?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Users should almost always choose Delta over Parquet. Keep in mind that Delta is a storage format that sits on top of Parquet, so the performance of writing to both formats is similar. However, reading and transforming data with Delta is almost a...
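To make the comparison concrete, a small sketch: the write path differs only in the format string, while Delta adds a transaction log and features like time travel on top of the Parquet files. The paths are placeholders:

```python
# Writing the same data as plain Parquet and as Delta; only the format differs.
df = spark.range(1000).withColumnRenamed("id", "value")

df.write.format("parquet").mode("overwrite").save("/tmp/demo_parquet")
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")

# Reads look the same, but the Delta path also supports time travel:
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/demo_delta").count()
```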

Anonymous
by Not applicable
  • 6444 Views
  • 1 reply
  • 0 kudos

What is an action in Spark?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

An action in Spark is any operation that does not return an RDD. Evaluation happens when an action is taken. Actions trigger the scheduler, which builds a directed acyclic graph (DAG) as a plan of execution. The plan of execution is created by wor...
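A minimal illustration of the lazy-evaluation point: transformations only extend the plan, and the action is what triggers execution:

```python
# Transformations are lazy; nothing executes until an action is called.
df = spark.range(10_000_000)             # no job runs yet
doubled = df.selectExpr("id * 2 AS v")   # still lazy: just extends the plan

# count() is an action: it triggers the scheduler, which builds the DAG
# and executes the plan across the cluster.
print(doubled.count())
```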

Anonymous
by Not applicable
  • 605 Views
  • 1 reply
  • 0 kudos

Resolved! Converting between Pandas to Koalas

When and why should I convert between a pandas and a Koalas DataFrame? What are the implications?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

Koalas is distributed on a Databricks cluster, similar to how Spark DataFrames are distributed. pandas DataFrames live in memory only on the Spark driver. If you are a pandas user and are using a multi-node cluster, then you should use Koalas to p...
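A small sketch of the conversion in both directions, using the Koalas API (databricks.koalas):

```python
# from_pandas() distributes the data across the cluster; to_pandas() collects
# it back to the driver, so it only makes sense for small results.
import pandas as pd
import databricks.koalas as ks

pdf = pd.DataFrame({"x": range(10)})   # lives in driver memory only
kdf = ks.from_pandas(pdf)              # distributed, pandas-like API
print(kdf.x.mean())                    # computed on the cluster
small_pdf = kdf.to_pandas()            # back to a plain pandas DataFrame
```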

Anonymous
by Not applicable
  • 560 Views
  • 0 replies
  • 0 kudos

Append subset of columns to target Snowflake table

I’m using the databricks-snowflake connector to load data into a Snowflake table. Can someone point me to an example of how we can append only a subset of columns to a target Snowflake table (for example, some columns in the target Snowflake table ar...
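Since this question has no replies yet, here is one hedged sketch of an approach: select only the columns you want to append and write with the Spark-Snowflake connector. Columns omitted from the DataFrame would need to be nullable or have defaults in Snowflake. The connection values, column names, and table name below are placeholders:

```python
# Hypothetical sketch: append only a subset of columns to a Snowflake table.
# Omitted target columns must be nullable or have DEFAULT values in Snowflake.
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",   # placeholder credentials
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

(df.select("id", "name")                 # only the columns to append
   .write.format("snowflake")
   .options(**sf_options)
   .option("dbtable", "target_table")    # hypothetical target table
   .mode("append")
   .save())
```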

Anonymous
by Not applicable
  • 523 Views
  • 0 replies
  • 0 kudos

Detailed logs for R process

We have a user notebook in R that reliably crashes the driver. Are detailed logs from the R process stored somewhere on drivers/workers?

User16790091296
by Contributor II
  • 1618 Views
  • 1 reply
  • 0 kudos

Resolved! How can I use a Python function defined in my git-repo module within the DB notebook?

I have a function within a module in my git repo. I want to import it into my Databricks notebook - how can I do that?

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Databricks Repos allows you to sync your work in Databricks with a remote Git repository. This makes it easier to implement development best practices. Databricks supports integrations with GitHub, Bitbucket, and GitLab. Using Repos you can bring you...
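A minimal sketch of the import itself, assuming the repo has been added under /Workspace/Repos; the path, module, and function names are hypothetical:

```python
# Make a module from a Databricks Repo importable in a notebook.
# The repo path, module, and function names are placeholders.
import sys
sys.path.append("/Workspace/Repos/<user>/<repo>")  # add the repo root to the path

from my_module import my_function  # defined in my_module.py at the repo root
my_function()
```

Note that notebooks running inside the repo itself typically have the repo root on sys.path already; the append is for notebooks that live elsewhere in the workspace.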
