Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

-werners-
by Esteemed Contributor III
  • 4282 Views
  • 3 replies
  • 14 kudos

Notebook fails in job but not in interactive mode

I have this notebook which is scheduled by Data Factory on a daily basis. It works fine, up to today. All of a sudden I keep getting a NullPointerException when writing the data. After some searching online, I disabled AQE, but this does not help. Th...

Latest Reply
-werners-
Esteemed Contributor III
  • 14 kudos

After some tests it seems that if I run the notebook on an interactive cluster, I only get 80% of load (Ganglia metrics). If I run the same notebook on a job cluster with the same VM types etc. (so the only difference is interactive vs job), I get over...

2 More Replies
pjp94
by Contributor
  • 3262 Views
  • 4 replies
  • 9 kudos

Databricks Job - Notebook Execution

Question - When you set a recurring job to simply update a notebook, does Databricks clear the state of the notebook prior to executing it? If not, can I configure it to make sure it clears the state before running?

Latest Reply
Anonymous
Not applicable
  • 9 kudos

@Paras Patel - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily? Thanks!

3 More Replies
morganmazouchi
by Databricks Employee
  • 9971 Views
  • 7 replies
  • 2 kudos

Resolved! Incremental updates in Delta Live Tables

What happens if we change the logic for the Delta Live Tables and we do an incremental update? Does the table get reset (refreshed) automatically, or would it only apply the logic to new incoming data? Would we have to trigger a reset in this case?

Latest Reply
morganmazouchi
Databricks Employee
  • 2 kudos

Here is my finding on when to refresh (reset) the table: if it is a complete table, all the changes are applied automatically. If it is an incremental table, you need to do a manual reset (full refresh).

6 More Replies
Kody_Devl
by New Contributor II
  • 7891 Views
  • 3 replies
  • 2 kudos

%SQL Append null values into a SQL Table

Hi all, I am new to Databricks and am writing my first program. Note: code shown below. I am creating a table with 3 columns to store data. 2 of the columns will be appended in from data that I have in another table. When I run my append query into the...

Latest Reply
Kody_Devl
New Contributor II
  • 2 kudos

Hi Hubert, your answer moves me closer to being able to update a 26-field MMR_Restated table in pieces, as the correct field values are calculated through the process. I have been looking for a way to update in "pieces"... 2 fie...

2 More Replies
RiyazAliM
by Honored Contributor
  • 16859 Views
  • 7 replies
  • 4 kudos

Issue while trying to read a text file in Databricks using the local file APIs instead of the Spark API

I'm trying to read a small txt file which is added as a table to the default db on Databricks. While trying to read the file via the local file API, I get a `FileNotFoundError`, but I'm able to read the same file as a Spark RDD using SparkContext. Please fi...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

can you try with /dbfs/Filestore/tables/boringwords.txt?

6 More Replies
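The reply above hinges on the difference between Spark-style paths (`dbfs:/...`) and the `/dbfs` FUSE mount that local (non-Spark) Python file APIs see. A minimal sketch of that mapping, using a hypothetical helper name (`dbfs_to_local` is not a Databricks API):

```python
def dbfs_to_local(path: str) -> str:
    """Map a Spark-style DBFS path to the /dbfs FUSE path used by local file APIs.

    On Databricks, open("dbfs:/FileStore/...") raises FileNotFoundError, while
    open("/dbfs/FileStore/...") works, because non-Spark file APIs only see DBFS
    through the /dbfs mount. (Hypothetical helper, not a Databricks API.)
    """
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    if not path.startswith("/dbfs/"):
        return "/dbfs" + path if path.startswith("/") else "/dbfs/" + path
    return path
```

On a cluster one would then call `open(dbfs_to_local("dbfs:/FileStore/tables/boringwords.txt"))` instead of opening the `dbfs:/` URI directly.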
ak09
by New Contributor
  • 1396 Views
  • 0 replies
  • 0 kudos

Triggering Notebook in Azure Repos via Azure DevOps

I have been using Databricks workspace for all my data science projects in my firm. In my current project, I have built a CI pipeline using databricks-cli & Azure DevOps. Using databricks-cli I can trigger the Notebook which is present in my workspa...

tarente
by New Contributor III
  • 5289 Views
  • 3 replies
  • 3 kudos

Partitioned parquet table (folder) with different structure

Hi, we have a parquet table (folder) in an Azure Storage Account. The table is partitioned by column PeriodId (represents a day in the format YYYYMMDD) and has data from 20181001 until 20211121 (yesterday). We have a new development that adds a new column ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 3 kudos

I think the problem is in the overwrite: when you overwrite, it overwrites all folders. The solution is to combine append with dynamic overwrite so it only overwrites the folders which have data and doesn't affect old partitions: spark.conf.set("spark.sql.sources.pa...

2 More Replies
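The config line truncated in the reply above is presumably `spark.sql.sources.partitionOverwriteMode`. A configuration sketch of dynamic partition overwrite, assuming an active Spark session on a Databricks cluster and placeholder `df`/`path` variables (not runnable outside a cluster):

```python
# With partitionOverwriteMode=dynamic, mode("overwrite") replaces only the
# PeriodId partitions present in df; older partitions are left untouched.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

(df.write
   .mode("overwrite")
   .partitionBy("PeriodId")
   .parquet(path))
```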
Khaled
by New Contributor III
  • 5417 Views
  • 4 replies
  • 2 kudos

Uploading a CSV to Databricks Community Edition

When I upload a CSV file of size 1 GB from my PC in the upload place, it is uploading until the file reaches some point and then disappears; for example, it reaches 600 MB and disappears from that place.

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Khaled ALZHARANI, I would also recommend splitting your CSV files into smaller files.

3 More Replies
tap
by New Contributor III
  • 13514 Views
  • 8 replies
  • 10 kudos

Could Not Connect to ADLS Gen2 Using ABFSS

I'm new to Databricks, and not sure what I can do about this issue. I run a simple command to list all file paths but get an SSLHandshakeException. Is there any way to resolve this? The full error message: ExecutionError Traceback (most recent ca...

Latest Reply
Anonymous
Not applicable
  • 10 kudos

@suet pooi tan - Thank you for letting us know.

7 More Replies
pantelis_mare
by Contributor III
  • 8606 Views
  • 6 replies
  • 1 kudos

Delta merge file size control

Hello community! I have a rather weird issue where a Delta merge is writing very big files (~1 GB) that slow down my pipeline. Here is some context: I have a dataframe containing updates for several dates in the past. The current and last day contain the vast...

Latest Reply
pantelis_mare
Contributor III
  • 1 kudos

Hello Jose, I just went with splitting the merge in 2, so I have one merge that touches many partitions but few rows per file, and a second that touches 2-3 partitions but contains the bulk of the data.

5 More Replies
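An alternative to splitting the merge is steering file sizes via Delta table properties. A hedged configuration sketch, assuming a Databricks Spark session; the table name is a placeholder and the property values should be checked against your runtime's documentation:

```python
# Configuration sketch - requires a Databricks Spark session with Delta support.
# delta.targetFileSize caps the size of files written by merge/optimize;
# delta.tuneFileSizesForRewrites biases writes toward smaller files on
# tables that are frequently rewritten by MERGE.
spark.sql("""
    ALTER TABLE my_delta_table SET TBLPROPERTIES (
        'delta.targetFileSize' = '128mb',
        'delta.tuneFileSizesForRewrites' = 'true'
    )
""")
```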
-werners-
by Esteemed Contributor III
  • 3551 Views
  • 5 replies
  • 22 kudos

Look what I just saw appearing in my notebook: a data histogram of your dataframe!

Latest Reply
-werners-
Esteemed Contributor III
  • 22 kudos

You heard it first here! https://databricks.com/blog/2021/12/07/introducing-data-profiles-in-the-databricks-notebook.html

4 More Replies
Nilave
by New Contributor III
  • 6330 Views
  • 2 replies
  • 1 kudos

Resolved! Solution for API hosted on Databricks

I'm using Azure Databricks Python notebooks. We are preparing a front end to display the Databricks tables via an API to query the tables. Is there a solution from Databricks to host callable APIs for querying its tables and sending them as a response to the fro...

Latest Reply
Nilave
New Contributor III
  • 1 kudos

@Prabakar Ammeappin Thanks for the link. Also, I was wondering, for a web page front end, would it be more effective to query from a SQL database or from Azure Databricks tables? If from an Azure SQL database, is there any efficient way to sync the tables from Az...

1 More Replies
SailajaB
by Databricks Partner
  • 2979 Views
  • 4 replies
  • 4 kudos

Facing a format issue while converting one nested JSON to a brand new JSON schema

Hi, we are writing our flattened JSON dataframe to a user-defined nested-schema JSON using PySpark in Databricks, but we are not getting the expected format. Expecting: {"ID":"aaa","c_id":[{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"},...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

As @werners said, you need to share the code. If it is dataframe to JSON, you probably need to use a StructType with an Array to get that list, but without the code it is hard to help.

3 More Replies
JD2
by Contributor
  • 7203 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks Delta Table

Hello, I am new to Databricks and need a little help with Delta table creation. I am having great difficulty understanding the creation of a Delta table: Do I need to create an S3 bucket for the Delta table? If YES, then do I have to mount it on the mountpoint...

Latest Reply
mathan_pillai
Databricks Employee
  • 4 kudos

Hi Jay, I would suggest starting with creating a managed Delta table. Please run a simple command: CREATE TABLE events(id LONG) USING DELTA. This will create a managed Delta table called "events". Then perform: %sql DESCRIBE EXTENDED events. The above command ...

3 More Replies
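The commands in the reply above, written out as a sketch (requires a Databricks/Spark session with Delta support, so it runs on a cluster, not locally):

```python
# Create a managed Delta table named "events" in the current schema.
spark.sql("CREATE TABLE IF NOT EXISTS events (id LONG) USING DELTA")

# Inspect it: the Location row of the output shows where the managed data
# lives, so no S3 bucket or mount needs to be created up front.
spark.sql("DESCRIBE EXTENDED events").show(truncate=False)
```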
Siddhesh2525
by New Contributor III
  • 11264 Views
  • 4 replies
  • 4 kudos

How to set retry attempts and an email alert with the error message of a Databricks notebook

How do I set retry attempts in a Databricks notebook, in the sense that if any cmd/cell fails (for example, due to a connection issue), that particular cmd/cell should be rerun?

Latest Reply
Siddhesh2525
New Contributor III
  • 4 kudos

"You can just implement try/except in a cell, handling it by using dbutils.notebook.exit(jobId); using other dbutils can help." @HubertDudek, as I am a fresher in Databricks, could you please suggest/explain this to me in detail?

3 More Replies
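Databricks jobs already have a task-level "Retries" setting and email notifications; for the per-cell try/except retry the reply describes, a generic Python sketch (the helper name and arguments are illustrative, not a dbutils API):

```python
import time

def run_with_retries(func, attempts=3, delay_seconds=1.0, backoff=2.0):
    """Call func(); on failure, wait and retry, growing the delay each time.

    Illustrative helper, not a Databricks API. In a notebook you would wrap
    the flaky cell's logic in a function and pass it here, and after the last
    failed attempt the re-raised error fails the job (triggering any
    configured email alert) or can be reported via dbutils.notebook.exit(...).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)
                delay_seconds *= backoff
    raise last_error
```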