Data Engineering

Forum Posts

by maranBH • New Contributor III

10-19-2021 1:41:18 PM

19420 Views
5 replies
12 kudos

Resolved! How to import a function to another notebook using Repos without %run?

Hi all, I was reading the Repos documentation at https://docs.databricks.com/repos.html#migrate-from-run-commands, which explains that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...

Latest Reply

maranBH
New Contributor III

10-22-2021 7:25:54 AM

12 kudos

Thank you all for your help! I tried everything that was suggested, but I finally realized it was my fault in the first place: I was testing Files in Repos with a runtime < 8.4, and I was trying to import a file from a DB Notebook instead of a static .py file. Upgradi...
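
A minimal sketch of the pattern this thread converges on. It assumes, as the resolved reply says, a runtime with Files in Repos support (8.4 or above) and a plain .py file rather than a notebook. The temp directory stands in for the repo root, and `my_utils` / `add_one` are hypothetical names. On Databricks the repo directory is expected to be importable already; here it is put on `sys.path` by hand so the sketch runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a repo checkout containing a plain .py file (NOT a notebook).
# The module and function names are hypothetical.
repo_root = Path(tempfile.mkdtemp())
(repo_root / "my_utils.py").write_text(
    "def add_one(x):\n"
    "    return x + 1\n"
)

# Outside Databricks, make the directory importable manually.
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))
importlib.invalidate_caches()  # pick up the file created after startup

# A regular Python import takes the place of `%run ./my_utils`.
from my_utils import add_one

print(add_one(41))  # 42
```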

by xiaozy • New Contributor

10-20-2021 3:47:53 PM

723 Views
2 replies
1 kudos

How to partition a frame by a computed column in a window function in Spark SQL?

Latest Reply

Prabakar
Esteemed Contributor III

10-21-2021 1:04:47 PM

1 kudos

Hi @xiaojun wang, please check this blog post and let us know if it helps: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
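
Following up on the blog link, a sketch showing that a window's PARTITION BY can take an expression rather than a stored column. Python's built-in sqlite3 is used as a local stand-in engine (this assumes SQLite 3.25 or later, which added window functions), and the `sales` table is hypothetical. `substr` also exists in Spark SQL, so the same query shape should carry over, but verify it on your runtime.

```python
import sqlite3

# In-memory SQLite table standing in for a Spark table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2021-10-01", 10), ("2021-10-15", 20), ("2021-11-02", 5)],
)

# The window is partitioned by a computed value (the 'YYYY-MM' prefix of the
# date string), not by a stored column.
direct = conn.execute(
    """
    SELECT sale_date, amount,
           SUM(amount) OVER (PARTITION BY substr(sale_date, 1, 7)) AS month_total
    FROM sales
    ORDER BY sale_date
    """
).fetchall()
print(direct)
```

Each row carries the total for its month: the two October rows get 30, the November row gets 5.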

by dbu_spark • New Contributor III

10-20-2021 8:47:54 AM

3739 Views
10 replies
6 kudos

Older Spark version loaded into the Spark notebook

I have the Databricks runtime for a job set to the latest 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12). In the notebook, when I check the Spark version, I see version 3.1.0 instead of 3.2.0. I need Spark version 3.2 to process workloads a...

Latest Reply

jose_gonzalez
Moderator

10-21-2021 9:47:45 AM

6 kudos

Hi @Dhaivat Upadhyay, good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.

by Daniel • New Contributor III

10-21-2021 8:31:30 AM

1036 Views
2 replies
6 kudos

Where can I enable notifications for comments made on my notebook?

I'm not being notified when a comment is made on a notebook. Is it possible to enable this?

Latest Reply

Daniel
New Contributor III

10-21-2021 9:31:15 AM

6 kudos

Hi Prabakar, thank you for your help.

by D3nnisd • New Contributor III

10-20-2021 7:39:35 AM

8525 Views
15 replies
6 kudos

Resolved! BufferHolder Exceeded on JSON flattening

On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API:

```
df = spark.read.format("json").option("header", "true").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json")
df2 = df.select(psf....
```

Latest Reply

Dan_Z
Honored Contributor

10-20-2021 8:57:02 AM

6 kudos

@Dennis D, what's happening here is that more than 2 GB (2147483648 bytes) is being loaded into a single column value. This is a hard limit for serialization, and this KB article addresses it. The solution would be to find some way to have this loaded ...
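
One possible way to keep individual values under that limit, assuming the API response is a single JSON object wrapping a large array of records (the reply's exact recommendation is cut off): rewrite the file as JSON Lines so each record becomes its own row. `records` is a hypothetical key name. Spark reads JSON Lines by default, so the output could then be loaded with `spark.read.json(path)` without the `multiline` option.

```python
import json
import tempfile
from pathlib import Path


def json_wrapper_to_jsonl(src: Path, dst: Path, records_key: str) -> int:
    """Rewrite {"<records_key>": [rec, rec, ...]} as one JSON object per line.

    Returns the number of records written. json.load reads the whole file into
    memory, so very large inputs would need a streaming parser instead.
    """
    with src.open("r", encoding="utf-8") as f:
        payload = json.load(f)
    records = payload[records_key]
    with dst.open("w", encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(rec) + "\n")
    return len(records)


# Tiny demo with a hypothetical API response shape.
tmp = Path(tempfile.mkdtemp())
src = tmp / "response.json"
src.write_text(json.dumps({"records": [{"id": 1}, {"id": 2}, {"id": 3}]}), encoding="utf-8")
dst = tmp / "response.jsonl"
n = json_wrapper_to_jsonl(src, dst, "records")
print(n)  # 3
```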

by Erik • Valued Contributor II

10-20-2021 6:24:41 AM

881 Views
4 replies
3 kudos

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show u...

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show up (if they are provided) in Power BI when one connects to the Databricks SQL endpoint, e.g. in this w...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-20-2021 6:47:25 AM

3 kudos

Nice idea!

by tarente • New Contributor III

10-08-2021 10:04:23 AM

1537 Views
6 replies
5 kudos

Resolved! How to implement the WHERE NOT EXISTS pattern in Scala?

I have a dataframe with the following columns: Key1, Key2, Y_N_Col, Col1, Col2. For the key tuple (Key1, Key2), I have rows with Y_N_Col = "Y" and rows with Y_N_Col = "N". I need a new dataframe with all rows with Y_N_Col = "Y" (regardless of the key tuple), plus all Y_N_...

Latest Reply

-werners-
Esteemed Contributor III

10-11-2021 3:21:51 AM

5 kudos

I'd use a left anti join. Create a df with all the Y rows, then create a df with all the N rows and do a left_anti join of the N df against the Y df (on Key1 and Key2). Then take the union of those two.
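
The left anti join recipe above can be illustrated in plain Python to make the semantics concrete. In PySpark the equivalent would be roughly `df_n.join(df_y, on=["Key1", "Key2"], how="left_anti")` followed by `df_y.unionByName(...)`, and in Scala `dfN.join(dfY, Seq("Key1", "Key2"), "left_anti")` followed by `union`. The sample rows are made up, and the sketch assumes the truncated requirement means "N rows whose key has no Y row".

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    Key1: str
    Key2: str
    Y_N_Col: str
    Col1: int
    Col2: int


def y_plus_unmatched_n(rows: List[Row]) -> List[Row]:
    y_rows = [r for r in rows if r.Y_N_Col == "Y"]
    n_rows = [r for r in rows if r.Y_N_Col == "N"]
    # Keys that have at least one "Y" row (the right side of the anti join).
    y_keys = {(r.Key1, r.Key2) for r in y_rows}
    # Left anti join: keep only the N rows whose (Key1, Key2) is absent on the Y side.
    n_without_y = [r for r in n_rows if (r.Key1, r.Key2) not in y_keys]
    # Union of the two sets (no de-duplication, like DataFrame.union).
    return y_rows + n_without_y


rows = [
    Row("a", "1", "Y", 10, 0),
    Row("a", "1", "N", 11, 0),  # dropped: key (a, 1) also has a Y row
    Row("b", "2", "N", 20, 0),  # kept: key (b, 2) has no Y row
    Row("c", "3", "Y", 30, 0),
]
result = y_plus_unmatched_n(rows)
print(result)
```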

by Programming_Sch • New Contributor

10-21-2021 12:56:09 AM

233 Views
0 replies
0 kudos

What is the future of AWS? The future of AWS is very promising. So, if you are thinking of a cloud career or want to switch your position to something related to the cloud, I would highly recommend going for AWS training. No matter what field you ...

by xiaozy • New Contributor

10-20-2021 3:48:39 PM

1461 Views
1 replies
0 kudos

How to reference a computed column directly in Spark SQL?

Latest Reply

Kaniz
Community Manager

10-20-2021 9:04:25 PM

0 kudos

Hi @xiaozy! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will get back to you soon. Thanks.
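
A sketch of the usual workaround for referencing a computed column: define it once in a CTE (or subquery) and refer to it by name in the outer query. In standard SQL, and in Spark SQL as far as I know, a SELECT alias generally cannot be used in the WHERE clause of the same query level. Python's built-in sqlite3 is used as a stand-in engine so the sketch runs locally; SQLite is more permissive about aliases, so this only demonstrates the portable form. The `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2.5, 4), (2, 3.0, 2), (3, 1.0, 3)],
)

# Compute the derived column once in a CTE, then reference it by name
# in WHERE and ORDER BY of the outer query.
filtered = conn.execute(
    """
    WITH priced AS (
        SELECT id, price * qty AS total FROM orders
    )
    SELECT id, total FROM priced
    WHERE total >= 5
    ORDER BY total DESC
    """
).fetchall()
print(filtered)  # [(1, 10.0), (2, 6.0)]
```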

by User16826992666 • Valued Contributor

06-15-2021 2:17:51 PM

621 Views
1 replies
0 kudos

If data from a Delta table is cached in Databricks SQL and the table is altered in the backend, does it invalidate the cache?

Basically, I'm worried about the scenario where data cached on Databricks SQL endpoints becomes out of sync with the source Delta table. If that were to happen and data were read from the cache, it would be out of date/incorrect. Is this a con...

Latest Reply

mathan_pillai
Valued Contributor

10-20-2021 2:31:01 PM

0 kudos

There are 3 types of caching: (1) Databricks SQL UI caching, (2) query results caching, and (3) Delta caching. (1) does not get invalidated; it's like your BI dashboard, which needs to be refreshed manually. (2) and (3) are invalidated automatically. Please check...

by nlee • New Contributor

10-04-2021 7:57:05 AM

1853 Views
1 replies
1 kudos

Resolved! How to create a temporary file with SQL

What are the commands to create a temporary file with SQL?

Latest Reply

mathan_pillai
Valued Contributor

10-20-2021 10:20:07 AM

1 kudos

In Spark SQL, you could use a command like "insert overwrite directory", which indirectly creates a temporary file with the data: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-dml-insert-overwrite-directory.html#example...

by Sumeet_Dora • New Contributor II

09-17-2021 4:56:14 AM

974 Views
2 replies
4 kudos

Resolved! Write mode features in BigQuery using Databricks notebook.

Currently, using df.write.format("bigquery"), Databricks only supports append and overwrite modes when writing to BigQuery tables. Does Databricks have any option for executing DML statements like MERGE INTO on BigQuery from Databricks notebooks?

Latest Reply

mathan_pillai
Valued Contributor

10-20-2021 9:40:08 AM

4 kudos

@Sumeet Dora, unfortunately there is no direct "merge into" option for writing to BigQuery from a Databricks notebook. You could write to an intermediate Delta table using the "merge into" option of Delta tables, then read from the Delta table and pe...

by gbrueckl • Contributor II

09-10-2021 2:36:11 AM

3583 Views
10 replies
9 kudos

Slow performance of VACUUM on Azure Data Lake Store Gen2

We need to run VACUUM on one of our biggest tables to free up storage. According to our analysis using VACUUM bigtable DRY RUN, this affects 30M+ files that need to be deleted. If we run the final VACUUM, the file listing takes up to 2h (which is OK) ...

Latest Reply

Deepak_Bhutada
Contributor III

10-20-2021 5:28:46 AM

9 kudos

@Gerhard Brueckl, we have seen roughly 80k-120k file deletions per hour in Azure while running VACUUM on Delta tables; VACUUM is simply slower on Azure and S3. As you said, deleting the files from the Delta path might take some time. ...
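
A rough back-of-the-envelope estimate using only the numbers in this thread: the 30M+ files reported by DRY RUN and the 80k-120k deletions per hour mentioned in the reply. It assumes a constant deletion rate and ignores the ~2h listing phase, so treat it as an order-of-magnitude figure rather than a prediction.

```python
def vacuum_delete_hours(num_files: int, deletions_per_hour: float) -> float:
    """Hours needed to delete num_files at a constant deletion rate."""
    return num_files / deletions_per_hour


files = 30_000_000  # "30M+" files reported by VACUUM ... DRY RUN
fast = vacuum_delete_hours(files, 120_000)  # upper end of the reported rate
slow = vacuum_delete_hours(files, 80_000)   # lower end of the reported rate
print(f"{fast:.0f}-{slow:.0f} hours (~{fast / 24:.1f}-{slow / 24:.1f} days)")
# 250-375 hours (~10.4-15.6 days)
```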

by Erik • Valued Contributor II

09-20-2021 5:46:56 AM

2545 Views
8 replies
2 kudos

Run more than nr-of-cores concurrent tasks.

We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no mount REST API!). Each mount takes 20 seconds to check, 99.9% of that time is idle waiting, and it starts one job per mount. If w...

Latest Reply

jose_gonzalez
Moderator

10-18-2021 3:00:33 PM

2 kudos

Hi @Erik Parmann, it is possible to do, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details here. As a best practice, we do not recom...

by Jon • New Contributor II

10-20-2021 1:20:41 AM

10009 Views
3 replies
5 kudos

How can I use custom python library in Azure Databricks?

I am trying to access functions in my coreapi.py by importing it in the main notebook, but I get the error ModuleNotFoundError: No module named 'coreapi'. I tried uploading the file into the same folder, and I tried creating a Python egg and uploading it...

Latest Reply

-werners-
Esteemed Contributor III

10-20-2021 3:01:16 AM

5 kudos

There is also the possibility of using the Repos file functionality: https://databricks.com/blog/2021/10/07/databricks-repos-is-now-generally-available.html
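
For the ModuleNotFoundError in the question, a sketch of loading a single .py file from an explicit path with `importlib.util`, which should work wherever the driver's Python process can read the file (for example a repo checkout). The temp directory and the `ping` function are hypothetical stand-ins for the real `coreapi.py`; registering the module in `sys.modules` lets later `import coreapi` statements resolve to it.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name: str, path: Path):
    """Load a single .py file as a module, without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # later `import coreapi` returns this module
    spec.loader.exec_module(module)
    return module


# Demo: a stand-in for coreapi.py; the location and function are hypothetical.
lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "coreapi.py").write_text("def ping():\n    return 'pong'\n")

coreapi = load_module_from_path("coreapi", lib_dir / "coreapi.py")
print(coreapi.ping())  # pong
```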
