Data Engineering

Forum Posts

by ssm3819 • Databricks Partner

10-24-2021 9:52:33 PM

12290 Views
2 replies
3 kudos

Please let me know how I can install PyAudio using the Databricks notebook

Hi, I am trying to install the PyAudio package, but I am getting the following error:
Collecting pyaudio
Using cached PyAudio-0.2.11.tar.gz (37 kB)
Building wheels for collected packages: pyaudio
Building wheel for pyaudio (setup.py) ... error
ERROR: Co...

Latest Reply

-werners-
Esteemed Contributor III

10-25-2021 1:08:49 AM

3 kudos

Looks like a missing dependency on the server (Linux): portaudio. This should be installed:
https://stackoverflow.com/questions/48690984/portaudio-h-no-such-file-or-directory

1 More Replies

by NAS • New Contributor III

10-23-2021 9:31:49 AM

3774 Views
1 reply
0 kudos

Set tags for an MLflow Experiment using Python?

There is this REST API: https://www.mlflow.org/docs/latest/rest-api.html#set-experiment-tag
Can I do the same from MLflow's Python API?

Latest Reply

NAS
New Contributor III

10-24-2021 1:28:19 PM

0 kudos

Someone answered first on Stack Overflow. Here it is:
from mlflow.tracking import MlflowClient
# Create an experiment with a name that is unique and case sensitive.
client = MlflowClient()
experiment_id = client.create_experiment("Social NLP Experime...
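The quoted reply is cut off, so the following is only a minimal sketch of one way to tag an experiment from Python. It assumes the client object exposes `set_experiment_tag(experiment_id, key, value)`, as `mlflow.tracking.MlflowClient` does. The helper name `set_experiment_tags`, the `RecordingClient` stand-in, and the tag names are hypothetical and exist only so the sketch runs without a tracking server.

```python
from typing import Dict, List, Tuple


def set_experiment_tags(client, experiment_id: str, tags: Dict[str, str]) -> None:
    """Apply each key/value pair in `tags` to one experiment.

    Assumption: `client` exposes set_experiment_tag(experiment_id, key, value),
    the method provided by mlflow.tracking.MlflowClient.
    """
    for key, value in tags.items():
        client.set_experiment_tag(experiment_id, key, value)


class RecordingClient:
    """Hypothetical stand-in for MlflowClient that only records the calls made."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def set_experiment_tag(self, experiment_id: str, key: str, value: str) -> None:
        self.calls.append((experiment_id, key, value))


# Illustrative experiment id and tags; none of these values come from the thread.
stub = RecordingClient()
set_experiment_tags(stub, "1", {"team": "nlp", "stage": "dev"})
print(stub.calls)
```

With MLflow installed, passing `MlflowClient()` and the `experiment_id` returned by `client.create_experiment(...)` in place of the stub should set the same tags on a real experiment.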

by MadelynM • Databricks Employee

06-09-2021 12:32:09 PM

4254 Views
2 replies
4 kudos

Resolved! Why isn't my notebook search function working?

My search function is broken. I can't search for notebook contents.

Latest Reply

lizou
Contributor III

10-24-2021 10:25:14 AM

4 kudos

Here is a tool available:
elsevierlabs-os/NotebookDiscovery: Notebook Discovery Tool for Databricks notebooks (github.com)
How to Catalog and Discover Your Databricks Notebooks Faster - The Databricks Blog

1 More Replies

by Jreco • Contributor

10-21-2021 2:50:15 PM

19320 Views
13 replies
3 kudos

Event hub streaming improve processing rate

Hi all, I'm working with Event Hubs and Databricks to process and enrich data in real time. Doing a "simple" test, I'm getting some weird values (input rate vs processing rate) and I think I'm losing data. As you can see, there is a peak with 5k record...

Latest Reply

jose_gonzalez
Databricks Employee

10-22-2021 3:24:24 PM

3 kudos

Hi @Jhonatan Reyes, how many Event Hubs partitions are you reading from? Your micro-batch takes a few milliseconds to complete, which I think is a good time, but I would like to better understand what you are trying to improve here. Also, in this case ...

12 More Replies

by BigJay • New Contributor II

10-14-2021 6:36:12 PM

8362 Views
5 replies
5 kudos

Capture num_affected_rows in notebooks

If I run some code, say for an ETL process to migrate data from bronze to silver storage, when a cell executes it reports num_affected_rows in a table format. I want to capture that and log it in my logger. Is it stored in a variable or syslogged som...

Latest Reply

-werners-
Esteemed Contributor III

10-19-2021 1:25:56 AM

5 kudos

Good one, Dan! I never thought of using the Delta API for this, but there you go.

4 More Replies

by xiaozy • New Contributor

10-20-2021 3:47:53 PM

2093 Views
1 reply
1 kudos

How to partition frame with a computed column in window function in Spark SQL?

Latest Reply

Prabakar
Databricks Employee

10-21-2021 1:04:47 PM

1 kudos

Hi @xiaojun wang, please check the blog and let us know if this helps you:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

by Frankooo • Databricks Partner

10-14-2021 11:54:53 AM

9907 Views
8 replies
7 kudos

How to optimize exporting dataframe to delta file?

Scenario: I have a dataframe that has 5 billion records/rows and 100+ columns. Is there a way to write this in Delta format efficiently? I tried to export it but cancelled it after 2 hours (the write didn't finish), as this processing time is not ...

Latest Reply

jose_gonzalez
Databricks Employee

10-21-2021 10:18:47 AM

7 kudos

Hi @Franco Sia, I recommend avoiding repartition(50); instead, enable optimized writes on your Delta table. You can find more details here. Enable optimized writes and auto compaction on your Delta table. Use AQE (docs here) to have eno...

7 More Replies

by dbu_spark • New Contributor III

10-20-2021 8:47:54 AM

11531 Views
10 replies
6 kudos

Older Spark Version loaded into the spark notebook

I have the Databricks Runtime for a job set to the latest 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12). In the notebook, when I check the Spark version, I see version 3.1.0 instead of version 3.2.0. I need Spark version 3.2 to process workloads a...

Latest Reply

jose_gonzalez
Databricks Employee

10-21-2021 9:47:45 AM

6 kudos

Hi @Dhaivat Upadhyay, good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.

9 More Replies

by Daniel • New Contributor III

10-21-2021 8:31:30 AM

2763 Views
2 replies
6 kudos

Where can I enable notification for comments made on my notebook?

I'm not being notified when a comment is made on a notebook. Is it possible to enable this?

Latest Reply

Daniel
New Contributor III

10-21-2021 9:31:15 AM

6 kudos

Hi Prabakar, thank you for your help.

1 More Replies

by D3nnisd • New Contributor III

10-20-2021 7:39:35 AM

26382 Views
15 replies
6 kudos

Resolved! BufferHolder Exceeded on Json flattening

On Databricks, we use the following code to flatten JSON in Python. The data is from a REST API:
```
df = spark.read.format("json").option("header", "true").option("multiline", "true").load(SourceFileFolder + sourcetable + "*.json")
df2 = df.select(psf....
```

Latest Reply

Dan_Z
Databricks Employee

10-20-2021 8:57:02 AM

6 kudos

@Dennis D, what's happening here is that more than 2 GB (2147483648 bytes) is being loaded into a single column value. This is a hard limit for serialization. This KB article addresses it. The solution would be to find some way to have this loaded ...

14 More Replies

by Erik • Valued Contributor III

10-20-2021 6:24:41 AM

2846 Views
4 replies
3 kudos

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show u...

Feature request: It is possible to add comments to both Databricks SQL databases and tables. It would be really useful if these comments could show up (if they are provided) in Power BI when one connects to the Databricks SQL endpoint, e.g. in this w...

Latest Reply

Hubert-Dudek
Databricks MVP

10-20-2021 6:47:25 AM

3 kudos

Nice idea!

3 More Replies

by tarente • New Contributor III

10-08-2021 10:04:23 AM

5590 Views
6 replies
5 kudos

Resolved! How to implement the where not exists pattern in Scala?

I have a dataframe with the following columns: Key1, Key2, Y_N_Col, Col1, Col2. For the key tuple (Key1, Key2), I have rows with Y_N_Col = "Y" and Y_N_Col = "N". I need a new dataframe with all rows with Y_N_Col = "Y" (regardless of the key tuple), plus all Y_N_...

Latest Reply

-werners-
Esteemed Contributor III

10-11-2021 3:21:51 AM

5 kudos

I'd use a left-anti join. So create a df with all the Y rows, then create a df with all the N rows and do a left_anti join (on key1 and key2) against the df with the Y rows. Then union the Y df with the result of that join.
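The logic of that reply can be sketched in plain Python rather than Spark, so it runs without a cluster. The question is truncated, so the rule for the "N" rows is inferred from the reply and may differ from what the poster needed: keep an "N" row only when no "Y" row shares its (Key1, Key2). The function name `where_not_exists` and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def where_not_exists(rows: List[Row]) -> List[Row]:
    """Keep every "Y" row, plus each "N" row whose (Key1, Key2) has no "Y" row."""
    y_rows = [r for r in rows if r["Y_N_Col"] == "Y"]
    n_rows = [r for r in rows if r["Y_N_Col"] == "N"]

    # Key tuples that have at least one "Y" row.
    y_keys = {(r["Key1"], r["Key2"]) for r in y_rows}

    # Left-anti join: "N" rows with no matching key on the "Y" side.
    n_without_y = [r for r in n_rows if (r["Key1"], r["Key2"]) not in y_keys]

    # Union of the two partial results.
    return y_rows + n_without_y


# Illustrative data only; the thread does not show any sample rows.
rows = [
    {"Key1": "a", "Key2": "1", "Y_N_Col": "Y", "Col1": "x"},
    {"Key1": "a", "Key2": "1", "Y_N_Col": "N", "Col1": "y"},
    {"Key1": "b", "Key2": "2", "Y_N_Col": "N", "Col1": "z"},
]
result = where_not_exists(rows)
print(result)
```

On Spark dataframes the same shape would presumably be a `join` of the N dataframe against the Y dataframe on the two key columns with join type `"left_anti"`, followed by a `union` with the Y dataframe.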

5 More Replies

by Programming_Sch • New Contributor

10-21-2021 12:56:09 AM

1018 Views
0 replies
0 kudos

aws logo

What is the future of AWS? The future of AWS is very promising. So, if you are thinking of a cloud career or want to switch your position to something related to the cloud, I would highly recommend going for AWS training. No matter what field you ...

by xiaozy • New Contributor

10-20-2021 3:48:39 PM

2817 Views
0 replies
0 kudos

How to reference a computed column directly in Spark SQL?

by User16826992666 • Databricks Employee

06-15-2021 2:17:51 PM

2405 Views
1 reply
0 kudos

If data from a Delta table is cached in Databricks SQL and the table is altered in the backend, does it invalidate the cache?

Basically I'm worried about the scenario where data that gets cached on Databricks SQL endpoints becomes out of sync with the source Delta table. If that were to happen and data was read from the cache it would be out of date/incorrect. Is this a con...

Latest Reply

mathan_pillai
Databricks Employee

10-20-2021 2:31:01 PM

0 kudos

There are 3 types of caching: (1) Databricks SQL UI caching, (2) query results caching, and (3) Delta caching. (1) does not get invalidated. It's like your BI dashboard: a BI dashboard needs to be manually refreshed. (2) and (3) are invalidated automatically. Please check...

Databricks Community
