Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sika
by New Contributor II
  • 12326 Views
  • 2 replies
  • 0 kudos

ignoreDeletes in DLT pipeline

Hi all, I have a DLT pipeline as so: raw -> cleansed (SCD2) -> curated. 'Raw' is utilizing Auto Loader to continuously read files from a data lake. These files can contain tons of duplicates, which causes our raw table to become quite large. Therefore, we ...

Latest Reply
sika
New Contributor II
  • 0 kudos

OK, I'll try and add additional details. Firstly, the diagram below shows our current dataflow. Our raw table is defined as such: TABLES = ['table1','table2']   def generate_tables(table_name): @dlt.table( name=f'raw_{table_name}', table_pro...

1 More Replies
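The looped table-factory pattern in the reply's excerpt can be sketched in plain Python. This is a minimal stand-in, not the real Delta Live Tables API: `fake_dlt_table` and `REGISTRY` replace `@dlt.table` and the DLT runtime so only the looping/naming logic is shown.

```python
# Stand-in registry for tables the "pipeline" defines.
REGISTRY = {}

def fake_dlt_table(name):
    """Stand-in for @dlt.table(name=...): records the builder function."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator

TABLES = ['table1', 'table2']

def generate_tables(table_name):
    @fake_dlt_table(name=f'raw_{table_name}')
    def raw():
        # In the real pipeline this body would return a spark.readStream
        # with Auto Loader (cloudFiles) reading from the data lake.
        return f"stream for {table_name}"
    return raw

# One call per table name yields one registered raw_<name> table each.
for t in TABLES:
    generate_tables(t)
```

Because each `generate_tables` call has its own scope, every generated table closes over its own `table_name`, which is why this factory pattern is the usual way to define many similar DLT tables in a loop.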
JordanYaker
by Contributor
  • 6624 Views
  • 8 replies
  • 1 kudos

Why is Delta Lake creating a 238.0TiB shuffle on merge?

I'm frankly at a loss here. I have a task that is consistently performing just awfully. I took some time this morning to try and debug it, and the physical plan is showing a 238TiB shuffle: == Physical Plan == AdaptiveSparkPlan (40) +- == Current Plan...

Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Jordan Yaker, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

7 More Replies
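Oversized MERGE shuffles like the 238 TiB one above often come from the source/target join carrying no pruning predicate, so every target file becomes a rewrite candidate. A common mitigation (a general technique, not a solution taken from this thread) is to add the partition column to the ON clause so Delta can prune partitions before shuffling. The effect can be simulated in plain Python; the file and partition names are made up:

```python
# Simulate a partitioned Delta target: 3 daily partitions x 4 files each.
target_files = [
    {"partition": day, "file": f"part-{day}-{i}.parquet"}
    for day in ("2023-05-01", "2023-05-02", "2023-05-03")
    for i in range(4)
]

# The incoming batch only touches one day.
source_partitions = {"2023-05-03"}

# MERGE ... ON t.id = s.id
# -> no pruning possible, every target file joins against the source:
unpruned = target_files

# MERGE ... ON t.id = s.id AND t.partition = s.partition (or a literal
# t.partition IN (...) predicate) -> only matching partitions participate:
pruned = [f for f in target_files if f["partition"] in source_partitions]

print(len(unpruned), "files without pruning;", len(pruned), "with pruning")
```

The shuffle size scales with the candidate file set, which is why a partition predicate in the merge condition can shrink plans like the one in the post dramatically.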
Abel_Martinez
by Contributor
  • 26204 Views
  • 10 replies
  • 39 kudos

Why do Python logs show the [REDACTED] literal in place of spaces when I use dbutils.secrets.get in my code?

When I use dbutils.secrets.get in my code, spaces in the log are replaced by the "[REDACTED]" literal. This is very annoying and makes the log difficult to read. Any idea how to avoid this? See my screenshot...

Latest Reply
jlb0001
New Contributor III
  • 39 kudos

I ran into the same issue and found that the reason was that the notebook included some test keys with values of "A" and "B" for simple testing. I noticed that any string with a substring of "A" or "B" was "[REDACTED]". So, in my case, it was an eas...

9 More Replies
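The behaviour described in that reply can be reproduced in plain Python: Databricks masks every occurrence of any fetched secret value in notebook output, so a one-character "test" secret like "A" redacts every "A" in the logs. Below is a hypothetical re-implementation of that masking logic for illustration; it is not the actual Databricks code.

```python
def redact(line, secrets):
    """Replace every occurrence of any known secret value with [REDACTED],
    mimicking how output is masked for values fetched via dbutils.secrets.get."""
    for secret in secrets:
        line = line.replace(secret, "[REDACTED]")
    return line

# A one-character secret redacts far more than intended:
print(redact("Loading table ACCOUNTS", ["A"]))

# A realistic secret value only hides itself:
print(redact("token=s3cr3t-value rows=10", ["s3cr3t-value"]))
```

This is why the fix in the reply works: once the notebook stops fetching trivially short secret values, the masker no longer matches ordinary log text.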
ShellyXiao
by New Contributor II
  • 12864 Views
  • 1 replies
  • 0 kudos

Azure Databricks cluster driver config

Hi there, I am trying to set up Databricks storage account access in a global init script, according to the Azure Databricks document on creating a cluster with driver config for all clusters (https://learn.microsoft.com/en-us/azure/databricks/archive/compute...

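For reference, driver-side access to an Azure storage account over ABFS is usually expressed as Spark configuration keys like the following (a sketch: the storage account, tenant ID, application ID, and secret scope/key names are all placeholders, and the client secret is pulled via a secret-scope reference rather than hard-coded):

```
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```

Whether these go into an init script or cluster Spark config, the key names stay the same; only the delivery mechanism differs.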
Hubert-Dudek
by Esteemed Contributor III
  • 2811 Views
  • 1 replies
  • 7 kudos

Databricks recently added an SQL alerts feature that enables users to create notifications based on various conditions and trigger them within their job ...

Databricks recently added an SQL alerts feature that enables users to create notifications based on various conditions and trigger them within their job workflows. SQL alerts inform users about potential issues and easily ensure critical data availabilit...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Thanks for sharing this, @Hubert Dudek.

sintsan
by New Contributor II
  • 2130 Views
  • 1 replies
  • 1 kudos

Resolved! spark.sparkContext.setCheckpointDir - External Azure Storage

Is it possible to direct spark.sparkContext.setCheckpointDir to an external Azure Storage container location (instead of DBFS), and if so, how? There's very little documentation on that.

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Yes, the directory must be an HDFS-compatible path if running on a cluster. All you need to do is provide the correct path.

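Concretely, that means passing an abfss:// URI rather than a local or DBFS path. A hypothetical sketch (the container and storage account names are placeholders; the cluster must already have credentials for the account configured):

```python
# Build the abfss:// URI for an Azure Storage container location.
container = "checkpoints"
account = "mystorageacct"
checkpoint_dir = f"abfss://{container}@{account}.dfs.core.windows.net/spark-checkpoints"

# On a Databricks cluster you would then call:
#   spark.sparkContext.setCheckpointDir(checkpoint_dir)
# and subsequent rdd.checkpoint() / df.checkpoint() calls write there.
print(checkpoint_dir)
```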
Praveen
by New Contributor II
  • 11541 Views
  • 8 replies
  • 1 kudos

Resolved! Pass Typesafe config file to the Spark Submit Job

Hello everyone! I am trying to pass a Typesafe config file to the spark-submit task and print the details in the config file. Code: import org.slf4j.{Logger, LoggerFactory} import com.typesafe.config.{Config, ConfigFactory} import org.apache.spa...

Latest Reply
source2sea
Contributor
  • 1 kudos

I've experienced similar issues; please help answer how to get this working. I've tried using the below as either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path, in either spark_submit_task or spark_jar_task (via cluster spark_conf for Java options); no su...

7 More Replies
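One commonly suggested pattern for this (a sketch; the paths, class name, and jar name are hypothetical) is to put the .conf file on DBFS and point the driver JVM at it through the /dbfs FUSE mount, since Typesafe's -Dconfig.file expects a local filesystem path, not a dbfs:/ URI:

```json
{
  "spark_submit_task": {
    "parameters": [
      "--class", "com.example.Main",
      "--driver-java-options", "-Dconfig.file=/dbfs/mnt/configs/app.conf",
      "dbfs:/mnt/jars/app-assembly.jar"
    ]
  }
}
```

Note the asymmetry: the application jar is addressed with the dbfs:/ scheme, while the config file uses the /dbfs mount because it is opened by plain JVM file I/O rather than by Spark.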
Anonymous
by Not applicable
  • 1190 Views
  • 0 replies
  • 1 kudos

Hi Everyone, As we continue to see the rise of AI, particularly with technologies such as ChatGPT, it's important to consider how this will impact...

Hi Everyone, As we continue to see the rise of AI, particularly with technologies such as ChatGPT, it's important to consider how this will impact the future of our workplaces and long-term career goals. With that in mind, I would love to hear your th...

Vinay123
by New Contributor III
  • 8919 Views
  • 2 replies
  • 1 kudos

Unity Catalog replication or disaster recovery implementation

I am working on a disaster recovery implementation for Databricks on AWS. I am not able to find how to implement it with Unity Catalog. I am planning to create two workspaces in two different regions: one would be the primary workspace, which will be active, and ...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Suram Vinay From my end I have not implemented this, but I checked this blog previously; the Terraform script will help for the DR setup. https://www.databricks.com/blog/2022/07/18/disaster-recovery-automation-and-tooling-for-a-databricks-workspace.htmlc...

1 More Replies
pranathisg97
by New Contributor III
  • 4469 Views
  • 2 replies
  • 1 kudos

readStream query throws an exception if there's no data in the Delta location.

Hi, I have a scenario where a writeStream query writes the stream data to the bronze location, and I have to read from bronze, do some processing, and finally write it to silver. I use an S3 location for the Delta tables. But for the very first execution, the readStream ...

Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Pranathi Girish, hope all is well! Checking in: if @Suteja Kanuri's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? We'd love to hear from you. Thanks!

1 More Replies
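A common workaround for that first-run failure is to materialize an empty Delta table with the expected schema before the stream ever starts, so readStream always finds a transaction log. The sketch below is hypothetical: on Databricks the probe would look for the table's _delta_log directory on S3, and `create_empty_table` would be a DataFrame write with the real bronze schema; here a local directory stands in for both.

```python
import os
import tempfile

def ensure_delta_source(path, create_empty_table):
    """If `path` has no _delta_log yet, create the table first so a later
    spark.readStream.format("delta").load(path) finds a schema.
    Returns True when it had to create the table."""
    if os.path.isdir(os.path.join(path, "_delta_log")):
        return False
    create_empty_table(path)
    return True

# Stand-in for writing an empty Delta table with the bronze schema:
def fake_create_empty(path):
    os.makedirs(os.path.join(path, "_delta_log"), exist_ok=True)

root = tempfile.mkdtemp()
first_run = ensure_delta_source(root, fake_create_empty)   # creates the table
second_run = ensure_delta_source(root, fake_create_empty)  # table now exists
print(first_run, second_run)
```

Running the guard in the silver job's setup step makes the very first execution behave like every later one, instead of failing until bronze has produced data.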
Hubert-Dudek
by Esteemed Contributor III
  • 4152 Views
  • 2 replies
  • 9 kudos

Databricks Photon is a next-generation engine on the Databricks Lakehouse Platform that provides speedy query performance at a low cost. Its function...

Databricks Photon is a next-generation engine on the Databricks Lakehouse Platform that provides speedy query performance at a low cost. Its function coverage is growing, and UDF support under Photon is coming, which can bring significant improvements in us...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

 

1 More Replies
gdev
by New Contributor
  • 8459 Views
  • 6 replies
  • 3 kudos

Resolved! Migrate notebooks, workflows, and other assets

I want to move notebooks, workflows, and data from one user to another in Azure Databricks. We have access to that Databricks workspace. Is it possible? If yes, how do we move it?

Latest Reply
deedstoke
New Contributor II
  • 3 kudos

Hope all is well!

5 More Replies
Paru
by New Contributor II
  • 2363 Views
  • 1 replies
  • 1 kudos
Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Parvez Mushaf Maniyar, hope everything is going well. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

owen1
by New Contributor
  • 1787 Views
  • 2 replies
  • 2 kudos

Workflow cluster creation error

I set the workflow to run at 12:00 every day, but it failed with the error message below, and I don't know why. Run result unavailable: run failed with error message Unexpected failure while waiting for the cluster (0506-0233...

Latest Reply
Murthy1
Contributor II
  • 2 kudos

Hello @Sangwoo Lee, As mentioned by Vignesh, it seems like an infra-related issue. > Does the user (which executes the job) have access to start a cluster? > In case it is not an access issue, and in case you are starting a lot of workflow jobs tog...

1 More Replies
