Data Engineering

Forum Posts

by Kaniz • Community Manager

09-21-2021 10:36:17 AM

442 Views
0 replies
0 kudos

How to get the count of Azure Data Factory datasets, triggers, pipelines and linked services?

by gbrueckl • Contributor II

09-16-2021 12:34:36 PM

7326 Views
2 replies
4 kudos

Resolved! dbutils.notebook.run with multiselect parameter

I have a notebook which has a parameter defined as dbutils.widgets.multiselect("my_param", "ALL", ["ALL", "A", "B", "C"]) and I would like to pass this parameter when calling the notebook via dbutils.notebook.run(). However, I tried passing it as a pyth...

Latest Reply

gbrueckl
Contributor II

09-21-2021 2:57:00 AM

4 kudos

You are right, this actually works fine. I just realized I had two multiselect parameters in my tests, and changing only one of them still resulted in the same error message for the second one. I ended up writing a function that parses whatever comes in...
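
The parsing function mentioned in this reply is not shown in the excerpt. A minimal sketch of what such a helper might look like, assuming the multiselect value reaches the notebook as one comma-separated string and that dbutils.notebook.run() only accepts string argument values. The names parse_multiselect and to_widget_argument are hypothetical and are not taken from the thread.

```python
from typing import Iterable, List, Union


def parse_multiselect(value: Union[str, Iterable[str], None]) -> List[str]:
    """Normalize a multiselect value to a list of trimmed, non-empty strings.

    Accepts either a comma-separated string or an iterable of items,
    so callers do not need to know which form they received.
    """
    if value is None:
        return []
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = [str(item) for item in value]
    return [part.strip() for part in parts if part.strip()]


def to_widget_argument(values: Union[str, Iterable[str], None]) -> str:
    """Join selections into one comma-separated string for use as a notebook argument."""
    return ",".join(parse_multiselect(values))
```

In the calling notebook, the result of to_widget_argument(["A", "B"]) could be passed as the value for "my_param" in the arguments dictionary. Note that this sketch cannot handle selection values that themselves contain commas.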

by Kaniz • Community Manager

09-21-2021 2:04:49 AM

388 Views
0 replies
0 kudos

How to sort an array of records into descending order?

by Kaniz • Community Manager

09-21-2021 1:31:30 AM

679 Views
0 replies
0 kudos

How to read video file data from s3 and convert to mp4 using python?

by Kaniz • Community Manager

09-21-2021 1:31:00 AM

1306 Views
0 replies
0 kudos

How can I solve KeyError (return self.attrs[key]) when extracting data in Python?

by tarente • New Contributor III

09-18-2021 11:09:15 AM

699 Views
2 replies
3 kudos

Resolved! How to create a csv using a Scala notebook that has " in some columns?

In a project we use Azure Databricks to create CSV files to be loaded into ThoughtSpot. Below is a sample of the code I use to write the file:
val fileRepartition = 1
val fileFormat = "csv"
val fileSaveMode = "overwrite"
var fileOptions = Map ( ...

Latest Reply

tarente
New Contributor III

09-21-2021 1:03:14 AM

3 kudos

Hi Shan, thanks for the link. I now know more options for creating different CSV files. I have not yet resolved the problem, but that is related to the destination application (ThoughtSpot) not being able to load the data in the CSV file correctly. Rega...

by potluri • New Contributor II

08-08-2021 9:19:44 PM

1460 Views
2 replies
1 kudos

Resolved! Cluster frequently crashing

The cluster keeps crashing and prompts me to use a different cluster or restart it. The same code previously worked fine.

Latest Reply

jose_gonzalez
Moderator

09-20-2021 10:51:09 AM

1 kudos

Hi @potluri, what kind of cluster are you using? Is it an interactive cluster or a job cluster? What error message are you getting? The following KB article could help you find the cause of and the solution to your problem. Please check the ...

by Ougagagoubu • New Contributor

09-18-2021 1:06:38 PM

763 Views
0 replies
0 kudos

File bug in DBFS? Cannot remove file (table) nor create it in the Apache Spark (TM) SQL for Data Analysts Coursera course from Unit 6.2 onwards.

Hello, as the title already suggests, I'm not able to remove a file via the shell (%sh rm -f "path") nor continue with notebook 6.2 onwards (6.3 etc.) inside Databricks. I'm using the Databricks Community Edition. While the error message is clear: "...

by hoopla • New Contributor II

08-17-2021 6:52:59 PM

4250 Views
3 replies
1 kudos

Unable to copy multiple files from file:/tmp to dbfs:/tmp

I am downloading multiple files by web scraping, and by default they are stored in /tmp. I can copy a single file by providing the filename and path (%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp), but when I try to copy multiple files I get an ...

Latest Reply

hoopla
New Contributor II

09-16-2021 11:10:01 AM

1 kudos

Thanks, Deepak. This is what I suspected. Hopefully the wildcard feature will be available in the future. Thanks.
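
Since the copy command discussed in this thread does not expand wildcards, one possible workaround is to list the source directory, filter the names in Python, and copy the matches one at a time. The sketch below is only an illustration under stated assumptions: copy_matching is a hypothetical helper, the pattern "*_listings.csv.gz" is inferred from the single filename quoted in the question, and the copy callable is injected so the logic can be checked without a cluster.

```python
import fnmatch
from typing import Callable, Iterable, List


def copy_matching(
    names: Iterable[str],
    pattern: str,
    src_dir: str,
    dst_dir: str,
    copy: Callable[[str, str], object],
) -> List[str]:
    """Copy every file whose name matches a shell-style pattern, one call per file.

    Returns the list of names that were copied, in sorted order.
    """
    src_root = src_dir.rstrip("/")
    dst_root = dst_dir.rstrip("/")
    copied = []
    for name in sorted(names):
        # fnmatchcase keeps the match case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, pattern):
            copy(f"{src_root}/{name}", f"{dst_root}/{name}")
            copied.append(name)
    return copied


# Assumed usage inside a Databricks notebook (not runnable elsewhere):
# names = [info.name for info in dbutils.fs.ls("file:/tmp")]
# copy_matching(names, "*_listings.csv.gz", "file:/tmp", "dbfs:/tmp", dbutils.fs.cp)
```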

by User16826992724 • New Contributor III

09-15-2021 5:18:50 PM

659 Views
1 reply
2 kudos

When should I use a bloom filter index vs. a z-order index? Any best practices around it?

Latest Reply

User16826992724
New Contributor III

09-15-2021 5:31:16 PM

2 kudos

Just like B-tree indexes in the traditional EDW world, Z-order indexing can be used on high-cardinality columns such as primary key columns and in high-cardinality joins such as fact and dimension table joins. Z-order indexes can be created only on the ...

by User16826992724 • New Contributor III

09-15-2021 5:06:56 PM

597 Views
1 reply
4 kudos

What are the different methods to implement Surrogate Keys in Databricks?

Latest Reply

User16826992724
New Contributor III

09-15-2021 5:15:57 PM

4 kudos

There are various methods, such as using uuid, monotonically_increasing_id(), row_number() OVER (ORDER BY NULL) AS SK, or hashing functions such as md5() or sha(). A detailed discussion of the various options and their pros and cons can be found in this youtu...
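
To make the difference between the hash-based and the UUID-based options concrete, here is a plain-Python illustration that runs without Spark. It is a sketch of the idea only, not the Spark functions named in the reply: hash_surrogate_key and random_surrogate_key are hypothetical names, and the "||" separator is an assumption.

```python
import hashlib
import uuid
from typing import Sequence


def hash_surrogate_key(business_key: Sequence[object], sep: str = "||") -> str:
    """Deterministic key: MD5 hex digest of the joined business-key columns.

    The same input always yields the same key, so reloads are repeatable.
    None is rendered as an empty string here, so NULL and "" collide;
    substitute a sentinel value if that distinction matters.
    """
    joined = sep.join("" if part is None else str(part) for part in business_key)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def random_surrogate_key() -> str:
    """Non-deterministic key: a random (version 4) UUID, new on every call."""
    return str(uuid.uuid4())
```

A hash key can be recomputed from the source columns at any time, while a UUID has to be stored once and looked up afterwards. Which trade-off fits depends on the load pattern.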

by User16752240003 • Contributor

09-08-2021 5:43:49 PM

4049 Views
7 replies
4 kudos

Resolved! Is there a way that admins can restrict users from installing libraries on clusters and notebooks?

Latest Reply

Sebastian
Contributor

09-13-2021 11:34:38 AM

4 kudos

One way to manage this is to limit users' cluster permission to Can Restart and then use an init script to install libraries on startup, so that users won't install libraries on the fly.

by BeardyMan • New Contributor III

09-14-2021 6:59:34 AM

2853 Views
9 replies
3 kudos

Resolved! MLFlow Serve Logging

When using Azure Databricks and serving a model, we have received requests to capture additional logging. In some instances, they would like to capture input and output or even some of the steps from a pipeline. Is there any way we can extend the lo...

Latest Reply

Dan_Z
Honored Contributor

09-14-2021 6:14:55 PM

3 kudos

Another word from a Databricks employee: """You can use the custom model approach, but configuring it is painful. Plus you have ended every loggable model in the custom model. Another less intrusive solution would be to have a proxy server do the loggi...

by saipujari_spark • Valued Contributor

09-14-2021 3:05:52 PM

761 Views
1 reply
3 kudos

Delta Optimized Write vs Repartition: Which is recommended?

When streaming to a Delta table, both repartitioning on the partition column and optimized write can help to avoid small files. Which is recommended: Delta Optimized Write or repartitioning?

Latest Reply

saipujari_spark
Valued Contributor

09-14-2021 3:08:40 PM

3 kudos

Optimized write is recommended over repartitioning for the reasons below.
* The key part of Optimized Writes is that it is an adaptive shuffle. If you have a streaming ingest use case and input data rates change over time, the adaptive shuffle will a...

by Artem_Yevtushen • New Contributor III

09-14-2021 1:23:22 PM

682 Views
0 replies
2 kudos

Accelerating row-wise Python UDF functions without using Pandas UDF

Problem: Spark will not automatically parallelize UDF operations on small to medium DataFrames. As a result, Spark will process the UDF as a single non-parallelized task. For row-wise op...
