Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

gbrueckl
by Contributor II
  • 9381 Views
  • 2 replies
  • 4 kudos

Resolved! dbutils.notebook.run with multiselect parameter

I have a notebook which has a parameter defined as dbutils.widgets.multiselect("my_param", "ALL", ["ALL", "A", "B", "C"]) and I would like to pass this parameter when calling the notebook via dbutils.notebook.run(). However, I tried passing it as an pyth...

Latest Reply
gbrueckl
Contributor II
  • 4 kudos

You are right, this actually works fine. I just realized I had two multiselect parameters in my tests, and changing only one of them still resulted in the same error message for the second one. I ended up writing a function that parses whatever comes in...

  • 4 kudos
1 More Replies
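
The parsing function mentioned in the reply is not shown in the preview. A minimal sketch of what such a helper could look like follows, assuming multiselect values travel between notebooks as a single comma-separated string; the names `parse_multiselect` and `to_widget_argument` are hypothetical and not taken from the thread.

```python
from typing import Iterable, List, Union


def parse_multiselect(value: Union[str, Iterable[str], None]) -> List[str]:
    """Normalize a multiselect value to a list of non-empty strings.

    Accepts either a comma-separated string (assumed widget format)
    or an iterable of strings, so callers can pass whichever they have.
    """
    if value is None:
        return []
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = list(value)
    # Trim whitespace and drop empty entries such as those from "A,,B".
    return [str(p).strip() for p in parts if str(p).strip()]


def to_widget_argument(values: Iterable[str]) -> str:
    """Join selected values into one string for a string-only argument map."""
    return ",".join(values)
```

In a notebook this might be used as `dbutils.notebook.run("child", 600, {"my_param": to_widget_argument(["A", "B"])})` on the calling side and `parse_multiselect(dbutils.widgets.get("my_param"))` in the called notebook; the notebook path and timeout here are placeholders.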
tarente
by New Contributor III
  • 1218 Views
  • 2 replies
  • 3 kudos

Resolved! How to create a csv using a Scala notebook that has " in some columns?

In a project we use Azure Databricks to create csv files to be loaded in ThoughtSpot. Below is a sample of the code I use to write the file: val fileRepartition = 1 val fileFormat = "csv" val fileSaveMode = "overwrite" var fileOptions = Map ( ...

Latest Reply
tarente
New Contributor III
  • 3 kudos

Hi Shan, Thanks for the link. I now know more options for creating different csv files. I have not yet resolved the problem, but that is related to a destination application (ThoughtSpot) not being able to load the data in the csv file correctly. Rega...

  • 3 kudos
1 More Replies
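
The preview does not show which writer options were tried, nor which quoting convention ThoughtSpot expects. Purely as an illustration of one common convention (a field containing `"` is wrapped in quotes and each embedded quote is doubled), here is a small sketch using Python's standard `csv` module rather than the Spark writer from the post; the helper name `to_csv_text` and the sample rows are made up for the example.

```python
import csv
import io
from typing import Iterable, Sequence


def to_csv_text(rows: Iterable[Sequence[object]], quote_all: bool = False) -> str:
    """Render rows as CSV text, doubling any embedded double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,
        doublequote=True,      # write " inside a field as ""
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data with a double quote inside one column.
sample_rows = [["id", "comment"], [1, 'He said "hi"']]
csv_text = to_csv_text(sample_rows)

# Reading the text back recovers the original field content.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

Whether the destination loader accepts doubled quotes or needs a backslash-style escape is something to check against its own documentation.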
potluri
by New Contributor II
  • 2731 Views
  • 2 replies
  • 1 kudos

Resolved! Cluster frequently crashing

The cluster keeps crashing, prompting me to use a different cluster or restart the cluster. The same code previously worked fine.

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @potluri, What kind of cluster are you using? Is it an interactive cluster or a job cluster? What is the error message you are getting? The following KB article could help you to find the cause and the solution to your problem. Please check the ...

  • 1 kudos
1 More Replies
Ougagagoubu
by New Contributor
  • 1016 Views
  • 0 replies
  • 0 kudos

File bug in DBFS? Cannot remove a file (table) or create it in the Apache Spark (TM) SQL for Data Analysts Coursera course, from Unit 6.2 onwards.

Hello, as the title already suggests, I'm not able to remove a file via the shell (%sh rm -f "path") or continue with notebook 6.2 onwards (6.3 etc.) inside Databricks. I'm using the Databricks Community Edition. While the error message is clear: "...

hoopla
by New Contributor II
  • 5978 Views
  • 3 replies
  • 1 kudos

Unable to copy multiple files from file:/tmp to dbfs:/tmp

I am downloading multiple files by web scraping, and by default they are stored in /tmp. I can copy a single file by providing the filename and path: %fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp but when I try to copy multiple files I get an ...

Latest Reply
hoopla
New Contributor II
  • 1 kudos

Thanks Deepak. This is what I suspected. Hopefully the wildcard feature will be available in the future. Thanks

  • 1 kudos
2 More Replies
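
Since the thread concludes that wildcards are not supported by the copy command, one possible workaround is to expand the pattern on the driver and copy the files one at a time. The sketch below assumes the files live on the driver's local disk; `copy_matching_files` is a hypothetical helper, and the copy function is passed in as a parameter so that `dbutils.fs.cp` can be supplied inside a notebook.

```python
import glob
import os
from typing import Callable, List, Tuple


def copy_matching_files(
    local_dir: str,
    pattern: str,
    target_dir: str,
    copy_fn: Callable[[str, str], object],
) -> List[Tuple[str, str]]:
    """Copy every local file matching `pattern` to `target_dir`, one by one.

    Returns the (source, destination) pairs that were handed to `copy_fn`.
    """
    copied = []
    for path in sorted(glob.glob(os.path.join(local_dir, pattern))):
        src = "file:" + path  # local driver path, as in the post
        dst = target_dir.rstrip("/") + "/" + os.path.basename(path)
        copy_fn(src, dst)
        copied.append((src, dst))
    return copied
```

In a Databricks notebook this could be called as `copy_matching_files("/tmp", "*_listings.csv.gz", "dbfs:/tmp", dbutils.fs.cp)`; the glob pattern is an assumption based on the single file name shown in the post.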
User16826992724
by New Contributor III
  • 1938 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16826992724
New Contributor III
  • 2 kudos

Just like B-tree indices in the traditional EDW world, Z-order indexing can be used on high-cardinality columns like primary key columns and on high-cardinality joins such as joins between fact and dimension tables. Z-order indexes can be created only on the ...

  • 2 kudos
User16826992724
by New Contributor III
  • 1031 Views
  • 1 replies
  • 4 kudos
Latest Reply
User16826992724
New Contributor III
  • 4 kudos

There are various methods, such as using uuid, monotonically_increasing_id(), row_number() OVER (ORDER BY NULL) AS SK, or the md5() or sha() hashing functions. Detailed discussion of various options and the pros/cons can be found in this youtu...

  • 4 kudos
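
To make the difference between two of the listed approaches concrete, here is a minimal sketch in plain Python (standard library only, not the Spark functions named in the reply). It contrasts a deterministic hash-based key with a random UUID key; the helper names `hash_key` and `random_key` and the `|` separator are assumptions made for the example.

```python
import hashlib
import uuid
from typing import Sequence


def hash_key(business_key: Sequence[object], sep: str = "|") -> str:
    """Deterministic key: the same business key always yields the same value."""
    joined = sep.join("" if v is None else str(v) for v in business_key)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def random_key() -> str:
    """Random key: unique per call, not reproducible across reloads."""
    return str(uuid.uuid4())
```

A hashed key can be recomputed when the same row is loaded again, whereas a UUID changes on every run, which is one of the trade-offs to weigh when choosing between the options.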
morganmazouchi
by Esteemed Contributor III
  • 6299 Views
  • 7 replies
  • 4 kudos
Latest Reply
Sebastian
Contributor
  • 4 kudos

One way to manage this is to limit the cluster permission to Can Restart and then use an init script to install libraries on startup, so that users won't install libraries on the fly.

  • 4 kudos
6 More Replies
saipujari_spark
by Esteemed Contributor III
  • 1182 Views
  • 1 replies
  • 3 kudos

Delta Optimized Write vs Repartitioning: Which is recommended?

When streaming to a Delta table, both repartitioning on the partition column and optimized write can help to avoid small files. Which is recommended: Delta Optimized Write or repartitioning?

Latest Reply
saipujari_spark
Esteemed Contributor III
  • 3 kudos

Optimized write is recommended over repartitioning for the following reasons:
* The key part of Optimized Writes is that it is an adaptive shuffle. If you have a streaming ingest use case and input data rates change over time, the adaptive shuffle will a...

  • 3 kudos
