Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NOOR_BASHASHAIK
by Contributor
  • 5124 Views
  • 4 replies
  • 4 kudos

Azure Databricks VM type for OPTIMIZE with ZORDER on a single column

Dears, I was trying to check which Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp-valued (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...

Latest Reply
jose_gonzalez
Databricks Employee

Hi, the Standard_F16s_v2 is a compute-optimized machine type. On the other hand, for Delta OPTIMIZE (both bin-packing and Z-Ordering), we recommend the Standard_DS_v2-series. Also, follow Hubert's recommendations.
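For reference, a minimal sketch of the command under discussion, assuming a hypothetical Delta table named events whose string timestamp column is event_ts:

# Hypothetical table/column names; Z-Ordering rewrites the files clustered by event_ts
spark.sql("OPTIMIZE events ZORDER BY (event_ts)")
# The same statement would be issued once per table when looping over the ~5000 tables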

3 More Replies
KKo
by Contributor III
  • 4768 Views
  • 2 replies
  • 7 kudos

Incompatible format detected while writing in Parquet format.

I am writing/reading data from Azure Databricks to the data lake. I wrote a dataframe to a path in Delta format using the query below; later I realized that I need the data in Parquet format, so I went to the storage account and manually deleted the filepat...
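For anyone hitting the same "Incompatible format detected" error, a minimal sketch of one way to switch a path from Delta to Parquet (the path and dataframe names are hypothetical); the check is driven by the leftover _delta_log directory, so the old Delta output has to be fully removed before writing Parquet:

# Hypothetical ADLS path; remove the old Delta output, including _delta_log
path = "abfss://container@account.dfs.core.windows.net/output/my_table"
dbutils.fs.rm(path, True)
# Then write the dataframe out as Parquet
df.write.format("parquet").mode("overwrite").save(path)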

Latest Reply
KKo
Contributor III

Update: I tried "Clear state and outputs", which did not help, but when I restarted the cluster it worked without an issue. Though the issue is fixed, I still don't know what caused it in the first place.

1 More Replies
John_BardessGro
by New Contributor II
  • 6489 Views
  • 2 replies
  • 4 kudos

Cluster Reuse for delta live tables

I have several Delta Live Tables notebooks that are tied to different Delta Live Tables jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments, but is it possible for these Delta Live Tables jobs (w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

The same DLT job (workflow) will reuse the same cluster in development mode (shutdown after 2 h) and a new one in production (shutdown delay 0). You can, however, manipulate that value in the pipeline's JSON settings: { "configuration": { "pipelines.clusterShutdown.delay": "60s" } } Yo...

1 More Replies
William_Scardua
by Valued Contributor
  • 6068 Views
  • 3 replies
  • 4 kudos

How do you structure and store your medallion architecture?

Hi guys, what do you suggest for creating a medallion architecture? How many data lake zones and which ones, how to store the data, which databases to use for storage, anything else? I am thinking of these zones: 1. landing zone, file storage in /landing_zone - databricks database.bro...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @William Scardua​, I would highly recommend using Delta Live Tables (DLT) for your use case. Please check the docs with sample notebooks here: https://docs.databricks.com/workflows/delta-live-tables/index.html
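To make the DLT suggestion concrete, here is a minimal sketch of a bronze/silver pair in a DLT Python notebook (the landing-zone path and column names are made up for illustration):

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw files from the landing zone, loaded as-is")
def bronze_events():
    return spark.read.format("json").load("/landing_zone/events/")   # hypothetical path

@dlt.table(comment="Silver: cleaned and deduplicated records")
def silver_events():
    return (dlt.read("bronze_events")
            .where(F.col("event_id").isNotNull())
            .dropDuplicates(["event_id"]))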

2 More Replies
Chris_Shehu
by Valued Contributor III
  • 4971 Views
  • 1 replies
  • 5 kudos

Resolved! Getting errors while following Microsoft Databricks Best-Practices for DevOps Integration

I'm currently trying to follow the "Software engineering best practices for notebooks" Azure Databricks guide, but I keep running into the following during step 4.5 (Run the test): ============================= test session starts =======================...

Latest Reply
Chris_Shehu
Valued Contributor III

Closing the loop on this in case anyone gets stuck in the same situation. You can see in the images that transforms_test.py shows a different icon than testdata.csv. This is because it was saved as a Jupyter notebook, not a .py file. When the ...
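For anyone else following that guide, a minimal sketch of what the test file looks like once it is a plain .py source file rather than a notebook export (the function below is a stand-in, not the one from the guide):

# transforms_test.py - saved as a plain .py file, not a notebook, and run with pytest
def add_one(x: int) -> int:
    # Stand-in for a real transform; in the guide this would live in transforms.py
    return x + 1

def test_add_one():
    assert add_one(1) == 2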

140015
by New Contributor III
  • 1569 Views
  • 1 replies
  • 0 kudos

Resolved! Is an S3 DBFS mount faster than direct access?

Hi, is there any speed difference between a mounted S3 bucket and direct access when reading/writing Delta tables or other types of files? I tried to find something in the docs but didn't find anything.

Latest Reply
Vivian_Wilfred
Databricks Employee

Hi @Jacek Dembowiak​, behind the scenes, mounting an S3 bucket and reading from it works the same way as accessing it directly. Mounts are just metadata; the underlying access mechanism is the same for both scenarios you mentioned. Mounting the ...
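To illustrate the point, both access styles below end up going through the same S3 client under the hood (bucket name and mount point are hypothetical, and the mount is assumed to have been created earlier with dbutils.fs.mount):

# Direct access via the s3a scheme
df_direct = spark.read.format("delta").load("s3a://my-bucket/tables/events")
# Access via an existing DBFS mount pointing at the same bucket
df_mounted = spark.read.format("delta").load("/mnt/my-bucket/tables/events")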

Mado
by Valued Contributor II
  • 2240 Views
  • 2 replies
  • 3 kudos

How to apply Pandas functions on PySpark DataFrame?

Hi, I want to apply pandas functions (like isna, concat, append, etc.) on a PySpark DataFrame in such a way that computations are done on a multi-node cluster. I don't want to convert the PySpark DataFrame into a pandas DataFrame since, I think, only one node is...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

The best option is to use the pandas API on Spark; it is virtually interchangeable, just a different API over the Spark DataFrame:
import pyspark.pandas as ps
psdf = ps.range(10)
sdf = psdf.to_spark().filter("id > 5")
sdf.show()
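The same pyspark.pandas API also covers the specific functions mentioned in the question, for example isna and concat; a small sketch with toy data:

import pyspark.pandas as ps

df1 = ps.DataFrame({"id": [1, 2, None]})
df2 = ps.DataFrame({"id": [4, 5, 6]})

print(df1.isna().sum())            # computed on the cluster, not the driver
combined = ps.concat([df1, df2])   # pandas-style concat over distributed data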

1 More Replies
AJDJ
by New Contributor III
  • 8421 Views
  • 9 replies
  • 4 kudos

Delta Lake Demo - Not working

Hi there, I imported the Delta Lake demo notebook from the Databricks link and at command 12 it errors out. I tried other ways and paths but couldn't get past the error. Maybe the notebook is outdated? https://www.databricks.com/notebooks/Demo_Hub-Delta_La...

Latest Reply
Anonymous
Not applicable

Hi @AJ DJ​, does @Hubert Dudek​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

8 More Replies
JoeS
by New Contributor III
  • 6439 Views
  • 1 replies
  • 1 kudos

When will GitHub Copilot be available in the Databricks IDE?

It's been quite difficult to stay in VS Code while developing data science experiments and tooling for Databricks. Our team would like to have GitHub Copilot for the Databricks IDE.

Latest Reply
Anonymous
Not applicable

Hi @Joe Shull​, does @Kaniz Fatma​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

RJB
by New Contributor II
  • 12751 Views
  • 6 replies
  • 0 kudos

Resolved! How to pass outputs from a python task to a notebook task

I am trying to create a job which has 2 tasks as follows: a Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format), and a notebook which runs once for each of the dates from the d...

Latest Reply
BilalAslamDbrx
Databricks Employee

Just a note that this feature, Task Values, has been generally available for a while.
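For reference, a minimal sketch of passing the list of dates between the two tasks with task values (the task name and key below are made up):

# In the Python task (named, say, "generate_dates"):
dates = ["2022-11-01", "2022-11-02", "2022-11-03"]
dbutils.jobs.taskValues.set(key="dates", value=dates)

# In the downstream notebook task:
dates = dbutils.jobs.taskValues.get(taskKey="generate_dates", key="dates",
                                    debugValue=["2022-11-01"])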

5 More Replies
hari
by Contributor
  • 23112 Views
  • 3 replies
  • 7 kudos

How to add a partition to an existing Delta table

We didn't need to set partitions for our Delta tables, as we didn't have many performance concerns and Delta Lake's out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...
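A partition column cannot be added to a Delta table in place; the table has to be rewritten. A minimal sketch of one way to do that (table and column names are hypothetical, and this assumes a full rewrite is acceptable):

# Rewrite the table with a partition column; overwriteSchema allows the layout change
df = spark.read.table("sales.orders")
(df.write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .partitionBy("order_date")
   .saveAsTable("sales.orders"))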

Latest Reply
hari
Contributor

Updated the description

2 More Replies
Anonymous
by Not applicable
  • 966 Views
  • 0 replies
  • 1 kudos

Heads up! November Community Social!

On November 17th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social...

Taha_Hussain
by Databricks Employee
  • 1653 Views
  • 0 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours

October 26 - 11:00 AM - 12:00 PM PT: Register Here. November 9 - 8:00 AM - 9:00 AM GMT: Register Here (NEW EMEA Office Hours). Databricks Office Hours connects you directly with experts to answer all...

pen
by New Contributor II
  • 2160 Views
  • 2 replies
  • 2 kudos

PySpark errors when I pack the source zip package without a dir.

If I send the package made by zipfile via spark.submit.pyFiles, zipped with this code:
import zipfile, os
def make_zip(source_dir, output_filename):
    with zipfile.ZipFile(output_filename, 'w') as zipf:
        pre_len = len(os.path....
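For comparison, a minimal sketch of a zip helper that stores entries with paths relative to source_dir, which is the usual requirement for spark.submit.pyFiles (this is the common pattern, not the poster's full code):

import os
import zipfile

def make_zip(source_dir, output_filename):
    # Walk the tree and store each file with a path relative to source_dir
    with zipfile.ZipFile(output_filename, "w") as zipf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zipf.write(full_path, os.path.relpath(full_path, source_dir))

# e.g. make_zip("/dbfs/tmp/my_pkg", "/dbfs/tmp/my_pkg.zip")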

Latest Reply
Hubert-Dudek
Esteemed Contributor III

I checked, and your code is OK. If you set source_dir and output_filename, please remember to start the path with /dbfs. If you work on the Community Edition, you can get problems with access to the underlying filesystem.

1 More Replies
mghildiy
by New Contributor
  • 1469 Views
  • 1 replies
  • 1 kudos

Checking Spark performance locally

I am experimenting with Spark on my local machine. Is there some tool/API available to check the performance of the code I write? For example, I write:
val startTime = System.nanoTime()
invoicesDF
  .select(
    count("*").as("Total Number Of Inv...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please check the details about your code (tasks within jobs) in the Spark UI.
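When running locally, the Spark UI address can be printed straight from the SparkContext; a small sketch in PySpark (the Scala SparkContext exposes the same information via uiWebUrl):

# Prints the URL of the local Spark UI, where per-job, stage and task timings live
print(spark.sparkContext.uiWebUrl)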

