cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

William_Scardua
by Valued Contributor
  • 4012 Views
  • 4 replies
  • 4 kudos

How do you structure and storage you medallion architecture ?

Hi guys,How you suggestion about how to create a medalion archeterure ? how many and what datalake zones, how store data, how databases used to store, anuthing I think that zones:1.landing zone, file storage in /landing_zone - databricks database.bro...

  • 4012 Views
  • 4 replies
  • 4 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @William Scardua​ ​, We haven’t heard from you since the last response from @Jose Gonzalez​ , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others....

  • 4 kudos
3 More Replies
John_BardessGro
by New Contributor II
  • 5246 Views
  • 3 replies
  • 4 kudos

Cluster Reuse for delta live tables

I have several delta live table notebooks that are tied to different delta live table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments but is it possible for these delta live table jobs (w...

  • 5246 Views
  • 3 replies
  • 4 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @John Fico​ ​, We haven’t heard from you since the last response from @Hubert Dudek​ and @Jose Gonzalez​ , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpf...

  • 4 kudos
2 More Replies
Mado
by Valued Contributor II
  • 6728 Views
  • 4 replies
  • 2 kudos

Resolved! Pandas API on Spark, Does it run on a multi-node cluster?

Hi, I have a few questions about "Pandas API on Spark". Thanks for your time to read my questions1) Input to these functions are Pandas DataFrame or PySpark DataFrame?2) When I use any pandas function (like isna, size, apply, where, etc ), does it ru...

  • 6728 Views
  • 4 replies
  • 2 kudos
Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi @Mohammad Saber​ , Pandas dataset lives in the single machine, and is naturally iterable locally within the same machine. However, pandas-on-Spark dataset lives across multiple machines, and they are computed in a distributed manner. It is difficu...

  • 2 kudos
3 More Replies
Markus
by New Contributor II
  • 1806 Views
  • 2 replies
  • 2 kudos

dbutils.notebook.run raises HTTP 401 Unauthorized Error

Hello,since a while I use dbutils.notebook.run for multiple calling of additional notebooks and passing parameters to them. So far I could use the function without any difficulties - also today.But since a few hours now I get the following error mess...

  • 1806 Views
  • 2 replies
  • 2 kudos
Latest Reply
Markus
New Contributor II
  • 2 kudos

Hello Community,the issue occurred due to a changed central configuration.Recommendation by Databricks: "Admin Protection: New feature and security recommendations for No Isolation Shared clusters"Here is the link to the current restrictions: Enable ...

  • 2 kudos
1 More Replies
Chris_Shehu
by Valued Contributor III
  • 2879 Views
  • 1 replies
  • 5 kudos

Resolved! Getting errors while following Microsoft Databricks Best-Practices for DevOps Integration

I'm currently trying to follow the Software engineering best practices for notebooks - Azure Databricks guide but I keep running into the following during step 4.5: Run the test============================= test session starts =======================...

image.png image image image
  • 2879 Views
  • 1 replies
  • 5 kudos
Latest Reply
Chris_Shehu
Valued Contributor III
  • 5 kudos

Closing the loop on this in case anyone gets stuck in the same situation. You can see in the images that the transforms_test.py shows a different icon then the testdata.csv. This is because it was saved as a juypter notebook not a .py file. When the ...

  • 5 kudos
140015
by New Contributor III
  • 922 Views
  • 1 replies
  • 0 kudos

Resolved! Is S3 dbfs mount faster then direct access?

Hi,Is there any speed difference between mounted s3 bucket and direct access during reading/writing delta tables or other type of files? I tried to find something in docs, but didn't found anything.

  • 922 Views
  • 1 replies
  • 0 kudos
Latest Reply
Vivian_Wilfred
Honored Contributor
  • 0 kudos

Hi @Jacek Dembowiak​ , behind the scenes, mounting an S3 bucket and reading from it works the same way as directly accessing it. Mounts are just metadata, the underlying access mechanism is the same for both the scenarios you mentioned. Mounting the ...

  • 0 kudos
Mado
by Valued Contributor II
  • 1536 Views
  • 2 replies
  • 3 kudos

How to apply Pandas functions on PySpark DataFrame?

Hi, I want to apply Pandas functions (like isna, concat, append, etc) on PySpark DataFrame in such a way that computations are done on multi-node cluster.I don't want to convert PySpark DataFrame into Pandas DataFrame since, I think, only one node is...

  • 1536 Views
  • 2 replies
  • 3 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

The best is to use pandas on a spark, it is virtually interchangeable so it just different API for Spark data frameimport pyspark.pandas as ps   psdf = ps.range(10) sdf = psdf.to_spark().filter("id > 5") sdf.show()

  • 3 kudos
1 More Replies
AJDJ
by New Contributor III
  • 4362 Views
  • 9 replies
  • 4 kudos

Delta Lake Demo - Not working

Hi there, I imported the delta lake demo notebook from databricks link and at command 12 it errors out. I tired other ways and path but couldnt get past the error. May be the notebook is outdated?https://www.databricks.com/notebooks/Demo_Hub-Delta_La...

  • 4362 Views
  • 9 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @AJ DJ​ Does @Hubert Dudek​  response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?We'd love to hear from you.Thanks!

  • 4 kudos
8 More Replies
JoeS
by New Contributor III
  • 4958 Views
  • 2 replies
  • 1 kudos

When will Github Copilot be available in the Databricks IDE?

It's been quite difficult to stay in VSCode while developing data science experiments and tooling for Databricks. Our team would like to have Github Copilot for the databricks IDE.

  • 4958 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Joe Shull​ Does @Kaniz Fatma​  response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?We'd love to hear from you.Thanks!

  • 1 kudos
1 More Replies
RJB
by New Contributor II
  • 8403 Views
  • 6 replies
  • 0 kudos

Resolved! How to pass outputs from a python task to a notebook task

I am trying to create a job which has 2 tasks as follows:A python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format).A notebook which runs once for each of the dates from the d...

  • 8403 Views
  • 6 replies
  • 0 kudos
Latest Reply
BilalAslamDbrx
Honored Contributor III
  • 0 kudos

Just a note that this feature, Task Values, has been generally available for a while.

  • 0 kudos
5 More Replies
ShenghuaNi
by New Contributor II
  • 1197 Views
  • 1 replies
  • 0 kudos

$200 Voucher

Has any one really get the $200 voucher? I contacted the training support, but still did not get that voucher. The support just said need to investigate, but never reply any more. Do not know what happened. Or this is just a fault ads.

  • 1197 Views
  • 1 replies
  • 0 kudos
Latest Reply
ShenghuaNi
New Contributor II
  • 0 kudos

The training support sent me the voucher number.

  • 0 kudos
StephanieRivera
by Valued Contributor II
  • 6419 Views
  • 7 replies
  • 10 kudos

Resolved! How do I kick off Azure Data Factory from within Databricks?

I want to kick off ingestion in ADF from Databricks. When ADF ingestion is done, my DBX bronze-silver-gold pipeline follows within DBX.I see it is possible to call Databricks notebooks from ADF. Can I also go the other way? I want to start the ingest...

  • 6419 Views
  • 7 replies
  • 10 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 10 kudos

Hi @Stephanie Rivera​​, We haven’t heard from you since the last response from @Werner Stinckens​ , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to othe...

  • 10 kudos
6 More Replies
hari
by Contributor
  • 16471 Views
  • 5 replies
  • 5 kudos

How to add the partition for an existing delta table

We didn't need to set partitions for our delta tables as we didn't have many performance concerns and delta lake out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...

  • 16471 Views
  • 5 replies
  • 5 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @Harikrishnan P H​ , We haven’t heard from you since the last response from @Hubert Dudek​ , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. ...

  • 5 kudos
4 More Replies
Anonymous
by Not applicable
  • 646 Views
  • 0 replies
  • 1 kudos

Heads up! November Community Social!  On November 17th we are hosting another Community Social - we're doing these monthly ! We want to make sure ...

Heads up! November Community Social! On November 17th we are hosting another Community Social - we're doing these monthly ! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social...

  • 646 Views
  • 0 replies
  • 1 kudos
Taha_Hussain
by Valued Contributor II
  • 1212 Views
  • 0 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours October 26 - 11:00 AM - 12:00 PM PT: Register HereNovember 9 - 8:00 AM - 9:00 AM GMT: Register...

Ask your technical questions at Databricks Office HoursOctober 26 - 11:00 AM - 12:00 PM PT: Register HereNovember 9 - 8:00 AM - 9:00 AM GMT: Register Here (NEW EMEA Office Hours)Databricks Office Hours connects you directly with experts to answer all...

  • 1212 Views
  • 0 replies
  • 8 kudos
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!

Labels