Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HenriqueMoniz
by New Contributor II
  • 1709 Views
  • 1 reply
  • 2 kudos

How to access the Delta Live Tables feature?

Hi, I tried following the Delta Live Tables quickstart (https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-quickstart.html), but I don't see the Pipelines tab under the Jobs page in my workspace. The same guide mentions...

Latest Reply
virbickt
New Contributor III
  • 2 kudos

Hi, you need a Premium workspace for the Pipelines tab to show up. On my workspace with the Standard pricing tier selected the tab is missing; on my workspace with the Premium pricing tier it appears.

THIAM_HUATTAN
by Valued Contributor
  • 3591 Views
  • 3 replies
  • 3 kudos

Using R, how do we write a CSV file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData':
write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)
I do not see any error from the above, but when I list files below:
%fs ls /tmp
I do not see any files written there. Why?

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

Hi Thiam,
Thank you for reaching out to us. In this case it seems that you have written a file to the OS /tmp and tried to fetch the same folder in DBFS.
Written >> /tmp/TotalData.csv
Reading >> /dbfs/tmp/TotalData.csv
Please try to execute write.csv wit...
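
For illustration, a minimal Python sketch of that distinction in a notebook (dbutils and display are predefined there; only the file name is taken from the thread):

import os

# Plain OS paths such as /tmp live on the driver's local disk:
with open("/tmp/TotalData.csv", "w") as f:
    f.write("a,b\n1,2\n")
print(os.listdir("/tmp"))  # the file is here, on the driver only

# DBFS is a separate filesystem; %fs ls /tmp lists dbfs:/tmp instead.
# On the driver, DBFS is exposed via the /dbfs FUSE mount, so writing to
# /dbfs/tmp/TotalData.csv is what makes a file visible to %fs ls /tmp:
display(dbutils.fs.ls("dbfs:/tmp"))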

2 More Replies
jm99
by New Contributor III
  • 2538 Views
  • 2 replies
  • 3 kudos

Ingesting Kafka Avro into a Delta STREAMING LIVE TABLE

Using Azure Databricks, I can create a DLT table in Python using:
import dlt
import pyspark.sql.functions as fn
from pyspark.sql.types import StringType

@dlt.table(
  name = "<<landingTable>>",
  path = "<<storage path>>",
  comment = "<< descri...

Latest Reply
lninza
New Contributor II
  • 3 kudos

Hi @John Mathews, did you find a way to progress here? I am stuck at the same point...
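
The thread ends unresolved, but a minimal sketch of the pattern being attempted (broker, topic, and Avro schema are assumptions; spark is the session predefined in DLT pipelines) could look like:

import dlt
from pyspark.sql import functions as fn
from pyspark.sql.avro.functions import from_avro

# Hypothetical Avro schema for the Kafka record value:
avro_schema = '{"type":"record","name":"event","fields":[{"name":"id","type":"string"},{"name":"ts","type":"long"}]}'

@dlt.table(name="landing_table", comment="Raw Kafka events decoded from Avro")
def landing_table():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
        .option("subscribe", "events")                     # placeholder topic
        .load()
        # Kafka's value column is binary; decode it with the Avro schema:
        .select(from_avro(fn.col("value"), avro_schema).alias("payload"))
        .select("payload.*")
    )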

1 More Replies
jon1
by New Contributor II
  • 1083 Views
  • 1 reply
  • 0 kudos

How to dedupe a source table prior to merge through JDBC SQL driver integration

Hi! We're working with change event data from relational and NoSQL databases, then processing and ingesting that into Databricks. It's streamed from source to our messaging platform. Then, our connector is pushing to Databricks. Right now we're doing th...

Latest Reply
jon1
New Contributor II
  • 0 kudos

Update on the theory we are looking at. It'd be similar to the below (with necessary changes to support best practices for MERGE, such as reducing the search space):
-- View for deduping pre-merge
CREATE OR REPLACE TEMPORARY VIEW {view} AS
SELECT * EXCEPT ...
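
A fuller sketch of the same dedupe-then-MERGE idea in PySpark (table names, key, and ordering column are hypothetical; the poster's actual view definition is truncated above):

from delta.tables import DeltaTable
from pyspark.sql import functions as fn
from pyspark.sql.window import Window

events = spark.read.table("staged_events")  # hypothetical staging table

# Keep only the latest change event per key before merging:
latest = Window.partitionBy("id").orderBy(fn.col("event_ts").desc())
deduped = (events
           .withColumn("rn", fn.row_number().over(latest))
           .filter("rn = 1")
           .drop("rn"))

(DeltaTable.forName(spark, "target_table")  # hypothetical target table
 .alias("t")
 .merge(deduped.alias("s"), "t.id = s.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())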

NickMendes
by New Contributor III
  • 1830 Views
  • 3 replies
  • 2 kudos

Resolved! Alert e-mail is not recognizing my HTML text

I've always used alert e-mail notifications with my custom message written in HTML. The problem is that today it suddenly stopped working and the alert e-mail notification arrives distorted, as the HTML is no longer rendered. Does anyone know w...

Latest Reply
NickMendes
New Contributor III
  • 2 kudos

Apparently, it has been corrected and it is working again. Thank you, everyone.

2 More Replies
Mado
by Valued Contributor II
  • 8111 Views
  • 4 replies
  • 2 kudos

Resolved! Pandas API on Spark: does it run on a multi-node cluster?

Hi, I have a few questions about "Pandas API on Spark". Thanks for taking the time to read my questions.
1) Is the input to these functions a pandas DataFrame or a PySpark DataFrame?
2) When I use any pandas function (like isna, size, apply, where, etc.), does it ru...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi @Mohammad Saber, a pandas dataset lives on a single machine and is naturally iterable locally within that machine. However, a pandas-on-Spark dataset lives across multiple machines, and it is computed in a distributed manner. It is difficu...
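
A small sketch of that difference (assumes a Spark cluster; pyspark.pandas ships with PySpark 3.2+):

import pyspark.pandas as ps

psdf = ps.range(1000)              # pandas-on-Spark frame, distributed
print(psdf.isna().sum())           # pandas-style call, executed as Spark jobs
sdf = psdf.to_spark()              # view it as a regular Spark DataFrame
print(sdf.rdd.getNumPartitions())  # confirms the data is partitioned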

3 More Replies
Markus
by New Contributor II
  • 2805 Views
  • 2 replies
  • 2 kudos

dbutils.notebook.run raises HTTP 401 Unauthorized Error

Hello, for a while now I have been using dbutils.notebook.run to call additional notebooks and pass parameters to them. So far I could use the function without any difficulties, including earlier today. But for a few hours now I have been getting the following error mess...
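
For reference, a minimal sketch of the calling pattern described (notebook path and parameters are hypothetical; dbutils is predefined in Databricks notebooks):

result = dbutils.notebook.run(
    "/Shared/child_notebook",    # placeholder path
    600,                         # timeout in seconds
    {"run_date": "2024-01-01"},  # parameters passed to the child notebook
)
print(result)  # whatever the child passed to dbutils.notebook.exit()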

Latest Reply
Markus
New Contributor II
  • 2 kudos

Hello Community,
The issue occurred due to a changed central configuration. Recommendation by Databricks: "Admin Protection: New feature and security recommendations for No Isolation Shared clusters". Here is the link to the current restrictions: Enable ...

1 More Replies
NOOR_BASHASHAIK
by Contributor
  • 4161 Views
  • 4 replies
  • 4 kudos

Azure Databricks VM type for OPTIMIZE with ZORDER on a single column

Dears,
I was trying to check which Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp-valued (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi,
The Standard_F16s_v2 is a compute-optimized machine type. On the other hand, for Delta OPTIMIZE (both bin-packing and Z-Ordering), we recommend the Standard_DS_v2-series. Also, follow Hubert's recommendations.
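
For context, the operation being benchmarked has this shape (table and column names are placeholders; assumes a notebook where spark is predefined):

tables = ["db.t1", "db.t2"]  # stand-in for the ~5000 Delta tables
for t in tables:
    spark.sql(f"OPTIMIZE {t} ZORDER BY (event_ts)")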

3 More Replies
KKo
by Contributor III
  • 4075 Views
  • 2 replies
  • 7 kudos

Incompatible format detected while writing in Parquet format.

I am writing/reading data from Azure Databricks to a data lake. I wrote a dataframe to a path in Delta format using the query below; later I realized that I need the data in Parquet format, and I went to the storage account and manually deleted the filepat...

Latest Reply
KKo
Contributor III
  • 7 kudos

Update: I tried Clear state and outputs, which did not help, but when I restarted the cluster it worked without an issue. Though the issue is fixed, I still don't know what caused it in the first place.
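
A sketch of the likely cause (the path is a placeholder; dbutils is predefined in Databricks notebooks): a path once written as Delta keeps a _delta_log directory, and a plain Parquet write there is refused until that metadata, and any cached reference to it, is gone.

df = spark.range(10)
path = "/mnt/lake/tmp/demo"  # hypothetical path

df.write.format("delta").mode("overwrite").save(path)

# Rewriting the same path as plain Parquet fails with "Incompatible format
# detected" while the Delta log (or a cached reference to it) remains.
# Removing the directory entirely before the Parquet write avoids that:
dbutils.fs.rm(path, recurse=True)
df.write.format("parquet").mode("overwrite").save(path)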

1 More Replies
John_BardessGro
by New Contributor II
  • 6029 Views
  • 2 replies
  • 4 kudos

Cluster Reuse for delta live tables

I have several delta live table notebooks that are tied to different delta live table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments but is it possible for these delta live table jobs (w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

The same DLT job (workflow) will use the same cluster in development mode (shutdown in 2 h) and a new one in production (shutdown 0). Although in JSON you can manipulate that value:
{
  "configuration": {
    "pipelines.clusterShutdown.delay": "60s"
  }
}
Yo...

1 More Replies
William_Scardua
by Valued Contributor
  • 5127 Views
  • 3 replies
  • 4 kudos

How do you structure and store your medallion architecture?

Hi guys,
What do you suggest for creating a medallion architecture? How many data lake zones and which ones, how to store the data, which databases to use for storage, anything. I'm thinking of these zones:
1. landing zone, file storage in /landing_zone - databricks database.bro...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @William Scardua, I would highly recommend using Delta Live Tables (DLT) for your use case. Please check the docs with sample notebooks here: https://docs.databricks.com/workflows/delta-live-tables/index.html
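
As a rough illustration of how DLT maps onto medallion zones, a minimal sketch (paths, names, and columns are hypothetical; spark and dlt are available inside a DLT pipeline, and cloudFiles is Auto Loader):

import dlt
from pyspark.sql import functions as fn

@dlt.table(comment="Bronze: raw files from the landing zone")
def bronze_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/landing_zone/events"))  # placeholder path

@dlt.table(comment="Silver: cleaned and typed")
def silver_events():
    return (dlt.read_stream("bronze_events")
            .filter(fn.col("id").isNotNull())
            .withColumn("ingested_at", fn.current_timestamp()))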

2 More Replies
Chris_Shehu
by Valued Contributor III
  • 3959 Views
  • 1 reply
  • 5 kudos

Resolved! Getting errors while following Microsoft Databricks Best-Practices for DevOps Integration

I'm currently trying to follow the "Software engineering best practices for notebooks - Azure Databricks" guide, but I keep running into the following during step 4.5 (Run the test):
============================= test session starts =======================...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 5 kudos

Closing the loop on this in case anyone gets stuck in the same situation. You can see in the images that transforms_test.py shows a different icon than testdata.csv. This is because it was saved as a Jupyter notebook, not a .py file. When the ...

  • 5 kudos
140015
by New Contributor III
  • 1288 Views
  • 1 reply
  • 0 kudos

Resolved! Is an S3 DBFS mount faster than direct access?

Hi,
Is there any speed difference between a mounted S3 bucket and direct access when reading/writing Delta tables or other types of files? I tried to find something in the docs, but didn't find anything.

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 0 kudos

Hi @Jacek Dembowiak, behind the scenes, mounting an S3 bucket and reading from it works the same way as directly accessing it. Mounts are just metadata; the underlying access mechanism is the same for both scenarios you mentioned. Mounting the ...
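
A small sketch of the two access styles (bucket and mount names are hypothetical): both lines resolve to the same underlying S3 reads; the mount only provides an alias.

df_direct = spark.read.format("delta").load("s3a://my-bucket/tables/events")
df_mounted = spark.read.format("delta").load("/mnt/my-bucket/tables/events")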

Mado
by Valued Contributor II
  • 1930 Views
  • 2 replies
  • 3 kudos

How to apply Pandas functions on PySpark DataFrame?

Hi, I want to apply pandas functions (like isna, concat, append, etc.) on a PySpark DataFrame in such a way that computations are done on a multi-node cluster. I don't want to convert the PySpark DataFrame into a pandas DataFrame since, I think, only one node is...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

The best is to use pandas on Spark; it is virtually interchangeable, just a different API for the Spark DataFrame:
import pyspark.pandas as ps

psdf = ps.range(10)
sdf = psdf.to_spark().filter("id > 5")
sdf.show()
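
Going the other direction, an existing PySpark DataFrame can be viewed through the pandas API while staying distributed; a sketch (pandas_api is the name in recent PySpark releases, where it replaced to_pandas_on_spark):

sdf = spark.range(10)     # a regular PySpark DataFrame
psdf = sdf.pandas_api()   # pandas-on-Spark view, still distributed
print(psdf.isna().sum())  # pandas-style call executed by Spark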

1 More Replies
AJDJ
by New Contributor III
  • 6583 Views
  • 9 replies
  • 4 kudos

Delta Lake Demo - Not working

Hi there, I imported the Delta Lake demo notebook from the Databricks link and at command 12 it errors out. I tried other ways and paths but couldn't get past the error. Maybe the notebook is outdated? https://www.databricks.com/notebooks/Demo_Hub-Delta_La...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @AJ DJ, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

8 More Replies
