Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Raymond_Garcia
by Contributor II
  • 8303 Views
  • 4 replies
  • 4 kudos

Resolved! Issue with Databricks and DRIVER_LIBRARY_INSTALLATION_FAILURE?

I have about 5 Maven libraries, and with all of them I hit the same issue in both Jobs and Notebooks. How long do I have to wait? Is there another solution? Thank you very much!

issue with databricks
Latest Reply
Debayan
Databricks Employee
  • 4 kudos

@Raymond Garcia, could you please open a support case with Databricks? We will triage the issue and provide a solution.

3 More Replies
mmlime
by New Contributor III
  • 2772 Views
  • 4 replies
  • 0 kudos

Resolved! Can I use VMs from Pool for my Workflow cluster?

Hi, is there no option to take VMs from a Pool for a new workflow (Azure Cloud)? The default schema for a new cluster: { "num_workers": 0, "spark_version": "10.4.x-scala2.12", "spark_conf": { "spark.master": "local[*, 4]", "spark...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 0 kudos

@Michal Mlaka I just checked on the UI and I could find the pools listed under worker type in a job cluster configuration. It should work.

3 More Replies
kthneighbor
by New Contributor II
  • 3180 Views
  • 5 replies
  • 2 kudos

Resolved! What will be the next LTS version after 10.4?

What will be the next LTS version after 10.4?

Latest Reply
youssefmrini
Databricks Employee
  • 2 kudos

Hello, 11.3 LTS is now available https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/11.3

4 More Replies
HenriqueMoniz
by New Contributor II
  • 1815 Views
  • 1 reply
  • 2 kudos

How to access Delta Live Tables feature?

Hi, I tried following the Delta Live Tables quickstart (https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-quickstart.html), but I don't see the Pipelines tab under the Jobs page in my workspace. The same guide mentions...

Latest Reply
virbickt
New Contributor III
  • 2 kudos

Hi, you need a Premium workspace for the Pipelines tab to show up. This is what I see on my workspace with the Standard Pricing Tier selected, and this is what I see on my workspace with the Premium Pricing Tier:

THIAM_HUATTAN
by Valued Contributor
  • 3846 Views
  • 3 replies
  • 3 kudos

Using R, how do we write a CSV file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData':
write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)
I do not see any error from the above, but when I list files below:
%fs ls /tmp
I do not see any files written there. Why?

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

Hi Thiam, thank you for reaching out to us. In this case it seems that you have written a file to the OS /tmp and tried to fetch the same folder in DBFS.
Written >> /tmp/TotalData.csv
Reading >> /dbfs/tmp/TotalData.csv
Please try to execute write.csv wit...
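The mismatch above can be simulated with a short, stdlib-only Python sketch. Note the paths here are hypothetical stand-ins, not real Databricks paths: `local_root` plays the role of the driver's OS /tmp and `dbfs_root` plays the role of DBFS /tmp.

```python
# Simulation of the /tmp vs /dbfs/tmp mismatch: a file written under one root
# is invisible when you list a different root. On Databricks, write.csv('/tmp/...')
# targets the driver's local disk, while %fs ls /tmp lists DBFS.
import os
import tempfile

local_root = tempfile.mkdtemp(prefix="driver_tmp_")  # stand-in for the OS /tmp
dbfs_root = tempfile.mkdtemp(prefix="dbfs_tmp_")     # stand-in for DBFS /tmp

# Write the file to the "local" root, as write.csv('/tmp/TotalData.csv') would.
with open(os.path.join(local_root, "TotalData.csv"), "w") as f:
    f.write("col\n1\n")

print("TotalData.csv" in os.listdir(dbfs_root))   # False: listed the wrong root
print("TotalData.csv" in os.listdir(local_root))  # True: the file is here

# Writing under the "DBFS" root (like /dbfs/tmp/TotalData.csv) makes it visible
# to a DBFS-side listing.
with open(os.path.join(dbfs_root, "TotalData.csv"), "w") as f:
    f.write("col\n1\n")
print("TotalData.csv" in os.listdir(dbfs_root))   # True
```

The takeaway is that the write path and the read path must refer to the same filesystem; on Databricks the DBFS FUSE mount under /dbfs is the usual bridge.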

2 More Replies
jm99
by New Contributor III
  • 2666 Views
  • 2 replies
  • 3 kudos

Ingesting Kafka Avro into a Delta STREAMING LIVE TABLE

Using Azure Databricks: I can create a DLT table in Python using
import dlt
import pyspark.sql.functions as fn
from pyspark.sql.types import StringType

@dlt.table( name = "<<landingTable>>", path = "<<storage path>>", comment = "<< descri...

Latest Reply
lninza
New Contributor II
  • 3 kudos

Hi @John Mathews, did you find a way to progress here? I am stuck at the same point...

1 More Replies
jon1
by New Contributor II
  • 1138 Views
  • 1 reply
  • 0 kudos

How to dedupe a source table prior to merge through JDBC SQL driver integration

Hi! We're working with change event data from relational and NoSQL databases, then processing and ingesting that into Databricks. It's streamed from source to our messaging platform; then our connector pushes it to Databricks. Right now we're doing th...

Latest Reply
jon1
New Contributor II
  • 0 kudos

Update on the theory we are looking at. It'd be similar to the below (with necessary changes to support best practices for MERGE, such as reducing the search space):
-- View for deduping pre-merge
CREATE OR REPLACE TEMPORARY VIEW {view} AS SELECT * EXCEPT ...
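As a rough illustration of the dedupe step only (not the poster's actual SQL), here is a minimal plain-Python sketch of the keep-latest-event-per-key idea that a `ROW_NUMBER() ... = 1` view expresses. The field names `id` and `seq` are assumptions for the example, not from the original post.

```python
# Given a batch of change events, keep only the latest event per primary key,
# so the subsequent MERGE sees at most one source row per target row.
def dedupe_latest(events, key="id", order="seq"):
    """Return one event per key: the one with the highest `order` value."""
    latest = {}
    for event in events:
        k = event[key]
        if k not in latest or event[order] > latest[k][order]:
            latest[k] = event
    return list(latest.values())

batch = [
    {"id": 1, "seq": 1, "val": "a"},
    {"id": 1, "seq": 3, "val": "c"},   # latest for id=1
    {"id": 2, "seq": 2, "val": "b"},
    {"id": 1, "seq": 2, "val": "b2"},
]
deduped = dedupe_latest(batch)
print(sorted((e["id"], e["val"]) for e in deduped))  # [(1, 'c'), (2, 'b')]
```

In Spark SQL the same shape is typically a window over the key ordered by the sequence column descending, filtered to row number 1, materialized as a view that the MERGE reads from.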

NickMendes
by New Contributor III
  • 1920 Views
  • 3 replies
  • 2 kudos

Resolved! Alert e-mail is not recognizing my html text

I've always used alert e-mail notifications with my custom message, written in HTML. The problem is that today it suddenly stopped working and the alert e-mail notification arrives distorted, as the HTML doesn't render anymore. Does anyone know w...

Latest Reply
NickMendes
New Contributor III
  • 2 kudos

Apparently, it has been corrected and is working again. Thank you, everyone.

2 More Replies
Mado
by Valued Contributor II
  • 8335 Views
  • 4 replies
  • 2 kudos

Resolved! Pandas API on Spark, Does it run on a multi-node cluster?

Hi, I have a few questions about "Pandas API on Spark". Thanks for taking the time to read my questions.
1) Are the inputs to these functions pandas DataFrames or PySpark DataFrames?
2) When I use any pandas function (like isna, size, apply, where, etc.), does it ru...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi @Mohammad Saber, a pandas dataset lives on a single machine and is naturally iterable locally within that machine. However, a pandas-on-Spark dataset lives across multiple machines, and it is computed in a distributed manner. It is difficu...

3 More Replies
Markus
by New Contributor II
  • 3016 Views
  • 2 replies
  • 2 kudos

dbutils.notebook.run raises HTTP 401 Unauthorized Error

Hello, for a while I have used dbutils.notebook.run to call additional notebooks and pass parameters to them. So far I could use the function without any difficulties, including today. But since a few hours ago I have been getting the following error mess...

Latest Reply
Markus
New Contributor II
  • 2 kudos

Hello Community, the issue occurred due to a changed central configuration. Recommendation by Databricks: "Admin Protection: New feature and security recommendations for No Isolation Shared clusters". Here is the link to the current restrictions: Enable ...

1 More Replies
NOOR_BASHASHAIK
by Contributor
  • 4343 Views
  • 4 replies
  • 4 kudos

Azure Databricks VM type for OPTIMIZE with ZORDER on a single column

Dears, I was trying to check which Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp-value (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi, the Standard_F16s_v2 is a compute-optimized machine type. On the other hand, for Delta OPTIMIZE (both bin-packing and Z-Ordering), we recommend the Standard_DS_v2-series. Also, follow Hubert's recommendations.

3 More Replies
KKo
by Contributor III
  • 4279 Views
  • 2 replies
  • 7 kudos

Incompatible format detected while writing in Parquet format.

I am writing/reading data from Azure Databricks to the data lake. I wrote a dataframe to a path in Delta format using the query below; later I realized that I need the data in Parquet format, and I went to the storage account and manually deleted the filepat...

Latest Reply
KKo
Contributor III
  • 7 kudos

Update: I tried "Clear state and outputs", which did not help, but when I restarted the cluster it worked without an issue. Though the issue is fixed, I still don't know what caused it.

1 More Replies
John_BardessGro
by New Contributor II
  • 6152 Views
  • 2 replies
  • 4 kudos

Cluster Reuse for delta live tables

I have several Delta Live Table notebooks that are tied to different Delta Live Table jobs so that I can use multiple target schema names. I know it's possible to reuse a cluster for job segments, but is it possible for these Delta Live Table jobs (w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

The same DLT job (workflow) will use the same cluster in development mode (shutdown in 2h) and a new one in production (shutdown 0). Although in JSON you can manipulate that value:
{ "configuration": { "pipelines.clusterShutdown.delay": "60s" } }
Yo...

1 More Replies
William_Scardua
by Valued Contributor
  • 5344 Views
  • 3 replies
  • 4 kudos

How do you structure and store your medallion architecture?

Hi guys, what do you suggest for creating a medallion architecture? How many data lake zones, and which ones? How to store the data, and which databases to use to store it? Anything helps. I think these zones: 1. landing zone, file storage in /landing_zone - databricks database.bro...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @William Scardua, I would highly recommend you use Delta Live Tables (DLT) for your use case. Please check the docs, with sample notebooks, here: https://docs.databricks.com/workflows/delta-live-tables/index.html

2 More Replies
Chris_Shehu
by Valued Contributor III
  • 4193 Views
  • 1 reply
  • 5 kudos

Resolved! Getting errors while following Microsoft Databricks Best-Practices for DevOps Integration

I'm currently trying to follow the Software engineering best practices for notebooks - Azure Databricks guide, but I keep running into the following during step 4.5 (Run the test): ============================= test session starts =======================...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 5 kudos

Closing the loop on this in case anyone gets stuck in the same situation. You can see in the images that transforms_test.py shows a different icon than testdata.csv. This is because it was saved as a Jupyter notebook, not a .py file. When the ...
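The mix-up above can also be detected programmatically: a file exported from Jupyter is a JSON document with a top-level "cells" key, even if it has been renamed to .py, while a real .py test file is plain source. This is a minimal sketch (the helper name and sample strings are hypothetical):

```python
# Distinguish a Jupyter-notebook JSON file from a plain .py source file.
import json

def looks_like_jupyter_notebook(text):
    """Return True if `text` parses as Jupyter notebook JSON (has "cells")."""
    try:
        doc = json.loads(text)
    except ValueError:
        return False
    return isinstance(doc, dict) and "cells" in doc

notebook_content = '{"cells": [], "nbformat": 4, "nbformat_minor": 5}'
plain_py_content = "def test_transform():\n    assert 1 + 1 == 2\n"

print(looks_like_jupyter_notebook(notebook_content))  # True
print(looks_like_jupyter_notebook(plain_py_content))  # False
```

pytest collects only genuine .py source files, which is why a notebook saved under a .py name fails to run as a test module.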


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group
Labels