Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Maxi1693
by New Contributor II
  • 1626 Views
  • 4 replies
  • 1 kudos

Monitoring Structured Streaming in an external sink

Hi! Today I am trying to collect some metrics to create a plot for my Spark Structured Streaming job. It is configured with a trigger(processingTime="30 seconds") and I am trying to collect data with the following listener class (just an example).  # D...

Latest Reply
MichTalebzadeh
Contributor III
  • 1 kudos

Hi, I have done further investigation on this. Below I have tried to illustrate the issue through PySpark code: def onQueryProgress(self, event): print("onQueryProgress") # Access micro-batch data microbatch_data = event.progre...
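A minimal sketch of the listener pattern being discussed, assuming PySpark 3.4 or later where the Python StreamingQueryListener API is available; the class name and metric choices are illustrative, not the poster's exact code.

    from pyspark.sql.streaming import StreamingQueryListener

    class MetricsListener(StreamingQueryListener):
        def onQueryStarted(self, event):
            print(f"Query started: {event.id}")

        def onQueryProgress(self, event):
            # event.progress carries the micro-batch metrics for the last trigger
            p = event.progress
            print(f"batchId={p.batchId}, numInputRows={p.numInputRows}, "
                  f"processedRowsPerSecond={p.processedRowsPerSecond}")

        def onQueryTerminated(self, event):
            print(f"Query terminated: {event.id}")

    # 'spark' is the SparkSession provided by the Databricks notebook
    spark.streams.addListener(MetricsListener())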

  • 1 kudos
3 More Replies
Sen
by New Contributor
  • 4812 Views
  • 8 replies
  • 1 kudos

Resolved! Performance enhancement while writing dataframes into Parquet tables

Hi, I am trying to write the contents of a dataframe into a Parquet table using the command below: df.write.mode("overwrite").format("parquet").saveAsTable("sample_parquet_table"). The dataframe contains an extract from one of our source systems, which h...
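A minimal sketch of one common tweak for this kind of write, assuming the slowness comes from many small output files; the repartition count is illustrative and should be tuned to the data volume.

    # Consolidate partitions before writing so fewer, larger Parquet files are produced
    (df.repartition(64)
       .write
       .mode("overwrite")
       .format("parquet")
       .saveAsTable("sample_parquet_table"))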

Latest Reply
MichTalebzadeh
Contributor III
  • 1 kudos

Hi, I agree with the reply about the benefits of Delta tables; specifically, Delta brings additional features such as ACID transactions and schema evolution. However, I am not sure whether the problem below, and I quote, "The problem is, this statement ...

  • 1 kudos
7 More Replies
Neha_Gupta
by New Contributor II
  • 701 Views
  • 2 replies
  • 0 kudos

Job Concurrency Queue not working as expected

Hi, we have created a Databricks job in Workflows where concurrent runs are set to 10 and the queue is enabled. We were trying to perform concurrent user testing by triggering 100 job runs using a JMeter script. We have observed that the first 10 job...
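A hedged sketch of the job settings involved, expressed as a Jobs 2.1 API update call; the host, token, and job_id are placeholders, and the field names should be checked against the Jobs API documentation for your workspace.

    import requests

    host = "https://<workspace-host>"       # placeholder
    token = "<personal-access-token>"       # placeholder

    payload = {
        "job_id": 12345,                    # placeholder job id
        "new_settings": {
            "max_concurrent_runs": 10,      # runs allowed to execute in parallel
            "queue": {"enabled": True},     # additional triggers wait in the queue
        },
    }
    resp = requests.post(f"{host}/api/2.1/jobs/update",
                         headers={"Authorization": f"Bearer {token}"},
                         json=payload)
    resp.raise_for_status()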

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Neha_Gupta, If you want to change this behaviour and allow more queued jobs to run in parallel, ensure that your Databricks cluster has sufficient resources (CPU, memory, etc.) to handle the desired concurrency. Job Configuration: Check the job c...

  • 0 kudos
1 More Replies
aditya_pawase
by New Contributor
  • 470 Views
  • 1 replies
  • 0 kudos

How to send dataframe output from a notebook to my Angular app?

I have two dataframes in a notebook after transforming the data. I want to use that transformed data, which will be updated daily, in my Angular app. How do I send it to my Angular app?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @aditya_pawase, You can use a server framework (such as Express.js) to create this endpoint.
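A minimal Python sketch of the same idea, using Flask instead of the Express.js framework mentioned above; the output path and route are assumptions, and the notebook job is assumed to export its result somewhere the API service can read.

    from flask import Flask, jsonify
    import pandas as pd

    app = Flask(__name__)

    @app.route("/api/daily-data")
    def daily_data():
        # Placeholder path: point this at the file the daily notebook job writes
        df = pd.read_parquet("/data/exports/daily_output.parquet")
        return jsonify(df.to_dict(orient="records"))

    if __name__ == "__main__":
        app.run(port=5000)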

  • 0 kudos
kmaley
by New Contributor
  • 548 Views
  • 1 replies
  • 0 kudos

Delta Live Tables - Refer to the output table and load by checking a condition for SCD type 1

I have a scenario to implement using Delta Live Tables. I get the id and timestamp columns from the source, and I have to load them into my Delta Live streaming output table only if the source timestamp is less than the existing value in the output tab...
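A hedged sketch of the standard DLT change-data pattern with dlt.apply_changes for SCD type 1; the table and column names are loosely taken from the post. Note that apply_changes keeps the row with the latest sequence_by value, so the "only load if the source timestamp is older" rule would need a custom sequencing expression or explicit MERGE logic instead.

    import dlt
    from pyspark.sql import functions as F

    dlt.create_streaming_table("output_table")

    dlt.apply_changes(
        target="output_table",
        source="source_stream",          # an upstream streaming table or view
        keys=["id"],
        sequence_by=F.col("timestamp"),  # ordering column for deduplication
        stored_as_scd_type=1,
    )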

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @kmaley, Double-check the table and column names, ensuring they match your actual Delta tables. Verify that the schema of the output table matches the expected schema. Confirm that the Delta Live Tables environment is set up correctly.

  • 0 kudos
PassionateDBD
by New Contributor II
  • 1479 Views
  • 1 replies
  • 0 kudos

DLT full refresh

Running a task with full refresh in Delta Live Tables removes existing data and reloads it from scratch. We are ingesting data from an Event Hub topic and from files. The Event Hub topic stores messages for seven days after arrival. If we would run a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @PassionateDBD, About preserving the full history of data, consider marking certain tables as the “original source.” Unfortunately, there isn’t a direct built-in mechanism to explicitly mark tables as such in Delta Live Tables. However, you can ado...
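One concrete option worth noting here: DLT supports the table property pipelines.reset.allowed, which excludes a table from full refresh. A minimal sketch, with an illustrative table name and placeholder source:

    import dlt

    @dlt.table(
        name="eventhub_raw_history",
        comment="Raw events preserved across full refreshes",
        table_properties={"pipelines.reset.allowed": "false"},  # skip on full refresh
    )
    def eventhub_raw_history():
        # Placeholder source; in practice this would be the Event Hub ingestion stream
        return spark.readStream.table("source_events")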

  • 0 kudos
PassionateDBD
by New Contributor II
  • 2116 Views
  • 1 replies
  • 0 kudos

MLOps + DLT

What are the best practices for using MLOps and DLT together? This page https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations states that you cannot use a single user cluster to query tables created by a Unity Catalog-enab...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @PassionateDBD, As per the provided links, DLT tables created by a Unity Catalog-enabled pipeline cannot be queried using a single-user cluster. Instead, a shared cluster with Databricks Runtime 13.1 and above is required. MLOps cluster...

  • 0 kudos
raghu2
by New Contributor III
  • 1573 Views
  • 1 replies
  • 0 kudos

DAB run

Hello All, I am running this command: databricks bundle run -t dev dltPpl_job --debug. Bundle name: dltPpl. Bundle was generated using: databricks bundle init --target dev. Error message: Error: exit status 1. Failed to marshal state to json: unsupported a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @raghu2, Could you please verify that your bundle configuration is correctly set up? Ensure that the “for_each_task” attribute is used appropriately.

  • 0 kudos
Kaniz_Fatma
by Community Manager
  • 2645 Views
  • 2 replies
  • 0 kudos
Latest Reply
_raman_
New Contributor II
  • 0 kudos

I have tried to connect to the MySQL database using the above code but failed to connect. I am getting this error: DatabaseError: 2003 (HY000): Can't connect to MySQL server on 'localhost:3306' (111). After using host 127.0.0.1 I am getting this error: Da...
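A hedged sketch of reading MySQL from a Databricks notebook via Spark's JDBC source; 'localhost' resolves to the driver node itself, so the host must be the MySQL server's address as reachable from the cluster. All connection values below are placeholders.

    jdbc_url = "jdbc:mysql://<mysql-host>:3306/<database>"

    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", "<table>")
          .option("user", "<user>")
          .option("password", "<password>")
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .load())

    display(df)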

  • 0 kudos
1 More Replies
Vishwanath_Rao
by New Contributor II
  • 914 Views
  • 2 replies
  • 0 kudos

Photon plan invariant violated Error

We've run into a niche error where we get the below message only on our non-prod environment, with the same data and the same code as our prod environment: org.apache.spark.sql.execution.adaptive.InvalidAQEPlanException: Photon plan invariant violat...

Latest Reply
Vishwanath_Rao
New Contributor II
  • 0 kudos

Thank you @Kaniz_Fatma! It looks like the issue was with a recent release on the Databricks end; I'd raised a support ticket just to be sure. The difference also died down when we set these two up at the cluster level: spark.conf.set("spark.sql.adapt...
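For illustration only, since the exact properties are truncated above: session-level Spark settings are applied like this, and the two keys shown are assumptions about commonly adjusted AQE options, not a record of what was actually changed.

    # Illustrative AQE-related settings; the actual keys used in the reply are truncated
    spark.conf.set("spark.sql.adaptive.enabled", "false")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")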

  • 0 kudos
1 More Replies
slothPetete
by New Contributor II
  • 1282 Views
  • 2 replies
  • 0 kudos

Error with mosaic.enable_mosaic() when creating a DLT pipeline with the Mosaic lib

The error was raised when I tried to start a DLT pipeline with simple code, just to start experimenting with DLT. The primary library was Mosaic, which the docs instruct to install before importing. The code is roughly as follows: %pip ins...
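A hedged sketch of the usual Mosaic setup in a notebook, assuming the databricks-mosaic package is installed on the cluster or pipeline first; the install step runs in its own cell.

    # %pip install databricks-mosaic   (run in its own notebook cell)

    import mosaic as mos
    mos.enable_mosaic(spark, dbutils)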

Labels: Data Engineering, Delta Live Table, dlt, geospatial, mosaic
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @slothPetete, The error message you’re encountering, “OAuth error response, generally means someone clicked cancel: access_denied (errorCode=180002),” indicates that the authorization process was interrupted or denied. Let’s break it down: OA...

  • 0 kudos
1 More Replies
vdeorios
by New Contributor II
  • 2321 Views
  • 4 replies
  • 2 kudos

Resolved! 404 on GET Billing usage data (API)

I'm trying to get my billing usage data from the Databricks API (documentation: https://docs.databricks.com/api/gcp/account/billableusage/download) but I keep getting a 404 error. Code: import requests; import json; token = dbutils.notebook.entry_point.getDbu...
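A hedged sketch of the account-level billable usage download call; a common cause of a 404 is calling the workspace host instead of the accounts host. The account id and token are placeholders, and the endpoint shape should be verified against the linked documentation.

    import requests

    account_id = "<account-id>"
    token = "<account-admin-token>"

    resp = requests.get(
        f"https://accounts.gcp.databricks.com/api/2.0/accounts/{account_id}/usage/download",
        headers={"Authorization": f"Bearer {token}"},
        params={"start_month": "2024-01", "end_month": "2024-03"},
    )
    resp.raise_for_status()
    print(resp.text)  # the response body is CSV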

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @vdeorios, Just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

  • 2 kudos
3 More Replies
ac0
by New Contributor III
  • 1611 Views
  • 2 replies
  • 4 kudos

Resolved! Can I have additional logic in a DLT notebook that is unrelated to directly creating DLTs?

I have an Azure Storage Data Table that I would like to update based on records that were just streamed into a Delta Live Table. Below is example code: @dlt.create_table( comment="comment", table_properties={ "pipelines.autoOptimize.managed": ...
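A hedged sketch of the shape of pipeline code in the post: DLT manages only what the decorated function returns, so side effects such as updating an Azure table are usually better placed in a separate downstream task. The source name is a placeholder and the table properties are taken from the snippet above.

    import dlt

    @dlt.table(
        comment="comment",
        table_properties={"pipelines.autoOptimize.managed": "true"},
    )
    def streamed_records():
        # Placeholder upstream source for the streamed records
        return spark.readStream.table("source_stream")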

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @ac0, Please check @raphaelblg 's response and let us know if this helped to resolve your issue. If it did, please mark it as the accepted solution.

  • 4 kudos
1 More Replies
Jennifer
by New Contributor III
  • 5079 Views
  • 5 replies
  • 0 kudos

Resolved! Import python file to notebook doesn't work

I followed the documentation here under the section "Import a file into a notebook" to import a shared Python file among notebooks used by Delta Live Tables. But it can sometimes find the module and sometimes not, returning the exception No module named ...
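A minimal sketch of the documented pattern for importing a shared .py file into a notebook: append the folder containing the file to sys.path before importing. The path and module name below are placeholders.

    import os
    import sys

    # Placeholder path to the folder that holds the shared module, e.g. in a Repo
    sys.path.append(os.path.abspath("/Workspace/Repos/<user>/<repo>/shared"))

    import shared_utils  # e.g. shared_utils.py living in that folder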

Latest Reply
Vartika
Moderator
  • 0 kudos

Thank you so much for getting back to us, @Jennifer MJ. It's really great of you to send in the solution. Would you be happy to mark the answer as best so other community members can find the solution quickly and easily? We really appreciate your ti...

  • 0 kudos
4 More Replies
117074
by New Contributor III
  • 2092 Views
  • 3 replies
  • 0 kudos

Notebook Visualisations suddenly not working

Hi all, I have a Python script which runs SQL code against our Delta Live Tables and returns a pandas dataframe. I do this multiple times and then use 'display(pandas_dataframe)'. Once this displays, I then create a visualization from the UI which is t...

Latest Reply
117074
New Contributor III
  • 0 kudos

Thank you for the detailed response, Kaniz, I appreciate it! I do think it may have been cache issues due to there being no Spark computation when running them when the error occurred. It did lead me down a train of thought: is it possible to extract t...

  • 0 kudos
2 More Replies