Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mwoods
by New Contributor III
  • 3071 Views
  • 2 replies
  • 2 kudos

Delta Live Tables error with Kafka SSL

We have a Spark streaming job that consumes data from a Kafka topic and writes out to Delta tables in Unity Catalog. Looking to refactor it to use Delta Live Tables, but it appears that it is not possible at present to have a DLT pipeline that can acc...

Latest Reply
gabriall
New Contributor II
  • 2 kudos

Indeed, it's already patched. You just have to configure your pipeline on the "preview" channel.

1 More Replies
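
For readers hitting the same wall, here is a minimal sketch of what a DLT table reading Kafka over SSL can look like once the pipeline is on the preview channel, as gabriall describes. The broker address, topic, and truststore/keystore paths are placeholders, not values from the thread.

    import dlt
    from pyspark.sql.functions import col

    # Sketch only: broker, topic, and JKS paths are placeholders.
    @dlt.table(name="kafka_raw")
    def kafka_raw():
        return (
            spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker1.example.com:9093")
            .option("subscribe", "my_topic")
            .option("kafka.security.protocol", "SSL")
            .option("kafka.ssl.truststore.location", "/dbfs/certs/truststore.jks")
            .option("kafka.ssl.keystore.location", "/dbfs/certs/keystore.jks")
            .load()
            .select(col("key").cast("string"), col("value").cast("string"))
        )
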
Noosphera
by New Contributor III
  • 9618 Views
  • 0 replies
  • 0 kudos

Resolved! How to reinstantiate the CloudFormation template for AWS

Hi everyone! I am new to Databricks and chose to use the CloudFormation template to create my AWS workspace. I regretfully must admit I felt creative in the process and varied the suggested stack name, and that must have created errors which ended...

Data Engineering
AWS
Cloudformation template
Unity Catalog
Erik
by Valued Contributor III
  • 3316 Views
  • 0 replies
  • 0 kudos

Why not enable "decommissioning" in spark?

You can enable "decommissioning" in Spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to...

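
For context, these are the flags involved, available in open-source Spark since 3.1. A minimal sketch using a plain SparkSession builder for illustration; on Databricks you would put these in the cluster's Spark config instead:

    from pyspark.sql import SparkSession

    # Decommissioning flags from open-source Spark (3.1+). On Databricks,
    # set these in the cluster's Spark config rather than in code.
    spark = (
        SparkSession.builder
        .config("spark.decommission.enabled", "true")
        .config("spark.storage.decommission.enabled", "true")
        .config("spark.storage.decommission.rddBlocks.enabled", "true")
        .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
        .getOrCreate()
    )
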
jimbo
by New Contributor II
  • 9197 Views
  • 0 replies
  • 0 kudos

Pyspark datatype missing microsecond precision last three SSS: h:mm:ss:SSSSSS - datetype

Hi all, we are having issues with the datetime data type in Spark when ingesting files. Effectively the source data carries six digits of sub-second (microsecond) precision, but the most we can extract from the data type is three. For example 12:03:23.123, but what is requ...

Data Engineering
pyspark datetype precision missing
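
Worth noting: Spark's TimestampType itself stores microseconds, so truncation usually happens at parse or display time rather than in storage. A minimal sketch of round-tripping six fractional digits, with a made-up literal:

    from pyspark.sql.functions import to_timestamp, date_format

    # Illustrative literal; TimestampType keeps microsecond precision.
    df = spark.createDataFrame([("2024-01-15 12:03:23.123456",)], ["ts_str"])
    df = df.withColumn("ts", to_timestamp("ts_str", "yyyy-MM-dd HH:mm:ss.SSSSSS"))
    df.select(date_format("ts", "HH:mm:ss.SSSSSS").alias("ts_micro")).show(truncate=False)
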
Sangram
by New Contributor III
  • 2447 Views
  • 0 replies
  • 0 kudos

Unable to mount ADLS gen2 to databricks file system

I am unable to mount an ADLS Gen2 storage path onto a Databricks storage path. It throws the error "unsupported azure scheme: abfss". May I know the reason? Below are the steps that I followed: 1. Create a service principal. 2. Store the service principal's s...

[attached screenshot: Sangram_0-1700274947304.png]
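
For comparison, a sketch of the documented service-principal (OAuth) mount pattern; the account, container, scope, and key names below are all placeholders:

    # All names below are placeholders; the secret must already exist in the scope.
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret":
            dbutils.secrets.get(scope="my-scope", key="sp-secret"),
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    dbutils.fs.mount(
        source="abfss://mycontainer@myaccount.dfs.core.windows.net/",
        mount_point="/mnt/mydata",
        extra_configs=configs,
    )
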
Rdipak
by New Contributor II
  • 1590 Views
  • 2 replies
  • 0 kudos

Delta live table blocks pipeline autoloader rate limit

I have created an ETL pipeline with DLT. My first step is to ingest into a raw Delta table using Auto Loader file notifications. When I have 20k notifications, the pipeline runs well across all stages. But when we have a surge in the number of messages, the pipeline waits...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Did you try the following options: .option('cloudFiles.maxFilesPerTrigger', 10000) or cloudFiles.maxBytesPerTrigger?

1 More Replies
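
A minimal sketch of kulkpd's suggestion applied to an Auto Loader stream; the path and limit are illustrative:

    # Caps how much each micro-batch pulls in, smoothing out notification surges.
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.maxFilesPerTrigger", 10000)
        .load("s3://my-bucket/raw/")
    )
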
kulkpd
by Contributor
  • 2711 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader with filenotification

I am using DLT with file notification, and the DLT job is fetching just one notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day, but it is running hours behind. Any recommendations? spark.readStream.format("cloudFi...

Latest Reply
Rdipak
New Contributor II
  • 2 kudos

Can you set this value to a higher number and try? cloudFiles.fetchParallelism is 1 by default.

1 More Replies
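
A sketch of Rdipak's suggestion; the path and thread count are illustrative:

    # cloudFiles.fetchParallelism controls how many threads pull messages
    # from the queue (default 1).
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.fetchParallelism", 8)
        .load("s3://my-bucket/raw/")
    )
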
AndrewSilver
by New Contributor II
  • 1389 Views
  • 1 reply
  • 1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Latest Reply
kulkpd
Contributor
  • 1 kudos

I am using a job with a single task and multiple retries. Upon job retry the run_id gets changed; I tried using {{parent_run_id}} but it never worked, so I switched to: val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")

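
A Python counterpart to kulkpd's workaround is to pass the variable in as a task parameter and read it with a widget; the parameter name below is a choice made in the job configuration, not a built-in:

    # In the task configuration, set a parameter such as:
    #   {"parent_run_id": "{{parent_run_id}}"}
    # then read it in the notebook. The parameter name is our own choice.
    parent_run_id = dbutils.widgets.get("parent_run_id")
    print(f"parent run id: {parent_run_id}")
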
Shawn_Eary
by Contributor
  • 1700 Views
  • 0 replies
  • 0 kudos

Streaming Delta Live Tables Cluster Management

If I use code like this: -- 8:56 -- https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536 CREATE STREAMING LIVE TABLE report AS SELECT * FROM cloud_files("/mydata", "json") to create a STREAMING Delta Live Table through the Workflows section of...

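
For reference, a Python sketch equivalent to the SQL in the post, reusing the /mydata placeholder path from the video:

    import dlt

    # Streaming DLT table over Auto Loader; mirrors the SQL
    # CREATE STREAMING LIVE TABLE from the post.
    @dlt.table(name="report")
    def report():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mydata")
        )
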
Direo
by Contributor II
  • 9035 Views
  • 2 replies
  • 3 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi! When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library: /local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Latest Reply
JSatiro
New Contributor II
  • 3 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work: 1 - Created a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version). 2 - When cre...

1 More Replies
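
A sketch of JSatiro's recipe in code form, assuming Runtime 10.4 LTS (Spark 3.2.1) with the matching Deequ jar installed on the cluster; df and the "id" column are stand-ins for your own data:

    import os

    # pydeequ reads SPARK_VERSION to pick the right Deequ bindings; set it
    # before importing. "3.2" matches Runtime 10.4 LTS and is an assumption.
    os.environ["SPARK_VERSION"] = "3.2"

    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationSuite

    # df is an existing DataFrame with an "id" column (stand-in).
    check = Check(spark, CheckLevel.Error, "basic checks")
    result = (
        VerificationSuite(spark)
        .onData(df)
        .addCheck(check.isComplete("id"))
        .run()
    )
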
NathanE
by New Contributor II
  • 3177 Views
  • 1 reply
  • 1 kudos

Time travel on views

Hello, at my company we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application caches some data in memory, and to avoid synchronization issues with the data on Databricks, we rely heavil...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@NathanE As you said, based on the article below it may not be supported currently: https://docs.databricks.com/en/sql/user/materialized-views.html. At the same time, it looks as if a Materialized View is built on top of a table and is a synchronous operation (when...

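
As karthik_p notes, time travel applies to tables rather than views; a minimal sketch against a placeholder table name:

    # Delta time travel works on the underlying table, not the view.
    # Table name, version, and timestamp are placeholders.
    df_v5 = spark.read.option("versionAsOf", 5).table("catalog.schema.orders")
    df_at = spark.read.option("timestampAsOf", "2024-01-15").table("catalog.schema.orders")
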
DatabricksNIN
by New Contributor II
  • 1629 Views
  • 1 reply
  • 0 kudos

Pulling data from Azure Boards (Specifically 'Analytics Views' into databricks

Building upon a previous post/topic from one year ago, I am looking for best practices/examples on how to pull data from Azure Boards, specifically from 'Analytics Views', into Databricks for analysis. I have succeeded in doing so with 'Work Items...

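
In the absence of replies, one hedged sketch: Azure DevOps exposes an Analytics OData endpoint that can be queried directly. The organization, project, field list, and secret names below are assumptions, and this queries the raw feed rather than a saved 'Analytics View':

    import requests

    # All names here are placeholders; the PAT comes from a secret scope.
    org, project = "myorg", "myproject"
    url = f"https://analytics.dev.azure.com/{org}/{project}/_odata/v3.0-preview/WorkItems"
    pat = dbutils.secrets.get(scope="my-scope", key="azure-devops-pat")

    resp = requests.get(
        url,
        auth=("", pat),  # basic auth with an empty user and the PAT
        params={"$select": "WorkItemId,Title,State", "$top": 1000},
    )
    resp.raise_for_status()
    df = spark.createDataFrame(resp.json()["value"])
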
VtotheG
by New Contributor
  • 5894 Views
  • 0 replies
  • 0 kudos

Problem Visual Studio Plugin with custom modules

We are using the Databricks Visual Studio Code plugin to write our Python/Spark code. We use the upload-file-to-Databricks functionality because our organisation has turned Unity Catalog off. We are now running into a weird bug with custom modules....

Data Engineering
databricks visual studio plug in
visual studio code
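
Until the bug is pinned down, a common hedged workaround is to add the synced project folder to sys.path before importing; the path and module name below are hypothetical:

    import sys

    # Hypothetical sync target; adjust to where the extension uploads your project.
    sys.path.append("/Workspace/Users/me@example.com/.ide/my-project")

    import my_module  # hypothetical custom module from the synced project
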
