Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

FabriceDeseyn
by Contributor
  • 816 Views
  • 1 replies
  • 0 kudos

Bug - data profile internal code

Hi, I am not sure how to post a potential bug, but I stumbled upon the following issue on DBR 13.2. The same code 'sometimes' works on DBR 12.2 LTS, but if I run it on a real table, this issue always occurs.

Latest Reply
mathan_pillai
Databricks Employee
  • 0 kudos

I tried reproducing the issue on DBR 13.2 but was unable to; find the screenshot attached. How intermittently is the issue occurring?

Remit
by New Contributor III
  • 3263 Views
  • 1 replies
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from two sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write two extra streams to apply some transformations in order to give them the same schem...
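
For context, a hedged sketch of the usual pattern for running a MERGE from a stream via foreachBatch; the table, source, and key names below are hypothetical:

```python
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # MERGE each micro-batch into the target; "id" is a hypothetical key column.
    target = DeltaTable.forName(spark, "target_table")
    (
        target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

source_stream = spark.readStream.table("source1_clean")  # hypothetical transformed source

(
    source_stream.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/merge")  # hypothetical path
    .start()
)
```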

Data Engineering
MERGE
streaming
Latest Reply
Remit
New Contributor III
  • 0 kudos

Solved the problem by changing the cluster settings. The whole thing works when disabling Photon Acceleration...

nyck33
by New Contributor II
  • 4016 Views
  • 0 replies
  • 0 kudos

snowflake python connector import error

```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-1961894174266859>:1
----> 1 con = snowflake.connector.connect(
      2     user=USER,
      3     password=SNOWSQL_PWD,
      4     account=A...
```
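
The traceback suggests the connector package is missing or the import never ran. A minimal sketch of a working setup, assuming the snowflake-connector-python package is installed on the cluster; the credentials and account identifier below are hypothetical placeholders:

```python
# %pip install snowflake-connector-python   (run once on the cluster)
import snowflake.connector

# Hypothetical credentials; in practice pull these from a secret scope.
USER = "my_user"
SNOWSQL_PWD = dbutils.secrets.get("scope", "snowsql-pwd")
ACCOUNT = "xy12345.west-europe.azure"

con = snowflake.connector.connect(user=USER, password=SNOWSQL_PWD, account=ACCOUNT)
cur = con.cursor()
cur.execute("SELECT current_version()")
print(cur.fetchone())
```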

mwoods
by New Contributor III
  • 2283 Views
  • 2 replies
  • 2 kudos

Delta Live Tables error with Kafka SSL

We have a Spark streaming job that consumes data from a Kafka topic and writes out to Delta tables in Unity Catalog. Looking to refactor it to use Delta Live Tables, but it appears that it is not possible at present to have a DLT pipeline that can acc...
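
For reference, a hedged sketch of the kind of DLT source in question, assuming the SSL certificates are reachable from the pipeline; the broker, topic, paths, and secret names are hypothetical:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(name="kafka_raw")
def kafka_raw():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9093")  # hypothetical broker
        .option("subscribe", "events")                      # hypothetical topic
        .option("kafka.security.protocol", "SSL")
        .option("kafka.ssl.truststore.location", "/dbfs/certs/truststore.jks")
        .option("kafka.ssl.truststore.password",
                dbutils.secrets.get("scope", "truststore-pwd"))
        .load()
        .select(col("key").cast("string"), col("value").cast("string"))
    )
```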

Latest Reply
gabriall
New Contributor II
  • 2 kudos

Indeed, it's already patched. You just have to configure your pipeline on the "preview" channel.

1 More Replies
Noosphera
by New Contributor III
  • 9177 Views
  • 0 replies
  • 0 kudos

Resolved! How to reinstantiate the Cloudformation template for AWS

Hi everyone! I am new to Databricks and had chosen to use the CloudFormation template to create my AWS workspace. I regretfully must admit I felt creative in the process and varied the suggested stack name, and that must have created errors which ended...

Data Engineering
AWS
Cloudformation template
Unity Catalog
Erik
by Valued Contributor III
  • 2357 Views
  • 0 replies
  • 0 kudos

Why not enable "decommissioning" in spark?

You can enable "decommissioning" in Spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to...
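
For reference, a sketch of the open-source Spark properties behind this feature, as they might appear in a cluster's Spark config box (an assumption on my part; availability and exact behavior vary by DBR version and cloud):

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.rddBlocks.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
```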

jimbo
by New Contributor II
  • 7704 Views
  • 0 replies
  • 0 kudos

Pyspark datatype missing microsecond precision last three SSS: h:mm:ss:SSSSSS - datetype

Hi all, we are having issues with the datetype data type in Spark when ingesting files. Effectively the source data has six digits of sub-second precision (microseconds), but the most we can extract from the data type is three. For example 12:03:23.123, but what is requ...
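
A minimal sketch, assuming the values arrive as strings and Spark 3.x datetime patterns: TimestampType itself stores microseconds, so precision is usually lost in parsing or display rather than in the type.

```python
from pyspark.sql.functions import col, date_format, to_timestamp

df = spark.createDataFrame([("2024-01-01 12:03:23.123456",)], ["raw"])

# Parse with a six-'S' fraction pattern, then format it back out.
parsed = df.withColumn("ts", to_timestamp(col("raw"), "yyyy-MM-dd HH:mm:ss.SSSSSS"))
(
    parsed
    .withColumn("display", date_format(col("ts"), "HH:mm:ss.SSSSSS"))
    .show(truncate=False)  # display -> 12:03:23.123456, all six digits intact
)
```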

Data Engineering
pyspark datetype precision missing
Sangram
by New Contributor III
  • 2026 Views
  • 0 replies
  • 0 kudos

Unable to mount ADLS gen2 to databricks file system

I am unable to mount an ADLS Gen2 storage path onto a Databricks storage path. It is throwing the error "unsupported azure scheme: abfss". May I know the reason? Below are the steps that I followed:
1. Create a service principal.
2. Store the service principal's s...
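
For comparison, a hedged sketch of the standard OAuth mount for ADLS Gen2; the secret scope, keys, tenant, container, and account names are hypothetical placeholders:

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("scope", "sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<account>.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
```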

Rdipak
by New Contributor II
  • 1258 Views
  • 2 replies
  • 0 kudos

Delta live table blocks pipeline autoloader rate limit

I have created an ETL pipeline with DLT. My first step is to ingest into a raw Delta table using Auto Loader file notifications. When I have 20k notifications, the pipeline runs well across all stages. But when we have a surge in the number of messages, the pipeline waits...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Did you try the following options: .option('cloudFiles.maxFilesPerTrigger', 10000) or maxBytesPerTrigger?
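
A sketch of where those rate-limit options would sit in an Auto Loader read, assuming S3 file notifications; the path is hypothetical:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.maxFilesPerTrigger", 10000)  # cap files per micro-batch
    .option("cloudFiles.maxBytesPerTrigger", "10g")  # or cap by data volume
    .load("s3://my-bucket/landing/")                 # hypothetical path
)
```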

1 More Replies
kulkpd
by Contributor
  • 2044 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader with filenotification

I am using DLT with file notification, and the DLT job is fetching just one notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day but it is running hours behind. Any recommendations? spark.readStream.format("cloudFi...

Latest Reply
Rdipak
New Contributor II
  • 2 kudos

Can you set this value to a higher number and try? cloudFiles.fetchParallelism is 1 by default.
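
A sketch of that option in context (hypothetical path); the default of a single fetch thread is what throttles the SQS reads:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.fetchParallelism", 8)  # threads fetching from SQS; default 1
    .load("s3://my-bucket/landing/")           # hypothetical path
)
```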

1 More Replies
AndrewSilver
by New Contributor II
  • 1016 Views
  • 1 replies
  • 1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure's Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Latest Reply
kulkpd
Contributor
  • 1 kudos

I am using a job with a single task and multiple retries. Upon job retry the run_id gets changed. I tried using {{parent_run_id}} but it never worked, so I switched to: val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")
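
For comparison, a hedged sketch of the common pattern of passing the reference in explicitly as a task parameter; the parameter name parent_run_id is a choice here, not a fixed API:

```python
# In the job task's parameters, map a notebook widget to the variable, e.g.:
#   {"parent_run_id": "{{parent_run_id}}"}
# Databricks substitutes the value at run time; the notebook then reads it:
parent_run_id = dbutils.widgets.get("parent_run_id")
print(f"parent run id: {parent_run_id}")
```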

Shawn_Eary
by Contributor
  • 1390 Views
  • 0 replies
  • 0 kudos

Streaming Delta Live Tables Cluster Management

If I use code like this (at 8:56 in https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536):

CREATE STREAMING LIVE TABLE report
AS SELECT * FROM cloud_files("/mydata", "json")

to create a STREAMING Delta Live Table through the Workflows section of...
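
For reference, a hedged sketch of the same table in DLT's Python API (the table name and path come from the snippet above; the rest is standard Auto Loader usage):

```python
import dlt

@dlt.table(name="report")
def report():
    # Equivalent of: CREATE STREAMING LIVE TABLE report AS
    #   SELECT * FROM cloud_files("/mydata", "json")
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mydata")
    )
```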

Direo
by Contributor
  • 7748 Views
  • 2 replies
  • 3 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi! When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library: /local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Latest Reply
JSatiro
New Contributor II
  • 3 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work:
1. Created a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version).
2. When cre...
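
A sketch of the environment piece that usually trips this up; it assumes the deequ JAR matching the cluster's Spark version is installed as a cluster library, since the 'JavaPackage' error typically means the JVM side is missing:

```python
import os

# pydeequ reads SPARK_VERSION to pick the matching deequ artifact; set it
# before importing pydeequ (assumption: Spark 3.2 per the runtime above).
os.environ["SPARK_VERSION"] = "3.2"

import pydeequ
from pydeequ.checks import Check, CheckLevel

# The deequ JAR (e.g. com.amazon.deequ:deequ:2.0.1-spark-3.2) must also be
# installed on the cluster; without it the JVM classes resolve to 'JavaPackage'.
check = Check(spark, CheckLevel.Error, "basic check")
```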

1 More Replies
NathanE
by New Contributor II
  • 2102 Views
  • 1 replies
  • 1 kudos

Time travel on views

Hello, at my company we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application caches some data in memory, and to avoid synchronization issues with the data on Databricks, we rely heavil...
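
For context, a hedged sketch of Delta time travel as it works on tables today (table name and timestamp are illustrative); views are the part that lacks this:

```python
# Reading a Delta *table* as of a point in time is supported:
df = (
    spark.read.format("delta")
    .option("timestampAsOf", "2024-01-01 00:00:00")
    .table("catalog.schema.my_table")  # hypothetical table
)

# The same option against a view fails, which is what makes consistent
# snapshots through views hard today.
```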

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@NathanE As you said, based on the article below this may not be supported currently: https://docs.databricks.com/en/sql/user/materialized-views.html. But at the same time it looks as if a materialized view is built on top of a table and it is a synchronous operation (when...

