Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Databricks Employee
  • 6857 Views
  • 3 replies
  • 2 kudos

TPC-DS test on Databricks

If I want to run a TPC-DS test on Databricks, what are the steps involved? Is the data already available on the Databricks file system, or do I have to download or create it from somewhere?

Latest Reply
aladda
Databricks Employee
  • 2 kudos

See the spark-sql-perf repo for details on how to run benchmark tests using TPC-DS - https://github.com/databricks/spark-sql-perf
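The repo itself is Scala and covers both data generation (dsdgen) and full benchmark runs. Purely as an illustration, assuming the TPC-DS tables have already been generated and registered in a database (tpcds_sf10 below is a hypothetical name), a single TPC-DS-style query can be run from a notebook like this:

```
# Illustrative sketch only: assumes TPC-DS tables already exist in a database
# named tpcds_sf10 (hypothetical); spark-sql-perf handles generation and
# full benchmark runs on the Scala side.
spark.sql("USE tpcds_sf10")

result = spark.sql("""
    SELECT i_item_id, SUM(ss_ext_sales_price) AS total_sales
    FROM store_sales
    JOIN item ON ss_item_sk = i_item_sk
    GROUP BY i_item_id
    ORDER BY total_sales DESC
    LIMIT 10
""")
result.show()
```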

2 More Replies
FabriceDeseyn
by Contributor
  • 1265 Views
  • 1 reply
  • 0 kudos

Bug - data profile internal code

Hi, I am not sure how to report a potential bug, but I stumbled upon the following issue on DBR 13.2. The same code 'sometimes' works on DBR 12.2 LTS, but if I run it on a real table, the issue always occurs.

(screenshot attached: FabriceDeseyn_0-1690530658137.png)
Latest Reply
mathan_pillai
Databricks Employee
  • 0 kudos

Tried reproducing the issue on DBR 13.2, but was unable to; find the screenshot attached. How intermittently is the issue occurring?

Remit
by New Contributor III
  • 4416 Views
  • 1 reply
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from 2 sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write 2 extra streams to apply some transformations in order to give them the same schem...

Data Engineering
MERGE
streaming
Latest Reply
Remit
New Contributor III
  • 0 kudos

Solved the problem by changing the cluster settings. The whole thing works when disabling Photon Acceleration...
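For reference, a minimal sketch of the corresponding cluster-spec fragment (as used with the Clusters API), assuming that setting runtime_engine to STANDARD is the API-side equivalent of turning off Photon Acceleration in the cluster UI; the other values are placeholders:

```
# Hypothetical cluster-spec fragment; all names and sizes are placeholders.
cluster_spec = {
    "cluster_name": "streaming-merge-no-photon",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "runtime_engine": "STANDARD",  # "PHOTON" would re-enable Photon Acceleration
}
```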

nyck33
by New Contributor II
  • 4824 Views
  • 0 replies
  • 0 kudos

Snowflake Python connector import error

```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-1961894174266859>:1
----> 1 con = snowflake.connector.connect(
      2     user=USER,
      3     password=SNOWSQL_PWD,
      4     account=A...
```
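An ImportError at that point often means the snowflake-connector-python package is not available to the notebook. A minimal sketch, assuming the package has been installed for the session (for example via %pip install snowflake-connector-python in its own cell) and that the credential values below are hypothetical placeholders:

```
import snowflake.connector  # import the submodule explicitly; `import snowflake` alone is not enough

# Hypothetical placeholder credentials; in practice pull secrets from a secret scope.
USER = "my_user"
SNOWSQL_PWD = dbutils.secrets.get("my-scope", "snowsql-pwd")  # hypothetical scope/key
ACCOUNT = "myorg-myaccount"                                   # hypothetical account identifier

con = snowflake.connector.connect(user=USER, password=SNOWSQL_PWD, account=ACCOUNT)
print(con.cursor().execute("SELECT CURRENT_VERSION()").fetchone())
```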

mwoods
by New Contributor III
  • 3319 Views
  • 2 replies
  • 2 kudos

Delta Live Tables error with Kafka SSL

We have a Spark streaming job that consumes data from a Kafka topic and writes out to Delta tables in Unity Catalog. Looking to refactor it to use Delta Live Tables, but it appears that it is not possible at present to have a DLT pipeline that can acc...

Latest Reply
gabriall
New Contributor II
  • 2 kudos

Indeed, it's already patched. You just have to configure your pipeline on the "preview" channel.
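For reference, a minimal sketch of where that sits in the pipeline settings; everything except the channel field is a hypothetical placeholder:

```
# Hypothetical fragment of DLT pipeline settings; only "channel" is the point here.
pipeline_settings = {
    "name": "kafka_to_uc_dlt",  # hypothetical pipeline name
    "channel": "PREVIEW",       # preview channel instead of the default "CURRENT"
    # ... libraries, clusters, catalog/target left as they were
}
```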

1 More Reply
Noosphera
by New Contributor III
  • 9748 Views
  • 0 replies
  • 0 kudos

Resolved! How to reinstantiate the CloudFormation template for AWS

Hi everyone! I am new to Databricks and had chosen to use the CloudFormation template to create my AWS workspace. I regretfully must admit I felt creative in the process and varied the suggested stack name, and that must have created errors which ended...

Data Engineering
AWS
Cloudformation template
Unity Catalog
Erik
by Valued Contributor III
  • 3511 Views
  • 0 replies
  • 0 kudos

Why not enable "decommissioning" in Spark?

You can enable "decommissioning" in Spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance is going away (e.g. spot instances). This is disabled by default, but it seems like such a no-brainer to...
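For context, a minimal sketch of the open-source Spark 3.1+ settings involved, written as a cluster Spark-config dict (Advanced options > Spark); whether Databricks layers its own spot-instance handling on top of these is not covered here:

```
# Open-source Spark 3.1+ decommissioning settings, expressed as a cluster
# Spark-config dict; all of these default to "false".
spark_conf = {
    "spark.decommission.enabled": "true",
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
}
```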

jimbo
by New Contributor II
  • 9478 Views
  • 0 replies
  • 0 kudos

PySpark datetype missing last three digits of microsecond precision: h:mm:ss.SSSSSS

Hi all, we are having issues with the datetype data type in Spark when ingesting files. Effectively the source data has 6 microseconds' worth of precision, but the most we can extract from the data type is three. For example 12:03:23.123, but what is requ...

Data Engineering
pyspark datetype precision missing
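A minimal sketch of one way to keep all six digits, assuming the value arrives as a string and is parsed into a TimestampType column (which stores microseconds) rather than a millisecond-limited type; the sample literal and column names are made up, and exact pattern behaviour can vary by Spark/DBR version:

```
from pyspark.sql import functions as F

# Hypothetical sample row; the source value carries six sub-second digits.
df = spark.createDataFrame([("2024-01-01 12:03:23.123456",)], ["raw"])

# Parse into TimestampType (microsecond precision), then format back out
# with a six-digit fraction to confirm nothing was truncated.
parsed = df.withColumn("ts", F.to_timestamp("raw", "yyyy-MM-dd HH:mm:ss.SSSSSS"))
parsed.select(F.date_format("ts", "HH:mm:ss.SSSSSS").alias("time_with_micros")).show(truncate=False)
```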
Sangram
by New Contributor III
  • 2571 Views
  • 0 replies
  • 0 kudos

Unable to mount ADLS Gen2 to the Databricks file system

I am unable to mount an ADLS Gen2 storage path onto a Databricks storage path. It throws the error "unsupported azure scheme: abfss". May I know the reason? Below are the steps that I followed:
1. Create a service principal
2. Store the service principal's s...

(screenshot attached: Sangram_0-1700274947304.png)
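For comparison, a minimal sketch of an OAuth service-principal mount over abfss; every name below (secret scope and keys, tenant, storage account, container, mount point) is a hypothetical placeholder:

```
# Hypothetical sketch of mounting ADLS Gen2 with a service principal (OAuth).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-app-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://mycontainer@mystorageacct.dfs.core.windows.net/",  # abfss URI for ADLS Gen2
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
```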
Rdipak
by New Contributor II
  • 1742 Views
  • 2 replies
  • 0 kudos

Delta live table blocks pipeline autoloader rate limit

I have created an ETL pipeline with DLT. My first step is to ingest into a raw Delta table using Auto Loader file notifications. When I have 20k notifications, the pipeline runs well across all stages. But when we have a surge in the number of messages, the pipeline waits...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Did you try the following options: .option('cloudFiles.maxFilesPerTrigger', 10000) or maxBytesPerTrigger?
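A minimal sketch of where those options sit in an Auto Loader read, assuming file-notification mode; the path and values are placeholders:

```
# Hypothetical Auto Loader read with rate limiting; path and format are placeholders.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.maxFilesPerTrigger", 10000)  # cap files per micro-batch
    .option("cloudFiles.maxBytesPerTrigger", "10g")  # or cap bytes per micro-batch
    .load("s3://my-bucket/landing/")
)
```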

1 More Reply
kulkpd
by Contributor
  • 2911 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader with file notification

I am using DLT with file notification, and the DLT job is fetching just 1 notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day, but it is running hours behind. Any recommendations? spark.readStream.format("cloudFi...

Latest Reply
Rdipak
New Contributor II
  • 2 kudos

Can you set this to a higher number and try cloudFiles.fetchParallelism? It's 1 by default.
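A minimal sketch of where that option goes, again with a placeholder path and an example value:

```
# Hypothetical Auto Loader read in file-notification mode; raising
# cloudFiles.fetchParallelism lets more threads drain the SQS queue.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.fetchParallelism", 8)  # default is 1
    .load("s3://my-bucket/landing/")
)
```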

1 More Reply
AndrewSilver
by New Contributor II
  • 1551 Views
  • 1 reply
  • 1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure's Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Latest Reply
kulkpd
Contributor
  • 1 kudos

I am using a job with a single task and multiple retries. Upon job retry the run_id gets changed. I tried using {{parent_run_id}} but it never worked, so I switched to: val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")
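That snippet is Scala; a rough Python equivalent via the notebook context is sketched below. Note that dbutils.notebook.entry_point is an internal, undocumented hook and the tag name comes straight from the reply above, so treat this as an assumption:

```
import json

# Assumption: the notebook context JSON exposes a "jobRunOriginalAttempt" tag,
# mirroring the Scala call in the reply; entry_point is an internal API.
ctx = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
parent_run_id = ctx.get("tags", {}).get("jobRunOriginalAttempt")
print(parent_run_id)
```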

Shawn_Eary
by Contributor
  • 1795 Views
  • 0 replies
  • 0 kudos

Streaming Delta Live Tables Cluster Management

If I use code like this (from 8:56 in https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536): CREATE STREAMING LIVE TABLE report AS SELECT * FROM cloud_files("/mydata", "json") to create a STREAMING Delta Live Table through the Workflows section of...

Direo
by Contributor II
  • 9469 Views
  • 2 replies
  • 3 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi! When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library: /local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Latest Reply
JSatiro
New Contributor II
  • 3 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work:
1 - Created a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version)
2 - When cre...
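To sketch the general shape of those steps: the assumptions here are that the deequ JVM jar matching the cluster's Spark version is installed as a cluster library (Maven coordinate com.amazon.deequ:deequ:<version>), that pydeequ is pip-installed, and that the SPARK_VERSION environment variable is set before pydeequ is imported:

```
import os
os.environ["SPARK_VERSION"] = "3.2"  # assumption: set to the runtime's Spark version

from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

# Tiny smoke test: if the deequ JVM jar is not on the cluster, the
# "'JavaPackage' object is not callable" error shows up right here.
df = spark.range(10)
check = Check(spark, CheckLevel.Error, "basic check").isComplete("id")
result = VerificationSuite(spark).onData(df).addCheck(check).run()
VerificationResult.checkResultsAsDataFrame(spark, result).show()
```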

1 More Reply
