Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jimbo
by New Contributor II
  • 9615 Views
  • 0 replies
  • 0 kudos

Pyspark datatype missing microsecond precision last three SSS: h:mm:ss:SSSSSS - datetype

Hi all, We are having issues with the datetype data type in Spark when ingesting files. Effectively the source data has 6 microseconds' worth of precision, but the most we can extract from the datatype is three. For example 12:03:23.123, but what is requ...

Data Engineering
pyspark datetype precision missing
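As an aside on precision: Spark's `TimestampType` stores microseconds internally, so a three-digit cut-off usually comes from the format pattern (three `S` characters render only milliseconds) rather than from the stored value. A pure-Python illustration with a hypothetical sample value:

```python
from datetime import datetime

# "%f" parses and formats all six microsecond digits; by contrast, a
# three-letter "SSS" fraction in a Spark format string renders only
# milliseconds even though the stored value keeps microseconds.
ts = datetime.strptime("12:03:23.123456", "%H:%M:%S.%f")
print(ts.microsecond)               # 123456
print(ts.strftime("%H:%M:%S.%f"))   # 12:03:23.123456
```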
Sangram
by New Contributor III
  • 2709 Views
  • 0 replies
  • 0 kudos

Unable to mount ADLS gen2 to databricks file system

I am unable to mount an ADLS Gen2 storage path into the Databricks file system. It is throwing the error: unsupported azure scheme: abfss. May I know the reason? Below are the steps that I followed: 1. create a service principal 2. store the service principal's s...

Rdipak
by New Contributor II
  • 1838 Views
  • 2 replies
  • 0 kudos

Delta live table blocks pipeline autoloader rate limit

I have created an ETL pipeline with DLT. My first step is to ingest into a raw Delta table using Auto Loader file notifications. When I have 20k notifications the pipeline runs well across all stages. But when we have a surge in the number of messages the pipeline waits...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Did you try the following options: .option('cloudFiles.maxFilesPerTrigger', 10000) or maxBytesPerTrigger?

1 More Replies
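The options mentioned in the reply can be sketched as a reusable reader configuration. A minimal sketch, assuming an active SparkSession named `spark` and a hypothetical source path; the numeric limits are illustrative, not recommendations:

```python
# Auto Loader rate-limiting options from the reply above (values illustrative).
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",
    # Cap each micro-batch so a surge of notifications is processed in
    # bounded chunks instead of one oversized batch:
    "cloudFiles.maxFilesPerTrigger": "10000",
    "cloudFiles.maxBytesPerTrigger": "10g",  # alternatively, cap by bytes
}

def build_stream(spark, path):
    # Apply the options to a cloudFiles readStream (requires a cluster).
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options.items():
        reader = reader.option(key, value)
    return reader.load(path)
```

With either limit set, the surge is drained across several bounded micro-batches rather than stalling one large batch.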
kulkpd
by Contributor
  • 3094 Views
  • 2 replies
  • 2 kudos

Resolved! Autoloader with filenotification

I am using DLT with file notification and the DLT job is fetching just one notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day but it is running hours behind. Any recommendations? spark.readStream.format("cloudFi...

Latest Reply
Rdipak
New Contributor II
  • 2 kudos

Can you set this value to a higher number and try? cloudFiles.fetchParallelism is 1 by default.

1 More Replies
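The accepted suggestion can be sketched as a small helper that layers the fetch-parallelism option onto an existing option set. `cloudFiles.fetchParallelism` defaults to 1 as the reply notes; the value 8 below is hypothetical:

```python
def with_fetch_parallelism(options, parallelism):
    # cloudFiles.fetchParallelism controls how many threads Auto Loader uses
    # to pull messages from the notification queue (default 1, per the
    # reply above). Returns a new dict; the input is left untouched.
    opts = dict(options)
    opts["cloudFiles.fetchParallelism"] = str(parallelism)
    return opts

base = {"cloudFiles.format": "json", "cloudFiles.useNotifications": "true"}
tuned = with_fetch_parallelism(base, 8)  # hypothetical value
```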
AndrewSilver
by New Contributor II
  • 1657 Views
  • 1 replies
  • 1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure's Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Latest Reply
kulkpd
Contributor
  • 1 kudos

I am using a job with a single task and multiple retries. Upon job retry the run_id gets changed. I tried using {{parent_run_id}} but it never worked, so I switched to: val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")

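The workaround in the reply can be sketched in Python. The tag name follows the reply; `tags` here stands in for `dbutils.notebook.getContext.tags`, and the sample dictionaries are hypothetical:

```python
def original_run_id(tags):
    # Per the reply above, "jobRunOriginalAttempt" holds the first attempt's
    # run id on a retried task; fall back to the current run id otherwise.
    # In a real notebook, tags would come from dbutils.notebook.getContext.tags.
    return tags.get("jobRunOriginalAttempt") or tags.get("runId")

# Simulated tag dictionaries (real values come from the job context):
retried = {"runId": "222", "jobRunOriginalAttempt": "111"}
first_try = {"runId": "111"}
```

Both calls resolve to the same id, which is what makes the value stable across retries.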
Shawn_Eary
by Contributor
  • 1884 Views
  • 0 replies
  • 0 kudos

Streaming Delta Live Tables Cluster Management

If I use code like this: -- 8:56 -- https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536 CREATE STREAMING LIVE TABLE report AS SELECT * FROM cloud_files("/mydata", "json") to create a STREAMING Delta Live Table through the Workflows Section of...

Direo
by Contributor II
  • 9802 Views
  • 2 replies
  • 3 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi!When I run a notebook on databricks, it throws error - " 'JavaPackage' object is not callable" which points to pydeequ library:/local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Latest Reply
JSatiro
New Contributor II
  • 3 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work: 1 - Created a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version) 2 - When cre...

1 More Replies
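A hedged sketch of the environment step that commonly accompanies those instructions: pydeequ selects its Deequ jar from the `SPARK_VERSION` environment variable, and a mismatch with the cluster's Spark version is one known trigger of the `'JavaPackage' object is not callable` error. The value shown assumes the Spark 3.2 runtime from the reply; adjust it for your cluster:

```python
import os

# pydeequ reads SPARK_VERSION to pick a compatible Deequ build; set it
# before importing pydeequ. "3.2" matches Runtime 10.4 LTS (Spark 3.2.1)
# from the reply above -- change it if your runtime differs.
os.environ["SPARK_VERSION"] = "3.2"
```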
DatabricksNIN
by New Contributor II
  • 1849 Views
  • 1 replies
  • 0 kudos

Pulling data from Azure Boards (Specifically 'Analytics Views' into databricks

Building upon a previous post/topic from one year ago: I am looking for best practices/examples on how to pull data from Azure Boards, and specifically from 'Analytics Views', into Databricks for analysis. I have succeeded in doing so with 'Work Items...

VtotheG
by New Contributor
  • 6156 Views
  • 0 replies
  • 0 kudos

Problem Visual Studio Plugin with custom modules

We are using the Databricks Visual Studio Plugin to write our Python / Spark code. We are using the upload file to Databricks functionality because our organisation has turned Unity Catalog off. We are now running into a weird bug with custom modules....

Data Engineering
databricks visual studio plug in
visual studio code
aseufert
by New Contributor III
  • 9098 Views
  • 7 replies
  • 4 kudos

Dynamic Value References Not Working

I can't get the dynamic value references to work in my jobs. I can use the deprecated references (e.g. job_id) but not the new references (e.g. job.id). As a test, I set a text widget called MyJobID following the example that will receive the dynamic ...

Latest Reply
themattmorris
New Contributor III
  • 4 kudos

For what it's worth, it looks like job-level parameters were added with this update as well. I was wondering why I was unable to use those, but those are also working for me now.

6 More Replies
thibault
by Contributor III
  • 4429 Views
  • 1 replies
  • 0 kudos

Resolved! Import notebook content into a python file

Hi, I have a workflow based on Python scripts. How can I import the content of a notebook where a class and functions are defined? I know how to import Python files into notebooks, but the other way around doesn't seem as straightforward.

Latest Reply
thibault
Contributor III
  • 0 kudos

Found a solution: execute the notebook after using the Databricks API to download its content as bytes: 1. set environment variables DATABRICKS_HOST and DATABRICKS_TOKEN 2. w = WorkspaceClient() with w.workspace.download(notebook_path) as n: note...

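The reply's approach can be sketched end to end; it assumes `databricks-sdk` is installed and `DATABRICKS_HOST`/`DATABRICKS_TOKEN` are set, per the reply. The header-stripping helper assumes a SOURCE-format export and is an illustration added here, not part of the original reply:

```python
def notebook_source(path):
    # Requires databricks-sdk and workspace credentials (see the reply
    # above); imported lazily so the helper below works without a
    # workspace connection.
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    with w.workspace.download(path) as n:
        return n.read().decode("utf-8")

def to_python_module(source):
    # Assumption: SOURCE-format exports begin with a
    # "# Databricks notebook source" header line; stripping it leaves
    # plain Python that can be written to a .py file and imported.
    lines = source.splitlines()
    if lines and lines[0].startswith("# Databricks notebook source"):
        lines = lines[1:]
    return "\n".join(lines).lstrip("\n")
```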
pgruetter
by Contributor
  • 4354 Views
  • 1 replies
  • 1 kudos

Help me understand streaming logic with Delta Tables

Hello all, I have a Delta table in the bronze layer, let's call it BRZ. It contains 25B rows and many duplicates. It has a version 0 and a version 1, nothing else yet. I then create a silver table SLV by running one deduplication batch job. This creates ve...

Latest Reply
pgruetter
Contributor
  • 1 kudos

Thanks for the confirmation. Not sure I see everything as your text gets truncated, but it basically confirms that it should work. Anyway: it looks like the incremental load is working. The problem here is that we receive late-arriving facts that tou...

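The deduplication step described above can be illustrated with a pure-Python keep-the-latest-per-key sketch; the column names are hypothetical, and on 25B rows you would of course do this in Spark instead (e.g. dropDuplicates, or a window over the key ordered by a version column), but the logic is the same:

```python
def dedupe_latest(rows, key="id", version="ts"):
    # Keep the newest record per key -- a sketch of the BRZ -> SLV batch
    # deduplication described above. A late-arriving fact with a higher
    # version simply replaces the earlier record for its key.
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[version] > latest[k][version]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"id": 1, "ts": 1, "v": "old"},
    {"id": 1, "ts": 3, "v": "late-arriving"},  # late fact wins for id 1
    {"id": 2, "ts": 2, "v": "only"},
]
```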
alexiswl
by Contributor
  • 12270 Views
  • 3 replies
  • 0 kudos

Resolved! Create a UDF Table Function with DLT in UC

Hello, I am trying to generate a DLT but need to use a UDF Table Function in the process. This is what I have so far; everything works (without the CREATE OR REFRESH LIVE TABLE wrapper): ```sql CREATE OR REPLACE FUNCTION silver.portal.get_workflows_from_...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@alexiswl - could you please use CREATE OR REPLACE function instead of CREATE OR REFRESH LIVE table?

2 More Replies