Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Shawn_Eary
by Contributor
  • 1108 Views
  • 1 reply
  • 1 kudos

Resolved! Streaming Delta Live Tables Cluster Management

If I use code like this: -- 8:56 -- https://youtu.be/PIFL7W3DmaY?si=MWDSiC_bftoCh4sH&t=536 CREATE STREAMING LIVE TABLE report AS SELECT * FROM cloud_files("/mydata", "json") to create a STREAMING Delta Live Table through the Workflows section of...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Shawn_Eary, When creating a STREAMING Delta Live Table through the Workflows section of Databricks, it’s essential to understand the associated costs and resource usage. Let’s break it down: Delta Live Tables (DLT) Pricing: DLT provides a de...

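For readers following along, here is a minimal Python sketch of the same pipeline definition as the SQL snippet in the question (table name and path taken from that snippet; an illustration, not the thread's accepted answer):

import dlt

# Declare the streaming table; DLT manages the cluster, checkpointing, and retries
@dlt.table(name="report")
def report():
    # Auto Loader reads the same JSON source as the cloud_files() call in the SQL version
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mydata")
    )
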
kulkpd
by Contributor
  • 1679 Views
  • 3 replies
  • 2 kudos

Resolved! Autoloader with file notification

I am using DLT with file notification, and the DLT job is fetching just one notification from the SQS queue at a time. My pipeline is expected to process 500K notifications per day, but it is running hours behind. Any recommendations? spark.readStream.format("cloudFi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

2 More Replies
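
For context, a hedged sketch of the knobs usually involved here (the option names are real Auto Loader settings; the S3 path is hypothetical):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")    # file-notification mode backed by SQS on AWS
    .option("cloudFiles.maxFilesPerTrigger", 10000)   # raise the per-micro-batch file cap
    .load("s3://my-bucket/landing/")                  # hypothetical source path
)
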
Rdipak
by New Contributor II
  • 889 Views
  • 1 reply
  • 0 kudos

Can Autoloader pass Spark configs?

Do we have an option to pass Spark config variables (executors, workers, etc.) while using Autoloader?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Rdipak, Certainly! When using the autoloader in Spark, you can configure various parameters related to executors, workers, and other settings. Let’s explore some options: Executor Configuration: You can set executor-related configurations u...

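As a brief illustration of where each kind of setting lives (the values shown are hypothetical):

# Session-level Spark settings can be changed at runtime from a notebook:
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Executor and worker sizing cannot be changed from a running Autoloader stream;
# it is fixed in the cluster definition, e.g. via the cluster's Spark config:
#   spark.executor.memory 8g
#   spark.executor.cores 4
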
jimbo
by New Contributor II
  • 7160 Views
  • 1 reply
  • 0 kudos

PySpark datatype missing microsecond precision (last three of SSSSSS): h:mm:ss.SSSSSS - datetype

Hi all, We are having issues with the datetype data type in Spark when ingesting files. Effectively the source data has six digits of sub-second precision, but the most we can extract from the datatype is three. For example 12:03:23.123, but what is requ...

Data Engineering
pyspark datetype precision missing
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @jimbo, Handling high-precision timestamps in Spark can be tricky, especially when you need to preserve microsecond-level precision. Let’s explore some strategies to achieve your desired timestamp format. Custom Format String: You’re currentl...

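A hedged sketch of the custom-format approach the reply begins to describe (Spark's TimestampType stores microseconds internally; the sample value comes from the question, and the behaviour assumes Spark 3's datetime parser):

from pyspark.sql import functions as F

df = spark.createDataFrame([("12:03:23.123456",)], ["raw"])

# Parse with a six-digit fractional-second pattern; missing date fields default to 1970-01-01
parsed = df.withColumn("ts", F.to_timestamp("raw", "HH:mm:ss.SSSSSS"))

# Render all six digits back out to confirm nothing was truncated
parsed.select(F.date_format("ts", "HH:mm:ss.SSSSSS").alias("ts_micro")).show(truncate=False)
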
prasadvaze
by Valued Contributor II
  • 19720 Views
  • 2 replies
  • 1 kudos

Resolved! How to start local/city databricks user group?

Hello Lindsey, I would like to start a Richmond, VA Databricks user group (chapter). How do I go about doing this?

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @prasad_vaze, Thank you for your interest in starting a Databricks user group in Richmond, VA! It’s a great initiative to foster collaboration and knowledge sharing among Databricks enthusiasts. I will have my team reach out to you about this.

1 More Replies
Rdipak
by New Contributor II
  • 986 Views
  • 2 replies
  • 0 kudos

Delta Live Tables pipeline blocked by Autoloader rate limit

I have created an ETL pipeline with DLT. My first step is to ingest into a raw Delta table using Autoloader file notification. When I have 20k notifications, the pipeline runs well across all stages. But when we have a surge in the number of messages, the pipeline waits...

Latest Reply
kulkpd
Contributor
  • 0 kudos

Did you try the following options: .option('cloudFiles.maxFilesPerTrigger', 10000) or cloudFiles.maxBytesPerTrigger?

1 More Replies
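
Putting the reply's suggestion into context, a sketch with the byte-based cap (only one of the two caps is normally set; the value and path are hypothetical):

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.maxBytesPerTrigger", "10g")  # cap each micro-batch by size rather than file count
    .load("s3://my-bucket/raw/")                     # hypothetical source path
)
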
AndrewSilver
by New Contributor II
  • 799 Views
  • 1 reply
  • 1 kudos

Uncertainty on Databricks job variables: {{run_id}}, {{parent_run_id}}.

In Azure's Databricks jobs, {{run_id}} and {{parent_run_id}} serve as variables. In jobs with multiple tasks, {{run_id}} aligns with task_run_id, while {{parent_run_id}} matches job_run_id. In single-task jobs, {{parent_run_id}} aligns with task_run_...

Latest Reply
kulkpd
Contributor
  • 1 kudos

I am using a job with a single task and multiple retries. Upon job retry the run_id changes; I tried using {{parent_run_id}} but it never worked, so I switched to: val parentRunId = dbutils.notebook.getContext.tags("jobRunOriginalAttempt")

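For anyone on Python rather than Scala, a hedged equivalent of the workaround above, going through the py4j entry point (the tag name is the one quoted in the reply):

# Read the notebook context tags from Python via the py4j bridge
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
parent_run_id = ctx.tags().apply("jobRunOriginalAttempt")
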
Direo
by Contributor
  • 6835 Views
  • 3 replies
  • 1 kudos

Resolved! JavaPackage object is not callable - pydeequ

Hi! When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library: /local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/che...

Latest Reply
JSatiro
New Contributor II
  • 1 kudos

Hi. If you are struggling like I was, these were the steps I followed to make it work:
1 - Created a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version)
2 - When cre...

2 More Replies
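
One step that commonly accompanies this fix, sketched under the assumption that pydeequ's documented SPARK_VERSION lookup applies here:

import os

# pydeequ reads SPARK_VERSION to pick the matching Deequ artifact; set it before importing
os.environ["SPARK_VERSION"] = "3.2"  # matches Spark 3.2.1 on Runtime 10.4 LTS

from pydeequ.checks import Check, CheckLevel
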
NathanE
by New Contributor II
  • 1471 Views
  • 1 reply
  • 1 kudos

Time travel on views

Hello, At my company we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application caches some data in memory, and to avoid synchronization issues with the data on Databricks, we rely heavil...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@NathanE As you said, based on the article below it may not be supported currently: https://docs.databricks.com/en/sql/user/materialized-views.html. But at the same time it looks as though a Materialized View is built on top of a table and is a synchronous operation (when...

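As background to the reply: time travel applies to Delta tables rather than views, so one hedged workaround is to query the view's underlying table directly (table name and version are hypothetical):

# Read an older snapshot of the base table that the view selects from
df = (
    spark.read.format("delta")
    .option("versionAsOf", 12)           # hypothetical version number
    .table("catalog.schema.base_table")  # hypothetical underlying Delta table
)
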
DatabricksNIN
by New Contributor II
  • 1110 Views
  • 2 replies
  • 0 kudos

Pulling data from Azure Boards (specifically 'Analytics Views') into Databricks

Building upon a previous post/topic from one year ago: I am looking for best practices/examples on how to pull data from Azure Boards, specifically from 'Analytics Views', into Databricks for analysis. I have succeeded in doing so with 'Work Items...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @DatabricksNIN, To pull data from Azure Boards, specifically from ‘Analytics Views’, into Databricks for analysis, you can use the Azure DevOps REST API.

1 More Replies
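
A hedged sketch of that approach against the Azure DevOps Analytics OData endpoint (organisation, project, secret scope, and selected fields are all hypothetical placeholders):

import requests

# Analytics OData endpoint; replace org/project with your own
url = "https://analytics.dev.azure.com/myorg/myproject/_odata/v3.0-preview/WorkItems"
pat = dbutils.secrets.get("my-scope", "azure-devops-pat")  # hypothetical secret

# A personal access token uses basic auth with an empty username
resp = requests.get(url, auth=("", pat),
                    params={"$select": "WorkItemId,State,CreatedDate", "$top": 1000})
resp.raise_for_status()

# OData responses wrap rows in a "value" array
df = spark.createDataFrame(resp.json()["value"])
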
erigaud
by Honored Contributor
  • 2324 Views
  • 3 replies
  • 0 kudos

Combining DLT and workflow - MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED

Hello everyone! I currently have a DLT pipeline that loads into several Delta Live tables (both streaming and materialized views). The end table of my DLT pipeline is a materialized view called "silver.my_view". In a later step I need to join/union/merg...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @erigaud, To read a table from a DLT pipeline with a regular non-shared cluster, you can use the dlt.table function in Databricks. This function reads data from a table registered in the Hive metastore.

2 More Replies
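
For readers hitting the same MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED error, a hedged note: outside the pipeline, a DLT materialized view is queried like any other table, but only from compute that supports materialized views, such as a SQL warehouse or a shared-access-mode cluster:

# From supported compute, the pipeline's end table reads as a normal table
df = spark.read.table("silver.my_view")  # name taken from the question
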
chari
by Contributor
  • 4716 Views
  • 3 replies
  • 1 kudos

Can't connect Power BI Desktop to Azure Databricks

Hello, I am trying to connect Power BI Desktop to Azure Databricks (source: Delta table) by downloading a connection file from Databricks. I see an error message like the one below when I open the connection file with Power BI. Repeated attempts have given th...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @chari, To resolve this issue, I would recommend checking the following:
  • Ensure that the connection file you downloaded from Databricks is correct and up-to-date.
  • Check if the Databricks server is up and running.
  • Verify that the Databricks server...

2 More Replies
Michael_Appiah
by Contributor
  • 8157 Views
  • 3 replies
  • 2 kudos

Resolved! Hashing Functions in PySpark

Hashes are commonly used in SCD2 merges to determine whether data has changed by comparing the hashes of the new rows in the source with the hashes of the existing rows in the target table. PySpark offers multiple different hashing functions like: MD5...

Latest Reply
Michael_Appiah
Contributor
  • 2 kudos

Hi @Kaniz_Fatma, thank you for your comprehensive answer. What is your opinion on the trade-off between using a hash like xxHASH64, which returns a LongType column and thus would offer good performance when there is a need to join on the hash column, v...

2 More Replies
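
A small sketch comparing the functions this thread discusses (column names are hypothetical; note that xxhash64 returns a LongType while md5/sha2 return hex strings):

from pyspark.sql import functions as F

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

hashed = df.select(
    F.xxhash64("id", "name").alias("h_xx64"),                      # LongType: compact and join-friendly
    F.md5(F.concat_ws("||", "id", "name")).alias("h_md5"),         # 32-char hex string
    F.sha2(F.concat_ws("||", "id", "name"), 256).alias("h_sha2"),  # 64-char hex string, lower collision risk
)
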
VtotheG
by New Contributor
  • 5498 Views
  • 0 replies
  • 0 kudos

Problem with the Visual Studio plugin and custom modules

We are using the Databricks Visual Studio plugin to write our Python/Spark code. We are using the upload-file-to-Databricks functionality because our organisation has turned Unity Catalog off. We are now running into a weird bug with custom modules....

Data Engineering
databricks visual studio plug in
visual studio code
alj_a
by New Contributor III
  • 1195 Views
  • 1 reply
  • 1 kudos

Connect to Databricks Delta Lake hosted in AWS from Power BI - connection string / push dataset

Hi, I have a requirement: Databricks is hosted in AWS, and I need to read a Delta table from Power BI. I tried a push dataset but it is not working. Is there any way to connect? We are using Active Directory company-wide.

Data Engineering
aws databricks
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @alj_a, it is possible to connect Power BI to Delta Lake tables hosted on Databricks on AWS. You can use the Databricks Power BI connector to connect Power BI Desktop to your Databricks clusters and Databricks SQL warehouses. Here...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group