Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

sashikanth
by New Contributor II
  • 293 Views
  • 1 replies
  • 2 kudos

Liquid clustering within partitions

The tables are already partitioned. Is it possible to have liquid clustering within a partition, or is recreating the table the only option?

Latest Reply
szymon_dybczak
Contributor III
  • 2 kudos

Hi @sashikanth, no, it's not possible to have liquid clustering within a partition. According to the documentation: "You can enable liquid clustering on an existing table or during table creation. Clustering is not compatible with partitioning or ZORDER, ...

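For anyone landing here, the recreate path the reply points to can be sketched as below. This is a hedged sketch, not a verified recipe: the table and column names (`sales_partitioned`, `sales_clustered`, `region`, `order_date`) are hypothetical, and it assumes a runtime that supports `CLUSTER BY` (DBR 13.3+) and a notebook where `spark` is the active session.

```python
# Sketch: replace a partitioned table with a liquid-clustered copy.
# Table/column names are placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE sales_clustered
    CLUSTER BY (region, order_date)
    AS SELECT * FROM sales_partitioned
""")

# Clustering is applied incrementally to new data; OPTIMIZE
# rewrites existing files according to the clustering keys.
spark.sql("OPTIMIZE sales_clustered")
```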
dbx_deltaSharin
by New Contributor II
  • 465 Views
  • 3 replies
  • 4 kudos

Databricks job trigger in specific times

Hello, I have a Databricks notebook that processes data and generates a list of JSON objects called "list_json". Each JSON object contains an item called "time_to_send" (in UTC datetime format). I want to find the best way to send these JSON messages ...

Latest Reply
dbx_deltaSharin
New Contributor II
  • 4 kudos

Hi everyone, thank you for your responses to my question. @szymon_dybczak, if I understood correctly, your suggestion is based on running the Databricks job in continuous mode. However, this might incur significant costs if the cluster is running every...

2 More Replies
LeoRickli
by New Contributor II
  • 214 Views
  • 1 replies
  • 0 kudos

Different GCP Service Account for cluster (compute) creation?

I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for isolation purpose...

Latest Reply
jennie258fitz
New Contributor III
  • 0 kudos

@LeoRickli wrote: I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for ...

monojmckvie
by New Contributor II
  • 232 Views
  • 1 replies
  • 0 kudos

Databricks Workflow File Based Trigger

Hi All, is there any way to define multiple paths in the file arrival trigger setting for a Databricks job? For a single path it's working fine.

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @monojmckvie, you can specify only one path, as per the documentation: https://docs.databricks.com/en/jobs/file-arrival-triggers.html

SharathE
by New Contributor III
  • 1373 Views
  • 4 replies
  • 0 kudos

Delta Live tables stream output to Kafka

Hello, I wanted to know if we can write the stream output to a Kafka topic in a DLT pipeline? Please let me know. Thank you.

Latest Reply
mtajmouati
Contributor
  • 0 kudos

Hi! Ensure your code is set up to use these libraries. Here is the complete example. Navigate to your cluster configuration: go to your Databricks workspace, click on "Clusters" and select your cluster, then go to the "Libraries" tab. Install the necessar...

3 More Replies
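As a rough illustration of the approach discussed in this thread: a table produced by a DLT pipeline can be read back as a stream and forwarded to Kafka with a regular Structured Streaming job, assuming the Kafka client libraries are available on the cluster. Every name here (table, broker address, topic, checkpoint path) is a placeholder.

```python
# Sketch: forward a Delta table produced by a DLT pipeline to a Kafka topic.
# All names/paths are hypothetical; run where `spark` is the active session.
(spark.readStream
    .table("catalog.schema.dlt_output_table")
    # Kafka expects binary/string key and value columns.
    .selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "dlt-events")
    .option("checkpointLocation", "/Volumes/checkpoints/dlt_to_kafka")
    .start())
```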
hyedesign
by New Contributor II
  • 2727 Views
  • 6 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like:

from pyspark.sql import functions as F

def upsert_source_one(self):
    df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
seans
New Contributor III
  • 0 kudos

Here is the full message: Exception has occurred: SparkConnectGrpcException (java.io.IOException) Connection reset by peer grpc._channel._MultiThreadedRendezvous: _MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL deta...

5 More Replies
brianbraunstein
by New Contributor II
  • 1173 Views
  • 2 replies
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation (https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql) claims that spark.sql() should be able to take kwargs, such that the following should work: display...

Latest Reply
adriennn
Contributor II
  • 0 kudos

It's working again in 15.4 LTS

1 More Replies
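For reference, the kwargs usage this thread is about looks roughly like the sketch below on a runtime where it works (the reply above reports DBR 15.4 LTS). The DataFrame contents and column names are illustrative; with kwargs, `{df}` is replaced by a reference to the DataFrame and `{threshold}` by the literal value.

```python
# Sketch: kwargs-style substitution in spark.sql() (Spark 3.4+).
# Data and names are made up for illustration.
df = spark.createDataFrame([(1, "a"), (5, "b")], ["amount", "label"])

result = spark.sql(
    "SELECT label FROM {df} WHERE amount > {threshold}",
    df=df,
    threshold=3,
)
result.show()
```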
Hubert-Dudek
by Esteemed Contributor III
  • 1643 Views
  • 2 replies
  • 2 kudos

foreachBatch

With parameterized SQL queries in Structured Streaming's foreachBatch, there's no longer a need to create temp views for the MERGE command.

Latest Reply
adriennn
Contributor II
  • 2 kudos

Note that this functionality broke somewhere between DBR 13.3 and 15, so the best bet is 15.4 LTS. See: Solved: Parameterized spark.sql() not working - Databricks Community - 56510

1 More Replies
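A minimal sketch of the pattern described in the post: inside foreachBatch, the micro-batch DataFrame is passed straight into a parameterized MERGE via spark.sql(), with no temp view. The target table, source table, and column names are hypothetical, and it assumes a runtime where parameterized queries work (per the reply above, 15.4 LTS).

```python
# Sketch: parameterized MERGE inside foreachBatch (no temp view needed).
# Table/column names are placeholders.
def upsert_batch(micro_df, batch_id):
    # Use the micro-batch's own session; {batch} is bound to the DataFrame.
    micro_df.sparkSession.sql(
        """
        MERGE INTO target AS t
        USING {batch} AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
        """,
        batch=micro_df,
    )

(spark.readStream.table("source_table")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/Volumes/checkpoints/upsert")
    .start())
```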
Michael_Appiah
by Contributor
  • 6619 Views
  • 5 replies
  • 4 kudos

Parameterized spark.sql() not working

Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...

Latest Reply
adriennn
Contributor II
  • 4 kudos

Can confirm it's working again, tested on a job cluster with DBR 15.4 LTS. It failed on 14.3 LTS.

4 More Replies
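The named-marker flavor from the blog post, which the replies report working again on DBR 15.4 LTS, looks roughly like this; the table and column names are illustrative. Values passed via `args` are bound as typed parameters rather than string-interpolated, which is the point of the feature.

```python
# Sketch: named parameter markers with spark.sql() (Spark 3.4+).
# Table/column names are hypothetical.
result = spark.sql(
    "SELECT * FROM sales WHERE region = :region AND amount > :min_amount",
    args={"region": "EMEA", "min_amount": 100},
)
```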
ekdz__
by New Contributor III
  • 5356 Views
  • 5 replies
  • 10 kudos

Is there any way to save the notebook in the "Results Only" view?

Hi! I'm looking for a solution to save a notebook in HTML format that has the "Results Only" view (without the executed code). Is there any possibility to do that? Thank you

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

Use option "+New dashboard" in the top menu (picture icon). Add results there (use display() in code to show data), and then you can export the dashboard to HTML.

4 More Replies
Harish2122
by Contributor
  • 13390 Views
  • 9 replies
  • 12 kudos

Databricks SQL string_agg

Migrating some on-premises SQL views to Databricks and struggling to find conversions for some functions. The main one is the string_agg function: string_agg(field_name, ', '). Anyone know how to convert that to Databricks SQL? Thanks in advance.

Latest Reply
smueller
New Contributor II
  • 12 kudos

If not grouping by something else: SELECT array_join(collect_set(field_name), ',') field_list    FROM table

8 More Replies
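One caveat to the accepted pattern above: collect_set de-duplicates, so if the on-prem string_agg relied on duplicates being kept, or on grouping, collect_list with a GROUP BY is the closer match. A sketch with hypothetical table and column names:

```python
# Sketch: string_agg(field_name, ', ') equivalent in Databricks SQL.
# collect_set de-duplicates; collect_list keeps duplicates and is
# closer to SQL Server's string_agg. Names are placeholders.
spark.sql("""
    SELECT group_col,
           array_join(collect_list(field_name), ', ') AS field_list
    FROM my_table
    GROUP BY group_col
""")
```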
jonyvp
by New Contributor III
  • 1068 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks Asset Bundles complex variable for cluster configuration substitute

Using this page of the DAB docs, I tried to substitute the cluster configuration with a variable. That way, I want to predefine different job cluster configurations. Doing exactly what is used in the docs yields this error: Error: failed to load [...]/speci...

Latest Reply
filipniziol
Contributor III
  • 4 kudos

Amazing! Great!

6 More Replies
nggianno
by New Contributor III
  • 3190 Views
  • 2 replies
  • 10 kudos

How can I activate enzyme for delta live tables (or dlt serverless) ?

Hi! I am using Delta Live Tables, especially materialized views, and I want to run a DLT pipeline without rerunning the whole view, which costs time, but rather only update and add the values that have changed. I saw that Enzyme does this job...

Latest Reply
devmehta
New Contributor III
  • 10 kudos

I want to achieve the same as @nggianno. Please suggest a workaround

1 More Replies
NC
by New Contributor III
  • 270 Views
  • 1 replies
  • 1 kudos

Logging in Databricks

Hi All, I am trying to create a log using the Python logging package. Is this allowed in Databricks, and is there any sample working code that you can share? Thank you for your guidance. Best regards, NC

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @NC, I recommend the video below, which shows how to use logging in Databricks: Introduction to Logging and Quality Control in Databricks (youtube.com)

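A minimal, self-contained example of the standard-library approach: the logging module works in a Databricks notebook the same way it does in plain Python. The logger name and message format here are arbitrary choices.

```python
import logging

# Configure a named logger once per notebook/session.
logger = logging.getLogger("my_etl_job")
logger.setLevel(logging.INFO)

if not logger.handlers:  # avoid duplicate handlers on notebook re-runs
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)

logger.info("Processing started")
logger.warning("Row count lower than expected")
```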
joshbuttler
by New Contributor
  • 298 Views
  • 1 replies
  • 1 kudos

Seeking Advice on Data Lakehouse Architecture with Databricks

I'm currently designing a data lakehouse architecture using Databricks and have a few questions. What are the best practices for efficiently ingesting both batch and streaming data into Delta Lake? Any recommended tools or approaches?

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @joshbuttler, I think the best way is to use Auto Loader, which provides a highly efficient way to incrementally process new data while also guaranteeing each file is processed exactly once. It supports ingestion in batch mode (Trigger.Available...

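A sketch of the Auto Loader pattern mentioned above, covering the batch-style trigger; the source path, schema location, checkpoint path, and table name are all placeholders. Dropping the trigger line would run the same pipeline continuously.

```python
# Sketch: Auto Loader ingesting files into a Delta table.
# Paths and names are hypothetical.
stream = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/schemas/raw_events")
    .load("/Volumes/landing/raw_events"))

(stream.writeStream
    .option("checkpointLocation", "/Volumes/checkpoints/raw_events")
    .trigger(availableNow=True)  # process everything available, then stop
    .toTable("catalog.bronze.raw_events"))
```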
