Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Bilel
by Visitor
  • 34 Views
  • 0 replies
  • 0 kudos

Python library not installed when compute is resized

 Hi, I have a Python notebook workflow that uses a job cluster. The cluster lost at least one node (due to a Spot Instance termination) and did an upsize. After that I got a "Module not found" error in my job, but the Python module was being used before ...

shsalami
by New Contributor
  • 31 Views
  • 1 reply
  • 0 kudos

Sample streaming table fails

 Running the following Databricks sample code in the pipeline: CREATE OR REFRESH STREAMING TABLE customers AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/customers/", "csv") I got the error: org.apache.spark.sql.catalyst.ExtendedAnalysisExcep...

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Hi @shsalami, The error you're encountering suggests you are attempting to read a Delta table as if it were a collection of CSV files using the cloud_files function. You may run "DESCRIBE DETAIL customers" and check whether this table exists and it is...

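The reply's suggestion can be sketched in DLT SQL. This is a hypothetical fix, assuming the diagnosis is right and the target location already holds a Delta table; the catalog, schema, and table names below are illustrative, not from the thread:

```sql
-- Check what actually exists at the target first:
DESCRIBE DETAIL customers;

-- cloud_files()/Auto Loader reads raw files (CSV, JSON, ...); to stream
-- from an existing Delta table, reference it with STREAM instead:
CREATE OR REFRESH STREAMING TABLE customers_clean
AS SELECT * FROM STREAM(my_catalog.my_schema.customers);
```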
sashikanth
by New Contributor
  • 38 Views
  • 1 reply
  • 2 kudos

Liquid clustering within partitions

The tables are already partitioned. Is it possible to have liquid clustering within a partition, or is recreating the table the only option?

Latest Reply
szymon_dybczak
Contributor
  • 2 kudos

Hi @sashikanth, No, it's not possible to have liquid clustering within a partition. According to the documentation: "You can enable liquid clustering on an existing table or during table creation. Clustering is not compatible with partitioning or ZORDER, ...

dbx_deltaSharin
by New Contributor II
  • 63 Views
  • 3 replies
  • 4 kudos

Databricks job trigger at specific times

Hello, I have a Databricks notebook that processes data and generates a list of JSON objects called "list_json". Each JSON object contains an item called "time_to_send" (in UTC datetime format). I want to find the best way to send these JSON messages ...

Latest Reply
dbx_deltaSharin
New Contributor II
  • 4 kudos

Hi everyone, Thank you for your responses to my question. @szymon_dybczak, if I understood correctly, your suggestion is based on running the Databricks job in continuous mode. However, this might incur significant costs if the cluster is running every...

2 More Replies
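As an alternative to a continuous job, a periodic job could scan list_json and forward only the messages whose time_to_send has passed. A minimal pure-Python sketch; the "time_to_send" field name comes from the post, while the schedule interval and the delivery target are assumptions:

```python
from datetime import datetime, timezone

def due_messages(list_json, now):
    """Return the messages whose 'time_to_send' (UTC, ISO-8601) is due."""
    due = []
    for msg in list_json:
        ts = datetime.fromisoformat(msg["time_to_send"])
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # post says times are UTC
        if ts <= now:
            due.append(msg)
    return due

# A job scheduled every few minutes would call due_messages(), send the
# result to the target system, then mark those messages as sent.
batch = [
    {"id": 1, "time_to_send": "2024-06-01T08:00:00"},
    {"id": 2, "time_to_send": "2099-01-01T00:00:00"},
]
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
print([m["id"] for m in due_messages(batch, now)])  # → [1]
```

This keeps the cluster cost bounded by the job schedule rather than paying for an always-on continuous pipeline.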
LeoRickli
by New Contributor II
  • 28 Views
  • 1 reply
  • 0 kudos

Different GCP Service Account for cluster (compute) creation?

I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for isolation purpose...

Latest Reply
jennie258fitz
New Contributor
  • 0 kudos

@LeoRickli wrote: I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for ...

TamD
by New Contributor III
  • 53 Views
  • 3 replies
  • 0 kudos

SELECT from VIEW to CREATE a table or view

Hi; I'm new to Databricks, so apologies if this is a dumb question. I have a notebook with SQL cells that are selecting data from various Delta tables into temporary views. Then I have a query that joins up the data from these temporary views. I'd lik...

Latest Reply
Akshay_Petkar
New Contributor III
  • 0 kudos

Hi @TamD, A materialized view needs a permanent source table. In your case, since the temporary view used to load the data is dropped when the session expires, the materialized view cannot find it after the session ends. For example, if you create a mate...

2 More Replies
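The fix the reply hints at is to persist the joined result rather than building a materialized view on session-scoped temp views. A minimal sketch in SQL; every table and column name here is illustrative, not from the thread:

```sql
-- Temp views disappear with the session; materialize the join itself
-- into a permanent table (or define the view over permanent tables):
CREATE OR REPLACE TABLE main.reporting.order_summary AS
SELECT o.order_id, o.amount, c.customer_name
FROM main.sales.orders o
JOIN main.sales.customers c
  ON o.customer_id = c.customer_id;
```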
osas
by New Contributor
  • 68 Views
  • 1 reply
  • 1 kudos

Databricks Academy setup error - data engineering

I am trying to run the setup notebook "_COMMON" for my Academy data engineering course, and I am getting the below error: "Configuration dbacademy.deprecation.logging is not available."

Latest Reply
FlorianC
New Contributor II
  • 1 kudos

Same issue here, and the AI Assistant cannot help... JVM stacktrace: org.apache.spark.sql.AnalysisException at com.databricks.sql.connect.SparkConnectConfig$.assertConfigAllowedForRead(SparkConnectConfig.scala:203) at org.apache.spark.sql.connect.ser...

monojmckvie
by New Contributor II
  • 37 Views
  • 1 reply
  • 0 kudos

Databricks Workflow File Based Trigger

Hi All, Is there any way to define multiple paths in the file arrival trigger setting for a Databricks job? For a single path it's working fine.

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Hi @monojmckvie, You can specify only one path, as per the documentation: https://docs.databricks.com/en/jobs/file-arrival-triggers.html

LeoGaller
by New Contributor II
  • 1809 Views
  • 2 replies
  • 2 kudos

What are the options for "spark_conf.spark.databricks.cluster.profile"?

Hey guys, I'm trying to find out what options we can pass to spark_conf.spark.databricks.cluster.profile. I know from looking around that some of the available values are singleNode and serverless, but are there others? Where is the documentation for it?...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @LeoGaller, The spark_conf.spark.databricks.cluster.profile configuration in Databricks allows you to specify the profile for a cluster. Let’s explore the available options and where you can find the documentation. Available Profiles: Sing...

1 More Replies
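For reference, this profile is set in the cluster's JSON spec. A hedged sketch of the commonly seen single-node configuration; the only profile values confirmed by the thread are singleNode and serverless, and the surrounding keys follow the standard single-node cluster recipe:

```json
{
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  },
  "num_workers": 0
}
```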
SharathE
by New Contributor III
  • 685 Views
  • 4 replies
  • 0 kudos

Delta Live tables stream output to Kafka

Hello, I wanted to know if we can write the stream output to a Kafka topic in a DLT pipeline? Please let me know. Thank you.

Latest Reply
mtajmouati
Contributor
  • 0 kudos

Hi! Ensure your code is set up to use these libraries. Here is the complete example: Navigate to your cluster configuration: go to your Databricks workspace, click on "Clusters" and select your cluster, then go to the "Libraries" tab. Install the necessar...

3 More Replies
hyedesign
by New Contributor II
  • 1764 Views
  • 7 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F / def upsert_source_one(self): df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
seans
New Contributor III
  • 0 kudos

Here is the full message: Exception has occurred: SparkConnectGrpcException (java.io.IOException) Connection reset by peer. grpc._channel._MultiThreadedRendezvous: _MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL deta...

6 More Replies
brianbraunstein
by New Contributor II
  • 793 Views
  • 2 replies
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...

Latest Reply
adriennn
Contributor
  • 0 kudos

It's working again in 15.4 LTS

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1332 Views
  • 2 replies
  • 2 kudos

foreachBatch

With parameterized SQL queries in Structured Streaming's foreachBatch, there's no longer a need to create temp views for the MERGE command.

Latest Reply
adriennn
Contributor
  • 2 kudos

Note that this functionality broke somewhere between DBR 13.3 and 15, so the best bet is 15.4 LTS. See: Solved: Parameterized spark.sql() not working - Databricks Community - 56510

1 More Replies
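The pattern the post describes can be sketched as follows. This is a hypothetical example, not the author's code: it relies on spark.sql() accepting DataFrames as keyword arguments (Spark 3.4+, and per the reply above, reliable again on DBR 15.4 LTS); the table and column names are made up, and the writeStream call is left as a comment because it needs a live Spark session:

```python
# Parameterized MERGE inside foreachBatch -- no temp view required.
# Hypothetical names; assumes Spark >= 3.4 / DBR 15.4 LTS.

MERGE_SQL = """
MERGE INTO target t
USING {batch} s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

def upsert_batch(batch_df, batch_id):
    # The micro-batch DataFrame is passed straight into the query;
    # spark.sql() substitutes {batch} with the DataFrame, replacing
    # the old createOrReplaceTempView dance.
    batch_df.sparkSession.sql(MERGE_SQL, batch=batch_df)

# Usage in a Databricks notebook (not runnable outside Spark):
# spark.readStream.table("source").writeStream.foreachBatch(upsert_batch).start()
```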
Michael_Appiah
by Contributor
  • 5308 Views
  • 7 replies
  • 4 kudos

Resolved! Parameterized spark.sql() not working

Spark 3.4 introduced parameterized SQL queries, and Databricks also discussed this new functionality in a recent blog post (https://www.databricks.com/blog/parameterized-queries-pyspark). Problem: I cannot run any of the examples provided in the PySpark...

Latest Reply
adriennn
Contributor
  • 4 kudos

Can confirm it's working again, tested on a job cluster with DBR 15.4 LTS. It failed on 14.3 LTS.

6 More Replies
sms101
by New Contributor
  • 29 Views
  • 0 replies
  • 0 kudos

Table lineage visibility in Databricks

I’ve observed differences in table lineage visibility in Databricks based on how data is referenced, and I would like to confirm if this is the expected behavior. 1. When referencing a Delta table as the source in a query (e.g., df = spark.table("cata...

