Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

drii_cavalcanti
by New Contributor III
  • 3799 Views
  • 3 replies
  • 0 kudos

DBUtils commands do not work on shared access mode clusters

Hi there, I am trying to upload a file to an S3 bucket. However, none of the dbutils commands seem to work, and neither does the boto3 library. Clusters with the same configuration, except for the shared access mode, seem to work fine. Those are the error m...

Latest Reply
mvdilts1
New Contributor II
  • 0 kudos

I am encountering very similar behavior to drii_cavalcanti. When I use a Shared cluster with an IAM role specified, I can verify that the AWS CLI is installed, but when I run aws sts get-caller-identity I receive the error "Unable to locate credential...

2 More Replies
ziafazal
by New Contributor II
  • 1307 Views
  • 3 replies
  • 0 kudos

How to stop a continuous pipeline which is set to RETRY on FAILURE and failing for some reason

I have created a pipeline which is continuous and set to RETRY on FAILURE. For some reason it keeps failing and retrying. Is there any way I can stop it? Hitting the Stop button throws an error.

Latest Reply
ziafazal
New Contributor II
  • 0 kudos

Hi @szymon_dybczak, I already tried to remove it via the REST API but got the same error as in the pipeline logs. Eventually, I had to remove the workspace to get rid of it.

2 More Replies
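As the thread suggests, stopping a stuck continuous pipeline is usually attempted through the Pipelines REST API. A minimal sketch of building that call is below; the host, pipeline ID, and token are placeholders, and the request is only constructed, not sent.

```python
import urllib.request

def build_stop_request(host, pipeline_id, token):
    """Build a POST request for the Databricks pipeline stop endpoint.

    host, pipeline_id, and token are placeholder values you must supply;
    the endpoint is POST /api/2.0/pipelines/{pipeline_id}/stop.
    """
    url = f"{host}/api/2.0/pipelines/{pipeline_id}/stop"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

# The request would be sent with urllib.request.urlopen(req);
# here it is only constructed for illustration.
req = build_stop_request("https://example.cloud.databricks.com", "abcd-1234", "dapi-token")
```

If the API returns the same error as the pipeline logs (as reported above), escalating to Databricks support is the remaining option short of removing the workspace.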
jen-metaplane
by New Contributor II
  • 1501 Views
  • 4 replies
  • 1 kudos

How to get catalog and schema from system query table

Hi, We are querying the system.query table to parse query history. If the table in the query is not fully qualified with its catalog and schema, how can we derive the catalog and schema? Thanks, Jen

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

There is no straightforward method to get this data. Run the query to check the defaults: SELECT current_catalog() AS default_catalog, current_schema() AS default_schema; Catalog and schema may be changed in the query, so if you have query text you...

3 More Replies
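Building on the reply above, one rough approach is to scan the query text for USE CATALOG / USE SCHEMA statements that override the session defaults, then qualify the table name. This is a naive regex sketch, not a real SQL parser; the default catalog/schema values are assumptions you would replace with the output of current_catalog()/current_schema().

```python
import re

def resolve_table(query_text, table_name,
                  default_catalog="main", default_schema="default"):
    """Naively resolve a possibly-unqualified table name.

    Scans query_text for USE CATALOG / USE SCHEMA statements that may
    override the defaults. A production version would need a real parser.
    """
    catalog, schema = default_catalog, default_schema
    for stmt in re.finditer(r"USE\s+(CATALOG|SCHEMA)\s+(\w+)", query_text, re.I):
        kind, name = stmt.group(1).upper(), stmt.group(2)
        if kind == "CATALOG":
            catalog = name
        else:
            schema = name
    parts = table_name.split(".")
    if len(parts) == 3:          # already fully qualified
        return table_name
    if len(parts) == 2:          # schema.table: prepend catalog only
        return f"{catalog}.{table_name}"
    return f"{catalog}.{schema}.{table_name}"

print(resolve_table("USE CATALOG dev; USE SCHEMA sales; SELECT * FROM orders", "orders"))
# dev.sales.orders
```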
sshukla
by New Contributor III
  • 2579 Views
  • 9 replies
  • 1 kudos

Java heap issue, GC allocation failure while writing data from mysql to adls

Hi Team, I am reading 60-80 million records from a MySQL server and writing them into ADLS in Parquet format, but I am getting Java heap issues, GC allocation failures, and out-of-memory errors. Below are my cluster configurations: Driver - 56GB RAM, 16 cores; Wo...

Latest Reply
shaza606
New Contributor II
  • 1 kudos

Hello good man 

8 More Replies
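Heap and GC pressure on JDBC reads of this size usually comes from reading the whole table through a single partition. A common mitigation is to partition the read and cap the fetch size. The sketch below shows the relevant options as a plain dict; the URL, credentials, table, and bounds are hypothetical, and the commented lines show how it would be wired up on a cluster.

```python
# Hypothetical connection values; the tuning options are the point here.
jdbc_options = {
    "url": "jdbc:mysql://mysql-host:3306/mydb",
    "dbtable": "big_table",
    "user": "reader",
    "password": "<secret>",
    # Split the read into many smaller tasks instead of one giant one:
    "partitionColumn": "id",   # an indexed numeric column
    "lowerBound": "1",
    "upperBound": "80000000",
    "numPartitions": "64",
    "fetchsize": "10000",      # rows per round trip; keeps executor memory flat
}

# On a Databricks cluster this would be executed as:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
# df.write.mode("overwrite").parquet(
#     "abfss://container@account.dfs.core.windows.net/out")
```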
HansAdriaans
by New Contributor II
  • 2061 Views
  • 1 reply
  • 0 kudos

Can not open socket to local (127.0.0.1)

Hi, I'm running a Databricks pipeline hourly using Python notebooks checked out from Git, with on-demand compute (r6gd.xlarge: 32GB + 4 CPUs, Graviton). Most of the time the pipeline runs without problems. However, sometimes the first notebook f...

Latest Reply
HansAdriaans
New Contributor II
  • 0 kudos

Short update: I changed the script a bit by simply adding a display function just before running the collect, and this seems to work for now.

Dwight
by New Contributor II
  • 2048 Views
  • 3 replies
  • 0 kudos

FileAlreadyExistsException when restarting a structured stream with checkpoint (DBR 14.3)

Structured streams with a checkpoint location, which have been running fine for months, can no longer be restarted properly. When restarting they fail with a FileAlreadyExistsException. I reproduced the issue in the attached PDF. Has anyone else exper...

Latest Reply
Dwight
New Contributor II
  • 0 kudos

Apparently the issue was caused by a change in Azure, which was hotfixed; cf. https://learn.microsoft.com/en-us/answers/questions/2007592/how-to-fix-the-specified-path-already-exists-issue?comment=question#newest-question-comment. Checkpoints work fine...

2 More Replies
sashikanth
by New Contributor II
  • 836 Views
  • 1 reply
  • 2 kudos

Liquid clustering within partitions

As the tables are already partitioned, is it possible to have liquid clustering within a partition, or is recreating the table the only option?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @sashikanth, No, it's not possible to have liquid clustering within a partition. According to the documentation: "You can enable liquid clustering on an existing table or during table creation. Clustering is not compatible with partitioning or ZORDER, ...

dbx_deltaSharin
by New Contributor II
  • 970 Views
  • 3 replies
  • 4 kudos

Databricks job trigger in specific times

Hello, I have a Databricks notebook that processes data and generates a list of JSON objects called "list_json". Each JSON object contains an item called "time_to_send" (in UTC datetime format). I want to find the best way to send these JSON messages ...

Latest Reply
dbx_deltaSharin
New Contributor II
  • 4 kudos

Hi everyone, Thank you for your responses to my question. @szymon_dybczak, if I understood correctly, your suggestion is based on running the Databricks job in continuous mode. However, this might incur significant costs if the cluster is running every...

2 More Replies
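An alternative to a continuous job is to order the messages by "time_to_send" and compute how long to wait before dispatching each one (e.g., from a scheduled job that sleeps until the next due time). A minimal sketch, assuming "time_to_send" is an ISO-8601 UTC string; the field names mirror the question, but the exact format is an assumption.

```python
from datetime import datetime, timezone

def order_messages(list_json):
    """Sort messages by their UTC send time (earliest first)."""
    return sorted(
        list_json,
        key=lambda m: datetime.fromisoformat(m["time_to_send"]).replace(tzinfo=timezone.utc),
    )

def seconds_until(msg, now):
    """Seconds to wait before sending msg; never negative."""
    due = datetime.fromisoformat(msg["time_to_send"]).replace(tzinfo=timezone.utc)
    return max(0.0, (due - now).total_seconds())

msgs = [
    {"time_to_send": "2025-01-01T12:30:00", "body": "b"},
    {"time_to_send": "2025-01-01T09:00:00", "body": "a"},
]
ordered = order_messages(msgs)
```

A dispatcher loop would then sleep for `seconds_until(...)` between sends, which avoids keeping a cluster running continuously.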
LeoRickli
by New Contributor II
  • 478 Views
  • 1 reply
  • 0 kudos

Different GCP Service Account for cluster (compute) creation?

I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for isolation purpose...

Latest Reply
jennie258fitz
New Contributor III
  • 0 kudos

@LeoRickli wrote:I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for ...

monojmckvie
by New Contributor II
  • 473 Views
  • 1 reply
  • 0 kudos

Databricks Workflow File Based Trigger

Hi All, Is there any way to define multiple paths in the file arrival trigger setting for a Databricks job? For a single path it's working fine.

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @monojmckvie, You can specify only one path, per the documentation: https://docs.databricks.com/en/jobs/file-arrival-triggers.html

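Since the trigger accepts a single path, one common workaround is to point it at a shared parent directory and route files inside the job. A small sketch under that assumption; the subdirectory names are hypothetical.

```python
# Workaround sketch: the file-arrival trigger watches a common parent
# directory, and the job routes each new file by its subdirectory.
WATCHED_SUBDIRS = ("incoming/vendor_a/", "incoming/vendor_b/")

def route(file_path):
    """Return the watched subdirectory a new file belongs to, or None."""
    for sub in WATCHED_SUBDIRS:
        if sub in file_path:
            return sub
    return None
```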
SharathE
by New Contributor III
  • 2978 Views
  • 4 replies
  • 0 kudos

Delta Live tables stream output to Kafka

Hello, I wanted to know if we can write the stream output to a Kafka topic in a DLT pipeline. Please let me know. Thank you.

Latest Reply
mtajmouati
Contributor
  • 0 kudos

Hi! Ensure your code is set up to use these libraries. Here is the complete example: Navigate to your cluster configuration: Go to your Databricks workspace. Click on "Clusters" and select your cluster. Go to the "Libraries" tab. Install the necessar...

3 More Replies
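For reference, writing a stream to Kafka from a regular structured-streaming job follows the standard Kafka sink pattern. The broker, topic, and checkpoint values below are hypothetical, and the commented lines show how the options would be applied on a cluster.

```python
# Hypothetical broker/topic/checkpoint values for the Kafka sink.
kafka_options = {
    "kafka.bootstrap.servers": "broker1:9092",
    "topic": "dlt-output",
    "checkpointLocation": "/tmp/checkpoints/dlt-kafka",
}

# On a cluster this would be wired up roughly as:
# (df.selectExpr("to_json(struct(*)) AS value")
#    .writeStream.format("kafka")
#    .option("kafka.bootstrap.servers", kafka_options["kafka.bootstrap.servers"])
#    .option("topic", kafka_options["topic"])
#    .option("checkpointLocation", kafka_options["checkpointLocation"])
#    .start())
```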
hyedesign
by New Contributor II
  • 4182 Views
  • 6 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F / def upsert_source_one(self): df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
seans
New Contributor III
  • 0 kudos

Here is the full message  Exception has occurred: SparkConnectGrpcException (java.io.IOException) Connection reset by peer grpc._channel._MultiThreadedRendezvous: _MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL deta...

5 More Replies
brianbraunstein
by New Contributor II
  • 1877 Views
  • 2 replies
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work:display...

Latest Reply
adriennn
Valued Contributor
  • 0 kudos

It's working again in 15.4 LTS.

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 2321 Views
  • 2 replies
  • 2 kudos

foreachBatch

With parameterized SQL queries in Structured Streaming's foreachBatch, there's no longer a need to create temp views for the MERGE command.

Latest Reply
adriennn
Valued Contributor
  • 2 kudos

Note that this functionality broke somewhere between DBR 13.3 and 15, so the best bet is 15.4 LTS. See: Solved: Parameterized spark.sql() not working - Databricks Community - 56510

1 More Replies
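The pattern described above can be sketched as a foreachBatch handler that passes the micro-batch DataFrame to spark.sql() as a keyword argument instead of registering a temp view first. Table and column names are illustrative, and (per the reply above) this requires a DBR version where kwargs support works, e.g. 15.4 LTS.

```python
def upsert_batch(batch_df, batch_id):
    """foreachBatch handler sketch: reference the micro-batch DataFrame
    directly in the SQL text via a {source} placeholder.

    target/id are illustrative names, not from the original post.
    """
    batch_df.sparkSession.sql(
        """
        MERGE INTO target AS t
        USING {source} AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
        """,
        source=batch_df,
    )

# Wired up as:
# stream_df.writeStream.foreachBatch(upsert_batch).start()
```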
ekdz__
by New Contributor III
  • 6650 Views
  • 5 replies
  • 10 kudos

Is there any way to save the notebook in the "Results Only" view?

Hi! I'm looking for a solution to save a notebook in HTML format with the "Results Only" view (without the executed code). Is there any possibility to do that? Thank you

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

Use the "+New dashboard" option in the top menu (picture icon). Add results there (use display() in code to show data), and then you can export the dashboard to HTML.

4 More Replies
