Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chethankumar
by New Contributor III
  • 1074 Views
  • 2 replies
  • 0 kudos

How to add existing recipient to existing delta share

I have created a recipient using the Databricks console, and I have also created a Delta Share in the console. Now I want to map the existing recipient to the existing Delta Share. Is there a way to do this using Terraform?

Latest Reply
chethankumar
New Contributor III
  • 0 kudos

@SathyaSDE Thank you for your response. I am looking into a Terraform-based approach for this.
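One possible Terraform-based approach, offered as a hedged sketch: in the Databricks Terraform provider, access to a share can be granted to a recipient with the `databricks_grants` resource. The share and recipient names below are illustrative placeholders for your existing objects.

```hcl
# Sketch only: assumes the share and recipient already exist in the metastore.
# "my_existing_share" and "my_existing_recipient" are illustrative names.
resource "databricks_grants" "share_to_recipient" {
  share = "my_existing_share" # name of the existing Delta Share

  grant {
    principal  = "my_existing_recipient" # name of the existing recipient
    privileges = ["SELECT"]
  }
}
```

If you also want the share and recipient themselves under Terraform management, they can typically be brought in with `terraform import` before applying grants.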

1 More Replies
Brad
by Contributor II
  • 1511 Views
  • 2 replies
  • 1 kudos

Resolved! dropDuplicates inside foreachBatch

Hi, if I use dropDuplicates inside foreachBatch, dropDuplicates becomes stateless: it just drops duplicates within the current micro-batch, so I don't have to specify a watermark. Is this true? Thanks

Latest Reply
navallyemul
New Contributor III
  • 1 kudos

Yes, you're correct! When using dropDuplicates within foreachBatch, it operates only on the current micro-batch, so it removes duplicates in a stateless manner for each batch independently. Since there's no continuous state tracking across batches, y...

1 More Replies
Subhodeep
by New Contributor
  • 1491 Views
  • 2 replies
  • 0 kudos

Fetching queries submitted for review via Genie

Hi All, I wanted to know if there is a way to export the list of queries submitted for review via Genie using an API call. I know there is an API to fetch the query run history, but what I need is to fetch the list of reviews via Genie. It would be great if ...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi Subhodeep, is the new system table "system.access.assistant_events" closer to what you are looking for? With this new system table introduced, it seems we can expect more around this in a future release. More informatio...
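As a sketch, that system table can be queried from a notebook along these lines. The exact columns of system.access.assistant_events are not shown in this thread, so SELECT * is used, and availability depends on the workspace having system tables enabled.

```python
# Query for the system table mentioned in the reply; LIMIT is illustrative.
ASSISTANT_EVENTS_QUERY = "SELECT * FROM system.access.assistant_events LIMIT 100"

def fetch_assistant_events(spark):
    # Requires a Unity Catalog workspace where this system table is enabled;
    # run inside Databricks, where `spark` is provided.
    return spark.sql(ASSISTANT_EVENTS_QUERY)
```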

1 More Replies
mmenjivar
by New Contributor II
  • 1637 Views
  • 1 replies
  • 0 kudos

How to use SQL Streaming tables

We have been testing the usage of Streaming Tables in our pipelines, with different results depending on the streaming source. For Streaming Tables reading from read_files, everything works as expected. For Streaming Tables reading from read_kafka, we have ...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - please note:
1) Structured Streaming and Delta Live Tables are two different options and have different syntaxes.
2) You cannot execute DLT code in a notebook directly. It has to be run as a job.
Please refer below: https://docs.databricks.com/en/delta-live-tables/sq...

alpine
by New Contributor
  • 9091 Views
  • 4 replies
  • 0 kudos

Deploy lock force acquired error when deploying asset bundle using databricks cli

I'm running this command on a DevOps pipeline:
databricks bundle deploy -t dev
I receive this error and have tried using --force-lock, but it still doesn't work:
Error: deploy lock force acquired by name@company.com at 2024-02-20 16:38:34.99794209 +0000 ...

Latest Reply
manish1987c
New Contributor III
  • 0 kudos

You can use the command below:
databricks bundle deploy -t dev --force-lock

3 More Replies
HeronPePrestSer
by New Contributor
  • 1030 Views
  • 1 replies
  • 0 kudos

Does EXECUTE IMMEDIATE work with a JDBC connection?

Hello, I need help. I am trying to use the EXECUTE IMMEDIATE command to perform DELETE or DROP operations on a table located on a remote SQL Server (on-premises) using a JDBC connection from a notebook in the Databricks environment. While I can succes...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - what error are you getting? Do you have sufficient permissions to drop or delete the table?

sebasv
by New Contributor II
  • 1365 Views
  • 2 replies
  • 0 kudos

Inconsistent behaviour in group by and order by

Consider this minimal example:
with t as (select explode(sequence(1, 10, 1)) as id)
select (id % 2) as id from t
group by id
order by id
I would expect an ambiguous column name exception, since the grouping and sorting could apply to 2 different `id` columns....

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi, this is not an issue; please consider the order of execution of SQL clauses. The ORDER BY clause will always refer to the columns selected/displayed (as you refer to id everywhere, I guess there is confusion). An ambiguous column name exception occurs...

1 More Replies
sebasv
by New Contributor II
  • 1430 Views
  • 2 replies
  • 0 kudos

NullpointerException when creating a notebook widget

To reproduce, execute this line in a notebook (runtime 15.3):
dbutils.widgets.multiselect("foo", None, [None])
Exception raised:
Py4JJavaError: An error occurred while calling o427.createMultiselectWidget. : java.lang.NullPointerException at com.databr...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - please see below. I hope it helps!

1 More Replies
StephanKnox
by New Contributor III
  • 10999 Views
  • 4 replies
  • 2 kudos

Unit Testing with PyTest in Databricks - ModuleNotFoundError

Dear all, I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html. However, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test...

Latest Reply
saurabh18cs
Honored Contributor II
  • 2 kudos

Hi, after trying a lot I was able to see some success; see if this is what you are looking for.
notebook_test.py (this is the Python code file):
from pyspark.sql import functions as F
def sum_values(df):
    return df.agg(F.sum("value")).first()[0]
def ...
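A common cause of the ImportError when pytest runs from a Databricks workspace folder is that the repo root is not on sys.path when test modules are collected. One hedged workaround (the helper name and paths are illustrative, not a Databricks API) is a small helper in conftest.py:

```python
# conftest.py sketch: make sibling modules importable during pytest
# collection. This is generic Python path handling, nothing Databricks-specific.
import os
import sys

def add_repo_root_to_path(root: str) -> None:
    # Prepend the folder so `import my_module` resolves; idempotent, so
    # repeated pytest runs don't grow sys.path.
    if root not in sys.path:
        sys.path.insert(0, root)

# In a real conftest.py you would call it at import time, e.g.:
# add_repo_root_to_path(os.path.dirname(os.path.abspath(__file__)))
```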

3 More Replies
nengen
by New Contributor II
  • 1450 Views
  • 3 replies
  • 0 kudos

Debugging difference between "task time" and execution time for SQL query

I have a pretty large SQL query that has the following stats from the query profiler:
Tasks total time: 1.93s
Executing: 27s
Based on the information in the query profiler, this can be due to tasks waiting for available nodes. How should I approach this t...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@nengen Try using EXPLAIN EXTENDED: this provides a detailed breakdown of the logical and physical plan of a query in Spark SQL. Based on the EXPLAIN EXTENDED output, here are a few things to consider:
Broadcast Exchange: If the join causes data skew, ...

2 More Replies
pranav_k1
by New Contributor III
  • 2755 Views
  • 2 replies
  • 2 kudos

Merging data into table using temp view

I am trying to append data into a table which already exists with some data in it. I need to create a view by joining multiple tables, which will later be used to append data to the final table. I am able to alter the table schema and then run a query to insert data...

Latest Reply
pranav_k1
New Contributor III
  • 2 kudos

Hi @filipniziol, thanks for your reply. My issue is resolved: my fellow developer ran the same commands with a different name, and after some time it worked successfully. FYI - I was running the query in the same notebook, just in different cells, and I was running cells ...
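For reference, the pattern from the question (register the joined DataFrame as a temp view, then feed the final table) can be sketched like this. The table name "final_table" and key column "id" are placeholders, and MERGE requires the target to be a Delta table:

```python
# MERGE pattern sketch; names are illustrative placeholders.
MERGE_SQL = """
MERGE INTO final_table AS t
USING updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

def merge_updates(spark, updates_df) -> None:
    # Expose the joined DataFrame to SQL as a temp view, then merge it
    # into the existing target table.
    updates_df.createOrReplaceTempView("updates")
    spark.sql(MERGE_SQL)
```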

1 More Replies
confused_dev
by New Contributor II
  • 43059 Views
  • 7 replies
  • 5 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
pavlosskev
New Contributor III
  • 5 kudos

Fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py:
# conftest.py
import pytest
from unittest.mock import MagicMock
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def dbuti...
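Along the same lines, here is a self-contained sketch that mocks dbutils with unittest.mock alone (the helper name, widget values, and the example function are illustrative). Functions under test receive dbutils as a parameter instead of relying on the notebook-provided global:

```python
from unittest.mock import MagicMock

def make_dbutils_mock(widget_values=None):
    # Stand-in for the notebook-provided dbutils object; configure behavior
    # only for the calls your code actually makes.
    widget_values = widget_values or {}
    dbutils = MagicMock()
    dbutils.widgets.get.side_effect = lambda name: widget_values[name]
    dbutils.secrets.get.return_value = "fake-secret"
    return dbutils

def read_env(dbutils) -> str:
    # Example function under test: it takes dbutils as an argument, so it
    # can be exercised without a Databricks runtime.
    return dbutils.widgets.get("env")
```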

6 More Replies
johnb1
by Contributor
  • 35143 Views
  • 16 replies
  • 15 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully:
%run ../Includes/Classroom-Setup-04.2
Screenshot 1: Inside the setup note...

Latest Reply
hebied
New Contributor II
  • 15 kudos

Thanks for sharing, it really helped.

15 More Replies
SRK
by Contributor III
  • 5838 Views
  • 5 replies
  • 7 kudos

How to handle schema validation for Json file. Using Databricks Autoloader?

Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...

Latest Reply
maddy08
New Contributor II
  • 7 kudos

Just to clarify: are you reading from Kafka and writing into ADLS as JSON files? I.e., is each message from Kafka one JSON file in ADLS?

4 More Replies
