Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bhuvnesh
by New Contributor
  • 2758 Views
  • 1 reply
  • 0 kudos

Unity Catalog

Hi, I have a requirement to set up Athena tables. We have a Unity Catalog setup in our Databricks workspace, and I would like to know whether Athena can be pointed at Unity Catalog so that all the tables are available in Athena. Whenever we...

Latest Reply
ArunKhandelwal
New Contributor II
  • 0 kudos

Unfortunately, as of now, there isn't a direct, seamless integration between Unity Catalog and Athena to automatically synchronize table updates. However, here are a few potential approaches to achieve your desired outcome:
1. AWS Glue Data Catalog: Man...
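To make the Glue-based approach concrete, here is a minimal sketch of registering a table in the AWS Glue Data Catalog so Athena can see it. Everything below (database name, table name, S3 path, column, Parquet format) is a placeholder assumption, not part of the original reply; the actual `boto3` call is commented out because it needs AWS credentials.

```python
# Sketch of option 1: register the table's storage location in the AWS Glue
# Data Catalog, which Athena reads natively. All names/paths are placeholders.
table_input = {
    "Name": "my_table",
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {"classification": "parquet"},
    "StorageDescriptor": {
        "Location": "s3://my-bucket/warehouse/my_table/",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
        },
        "Columns": [{"Name": "id", "Type": "bigint"}],
    },
}

# Requires AWS credentials, so commented out in this sketch:
# import boto3
# glue = boto3.client("glue")
# glue.create_table(DatabaseName="analytics", TableInput=table_input)
```

Note this only registers metadata; keeping it in sync with Unity Catalog changes would still need some scheduled process.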

  • 0 kudos
Leo_dass
by Databricks Partner
  • 1547 Views
  • 2 replies
  • 0 kudos

I need to use the ojdbc jar file which is in workspace>Driver inside my notebook

Hi, currently I'm working on a project where I need to connect to an Oracle database from a Databricks notebook. To do this I need the ojdbc JAR file, but my client has installed that JAR file in workspace > Driver and not in the cluster library. So to use ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

Do this in your notebook:

# Path to the ojdbc JAR file in the workspace
jar_path = "/Workspace/Driver/ojdbc8.jar"

# Install the JAR file from the workspace into the cluster
dbutils.library.install(jar_path)

# Restart the cluster to ensure the JAR file is ...
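Once the driver is available on the cluster, the actual Oracle read might look like the sketch below. The host, port, service name, table, and credentials are all placeholder assumptions, and the `spark.read` call is commented out because `spark` only exists inside a Databricks notebook.

```python
# Sketch: reading an Oracle table over JDBC once the ojdbc driver is on the
# cluster. Host, port, service name, table, and credentials are placeholders.
host, port, service = "db-host.example.com", 1521, "ORCLPDB1"
jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service}"

options = {
    "url": jdbc_url,
    "driver": "oracle.jdbc.driver.OracleDriver",
    "dbtable": "MY_SCHEMA.MY_TABLE",
    "user": "my_user",
    "password": "my_password",
}

# In a Databricks notebook, `spark` is predefined:
# df = spark.read.format("jdbc").options(**options).load()
# display(df)
```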

1 More Replies
Fatimah-Tariq
by New Contributor III
  • 1904 Views
  • 5 replies
  • 0 kudos

Need help with DLT Pipeline

I have a DLT pipeline that has been running daily for months, and I recently found an issue in my silver layer code; as a result, I now have faulty data in my silver schema. Please note that the tables in the silver schema are streaming tables handled wit...

Latest Reply
AngadSingh
New Contributor III
  • 0 kudos

Hi Fatimah, you can delete the records from the silver layer as long as those records don't get reloaded again (from bronze). More info is here. May I also ask whether you are using the CDC apply_changes method for loading data to silver (like SCD 1)...

4 More Replies
chethankumar
by New Contributor III
  • 1312 Views
  • 2 replies
  • 0 kudos

How to add existing recipient to existing delta share

I have created a recipient using the Databricks console, and I have also created a Delta share using the console. Now I want to map the existing recipient to the existing Delta share. Is there a way to do this using Terraform?

Latest Reply
chethankumar
New Contributor III
  • 0 kudos

@SathyaSDE Thank you for your response. I am looking into a Terraform-based approach for this.
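For the Terraform route being discussed, a sketch using the Databricks Terraform provider's `databricks_grants` resource, which supports granting on shares, is below. The share and recipient names are placeholders for the already-existing objects; treat this as an assumption to verify against the provider docs, not a confirmed solution from the thread.

```hcl
# Sketch: grant an existing recipient access to an existing Delta share.
# "my_share" and "my_recipient" are placeholders for your existing objects.
resource "databricks_grants" "share_to_recipient" {
  share = "my_share"

  grant {
    principal  = "my_recipient"
    privileges = ["SELECT"]
  }
}
```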

1 More Replies
MikeGo
by Contributor II
  • 1996 Views
  • 2 replies
  • 1 kudos

Resolved! dropDuplicates inside foreachBatch

Hi, if I use dropDuplicates inside foreachBatch, dropDuplicates becomes stateless: it just drops duplicates within the current micro-batch, so I don't have to specify a watermark. Is this true? Thanks

Latest Reply
navallyemul
New Contributor III
  • 1 kudos

Yes, you're correct! When using dropDuplicates within foreachBatch, it operates only on the current micro-batch, so it removes duplicates in a stateless manner for each batch independently. Since there's no continuous state tracking across batches, y...
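The stateless behavior described above can be illustrated with a toy example in plain Python (no Spark): deduplication applied independently per micro-batch keeps no memory of earlier batches, so a key seen in batch 1 survives again in batch 2.

```python
# Toy illustration of per-batch (stateless) deduplication, mimicking what
# dropDuplicates does inside foreachBatch: each batch is handled independently.
def drop_duplicates(batch):
    seen, out = set(), []
    for row in batch:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

batch1 = ["a", "b", "a"]
batch2 = ["a", "c"]                  # "a" reappears in a later batch

result1 = drop_duplicates(batch1)    # ["a", "b"]
result2 = drop_duplicates(batch2)    # ["a", "c"] -- "a" is not remembered
```

Cross-batch deduplication would instead require state (and a watermark to bound it), which is exactly what this stateless pattern avoids.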

1 More Replies
Subhodeep
by New Contributor
  • 1863 Views
  • 2 replies
  • 0 kudos

Fetching queries submitted for review via Genie

Hi all, I wanted to know if there is a way to export the list of queries submitted for review via Genie using an API call. I know there is an API to fetch the query run history, but what I need is to fetch the list of reviews via Genie. It would be great if ...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi Subhodeep, is the new system table "system.access.assistant_events" closer to what you are looking for? With this new system table introduced, it seems we can expect more around this soon (in a future release). More informatio...

1 More Replies
mmenjivar
by New Contributor II
  • 2114 Views
  • 1 reply
  • 0 kudos

How to use SQL Streaming tables

We have been testing the usage of Streaming Tables in our pipelines, with different results depending on the streaming source. For Streaming Tables reading from read_files, everything works as expected. For Streaming Tables reading from read_kafka, we have ...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - please note:
1) Structured Streaming and Delta Live Tables are two different options and have different syntaxes.
2) You cannot execute DLT code in a notebook directly; it can be run as a job.
Please refer below: https://docs.databricks.com/en/delta-live-tables/sq...
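For reference, the DLT SQL syntax for a streaming table reading from files looks roughly like the sketch below; the table name, S3 path, and format are placeholders, and per point 2 above it must run inside a DLT pipeline, not interactively in a notebook.

```sql
-- Sketch: a DLT streaming table over cloud files (placeholder name/path).
CREATE OR REFRESH STREAMING TABLE my_bronze_table
AS SELECT *
FROM STREAM read_files('s3://my-bucket/landing/', format => 'json');
```

A Kafka-backed streaming table would swap `read_files(...)` for a `read_kafka(...)` source, which is where the thread reports differing behavior.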

alpine
by New Contributor
  • 10781 Views
  • 4 replies
  • 0 kudos

Deploy lock force acquired error when deploying asset bundle using databricks cli

I'm running this command on a DevOps pipeline:
databricks bundle deploy -t dev
I receive this error and have tried using --force-lock but it still doesn't work.
Error: deploy lock force acquired by name@company.com at 2024-02-20 16:38:34.99794209 +0000 ...

Latest Reply
manish1987c
New Contributor III
  • 0 kudos

You can use the below command:
databricks bundle deploy -t dev --force-lock

3 More Replies
HeronPePrestSer
by New Contributor
  • 1350 Views
  • 1 reply
  • 0 kudos

EXECUTE IMMEDIATE works with JDBC connection ???

Hello, I need help. I am trying to use the EXECUTE IMMEDIATE command to perform DELETE or DROP operations on a table located on a remote SQL Server (on-premises) using a JDBC connection from a notebook in the Databricks environment. While I can succes...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - what error are you getting? Do you have sufficient permission to drop/delete the table?

sebasv
by New Contributor II
  • 1779 Views
  • 2 replies
  • 0 kudos

Inconsistent behaviour in group by and order by

Consider this minimal example:

with t as (select explode(sequence(1, 10, 1)) as id)
select (id % 2) as id from t
group by id
order by id

I would expect an ambiguous column name exception, since the grouping and sorting could apply to 2 different `id` columns....

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi, this is not an issue; please consider the order of execution of SQL query clauses. The ORDER BY clause will always refer to the columns selected/displayed (since you are naming everything id, I suspect that is the source of the confusion). An ambiguous column name exception occurs...
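One way to make the resolution explicit, as a sketch of the point above: give the output column an alias distinct from the source column, so each clause unambiguously refers to one name (the `-- resolves to` comments reflect the resolution order described in the reply, stated here as an assumption to verify against the Spark SQL docs).

```sql
WITH t AS (SELECT explode(sequence(1, 10, 1)) AS id)
SELECT (id % 2) AS parity   -- distinct alias instead of reusing "id"
FROM t
GROUP BY id                 -- resolves to the source column t.id
ORDER BY parity;            -- resolves to the select-list alias
```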

1 More Replies
sebasv
by New Contributor II
  • 1738 Views
  • 2 replies
  • 0 kudos

NullpointerException when creating a notebook widget

To reproduce, execute this line in a notebook (runtime 15.3):

dbutils.widgets.multiselect("foo", None, [None])

Exception raised: Py4JJavaError: An error occurred while calling o427.createMultiselectWidget. : java.lang.NullPointerException at com.databr...
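A sketch of a workaround, assuming the NPE comes from passing None where the widget API expects strings: guard the arguments in Python so the failure is a readable ValueError instead of a Java NullPointerException. The guard function is illustrative, not part of dbutils.

```python
# Illustrative guard: dbutils.widgets.multiselect expects a string default and
# a list of string choices; None for either surfaces as a Java NPE.
def check_multiselect_args(name, default, choices):
    if default is None or choices is None or any(c is None for c in choices):
        raise ValueError("default and every choice must be a non-null string")
    return name, default, [str(c) for c in choices]

# In a notebook (dbutils exists only on Databricks):
# dbutils.widgets.multiselect(*check_multiselect_args("foo", "a", ["a", "b"]))
```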

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - please see below. I hope it helps!

1 More Replies
StephanKnox
by New Contributor III
  • 12222 Views
  • 4 replies
  • 2 kudos

Unit Testing with PyTest in Databricks - ModuleNotFoundError

Dear all, I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html. However, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test...

Latest Reply
saurabh18cs
Honored Contributor III
  • 2 kudos

Hi, after trying a lot I was able to see some success; see if this is what you all are looking for:

notebook_test.py (this is a Python code file)

from pyspark.sql import functions as F

def sum_values(df):
    return df.agg(F.sum("value")).first()[0]

def ...

3 More Replies
nengen
by Databricks Partner
  • 1817 Views
  • 3 replies
  • 0 kudos

Debugging difference between "task time" and execution time for SQL query

I have a pretty large SQL query that has the following stats from the query profiler:
Tasks total time: 1.93s
Executing: 27s
Based on the information in the query profiler this can be due to tasks waiting for available nodes. How should I approach this t...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@nengen Try using EXPLAIN EXTENDED: this provides a detailed breakdown of the logical and physical plans of a query in Spark SQL. Based on the EXPLAIN EXTENDED output, here are a few things to consider: Broadcast Exchange: if the join causes data skew,...
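Usage is just a prefix on the query under investigation; the table and columns below are placeholders, not from the thread.

```sql
-- Prefix the slow query with EXPLAIN EXTENDED to see its parsed, analyzed,
-- optimized logical, and physical plans (placeholder table/columns):
EXPLAIN EXTENDED
SELECT customer_id, SUM(amount)
FROM sales
GROUP BY customer_id;
```

Comparing the physical plan's exchanges and join strategies against the profiler's "tasks waiting" signal is one way to tell scheduling delay apart from plan-level problems.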

2 More Replies
pranav_k1
by New Contributor III
  • 4318 Views
  • 2 replies
  • 2 kudos

Merging data into table using temp view

I am trying to append data into a table which already exists with some data in it. I need to create a view by joining multiple tables, which will later be used to append data to the final table. I am able to alter the table schema and then run a query to insert data...

Latest Reply
pranav_k1
New Contributor III
  • 2 kudos

Hi @filipniziol, thanks for your reply. My issue is resolved: my fellow developer ran the same commands under a different name, and after some time it worked successfully. FYI - I was running the query in the same notebook, just in different cells, and I was running the cells ...

1 More Replies