Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HeronPePrestSer
by New Contributor
  • 450 Views
  • 1 reply
  • 0 kudos

Does EXECUTE IMMEDIATE work with a JDBC connection?

Hello, I need help. I am trying to use the EXECUTE IMMEDIATE command to perform DELETE or DROP operations on a table located on a remote SQL Server (on-premises) using a JDBC connection from a notebook in the Databricks environment. While I can succes...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - what error are you getting? Do you have sufficient permission to drop/delete the table?

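EXECUTE IMMEDIATE runs on the Databricks SQL engine, not on the remote server, so DELETE/DROP against an on-premises SQL Server usually has to go through a direct driver connection instead. A minimal sketch of that pattern, using Python's built-in sqlite3 purely as a stand-in for the remote connection (table name and rows are hypothetical; with SQL Server you would use a driver such as pyodbc and a real connection string):

```python
import sqlite3

# Stand-in for the remote connection; replace with a real driver
# connection (e.g. pyodbc) when targeting the on-prem SQL Server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE staging (id INTEGER, val TEXT)")
cur.executemany("INSERT INTO staging VALUES (?, ?)", [(1, "a"), (2, "b")])

# The DELETE runs on the remote side, through the driver,
# rather than through Spark SQL's EXECUTE IMMEDIATE.
cur.execute("DELETE FROM staging WHERE id = 1")
conn.commit()

remaining = cur.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(remaining)  # 1 row left after the delete
```

The same shape works for DROP TABLE; the point is that the statement text is executed by the remote database, not parsed by Databricks.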
sebasv
by New Contributor II
  • 670 Views
  • 2 replies
  • 0 kudos

Inconsistent behaviour in group by and order by

Consider this minimal example: with t as (select explode(sequence(1,10,1)) as id) select (id%2) as id from t group by id order by id. I would expect an ambiguous column name exception, since the grouping and sorting could apply to 2 different `id` columns....

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi, this is not an issue; please understand the order of execution of SQL queries. The ORDER BY clause will always refer to the columns selected/displayed (as you are referring to id everywhere, I guess there is some confusion). An ambiguous column name exception occurs...

1 More Replies
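The confusion above disappears if the projected column gets a distinct alias, so grouping and sorting unambiguously bind to different names. A small runnable sketch using Python's sqlite3 in place of Spark SQL (engines differ in exactly how they resolve aliases, so this only illustrates the disambiguation, not Spark's precise resolution rules):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH t(id) AS (VALUES (1), (2), (3), (4))
    SELECT id % 2 AS parity      -- distinct alias: no clash with t.id
    FROM t
    GROUP BY id                  -- groups on the source column
    ORDER BY parity              -- sorts on the projected value
    """
).fetchall()

print([r[0] for r in rows])  # [0, 0, 1, 1]
```

With four groups (id = 1..4) the parities 1, 0, 1, 0 come back sorted; there is no way to misread which `id` each clause refers to once the alias differs.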
sebasv
by New Contributor II
  • 489 Views
  • 2 replies
  • 0 kudos

NullpointerException when creating a notebook widget

To reproduce, execute this line in a notebook (runtime 15.3): dbutils.widgets.multiselect("foo", None, [None]). Exception raised: Py4JJavaError: An error occurred while calling o427.createMultiselectWidget. : java.lang.NullPointerException at com.databr...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi - please see below. I hope it helps!

1 More Replies
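Since dbutils.widgets.multiselect surfaces a raw NullPointerException from the JVM rather than a readable error when the default or choices are None, a thin wrapper that validates the arguments first can fail with a clearer message. A hypothetical helper (dbutils itself only exists inside Databricks, so only the guard logic is shown):

```python
def validated_multiselect_args(name, default, choices):
    """Validate widget arguments before handing them to dbutils.

    Returns the arguments unchanged if usable; raises a descriptive
    ValueError instead of letting the JVM side throw a
    NullPointerException.
    """
    if name is None or default is None:
        raise ValueError("widget name and default must not be None")
    if choices is None or any(c is None for c in choices):
        raise ValueError("choices must be a list of non-None strings")
    if default not in choices:
        raise ValueError(f"default {default!r} must be one of {choices!r}")
    return name, str(default), [str(c) for c in choices]

# On a cluster you would then call:
# dbutils.widgets.multiselect(*validated_multiselect_args("foo", "a", ["a", "b"]))
args = validated_multiselect_args("foo", "a", ["a", "b"])
print(args)  # ('foo', 'a', ['a', 'b'])
```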
StephanKnox
by New Contributor III
  • 5647 Views
  • 4 replies
  • 2 kudos

Unit Testing with PyTest in Databricks - ModuleNotFoundError

Dear all, I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html; however, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test...

Latest Reply
saurabh18cs
Valued Contributor III
  • 2 kudos

Hi, after trying a lot I was able to see some success; see if this is what you are all looking for: notebook_test.py (a Python code file): from pyspark.sql import functions as F; def sum_values(df): return df.agg(F.sum("value")).first()[0]; def ...

3 More Replies
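The usual fix for the ModuleNotFoundError in this thread is to keep the functions under test in a plain .py file (not a notebook) so pytest can import them, and run pytest from the repo root. A minimal non-Spark sketch of that layout, with a list-based sum_values standing in for the PySpark aggregation in the reply (both file names are illustrative):

```python
# mylib.py -- importable module, kept next to the tests
def sum_values(values):
    """Sum a list of numeric values (stand-in for df.agg(F.sum(...)))."""
    return sum(values)

# test_mylib.py -- discovered automatically when `pytest` runs
# from the directory containing both files
def test_sum_values():
    assert sum_values([1, 2, 3]) == 6

test_sum_values()  # pytest would invoke this for you
```

Because the function lives in an importable module rather than a notebook, `from mylib import sum_values` works both locally and on the cluster.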
nengen
by New Contributor II
  • 623 Views
  • 3 replies
  • 0 kudos

Debugging difference between "task time" and execution time for SQL query

I have a pretty large SQL query that has the following stats from the query profiler: Tasks total time: 1.93s; Executing: 27s. Based on the information in the query profiler this can be due to tasks waiting for available nodes. How should I approach this t...

Latest Reply
Panda
Valued Contributor
  • 0 kudos

@nengen Try using EXPLAIN EXTENDED: this provides a detailed breakdown of the logical and physical plan of a query in Spark SQL. Based on the EXPLAIN EXTENDED output, here are a few things to consider: Broadcast Exchange: if the join causes data skew,...

2 More Replies
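In Spark SQL the statement is simply `EXPLAIN EXTENDED <your query>`, run from any notebook cell before executing the query itself. The same inspect-the-plan-before-running workflow can be shown with SQLite's EXPLAIN QUERY PLAN, used here only because it runs without a cluster (table and query are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

# Ask the engine how it intends to execute the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE id = 42"
).fetchall()

for row in plan:
    print(row)  # a SEARCH step using the primary-key index
```

In the Databricks case, the interesting parts of the EXPLAIN EXTENDED output are exchanges (shuffles/broadcasts) and scan sizes, which point at where tasks sit waiting rather than executing.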
pranav_k1
by New Contributor III
  • 1117 Views
  • 2 replies
  • 2 kudos

Merging data into table using temp view

I am trying to append data into a table which already exists with some data in it. I need to create a view by joining multiple tables, which will later be used to append data to the final table. I am able to alter the table schema and then run a query to insert data...

Latest Reply
pranav_k1
New Contributor III
  • 2 kudos

Hi @filipniziol, thanks for your reply. My issue is resolved: my fellow developer ran the same commands under a different name and after some time it worked successfully. FYI - I was running the query in the same notebook, just in different cells, and I was running cells ...

1 More Replies
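The Databricks pattern behind this thread is `MERGE INTO target USING temp_view ON <keys> WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...`. For a runnable illustration of the same upsert semantics without Delta, SQLite's INSERT ... ON CONFLICT can stand in (table and rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")

# Upsert: update on key match, insert otherwise -- the effect a Delta
# MERGE achieves when its source is a joined temp view.
for row in [(1, "new"), (2, "fresh")]:
    conn.execute(
        "INSERT INTO target VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET val = excluded.val",
        row,
    )
conn.commit()

result = sorted(conn.execute("SELECT id, val FROM target").fetchall())
print(result)  # [(1, 'new'), (2, 'fresh')]
```

In Delta the source view must be fully defined before the MERGE runs; transient name clashes like the one in this thread tend to show up when the view and target are created in the same session under conflicting names.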
confused_dev
by New Contributor II
  • 38933 Views
  • 7 replies
  • 5 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
pavlosskev
New Contributor III
  • 5 kudos

Fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py: # conftest.py: import pytest; from unittest.mock import MagicMock; from pyspark.sql import SparkSession; @pytest.fixture(scope="session") def dbuti...

6 More Replies
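The conftest.py approach in the reply can be sketched end to end. Since dbutils only exists inside Databricks, a MagicMock takes its place in tests; the fixture-style factory, widget name, and canned return values below are all illustrative:

```python
from unittest.mock import MagicMock

def make_mock_dbutils():
    """Build a stand-in for dbutils, as a pytest fixture would."""
    dbutils = MagicMock()
    dbutils.widgets.get.return_value = "dev"     # canned widget value
    dbutils.secrets.get.return_value = "s3cr3t"  # canned secret value
    return dbutils

def get_environment(dbutils):
    """Example function under test that takes dbutils as a parameter."""
    return dbutils.widgets.get("environment")

mock = make_mock_dbutils()
print(get_environment(mock))  # 'dev'
mock.widgets.get.assert_called_once_with("environment")
```

The key design choice is passing dbutils into your functions instead of referencing the notebook global, so tests can inject the mock and notebooks can pass the real object.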
johnb1
by Contributor
  • 26602 Views
  • 16 replies
  • 15 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2. Screenshot 1: inside the setup note...

Latest Reply
hebied
New Contributor II
  • 15 kudos

Thanks for sharing bro, it really helped.

15 More Replies
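A common cause of read failures in that lesson is handing pandas a `dbfs:/` URI: pandas reads through the local filesystem, where DBFS is exposed under the `/dbfs` FUSE mount. A small hypothetical helper for the translation (the path below is made up):

```python
def dbfs_to_local(path: str) -> str:
    """Rewrite a dbfs:/ URI to the /dbfs mount path pandas can read."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path  # already a local path; leave it alone

print(dbfs_to_local("dbfs:/mnt/training/sample.parquet"))
# /dbfs/mnt/training/sample.parquet
# On a cluster: pd.read_parquet(dbfs_to_local(source_path))
```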
SRK
by Contributor III
  • 3786 Views
  • 5 replies
  • 7 kudos

How to handle schema validation for Json file. Using Databricks Autoloader?

Following are the details of the requirement: 1. I am using a Databricks notebook to read data from a Kafka topic and write into an ADLS Gen2 container, i.e., my landing layer. 2. I am using Spark code to read data from Kafka and write into landing...

Latest Reply
maddy08
New Contributor II
  • 7 kudos

Just to clarify, are you reading from Kafka and writing into ADLS as JSON files? I.e., does each message from Kafka become one JSON file in ADLS?

4 More Replies
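Auto Loader itself enforces a schema via `.schema(...)` plus options such as `cloudFiles.schemaEvolutionMode` and the `_rescued_data` column, but the validation idea can be shown without a cluster: check each incoming JSON record against an expected set of typed fields and route mismatches to a quarantine path. Field names and types here are purely illustrative:

```python
import json

EXPECTED = {"id": int, "event": str}  # hypothetical contract for the topic

def validate(record: str):
    """Return (parsed, None) when the record matches, else (None, reason)."""
    try:
        data = json.loads(record)
    except json.JSONDecodeError as exc:
        return None, f"malformed JSON: {exc}"
    for field, typ in EXPECTED.items():
        if field not in data:
            return None, f"missing field {field!r}"
        if not isinstance(data[field], typ):
            return None, f"field {field!r} has wrong type"
    return data, None

good, _ = validate('{"id": 1, "event": "click"}')
bad, reason = validate('{"id": "oops", "event": "click"}')
print(good, reason)
```

In the Auto Loader setup, records that would fail this check land in `_rescued_data` instead of silently widening the schema.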
rubenesanchez
by New Contributor II
  • 5778 Views
  • 6 replies
  • 0 kudos

How to dynamically pass a string parameter to a Delta Live Tables pipeline when calling from Azure Data Factory using the REST API

I want to pass some context information to the Delta Live Tables pipeline when calling from Azure Data Factory. I know the body of the API call supports a Full Refresh parameter, but I wonder if I can add my own custom parameters and how these can be re...

Latest Reply
BLM
New Contributor II
  • 0 kudos

In case this helps anyone: I could only use the refresh_selection parameter, setting it to [] by default. Then, in the notebook, I derived the custom parameter values from the refresh_selection value.

5 More Replies
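The pipelines API accepts a small fixed body, which is why the reply above smuggles context through refresh_selection. A sketch of assembling the request body ADF would POST to /api/2.0/pipelines/{pipeline_id}/updates (the call itself is not made here, and the selection value is a placeholder):

```python
import json

def build_update_body(full_refresh=False, refresh_selection=None):
    """Assemble the JSON body for starting a DLT pipeline update."""
    body = {"full_refresh": full_refresh}
    if refresh_selection is not None:
        body["refresh_selection"] = refresh_selection
    return body

body = build_update_body(refresh_selection=["my_context_value"])
print(json.dumps(body))
# {"full_refresh": false, "refresh_selection": ["my_context_value"]}
```

The notebook on the pipeline side then reads the selection back and interprets it as its context, as the reply describes.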
NickCBZ
by New Contributor III
  • 719 Views
  • 1 reply
  • 0 kudos

Resolved! AWS Config price raised after change to Job Compute

I was looking for opportunities to decrease the cost of my Databricks ETLs and, following the documentation, I started to use job compute for my ETLs. Earlier, I used only all-purpose compute for the ETLs because I needed them to run every 10 minutes...

Latest Reply
NickCBZ
New Contributor III
  • 0 kudos

If someone has this problem in the future, the solution is simple: just disable AWS Config. That's all.

Confused
by New Contributor III
  • 46949 Views
  • 6 replies
  • 3 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use the Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

Latest Reply
murtazahzaveri
New Contributor II
  • 3 kudos

For authentication you can provide the below config in the cluster's Spark environment variables: PIP_EXTRA_INDEX_URL=https://username:password@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/. Also, you can store the value in a Databricks secret.

5 More Replies
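The reply's approach, spelled out as cluster environment variables. All values are placeholders; the credential should come from a Databricks secret reference rather than be pasted in, and pip consults these variables before pip.conf, so no init script is needed:

```shell
# Cluster > Advanced options > Spark > Environment variables
PIP_INDEX_URL=https://user:{{secrets/my-scope/feed-token}}@pkgs.dev.azure.com/org/_packaging/feed/pypi/simple/
PIP_EXTRA_INDEX_URL=https://pypi.org/simple/
```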
Brad
by Contributor II
  • 1020 Views
  • 2 replies
  • 0 kudos

why delta log checkpoint is created in different formats

Hi, I'm using runtime 15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the delta log checkpoint is in mixed formats like: 7616 00000000000003291896.checkpoint.b1c24725-....json, 7616 00000000000003291906.checkpoint.873e1b3e-....

Latest Reply
Brad
Contributor II
  • 0 kudos

Thanks. We use a job to load data from Kinesis to a Delta table. I added spark.databricks.delta.checkpoint.writeFormat parquet and spark.databricks.delta.checkpoint.writeStatsAsStruct true in the job cluster, but the checkpoints still show different formats...

1 More Replies
billykimber
by New Contributor
  • 337 Views
  • 1 reply
  • 0 kudos

Datamart creation

In a scenario where multiple teams access overlapping but not identical datasets from a shared data lake, is it better to create separate datamarts for each team (despite data redundancy) or to maintain a single datamart and use views for team-specif...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

IMO there is no single best scenario; it depends on the case, I would say. Both have pros and cons. If the difference between teams is really small, views could be a solution. But on the other hand, if you work on massive data, the views first have to b...

