Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

noorbasha534
by Contributor III
  • 628 Views
  • 1 reply
  • 0 kudos

Spot instance usage in Azure Databricks

Hi all, as per the article at https://community.databricks.com/t5/technical-blog/optimize-costs-for-your-data-and-ai-workloads-with-azure-and-aws/ba-p/662411, it is possible to choose the number of spot instances using the 'availability' parameter. Bu...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @noorbasha534, Thanks for your question! 1. 'availability' Parameter: The 'availability' parameter in Azure Databricks controls whether the compute uses on-demand or spot instances. The values for this parameter are: ON_DEMAND_AZURE: This value...
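
A minimal sketch of what the reply describes, using the Clusters API 2.0 to request spot workers with on-demand fallback; the workspace URL, token, and node type are placeholders, not values from the thread:

    import requests

    payload = {
        "cluster_name": "spot-example",
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 4,
        "azure_attributes": {
            # ON_DEMAND_AZURE, SPOT_AZURE, or SPOT_WITH_FALLBACK_AZURE
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # the first N nodes (including the driver) stay on-demand;
            # the remaining workers use spot capacity
            "first_on_demand": 1,
        },
    }
    resp = requests.post(
        "https://<workspace-url>/api/2.0/clusters/create",
        headers={"Authorization": "Bearer <token>"},
        json=payload,
    )
    resp.raise_for_status()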

Shivaprasad
by New Contributor III
  • 1145 Views
  • 11 replies
  • 3 kudos

Accessing Delta tables using an API outside Azure (Workiva)

I need to access Delta tables via an API from outside Azure, in a reporting tool (Workiva) using its connector. Can someone provide details on how I can achieve this?

Latest Reply
jack533
New Contributor III
  • 3 kudos

I need to retrieve Delta tables using an API outside of Azure, without requiring a direct connection, in a reporting tool named Workiva. I would appreciate it if someone could tell me exactly how to do that.
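
One documented route for external tools is the Databricks SQL Statement Execution API, which serves Delta table queries over HTTPS through a SQL warehouse. A minimal sketch, assuming a workspace URL, personal access token, warehouse ID, and a hypothetical table name:

    import requests

    host = "https://<workspace-url>"              # placeholder
    headers = {"Authorization": "Bearer <token>"}

    # Submit a query against a Delta table through a SQL warehouse.
    resp = requests.post(
        f"{host}/api/2.0/sql/statements/",
        headers=headers,
        json={
            "warehouse_id": "<warehouse-id>",
            "statement": "SELECT * FROM main.sales.orders LIMIT 10",  # hypothetical table
            "wait_timeout": "30s",
        },
    )
    resp.raise_for_status()
    result = resp.json()
    if result["status"]["state"] == "SUCCEEDED":
        print(result["result"]["data_array"])

Whether Workiva can call such an endpoint directly depends on its own connector capabilities.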

10 More Replies
pvaz
by New Contributor II
  • 737 Views
  • 0 replies
  • 1 kudos

Performance issue when using structured streaming

Hi Databricks community! Let me first apologize for the long post. I'm implementing a system in Databricks to read from a Kafka stream into the bronze layer of a Delta table. The idea is to do some operations on the data that is coming from Kafka, mainl...
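
For context, a minimal sketch of the Kafka-to-bronze pattern the post describes; broker, topic, and table names are placeholders:

    from pyspark.sql.functions import col

    bronze = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "<broker:9092>")
        .option("subscribe", "<topic>")
        .option("startingOffsets", "earliest")
        .load()
        # Kafka delivers key/value as binary; keep the raw payload plus metadata
        .select(
            col("key").cast("string"),
            col("value").cast("string"),
            col("topic"), col("partition"), col("offset"), col("timestamp"),
        )
    )

    (bronze.writeStream
        .format("delta")
        .option("checkpointLocation", "/Volumes/<catalog>/<schema>/chk/bronze")
        .toTable("<catalog>.<schema>.bronze_events"))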

negrinij
by New Contributor II
  • 31608 Views
  • 4 replies
  • 2 kudos

Understanding Used Memory in Databricks Cluster

Hello, I wonder if anyone could give me any insights regarding used memory and how I could change my code to "release" some memory as the code runs. I am using a Databricks notebook. Basically, what we need to do is perform a query, create a spark sql...

Latest Reply
loic
Contributor
  • 2 kudos

I have exactly the same kind of problem. I really do not understand why my driver goes out of memory when I do not cache anything in Spark. Since I don't cache anything, I expect references to objects that are no longer used to be freed. Even a s...
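
As a general sketch (not specific to either poster's code), the usual levers for driver memory are avoiding large collects and releasing caches explicitly; the table name below is hypothetical:

    df = spark.sql("SELECT * FROM some_large_table")  # hypothetical table

    # 1. Avoid pulling full results to the driver: collect()/toPandas() on a
    #    large DataFrame is the most common cause of driver OOM.
    sample_pdf = df.limit(10_000).toPandas()

    # 2. Release cached data explicitly once it is no longer needed.
    df.cache()
    # ... use df across several actions ...
    df.unpersist()

    # 3. As a last resort, drop everything Spark cached in this session.
    spark.catalog.clearCache()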

3 More Replies
asingamaneni
by New Contributor II
  • 2291 Views
  • 1 reply
  • 2 kudos

Brickflow - An opinionated python framework to help build and deploy Databricks workflows at scale

Hello, Databricks Community, I’m thrilled to introduce “Brickflow,” our innovative solution designed to streamline the development and deployment of workflows at scale, leveraging straightforward Python coding. This tool has already achieved a signif...

Latest Reply
matthiasg
New Contributor II
  • 2 kudos

Brickflow is indeed an awesome tool to work with Databricks!

kasiviss42
by New Contributor III
  • 280 Views
  • 2 replies
  • 0 kudos

Query related to Z ordering

I have a join on two large tables. If I apply Z-ordering on 3 columns for both tables, and I am joining the two tables on the same 3 columns used for Z-ordering, will I get any performance benefit from Z-ordering when I use joins here? So as p...

Latest Reply
kasiviss42
New Contributor III
  • 0 kudos

I asked this because of predicate pushdown to storage. Here, as part of the join, data needs to be loaded into memory first and then the join is performed. So how does Z-ordering help if it can't skip the data being fetched from s...
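
Z-ordering by itself only clusters the data; skipping happens when a selective predicate lets file-level min/max statistics (or dynamic file pruning during the join) eliminate files. A sketch with placeholder table and column names:

    # Cluster both sides of the join on the shared join columns.
    spark.sql("OPTIMIZE fact_table ZORDER BY (c1, c2, c3)")
    spark.sql("OPTIMIZE dim_table ZORDER BY (c1, c2, c3)")

    # Skipping then kicks in when the query carries a selective filter,
    # which dynamic file pruning can propagate to the other side of the join.
    spark.sql("""
        SELECT *
        FROM fact_table f
        JOIN dim_table d
          ON f.c1 = d.c1 AND f.c2 = d.c2 AND f.c3 = d.c3
        WHERE d.c1 = 'some_value'
    """)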

1 More Replies
kumarPatra_07
by New Contributor
  • 1113 Views
  • 1 reply
  • 0 kudos

Resolved! Getting an error while mounting to a storage account

While mounting to the storage account using the below code: dbutils.fs.mount(  source=f"wasbs://{container_name}@{storage_name}.blob.core.windows.net",  mount_point=f"/mnt/{container_name}",  extra_configs={f"fs.azure.account.key.{storage_name}.blob.c...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @kumarPatra_07, greetings! From the code you shared, I can see that you are using the WASBS driver to mount the storage; WASB is already deprecated. Reference document: https://learn.microsoft.com/en-us/azure/dat...
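
A sketch of the ABFSS replacement the reply points toward, reading the container directly instead of mounting; the account, container, and secret names are placeholders (a service principal or Unity Catalog external location is preferable to an account key in production):

    storage_name = "<storage-account>"   # placeholder
    container_name = "<container>"       # placeholder

    spark.conf.set(
        f"fs.azure.account.key.{storage_name}.dfs.core.windows.net",
        dbutils.secrets.get(scope="<scope>", key="<key>"),
    )
    df = spark.read.csv(
        f"abfss://{container_name}@{storage_name}.dfs.core.windows.net/path/to/data"
    )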

Karl
by New Contributor II
  • 1204 Views
  • 1 reply
  • 0 kudos

Resolved! DB2 JDBC Connection from Databricks cluster

Has anyone successfully connected to a DB2 database on z/OS from a Databricks cluster using a JDBC connection? I also need to specify an SSL certificate path and am not sure if I need to use an init script on the cluster to do so. Any examples would be ver...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Karl , Greetings!  I've outlined the steps below to connect from Databricks to IBM DB2 using JDBC:Step 1: Obtain the DB2 JDBC Driver Visit the IBM website to download the appropriate JDBC driver for DB2 on z/OS.Reference Document: IBM DB2 JDBC Dr...
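
A sketch of what such a JDBC read might look like once the IBM driver jar is installed as a cluster library; the URL properties and paths below are placeholders, and the SSL property names should be checked against the IBM JCC documentation:

    df = (
        spark.read.format("jdbc")
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .option(
            "url",
            "jdbc:db2://<host>:<port>/<database>:"
            "sslConnection=true;"
            "sslTrustStoreLocation=/dbfs/FileStore/certs/db2-truststore.jks;"
            "sslTrustStorePassword=<password>;",
        )
        .option("user", "<user>")
        .option("password", "<password>")
        .option("dbtable", "SCHEMA.TABLE")
        .load()
    )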

ksenija
by Contributor
  • 825 Views
  • 1 reply
  • 0 kudos

Log data from reports in PowerBI

Where can I find log data from Power BI? I need to find which tables are being used in my Power BI reports that point to Databricks. I tried system.access.audit, but I'm not finding new data when I refresh my report.

Latest Reply
Allia
Databricks Employee
  • 0 kudos

@ksenija To enable ODBC logging in Power BI, go to the C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Spark ODBC Driver folder, create or edit the file microsoft.sparkodbc.ini, and update it as below:
[Driver]
LogLevel=6
LogPath=<...
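
On the Databricks side, a hedged sketch of querying the audit system table by user agent; column coverage varies, and audit events are not delivered instantly, which may explain not seeing data immediately after a refresh:

    spark.sql("""
        SELECT event_time, user_identity.email, service_name, action_name, request_params
        FROM system.access.audit
        WHERE user_agent ILIKE '%PowerBI%'
        ORDER BY event_time DESC
        LIMIT 100
    """).show(truncate=False)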

shahabm
by New Contributor III
  • 7603 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks job keeps failing due to GC issue

There is a job that used to run successfully, but for more than a month we have been experiencing long runs that eventually fail. In the stdout log file (attached), there are numerous messages like: [GC (Allocation Failure) [PSYoungGen:...] and [Full GC ...
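
The PSYoungGen lines indicate the JVM's parallel collector under memory pressure. A hedged sketch of cluster Spark config that makes GC visible and tries G1GC instead; the flags are illustrative and depend on the runtime's JDK version, and persistent Full GC usually means the workload needs more memory or less data held on the driver:

    spark_conf = {
        "spark.driver.extraJavaOptions": "-verbose:gc -XX:+UseG1GC",
        "spark.executor.extraJavaOptions": "-verbose:gc -XX:+UseG1GC",
    }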

Latest Reply
siddhu30
New Contributor II
  • 2 kudos

Thanks a lot @shahabm for your prompt response, appreciate it. I'll try to debug in this direction. Thanks again!

4 More Replies
thib
by New Contributor III
  • 6422 Views
  • 4 replies
  • 3 kudos

Can we use multiple git repos for a job running multiple tasks?

I have a job running multiple tasks: Task 1 runs a machine learning pipeline from git repo 1; Task 2 runs an ETL pipeline from git repo 1. Task 2 is actually a generic pipeline and should not be checked into repo 1, and will be made available in another re...

Latest Reply
tors_r_us
New Contributor II
  • 3 kudos

Had this same problem. The fix was to have two workflows with no triggers, each pointing to the respective git repo, then set up a 3rd workflow with the appropriate triggers/schedule which calls the first 2 workflows. A workflow can run other workflows.
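
A sketch of that orchestration pattern in Jobs API terms, with placeholder job IDs: the two untriggered jobs each pin their own git repo, and the scheduled job calls them via run_job_task:

    orchestrator = {
        "name": "orchestrator",
        "schedule": {"quartz_cron_expression": "0 0 6 * * ?", "timezone_id": "UTC"},
        "tasks": [
            {"task_key": "ml_pipeline", "run_job_task": {"job_id": 111}},
            {
                "task_key": "etl_pipeline",
                "depends_on": [{"task_key": "ml_pipeline"}],
                "run_job_task": {"job_id": 222},
            },
        ],
    }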

3 More Replies
Freshman
by New Contributor III
  • 895 Views
  • 4 replies
  • 2 kudos

Resolved! Timezone in silver tables

Hello, What is the best practice in Databricks for storing DateTime data in silver layer tables, considering the source data is in AEST and we store it in UTC by default? Thanks

Latest Reply
robert154
New Contributor III
  • 2 kudos

@Freshman wrote: "Hello, What is the best practice in Databricks for storing DateTime data in silver layer tables, considering the source data is in AEST and we store it in UTC by default? Thanks" The best practice for storing DateTime data in the Silver l...
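
A sketch of the usual normalization, with hypothetical DataFrame and column names: parse the local timestamp with the region-based zone ID and store UTC in silver:

    from pyspark.sql.functions import col, to_utc_timestamp

    silver_df = bronze_df.withColumn(
        "event_ts_utc",
        to_utc_timestamp(col("event_ts_local"), "Australia/Sydney"),
    )
    # Using "Australia/Sydney" rather than a fixed +10:00 offset lets Spark
    # apply AEST/AEDT daylight-saving transitions correctly.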

3 More Replies
dc-rnc
by New Contributor III
  • 409 Views
  • 1 reply
  • 1 kudos

Resolved! if-else statement in DAB YAML file

Hi. Is it possible to use a "better" way to override the "git_branch" key's value in the right file (the resource YAML file)? Or a different way, like an "if-else" statement? I'd like to have it all in the resource YAML file instead of overrid...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

I think your approach of using targets and a ${var.git_branch} variable is correct. At the moment there is no if-else statement, but I will investigate internally whether there is a better way.
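
For reference, a minimal databricks.yml sketch of that targets-plus-variable pattern (names are illustrative):

    variables:
      git_branch:
        default: main

    targets:
      dev:
        variables:
          git_branch: develop
      prod:
        variables:
          git_branch: main

    # elsewhere, the resource references ${var.git_branch}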

DineshOjha
by New Contributor II
  • 189 Views
  • 2 replies
  • 0 kudos

Read task-level parameters in Python

I am creating Databricks jobs and tasks using a Python package. I have defined a task-level parameter and would like to reference it in my script using sys.argv. How can I do that?

Latest Reply
DineshOjha
New Contributor II
  • 0 kudos

Thanks, but the link works for notebooks. I have a Python package run as a Python wheel and am wondering how to access the parameters. When I run the job, it's not able to see the task-level parameters in sys.argv.
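
For a python_wheel_task, positional "parameters" from the task definition do arrive in sys.argv of the wheel's entry point; a sketch with a hypothetical --env parameter:

    import argparse
    import sys

    def main() -> None:
        parser = argparse.ArgumentParser()
        parser.add_argument("--env")  # hypothetical parameter name
        args = parser.parse_args(sys.argv[1:])
        print(f"running with env={args.env}")

    if __name__ == "__main__":
        main()

With "parameters": ["--env", "dev"] on the task, args.env would be "dev".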

1 More Replies
garciargs
by New Contributor III
  • 468 Views
  • 2 replies
  • 3 kudos

DLT: multiple source tables to a single silver table generating unexpected results

Hi, I've been trying this all day long. I'm building a POC of a pipeline that would be used in my everyday ETL. I have two initial tables, vendas and produtos, as follows: vendas_raw (venda_id, produto_id, data_venda, quantidade, valor_total, dth_in...

Latest Reply
NandiniN
Databricks Employee
  • 3 kudos

When dealing with Change Data Capture (CDC) in Delta Live Tables, it's crucial to handle out-of-order data correctly. You can use the APPLY CHANGES API to manage this. The APPLY CHANGES API ensures that the most recent data is used by specifying a co...
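
A sketch of the APPLY CHANGES pattern the reply describes, reusing the poster's table names; the key and sequencing columns are guesses, since the original post is truncated:

    import dlt
    from pyspark.sql.functions import col

    dlt.create_streaming_table("vendas_silver")

    dlt.apply_changes(
        target="vendas_silver",
        source="vendas_raw",
        keys=["venda_id"],
        # sequence_by orders events so late or out-of-order changes still
        # resolve to the most recent version of each key
        sequence_by=col("dth_insercao"),  # hypothetical sequencing column
    )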

1 More Replies
