Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

radothede
by Contributor III
  • 1687 Views
  • 1 reply
  • 0 kudos

Can on-demand clusters be shared across multiple jobs using a cluster pool with max capacity?

I have a cluster pool with max capacity. I run multiple jobs against that cluster pool. Can on-demand clusters, created within this cluster pool, be shared across multiple different jobs, at the same time? The reason I'm asking is I can see a downgrade...

gabe123
by New Contributor
  • 1114 Views
  • 0 replies
  • 0 kudos

Strange Error with custom module in delta live table pipeline

The chunk of code in question: sys.path.append( spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/") ); from broker_utils import extract_day_with_suffix, proper_case_address_udf, proper_case_last_name_first_udf, proper_case_ud...

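For context, the pattern in the post amounts to extending sys.path from a pipeline configuration value before importing a repo module. A minimal sketch, assuming a Databricks notebook where `spark` is in scope (the `util_path` key, fallback path, and `broker_utils` module come from the truncated snippet above):

```python
import sys

# Resolve the utilities directory from pipeline/cluster config, with a fallback,
# and put it on the import path before importing the custom module.
util_path = spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/")
if util_path not in sys.path:
    sys.path.append(util_path)

from broker_utils import extract_day_with_suffix  # module living in the repo's utils dir
```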
AKUMAR_DEngg
by New Contributor II
  • 2144 Views
  • 0 replies
  • 0 kudos

Job Cluster's CPU utilization goes higher than 100% a few times during the workload run

I have a Data Engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: Worker i3.4xlarge with 122 GB memory and 16 cores; Driver i3.4xlarge with 122 GB memory and 16 cores; Min Workers 4 and Max Workers 8. We noticed...

Labels: Data Engineering, Databricks
RicardoS
by New Contributor II
  • 9110 Views
  • 3 replies
  • 1 kudos

Value of SQL variable in IF statement using Spark SQL

Hi there, I am new to Spark SQL and would like to know if it is possible to reproduce the below T-SQL query in Databricks. This is a sample query, but I want to determine if a query needs to be executed or not. DECLARE @VariableA AS INT, @Vari...

Latest Reply
Edthehead
Contributor III
  • 1 kudos

Since you are looking for a single value back, you can use the CASE function to achieve what you need: %sql SET var.myvarA = (SELECT 6); SET var.myvarB = (SELECT 7); SELECT CASE WHEN ${var.myvarA} = ${var.myvarB} THEN 'Equal' ELSE 'Not equal' END AS resu...

2 More Replies
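If it helps, the same check can also be driven from Python by substituting values into the SQL text. A sketch, assuming a Databricks notebook where `spark` is in scope (the variable values are placeholders):

```python
# Emulate T-SQL variables by substituting Python values into Spark SQL,
# then branch on the single-row result.
var_a, var_b = 6, 7

result = spark.sql(
    f"SELECT CASE WHEN {var_a} = {var_b} THEN 'Equal' ELSE 'Not equal' END AS result"
).first()["result"]

if result == "Equal":
    print("run the query")   # placeholder for the conditional work
else:
    print("skip the query")
```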
Fnazar
by New Contributor II
  • 1825 Views
  • 1 reply
  • 0 kudos

Billing of Databricks Job clusters

Hi All, please help me understand how the billing is calculated for using the Job cluster. The documentation says they are charged on an hourly basis, so if my job ran for 1 hr 30 mins, will it be charged for the 30 mins based on the hourly rate, or will it be charged f...

Latest Reply
PL_db
Databricks Employee
  • 0 kudos

Job clusters consume DBUs per hour depending on the VM size. Databricks billing happens at per-second granularity (see here). That means if you run your job for 1.5 hours, you will be charged DBUs/hour * 1.5 * SKU_price; accordingly, if you run your...

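As a worked example of that formula (the DBU rate and SKU price below are invented numbers for illustration, not real list prices):

```python
# cost = DBUs consumed per hour * runtime in hours * price per DBU.
# Per-second billing means the extra 30 minutes are charged as 0.5 hours.
dbu_per_hour = 2.0      # hypothetical DBU rate for the chosen VM size
sku_price = 0.15        # hypothetical $ per DBU for the Jobs Compute SKU
runtime_hours = 1.5     # 1 hr 30 min

cost = dbu_per_hour * runtime_hours * sku_price
print(f"Estimated cost: ${cost:.2f}")  # -> Estimated cost: $0.45
```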
Kayl669
by New Contributor III
  • 3500 Views
  • 5 replies
  • 0 kudos

SQL code against tables with '>' in headers suddenly failing?

Just want to post this issue we're experiencing here in case other people are facing something similar. Below is the wording of the support ticket request I've raised: SQL code that has been working is suddenly failing due to syntax errors today. Ther...

Latest Reply
Kayl669
New Contributor III
  • 0 kudos

The point we've got to with this is that MS Support / Databricks have acknowledged that they did something and are working on a fix. "The issue occurred due to a regression in the recent DBR maintenance release... Our engineering team is workin...

4 More Replies
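While the root cause here was a DBR regression, the usual rule is worth noting: column names containing characters like '>' must be backtick-quoted in Spark SQL. A minimal sketch (the table and column names are hypothetical):

```python
# Columns whose names contain special characters such as '>' need backticks in SQL.
df = spark.createDataFrame([(1, 10.0)], ["id", "pressure>threshold"])
df.createOrReplaceTempView("readings")

spark.sql("SELECT id, `pressure>threshold` FROM readings").show()
```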
Red1
by New Contributor III
  • 4539 Views
  • 6 replies
  • 2 kudos

Autoingest not working with Unity Catalog in DLT pipeline

Hey Everyone, I've built a very simple pipeline with a single DLT using auto ingest, and it works, provided I don't specify the output location. When I build the same pipeline but set UC as the output location, it fails when setting up S3 notification...

Latest Reply
Red1
New Contributor III
  • 2 kudos

Hey @Babu_Krishnan, I was! I had to reach out to my Databricks support engineer directly, and the resolution was to add "cloudfiles.awsAccessKey" and "cloudfiles.awsSecretKey" to the params as in the screenshot below (apologies, I don't know why the sc...

5 More Replies
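Based on that resolution, the options would be passed roughly like this. A sketch, not the confirmed fix: the path, region, and secret scope/key names are placeholders, and credentials should come from a secret scope rather than plain text:

```python
# Auto Loader in file notification mode, authenticating the notification setup
# with explicit AWS keys pulled from a Databricks secret scope (names are placeholders).
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.region", "us-east-1")
    .option("cloudFiles.awsAccessKey", dbutils.secrets.get("my_scope", "aws_access_key"))
    .option("cloudFiles.awsSecretKey", dbutils.secrets.get("my_scope", "aws_secret_key"))
    .load("s3://my-bucket/landing/")
)
```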
Mado
by Valued Contributor II
  • 16553 Views
  • 4 replies
  • 3 kudos

Resolved! Using "Select Expr" and "Stack" to Unpivot PySpark DataFrame doesn't produce expected results

I am trying to unpivot a PySpark DataFrame, but I don't get the correct results. Sample dataset: # Prepare Data data = [("Spain", 101, 201, 301), ("Taiwan", 102, 202, 302), ("Italy", 103, 203, 303), ("China", 104, 204, 304...

Latest Reply
lukeoz
New Contributor III
  • 3 kudos

You can also use backticks around the column names that would otherwise be recognised as numbers: from pyspark.sql import functions as F; unpivotExpr = "stack(3, '2018', `2018`, '2019', `2019`, '2020', `2020`) as (Year, CPI)"; unPivotDF = df.select("C...

3 More Replies
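Putting the truncated pieces together, a self-contained version of that fix might look like this (the sample data follows the original post, and the year column names are assumed from the snippet):

```python
from pyspark.sql import functions as F

# Sample data from the original post: one CPI column per year.
data = [("Spain", 101, 201, 301),
        ("Taiwan", 102, 202, 302),
        ("Italy", 103, 203, 303),
        ("China", 104, 204, 304)]
df = spark.createDataFrame(data, ["Country", "2018", "2019", "2020"])

# Backticks stop the numeric column names from being parsed as integer literals.
unpivot_expr = "stack(3, '2018', `2018`, '2019', `2019`, '2020', `2020`) as (Year, CPI)"
df.select("Country", F.expr(unpivot_expr)).show()
```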
6502
by New Contributor III
  • 2466 Views
  • 1 reply
  • 0 kudos

Delete on streaming table and restarting with startingVersion

I deleted some records from a streaming table by mistake, and of course, the streaming job stopped working. So I restored the table to the version before the delete was done, and attempted to restart the job by setting startingVersion to the new vers...

Latest Reply
raphaelblg
Databricks Employee
  • 0 kudos

Hello @6502, It appears you've used the `startingVersion` parameter in your streaming query, which causes the stream to begin processing data from the version prior to the DELETE operation version. However, the DELETE operation will still be processe...

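For reference, startingVersion is an option on the Delta stream read. A minimal sketch (the table name and version are placeholders; skipChangeCommits is one way, not necessarily the poster's, to let the stream ride past non-append commits):

```python
# Resume a Delta stream from a chosen table version.
# skipChangeCommits tells the stream to ignore commits that update or delete rows,
# which would otherwise fail a plain append-only stream.
df = (
    spark.readStream.format("delta")
    .option("startingVersion", 42)          # placeholder: version after the RESTORE
    .option("skipChangeCommits", "true")
    .table("my_catalog.my_schema.events")   # placeholder table name
)
```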
Erik_L
by Contributor II
  • 1124 Views
  • 0 replies
  • 0 kudos

BUG: Unity Catalog kills UDF

We have UDFs in a few locations and today we noticed a sharp performance drop. This seems to be caused by Unity Catalog. Test environment 1: Databricks Runtime: 14.3 / 15.1; Compute: 1 master, 4 nodes; Policy: Unrestricted; Access Mode: Shared. Tes...

nilton
by New Contributor II
  • 1580 Views
  • 2 replies
  • 0 kudos

Query table based on table_name from information_schema

Hi, I have one table that changes its name every 60 days. The name simply increments the version number, for example: first 60 days: table_name_v1; after 60 days: table_name_v2, and so on. What I want is to query the table whose name is returned in the que...

Latest Reply
radothede
Contributor III
  • 0 kudos

The simplest way would probably be using spark.sql: %py tbl_name = 'table_v1'; df = spark.sql(f'select * from {tbl_name}'); display(df). From there, you can simply create a temporary view: %py df.createOrReplaceTempView('table_act') and query it using SQL st...

1 More Reply
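Combining the question and the reply, the current table name can be resolved from information_schema and then queried dynamically. A sketch (the catalog, schema, and table_name_v prefix are placeholders):

```python
# Look up the highest-versioned table name, then query it via an f-string.
latest = spark.sql("""
    SELECT table_name
    FROM my_catalog.information_schema.tables
    WHERE table_schema = 'my_schema'
      AND table_name LIKE 'table_name_v%'
    ORDER BY CAST(regexp_extract(table_name, '_v([0-9]+)$', 1) AS INT) DESC
    LIMIT 1
""").first()["table_name"]

df = spark.sql(f"SELECT * FROM my_catalog.my_schema.{latest}")
df.createOrReplaceTempView("table_act")  # stable alias for downstream SQL
```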
rt-slowth
by Contributor
  • 3381 Views
  • 5 replies
  • 2 kudos

AutoLoader File notification mode Configuration with AWS

from pyspark.sql import functions as F; from pyspark.sql import types as T; from pyspark.sql import DataFrame, Column; from pyspark.sql.types import Row; import dlt; S3_PATH = 's3://datalake-lab/XXXXX/'; S3_SCHEMA = 's3://datalake-lab/XXXXX/schemas/' ...

Latest Reply
djhs
New Contributor III
  • 2 kudos

Was this resolved? I'm running into the same issue.

4 More Replies
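For comparison with the truncated snippet above, a minimal DLT table reading with Auto Loader in file notification mode could look like this. A sketch only: the redacted s3 paths are kept as placeholders, and the table name and JSON format are assumptions:

```python
import dlt

S3_PATH = "s3://datalake-lab/XXXXX/"  # redacted path from the original post

@dlt.table(name="raw_events")  # hypothetical table name
def raw_events():
    # File notification mode avoids repeated directory listing on large buckets.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")            # assumed source format
        .option("cloudFiles.useNotifications", "true")
        .load(S3_PATH)
    )
```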
185369
by New Contributor II
  • 3297 Views
  • 4 replies
  • 1 kudos

Resolved! DLT with UC Access Denied sqs

I am going to use the newly released DLT with UC. But it keeps getting access denied. As I keep tracking the reasons, it seems that an account ID other than my account ID or the Databricks account ID is being requested. I cannot use '*' in the principal attri...

Latest Reply
Priyag1
Honored Contributor II
  • 1 kudos

On AWS, an SQS queue and all the other services in your stack using that queue will be configured with minimal permissions, which can lead to access issues. So, make sure you get your IAM policies set up correctly before deploying to producti...

3 More Replies
QuantumFries
by New Contributor II
  • 5229 Views
  • 4 replies
  • 3 kudos

Change {{job.start_time.[iso_date]}} Timezone

I am trying to schedule some jobs using workflows and leveraging dynamic variables. One caveat is that when I try to use {{job.start_time.[iso_date]}}, it seems to default to UTC; is there a way to change it?

Latest Reply
artsheiko
Databricks Employee
  • 3 kudos

Hi, all the dynamic values are in UTC (documentation). Maybe you can use code like the one presented below and pass the variables between tasks (see Share information between tasks in a Databricks job)? %python from datetime import datetime, timed...

3 More Replies
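In that spirit, a small sketch of the conversion inside a task (the timezone and the example timestamp are assumptions; zoneinfo ships with Python 3.9+):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The job start time arrives in UTC; convert it before deriving the date.
utc_start = datetime.fromisoformat("2024-05-01T02:30:00+00:00")  # placeholder value
local_start = utc_start.astimezone(ZoneInfo("America/New_York"))

print(local_start.date().isoformat())  # -> 2024-04-30: the date shifts with the offset
```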
