Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ElaPG1
by New Contributor
  • 342 Views
  • 1 reply
  • 0 kudos

all-purpose compute for Oracle queries

Hi, I am looking for any guidelines or best practices regarding compute configuration for extracting data from an Oracle DB and saving it as Parquet files. Right now I have a DBR workflow with a for-each task, concurrency = 31 (as I need to copy the data fro...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @ElaPG1, While the cluster sounds like a pretty good one with autoscaling, it depends on the workload too. The Standard_D8s_v5 instances you are using have 32 GB of memory and 8 cores. While these are generally good, you might want to experiment with...
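
A minimal sketch of what one such extraction task might look like in PySpark, partitioning the JDBC read so all eight cores are used; the connection URL, table, secret scope, bounds, and output path are placeholder assumptions, not recommendations:

```python
# Hypothetical parallel JDBC extract from Oracle to Parquet; every name and
# bound below is a placeholder to adapt, not a tuned recommendation.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/SERVICE")   # placeholder
    .option("dbtable", "SCHEMA.SOURCE_TABLE")                        # placeholder
    .option("user", dbutils.secrets.get("scope", "oracle-user"))     # placeholder scope/key
    .option("password", dbutils.secrets.get("scope", "oracle-pass"))
    .option("fetchsize", "10000")            # larger fetches cut round trips
    .option("partitionColumn", "ID")         # numeric column to split reads on
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "8")            # roughly one partition per core
    .load()
)
df.write.mode("overwrite").parquet("/Volumes/main/raw/extracts/source_table")  # placeholder
```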

Garrus990
by New Contributor II
  • 223 Views
  • 1 reply
  • 0 kudos

Passing UNIX-based parameter to a task

Hey, I would like to pass a parameter to a task that is based on a UNIX function. Concretely, I would like to specify dates, dynamically calculated with respect to the date my job runs. I wanted to do it like this: ["--period-start", "$(date -d '-7...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Garrus990, To pass a parameter to a task that is based on a UNIX function, you can use the Databricks Jobs API to dynamically calculate dates with respect to the date of running your job. Use a notebook to calculate dates: create a notebook tha...
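
A hedged sketch of that notebook approach: compute the window in Python rather than relying on shell expansion in the task parameters, then hand the values to downstream tasks via task values (the key names are invented for illustration):

```python
# Runs in a Databricks notebook task; downstream tasks read the values with
# dbutils.jobs.taskValues.get(taskKey="<this task>", key="period_start", ...).
from datetime import date, timedelta

period_start = (date.today() - timedelta(days=7)).isoformat()  # stands in for $(date -d '-7 days')
period_end = date.today().isoformat()

dbutils.jobs.taskValues.set(key="period_start", value=period_start)
dbutils.jobs.taskValues.set(key="period_end", value=period_end)
```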

mlopsuser
by New Contributor
  • 383 Views
  • 1 reply
  • 0 kudos

Databricks Asset Bundles and MLOps Structure for different model training -1 model per DAB or 1 DAB

I have two different datasets that will be used to train two separate regression models. Each dataset has its own preprocessing steps, and the models will have independent training pipelines. What is the best-practice approach for organizing Databricks Asset ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @mlopsuser, For organizing Databricks Asset Bundles (DABs) in your scenario with two separate regression models and datasets, it is generally recommended to create one DAB per model and dataset. This approach aligns with best practices for modula...
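
As a purely illustrative layout (all directory and file names invented), one bundle per model keeps each pipeline independently deployable:

```
repo/
├── model_a/
│   ├── databricks.yml   # bundle definition for model A
│   ├── resources/       # job and pipeline definitions for model A
│   └── src/             # preprocessing + training code
└── model_b/
    ├── databricks.yml   # bundle definition for model B
    ├── resources/
    └── src/
```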

olivier-soucy
by Contributor
  • 844 Views
  • 4 replies
  • 1 kudos

Resolved! Spark Streaming foreachBatch with Databricks connect

I'm trying to use the foreachBatch method of a Spark Streaming DataFrame with databricks-connect. Given that Spark Connect support was added to `foreachBatch` in 3.5.0, I was expecting this to work. Configuration: - DBR 15.4 (Spark 3.5.0) - databrick...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@olivier-soucy Are you sure that you're using DBR 15.4 and databricks-connect 15.4.2? I've seen this issue when using databricks-connect 15.4.x with DBR 14.3 LTS. Anyway, I've just tested that with the same versions you've provided and it works on my en...
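
For reference, a minimal version of the pattern being tested might look like this, assuming databricks-connect 15.4.x pointed at a DBR 15.4 cluster; the source and sink table names are placeholders:

```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

def process_batch(batch_df, batch_id):
    # Spark Connect serializes this function and runs it on the cluster.
    batch_df.write.mode("append").saveAsTable("catalog.schema.sink")  # placeholder

(
    spark.readStream.table("catalog.schema.source")  # placeholder
    .writeStream
    .foreachBatch(process_batch)
    .trigger(availableNow=True)
    .start()
    .awaitTermination()
)
```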

3 More Replies
SharathE
by New Contributor III
  • 973 Views
  • 2 replies
  • 0 kudos

Incremental refresh of materialized view in serverless DLT

Hello, every time I run a Delta Live Tables materialized view in serverless, I get a log of "COMPLETE RECOMPUTE". How can I achieve incremental refresh in serverless DLT pipelines?

Latest Reply
drewipson
New Contributor III
  • 0 kudos

Make sure you are using the aggregates and SQL restrictions outlined in this article: https://docs.databricks.com/en/optimizations/incremental-refresh.html If a SQL function is non-deterministic (current_timestamp() is a common one) you will have a CO...
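
As a hedged illustration of that rule (source table and columns invented), a materialized view built only from deterministic expressions stays eligible for incremental refresh:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="daily_revenue")
def daily_revenue():
    return (
        spark.read.table("catalog.schema.orders")  # placeholder source
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))     # deterministic aggregate
        # no current_timestamp()/rand() here -- those force a full recompute
    )
```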

1 More Reply
deng_dev
by New Contributor III
  • 346 Views
  • 1 reply
  • 2 kudos

Autoloader File Notifications mode S3 Access Denied error

Hi everyone! We are reading JSON files from a cross-account S3 bucket using Auto Loader and decided to switch from directory listing mode to file notification mode. We have set up all the permissions mentioned here in our IAM role. But now the pipeline is fa...

Latest Reply
drewipson
New Contributor III
  • 2 kudos

You need to be sure you have an instance profile configured with PassRole permissions so that it can assume the cross-account role to access the bucket and the file notification resources. I found this technical blog helpful: https://community.databricks...
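
An illustrative reader configuration for file notification mode; the role ARN, bucket, and schema location are placeholders, and the cluster's instance profile must be allowed to assume that role:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    # Cross-account role Auto Loader assumes to set up notification resources:
    .option("cloudFiles.roleArn", "arn:aws:iam::123456789012:role/autoloader-x-acct")  # placeholder
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/ingest")             # placeholder
    .load("s3://cross-account-bucket/input/")                                          # placeholder
)
```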

Livingstone
by New Contributor II
  • 836 Views
  • 1 reply
  • 1 kudos

Install maven package to serverless cluster

My task is to export data from CSV/SQL into Excel format with minimal latency. To achieve this, I used a serverless cluster. Since PySpark does not support saving in XLSX format, it is necessary to install the Maven package spark-excel_2.12. However, ...

Latest Reply
Nurota
New Contributor II
  • 1 kudos

I have a similar issue: how can I install a Maven package in a notebook when running on a serverless cluster? I need to install com.crealytics:spark-excel_2.12:3.4.2_0.20.3 in the notebook the way PyPI libraries are installed in the notebook, e.g. %p...

AntonPera
by New Contributor
  • 294 Views
  • 1 reply
  • 0 kudos

Lakehouse Monitoring - change profile type

I recently started to experiment with Lakehouse Monitoring. I created a monitor based on the Time Series profile type; however, I want to change from Time Series to Snapshot. I have deleted the two previously created tables, drift_metrics and profile_metric...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @AntonPera, If the dropdown to change the profile type is disabled, you might need to create a new monitor from scratch. Here's how you can do it: go to the Lakehouse Monitoring section in Databricks, create a new monitor, and select the Snapshot p...
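
The recreation can also be scripted; here is a sketch with the Databricks SDK for Python (table, schema, and assets directory are placeholders, and the exact API surface may differ between SDK versions):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import MonitorSnapshot

w = WorkspaceClient()
w.quality_monitors.create(
    table_name="catalog.schema.monitored_table",      # placeholder
    assets_dir="/Workspace/Users/me@example.com/lm",  # placeholder
    output_schema_name="catalog.monitor_output",      # placeholder
    snapshot=MonitorSnapshot(),                       # Snapshot profile instead of Time Series
)
```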

Pnascima
by New Contributor
  • 326 Views
  • 1 reply
  • 0 kudos

Help - For Each Workflows Performance Use Case

Hey guys, I've been running into a performance problem in my current workflow. Here's my use case: we have several notebooks, each one responsible for calculating a specific metric (such as AOV, GMV, etc.). I made a pipeline that creates a datafram...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Pnascima, When using the serverless cluster, what was the T-shirt sizing? Looking at your issue with the dedicated cluster, it sounds to me like a resource issue (assuming no data volume changes). You would have to find a comparable size of the interact...

ShakirHossain
by New Contributor
  • 842 Views
  • 1 reply
  • 0 kudos

curl: (35) error:0A000126:SSL routines::unexpected eof while reading

Hello, I am new to Databricks and have a new workspace created. I get this error message in my bash terminal even when I run the databricks --help command. What am I missing and how should I configure it? Please let me know if any further details are needed.

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Are you referring to the databricks-cli? If so, are you perhaps sitting behind a firewall or proxy? If you are, you may need to export the proxy settings in your terminal (export HTTP_PROXY=$proxy; export HTTPS_PROXY=$proxy, with their cor...

Garrus990
by New Contributor II
  • 479 Views
  • 1 reply
  • 1 kudos

How to run a python task that uses click for CLI operations

Hey, in my application I am using click to facilitate CLI operations. It works locally, in notebooks and when scripts are run locally, but it fails in Databricks. I defined a task that, as an entrypoint, accepts the file where the click-decorated functio...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

The SystemExit issue you’re seeing is typical with Click, as it’s designed for standalone CLI applications and automatically calls sys.exit() after running a command. This behavior can trigger SystemExit exceptions in non-CLI environments, like Datab...
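
A small sketch of the usual workaround (command and arguments invented for illustration): invoking the command with Click's standalone_mode=False makes it return normally instead of calling sys.exit():

```python
import click

@click.command()
@click.option("--period-start")
def cli(period_start):
    # Placeholder body standing in for the real task logic.
    print(f"running with period_start={period_start}")

if __name__ == "__main__":
    # standalone_mode=False stops Click from raising SystemExit on completion.
    cli(args=["--period-start", "2024-01-01"], standalone_mode=False)
```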

Dp15
by Contributor
  • 309 Views
  • 1 reply
  • 0 kudos

Databricks JDBC Insert into Array field

Hi, I am trying to insert some data into a Databricks table which has Array<String> fields (field1 & field2). I am using JDBC for the connection, and my POJO class looks like this: public class A { private Long id; private String[] field1; priv...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

The error you're encountering, [Databricks][JDBC](11500) Given type does not match given object: [Ljava.lang.String;@3e1346b0, indicates that the JDBC driver is not recognizing the Java String[] array as a valid SQL array type. This is a common issue...

Vivek_Singh
by New Contributor III
  • 244 Views
  • 1 reply
  • 0 kudos

Getting error :USER_DEFINED_FUNCTIONS.CORRELATED_REFERENCES_IN_SQL_UDF_CALLS_IN_DML_COMMANDS_NOT_IMP

Hello folks, I need help. I implemented row-level security in Unity Catalog and it is working as expected; however, while deleting records I get the error enclosed in detail [USER_DEFINED_FUNCTIONS.CORRELATED_REFERENCES_IN_SQL_UDF_CALLS_IN_DML_COMMANDS_NOT_IM...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Correlated subqueries within SQL user-defined functions (UDFs) used for row-level security are currently not supported for DELETE operations in Unity Catalog. You will need to adjust your row_filter_countryid_source_table UDF to avoid correlated ...

SankaraiahNaray
by New Contributor II
  • 1742 Views
  • 1 reply
  • 1 kudos

default auth: cannot configure default credentials

I'm trying to use dbutils from WorkspaceClient, and I tried to run this code from a Databricks notebook, but I get this error. Error: ValueError: default auth: cannot configure default credentials. Code: from databricks.sdk import WorkspaceClient; w = Workspac...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

To resolve the ValueError: default auth: cannot configure default credentials error when using dbutils from WorkspaceClient in a Databricks notebook, follow these steps: Ensure SDK Installation: Make sure the Databricks SDK for Python is installed. ...
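
If explicit configuration turns out to be necessary, a minimal sketch looks like this; the host and secret scope/key are placeholders, and a personal access token is only one of several supported auth methods:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<workspace-url>",                   # placeholder
    token=dbutils.secrets.get("scope", "pat-token"),  # placeholder scope/key
)
print(w.current_user.me().user_name)  # quick sanity check that auth works
```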

SakuraDev1
by New Contributor II
  • 338 Views
  • 1 reply
  • 0 kudos

autoloader cache and buffer utilization error

Hey guys, I'm encountering an issue with a project that uses Auto Loader for data ingestion. The production cluster is shutting down due to the error: "The Driver restarted - possibly due to an OutOfMemoryError - and this stream has been stopped." I've i...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

The error message is sometimes generic ("possibly due to an OutOfMemoryError"). There is indeed memory pressure, but try to correlate those graph metrics with the driver's STDOUT file content and check whether the GC/full GCs are able to work properly and rec...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group