Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ganesh_raskar
by Databricks Partner
  • 578 Views
  • 5 replies
  • 0 kudos

Installing Custom Packages on Serverless Compute via Databricks Connect

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute. Works: traditional compute cluster, custom package pre-i...

Data Engineering
data-engineering
databricks-connect
Latest Reply
Sanjeeb2024
Valued Contributor
  • 0 kudos

Hi @ganesh_raskar - If you can share which custom package you're using, along with the exact code and the error, I can try to replicate it on my end and explore suitable options.

4 More Replies
Anonymous
by Not applicable
  • 22520 Views
  • 9 replies
  • 17 kudos

Resolved! MetadataChangedException

A Delta Lake table is created with an identity column, and I'm not able to load the data in parallel from four processes; I'm getting the metadata exception error. I don't want to load the data into a temp table. I need to load directly and in parallel into the Delta...

Latest Reply
lprevost
Contributor III
  • 17 kudos

I'm also having the same problem. I'm using Auto Loader to load many files into a Delta table with an identity column. What used to work now dies with this error, after running for a long time!
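Identity columns make every insert a metadata transaction (the identity high-water mark lives in the table metadata), so several concurrent writers will keep tripping over each other. Besides serializing the loads, a common mitigation is to retry conflicted commits with backoff. A minimal pure-Python sketch; the `write_batch` callable and the string-based exception matching are illustrative, and in practice you would catch the specific Delta concurrency exception types:

```python
import random
import time

def write_with_retry(write_batch, max_attempts=5, base_delay=1.0):
    """Retry a Delta write that can fail with MetadataChangedException
    when several processes commit to the same table at once."""
    for attempt in range(max_attempts):
        try:
            return write_batch()  # zero-argument callable performing the write
        except Exception as exc:
            # illustrative check; prefer catching the Delta exception classes directly
            if "MetadataChangedException" not in (type(exc).__name__ + str(exc)):
                raise
            if attempt == max_attempts - 1:
                raise
            # exponential backoff with jitter so concurrent writers spread out
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Each of the four loading processes would wrap its own write in this helper, so conflicting commits are re-attempted instead of failing the whole load.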

8 More Replies
siva_pusarla
by Databricks Partner
  • 807 Views
  • 6 replies
  • 0 kudos

workspace notebook path not recognized by dbutils.notebook.run() when running from a workflow/job

result = dbutils.notebook.run("/Workspace/YourFolder/NotebookA", timeout_seconds=600, arguments={"param1": "value1"}); print(result)
I was able to execute the above code manually from a notebook. But when I run the same notebook as a job, it fails stat...

Latest Reply
siva-anantha
Databricks Partner
  • 0 kudos

@siva_pusarla: We use the following pattern and it works:
1) Calling notebook - constant location used by the Job:
   + src/framework
     + notebook_executor.py
2) Callee notebooks - dynamic:
   + src/app/notebooks
...
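One way to make the callee location robust is to always hand dbutils.notebook.run() an absolute workspace path built from a fixed root, since a job run's working directory differs from an interactive run's. A small sketch; the root path and folder names below are illustrative:

```python
import posixpath

# illustrative fixed root; adjust to your repo / bundle deployment
REPO_ROOT = "/Workspace/Repos/my_user/my_repo"

def notebook_path(relative):
    """Build the absolute path that dbutils.notebook.run() expects."""
    return posixpath.join(REPO_ROOT, relative)

# e.g. dbutils.notebook.run(notebook_path("src/app/notebooks/NotebookA"), 600)
```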

5 More Replies
Gaurav_784295
by New Contributor III
  • 3899 Views
  • 4 replies
  • 1 kudos

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets
Getting this error while writing. Can anyone please tell me how we can resolve it?

Latest Reply
siva-anantha
Databricks Partner
  • 1 kudos

I share the same perspective as @preetmdata on this

3 More Replies
dj4
by New Contributor II
  • 918 Views
  • 4 replies
  • 2 kudos

Azure Databricks UI consuming way too much memory & laggy

This especially happens when the notebook is large, with many cells. Even if I clear all the outputs, scrolling the notebook is way too laggy. When I start running the code, the memory consumption is 3-4 GB minimum, even if I am not displaying any data/ta...

Latest Reply
siva-anantha
Databricks Partner
  • 2 kudos

@dj4: Are you in a corporate proxy environment? The Databricks browser UI uses WebSockets, and performance issues sometimes happen due to security checks on that traffic.

3 More Replies
jeremy98
by Honored Contributor
  • 3796 Views
  • 12 replies
  • 1 kudos

Restarting an always-running cluster doesn't free the memory?

Hello community, I was working on optimising the driver memory, since there is code that is not optimised for Spark, and I was planning to temporarily restart the cluster to free up the memory. That could be a potential solution, since if the cluster i...

Latest Reply
siva-anantha
Databricks Partner
  • 1 kudos

@jeremy98: Please review the cluster's event logs to understand the trend of the GC-related issues (example in the snapshot below). Typically, production jobs are executed using job clusters, and they stop as soon as the work is completed. Could you pleas...

11 More Replies
amekojc
by New Contributor II
  • 319 Views
  • 1 replies
  • 1 kudos

How to not make tab headers show when embedding dashboard

When embedding the AI/BI dashboard, is there a way to hide the tabs and instead use our own UI tabs for navigation? Currently, there are two tab headers: one in the Databricks dashboard and another tab section in our embedding webp...

Latest Reply
mukul1409
Contributor II
  • 1 kudos

Hi @amekojc At the moment, Databricks AI BI Dashboards do not support hiding or disabling the native dashboard tabs when embedding. The embedded dashboard always renders with its own tab headers, and there is no configuration or API to control tab vi...

libpekin
by New Contributor II
  • 445 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Free Edition - Accessing files in S3

Hello, I'm attempting to read/write files from S3 but got the error below. I am on the Free Edition (serverless by default). I'm using access_key and secret_key. Has anyone done this successfully? Thanks! Directly accessing the underlying Spark driver JVM us...

Latest Reply
libpekin
New Contributor II
  • 2 kudos

Thanks @Sanjeeb2024, I was able to confirm as well.

1 More Replies
espenol
by Databricks Partner
  • 28941 Views
  • 11 replies
  • 13 kudos

input_file_name() not supported in Unity Catalog

Hey, so our notebooks reading a bunch of JSON files from storage typically use input_file_name() when moving from raw to bronze, but after upgrading to Unity Catalog we get an error message: AnalysisException: [UC_COMMAND_NOT_SUPPORTED] input_file_n...

Latest Reply
ramanpreet
New Contributor II
  • 13 kudos

The reason input_file_name is not supported is that the function has been deprecated: it was available in older versions of the Databricks Runtime but is deprecated from Databricks Runtime 13.3 LTS onwards.

10 More Replies
mydefaultlogin
by New Contributor II
  • 1130 Views
  • 2 replies
  • 0 kudos

Inconsistent PYTHONPATH, Git folders vs DAB

Hello Databricks Community, I'm encountering an issue related to Python paths when working with notebooks in Databricks. I have the following structure in my project:
my_notebooks/
  my_notebook.py
my_package/
  __init__.py
  hello.py
databricks.yml...

Latest Reply
kenny_hero
New Contributor III
  • 0 kudos

I have a related question. I'm new to the Databricks platform and struggle with the same PYTHONPATH issue the original poster raised. I understand using sys.path.append(...) is one approach for notebooks. This is acceptable for an ad-hoc interactive session, but thi...
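One pattern that behaves the same in Git folders, bundle deployments and jobs is to walk up from the notebook's directory to the bundle root (marked by databricks.yml) and put that root on sys.path once. A sketch; the marker file and layout are assumptions based on the thread:

```python
import os
import sys

def add_repo_root(start_dir, marker="databricks.yml"):
    """Walk upward from start_dir until a directory containing `marker`
    is found, insert it at the front of sys.path, and return it."""
    path = os.path.abspath(start_dir)
    while True:
        if os.path.exists(os.path.join(path, marker)):
            if path not in sys.path:
                sys.path.insert(0, path)
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without a marker
            raise FileNotFoundError(f"no {marker} above {start_dir}")
        path = parent

# in a notebook: add_repo_root(os.getcwd()); then `import my_package` works
```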

1 More Replies
kALYAN5
by Databricks Partner
  • 499 Views
  • 4 replies
  • 3 kudos

Service Principal

Can two service principals have the same name, but unique IDs?

Latest Reply
emma_s
Databricks Employee
  • 3 kudos

Hi @kALYAN5, here is an explanation of why service principals can share a name while their IDs are unique. Names are for human readability: organizations use human-friendly names like "automation-batch-job" or "databricks-ci-cd" to make it easy for admins to re...

3 More Replies
Askenm
by New Contributor
  • 1584 Views
  • 6 replies
  • 4 kudos

Docker tab missing in create compute

I am running Databricks Premium and looking to create a compute running Conda. It seems that the best way to do this is to boot the compute from a Docker image. However, in ```create_compute > advanced``` I cannot see the Docker option, nor ca...

Data Engineering
conda
Docker
Latest Reply
mukul1409
Contributor II
  • 4 kudos

Hi @Askenm In Databricks Premium, the Docker option for custom images is not available on all compute types and is not controlled by user-level permissions. Custom Docker images are only supported on Databricks clusters that use the legacy VM-based c...

5 More Replies
CHorton
by New Contributor II
  • 597 Views
  • 3 replies
  • 2 kudos

Resolved! Calling a function with parameters via Spark ODBC driver

Hi All, I am having an issue with calling a Databricks SQL user-defined function with parameters from my client application using the Spark ODBC driver. I have been able to execute a straight SQL statement using parameters, e.g. SELECT * FROM Customer W...

Latest Reply
iyashk-DB
Databricks Employee
  • 2 kudos

Hi @CHorton The Databricks SQL engine does not support positional (?) parameters inside SQL UDF calls. When Spark SQL parses GetCustomerData(?), the parameter is unresolved at analysis time, so you get [UNBOUND_SQL_PARAMETER]. This is not an ODBC bu...
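Until parameter markers work inside UDF calls, one client-side workaround is to render the values as SQL literals yourself before sending the statement over ODBC. A minimal sketch; the function name GetCustomerData comes from the post, while the call shape and type handling are assumptions (extend for dates and decimals as needed):

```python
def sql_literal(value):
    """Render a Python value as a SQL literal; an illustrative workaround
    for ? markers that cannot be bound inside a UDF call."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # single quotes are doubled to escape them inside a SQL string literal
    return "'" + str(value).replace("'", "''") + "'"

stmt = f"SELECT * FROM GetCustomerData({sql_literal('ACME')}, {sql_literal(42)})"
```

Because the values are inlined rather than bound, escaping must be airtight; for untrusted input, prefer restructuring the query so the ? marker sits outside the UDF call.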

2 More Replies
Harun
by Honored Contributor
  • 13109 Views
  • 2 replies
  • 4 kudos

How to change the number of executor instances in Databricks

I know that Databricks runs one executor per worker node. Can I change the number of executors by adding the parameter (spark.executor.instances) in the cluster advanced options? And can I also pass this parameter when I schedule a task, so that particular task wi...

Latest Reply
RandiMacGyver
New Contributor II
  • 4 kudos

In Databricks, the executor model is largely managed by the platform itself. On Databricks clusters, each worker node typically runs a single Spark executor, and this behavior is intentional.

1 More Replies
liquibricks
by Databricks Partner
  • 557 Views
  • 3 replies
  • 3 kudos

Resolved! Spark version errors in "Build an ETL pipeline with Lakeflow Spark Declarative Pipelines"

I'm trying to define a job for a pipeline using the Asset Bundle Python SDK. I created the pipeline first (using the SDK) and I'm now trying to add the job. The DAB validates and deploys successfully, but when I run the job I get an error: UNAUTHORIZ...

Latest Reply
mukul1409
Contributor II
  • 3 kudos

This happens because the job is not actually linked to the deployed pipeline and the pipeline id is None at runtime. When using Asset Bundles, the pipeline id is only resolved after deployment, so referencing my_pipeline.id in code does not work. Ins...
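In bundle terms, the fix described here is to let the bundle resolve the id at deploy time via variable interpolation, instead of reading .id from the Python object before deployment. A YAML sketch; the resource keys and names are illustrative, and if you generate the bundle from Python the equivalent is to pass the interpolation string "${resources.pipelines.my_pipeline.id}" as the pipeline_id:

```yaml
resources:
  pipelines:
    my_pipeline:
      name: etl-pipeline
  jobs:
    my_job:
      name: run-etl
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            # resolved after the pipeline is deployed
            pipeline_id: ${resources.pipelines.my_pipeline.id}
```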

2 More Replies