Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rgomez
by New Contributor
  • 290 Views
  • 2 replies
  • 1 kudos

Install notebook dependency via Terraform for serverless notebook tasks

I am trying to install a wheel file as a dependency for a serverless notebook task via Terraform. According to https://docs.databricks.com/en/compute/serverless/dependencies.html, dependencies in serverless notebooks can be configured via the base e...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Currently, the databricks_job resource in Terraform does not support configuring the environment for notebook tasks directly. You can upload the YAML file and configure the environment as mentioned in https://docs.databricks.com/en/compute/serverless...

1 More Replies
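As a stopgap while the Terraform resource catches up, here is a minimal sketch of the notebook-scoped install that the linked dependency docs describe, assuming the wheel sits at a hypothetical Unity Catalog Volume path:

```python
# Minimal sketch, assuming a hypothetical wheel path in a Unity Catalog Volume.
# Cell 1: notebook-scoped install (the %pip magic must sit alone in its own cell).
%pip install /Volumes/my_catalog/my_schema/libs/my_package-0.1.0-py3-none-any.whl

# Cell 2 (run separately): restart the Python process so the new package is importable.
dbutils.library.restartPython()
```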
TjommeV-Vlaio
by New Contributor III
  • 363 Views
  • 10 replies
  • 0 kudos

Which process is eating up my driver memory?

Hi, we're running DBR 14.3 on a shared multi-node cluster. When checking the metrics of the driver, I see that memory utilization and memory swap utilization are increasing a lot and almost never decreasing. Even if no processes are running any...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

At the OS level you will not see notebooks; you will see the memory consumption of the Spark application (so this is all notebooks). For that there is the Spark UI. I'd look for collect() and broadcast() statements, Python code outside of Spark, tons of graphics...

9 More Replies
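To make the reply above concrete, a hedged PySpark sketch of the kind of driver-memory pattern worth hunting for; the table and column names are made up:

```python
# Hypothetical table used only for illustration.
df = spark.table("main.sales.transactions")

# Anti-pattern: pulls the entire dataset into the driver's memory and keeps it there.
# rows = df.collect()
# pdf = df.toPandas()

# Safer: keep the heavy lifting on the executors and only bring back a bounded result.
summary = df.groupBy("region").count()
preview = summary.limit(1000).toPandas()  # small, bounded transfer to the driver
```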
jeremy98
by Contributor
  • 162 Views
  • 3 replies
  • 0 kudos

Resolved! Problem with installing a Python wheel on an existing cluster

Hi community, I was running a workflow based on different tasks, using an existing cluster to execute those tasks, but I was getting a configuration error: run failed with error message Library installation failed for library d...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You can do it by following the steps in https://docs.databricks.com/en/compute/serverless/dependencies.html

2 More Replies
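For reference, a minimal sketch of installing a wheel on an existing (non-serverless) cluster with the Databricks SDK for Python; the cluster ID and Volume path are hypothetical placeholders, and the call should be checked against the current SDK docs:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library

# Auth is picked up from the environment or ~/.databrickscfg.
w = WorkspaceClient()

# Install a wheel from a (hypothetical) Volume path onto an existing cluster.
w.libraries.install(
    cluster_id="0123-456789-abcdefgh",
    libraries=[Library(whl="/Volumes/my_catalog/my_schema/libs/my_package-0.1.0-py3-none-any.whl")],
)
```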
Ajay-Pandey
by Esteemed Contributor III
  • 4848 Views
  • 5 replies
  • 5 kudos

Support for running multiple cells at a time in a Databricks notebook

Hi all, Databricks notebooks now support parallel runs of commands in a single notebook, which helps run ad hoc queries simultaneously without creating a separate notebook. Once you run...

Latest Reply
SunilUIIT
New Contributor II
  • 5 kudos

Hi Team, I am observing that the functionality is not working as expected in the Trial workspace of Databricks. Is there a setting that needs to be enabled to allow independent SQL cells in a Databricks notebook to run in parallel, while dependent cel...

4 More Replies
amarnathpal
by New Contributor II
  • 257 Views
  • 4 replies
  • 0 kudos

Resolved! Integrating PySpark DataFrame into SQL Dashboard for Enhanced Visualization

I have created a DataFrame in a notebook using PySpark and am considering creating a fully-featured dashboard in SQL. My question is whether I need to first store the DataFrame as a table in order to use it in the dashboard, or if it's possible to di...

Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Sorry, I vaguely remember we used to be able to create persistent views on a DataFrame. Currently, a Spark DataFrame doesn't let you create a persistent view; instead, you have to create a table to use it in the SQL warehouse. # Assuming there is an ex...

3 More Replies
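A minimal sketch of the approach described in the reply: persist the DataFrame as a Unity Catalog table so the SQL warehouse behind the dashboard can query it (catalog, schema, and table names are hypothetical):

```python
# Hypothetical aggregated DataFrame built in a notebook.
df = spark.createDataFrame(
    [("EMEA", 120), ("AMER", 340)],
    ["region", "orders"],
)

# Persist it as a Unity Catalog table that the dashboard's SQL warehouse can read.
df.write.mode("overwrite").saveAsTable("main.reporting.orders_by_region")

# The dashboard then queries it with plain SQL, e.g.:
# SELECT region, orders FROM main.reporting.orders_by_region;
```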
RiyazAli
by Valued Contributor II
  • 116 Views
  • 3 replies
  • 1 kudos

Requirement to remove/skip column(s) in downstream tables/views during PII data masking

Hi there, as a compliance measure I'm tasked with masking PII data from bronze to silver and in all tables and views downstream. I suggested that my clients use row filters and column masks as mentioned in the doc. However, when a user who...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

You are right; in this case we might need to open a feature request through https://docs.databricks.com/en/resources/ideas.html#ideas

2 More Replies
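For context, a hedged sketch of the column-mask approach mentioned in the question (it masks values for unauthorized users rather than removing the column, which is why a feature request may still be needed); the function, group, and table names are hypothetical:

```python
# Define a masking function that only reveals the value to members of a given group.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.security.mask_email(email STRING)
  RETURN CASE
    WHEN is_account_group_member('pii_readers') THEN email
    ELSE '***REDACTED***'
  END
""")

# Attach the mask to the PII column of a (hypothetical) silver table.
spark.sql("""
  ALTER TABLE main.silver.customers
  ALTER COLUMN email SET MASK main.security.mask_email
""")
```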
MrJava
by New Contributor III
  • 9824 Views
  • 16 replies
  • 12 kudos

How to know who started a job run?

Hi there! We have different jobs/workflows configured in our Databricks workspace running on AWS and would like to know who actually started a job run. Was it started by a user, or by a service principal using curl? Currently one can only see who is t...

Latest Reply
Ayush_Arora
New Contributor II
  • 12 kudos

The system table solution works only when the job is manually triggered each time. I have a job which is triggered using the job scheduler on databricks. So once someone resumes the trigger, the job goes into execution. After this, the audit tables d...

15 More Replies
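As a starting point for the system-table route discussed in this thread, a hedged sketch that lists jobs-related audit events and the identity attached to each; it assumes system tables are enabled, and the exact action names per trigger type should be verified against the audit-log docs:

```python
# List recent jobs-related audit events and who initiated them.
runs = spark.sql("""
  SELECT event_time,
         action_name,
         user_identity.email AS initiated_by,
         request_params
  FROM system.access.audit
  WHERE service_name = 'jobs'
    AND event_date >= current_date() - INTERVAL 7 DAYS
  ORDER BY event_time DESC
""")
display(runs)
```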
DataGeek_JT
by New Contributor II
  • 2760 Views
  • 1 replies
  • 0 kudos

[SQL_CONF_NOT_FOUND] The SQL config "/Volumes/xxx...." cannot be found. Please verify that the confi

I am getting the below error when trying to stream data from an Azure Storage path to a Delta Live Table ([PATH] is the path to my files, which I have redacted here): [SQL_CONF_NOT_FOUND] The SQL config "/Volumes/[PATH]" cannot be found. Please verify tha...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I believe you are not setting the config, e.g. spark.conf.set("/Volumes/[PATH]", "your_actual_path_here"), hence when you try to get the conf, it fails. In data_source_path = spark.conf.get("/Volumes/[PATH]"), the string "/Volumes/[PATH]" becomes the conf name, which you would not want ...

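A minimal sketch of the fix the reply describes: give the config a proper name and put the Volume path in the value, not in the name (all names and paths below are hypothetical):

```python
# The first argument is the config *name*; the Volume path belongs in the *value*.
spark.conf.set("mypipeline.data_source_path", "/Volumes/my_catalog/my_schema/landing")

data_source_path = spark.conf.get("mypipeline.data_source_path")
df = spark.read.format("json").load(data_source_path)
```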
meystingray
by New Contributor II
  • 3759 Views
  • 1 replies
  • 0 kudos

Azure Databricks: Cannot create volumes or tables

If I try to create a Volume, I get this error: Failed to access cloud storage: AbfsRestOperationException exceptionTraceId=fa207c57-db1a-406e-926f-4a7ff0e4afdd. When I try to create a table, I get this error: Error creating table [RequestId=4b8fedcf-24b3-...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

It seems like you are encountering issues with accessing cloud storage while trying to create a volume and a table in Databricks on Azure. The errors you are seeing, AbfsRestOperationException and INVALID_STATE.UC_CLOUD_STORAGE_ACCESS_FAILURE, indica...

ruoyuqian
by New Contributor II
  • 713 Views
  • 1 replies
  • 0 kudos

dbt writing parquet from Volumes to Catalog schema

I have run into a weird situation. I uploaded a few Parquet files (about 10) of my sales data into the Volume in my catalog and ran dbt against it; dbt succeeded and the table was created. However, when I upload a lot more Parquet files...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

When dealing with a large number of Parquet files (about 2500 in your case), the system might be running into resource limitations or timeouts. This can happen due to the sheer volume of data being processed at once. The failure might be due to insuf...

Cami
by Contributor III
  • 1803 Views
  • 2 replies
  • 0 kudos

View JSON result value in a view based on a volume

Hello guys! I have the following case: it has been decided that the JSON file will be read via the following definition (from a volume), which more or less looks like this: CREATE OR REPLACE VIEW [catalog_name].[schema_name].v_[object_name] AS SELECT r...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

You must be getting the error below: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.json.allowEmptyString.enabled is not available. That's because this config is not configurable in a warehouse, so the SQL editor won't be the best choice for this. ...

1 More Replies
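A hedged sketch of the workaround implied by the reply: read the JSON from the Volume on a notebook or job cluster, where the legacy config can be set, and persist a table that the warehouse can query instead of the view (paths and names are hypothetical):

```python
# Set the legacy config on a notebook/job cluster (not configurable in a SQL warehouse).
spark.conf.set("spark.sql.legacy.json.allowEmptyString.enabled", "true")

# Read the JSON from the Volume and persist it so the warehouse can query the result.
raw = spark.read.option("multiLine", "true").json("/Volumes/my_catalog/my_schema/raw/object.json")
raw.write.mode("overwrite").saveAsTable("my_catalog.my_schema.object_json")
```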
DylanStout
by New Contributor III
  • 3116 Views
  • 3 replies
  • 0 kudos

UC Volumes: writing xlsx file to volume

How do we write a DataFrame to a Volume in a catalog? We tried the following code with our pandas DataFrame: dbutils.fs.put('dbfs:/Volumes/xxxx/default/input_bestanden/x test.xlsx', pandasDf.to_excel('/Volumes/xxxx/default/input_bestanden/x test.xlsx')) T...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

I was able to upload using dbutils.fs.cp('/FileStore/excel-1.xlsx', 'dbfs:/Volumes/xxx/default/xxx/x_test.xlsx'). Maybe the space in the name is causing an issue for you.

2 More Replies
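For reference, a minimal sketch of a pattern that avoids the pitfall in the question: pandas' to_excel returns None, so passing its result to dbutils.fs.put writes nothing useful. Write the file first, then copy it into the Volume (paths are hypothetical, openpyxl must be available, and note the underscore instead of a space in the file name):

```python
import pandas as pd

pdf = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Write the Excel file to local driver storage first (to_excel returns None).
local_path = "/tmp/x_test.xlsx"
pdf.to_excel(local_path, index=False)

# Then copy it into the Unity Catalog Volume.
dbutils.fs.cp(f"file:{local_path}",
              "dbfs:/Volumes/xxxx/default/input_bestanden/x_test.xlsx")
```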
analytics_eng
by New Contributor
  • 149 Views
  • 1 replies
  • 0 kudos

Connection reset by peer logging when importing custom package

Hi! I'm trying to import a custom package I published to Azure Artifacts, but I keep seeing the INFO logging below, which I don't want to display. The package was installed correctly on the cluster, and it imports successfully, but the log still appe...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Based on the context provided, here are some potential causes and solutions for the "Connection reset by peer" error. Network issues: the error might be due to transient network issues. It is recommended to check network stability and ensure that...

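If the goal is simply to hide the INFO line rather than chase the network blip, a hedged sketch of raising the offending logger's level; it assumes the message comes from the py4j logger, so the logger name should be swapped for whatever prefix actually appears in the log output:

```python
import logging

# Silence INFO/WARN chatter from the assumed source of the message; adjust the
# logger names to match the prefix shown in the actual log line.
logging.getLogger("py4j").setLevel(logging.ERROR)
logging.getLogger("py4j.clientserver").setLevel(logging.ERROR)
```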
Akash_Wadhankar
by New Contributor III
  • 81 Views
  • 0 replies
  • 0 kudos

Databricks cluster selection

Compute is one of the largest portions of cost in Databricks ETL. There is no written rule for handling this; based on experience, I have put together some rules of thumb for choosing the right cluster. Please check below. https://medium.com/@infinitylearnings1201/a-compr...

Nastia
by New Contributor III
  • 1977 Views
  • 1 replies
  • 0 kudos

I am getting NoneType error when running a query from API on cluster

When I am running a query on Databricks itself from a notebook, it runs fine and gives me results. But the same query, when executed from FastAPI (Python, using the databricks library), gives me "TypeError: 'NoneType' object is not iterable". I can...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Nastia, can you please share the entire stack trace and the query that you are running? There is currently not much detail with which I can help you understand this, but it is entirely possible that a bug is causing this, because there shoul...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group