Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rgooch_cfa
by New Contributor
  • 96 Views
  • 4 replies
  • 1 kudos

Override ruff linter settings for notebook cells

How can I override the ruff linter settings for my notebooks? I have various projects/git folders in my workspace, and oftentimes they represent different teams and thus different sets of code formatting patterns. I would like to override the default...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

If your `pyproject.toml` file is not being picked up by Ruff in your Databricks notebooks, there are a few potential causes and solutions to address the issue. Common causes and solutions: 1. Ruff version compatibility: ensure you are using a recent...
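For reference, Ruff reads per-project settings from that project's `pyproject.toml`. A minimal sketch (the line length and rule selections below are placeholders, not the poster's actual settings):

```toml
# Placeholder per-project Ruff overrides for one team's git folder
[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["E", "F"]   # rule families this team enables
ignore = ["E501"]     # rules this team chooses to skip
```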

3 More Replies
srtiemann
by New Contributor
  • 93 Views
  • 5 replies
  • 0 kudos

Shouldn't the global statement_timeout parameter prevail over the session parameter?

How can I block the use of statement_timeout at the session level in Databricks? I want the global parameter to be enforced even if a SET statement_timeout has been executed in Databricks notebooks.
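For context, the session-level override the poster wants to block looks like this in Databricks SQL (the value is a placeholder):

```sql
-- Session-level timeout in seconds; this is what a SET in a notebook does,
-- overriding the warehouse-level default for that session only
SET STATEMENT_TIMEOUT = 3600;
```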

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To revalidate this, are you using Serverless SQL Warehouse or Serverless Compute, which are different? Can you share a screenshot of the compute being used to make sure we are aligned?

4 More Replies
lauraxyz
by New Contributor III
  • 156 Views
  • 4 replies
  • 0 kudos

Put file into volume within Databricks

Hi! From a Databricks job, I want to copy a workspace file into a volume. How can I do that? I tried `dbutils.fs.cp("/Workspace/path/to/the/file", "/Volumes/path/to/destination")` but got: Public DBFS root is disabled. Access is denied on path: /Workspac...
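As a generic sketch (an assumption, not from the thread): on runtimes where `/Workspace` and `/Volumes` are FUSE-mounted, a plain filesystem copy can work where `dbutils.fs.cp` fails. The paths in the usage comment are placeholders:

```python
import shutil
from pathlib import Path

def copy_into_volume(src: str, dst: str) -> str:
    """Copy a file using plain filesystem paths (FUSE mounts on Databricks)."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)  # ensure the target dir exists
    shutil.copy(src, dst)
    return dst

# e.g. copy_into_volume("/Workspace/path/to/the/file", "/Volumes/path/to/destination")
```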

Latest Reply
lauraxyz
New Contributor III
  • 0 kudos

Found the reason! It's the runtime: it doesn't work on Databricks Runtime 15.4 LTS, but started to work after changing to 16.0. Maybe this is something supported from the latest version?

3 More Replies
GS_S
by New Contributor
  • 160 Views
  • 7 replies
  • 0 kudos

Resolved! Error during merge operation: 'NoneType' object has no attribute 'collect'

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

15.4 does not directly require serverless, but for fine-grained access control it indeed requires it when running on Single User, as mentioned: this data filtering is performed behind the scenes using serverless compute. In terms of costs: customers are charged for ...
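One possible workaround for logging affected rows (an assumption, not confirmed in the thread): read the MERGE's `operationMetrics` from the Delta table history instead of collecting the merge result. The table name is a placeholder:

```sql
-- The most recent history entry's operationMetrics includes counters such as
-- numTargetRowsInserted and numTargetRowsUpdated
DESCRIBE HISTORY catalog.schema.target_table LIMIT 1;
```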

6 More Replies
htu
by New Contributor III
  • 5629 Views
  • 13 replies
  • 20 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. Local mode has been especially useful in testing, when running unit tests for Spark-related code without any remot...

Latest Reply
mslow
New Contributor
  • 20 kudos

I think if you're deliberately installing databricks-connect, then you need to handle the local Spark session creation yourself. My issue is that I'm using the databricks-dlt package, which installs databricks-connect as a dependency. In the latest package ver...

12 More Replies
manojpatil04
by New Contributor II
  • 93 Views
  • 5 replies
  • 0 kudos

External dependency on serverless job from Airflow is not working on s3 path and workspace

I am working on a use case where we have to run a Python script from a serverless job through Airflow. When we try to trigger the serverless job and pass the external dependency as a wheel from an S3 path or a workspace path, it does not work, but on a volume it ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As per the serverless compute limitations, I can see the following: task libraries are not supported for notebook tasks. Use notebook-scoped libraries instead. See Notebook-scoped Python libraries.

4 More Replies
stadelmannkevin
by New Contributor
  • 169 Views
  • 4 replies
  • 2 kudos

init_script breaks Notebooks

Hi everyone. We would like to use our private company Python repository for installing Python libraries with pip install. To achieve this, I created a simple script which sets pip's index-url configuration to our private repo. I set this script as a...
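For illustration, a cluster-scoped init script of that shape might look like the sketch below. The index URL is a placeholder, and real init scripts usually write `/etc/pip.conf`; this sketch defaults to `/tmp/pip.conf` so it can run without root:

```shell
#!/bin/bash
# Point pip at a private package index (URL is a placeholder)
PIP_CONF="${PIP_CONF:-/tmp/pip.conf}"
cat > "$PIP_CONF" <<'EOF'
[global]
index-url = https://pypi.example.com/simple
EOF
```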

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Did you also try cloning the cluster or using another cluster for the testing? The metastore-down error is normally a Hive Metastore issue and should not be a factor here, but you could check the log4j output under Driver logs for more details on the error.

3 More Replies
sensanjoy
by Contributor
  • 16919 Views
  • 6 replies
  • 1 kudos

Resolved! Performance issue with pyspark udf function calling rest api

Hi all, I am facing a performance issue with a PySpark UDF that posts data to a REST API (which uses a Cosmos DB backend to store the data). Please find the details below: # The Spark DataFrame (df) contains about 30-40k rows. # I am using pyt...
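A common way to speed up this pattern (a sketch under assumptions, not the thread's accepted fix) is to replace the row-at-a-time UDF with batched, concurrent POSTs per partition, e.g. via `mapPartitions`. Here `post_fn` stands in for the actual REST call:

```python
from concurrent.futures import ThreadPoolExecutor

def post_records(records, post_fn, max_workers=8):
    """Post records concurrently instead of blocking on one request per row."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order while overlapping network latency
        return list(pool.map(post_fn, records))

# Inside Spark this would typically run per partition, e.g.:
# df.rdd.mapPartitions(lambda rows: post_records(list(rows), call_api))
```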

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sanjoy Sen, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking "Select As Best" if it does. Your feedback w...

5 More Replies
jv_v
by Contributor
  • 235 Views
  • 5 replies
  • 0 kudos

Resolved! Issue with Installing Remorph Reconcile Tool and Compatibility Clarification

I am currently working on a table migration project from a source Hive Metastore workspace to a target Unity Catalog workspace. After migrating the tables, I intend to write table validation scripts using the Remorph Reconcile tool. However, I am enc...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You could try following the steps in https://github.com/databrickslabs/remorph?tab=readme-ov-file#environment-setup

4 More Replies
wi11iamr
by New Contributor
  • 180 Views
  • 5 replies
  • 0 kudos

PowerBI Connection: Possible to use ADOMDClient (or alternative)?

I wish to extract from PowerBI Datasets the metadata of all Measures, Relationships and Entities.In VSCode I have a python script that connects to the PowerBI API using the Pyadomd module connecting via the XMLA endpoint. After much trial and error I...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

I understand. Yes, it seems that this is currently not possible; the only option would be to export your dataset as a CSV file and import it into Databricks.

4 More Replies
shusharin_anton
by New Contributor
  • 88 Views
  • 1 replies
  • 1 kudos

Resolved! Sort after update on DWH

Running a query on a serverless DWH: `UPDATE catalog.schema.table SET col_tmp = CAST(col AS DECIMAL(30, 15))`. In query profiling, it has some sort and shuffle stages in the graph. The table is partitioned by the partition_date column. Some details in the sort node mention that so...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @shusharin_anton, The sort and shuffle stages in your query profile are likely triggered by the need to redistribute and order the data based on the partition_date column. This behavior can be attributed to the way Spark handles data partitioning ...

rai00
by New Contributor
  • 68 Views
  • 1 replies
  • 0 kudos

Mock user doesn't have the required privileges to access catalog `remorph` while running 'make test'

Utility: Remorph (Databricks). Issue: "User `me@example.com` doesn't have required privileges to access catalog `remorph`" while running the 'make test' command. I am encountering an issue while running tests for Databricks Labs Remorph using 'make test'...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @rai00, ensure that the mock user me@example.com has the necessary privileges at both the catalog and schema levels. The user needs specific privileges such as USE_SCHEMA and CREATE_VOLUME. Use the WorkspaceClient to check the effective privilege...
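The grants might look like the following sketch (the catalog and schema names are assumptions based on the error message, not confirmed for Remorph's setup):

```sql
GRANT USE CATALOG ON CATALOG remorph TO `me@example.com`;
GRANT USE SCHEMA, CREATE VOLUME ON SCHEMA remorph.reconcile TO `me@example.com`;
```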

JothyGanesan
by New Contributor II
  • 66 Views
  • 2 replies
  • 0 kudos

DLT Merge tables into Delta

We are trying to load a Delta table from streaming tables using DLT. This target table needs a MERGE of 3 source tables, but when we use the DLT command with merge, it says merge is not supported. Is this related to the DLT version? Please help u...

Latest Reply
JothyGanesan
New Contributor II
  • 0 kudos

@Alberto_Umana Thank you for the quick reply. But how are we to use the above? This looks like structured streaming with CDF mode. Currently, with our tables in Unity Catalog, finding the start version and end version takes a huge amount of time as the ta...

1 More Replies
SparkMaster
by New Contributor III
  • 8355 Views
  • 11 replies
  • 2 kudos

Why can't I delete experiments without deleting the notebook? Or better Organize experiments into folders?

My Databricks Experiments page is cluttered with a whole lot of experiments. Many of them are notebooks that show up there for some reason (even though they didn't have an MLflow run associated with them). I would like to delete the experiments, but it...

Latest Reply
mhiltner
Databricks Employee
  • 2 kudos

Hey @Debayan @SparkMaster, a bit late here, but I believe this is caused by a click on the right-side experiments icon. This may look like a meaningless click, but it actually triggers a run.

10 More Replies
jeremy98
by New Contributor III
  • 69 Views
  • 1 replies
  • 0 kudos

Resolved! Can we modify the constraint of a primary key in an existed table?

Hello Community, is it possible to modify the schema of an existing table that currently has an ID column without any constraints? I would like to update the schema to make the ID column a primary key with auto-increment, starting from the maximum ID al...

Latest Reply
PiotrMi
New Contributor II
  • 0 kudos

Hey @jeremy98, based on an old article it looks like it cannot be done: "There are a few caveats you should keep in mind when adopting this new feature. Identity columns cannot be added to existing tables; the tables will need to be recreated with the new ...
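As a sketch of the recreation approach (table and column names are placeholders): create a new table with an identity column that still accepts the old explicit IDs, backfill from the old table, then resync the counter so new rows continue past the old maximum:

```sql
CREATE TABLE catalog.schema.table_new (
  id BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY,
  payload STRING,
  CONSTRAINT table_new_pk PRIMARY KEY (id)
);

INSERT INTO catalog.schema.table_new (id, payload)
SELECT id, payload FROM catalog.schema.table_old;

-- Align the identity counter with the backfilled maximum id
ALTER TABLE catalog.schema.table_new ALTER COLUMN id SYNC IDENTITY;
```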

