Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

FranPérez
by New Contributor III
  • 18127 Views
  • 9 replies
  • 6 kudos

set PYTHONPATH when executing workflows

I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow.
{
  "task_key": "prepare",
  "spark_python_task": {
    "python_file": "file...

Latest Reply
kenmyers-8451
Contributor II
  • 6 kudos

Just checking in again: has a way to do this turned up in the last few years? As Fran mentioned, `sys.path.append("/Workspace/Repos/devops/mlhub-mlops-dev/src")` is not a great "fix" for the reasons already mentioned. I've found that you can do `pip ins...
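One pattern worth noting here (a sketch only, not necessarily what the truncated reply above goes on to describe) is an editable pip install of the repo, so imports resolve without sys.path hacks. This assumes the repo root carries a setup.py or pyproject.toml, which the thread doesn't confirm:

%pip install -e /Workspace/Repos/devops/mlhub-mlops-dev  # repo path from the thread; packaging files are assumed
# from mlhub_mlops import some_module  # hypothetical import once the install is in place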

8 More Replies
Ligaya
by New Contributor II
  • 60604 Views
  • 12 replies
  • 2 kudos

ValueError: not enough values to unpack (expected 2, got 1)

Code: Writer.jdbc_writer("Economy", economy, conf=CONF.MSSQL.to_dict(), modified_by=JOB_ID['Economy'])
The problem arises when I try to run the code in the specified Databricks notebook: an error of "ValueError: not enough values to unpack (expected 2, ...

Latest Reply
johantylor
New Contributor II
  • 2 kudos

You can also avoid this error by making the split logic more adaptable. Before separating the table name, first verify whether a schema is actually present. If it isn’t, you can set a default schema or simply handle it as one value. This approach is ...
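A minimal sketch of that adaptable split (the function name and default schema below are illustrative, assuming the ValueError comes from unpacking a table name that may or may not carry a schema):

def split_table_name(qualified_name, default_schema="dbo"):
    # "schema.table" unpacks to two values; a bare "table" would otherwise
    # raise "not enough values to unpack (expected 2, got 1)"
    parts = qualified_name.split(".", 1)
    if len(parts) == 2:
        schema, table = parts
    else:
        schema, table = default_schema, parts[0]
    return schema, table

split_table_name("dbo.Economy")  # ("dbo", "Economy")
split_table_name("Economy")      # ("dbo", "Economy") instead of a ValueError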

11 More Replies
Adig
by New Contributor III
  • 9589 Views
  • 6 replies
  • 17 kudos

Generate a group id for similar duplicate values of a dataframe column.

Input DataFrame:
KeyName | KeyCompare | Source
PapasMrtemis | PapasMrtemis | S1
PapasMrtemis | Pappas, Mrtemis | S1
Pappas, Mrtemis | PapasMrtemis | S2
Pappas, Mrtemis | Pappas, Mrtemis | S2
Mich...

Latest Reply
rafaelpoyiadzi
New Contributor II
  • 17 kudos

Hey. We’ve run into similar deduplication problems before. If the name differences are pretty minor (punctuation, spacing, small typos), fuzzy string matching can usually get you most of the way there. That kind of similarity-based clustering works f...
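For the punctuation/spacing cases, a minimal normalization-based sketch (column names follow the input above; true fuzzy scoring, e.g. with rapidfuzz, would be the next step up and is not shown here):

from pyspark.sql import functions as F, Window

# Lowercase and strip everything non-alphanumeric, then give each distinct
# normalized key the same group id (the global window funnels rows through
# a single partition, fine for modest data sizes)
norm = F.regexp_replace(F.lower(F.col("KeyCompare")), r"[^a-z0-9]", "")
df_grouped = (
    df.withColumn("norm_key", norm)
      .withColumn("group_id", F.dense_rank().over(Window.orderBy("norm_key")))
)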

5 More Replies
confused_dev
by New Contributor II
  • 45078 Views
  • 8 replies
  • 7 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't being defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
kenmyers-8451
Contributor II
  • 7 kudos

If this helps anyone, here is how we do this: we rely on databricks_test for injecting dbutils into the notebooks that we're testing (which is a 3rd-party package, mind you, that hasn't been updated in a while but still works). And in our notebooks we put...
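If a dependency on databricks_test is unwanted, plain unittest.mock also covers the common case, provided functions take dbutils as a parameter rather than reading it from the notebook's globals (the function under test here is hypothetical):

from unittest.mock import MagicMock

def read_env(dbutils):
    # hypothetical function under test; it only touches dbutils.widgets
    return dbutils.widgets.get("env")

def test_read_env():
    dbutils = MagicMock()
    dbutils.widgets.get.return_value = "dev"
    assert read_env(dbutils) == "dev"
    dbutils.widgets.get.assert_called_once_with("env")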

7 More Replies
Hubert-Dudek
by Databricks MVP
  • 28927 Views
  • 14 replies
  • 12 kudos

Resolved! dbutils or other magic way to get notebook name or cell title inside notebook cell

Not sure it exists, but maybe there is some trick to get these directly from Python code: NotebookName, CellTitle. I'm just working on some logger script shared between notebooks, and it could make my life a bit easier.

Latest Reply
rtullis
New Contributor II
  • 12 kudos

I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...
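For reference, the context-based lookup usually given in threads like this looks roughly like the sketch below — with the caveat the reply above hits: %run inlines notebook A into the caller, so the context path is the top-level notebook's (B's), not A's:

# read the current notebook path from the Databricks notebook context
path = (dbutils.notebook.entry_point.getDbutils()
        .notebook().getContext().notebookPath().get())
notebook_name = path.rsplit("/", 1)[-1]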

13 More Replies
Nandini
by New Contributor II
  • 18457 Views
  • 12 replies
  • 7 kudos

Pyspark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in pyspark:
def parallel_copy_execution(src_path: str, target_path: str):
    files_in_path = db...

Latest Reply
Etyr
Contributor II
  • 7 kudos

If you have a Spark session, you can use Spark's hidden file system:
# Get FileSystem from SparkSession
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
# Get Path class to convert string path to FS path
path = spark._...
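Continuing that idea, a driver-side copy through the Hadoop FileSystem API might look like the sketch below; the paths are placeholders, and FileUtil.copy's False flag means the source is kept:

fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
Path = spark._jvm.org.apache.hadoop.fs.Path
FileUtil = spark._jvm.org.apache.hadoop.fs.FileUtil

src, dst = "dbfs:/tmp/src/file.csv", "dbfs:/tmp/dst/file.csv"  # placeholder paths
FileUtil.copy(fs, Path(src), fs, Path(dst), False, spark._jsc.hadoopConfiguration())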

11 More Replies
Roy
by New Contributor II
  • 75363 Views
  • 6 replies
  • 0 kudos

Resolved! dbutils.notebook.exit() executing from except in try/except block even if there is no error.

I am using Python notebooks as part of a concurrently running workflow with Databricks Runtime 6.1. Within the notebooks I am using try/except blocks to return an error message to the main concurrent notebook if a section of code fails. However I h...

Latest Reply
tonyliken
New Contributor II
  • 0 kudos

Because dbutils.notebook.exit() is an 'Exception', it will always trigger the except Exception as e: part of the code. We can use this to our advantage to solve the problem by adding an 'if else' to the except block.
query = "SELECT 'a' as Colum...
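A sketch of the same fix taken one step further: keep dbutils.notebook.exit() out of the try block entirely, so the except branch only ever sees real failures (run_queries is a hypothetical stand-in for the notebook's work):

try:
    run_queries()            # hypothetical workload
    status = "success"
except Exception as e:
    status = f"failed: {e}"

# exit() unwinds the notebook by raising internally, so calling it after
# the try/except keeps it from being swallowed as an "error"
dbutils.notebook.exit(status)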

5 More Replies
Data_Analytics1
by Contributor III
  • 38003 Views
  • 10 replies
  • 10 kudos

Failure starting repl. How do I resolve this error? I got it in a running job.

Failure starting repl. Try detaching and re-attaching the notebook.
java.lang.Exception: Python repl did not start in 30 seconds.
at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1442)
at com.databricks.b...

Latest Reply
PabloCSD
Valued Contributor II
  • 10 kudos

I have had this problem many times. Today I made a copy of the cluster and it got "de-saturated"; this could help someone in the future.

9 More Replies
JonHMDavis
by New Contributor II
  • 7518 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is Graphframes for Python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached Python command on 7.3 LTS ML with no issue; however, now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
Databricks Partner
  • 2 kudos

Hi @MuthuLakshmi, per the documentation, graphframes comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes I get a module-not-found error: from graphframes i...
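If the module really is absent on your runtime, a notebook-scoped install is one thing to try — with the caveat that the PyPI package is only the Python wrapper, and the matching GraphFrames Spark JAR must also be on the cluster (ML runtimes normally ship it):

%pip install graphframes
from graphframes import GraphFrame  # should resolve once the wrapper is installed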

4 More Replies
maranBH
by New Contributor III
  • 30798 Views
  • 5 replies
  • 11 kudos

Resolved! How to import a function to another notebook using Repos without %run?

Hi all, I was reading the Repos documentation: https://docs.databricks.com/repos.html#migrate-from-run-commands. It explains that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...

Latest Reply
JakubSkibicki
Contributor
  • 11 kudos

Due to new functionality in Runtime 16.0 regarding autoload, I came across this thread. I performed a practical test and it works, though I had some problems at first. As in the solution, the key was that the definitions are placed in a file.py, not a notebook.
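A minimal sketch of that pattern, with illustrative names (inside Repos, the repo root is already on sys.path, so the import just works):

# helpers.py — a regular .py file in the repo, not a notebook:
#     def add_one(x):
#         return x + 1
#
# in the consuming notebook:
from helpers import add_one

add_one(41)  # 42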

4 More Replies
kidexp
by New Contributor II
  • 30511 Views
  • 7 replies
  • 2 kudos

Resolved! How to install python package on spark cluster

Hi, how can I install Python packages on a Spark cluster? Locally I can use pip install, but I want to use some external packages that are not installed on the Spark cluster. Thanks for any suggestions.

Latest Reply
Mikejerere
New Contributor II
  • 2 kudos

If --py-files doesn’t work, try this shorter method:
Create a Conda environment: install your packages.
conda create -n myenv python=3.x
conda activate myenv
pip install your-package
Package and submit: use conda-pack and spark-submit with --archives.
cond...
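On Databricks itself, a notebook-scoped install is usually the shortest route of all; a one-line sketch with a placeholder package name:

%pip install some-package  # placeholder name; the install is scoped to this notebook's Python session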

6 More Replies
tanjil
by New Contributor III
  • 21857 Views
  • 7 replies
  • 6 kudos

Resolved! Downloading sharepoint lists using python

Hello, I am trying to download lists from SharePoint into a pandas dataframe. However, I cannot get any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...

Latest Reply
tanjil
New Contributor III
  • 6 kudos

Hello, I got the code to work by using the office365 library instead.
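For anyone following up, a sketch of what that typically looks like with the Office365-REST-Python-Client package — the site URL, credentials, and list title are placeholders, and the exact call chain can vary by package version:

import pandas as pd
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

ctx = ClientContext("https://tenant.sharepoint.com/sites/mysite").with_credentials(
    UserCredential("user@tenant.com", "password"))  # placeholder credentials
items = ctx.web.lists.get_by_title("My List").items.get().execute_query()
df = pd.DataFrame([item.properties for item in items])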

6 More Replies
Constantine
by Contributor III
  • 19728 Views
  • 3 replies
  • 7 kudos

Resolved! collect_list by preserving order based on another variable - Spark SQL

I am using a Databricks SQL notebook to run these queries. I have a Python UDF like:
%python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    return sale_...

Latest Reply
villi77
New Contributor II
  • 7 kudos

I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...
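The trick most answers converge on is to collect structs of (sort key, value), sort the array, then project the value back out. A PySpark sketch of that idea with placeholder column names (the same builtins — array_sort, transform — are available in pure SQL too):

from pyspark.sql import functions as F

ordered = (
    df.groupBy("id")
      .agg(F.array_sort(F.collect_list(F.struct("sale_date", "sale_price"))).alias("pairs"))
      # array_sort orders by the struct's first field, so values come out date-ordered
      .withColumn("prices_in_order", F.expr("transform(pairs, x -> x.sale_price)"))
)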

2 More Replies
AsfandQ
by New Contributor III
  • 23437 Views
  • 7 replies
  • 6 kudos

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do
csv_dataframe.write.format("delta").save("/path/to/table.delta")
I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

Latest Reply
Personal1
New Contributor II
  • 6 kudos

I still get the error whichever method I try. The column names with spaces throw [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
df1.write.format("delta") \
    .mo...
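If enabling column mapping keeps failing, renaming the columns before the write sidesteps the error entirely; a minimal sketch (the underscore replacement is a choice, and the save path is a placeholder):

import re

# replace every character Delta rejects ( ,;{}()\n\t=) with an underscore
clean = df1.toDF(*[re.sub(r"[ ,;{}()\n\t=]", "_", c) for c in df1.columns])
clean.write.format("delta").mode("overwrite").save("/path/to/table")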

6 More Replies
del1000
by New Contributor III
  • 23009 Views
  • 8 replies
  • 3 kudos

Resolved! Is it possible to pass a job's parameters through to variables?

Scenario: I tried to run notebook_primary as a job with the same parameter map. This notebook is the orchestrator for notebooks_sec_1, notebooks_sec_2, notebooks_sec_3, and so on. I run them with the dbutils.notebook.run(path, timeout, arguments) function. So ho...

Latest Reply
nnalla
New Contributor II
  • 3 kudos

I am using getCurrentBindings(), but it returns an empty dictionary even though I passed parameters. I am running it in a scheduled workflow job.
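If getCurrentBindings() comes back empty in a scheduled job, reading the declared job parameters through widgets and forwarding them explicitly is the more dependable route; a sketch with placeholder parameter names:

# names must match the job's defined parameters — placeholders here
param_names = ["run_date", "env"]
params = {name: dbutils.widgets.get(name) for name in param_names}

# forward the same map to each child notebook
dbutils.notebook.run("notebooks_sec_1", 600, params)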

7 More Replies