Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nandini
by New Contributor II
  • 15280 Views
  • 12 replies
  • 7 kudos

Pyspark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in pyspark: def parallel_copy_execution(src_path: str, target_path: str): files_in_path = db...

Latest Reply
Etyr
Contributor
  • 7 kudos

If you have a Spark session, you can use Spark's hidden (JVM) file system API: # Get FileSystem from SparkSession fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration()) # Get Path class to convert string path to FS path path = spark._...

11 More Replies
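The reply above relies on the Hadoop FileSystem API exposed through the SparkSession's JVM gateway. A minimal sketch of that approach, with placeholder paths (note that it runs on the driver, so it is an alternative to, not a replacement for, executor-side parallelism):

```python
# Sketch only: use the JVM Hadoop FileSystem of an existing SparkSession.
# "dbfs:/mnt/source" and "dbfs:/mnt/target" are hypothetical placeholders.
hadoop = spark._jvm.org.apache.hadoop.fs
conf = spark._jsc.hadoopConfiguration()
fs = hadoop.FileSystem.get(conf)

src_dir = hadoop.Path("dbfs:/mnt/source")
dst_dir = hadoop.Path("dbfs:/mnt/target")

# List the files under the source directory and copy each one to the target.
for status in fs.listStatus(src_dir):
    src_file = status.getPath()
    dst_file = hadoop.Path(dst_dir, src_file.getName())
    # FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf)
    hadoop.FileUtil.copy(fs, src_file, fs, dst_file, False, conf)
```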
Ligaya
by New Contributor II
  • 54413 Views
  • 4 replies
  • 2 kudos

ValueError: not enough values to unpack (expected 2, got 1)

Code: Writer.jdbc_writer("Economy", economy, conf=CONF.MSSQL.to_dict(), modified_by=JOB_ID['Economy']) The problem arises when I try to run the code in the specified Databricks notebook. An error of "ValueError: not enough values to unpack (expected 2, ...

Latest Reply
Sheilaschaffer
New Contributor II
  • 2 kudos

The error you're encountering, *"ValueError: not enough values to unpack (expected 2, got 1)"*, typically occurs when the code attempts to split a string expecting two parts but only gets one. In your case, `table_name.split('.')` is expecting a sche...

3 More Replies
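For readers hitting the same ValueError, it usually comes from a two-way unpack over a split that returned a single element. A small illustrative sketch (the table name and default schema are made-up examples, not the thread's actual code):

```python
# "Economy" has no "schema.table" dot, so split('.') yields a single element.
table_name = "Economy"

# schema, table = table_name.split('.')   # raises: not enough values to unpack (expected 2, got 1)

# Defensive version: fall back to an assumed default schema when the prefix is missing.
if "." in table_name:
    schema, table = table_name.split(".", 1)
else:
    schema, table = "dbo", table_name
print(schema, table)
```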
FranPérez
by New Contributor III
  • 14468 Views
  • 8 replies
  • 4 kudos

set PYTHONPATH when executing workflows

I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow. { "task_key": "prepare", "spark_python_task": { "python_file": "file...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

7 More Replies
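One common workaround for the question in this thread (not necessarily the answer marked best) is to extend sys.path at the top of the file referenced by spark_python_task, since the job's Python process does not inherit PYTHONPATH from the notebook environment. The repo path and module names below are placeholders:

```python
# Sketch: make shared modules importable inside a spark_python_task.
import sys

REPO_ROOT = "/Workspace/Repos/my-user/my-repo"  # assumed location of the shared code
if REPO_ROOT not in sys.path:
    sys.path.append(REPO_ROOT)

from mylib.utils import prepare_data  # hypothetical module living under REPO_ROOT
```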
Roy
by New Contributor II
  • 68587 Views
  • 6 replies
  • 0 kudos

Resolved! dbutils.notebook.exit() executing from except in try/except block even if there is no error.

I am using Python notebooks as part of a concurrently running workflow with Databricks Runtime 6.1. Within the notebooks I am using try/except blocks to return an error message to the main concurrent notebook if a section of code fails. However I h...

Latest Reply
tonyliken
New Contributor II
  • 0 kudos

Because dbutils.notebook.exit() raises an 'Exception', it will always trigger the except Exception as e: part of the code. We can use this to our advantage to solve the problem by adding an 'if else' to the except block. query = "SELECT 'a' as Colum...

5 More Replies
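A minimal sketch of the idea in the last reply: because dbutils.notebook.exit() itself raises an exception, moving the exit call outside the try/except (or branching inside the except) keeps a normal exit from being reported as a failure. This is an illustrative restructuring, not the poster's exact code:

```python
# Sketch: avoid swallowing dbutils.notebook.exit() in a generic except block.
result = "SUCCESS"
try:
    df = spark.sql("SELECT 'a' AS Column1")   # the work that may fail
except Exception as e:
    result = f"FAILED: {e}"                   # record the error instead of exiting here

# Single exit point, reached on both the success and the failure path.
dbutils.notebook.exit(result)
```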
Data_Analytics1
by Contributor III
  • 35527 Views
  • 10 replies
  • 10 kudos

Failure starting repl. How to resolve this error? I got this error in a job which is running.

Failure starting repl. Try detaching and re-attaching the notebook.java.lang.Exception: Python repl did not start in 30 seconds. at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1442) at com.databricks.b...

Latest Reply
PabloCSD
Valued Contributor II
  • 10 kudos

I have had this problem many times. Today I made a copy of the cluster and it got "de-saturated"; it could help someone in the future.

9 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 23337 Views
  • 12 replies
  • 12 kudos

Resolved! dbutils or other magic way to get notebook name or cell title inside notebook cell

Not sure it exists, but maybe there is some trick to get these directly from Python code: the notebook name and the cell title. Just working on some logger script shared between notebooks, and it could make my life a bit easier.

Latest Reply
rtullis
New Contributor II
  • 12 kudos

I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...

11 More Replies
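For readers landing on this thread, the trick usually referenced is reading the notebook path from the notebook context. A hedged sketch (the exact API surface can vary across runtimes, and, as the reply notes, under %run the path reflects the top-level driver notebook, i.e. notebook B):

```python
# Sketch: derive the current notebook name from the notebook context.
context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
notebook_path = context.notebookPath().get()
notebook_name = notebook_path.split("/")[-1]
print(notebook_name)
```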
JonHMDavis
by New Contributor II
  • 6024 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is Graphframes for python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached python command on 7.3 LTS ML with no issue, however now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
New Contributor II
  • 2 kudos

Hi @MuthuLakshmi, as per the documentation, graphframes comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes, I get a "no module found" error. from graphframes i...

4 More Replies
maranBH
by New Contributor III
  • 29052 Views
  • 5 replies
  • 11 kudos

Resolved! How to import a function to another notebook using Repos without %run?

Hi all, I was reading the Repos documentation: https://docs.databricks.com/repos.html#migrate-from-run-commands It is explained that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...

Latest Reply
JakubSkibicki
Contributor
  • 11 kudos

Due to the new functionalities in Runtime 16.0 regarding autoload, I came across this thread. I performed a practical test and it works, although I had some problems at first. As in the solution, the key was that the definitions are placed in a file.py, not a notebook.

4 More Replies
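A minimal sketch of the pattern the reply describes: keep the definitions in a plain .py file in the repo and import it from a notebook in the same repo (the repo root is already on sys.path). File and function names here are made up:

```python
# In the repo, the definitions live in a regular file, e.g. helpers.py (not a notebook):
#
#     def add_greeting(name: str) -> str:
#         return f"Hello, {name}!"
#
# In a notebook inside the same repo, import it directly instead of using %run:
from helpers import add_greeting  # hypothetical module and function names

print(add_greeting("Databricks"))
```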
kidexp
by New Contributor II
  • 26983 Views
  • 7 replies
  • 2 kudos

Resolved! How to install python package on spark cluster

Hi, how can I install Python packages on a Spark cluster? Locally, I can use pip install. I want to use some external packages which are not installed on the Spark cluster. Thanks for any suggestions.

Latest Reply
Mikejerere
New Contributor II
  • 2 kudos

If --py-files doesn’t work, try this shorter method: Create a Conda environment and install your packages (conda create -n myenv python=3.x; conda activate myenv; pip install your-package). Package and submit: use conda-pack and spark-submit with --archives. cond...

6 More Replies
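On current runtimes, the simplest notebook-scoped route (besides cluster libraries or the conda-pack approach above) is the %pip magic. The package name below is only an example:

```python
# Cell 1 -- notebook-scoped install; applies to this notebook's Python session:
%pip install requests

# Cell 2 -- use the library as usual:
import requests  # replace with the package you actually need
print(requests.__version__)
```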
tanjil
by New Contributor III
  • 19014 Views
  • 7 replies
  • 6 kudos

Resolved! Downloading sharepoint lists using python

Hello, I am trying to download lists from SharePoint into a pandas dataframe. However, I cannot get any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...

Latest Reply
tanjil
New Contributor III
  • 6 kudos

Hello, I have gotten the code to work by using the office365 library instead.

6 More Replies
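The fix mentions the office365 package (Office365-REST-Python-Client). A hedged sketch of how such a read usually looks; the site URL, credentials, and list title are placeholders, and the exact API surface varies between library versions:

```python
# Sketch only: read a SharePoint list into pandas with Office365-REST-Python-Client.
import pandas as pd
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

site_url = "https://contoso.sharepoint.com/sites/mysite"          # placeholder
ctx = ClientContext(site_url).with_credentials(
    UserCredential("user@contoso.com", "password")                # placeholders
)

items = ctx.web.lists.get_by_title("My List").items.get().execute_query()
df = pd.DataFrame([item.properties for item in items])
print(df.head())
```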
confused_dev
by New Contributor II
  • 41950 Views
  • 7 replies
  • 5 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't being defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
pavlosskev
New Contributor III
  • 5 kudos

Fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py: # conftest.py import pytest from unittest.mock import MagicMock from pyspark.sql import SparkSession @pytest.fixture(scope="session") def dbuti...

6 More Replies
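A minimal sketch in the spirit of the last reply: expose dbutils as a pytest fixture built from MagicMock so functions that take dbutils as an argument can be tested outside Databricks. The fixture contents and return values are assumptions:

```python
# conftest.py -- session-scoped mock dbutils fixture (sketch).
import pytest
from unittest.mock import MagicMock

@pytest.fixture(scope="session")
def dbutils():
    mock = MagicMock()
    # Pre-program the calls your code makes, e.g. widget lookups or secrets.
    mock.widgets.get.return_value = "dev"
    mock.secrets.get.return_value = "fake-secret"
    return mock

# test_my_module.py -- the function under test receives dbutils explicitly.
def test_reads_environment(dbutils):
    assert dbutils.widgets.get("env") == "dev"
```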
Constantine
by Contributor III
  • 16184 Views
  • 3 replies
  • 7 kudos

Resolved! collect_list by preserving order based on another variable - Spark SQL

I am using a Databricks SQL notebook to run these queries. I have a Python UDF like %python from pyspark.sql.functions import udf from pyspark.sql.types import StringType, DoubleType, DateType def get_sell_price(sale_prices): return sale_...

Latest Reply
villi77
New Contributor II
  • 7 kudos

I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...

2 More Replies
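A hedged PySpark sketch of the usual pattern behind this question: collect structs that carry the ordering key, sort the array, then project out the values. Column names and data are illustrative:

```python
# Sketch: order-preserving collect_list by sorting an array of structs.
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [("A", 2, "tue"), ("A", 1, "mon"), ("A", 3, "wed")],
    ["grp", "day_order", "day_name"],
)

ordered = (
    df.groupBy("grp")
      .agg(F.array_sort(F.collect_list(F.struct("day_order", "day_name"))).alias("pairs"))
      .withColumn("days_in_order", F.expr("transform(pairs, x -> x.day_name)"))
      .drop("pairs")
)
ordered.show(truncate=False)   # days_in_order -> [mon, tue, wed]
```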
AsfandQ
by New Contributor III
  • 20050 Views
  • 7 replies
  • 6 kudos

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do csv_dataframe.write.format("delta").save("/path/to/table.delta") I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

Latest Reply
Personal1
New Contributor II
  • 6 kudos

I still get the error when I try any method. The column names with spaces are throwing the error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. df1.write.format("delta") \ .mo...

6 More Replies
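One hedged way to have column mapping applied to a brand-new table is to set Delta's default table properties on the Spark session before the write; the keys and protocol versions below follow the Delta documentation, but verify them against your runtime:

```python
# Sketch: default properties for newly created Delta tables, so column names
# containing spaces are accepted via column mapping.
spark.conf.set("spark.databricks.delta.properties.defaults.columnMapping.mode", "name")
spark.conf.set("spark.databricks.delta.properties.defaults.minReaderVersion", "2")
spark.conf.set("spark.databricks.delta.properties.defaults.minWriterVersion", "5")

(df1.write
    .format("delta")
    .mode("overwrite")
    .save("/path/to/table"))   # path is a placeholder
```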
del1000
by New Contributor III
  • 21471 Views
  • 8 replies
  • 3 kudos

Resolved! Is it possible to passthrough job's parameters to variable?

Scenario: I tried to run notebook_primary as a job with the same parameters map. This notebook is an orchestrator for notebooks_sec_1, notebooks_sec_2, notebooks_sec_3, and so on. I run them with the dbutils.notebook.run(path, timeout, arguments) function. So ho...

Latest Reply
nnalla
New Contributor II
  • 3 kudos

I am using getCurrentBindings(), but it returns an empty dictionary even though I passed parameters. I am running it in a scheduled workflow job.

7 More Replies
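For notebook tasks, job parameters normally arrive as widgets, so a hedged alternative to getCurrentBindings() is to read each parameter explicitly and forward it to the child notebooks. The parameter name is an example:

```python
# Sketch: read a job parameter in the orchestrator notebook and pass it along.
env = dbutils.widgets.get("env")   # "env" is an assumed parameter name

# Forward the same value to a child notebook (600-second timeout).
result = dbutils.notebook.run("notebooks_sec_1", 600, {"env": env})
```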
FG
by New Contributor II
  • 12550 Views
  • 5 replies
  • 1 kudos

Running unit tests from a different notebook (using Python unittest package) doesn't produce output (can't discover the test files)

I have a test file (test_transforms.py) which has a series of tests that run using Python's unittest package. I can successfully run the tests inside the file with the expected output. But when I try to run this test file from a different notebook (run...

Latest Reply
SpaceDC
New Contributor II
  • 1 kudos

Hello, I have exactly the same issue. In my case, using the ipytest library from Databricks clusters, this is the error that occurs when I try to run the tests: EEEEE [100%] ============================================== ERRORS =========================...

4 More Replies
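A hedged sketch of one way to run unittest-based tests from a separate notebook: load the test module explicitly and hand it to a TextTestRunner instead of relying on unittest.main() discovery. The module name follows the post; the sys.path line assumes test_transforms.py sits in a repo folder:

```python
# Sketch: run the tests defined in test_transforms.py from another notebook.
import sys
import unittest

sys.path.append("/Workspace/Repos/my-user/my-repo")  # assumed location of test_transforms.py
import test_transforms

suite = unittest.TestLoader().loadTestsFromModule(test_transforms)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(f"Ran {result.testsRun} tests: {len(result.failures)} failures, {len(result.errors)} errors")
```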