Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am using Python notebooks as part of a concurrently running workflow with Databricks Runtime 6.1.
Within the notebooks I am using try/except blocks to return an error message to the main concurrent notebook if a section of code fails. However I h...
because the dbutils.notebook.exit() is an 'Exception', it will always trigger the except Exception as e: part of the code. We can use this to our advantage to solve the problem by adding an 'if else' to the except block, as sketched below. query = "SELECT 'a' as Colum...
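A sketch of that pattern (the query string is a hypothetical completion of the truncated one above, and the string-based exit check is an assumption; inspect the actual exception type on your runtime):

# Child notebook: report failure to the orchestrator via dbutils.notebook.exit()
try:
    query = "SELECT 'a' AS ColumnA"  # hypothetical completion of the truncated query
    result = spark.sql(query)
    dbutils.notebook.exit("OK")
except Exception as e:
    # dbutils.notebook.exit() itself raises an internal exception, so guard
    # against treating the deliberate exit as a genuine failure
    if "exit" in str(e).lower():
        raise  # let the deliberate exit propagate
    dbutils.notebook.exit(f"FAILED: {e}")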
Failure starting repl. Try detaching and re-attaching the notebook.
java.lang.Exception: Python repl did not start in 30 seconds.
  at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1442)
  at com.databricks.b...
Not sure it exists, but maybe there is some trick to get these directly from Python code: NotebookName, CellTitle. Just working on some logger script shared between notebooks, and it could make my life a bit easier.
I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...
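For reference, the notebook path can be read from the notebook context; a minimal sketch (this goes through an internal, undocumented API, so it may change between runtimes):

# Returns e.g. /Users/someone@example.com/my_notebook
path = (dbutils.notebook.entry_point.getDbutils()
        .notebook().getContext().notebookPath().get())
print(path.split("/")[-1])  # just the notebook name

Because %run inlines notebook A's code into notebook B's session, the context reports B's path, which matches the behaviour described above.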
Is Graphframes for python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached python command on 7.3 LTS ML with no issue, however now I am getting "no module named graphframes" when trying to import the pa...
Hi @MuthuLakshmi, per the documentation, graphframes comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes, I get a module-not-found error. from graphframes i...
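A minimal smoke test, assuming the package and its JVM counterpart are present on the cluster (the vertex/edge data is made up):

from graphframes import GraphFrame

# Tiny graph: two vertices, one edge
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "friend")], ["src", "dst", "relationship"])
g = GraphFrame(v, e)
g.inDegrees.show()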
Hi all, I was reading the Repos documentation: https://docs.databricks.com/repos.html#migrate-from-run-commands It is explained that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...
Due to new functionalities in Runtime 16.0 regarding autoload, I came across this autoload and performed a practical test. It works, but I had some problems at first. As in the solution, the key was that definitions are placed in a file.py, not a notebook.
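A sketch of the import-based approach, assuming a repo containing a plain utils/helpers.py (all names here are hypothetical); for notebooks inside Repos, the repo root is typically already on sys.path:

# /Workspace/Repos/<user>/<repo>/utils/helpers.py
def clean_name(s: str) -> str:
    return s.strip().lower()

# In the notebook, instead of %run:
from utils.helpers import clean_name
print(clean_name("  Databricks  "))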
Hi,
How can I install Python packages on a Spark cluster? Locally, I can use pip install.
I want to use some external packages which are not installed on the Spark cluster.
Thanks for any suggestions.
If --py-files doesn't work, try this shorter method:
Create a Conda environment and install your packages:
conda create -n myenv python=3.x
conda activate myenv
pip install your-package
Package and submit: use conda-pack and spark-submit with --archives.
cond...
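On Databricks itself, the simpler route is usually a notebook-scoped install; a sketch with a hypothetical package name:

# Run in its own notebook cell; installs only for this notebook's session
%pip install some-package

# Afterwards, import as usual
import some_package

For cluster-wide installs, attach the package through the cluster's Libraries tab instead, so every notebook on the cluster sees it.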
Hello, I am trying to download lists from SharePoint into a pandas dataframe. However, I cannot get any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...
The error "<urlopen error [Errno -2] Name or service not known>" suggests that there's an issue with the server URL or network connectivity. Double-check the server URL to ensure it's correct and accessible. Also, verify that your network connection ...
I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...
Fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py (the fixture bodies are sketched to complete the truncated snippet; adjust to your setup):
# conftest.py
import pytest
from unittest.mock import MagicMock
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def dbutils():
    return MagicMock()  # stand-in for the real dbutils, which only exists on Databricks

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").getOrCreate()  # local session for tests
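A test can then take the fixture as an argument and stub out whatever calls it needs (names hypothetical):

# test_transforms.py
def test_reads_secret(dbutils):
    dbutils.secrets.get.return_value = "dummy-token"
    assert dbutils.secrets.get("scope", "key") == "dummy-token"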
I am using a Databricks SQL notebook to run these queries. I have a Python UDF like:
%python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    return sale_...
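To call a Python UDF from the SQL cells of the notebook, it has to be registered with the Spark session; a sketch with a made-up function body:

%python
from pyspark.sql.types import DoubleType

def get_sell_price(sale_prices):
    return float(min(sale_prices))  # hypothetical body: lowest listed price

spark.udf.register("get_sell_price", get_sell_price, DoubleType())

-- then, in a SQL cell:
-- SELECT get_sell_price(array(10.0, 12.5, 9.99))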
I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...
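One pure-SQL variant (run here through spark.sql; the table and column names are made up): map each day to an index, sort on it inside the collected array, then drop it again:

df = spark.createDataFrame([("Wed",), ("Mon",), ("Sun",), ("Fri",)], ["day"])
df.createOrReplaceTempView("t")
spark.sql("""
  SELECT concat_ws(',',
           transform(
             array_sort(collect_list(struct(
               CASE day WHEN 'Mon' THEN 1 WHEN 'Tue' THEN 2 WHEN 'Wed' THEN 3
                        WHEN 'Thu' THEN 4 WHEN 'Fri' THEN 5 WHEN 'Sat' THEN 6
                        ELSE 7 END AS idx,
               day))),
             s -> s.day)) AS week
  FROM (SELECT DISTINCT day FROM t)
""").show(truncate=False)   # -> Mon,Wed,Fri,Sun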
Hello, I am trying to write Delta files for some CSV data. When I do csv_dataframe.write.format("delta").save("/path/to/table.delta") I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...
I still get the error when I try any method. The column names with spaces are throwing the error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
df1.write.format("delta") \
    .mo...
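A common workaround is to sanitize the column names before writing (the regex and target path here are assumptions):

import re

# Replace every character Delta rejects with an underscore
safe = df1.toDF(*[re.sub(r"[ ,;{}()\n\t=]", "_", c) for c in df1.columns])
safe.write.format("delta").mode("overwrite").save("/path/to/table")

Newer runtimes also support Delta column mapping ('delta.columnMapping.mode' = 'name') to keep such names as-is, at the cost of raising the table's reader/writer protocol versions.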
Scenario: I tried to run notebook_primary as a job with the same parameter map. This notebook is the orchestrator for notebooks_sec_1, notebooks_sec_2, notebooks_sec_3, and so on. I run them with the dbutils.notebook.run(path, timeout, arguments) function. So ho...
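A sketch of that orchestration pattern (paths, timeout, and parameter values are made up):

# Orchestrator notebook: pass the same parameter map to each child
params = {"run_date": "2024-01-01", "env": "dev"}
for nb in ["notebooks_sec_1", "notebooks_sec_2", "notebooks_sec_3"]:
    result = dbutils.notebook.run(nb, 600, params)  # path, timeout (s), arguments
    print(nb, "->", result)

# Inside each child notebook, read a parameter via a widget
run_date = dbutils.widgets.get("run_date")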
I have a test file (test_transforms.py) which has a series of tests running using Python's unittest package. I can successfully run the tests inside of the file with expected output. But when I try to run this test file from a different notebook (run...
Hello, I have exactly the same issue. In my case, using the ipytest library on Databricks clusters, this is the error that occurs when I try to run the tests:
EEEEE [100%]
============================================== ERRORS =========================...
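When unittest runs inside a notebook, the usual trip-wires are the notebook's own argv and unittest calling sys.exit(); a sketch that avoids both (the test case is hypothetical):

import unittest

class TransformTests(unittest.TestCase):
    def test_identity(self):
        self.assertEqual(1 + 1, 2)

# argv: the first element is ignored by unittest; exit=False prevents the
# sys.exit() call that would otherwise kill the notebook's Python REPL
unittest.main(argv=["ignored"], exit=False)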
Code: Writer.jdbc_writer("Economy", economy, conf=CONF.MSSQL.to_dict(), modified_by=JOB_ID['Economy']) The problem arises when I try to run the code in the specified Databricks notebook: an error of "ValueError: not enough values to unpack (expected 2, ...
Hey Databricks Community, the error "ValueError: not enough values to unpack (expected 2, got 1)" typically occurs when Python is trying to unpack a certain number of values, but the data it is processing does not contain the expected number. This err...
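A minimal reproduction and a defensive guard (the split logic is illustrative, not the poster's actual code):

parts = "Economy".split(":")    # -> ["Economy"], only one element
# key, value = parts           # would raise: not enough values to unpack (expected 2, got 1)
if len(parts) == 2:
    key, value = parts
else:
    key, value = parts[0], None  # fall back instead of crashing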
Due to dependencies, if one of our cells errors then we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks depending on whether "Run all cells in this notebook" is selected from the header versus "Run All Below"....
Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails, it should not run the rest of the cells.
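One workaround (a sketch, not a built-in job setting): end the run explicitly from the first cell when its work fails, so the remaining cells never execute:

try:
    setup()  # hypothetical first-cell work
except Exception as e:
    # exit() ends the notebook run (reported as succeeded);
    # re-raise instead if the job itself should be marked as failed
    dbutils.notebook.exit(f"Aborting: first cell failed with {e}")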
When I save + commit + push my .ipynb file to my linked git repo, I noticed that only the cell inputs are saved, not the outputs. This differs from the .ipynb file I get when I choose "File / Export / iPython Notebook". Is there a way to save the cell o...