Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_Analytics1
by Contributor III
  • 30539 Views
  • 10 replies
  • 10 kudos

Failure starting repl. How to resolve this error? I got this error in a running job.

Failure starting repl. Try detaching and re-attaching the notebook.
java.lang.Exception: Python repl did not start in 30 seconds.
  at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1442)
  at com.databricks.b...

Latest Reply
PabloCSD
Contributor III

I have had this problem many times. Today I made a copy of the cluster and it got "de-saturated"; this might help someone in the future.

Hubert-Dudek
by Esteemed Contributor III
  • 17161 Views
  • 12 replies
  • 12 kudos

Resolved! dbutils or other magic way to get notebook name or cell title inside notebook cell

Not sure it exists, but maybe there is some trick to get these directly from Python code: the notebook name and the cell title. I'm just working on a logger script shared between notebooks, and it could make my life a bit easier.

Latest Reply
rtullis
New Contributor II

I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...
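For readers finding this thread: a commonly used way to read the notebook path from Python is the driver context; a minimal sketch, assuming it runs inside a Databricks notebook where dbutils is predefined:

```python
# Minimal sketch, assuming this runs inside a Databricks notebook
# where `dbutils` is predefined. The context exposes the full
# notebook path; the last path component is the notebook name.
notebook_path = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .notebookPath()
    .get()
)
notebook_name = notebook_path.rsplit("/", 1)[-1]
print(notebook_name)
```

Note that the context describes the top-level notebook, which matches rtullis's observation: when notebook B %runs notebook A, the reported name is B's.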

JonHMDavis
by New Contributor II
  • 4707 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is GraphFrames for Python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached Python command on 7.3 LTS ML with no issue; however, now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
New Contributor II

Hi @MuthuLakshmi, as per the documentation, GraphFrames comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes I get a "no module found" error: from graphframes i...
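For reference, the import being attempted is presumably the standard one; a minimal sketch, assuming an ML runtime where graphframes should be preinstalled (the sample graph is hypothetical):

```python
# Sketch: verify the graphframes Python package is importable and usable.
from pyspark.sql import SparkSession
from graphframes import GraphFrame  # raises ModuleNotFoundError if absent

spark = SparkSession.builder.getOrCreate()
vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "knows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()
```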

maranBH
by New Contributor III
  • 25798 Views
  • 5 replies
  • 11 kudos

Resolved! How to import a function to another notebook using Repos without %run?

Hi all, I was reading the Repos documentation: https://docs.databricks.com/repos.html#migrate-from-run-commands. It explains that one advantage of Repos is that it is no longer necessary to use the %run magic command to make functions available in one notebook to ...

Latest Reply
JakubSkibicki
New Contributor III

Due to new functionality in Runtime 16.0 regarding autoload, I came across this thread and performed a practical test. It works, though I had some problems at first. As in the solution, the key was that the definitions are placed in a .py file, not a notebook.
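For context, the pattern the solution describes looks roughly like this; the file and function names are hypothetical:

```python
# helpers.py - a plain .py file (not a notebook) checked into the repo.
def add_one(x: int) -> int:
    return x + 1


# In a notebook in the same repo folder, the repo directory is on
# sys.path, so the module imports directly instead of via %run:
#   from helpers import add_one
#   add_one(41)  # -> 42
```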

kidexp
by New Contributor II
  • 22510 Views
  • 7 replies
  • 2 kudos

Resolved! How to install python package on spark cluster

Hi, how can I install Python packages on a Spark cluster? Locally I can use pip install, but I want to use some external packages that are not installed on the Spark cluster. Thanks for any suggestions.

Latest Reply
Mikejerere
New Contributor II

If --py-files doesn't work, try this shorter method:
Create a Conda environment and install your packages:
conda create -n myenv python=3.x
conda activate myenv
pip install your-package
Package and submit: use conda-pack and spark-submit with --archives.
cond...
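For completeness, on Databricks itself the lighter-weight route is usually a notebook-scoped install with the %pip magic; a minimal sketch (the package name is just an example):

```python
# In a Databricks notebook cell: a notebook-scoped install, visible
# to this notebook's session on both the driver and the workers.
%pip install requests

# Then, in a later cell:
# import requests
```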

tanjil
by New Contributor III
  • 13911 Views
  • 9 replies
  • 6 kudos

Resolved! Downloading sharepoint lists using python

Hello, I am trying to download lists from SharePoint into a pandas dataframe, but I cannot get any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...

Latest Reply
huntaccess
New Contributor II

The error "<urlopen error [Errno -2] Name or service not known>" suggests that there's an issue with the server URL or network connectivity. Double-check the server URL to ensure it's correct and accessible. Also, verify that your network connection ...

confused_dev
by New Contributor II
  • 31828 Views
  • 7 replies
  • 5 kudos

Python mocking dbutils in unittests

I am trying to write some unit tests using pytest, but I am coming across the problem of how to mock my dbutils method when dbutils isn't defined in my notebook. Is there a way to do this so that I can unit test individual functions that are uti...

Latest Reply
pavlosskev
New Contributor III

Fermin_vicente's answer is pretty good already. Below is how you can do something similar with conftest.py:
# conftest.py
import pytest
from unittest.mock import MagicMock
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def dbuti...
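The truncated fixture presumably continues along these lines; a minimal sketch, assuming the functions under test take dbutils as a parameter rather than relying on the notebook global:

```python
# conftest.py: a sketch of a session-scoped mocked dbutils fixture.
import pytest
from unittest.mock import MagicMock


@pytest.fixture(scope="session")
def dbutils():
    mock = MagicMock()
    # Stub whatever your code calls, e.g. widget lookups:
    mock.widgets.get.return_value = "dev"
    return mock


# In a test module: a hypothetical function under test and its test.
def current_env(dbutils):
    return dbutils.widgets.get("env")


def test_current_env(dbutils):
    assert current_env(dbutils) == "dev"
```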

Constantine
by Contributor III
  • 11134 Views
  • 3 replies
  • 7 kudos

Resolved! collect_list by preserving order based on another variable - Spark SQL

I am using a Databricks SQL notebook to run these queries. I have a Python UDF like:
%python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    return sale_...

Latest Reply
villi77
New Contributor II

I had a similar situation where I was trying to order the days of the week from Monday to Sunday. I saw solutions that use Python but wanted to do it all in SQL. My original attempt was to use: CONCAT_WS(',', COLLECT_LIST(DISTINCT t.LOAD_ORIG_...
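In PySpark, the common order-preserving trick is to collect (sort-key, value) structs and sort the array before extracting the values; a sketch with hypothetical column names:

```python
# Sketch: order-preserving collect_list via sort_array over structs.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("p1", 2, "Tue"), ("p1", 1, "Mon"), ("p1", 3, "Wed")],
    ["product", "day_order", "day_name"],
)

ordered = (
    df.groupBy("product")
    # Structs sort by their first field, so day_order drives the order.
    .agg(F.sort_array(F.collect_list(F.struct("day_order", "day_name"))).alias("pairs"))
    # Field extraction on an array of structs yields an array of values.
    .select("product", F.col("pairs.day_name").alias("days_in_order"))
)
ordered.show(truncate=False)  # p1 -> [Mon, Tue, Wed]
```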

AsfandQ
by New Contributor III
  • 16106 Views
  • 7 replies
  • 6 kudos

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do
csv_dataframe.write.format("delta").save("/path/to/table.delta")
I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

Latest Reply
Personal1
New Contributor II

I still get the error when I try any method. The column names with spaces are throwing the error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
df1.write.format("delta") \
    .mo...
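For readers hitting the same [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] error, two sketches: renaming the offending columns before writing (safe on any runtime), and the column-mapping route named in the thread title. The option names follow the Delta Lake documentation; treat the exact protocol versions as an assumption:

```python
# Sketch 1: sanitize column names before writing.
import re

def sanitize(df):
    # Replace the characters Delta rejects in column names.
    return df.toDF(*[re.sub(r"[ ,;{}()\n\t=]", "_", c) for c in df.columns])

sanitize(csv_dataframe).write.format("delta").save("/path/to/table")

# Sketch 2: enable column mapping by name instead (requires a recent
# Delta protocol; versions below are the commonly cited assumption).
(csv_dataframe.write.format("delta")
    .option("delta.columnMapping.mode", "name")
    .option("delta.minReaderVersion", "2")
    .option("delta.minWriterVersion", "5")
    .save("/path/to/table"))
```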

del1000
by New Contributor III
  • 18512 Views
  • 8 replies
  • 3 kudos

Resolved! Is it possible to passthrough job's parameters to variable?

Scenario: I tried to run notebook_primary as a job with the same parameter map. This notebook is an orchestrator for notebooks_sec_1, notebooks_sec_2, notebooks_sec_3, and so on. I run them with the dbutils.notebook.run(path, timeout, arguments) function. So ho...

Latest Reply
nnalla
New Contributor II

I am using getCurrentBindings(), but it returns an empty dictionary even though I passed parameters. I am running it in a scheduled workflow job.
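As an alternative to getCurrentBindings(), the orchestrator can forward its widget values explicitly; a sketch with hypothetical parameter names, assuming it runs in a Databricks notebook where dbutils exists:

```python
# Sketch: read the job parameters the orchestrator received as widgets
# and pass them through to a child notebook. Names are hypothetical.
param_names = ["env", "run_date"]
params = {name: dbutils.widgets.get(name) for name in param_names}

result = dbutils.notebook.run("notebooks_sec_1", 600, params)
print(result)
```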

FG
by New Contributor II
  • 9780 Views
  • 5 replies
  • 1 kudos

Running unit tests from a different notebook (using Python unittest package) doesn't produce output (can't discover the test files)

I have a test file (test_transforms.py) with a series of tests that run using Python's unittest package. I can successfully run the tests inside the file with the expected output. But when I try to run this test file from a different notebook (run...

Latest Reply
SpaceDC
New Contributor II

Hello, I have exactly the same issue. In my case, using the ipytest library on Databricks clusters, this is the error that occurs when I try to run the tests:
EEEEE [100%]
============================================== ERRORS =========================...
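One pattern that sidesteps the discovery problem is to drive unittest explicitly from the calling notebook; a sketch, with a hypothetical test directory:

```python
# Sketch: discover and run unittest tests from a notebook instead of
# relying on automatic discovery. The path is a hypothetical example.
import unittest

suite = unittest.defaultTestLoader.discover("/Workspace/Repos/me/project/tests")
runner = unittest.TextTestRunner(verbosity=2)
result = runner.run(suite)
print(f"failures={len(result.failures)} errors={len(result.errors)}")
```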

Ligaya
by New Contributor II
  • 37819 Views
  • 4 replies
  • 2 kudos

ValueError: not enough values to unpack (expected 2, got 1)

Code:
Writer.jdbc_writer("Economy", economy, conf=CONF.MSSQL.to_dict(), modified_by=JOB_ID['Economy'])
The problem arises when I try to run the code in the specified Databricks notebook. An error of "ValueError: not enough values to unpack (expected 2, ...

Latest Reply
veraelmore
New Contributor II

Hey Databricks Community, the error "ValueError: not enough values to unpack (expected 2, got 1)" typically occurs when Python is trying to unpack a certain number of values, but the data it is processing does not contain the expected number. This err...
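A minimal reproduction of the error itself, independent of the jdbc_writer code:

```python
# Sketch reproducing the error: unpacking requires the iterable's
# length to match the number of assignment targets.
row = ("Economy",)          # only one element
try:
    name, value = row       # expected 2, got 1 -> ValueError
except ValueError as err:
    print(err)

name, value = ("Economy", 42)  # two targets, two values: OK
```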

lei_armstrong
by New Contributor II
  • 9855 Views
  • 6 replies
  • 6 kudos

Resolved! Executing Notebooks - Run All Cells vs Run All Below

Due to dependencies, if one of our cells errors we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks depending on whether "Run all cells in this notebook" is selected from the header versus "Run All Below"...

Latest Reply
sukanya09
New Contributor II

Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails it should not run the rest of the cells.

Braxx
by Contributor II
  • 9507 Views
  • 6 replies
  • 2 kudos

Resolved! issue with group by

I am trying to group a data frame by "PRODUCT" and "MARKET" and aggregate the remaining columns specified in col_list. There are many more columns in the list, but for simplification let's take the example below. Unfortunately I am getting the error: "TypeError:...

Latest Reply
Ralphma
New Contributor II

The error you're encountering, "TypeError: unhashable type: 'Column'," is likely due to the way you're defining exprs. In Python, sets use curly braces {}, but they require their items to be hashable. Since the result of sum(x).alias(x) is not hashab...
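A sketch of the fix Ralphma describes, building exprs as a list comprehension instead of a set; the sample data and column names are hypothetical:

```python
# Sketch: aggregate with a list of Column expressions. A set literal
# {sum(x).alias(x) ...} fails because Column objects are unhashable.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("p1", "US", 10.0, 1), ("p1", "US", 5.0, 2)],
    ["PRODUCT", "MARKET", "SALES", "UNITS"],
)

col_list = ["SALES", "UNITS"]
exprs = [F.sum(x).alias(x) for x in col_list]  # a list, not a set
df.groupBy("PRODUCT", "MARKET").agg(*exprs).show()
```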

Serhii
by Contributor
  • 8050 Views
  • 7 replies
  • 4 kudos

Resolved! Saving complete notebooks to GitHub from Databricks repos.

When saving a notebook to a GitHub repo, it is stripped to Python source code. Is it possible to save it in the ipynb format?

Latest Reply
GlennStrycker
New Contributor III

When I save+commit+push my .ipynb file to my linked git repo, I noticed that only the cell inputs are saved, not the output.  This differs from the .ipynb file I get when I choose "File / Export / iPython Notebook".  Is there a way to save the cell o...
