Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Serhii
by Contributor
  • 8808 Views
  • 7 replies
  • 4 kudos

Resolved! Saving complete notebooks to GitHub from Databricks repos.

When saving a notebook to a GitHub repo, it is stripped down to Python source code. Is it possible to save it in the .ipynb format?

Latest Reply
GlennStrycker
New Contributor III
  • 4 kudos

When I save+commit+push my .ipynb file to my linked git repo, I noticed that only the cell inputs are saved, not the output.  This differs from the .ipynb file I get when I choose "File / Export / iPython Notebook".  Is there a way to save the cell o...

  • 4 kudos
6 More Replies
MCosta
by New Contributor III
  • 11313 Views
  • 10 replies
  • 19 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland. Debugging in Databricks is awkward. We ended up do...

Latest Reply
petern
New Contributor II
  • 19 kudos

Has this been solved yet? Is there a mature way to debug code on Databricks? I'm running into the same kind of issue. The variable explorer and pdb can be used, but it's not really the same.

  • 19 kudos
9 More Replies
PHorniak
by New Contributor II
  • 16931 Views
  • 3 replies
  • 4 kudos

Resolved! AttributeError: 'DataFrame' object has no attribute 'rename'

Hello, I am doing the Data Science and Machine Learning course. The Boston housing has unintuitive column names. I want to rename them, e.g. so 'zn' becomes 'Zoning'. When I run this command: df_bostonLegible = df_boston.rename({'zn':'Zoning'}, axi...

Latest Reply
KrunalLathiya
New Contributor II
  • 4 kudos

If df_boston is a DataFrame but you still face issues, try an alternative syntax: df_boston = df_boston.rename(columns={'zn': 'Zoning'}). Make sure df_boston is a proper DataFrame and that you're using a recent version of pandas.

  • 4 kudos
2 More Replies
Rajaniesh
by New Contributor III
  • 2676 Views
  • 2 replies
  • 1 kudos

URGENT HELP NEEDED: Python functions deployed in the cluster throwing the error

Hi, I have created a Python wheel with the following code. The package name is rule_engine.
"""The entry point of the Python Wheel"""
import sys
from pyspark.sql.functions import expr, col
def get_rules(tag):
    """  loads data quality rules from a table ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

You can find more details and examples here https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#use-a-python-wheel-in-a-databricks-job

  • 1 kudos
1 More Replies
Smitha1
by Valued Contributor II
  • 4201 Views
  • 9 replies
  • 3 kudos

Databricks Certified Associate Developer for Apache Spark 3.0

Databricks Certified Associate Developer for Apache Spark 3.0

Latest Reply
Shivam_Patil
New Contributor II
  • 3 kudos

Hey, I am looking for sample papers for the above exam other than the one provided by Databricks. Does anyone have any ideas?

  • 3 kudos
8 More Replies
houstonamoeba
by New Contributor III
  • 4494 Views
  • 7 replies
  • 1 kudos

Resolved! examples on python sdk for install libraries

Hi everyone, I'm planning to use the Databricks Python CLI function "install_libraries". Can someone please post examples of using install_libraries? https://github.com/databricks/databricks-cli/blob/main/databricks_cli/libraries/api.py

Latest Reply
Loop-Insist
New Contributor II
  • 1 kudos

Here you go, using the Python SDK:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute
w = WorkspaceClient(host="yourhost", token="yourtoken")
# Create an array of Library objects to be installed
libraries_to_install = [compute...

  • 1 kudos
6 More Replies
shiv4050
by New Contributor
  • 3610 Views
  • 4 replies
  • 0 kudos

Execute Databricks notebook from Python source code.

Hello, I'm trying to execute a Databricks notebook from Python source code but am getting an error. Source code below:
from databricks_api import DatabricksAPI
# Create a Databricks API client
api = DatabricksAPI(host='databrick_host', tok...

Latest Reply
sewl
New Contributor II
  • 0 kudos

The error you are encountering indicates that there is an issue with establishing a connection to the Databricks host specified in your code. Specifically, the error message "getaddrinfo failed" suggests that the hostname or IP address you provided f...
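One way to test this diagnosis is to check whether the host string resolves at all before calling the API. The following is a minimal sketch using only the Python standard library; the helper names `extract_hostname` and `resolves` and the example workspace URL are hypothetical and are not part of `databricks_api`.

```python
import socket
from urllib.parse import urlparse


def extract_hostname(host: str) -> str:
    """Return the bare hostname from a URL or host string."""
    # urlparse only fills in .hostname when a scheme (or leading //) is present,
    # so prepend "//" for bare host strings such as "example.com:443".
    parsed = urlparse(host if "//" in host else f"//{host}")
    return parsed.hostname or ""


def resolves(hostname: str, port: int = 443) -> bool:
    """Return True if the name can be resolved, False if getaddrinfo fails."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        return False


# Hypothetical workspace URL, used only for illustration.
name = extract_hostname("https://my-workspace.cloud.databricks.com/")
print(name)  # my-workspace.cloud.databricks.com
```

If `resolves(name)` returns False for the real host value, the problem is likely in the host string itself (for example a placeholder such as 'databrick_host' left in the code) or in network/DNS access, rather than in the notebook call.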

  • 0 kudos
3 More Replies
T_1
by New Contributor III
  • 28197 Views
  • 13 replies
  • 3 kudos

Resolved! displayHTML can't seem to be used from Python code, only hand typed into a cell???

Trying to use displayHTML from within a Python module raises a Python exception: NameError: name 'displayHTML' is not defined, and I've found no way around this. It seems to be something at the UI layer or something, not a Python function that can be refere...

Latest Reply
T_1
New Contributor III
  • 3 kudos

Holy Guacamole Batman! It works finally!!!! Wow, thanks @ptweir That's awesome! I can go back and update my doc (and code, to just use databricks the same, now, and Jupyter!) and it'll work by default. It's great they fixed it, shame they never told ...

  • 3 kudos
12 More Replies
Braxx
by Contributor II
  • 12048 Views
  • 3 replies
  • 1 kudos

Resolved! How to kill the execution of a notebook on a specific cell?

Let's say I want to check if a condition is false and then stop the execution of the rest of the script. I tried two approaches:
1) raising an exception
if not data_input_cols.issubset(data.columns):
    raise Exception("Missing column or column's name mis...

Latest Reply
Invasioned
New Contributor II
  • 1 kudos

In Jupyter notebooks or similar environments, you can stop the execution of a notebook at a specific cell by raising an exception. However, you need to handle the exception properly to ensure the execution stops. The issue you're encountering could b...
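The "raise an exception" approach can be sketched in plain Python as follows. This is an illustration only: the helper name `require_columns` and the column names are hypothetical, and the check mirrors the `issubset` test from the question rather than reproducing the poster's full code.

```python
def require_columns(required, available) -> None:
    """Raise ValueError if any required column is absent."""
    missing = set(required) - set(available)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")


# Hypothetical column names, for illustration only.
data_input_cols = {"id", "amount"}
columns = ["id", "amount", "created_at"]

require_columns(data_input_cols, columns)  # passes silently

try:
    require_columns({"id", "price"}, columns)
except ValueError as err:
    print(err)  # Missing columns: ['price']
```

In a notebook the `try`/`except` would be left out so that the uncaught exception halts the run. On Databricks, `dbutils.notebook.exit(...)` is another way to end a notebook early; which one fits depends on whether the run should be reported as failed.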

  • 1 kudos
2 More Replies
MichaelO
by New Contributor III
  • 3987 Views
  • 4 replies
  • 2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images written in Python:
from PIL import Image
def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out
I now want to use this function as a pyspark ...

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

  • 2 kudos
3 More Replies
ehpogue
by New Contributor III
  • 15513 Views
  • 9 replies
  • 3 kudos

how do i re-enable tab complete / autocomplete?

Yesterday all of my notebooks seemingly changed to have Python formatting (which seems to be in this week's release), but the unintended consequence is that Shift + Tab (which used to show docstrings in Python) now just un-indents code, and Tab inser...

Latest Reply
Data_33
New Contributor II
  • 3 kudos

I am also facing the same issue in Databricks now.

  • 3 kudos
8 More Replies
sourander
by New Contributor III
  • 16047 Views
  • 13 replies
  • 7 kudos

Resolved! Protobuf deserialization in Databricks

Hi,
Let's assume I have these things:
  • Binary column containing protobuf-serialized data
  • The .proto file including message definition
What different approaches have Databricks users chosen to deserialize the data? Python is the programming language that...

Latest Reply
Amou
Databricks Employee
  • 7 kudos

We've now added a native connector with parsing directly in Spark DataFrames. https://docs.databricks.com/en/structured-streaming/protocol-buffers.html
from pyspark.sql.protobuf.functions import to_protobuf, from_protobuf
schema_registry_options = ...

  • 7 kudos
12 More Replies
amanpreetkaur
by New Contributor III
  • 59822 Views
  • 14 replies
  • 8 kudos

How to import one databricks python notebook into another?

I have a python notebook A in Azure Databricks having import statement as below: import xyz, datetime,... I have another notebook xyz being imported in notebook A as shown in above code. When I run notebook A, it throws the following error: ImportEr...

Latest Reply
artsheiko
Databricks Employee
  • 8 kudos

Create a repository containing an __init__.py file. Add your library as .py file(s). Let's imagine that our library is composed of multiple sub-folders consolidated in "my_folder"; one of the sub-folders is named "math_library" and contains my_awesome_l...
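The package layout described in this reply can be tried out locally with a small sketch like the one below. It builds the folders in a temporary directory so it is self-contained. The names "my_folder" and "math_library" come from the reply; the module `calc.py` and its `add()` function are hypothetical stand-ins, since the original file name is cut off.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout in a temp directory.
# "calc.py" and add() are hypothetical; they are not from the original reply.
repo_root = Path(tempfile.mkdtemp())
package_dir = repo_root / "my_folder" / "math_library"
package_dir.mkdir(parents=True)
(repo_root / "my_folder" / "__init__.py").write_text("")
(package_dir / "__init__.py").write_text("")
(package_dir / "calc.py").write_text("def add(a, b):\n    return a + b\n")

# Make the repository root importable, then import by dotted path.
sys.path.append(str(repo_root))
importlib.invalidate_caches()
calc = importlib.import_module("my_folder.math_library.calc")
print(calc.add(2, 3))  # 5
```

In a real repo the files would already exist, so only the `sys.path` step (if the root is not already on the path) and the import would be needed.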

  • 8 kudos
13 More Replies
KKo
by Contributor III
  • 13999 Views
  • 3 replies
  • 2 kudos

Resolved! Union Multiple dataframes in loop, with different schema

Within a loop I have a few dataframes created. I can union them without an issue if they have the same schema, using df_unioned = reduce(DataFrame.unionAll, df_list). Now my problem is how to union them if one of the dataframes in df_list has a different nu...

Latest Reply
anoopunni
New Contributor II
  • 2 kudos

Hi, I have come across the same scenario. Using reduce() and unionByName, we can implement the solution as below:
val lstDF: List[DataFrame] = List(df1, df2, df3, df4, df5)
val combinedDF = lstDF.reduce((df1, df2) => df1.unionByName(df2, allowMissingColumns = tru...
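Since the question uses Python, here is a rough Python counterpart of the Scala approach in this reply. It is a sketch, assuming Spark 3.1 or later, where `DataFrame.unionByName` accepts `allowMissingColumns`; the helper name `union_all_by_name` is hypothetical.

```python
from functools import reduce


def union_all_by_name(dfs):
    """Union DataFrames by column name; columns missing on one side become nulls."""
    # reduce() folds the list pairwise: ((df1 ∪ df2) ∪ df3) ∪ ...
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        dfs,
    )


# Usage in a Spark session, where df_list is a non-empty list of DataFrames:
# df_unioned = union_all_by_name(df_list)
```

Note that `reduce` raises a `TypeError` on an empty list, so `df_list` needs at least one DataFrame.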

  • 2 kudos
2 More Replies
User16765131552
by Contributor III
  • 6689 Views
  • 5 replies
  • 1 kudos

How to register a JDBC Spark dialect in Python?

I am trying to read from a Databricks table. I have used the URL from a cluster in Databricks. I am getting this error: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int. After these statements:
jdbcConnUrl= "jdbc:spark:...

Latest Reply
KKDataEngineer
New Contributor III
  • 1 kudos

Is there a solution for this?

  • 1 kudos
4 More Replies