Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Braxx
by Contributor II
  • 10712 Views
  • 3 replies
  • 1 kudos

Resolved! How to kill the execution of a notebook at a specific cell?

Let's say I want to check if a condition is false and then stop the execution of the rest of the script. I tried two approaches: 1) raising an exception: if not data_input_cols.issubset(data.columns): raise Exception("Missing column or column's name mis...

Latest Reply
Invasioned
New Contributor II
  • 1 kudos

In Jupyter notebooks or similar environments, you can stop the execution of a notebook at a specific cell by raising an exception. However, you need to handle the exception properly to ensure the execution stops. The issue you're encountering could b...
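
As a minimal sketch of the exception-based approach discussed in this thread: the helper below checks that every required column is present and raises if one is missing. The helper name `require_columns`, the column names, and the error message are illustrative assumptions, not taken from the original post. In a Databricks notebook, `dbutils.notebook.exit(...)` is another way to end a run; it is left out here so the sketch runs in plain Python.

```python
def require_columns(required, available):
    """Raise ValueError if any required column is missing from `available`."""
    missing = set(required) - set(available)
    if missing:
        # An uncaught exception stops the cell it is raised in; whether later
        # cells are skipped depends on how the notebook is being run.
        raise ValueError(f"Missing columns: {sorted(missing)}")


# Hypothetical column names, for illustration only
data_input_cols = {"id", "amount"}
available_cols = ["id", "amount", "created_at"]

require_columns(data_input_cols, available_cols)  # passes silently
```

With a real DataFrame, the second argument would be `data.columns`, as in the question.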

  • 1 kudos
2 More Replies
MichaelO
by New Contributor III
  • 3348 Views
  • 4 replies
  • 2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images written in Python:

from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out

I now want to use this function as a pyspark ...

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

  • 2 kudos
3 More Replies
ehpogue
by New Contributor II
  • 13510 Views
  • 9 replies
  • 3 kudos

How do I re-enable tab complete / autocomplete?

Yesterday all of my notebooks seemingly changed to have Python formatting (which seems to be in this week's release), but the unintended consequence is that Shift + Tab (which used to show docstrings in Python) now just un-indents code, and tab inser...

Latest Reply
Data_33
New Contributor II
  • 3 kudos

I am also facing the same issue in Databricks now.

  • 3 kudos
8 More Replies
sourander
by New Contributor III
  • 13964 Views
  • 13 replies
  • 7 kudos

Resolved! Protobuf deserialization in Databricks

Hi,

Let's assume I have these things:
  • Binary column containing protobuf-serialized data
  • The .proto file including the message definition

What different approaches have Databricks users chosen to deserialize the data? Python is the programming language that...

Latest Reply
Amou
New Contributor II
  • 7 kudos

We've now added a native connector that parses directly with Spark DataFrames: https://docs.databricks.com/en/structured-streaming/protocol-buffers.html

from pyspark.sql.protobuf.functions import to_protobuf, from_protobuf
schema_registry_options = ...

  • 7 kudos
12 More Replies
amanpreetkaur
by New Contributor III
  • 52999 Views
  • 14 replies
  • 7 kudos

How to import one databricks python notebook into another?

I have a Python notebook A in Azure Databricks with an import statement as below: import xyz, datetime,... I have another notebook, xyz, being imported in notebook A as shown in the above code. When I run notebook A, it throws the following error: ImportEr...

Latest Reply
artsheiko
Honored Contributor
  • 7 kudos

Create a repository containing an __init__.py file. Add your library as .py file(s). Let's imagine that our library is composed of multiple sub-folders consolidated in "my_folder"; one of the sub-folders is named "math_library" and contains my_awesome_l...
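
The package layout described in this reply can be sketched in plain Python. This is only an illustration under stated assumptions: it builds a throwaway directory tree instead of a real Databricks repo, and because the module name in the reply is cut off, the module `example_lib` and its function `add` are hypothetical stand-ins.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package tree; in a real repo these files would already exist.
root = Path(tempfile.mkdtemp())
pkg = root / "my_folder" / "math_library"
pkg.mkdir(parents=True)
(root / "my_folder" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("")
# `example_lib` and `add` are hypothetical names used only for this sketch
(pkg / "example_lib.py").write_text("def add(a, b):\n    return a + b\n")

# Make the repository root importable, then import the module by its dotted path
sys.path.insert(0, str(root))
importlib.invalidate_caches()
example_lib = importlib.import_module("my_folder.math_library.example_lib")

result = example_lib.add(2, 3)
print(result)  # prints 5
```

The key point is that the import works because the repository root is on `sys.path` and each folder is a package.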

  • 7 kudos
13 More Replies
KKo
by Contributor III
  • 11627 Views
  • 3 replies
  • 2 kudos

Resolved! Union Multiple dataframes in loop, with different schema

Within a loop I have a few dataframes created. I can union them without an issue if they have the same schema using df_unioned = reduce(DataFrame.unionAll, df_list). Now my problem is how to union them if one of the dataframes in df_list has a different nu...

Latest Reply
anoopunni
New Contributor II
  • 2 kudos

Hi, I have come across the same scenario. Using reduce() and unionByName we can implement the solution as below:
val lstDF: List[DataFrame] = List(df1, df2, df3, df4, df5)
val combinedDF = lstDF.reduce((df1, df2) => df1.unionByName(df2, allowMissingColumns = tru...
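
Since the question was asked in Python, here is a hedged PySpark-style equivalent of the Scala snippet above. The function name `union_all_by_name` is an assumption made for this sketch. It relies on `DataFrame.unionByName` accepting `allowMissingColumns`, which is available from Spark 3.1; columns missing from one side are filled with nulls.

```python
from functools import reduce


def union_all_by_name(dataframes):
    """Union DataFrames by column name, tolerating differing column sets."""
    # Pairwise fold over the list; each step matches columns by name
    # rather than by position, unlike unionAll.
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        dataframes,
    )
```

With the list from the question, the call would be `df_unioned = union_all_by_name(df_list)`.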

  • 2 kudos
2 More Replies
User16765131552
by Contributor III
  • 5982 Views
  • 5 replies
  • 1 kudos

How to register a JDBC Spark dialect in Python?

I am trying to read from a Databricks table. I have used the URL from a cluster in Databricks. I am getting this error: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int. After these statements: jdbcConnUrl = "jdbc:spark:...

Latest Reply
KKDataEngineer
New Contributor III
  • 1 kudos

Is there a solution for this?

  • 1 kudos
4 More Replies
kidexp
by New Contributor II
  • 21051 Views
  • 5 replies
  • 2 kudos

Resolved! How to install python package on spark cluster

Hi, how can I install Python packages on a Spark cluster? Locally, I can use pip install. I want to use some external packages that are not installed on the Spark cluster. Thanks for any suggestions.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Install Python packages on a Spark cluster: Make a virtualenv only for your Spark nodes. Each time you run a Spark job, run a fresh pip install of all your own in-house Python libraries. ... Zip up the site-packages dir of the virtualenv. ... Pass the single ...

  • 2 kudos
4 More Replies
bshirdi
by New Contributor II
  • 7828 Views
  • 1 replies
  • 2 kudos

Getting HTTP 502 bad gateway error!

Hello all, I am suddenly getting an HTTP 502 and DRIVER_LIBRARY_INSTALLATION_FAILURE error during Python library installation when the cluster is initialized. I have around 10 Python packages, of which 2-3 packages always failed to install a...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Bhargav Shir, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
marksachin_k
by New Contributor
  • 1791 Views
  • 1 replies
  • 0 kudos

Python custom Logging on Databricks

I am planning to introduce custom logging to the Databricks workload. To achieve this I am using the Python logging module. I am storing logs on the driver, in the "file:/tmp/" directory, before I move those logs to blob storage. In my personal databricks ...
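
A minimal sketch of the setup described in this question, using only the standard `logging` module. The helper name `build_file_logger`, the logger name, and the log file name are assumptions for illustration; the file is written to the system temp directory to mirror the "/tmp" location mentioned in the post.

```python
import logging
import tempfile
from pathlib import Path


def build_file_logger(name, log_path):
    """Return a logger that writes formatted records to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    if not logger.handlers:  # avoid duplicate handlers when a cell is re-run
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Hypothetical logger name and file name, for illustration only
log_path = Path(tempfile.gettempdir()) / "custom_workload.log"
logger = build_file_logger("custom_workload", log_path)
logger.info("job started")

# Flush so the file is complete before it is copied elsewhere
for handler in logger.handlers:
    handler.flush()
```

Copying the finished file to blob storage is a separate step that depends on the workspace setup and is not shown here.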

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @MARKSACHIN K, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 0 kudos
PrawnballNightm
by New Contributor III
  • 4911 Views
  • 4 replies
  • 0 kudos

Resolved! Cannot configure VS code databricks extension with a non-standard databricks URL: not a databricks host.

Hello, I'm trying to connect to our Databricks instance using the VS Code extension. However, when following this guide we cannot get the configuration to proceed past the point where it asks for our instance URL. The prompt appears to expect a URL of t...

Latest Reply
PrawnballNightm
New Contributor III
  • 0 kudos

Hello, yes, the Databricks team shared a modified version of the VS Code plugin which did not include the URL matching logic. It connects successfully. However, our custom URL is as it is because our organisation is hosting its own instance of Databri...

  • 0 kudos
3 More Replies
carlosst01
by New Contributor II
  • 1764 Views
  • 2 replies
  • 2 kudos

Resolved! Running Libraries and/or modules in Databricks' lifecycle?

Hi, I have had this question for some weeks and didn't find any information about the topic. Specifically, my question is: what is the 'lifecycle', or the steps required, to be able to use a new Python library in Databricks in terms of compatibility? For exam...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Carlos Caravantes, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

  • 2 kudos
1 More Replies
Taha_Hussain
by Valued Contributor II
  • 7497 Views
  • 5 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours!

Ask your technical questions at Databricks Office Hours! Register here for any of our upcoming dates:
  • May 10 - 11:00 AM - 12:00 PM PT
  • May 17 - 8:00 AM - 9:00 AM PT
  • May 24 - 9:00 AM - 10:00 AM GMT
Databricks Office Hours connects you directly with experts...

Latest Reply
Priyag1
Honored Contributor II
  • 8 kudos

Thanks for this info

  • 8 kudos
4 More Replies
PriyaV
by New Contributor II
  • 11048 Views
  • 5 replies
  • 10 kudos

Suppress output in python notebooks

My dilemma is this: we use PySpark to connect to external data sources via JDBC from within Databricks. Every time we issue a Spark command, it prints the connection options including the username, URL and password, which is not advisable. So, is ...

Latest Reply
Pabeggetur
New Contributor II
  • 10 kudos

Thanks for taking the time to discuss this, I feel strongly about it and love learning more on this topic.

  • 10 kudos
4 More Replies