Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

bshirdi
by New Contributor II
  • 8307 Views
  • 1 reply
  • 2 kudos

Getting HTTP 502 Bad Gateway error!

Hello all, I am suddenly getting an HTTP 502 and a DRIVER_LIBRARY_INSTALLATION_FAILURE error during Python library installation when the cluster is initialized. I have around 10 Python packages, of which 2-3 packages always fail to install a...
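
One common mitigation while the gateway issue persists is to retry the failing installs from a notebook cell instead of relying solely on cluster-scoped libraries. A minimal sketch, with a hypothetical package list:

    import subprocess
    import sys
    import time

    packages = ["package-a", "package-b"]  # hypothetical names of the failing packages

    for pkg in packages:
        for attempt in range(3):
            # Install into the driver's environment; back off and retry on transient failures.
            result = subprocess.run([sys.executable, "-m", "pip", "install", pkg])
            if result.returncode == 0:
                break
            time.sleep(10 * (attempt + 1))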

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Bhargav Shir​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

marksachin_k
by New Contributor
  • 2656 Views
  • 1 reply
  • 0 kudos

Python custom Logging on Databricks

I am planning to introduce custom logging to the Databricks workload. To achieve this I am using the Python logging module. I am storing logs in the driver-local "file:/tmp/" directory before I move those logs to blob storage. In my personal Databricks ...
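
For reference, a minimal sketch of this pattern, assuming a hypothetical logger name and blob container, with dbutils (available in Databricks notebooks) copying the file out at the end:

    import logging

    # Write logs to driver-local storage (file:/tmp/ on the driver).
    logger = logging.getLogger("workload")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler("/tmp/run.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    logger.info("run started")
    # ... workload ...
    handler.flush()

    # Copy the finished log to durable storage (hypothetical account and container).
    dbutils.fs.cp("file:/tmp/run.log",
                  "wasbs://logs@myaccount.blob.core.windows.net/run.log")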

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @MARKSACHIN K​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

PrawnballNightm
by New Contributor III
  • 6729 Views
  • 4 replies
  • 0 kudos

Resolved! Cannot configure the VS Code Databricks extension with a non-standard Databricks URL: not a Databricks host.

Hello, I'm trying to connect to our Databricks instance using the VS Code extension. However, when following this guide, we cannot get the configuration to proceed past the point where it asks for our instance URL. The prompt appears to expect a URL of t...

Latest Reply
PrawnballNightm
New Contributor III
  • 0 kudos

Hello, yes, the Databricks team shared a modified version of the VS Code plugin which did not include the URL-matching logic. It connects successfully. However, our custom URL is as it is because our organisation is hosting its own instance of Databri...

3 More Replies
carlosst01
by New Contributor II
  • 2124 Views
  • 2 replies
  • 2 kudos

Resolved! Running Libraries and/or modules in Databricks' lifecycle?

Hi, I have had this question for some weeks and didn't find any information about the topic. Specifically, my doubt is: what is the 'lifecycle', i.e. the steps, to be able to use a new Python library in Databricks in terms of compatibility? For exam...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Carlos Caravantes, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

1 More Replies
Taha_Hussain
by Databricks Employee
  • 7801 Views
  • 5 replies
  • 8 kudos

Ask your technical questions at Databricks Office Hours! Register here for any of our upcoming dates: May 10, 11:00 AM - 12:00 PM PT; May 17, 8:00 AM -...

Ask your technical questions at Databricks Office Hours! Register here for any of our upcoming dates: May 10, 11:00 AM - 12:00 PM PT; May 17, 8:00 AM - 9:00 AM PT; May 24, 9:00 AM - 10:00 AM GMT. Databricks Office Hours connects you directly with experts...

Latest Reply
Priyag1
Honored Contributor II
  • 8 kudos

Thanks for this info

4 More Replies
PriyaV
by New Contributor II
  • 13944 Views
  • 5 replies
  • 10 kudos

Suppress output in Python notebooks

My dilemma is this: we use PySpark to connect to external data sources via JDBC from within Databricks. Every time we issue a Spark command, it spits out the connection options, including the username, URL, and password, which is not advisable. So, is ...
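
A hedged sketch of two common ways to quiet this, depending on where the echo originates; neither is confirmed as the cause in this thread, and the JDBC settings are assumed to be defined elsewhere:

    # 1) If the cell is echoing the value of its last expression, assign it instead:
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)          # assumed defined elsewhere
          .option("user", user)             # hypothetical credentials
          .option("password", password)
          .option("dbtable", "my_table")    # hypothetical table
          .load())                          # no bare expression, so nothing is echoed

    # 2) If the options come from Spark's logger, raise the driver log level:
    spark.sparkContext.setLogLevel("ERROR")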

Latest Reply
Pabeggetur
New Contributor II
  • 10 kudos

Thanks for taking the time to discuss this; I feel strongly about it and love learning more on this topic.

4 More Replies
Yash_542965
by New Contributor II
  • 8144 Views
  • 2 replies
  • 3 kudos

Resolved! Access Excel file in delta live pipeline

I'm having an issue accessing an Excel file through a DLT pipeline. The file is in ADLS, and I'm using pandas to read it. It seems pandas are not able to understand the abfss protocol. Is there any way to read Excel with pandas in a DLT pipeline? I'm getting thi...
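
As the accepted reply below notes, openpyxl must be installed first; pandas also cannot resolve abfss:// paths itself, so one common workaround is to read through a mounted path. A sketch under those assumptions, with a hypothetical mount and file name:

    # Run first, in its own cell: %pip install openpyxl
    import pandas as pd

    # pandas reads via the local /dbfs fuse mount rather than abfss://.
    pdf = pd.read_excel("/dbfs/mnt/my_mount/report.xlsx", engine="openpyxl")
    sdf = spark.createDataFrame(pdf)  # hand off to Spark for the DLT table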

Latest Reply
Yash_542965
New Contributor II
  • 3 kudos

Thanks for the info. It works; I just needed to install an additional library using "%pip install openpyxl".

1 More Replies
pablociu
by New Contributor
  • 1325 Views
  • 2 replies
  • 0 kudos

How to define a write option in DLT using Python?

In a normal notebook I would save metadata to my Delta table using the following code:
(
    df.write
        .format("delta")
        .mode("overwrite")
        .option("userMetadata", user_meta_data)
        .saveAsTable("my_table")
)
But I couldn't find online how c...
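
One approach that is often suggested (an assumption here, not confirmed in this thread) is to set the commit-level metadata through the Delta Spark conf, since DLT owns the actual write:

    import dlt

    # Delta picks up commit-level userMetadata from this conf for subsequent writes.
    spark.conf.set("spark.databricks.delta.commitInfo.userMetadata", "my run info")

    @dlt.table(name="my_table")  # hypothetical table name
    def my_table():
        return spark.read.table("source_table")  # hypothetical source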

Latest Reply
United_Communit
New Contributor II
  • 0 kudos

In Delta Lake you can set up userMetadata, so I will give you some tips:
from delta import DeltaTable
# Create or load your Delta table
delta_table = DeltaTable.forPath(spark, "path_to_delta_table")
# Define your user metadata
user_meta_data = {"ke...

1 More Replies
GS2312
by New Contributor II
  • 5430 Views
  • 6 replies
  • 5 kudos

KeyProviderException when trying to create an external table on Databricks

Hi there, I have been trying to create an external table on Azure Databricks with the statement below:
df.write.partitionBy("year", "month", "day")
    .format('org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat')
    .option("path", sourcepath)
    .mod...
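
KeyProviderException typically means the cluster cannot resolve the storage account key. A minimal sketch of one common fix, assuming a hypothetical account name and secret scope:

    # Make the ADLS Gen2 account key visible to the cluster before writing.
    spark.conf.set(
        "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
        dbutils.secrets.get(scope="my_scope", key="storage-account-key"),
    )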

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Gaurishankar Sakhare, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

5 More Replies
Erik
by Valued Contributor III
  • 1787 Views
  • 2 replies
  • 2 kudos

Create Python modules for both Repos and Workspace

We are using the "databricks_notebook" Terraform resource to deploy our notebooks into the "Workspace" as part of our CI/CD run, and our jobs run notebooks from the workspace. For development we clone the repo into "Repos". At the moment the only modu...
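
Besides packaging the code (see the reply below), a lightweight sketch of a dual-path import, with hypothetical Repos and Workspace locations:

    import os
    import sys

    # Prefer the Repos checkout during development, else the deployed Workspace copy.
    repo_root = "/Workspace/Repos/dev@example.com/my_repo/modules"  # hypothetical
    deployed_root = "/Workspace/Shared/my_project/modules"          # hypothetical
    sys.path.append(repo_root if os.path.isdir(repo_root) else deployed_root)

    import my_module  # hypothetical module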

Latest Reply
RobiTakToRobi
New Contributor II
  • 2 kudos

You can create your own Python package and host it in Azure Artifacts. https://learn.microsoft.com/en-us/azure/devops/artifacts/quickstarts/python-packages?view=azure-devops

1 More Replies
zeta_load
by New Contributor II
  • 2310 Views
  • 3 replies
  • 2 kudos

Resolved! Z-ordering a df using Python

Is there a way to perform Z-ordering using Python? With SQL you should be able to use: %sql OPTIMIZE df ZORDER BY (column) However, I get the error "Table or view 'df' not found in database 'default'", and since I'm not really using SQL, I would lik...
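
For context, OPTIMIZE targets a registered table or a path, not an in-memory DataFrame, which explains the 'Table or view not found' error. On Delta Lake 2.0+ the same operation is exposed in Python; a sketch assuming a hypothetical registered table:

    from delta.tables import DeltaTable

    # Z-order the underlying Delta table (not the DataFrame) from Python.
    dt = DeltaTable.forName(spark, "my_table")  # hypothetical table name
    dt.optimize().executeZOrderBy("column")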

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Lukas Goldschmied, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

2 More Replies
az38
by New Contributor II
  • 6733 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load from Azure ADLS (actually, the filesystem doesn't matter) into a df only files modified after some point in time? Is there any function like input_file_name(), but for last_modified, to use it in a w...
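
Building on the _metadata suggestion in the reply below, a sketch that filters by file modification time, with a hypothetical path and cutoff:

    from pyspark.sql import functions as F

    df = (spark.read.format("parquet")
          .load("abfss://container@account.dfs.core.windows.net/path/")  # hypothetical
          .select("*", "_metadata.file_modification_time")
          .where(F.col("file_modification_time") > "2023-01-01"))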

Latest Reply
venkatcrc
New Contributor III
  • 3 kudos

_metadata will provide the file modification timestamp. I tried it on DBFS but am not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

1 More Replies
drewtoby
by New Contributor II
  • 9638 Views
  • 2 replies
  • 1 kudos

Resolved! How to Pull Cached SQL Table into Python Dictionary?

Hello, I have been working on this issue as a proof of concept; it would be extremely helpful to iterate through tables via loops in a few scenarios. I have a simple three-column dimension that I added to a cached table: cache lazy table hedis_cache s...
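
As the accepted reply notes, converting to pandas works; a minimal sketch with hypothetical column names:

    # Read the cached table and build a plain Python dict from two of its columns.
    pdf = spark.table("hedis_cache").toPandas()
    lookup = dict(zip(pdf["code"], pdf["description"]))  # hypothetical columns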

Latest Reply
drewtoby
New Contributor II
  • 1 kudos

Got it to work, thank you for the tip! I needed to convert the dataframe over to a pandas dataframe: https://www.geeksforgeeks.org/convert-pyspark-dataframe-to-dictionary-in-python/

1 More Replies
fijoy
by Contributor
  • 7438 Views
  • 1 reply
  • 2 kudos

Resolved! Using widget values in a shell script cell

I have a Databricks notebook containing a mix of SQL, Python, and shell script cells. I know I can retrieve and use values of widgets in Python cells using dbutils.widgets.get('key') and in SQL cells using ${key}. How can I use widget values in shell ...
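
The workaround linked in the reply below passes the value through a file on the driver; a sketch of that shape, with a hypothetical widget name:

    # Python cell: write the widget value where a shell cell can read it.
    dbutils.fs.put("file:/tmp/widget_key.txt",
                   dbutils.widgets.get("key"), overwrite=True)

    # %sh cell (separate cell):
    #   KEY=$(cat /tmp/widget_key.txt)
    #   echo "$KEY"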

Latest Reply
fijoy
Contributor
  • 2 kudos

For those interested, I found and am for now using this workaround: https://stackoverflow.com/questions/54662605/how-to-pass-a-python-variables-to-shell-script-in-azure-databricks-notebookbles while I wait for a more direct method.
