Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Yash_542965
by New Contributor II
  • 7713 Views
  • 2 replies
  • 3 kudos

Resolved! Access Excel file in delta live pipeline

I'm having an issue accessing an Excel file through a DLT pipeline. The file is in ADLS, and I'm using pandas to read it. It seems pandas is not able to understand the abfss protocol. Is there any way to read Excel with pandas in a DLT pipeline? I'm getting thi...

Latest Reply
Yash_542965
New Contributor II

Thanks for the info. It works; I just needed to install an additional library using "%pip install openpyxl".

1 More Replies
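
A minimal sketch of the approach that resolved this thread: install openpyxl, read the workbook with pandas, and hand it to Spark inside a DLT table. The path and table name are illustrative, and it assumes the file has been made reachable at a local/DBFS path, since pandas cannot resolve abfss:// URLs directly.

    # %pip install openpyxl  (in the pipeline's notebook/environment)
    import dlt
    import pandas as pd

    @dlt.table(name="excel_input")
    def excel_input():
        # pandas cannot read abfss:// directly, so read from a mounted/local path
        pdf = pd.read_excel("/dbfs/mnt/raw/report.xlsx", engine="openpyxl")
        return spark.createDataFrame(pdf)
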
pablociu
by New Contributor
  • 1102 Views
  • 2 replies
  • 0 kudos

How to define a write option in DLT using Python?

In a normal notebook I would save metadata to my Delta table using the following code:

    (df.write
       .format("delta")
       .mode("overwrite")
       .option("userMetadata", user_meta_data)
       .saveAsTable("my_table"))

But I couldn't find online how c...

Latest Reply
United_Communit
New Contributor II

In Delta Lake you can set up user metadata, so I will give you some tips:

    from delta import DeltaTable
    # Create or load your Delta table
    delta_table = DeltaTable.forPath(spark, "path_to_delta_table")
    # Define your user metadata
    user_meta_data = {"ke...

1 More Replies
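
A minimal sketch of two ways to attach userMetadata from Python, following the reply above; the table name and metadata value are illustrative, and the session-wide conf is an assumption about how to carry this into a DLT pipeline.

    # Per-write option, as in the question (regular notebook):
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("userMetadata", "ingest-run-42")
       .saveAsTable("my_table"))

    # Session-wide alternative via Spark conf:
    spark.conf.set("spark.databricks.delta.commitInfo.userMetadata", "ingest-run-42")
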
GS2312
by New Contributor II
  • 4609 Views
  • 6 replies
  • 5 kudos

KeyProviderException when trying to create an external table on Databricks

Hi there, I have been trying to create an external table on Azure Databricks with the statement below:

    df.write
      .partitionBy("year", "month", "day")
      .format('org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat')
      .option("path", sourcepath)
      .mod...

Latest Reply
Anonymous
Not applicable

Hi @Gaurishankar Sakhare, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

5 More Replies
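
The thread does not show the final fix, but KeyProviderException typically points to missing ADLS credentials on the cluster. A minimal sketch of wiring an account key from a secret scope; the storage account, scope, and key names are placeholders.

    spark.conf.set(
        "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
        dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
    )
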
Erik
by Valued Contributor II
  • 1470 Views
  • 2 replies
  • 2 kudos

Create python modules for both repos and workspace

We are using the "databricks_notebook" Terraform resource to deploy our notebooks into the "Workspace" as part of our CI/CD run, and our jobs run notebooks from the workspace. For development we clone the repo into "Repos". At the moment the only modu...

Latest Reply
RobiTakToRobi
New Contributor II

You can create your own Python package and host it in Azure Artifacts. https://learn.microsoft.com/en-us/azure/devops/artifacts/quickstarts/python-packages?view=azure-devops

1 More Replies
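
A minimal sketch of consuming such a package from a notebook, assuming it has been published to an Azure Artifacts feed; the feed URL and package name are placeholders.

    # %pip install my-shared-module --index-url https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/
    import my_shared_module
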
zeta_load
by New Contributor II
  • 1894 Views
  • 3 replies
  • 2 kudos

Resolved! Z-ordering a df using Python

Is there a way to perform Z-ordering using Python? With SQL you should be able to use:

    %sql
    OPTIMIZE df ZORDER BY (column)

However, I get the error "Table or view 'df' not found in database 'default'" and, since I'm not really using SQL, I would lik...

Latest Reply
Anonymous
Not applicable

Hi @Lukas Goldschmied, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

2 More Replies
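
A minimal sketch of both Python routes, assuming the data is saved as a Delta table first (OPTIMIZE targets a table, not a DataFrame); the table and column names are illustrative.

    from delta.tables import DeltaTable

    # Route 1: issue the SQL statement from Python
    spark.sql("OPTIMIZE my_table ZORDER BY (column)")

    # Route 2: the Python API (Delta Lake 2.0+)
    DeltaTable.forName(spark, "my_table").optimize().executeZOrderBy("column")
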
az38
by New Contributor II
  • 5518 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load into a df only the files modified after some point in time, from Azure ADLS (actually, the filesystem doesn't matter)? Is there any function like input_file_name() but for last_modified, to use it in a w...

Latest Reply
venkatcrc
New Contributor III

The _metadata column will provide the file modification timestamp. I tried it on DBFS but am not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

1 More Replies
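
A minimal sketch of the _metadata approach from the reply; the path, file format, and cutoff timestamp are illustrative.

    from pyspark.sql import functions as F

    df = (spark.read.format("parquet")
          .load("abfss://container@account.dfs.core.windows.net/data/")
          .select("*", "_metadata")  # hidden column, must be selected explicitly
          .where(F.col("_metadata.file_modification_time") > F.lit("2023-01-01 00:00:00")))
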
drewtoby
by New Contributor II
  • 8801 Views
  • 2 replies
  • 1 kudos

Resolved! How to Pull Cached SQL Table into Python Dictionary?

Hello, I have been working on this issue as a proof of concept; it would be extremely helpful to iterate through tables via loops in a few scenarios. I have a simple three-column dimension that I added to a cached table:

    cache lazy table hedis_cache s...

Latest Reply
drewtoby
New Contributor II

Got it to work, thank you for the tip! I needed to convert the dataframe over to a pandas dataframe. https://www.geeksforgeeks.org/convert-pyspark-dataframe-to-dictionary-in-python/

1 More Replies
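
A minimal sketch of the accepted approach: read the cached table, convert it to pandas, then build a dictionary; the column names are hypothetical.

    pdf = spark.table("hedis_cache").toPandas()
    lookup = dict(zip(pdf["measure_key"], pdf["measure_name"]))
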
fijoy
by Contributor
  • 6962 Views
  • 1 reply
  • 2 kudos

Resolved! Using widget values in a shell script cell

I have a Databricks notebook containing a mix of SQL, Python, and shell script cells. I know I can retrieve and use the values of widgets in Python cells using dbutils.widgets.get('key') and in SQL cells using ${key}. How can I use widget values in shell ...

Latest Reply
fijoy
Contributor

For those interested, I found and am for now using this workaround: https://stackoverflow.com/questions/54662605/how-to-pass-a-python-variables-to-shell-script-in-azure-databricks-notebookbles while I wait for a more direct method.

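
A minimal sketch of the environment-variable workaround from the linked answer: export the widget value in a Python cell, then read it in a shell cell (assumption: variables exported on the driver are visible to %sh).

    # Python cell
    import os
    os.environ["MY_KEY"] = dbutils.widgets.get("key")

    # %sh cell
    # echo "$MY_KEY"
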
AnuVat
by New Contributor III
  • 30101 Views
  • 7 replies
  • 13 kudos

Resolved! How to read data from a table into a dataframe outside of Databricks environment?

Hi, I am working on an ML project and I need to access the data in tables hosted in my Databricks cluster through a notebook that I am running locally. This has been very easy while I run the notebooks in Databricks, but I cannot figure out how to do ...

Latest Reply
chakri
New Contributor III

We can use APIs and pyodbc to achieve this. Going through the official Databricks documentation might be helpful for accessing data from outside the Databricks environment.

6 More Replies
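
A minimal sketch of one API-based route the reply alludes to, using the databricks-sql-connector package; the hostname, HTTP path, token, and table name are placeholders.

    # %pip install databricks-sql-connector
    from databricks import sql

    with sql.connect(server_hostname="adb-1234567890123456.7.azuredatabricks.net",
                     http_path="/sql/1.0/warehouses/abc123",
                     access_token="dapi-...") as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT * FROM my_schema.my_table LIMIT 10")
            rows = cursor.fetchall()
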
Data_Analytics1
by Contributor III
  • 12274 Views
  • 17 replies
  • 24 kudos

Fatal error: The Python kernel is unresponsive.

I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes. Here the Python process exited with an unknown exit code. The last 10 KB of the process's...

Latest Reply
luis_herrera
Contributor

Hey, it seems that the issue is related to the driver undergoing a memory bottleneck, which causes it to crash with an out-of-memory (OOM) condition and get restarted or become unresponsive due to frequent full garbage collection. The reason for th...

16 More Replies
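
One mitigation consistent with the reply's diagnosis is to cap how many notebooks run on the driver at once. A minimal sketch; the worker count and notebook paths are illustrative.

    from concurrent.futures import ThreadPoolExecutor

    def run_notebook(path):
        # runs the child notebook on the same cluster
        return dbutils.notebook.run(path, 3600)

    paths = [f"/Jobs/task_{i}" for i in range(8)]
    # fewer workers than tasks, to ease pressure on driver memory
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_notebook, paths))
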
scalasparkdev
by New Contributor
  • 2207 Views
  • 2 replies
  • 0 kudos

PySpark Structured Streaming Avro integration with Azure Schema Registry and Kafka/Event Hubs in a Databricks environment

I am looking for a simple way to have a Structured Streaming pipeline that would automatically register a schema to Azure Schema Registry when converting a df column into Avro, and that would be able to deserialize an Avro column based on a schema registry ur...

Latest Reply
Anonymous
Not applicable

Hi @Tomas Sedlon, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

1 More Replies
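
For the deserialization half, a minimal sketch using Spark's built-in from_avro; note it does not talk to Azure Schema Registry (that needs a separate client to fetch the schema), and the schema JSON, broker, and topic are illustrative.

    from pyspark.sql.avro.functions import from_avro

    avro_schema = '{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}'

    decoded = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
               # Event Hubs' Kafka endpoint also needs SASL auth options, omitted here
               .option("subscribe", "events")
               .load()
               .select(from_avro("value", avro_schema).alias("event")))
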
chandra_ym
by New Contributor II
  • 3938 Views
  • 7 replies
  • 2 kudos

Resolved! Recommended course?

Hello, I am new here. Any recommended courses for Databricks Certified Associate Developer for Apache Spark 3.0 - Python? Thank you

Latest Reply
fabio2352
Contributor

Hi, this post has a practice exam: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DCADAS3-Python.pdf?_gl=1*1kqf0to*_gcl_aw*R0NMLjE2ODI0NDkyOTcuRUFJYUlRb2JDaE1JNWFTZ2d0ekZfZ0lWSkc1dkJCMVQ2UTJNRUFBWUFpQUFFZ0pOc3ZEX0J3RQ.

6 More Replies
Abel_Martinez
by Contributor
  • 14492 Views
  • 10 replies
  • 38 kudos

Why do Python logs show the [REDACTED] literal in place of spaces when I use dbutils.secrets.get in my code?

When I use dbutils.secrets.get in my code, spaces in the log are replaced by the "[REDACTED]" literal. This is very annoying and makes reading the log difficult. Any idea how to avoid this? See my screenshot...

Latest Reply
jlb0001
New Contributor III

I ran into the same issue and found that the reason was that the notebook included some test keys with values of "A" and "B" for simple testing. I noticed that any string with a substring of "A" or "B" was "[REDACTED]". So, in my case, it was an eas...

9 More Replies
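
An illustration of the behavior described above, with a hypothetical scope and key: once a secret's value is the single character "A", every "A" in notebook output gets masked.

    val = dbutils.secrets.get(scope="test", key="letter")  # suppose the value is "A"
    print(val)        # prints: [REDACTED]
    print("BANANA")   # prints: B[REDACTED]N[REDACTED]N[REDACTED]
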
AyushModi038
by New Contributor III
  • 11850 Views
  • 2 replies
  • 1 kudos

Resolved! Upgrade Python version in cluster

Currently I am using the following cluster. It is using the default Python version of 3.9.5 and I would like to update it to 3.10.1.0. How to achieve this?

Latest Reply
Anonymous
Not applicable

Hi @Ayush Modi, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

1 More Replies
Snowhow1
by New Contributor II
  • 8265 Views
  • 1 reply
  • 1 kudos

Logging when using multiprocessing with joblib

Hi, I'm using joblib for multiprocessing in one of our processes. The logging does work well (except for weird py4j errors, which I suppress), except when it's within multiprocessing. Also, how do I suppress the other errors that I always receive on DB - perha...

Latest Reply
Anonymous
Not applicable

@Sam G: It seems like the issue is related to the py4j library used by Spark, and not specifically related to joblib or multiprocessing. The error message indicates a network error while sending a command between the Python process and the Java Virt...

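
For the suppression half of the question, a minimal sketch that quiets the noisy py4j logger mentioned in the reply; the level choice is illustrative.

    import logging
    logging.getLogger("py4j").setLevel(logging.ERROR)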