Hello, I cloned a repo my_repo into the Databricks Repos space. Inside my_repo, I created a notebook new_experiment where I can import functions from my_repo, which is really handy. When I want to modify a function in my_repo, I open my local IDE, do the...
Hi All, could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook, but rather create Python files (a modular project) in VSCode and call only the primary function in the notebook (the res...
Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...
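For illustration, a minimal sketch of that pattern, assuming a repo my_repo containing a module src/main.py with a primary run() function (all names here are hypothetical):

# notebook cell inside my_repo; the repo root is already on sys.path for notebooks in Repos
from src.main import run

# the notebook only calls the primary function; all logic lives in the VSCode project
result_df = run(spark)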
When I use the following code:

df
  .coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/path/mydata.csv")

it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...
Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing csv/parquet to blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...
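For anyone landing here, a hedged sketch of one common workaround: write to a temporary directory with coalesce(1), then move the single part file to the desired name (the paths below are hypothetical):

tmp_dir = "/mnt/cntnr/tmp_csv_out"          # hypothetical scratch directory
final_path = "/mnt/cntnr/demo/mydata.csv"   # hypothetical target file

(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", "true")
   .csv(tmp_dir))

# Spark always writes a directory of part files; pick out the single part file and rename it
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, final_path)
dbutils.fs.rm(tmp_dir, True)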
Has anyone found a nice way to run code formatting (like black) on the notebooks **in the workspace**? My current workflow is to commit the file, pull it locally, format, re-push and pull. It would be nice if there was some relatively easy way to run blac...
Hi Erik, I don't know if you are aware of this feature: there is currently an option to format the code in your Databricks notebooks using the black code style formatter. You just need to have a DBR version equal to or greater than 11.2...
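For completeness: on runtimes below 11.2, the formatter's dependencies reportedly need installing first; the pinned versions below are an assumption taken from the docs, not verified here:

%pip install black==22.3.0 tokenize-rt==4.2.1

After that, the Format cell(s) action in the notebook UI should pick them up.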
Hello,

forecast_date = '2017-12-01'
spark.conf.set('spark.sql.shuffle.partitions', 500)

# generate forecast for this data
forecasts = (
  history
    .where(history.date < forecast_date)  # limit training data to prior to our forecast date
    .groupBy...
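The truncated groupBy call in posts like this typically feeds into applyInPandas for per-group forecasting; a hedged sketch of that shape, with a hypothetical forecast_store_item function and result schema:

# hypothetical pandas function applied independently to each store/item group
def forecast_store_item(pdf):
    # fit a model on this group's history and return its forecast as a pandas DataFrame
    return pdf  # placeholder for the real forecasting logic

forecasts = (
  history
    .where(history.date < forecast_date)
    .groupBy('store', 'item')
    .applyInPandas(forecast_store_item,
                   schema='store INT, item INT, date DATE, forecast DOUBLE')
)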
@Mr_K applyInPandas is a higher-order function in Python. As of now, we do not support higher-order functions in Unity Catalog. We do support direct calls made to Python UDFs. Here is an example of how to reference UDFs in UC - https://docs.databrick...
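For reference, a minimal sketch of a Python UDF registered directly in Unity Catalog and then called from SQL (the catalog, schema, and function names are hypothetical):

spark.sql("""
CREATE OR REPLACE FUNCTION main.default.double_it(x INT)
RETURNS INT
LANGUAGE PYTHON
AS $$
  return x * 2
$$
""")

spark.sql("SELECT main.default.double_it(21)").show()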
I need help understanding why I can't open a file. In a Databricks notebook, I use this code:

%fs
ls /mnt/cntnr/demo

I get back dbfs:/mnt/cntnr/demo/circuits.csv as one of the path values. When I use this code, I get an error:

circuits_df = spark.read....
It turns out my Spark config was wrong:

# Set Spark configuration
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azu...
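For comparison, the standard ADLS Gen2 OAuth mount configuration looks roughly like this (application id, secret scope/key, tenant, container, and account values are placeholders):

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<application-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

dbutils.fs.mount(
  source = "abfss://<container>@<account>.dfs.core.windows.net/",
  mount_point = "/mnt/cntnr",
  extra_configs = configs)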
Hello, I'm trying to connect to our Databricks instance using the VSCode extension. However, when following this guide we cannot get the configuration to proceed past the point where it asks for our instance URL. The prompt appears to expect a URL of t...
Hello, yes, the Databricks team shared a modified version of the VS Code plugin which did not include the URL-matching logic. It connects successfully. However, our custom URL is as it is because our organisation is hosting its own instance of Databri...
I have a scheduled job that is executed using a notebook. Within one of the notebook cells, there is a check to determine if a table exists. However, even when the table does exist, it incorrectly identifies it as non-existent and proceeds to execut...
Hi @Mahesh Chahare, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.
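For what it's worth, a hedged sketch of an existence check that avoids parsing SHOW TABLES output (the table name is hypothetical; spark.catalog.tableExists requires Spark 3.3+):

table_name = "my_db.my_table"  # hypothetical

if spark.catalog.tableExists(table_name):
    print(f"{table_name} exists, skipping creation")
else:
    print(f"{table_name} not found, creating it")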
I have the following code:

from pyspark.sql.functions import *
!pip install dbl-tempo
from tempo import TSDF

# interpolate target_cols column linearly for tsdf dataframe
def interpolate_tsdf(tsdf_data, target_c...
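For context, a hedged sketch of how dbl-tempo interpolation is typically invoked (the column names and exact keyword arguments are assumptions based on the tempo README, not verified here):

tsdf = TSDF(raw_df, ts_col="event_ts", partition_cols=["device_id"])  # hypothetical columns

# resample to a fixed frequency, aggregate, and fill gaps linearly
interpolated = tsdf.interpolate(
    freq="1 minute",
    func="mean",
    method="linear",
    target_cols=["temperature"])  # hypothetical target column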
Hi, below I am trying to read data from Kafka, determine whether it's fraud or not, and then write it back to MongoDB. Below is my code, read_kafka.py:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types i...
Hi Saswata,

Can you remove the filter and see if it is printing output to the console?

kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD")

Thanks and regards,
Swetha Nandajan
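To debug upstream of the filter, a minimal structured-streaming sketch that prints the raw stream to the console (bootstrap servers and topic are placeholders):

kafka_raw = (spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "<host:port>")
  .option("subscribe", "<topic>")
  .option("startingOffsets", "latest")
  .load())

# cast the binary Kafka value to a string and print it, before any fraud filtering
query = (kafka_raw
  .selectExpr("CAST(value AS STRING) AS value")
  .writeStream
  .format("console")
  .outputMode("append")
  .start())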
Hi, I tried the following code, but it seems like the cluster runs for a long period of time and then stops without any results. My code is attached below (I used the 'com.springml.spark.sftp' library and installed it via Maven). Also, I whitelisted my lo...
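For context, a hedged sketch of how the spark-sftp reader is usually wired up (host, credentials, and file path are placeholders; the option names follow the springml README):

sftp_df = (spark.read
  .format("com.springml.spark.sftp")
  .option("host", "<sftp-host>")
  .option("username", "<user>")
  .option("password", "<password>")
  .option("fileType", "csv")
  .option("header", "true")
  .load("/remote/path/data.csv"))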
I am using the below code to create and read widgets, assigning a default value:

dbutils.widgets.text("pname", "default", "parameter_name")
pname = dbutils.widgets.get("pname")

I am using this widget parameter in some SQL scripts. One example is given below...
Hi @Ashwathy P P, which Databricks Runtime are you using? A known issue is that widget state may not be adequately cleared after pressing Run All, even after clearing or removing the widget in the code. If this happens, you will see a discrepancy be...
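If stale widget state is the culprit, one hedged workaround is to remove and recreate the widget at the top of the run:

# clear any lingering widget state before redefining
dbutils.widgets.removeAll()

dbutils.widgets.text("pname", "default", "parameter_name")
pname = dbutils.widgets.get("pname")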
When I follow the instructions in Modularize your code using files, I get the following error: I am on Azure, use DBR 12.2 LTS, and use ADLS as storage; I am happy to provide more details if needed. My research suggests that the reason is that the dbfs fuse...
I am trying to use the following code to create a Delta table:

%sql
CREATE TABLE rectangles(a INT, b INT, area INT GENERATED ALWAYS AS IDENTITY (START WITH 1, STEP BY 1))

I don't know why, but I am always getting a ParseException. I tried all other...
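The likely cause of the ParseException: the identity clause is spelled INCREMENT BY, not STEP BY (with no comma between the clauses), and Databricks identity columns must be BIGINT. A hedged sketch of the corrected statement, run via spark.sql to stay in Python:

spark.sql("""
CREATE TABLE rectangles (
  a INT,
  b INT,
  area BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)
)
""")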