Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

William_Scardua
by Valued Contributor
  • 2400 Views
  • 1 replies
  • 3 kudos

How to use Pylint to check your PySpark code quality?

Hi guys, I would like to use Pylint to check my PySpark scripts. Do you do that? Thank you.

Latest Reply
developer_lumo
New Contributor II
  • 3 kudos

I am currently working in Databricks notebooks and have the same issue: I am unable to find a linter that is well integrated with Python, PySpark, and Databricks notebooks.

  • 3 kudos
kjoth
by Contributor II
  • 17897 Views
  • 9 replies
  • 7 kudos

How to make the job fail via code after handling exception

Hi, we capture the exception with try/except when an error occurs, but we want the job status to be Failed once we get the exception. What's the best way to do that? We are using PySpark.

Latest Reply
kumar_ravi
New Contributor III
  • 7 kudos

You can use a workaround:

    dbutils = get_dbutils(spark)
    tables_with_exceptions = []
    for table_config in table_configs:
        try:
            process(spark, table_config)
        except Exception as e:
            exception_detail = f"Error p...
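The general pattern behind this kind of workaround can be sketched as follows: handle each exception so the remaining items still run, remember what failed, and raise once at the end so the run finishes with an uncaught error. This is only a minimal sketch; `run_all` and its parameters are hypothetical names, not part of the reply above or of any Databricks API.

```python
from typing import Callable, Iterable, List


def run_all(items: Iterable[str], process: Callable[[str], None]) -> None:
    """Process every item, then raise if any of them failed."""
    failures: List[str] = []
    for item in items:
        try:
            process(item)
        except Exception as exc:
            # Handle the error here (log it, alert, ...) but keep a record of it.
            failures.append(f"{item}: {exc!r}")

    if failures:
        # Raising after the loop lets every item run first; the uncaught
        # exception is what is expected to mark the run as failed.
        raise RuntimeError("Failed items: " + "; ".join(failures))
```

In a notebook task this would be called as `run_all(table_names, my_process_function)`, where both arguments are placeholders for your own list and function.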

  • 7 kudos
8 More Replies
pgagliardi
by New Contributor II
  • 1898 Views
  • 1 replies
  • 2 kudos

Latest pushed code is not taken into account by Notebook

Hello, I cloned a repo my_repo into the Databricks Repos space. Inside my_repo, I created a notebook new_experiment where I can import functions from my_repo, which is really handy. When I want to modify a function in my_repo, I open my local IDE, do the...

Latest Reply
Jnguyen
Databricks Employee
  • 2 kudos

Use %reload_ext autoreload instead; it gives the behavior you expect. You only need to run it once, like this:

    %load_ext autoreload
    %autoreload 2
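For context, the same effect can be had by hand in plain Python with `importlib.reload`. The sketch below is a minimal illustration, assuming a throwaway module named `my_repo_helpers` (a hypothetical stand-in for code imported from my_repo) written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so a file rewritten within the same second is re-read.
sys.dont_write_bytecode = True

# Hypothetical module standing in for a file inside the repo.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "my_repo_helpers.py"
module_file.write_text("def greet():\n    return 'old'\n")

sys.path.insert(0, str(module_dir))
my_repo_helpers = importlib.import_module("my_repo_helpers")
before = my_repo_helpers.greet()

# Simulate new code arriving in the repo.
module_file.write_text("def greet():\n    return 'new version'\n")

# Without this call the already-imported module keeps the old definition.
importlib.reload(my_repo_helpers)
after = my_repo_helpers.greet()
```

With `%autoreload 2` active, IPython performs this kind of reload automatically before running each cell, so no explicit `importlib.reload` call is needed.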

  • 2 kudos
Mr__D
by New Contributor II
  • 16021 Views
  • 7 replies
  • 1 kudos

Resolved! Writing modular code in Databricks

Hi all, could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook, but rather create Python files (a modular project) in VS Code and call only the primary function in the notebook (the res...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...

  • 1 kudos
6 More Replies
Danielsg94
by New Contributor II
  • 33438 Views
  • 5 replies
  • 1 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code:

    (df
     .coalesce(1)
     .write.format("com.databricks.spark.csv")
     .option("header", "true")
     .save("/path/mydata.csv"))

it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...

Latest Reply
Simha
New Contributor II
  • 1 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing CSV/Parquet to Blob/ADLS: it creates a separate folder with the file name and creates a partition file within that folder. I need to write just a file on to the b...
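One common way around the folder-of-part-files behavior is to let Spark write to a temporary directory and then move the single part file to its final name. The sketch below is an illustration only, not the fix from this thread: it assumes the DataFrame was written with coalesce(1) and that the storage is reachable as a local filesystem path. `promote_single_part_file` and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path


def promote_single_part_file(spark_output_dir: str, target_file: str) -> Path:
    """Move the only part-* file out of a Spark output directory.

    Other files already present next to target_file are left untouched,
    because the Spark write itself went to a separate temporary directory.
    """
    out_dir = Path(spark_output_dir)
    # Marker files such as _SUCCESS and hidden .crc files do not match part-*.
    part_files = sorted(out_dir.glob("part-*"))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file, found {len(part_files)}"
        )

    target = Path(target_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(part_files[0]), str(target))

    # Remove the temporary directory and its leftover marker files.
    shutil.rmtree(out_dir)
    return target
```

The write step would then target the temporary directory instead of the final folder, followed by a call such as `promote_single_part_file(tmp_dir, final_path)`, where both paths are placeholders.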

  • 1 kudos
4 More Replies
Erik
by Valued Contributor III
  • 10971 Views
  • 4 replies
  • 3 kudos

Resolved! How to run code formatting on the notebooks

Has anyone found a nice way to run code formatting (like black) on the notebooks **in the workspace**? My current workflow is to commit the file, pull it locally, format, repush and pull. It would be nice if there were some relatively easy way to run blac...

Latest Reply
MartinPlay01
New Contributor II
  • 3 kudos

Hi Erik, I don't know if you are aware of this feature: there is currently an option to format the code in your Databricks notebooks using the black code formatter. You just need to either have a DBR version equal to or greater than 11.2 ...

  • 3 kudos
3 More Replies
Prank
by New Contributor III
  • 6378 Views
  • 10 replies
  • 7 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

@Prank  why do you want the browser hostname?

  • 7 kudos
9 More Replies
Mr_K
by New Contributor
  • 9479 Views
  • 2 replies
  • 2 kudos

AnalysisException: [UC_COMMAND_NOT_SUPPORTED] Spark higher-order functions are not supported in Unity Catalog.;

Hello,

    forecast_date = '2017-12-01'
    spark.conf.set('spark.sql.shuffle.partitions', 500)

    # generate forecast for this data
    forecasts = (
        history
        .where(history.date < forecast_date)  # limit training data to prior to our forecast date
        .groupBy...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

@Mr_K applyInPandas is a higher-order function in Python. As of now, we do not support higher-order functions in Unity Catalog. We do support direct calls to Python UDFs. Here is an example of how to reference UDFs in UC - https://docs.databrick...

  • 2 kudos
1 More Replies
jch
by New Contributor III
  • 7501 Views
  • 4 replies
  • 5 kudos

Resolved! Why does spark.read.csv come back with an error: com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/cntnr/demo/circuits.csv ?

I need help understanding why I can't open a file. In a Databricks notebook, I use this code:

    %fs ls /mnt/cntnr/demo

I get back dbfs:/mnt/cntnr/demo/circuits.csv as one of the path values. When I use this code, I get an error:

    circuits_df = spark.read....

Latest Reply
jch
New Contributor III
  • 5 kudos

It turns out my Spark config was wrong:

    # Set Spark configuration
    configs = {"fs.azure.account.auth.type": "OAuth",
               "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
               "fs.azu...

  • 5 kudos
3 More Replies
PrawnballNightm
by New Contributor III
  • 5704 Views
  • 4 replies
  • 0 kudos

Resolved! Cannot configure VS Code Databricks extension with a non-standard Databricks URL: not a databricks host.

Hello, I'm trying to connect to our Databricks instance using the VS Code extension. However, when following this guide we cannot get the configuration to proceed past the point where it asks for our instance URL. The prompt appears to expect a URL of t...

Latest Reply
PrawnballNightm
New Contributor III
  • 0 kudos

Hello, yes, the Databricks team shared a modified version of the VS Code plugin which did not include the URL matching logic. It connects successfully. However, our custom URL is as it is because our organisation is hosting its own instance of Databri...

  • 0 kudos
3 More Replies
Data_Analytics1
by Contributor III
  • 2308 Views
  • 1 replies
  • 0 kudos

Getting JsonParseException: Unexpected character ('<' (code 60))

I have a scheduled job that is executed using a notebook. Within one of the notebook cells, there is a check to determine if a table exists. However, even when the table does exist, it incorrectly identifies it as non-existent and proceeds to execut...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Mahesh Chahare, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 0 kudos
Eelke
by New Contributor II
  • 6600 Views
  • 3 replies
  • 0 kudos

I want to perform interpolation on a streaming table in delta live tables.

I have the following code:

    from pyspark.sql.functions import *
    !pip install dbl-tempo
    from tempo import TSDF

    from pyspark.sql.functions import *

    # interpolate target_cols column linearly for tsdf dataframe
    def interpolate_tsdf(tsdf_data, target_c...

Latest Reply
Eelke
New Contributor II
  • 0 kudos

The issue was not resolved because we were trying to use a streaming table within TSDF which does not work.

  • 0 kudos
2 More Replies
Sas
by New Contributor II
  • 1530 Views
  • 1 replies
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether it is fraud or not, and then write it back to MongoDB. Below is my code, read_kafka.py:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import *
    from pyspark.sql.types i...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi Saswata, can you remove the filter and see if it prints output to the console?

    kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD")

Thanks and regards,
Swetha Nandajan

  • 0 kudos
eyalo
by New Contributor II
  • 4875 Views
  • 6 replies
  • 0 kudos

Why doesn't the SFTP ingest work?

Hi, I wrote the following code, but the cluster runs for a long time and then stops without any results. My code is attached. (I used the 'com.springml.spark.sftp' library and installed it via Maven.) Also I whitelisted my lo...

Latest Reply
eyalo
New Contributor II
  • 0 kudos

@Debayan Mukherjee Hi, I don't know if you got my reply, so I am sending my message to you again. Thanks.

  • 0 kudos
5 More Replies
Ashwathy
by New Contributor II
  • 6820 Views
  • 2 replies
  • 2 kudos

Facing an issue while using widget values in a SQL script

I am using the code below to create and read widgets, assigning a default value:

    dbutils.widgets.text("pname", "default", "parameter_name")
    pname = dbutils.widgets.get("pname")

I am using this widget parameter in some SQL scripts. One example is given below...

1 More Replies