Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tanjil
by New Contributor III
  • 3022 Views
  • 4 replies
  • 2 kudos

print(flush = True) not working

Hello, I have the following minimal working example using multiprocessing:

from multiprocessing import Pool

files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    print('Hello from child process', flush = Tr...

Latest Reply
tanjil
New Contributor III
  • 2 kudos

No errors are generated and the code executes successfully, but the print statement for "Hello from child process" produces no output.

3 More Replies
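A sketch of one common workaround (not from the thread itself): stdout from forked worker processes is often not wired back to the notebook cell, so build the messages in the children and print them from the parent process instead. The file list and function name mirror the post.

```python
from multiprocessing import Pool

files_list = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    # Build the message in the child, but let the parent do the printing.
    return f'Hello from child process {t}'

if __name__ == '__main__':
    with Pool() as pool:
        for message in pool.map(f, files_list):
            print(message, flush=True)  # runs in the parent, so it shows in the cell
```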
JK2021
by New Contributor III
  • 4346 Views
  • 6 replies
  • 3 kudos

Resolved! Exception handling in Databricks

We are planning to customise code on Databricks to call the Salesforce Bulk API 2.0 to load data from a Databricks Delta table into Salesforce. My question is: all the exception handling, retries and everything around the Bulk API can be coded explicitly in Databricks...

Latest Reply
Rolx
New Contributor II
  • 3 kudos

Is the Bulk API working as expected for loading data?

5 More Replies
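Since the thread is weighing explicit retry handling, here is a hedged sketch of a retry loop with exponential backoff around a bulk load call. The URL, token handling, and payload shape are placeholders, not the real Bulk API 2.0 contract.

```python
import time
import requests  # assumes the REST-based Bulk API 2.0 is called over HTTP

def load_with_retries(url, token, payload, max_retries=3):
    """POST one batch, retrying with exponential backoff on transient errors."""
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.post(
                url, json=payload,
                headers={"Authorization": f"Bearer {token}"},
                timeout=60,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == max_retries:
                raise  # surface the failure so the job run can fail
            time.sleep(2 ** attempt)  # back off before retrying
```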
William_Scardua
by Valued Contributor
  • 2670 Views
  • 1 reply
  • 3 kudos

How to use Pylint to check your PySpark code quality?

Hi guys, I would like to use Pylint to check my PySpark scripts. Do you do that? Thank you!

Latest Reply
developer_lumo
New Contributor II
  • 3 kudos

Currently I am working in Databricks notebooks and have the same issue: I have been unable to find a linter that is well integrated with Python, PySpark, and Databricks notebooks.

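One way to apply Pylint here, sketched under the assumption that the notebook logic is factored into plain .py modules (for example under src/ in a Repo); notebooks themselves are harder to lint directly.

```python
import subprocess

# Run Pylint over the package; --disable=no-member is just an example
# starting point, since PySpark's dynamic attributes trigger false positives.
result = subprocess.run(
    ["pylint", "src/", "--disable=no-member"],
    capture_output=True, text=True,
)
print(result.stdout)
```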
kjoth
by Contributor II
  • 20658 Views
  • 9 replies
  • 7 kudos

How to make the job fail via code after handling an exception

Hi, we are capturing exceptions with try/except when an error occurs, but we want the job status to be marked as failed once we catch the exception. What's the best way to do that? We are using PySpark.

Latest Reply
kumar_ravi
New Contributor III
  • 7 kudos

You can do a workaround:

dbutils = get_dbutils(spark)
tables_with_exceptions = []
for table_config in table_configs:
    try:
        process(spark, table_config)
    except Exception as e:
        exception_detail = f"Error p...

8 More Replies
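The pattern in that reply, reconstructed as a sketch: swallow per-table errors so the loop finishes, then re-raise at the end, because an unhandled exception is what marks the Databricks job run as failed. process and table_configs are placeholders from the thread.

```python
tables_with_exceptions = []

for table_config in table_configs:      # table_configs: placeholder from the thread
    try:
        process(spark, table_config)    # process(): placeholder per-table work
    except Exception as exc:
        tables_with_exceptions.append((table_config, exc))  # record, keep going

if tables_with_exceptions:
    # An unhandled exception is what flips the job run status to Failed.
    raise RuntimeError(f"{len(tables_with_exceptions)} table(s) failed: {tables_with_exceptions}")
```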
pgagliardi
by New Contributor II
  • 2087 Views
  • 1 reply
  • 2 kudos

Latest pushed code is not taken into account by Notebook

Hello, I cloned a repo my_repo into the Databricks Repos space. Inside my_repo, I created a notebook new_experiment where I can import functions from my_repo, which is really handy. When I want to modify a function in my_repo, I open my local IDE, do the...

Latest Reply
Jnguyen
Databricks Employee
  • 2 kudos

Use %reload_ext autoreload instead; it will give you the behavior you expect. You just need to run it once, like:

%load_ext autoreload
%autoreload 2

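Spelled out as a notebook cell, with hypothetical module and function names; the magics only need to run once per attached session.

```python
# In a notebook cell: enable IPython autoreload so pushed edits to the Repo
# are picked up on the next cell run without re-importing by hand.
%load_ext autoreload
%autoreload 2

from my_repo.utils import my_function  # hypothetical module and function
my_function()  # re-running this cell now uses the latest pushed code
```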
Mr__D
by New Contributor II
  • 26545 Views
  • 7 replies
  • 1 kudos

Resolved! Writing modular code in Databricks

Hi All, could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook but rather create Python files (a modular project) in VSCode and call only the primary function in the notebook (the res...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...

6 More Replies
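A minimal sketch of that layout, with hypothetical file, path, and function names: the PySpark logic lives in a .py module edited in VSCode, and the notebook stays a thin entry point.

```python
# src/etl.py -- modular PySpark code maintained in VSCode
from pyspark.sql import DataFrame, SparkSession

def run_pipeline(spark: SparkSession, source_path: str) -> DataFrame:
    """Primary entry point called from the notebook."""
    df = spark.read.format("delta").load(source_path)
    return df.dropDuplicates()

# In the Databricks notebook, only the primary function is invoked:
#   from src.etl import run_pipeline
#   result = run_pipeline(spark, "/mnt/raw/events")   # hypothetical path
```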
Danielsg94
by New Contributor II
  • 35054 Views
  • 5 replies
  • 1 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code:

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/path/mydata.csv")

it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...

Latest Reply
Simha
New Contributor II
  • 1 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing csv/parquet to blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...

4 More Replies
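For readers hitting the same thing: Spark always writes a directory of part files, so one common workaround (a sketch with hypothetical paths, not the accepted answer verbatim) is to write to a scratch folder and copy the single part file to the exact destination.

```python
tmp_dir = "/mnt/cntnr/tmp/mydata"          # scratch directory (hypothetical)
target = "/mnt/cntnr/demo/mydata.csv"      # desired single-file path (hypothetical)

(df.coalesce(1)                            # one partition -> one part file
   .write.format("csv")
   .option("header", "true")
   .mode("overwrite")
   .save(tmp_dir))

part = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, target)                # place just the CSV next to other data
dbutils.fs.rm(tmp_dir, recurse=True)       # clean up the scratch directory
```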
Erik
by Valued Contributor III
  • 12529 Views
  • 4 replies
  • 3 kudos

Resolved! How to run code-formatting on the notebooks

Has anyone found a nice way to run code formatting (like black) on the notebooks **in the workspace**? My current workflow is to commit the file, pull it locally, format, re-push and pull. It would be nice if there was some relatively easy way to run blac...

Latest Reply
MartinPlay01
New Contributor II
  • 3 kudos

Hi Erik, I don't know if you are aware of this feature: currently there is an option to format the code in your Databricks notebooks using the black code style formatter. You just need to either have a version of your DBR equal to or greater than 11.2 ...

3 More Replies
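If your DBR is older than 11.2, a sketch of formatting source with black's Python API instead (black must be pip-installed; the snippet is illustrative):

```python
import black

ugly = "def f( x ):\n    return x+1\n"
pretty = black.format_str(ugly, mode=black.Mode())  # apply the black code style
print(pretty)
# def f(x):
#     return x + 1
```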
Prank
by New Contributor III
  • 6949 Views
  • 10 replies
  • 7 kudos
Latest Reply
BilalAslamDbrx
Databricks Employee
  • 7 kudos

@Prank why do you want the browser hostname?

9 More Replies
Mr_K
by New Contributor
  • 9998 Views
  • 2 replies
  • 2 kudos

AnalysisException: [UC_COMMAND_NOT_SUPPORTED] Spark higher-order functions are not supported in Unity Catalog.;

Hello,

forecast_date = '2017-12-01'
spark.conf.set('spark.sql.shuffle.partitions', 500)

# generate forecast for this data
forecasts = (
    history
    .where(history.date < forecast_date)  # limit training data to prior to our forecast date
    .groupBy...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

@Mr_K applyInPandas is a higher-order function in Python. As of now, we do not support higher-order functions in Unity Catalog. We do support direct calls made to Python UDFs. Here is an example of how to reference UDFs in UC - https://docs.databrick...

1 More Replies
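For context, a sketch of the applyInPandas shape the error message refers to, with hypothetical column names; history and forecast_date come from the post, and on Unity Catalog clusters of that era this call raised UC_COMMAND_NOT_SUPPORTED.

```python
import pandas as pd

def forecast_store(pdf: pd.DataFrame) -> pd.DataFrame:
    # placeholder per-group logic standing in for the real model fit
    pdf["prediction"] = pdf["sales"].mean()
    return pdf

forecasts = (
    history                                  # history: DataFrame from the post
    .where(history.date < forecast_date)
    .groupBy("store")
    .applyInPandas(                          # grouped-map Pandas function
        forecast_store,
        schema="store string, date date, sales double, prediction double",
    )
)
```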
jch
by New Contributor III
  • 7871 Views
  • 4 replies
  • 5 kudos

Resolved! Why does spark.read.csv come back with an error: com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/cntnr/demo/circuits.csv ?

I need help understanding why I can't open a file. In a Databricks notebook, I use this code:

%fs ls /mnt/cntnr/demo

I get back dbfs:/mnt/cntnr/demo/circuits.csv as one of the path values. When I use this code, I get an error:

circuits_df = spark.read....

Latest Reply
jch
New Contributor III
  • 5 kudos

It turns out my spark config was wrong:

# Set Spark configuration
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azu...

3 More Replies
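The reply is truncated, so here is a hedged sketch of the standard service-principal OAuth settings that config block usually spells out; all IDs, scope and secret names, and account names are placeholders.

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<secret-key>"),  # never hard-code
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<account>.dfs.core.windows.net/",
    mount_point="/mnt/cntnr",
    extra_configs=configs,
)
```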
PrawnballNightm
by New Contributor III
  • 7325 Views
  • 4 replies
  • 0 kudos

Resolved! Cannot configure the VS Code Databricks extension with a non-standard Databricks URL: not a Databricks host.

Hello, I'm trying to connect to our Databricks instance using the VS Code extension. However, when following this guide we cannot get the configuration to proceed past the point where it asks for our instance URL. The prompt appears to expect a URL of t...

Latest Reply
PrawnballNightm
New Contributor III
  • 0 kudos

Hello, yes, the Databricks team shared a modified version of the VS Code plugin which did not include the URL matching logic. It connects successfully. However, our custom URL is as it is because our organisation is hosting its own instance of Databri...

3 More Replies
Data_Analytics1
by Contributor III
  • 2577 Views
  • 1 reply
  • 0 kudos

Getting JsonParseException: Unexpected character ('<' (code 60))

I have a scheduled job that is executed using a notebook. Within one of the notebook cells, there is a check to determine if a table exists. However, even when the table does exist, it incorrectly identifies it as non-existent and proceeds to execut...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Mahesh Chahare, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.

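As a side note on the underlying check (a '<' at code 60 usually means an HTML error page came back where JSON was expected), a sketch of an existence test that stays inside the Spark catalog, with a hypothetical table name; tableExists is available on recent runtimes (Spark 3.3+).

```python
table = "my_schema.my_table"  # hypothetical

if spark.catalog.tableExists(table):   # no REST/JSON parsing involved
    df = spark.table(table)
else:
    print(f"{table} not found; running the create path")
```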
Eelke
by New Contributor II
  • 6927 Views
  • 3 replies
  • 0 kudos

I want to perform interpolation on a streaming table in Delta Live Tables.

I have the following code:

from pyspark.sql.functions import *
!pip install dbl-tempo
from tempo import TSDF

# interpolate target_cols column linearly for tsdf dataframe
def interpolate_tsdf(tsdf_data, target_c...

Latest Reply
Eelke
New Contributor II
  • 0 kudos

The issue was not resolved because we were trying to use a streaming table within TSDF which does not work.

2 More Replies
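Given that resolution, a sketch of the batch-only shape tempo expects (column names are hypothetical; install with %pip install dbl-tempo):

```python
from tempo import TSDF

# TSDF wraps a *batch* DataFrame; passing a streaming one is what broke here.
tsdf = TSDF(batch_df, ts_col="event_ts", partition_cols=["device_id"])

interpolated = tsdf.interpolate(freq="1 minute", func="mean", method="linear")
result_df = interpolated.df   # back to a plain Spark DataFrame
```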
Sas
by New Contributor II
  • 1707 Views
  • 1 reply
  • 0 kudos

A streaming job going into an infinite loop

Hi, below I am trying to read data from Kafka, determine whether it's fraud or not, and then write it back to MongoDB. Below is my code (read_kafka.py):

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types i...

Latest Reply
swethaNandan
Databricks Employee
  • 0 kudos

Hi Saswata, can you remove the filter and see if it is printing output to the console?

kafka_df5 = kafka_df4.filter(kafka_df4.status == "FRAUD")

Thanks and regards,
Swetha Nandajan

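In the same debugging spirit, a sketch of dumping the pre-filter stream to the console sink, so you can confirm records arrive before the FRAUD filter drops everything (kafka_df4 is the DataFrame from the post):

```python
query = (
    kafka_df4.writeStream          # inspect the stream *before* the filter
    .format("console")
    .option("truncate", "false")   # show full record contents
    .outputMode("append")
    .start()
)
query.awaitTermination()
```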