Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jose_gonzalez
by Databricks Employee
  • 2771 Views
  • 1 reply
  • 0 kudos

Resolved! How to troubleshoot Python version mismatch errors in DBConnect?

I'm getting some weird messages when trying to run DBConnect. I would like to know if there is a troubleshooting guide for solving Python version mismatch errors.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

We have a troubleshooting section in our docs that could help you solve this issue. Please check the docs here: https://docs.databricks.com/dev-tools/databricks-connect.html#python-version-mismatch
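
As a quick first check before digging into the docs, here is a minimal sketch (the interpreter path in the comment is illustrative): Databricks Connect requires the client's minor Python version to match the cluster's.

import sys

# Print the local interpreter version; compare it with the cluster's
# (run `import sys; print(sys.version)` in a notebook on the cluster).
print(f"Local Python: {sys.version_info.major}.{sys.version_info.minor}")

# If the versions differ, point PYSPARK_PYTHON at a matching interpreter
# before starting Databricks Connect, e.g. (illustrative path):
#   export PYSPARK_PYTHON=python3.8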

  • 0 kudos
alexott
by Databricks Employee
  • 2794 Views
  • 1 reply
  • 0 kudos

What libraries could be used for unit testing of Spark code?

We need to add unit test cases for the code that we're writing in Scala and Python. But we can't use calls like `assertEqual` to compare the content of DataFrames. Are there any special libraries for that?

Latest Reply
alexott
Databricks Employee
  • 0 kudos

There are several libraries for Scala and Python that help with writing unit tests for Spark code. For Scala you can use the following: the built-in Spark test suite - it's designed to test all parts of Spark. It supports RDD, Dataframe/Dataset, Streaming API...
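
For the Python side, a minimal sketch of a DataFrame-equality test using the third-party chispa package (the test name and data are illustrative, not from the thread):

import pytest
from pyspark.sql import SparkSession
from pyspark.sql.functions import upper
from chispa import assert_df_equality

@pytest.fixture(scope="session")
def spark():
    # Local session so the test runs without a cluster.
    return SparkSession.builder.master("local[1]").getOrCreate()

def test_uppercase_names(spark):
    source = spark.createDataFrame([("alice",), ("bob",)], ["name"])
    actual = source.select(upper("name").alias("name"))
    expected = spark.createDataFrame([("ALICE",), ("BOB",)], ["name"])
    # Compares both schema and rows, unlike a plain assertEqual.
    assert_df_equality(actual, expected)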

  • 0 kudos
User16753724663
by Valued Contributor
  • 1525 Views
  • 1 reply
  • 0 kudos

Resolved! Unable to create a token while deploying the workspace using terraform

We have automated our deployment with the Python APIs; however, we have been caught in a situation which we cannot yet solve. We are looking to collect a token during the first deployment within the environment. Currently our API requires a token. Is there...

Latest Reply
User16753724663
Valued Contributor
  • 0 kudos

We can use the below API to create a token with the username and password:

curl -X POST -u "admin_email":"xxxx" https://host/api/2.0/token/create -d '{
  "lifetime_seconds": 100,
  "comment": "this is an example token"
}'
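
The same call in Python, as a sketch (assumes the standard Token API response shape; host and credentials are placeholders), capturing the token for later use:

import requests

resp = requests.post(
    "https://host/api/2.0/token/create",
    auth=("admin_email", "xxxx"),  # basic auth with username/password
    json={"lifetime_seconds": 100, "comment": "this is an example token"},
)
resp.raise_for_status()
token = resp.json()["token_value"]  # assumes the documented response field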

  • 0 kudos
User16788317454
by New Contributor III
  • 1380 Views
  • 1 reply
  • 0 kudos
Latest Reply
j_weaver
New Contributor III
  • 0 kudos

If you are talking about distributed training of a single XGBoost model, there is no built-in capability in SparkML. SparkML supports gradient boosted trees, but not XGBoost specifically. However, there are 3rd party packages, such as XGBoost4J that ...
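
A minimal sketch of the built-in gradient-boosted trees mentioned above (assumes a training DataFrame `train_df` with "features" and "label" columns already prepared):

from pyspark.ml.classification import GBTClassifier

# SparkML's native GBT: distributed, but not XGBoost itself.
gbt = GBTClassifier(featuresCol="features", labelCol="label", maxIter=20)
model = gbt.fit(train_df)  # train_df is an assumed, pre-prepared DataFrame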

  • 0 kudos
j_weaver
by New Contributor III
  • 1660 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16788317454
New Contributor III
  • 0 kudos

With Spark, there are a few ways you can scale your model: training, hyperparameter tuning, and inference. If you're looking to train one model across multiple workers, you can leverage Horovod. It's an open source project designed to simplify distributed neur...
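
A hedged sketch of launching a training function on Spark workers with horovod.spark (the training body and worker count are illustrative):

import horovod.spark

def train():
    # Each Spark worker runs this function as one Horovod process.
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... build a model, scale the learning rate by hvd.size(), fit ...

horovod.spark.run(train, num_proc=2)  # num_proc is an illustrative value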

  • 0 kudos
FernandoBenedet
by New Contributor
  • 5996 Views
  • 2 replies
  • 0 kudos

Loop through Dataframe in Python

Hello, imagine you have a dataframe with cols A, B, C. I want to add a column D based on some calculations of columns B and C of the previous record of the df. What is the best way of doing this? I am trying to avoid looping through the df. I am u...

Latest Reply
quincybatten
New Contributor II
  • 0 kudos

Iterating through pandas DataFrame objects is generally slow. Pandas iteration defeats the whole purpose of using a DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better to look for a...
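
A minimal sketch of the usual non-looping approach in PySpark: compute D from the previous row's B and C with a window and lag() (the ordering column and the calculation itself are assumptions):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.orderBy("A")  # assumes column A defines the row order
df = df.withColumn(
    "D",
    F.lag("B").over(w) + F.lag("C").over(w),  # placeholder calculation
)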

  • 0 kudos
1 More Replies
dtr
by New Contributor
  • 6393 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers.

I am trying to write a function in Azure Databricks. I would like to use spark.sql inside the function. But it looks like I cannot use it with worker nodes. def SEL_ID(value, index): # some processing on value here ans = spark.sql("SELECT id FRO...

Latest Reply
MartinhoAzevedo
New Contributor II
  • 0 kudos

Hi there. I guess I'm a bit late, but do you remember how, and if, you fixed this issue? I'm getting the same exact problem. @dtr
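
For reference, the usual fix is a hedged sketch like the following (not from this thread): spark.sql can only run on the driver, so express the per-row lookup as a join instead of calling it inside a function applied on workers. Table and column names here are illustrative.

# Run the query once on the driver...
lookup_df = spark.sql("SELECT id, value FROM lookup_table")

# ...then replace the per-row SEL_ID() call with a join on the workers.
result_df = input_df.join(lookup_df, on="value", how="left")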

  • 0 kudos
Seenu45
by New Contributor II
  • 6082 Views
  • 3 replies
  • 1 kudos

Resolved! 'JavaPackage' object is not callable :: MLeap

Hi folks, we are working on a production Databricks project using MLeap. When we run the below code on Databricks, it throws an error like "'JavaPackage' object is not callable". Code: import mleap.pyspark from mleap.pyspark.spark_support import SimpleSparkSer...

Latest Reply
Seenu45
New Contributor II
  • 1 kudos

Thanks syamspr. It is working now.
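
The fix itself isn't shown in the thread; a common cause of "'JavaPackage' object is not callable" (stated here as an assumption, not the confirmed resolution) is that the MLeap JVM library isn't attached to the cluster, so the Python wrapper finds no Java class behind the package:

# If the matching mleap-spark Maven package is installed on the cluster,
# these imports (from the original question) should stop raising the error.
import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer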

  • 1 kudos
2 More Replies
NandhaKumar
by New Contributor II
  • 5571 Views
  • 3 replies
  • 0 kudos

How to specify multiple files in --py-files in the spark-submit command for a Databricks job? All the files to be specified in --py-files are present in dbfs:.

I have created a Databricks workspace in Azure. I have created a cluster for Python 3. I am creating a job using spark-submit parameters. How to specify multiple files in --py-files in the spark-submit command for a Databricks job? All the files to be specified in ...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Nandha Kumar, please go through the below docs to pass Python files to the job: https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask
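
A hedged sketch of a Jobs API payload passing multiple dependencies via --py-files as one comma-separated list (paths and file names are illustrative):

job_settings = {
    "name": "example-job",
    "spark_submit_task": {
        "parameters": [
            "--py-files",
            "dbfs:/libs/dep1.py,dbfs:/libs/dep2.py,dbfs:/libs/helpers.zip",
            "dbfs:/jobs/main.py",
        ]
    },
}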

  • 0 kudos
2 More Replies
rba76
by New Contributor
  • 19896 Views
  • 2 replies
  • 0 kudos

Python spark.read.text Path does not exist

Dear all, I want to read files with Python from a storage account. I followed these instructions: https://docs.microsoft.com/en-us/azure/azure-databricks/store-secrets-azure-key-vault. This is my Python code: dbutils.fs.mount(source = "wasbs://contain...

Latest Reply
PRADEEPCHEEKATL
New Contributor II
  • 0 kudos

@rba76 Make sure the helloworld.txt file exists in the container1 folder. I'm able to view the text file using the same commands, as follows: Mount Blob Storage: dbutils.fs.mount( source = "wasbs://sampledata@azure.blob.core.windows.net/Azure", mount_po...
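
A hedged sketch of the full mount-and-read pattern (account, container, scope, and key names are placeholders):

dbutils.fs.mount(
    source="wasbs://container1@myaccount.blob.core.windows.net",
    mount_point="/mnt/container1",
    extra_configs={
        "fs.azure.account.key.myaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")
    },
)

# The path passed to spark.read.text must match the mount point and the
# file's actual location inside the container.
df = spark.read.text("/mnt/container1/helloworld.txt")
df.show()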

  • 0 kudos
1 More Replies
desai_n_3
by New Contributor II
  • 14724 Views
  • 6 replies
  • 0 kudos

Cannot Convert Column to Bool error when converting a dataframe column from string to date type in Python

Hi all, I am trying to convert a dataframe column which is in string format to date type, in the format yyyy-MM-dd. I have written a SQL query and stored it in a dataframe. df3 = sqlContext.sql(sqlString2) df3.withColumn(df3['CalDay'],pd.to_datetime(df...

Latest Reply
JoshuaJames
New Contributor II
  • 0 kudos

Registered to post this, so forgive the formatting nightmare. This is a Python Databricks script function that allows you to convert from string to datetime or date, utilising coalesce: from pyspark.sql.functions import coalesce, to_date def to_dat...
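
A hedged reconstruction of that coalesce/to_date idea (the reply is truncated above, so the format list and function name here are illustrative):

from pyspark.sql.functions import coalesce, to_date, col

def to_date_multi(c, formats=("yyyy-MM-dd", "dd/MM/yyyy")):
    # Try each format in turn; coalesce keeps the first successful parse,
    # and values no format can parse stay null instead of raising.
    return coalesce(*[to_date(c, f) for f in formats])

df3 = df3.withColumn("CalDay", to_date_multi(col("CalDay")))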

  • 0 kudos
5 More Replies
Maser_AZ
by New Contributor II
  • 17218 Views
  • 1 reply
  • 0 kudos

NameError: name 'col' is not defined

I'm executing the below code using Python in a notebook and it appears that the col() function is not being recognized. I want to know if the col() function belongs to any specific dataframe library or Python library. I don't want to use pyspark...

Latest Reply
MOHAN_KUMARL_N
New Contributor II
  • 0 kudos

@mudassar45@gmail.com, as the documentation describes, it is a generic column not yet associated with a DataFrame. Please refer to the below code: display(peopleDF.select("firstName").filter("firstName = 'An'"))
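
For completeness, col() comes from pyspark.sql.functions and must be imported before use; a minimal sketch reusing the peopleDF example above:

from pyspark.sql.functions import col

# With the import in place, col() is recognized in column expressions.
display(peopleDF.select(col("firstName")).filter(col("firstName") == "An"))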

  • 0 kudos
martinch
by New Contributor II
  • 18318 Views
  • 4 replies
  • 0 kudos

DROP TABLE IF EXISTS does not work

When I try to run the command spark.sql("DROP TABLE IF EXISTS table_to_drop") and the table does not exist, I get the following error: AnalysisException: "Table or view 'table_to_drop' not found in database 'null';;\nDropTableCommand `table_to_drop...

Latest Reply
StevenWilliams
New Contributor II
  • 0 kudos

I agree about this being a usability bug. The documentation clearly states that if the optional flag "IF EXISTS" is provided, the statement will do nothing: https://docs.databricks.com/spark/latest/spark-sql/language-manual/drop-table.html Drop Table ...
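
A hedged workaround sketch (not from the thread): check the catalog before dropping, which avoids the AnalysisException on affected versions.

# listTables() works across Spark versions; newer releases also offer
# spark.catalog.tableExists() for the same check.
if any(t.name == "table_to_drop" for t in spark.catalog.listTables()):
    spark.sql("DROP TABLE table_to_drop")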

  • 0 kudos
3 More Replies
Dee
by New Contributor
  • 11152 Views
  • 2 replies
  • 0 kudos

Resolved! How to Change Schema of a Spark SQL

I am new to Spark and just started an online PySpark tutorial. I uploaded the JSON data in Databricks and wrote the commands as follows: df = sqlContext.sql("SELECT * FROM people_json") df.printSchema() from pyspark.sql.types import * data_schema =...
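
A hedged sketch of the usual way to control the schema (field names and path are illustrative): define it explicitly and apply it when reading, rather than casting afterwards.

from pyspark.sql.types import StructType, StructField, StringType, LongType

data_schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
])

# Applying the schema at read time replaces the inferred one.
df = spark.read.schema(data_schema).json("/path/people.json")
df.printSchema()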

Latest Reply
bhanu2448
New Contributor II
  • 0 kudos

http://www.bigdatainterview.com/

  • 0 kudos
1 More Replies
AdamArold
by New Contributor
  • 5744 Views
  • 4 replies
  • 0 kudos

How can I integrate Databricks into PyCharm?

Editing notebooks on Databricks is rather cumbersome because it lacks a lot of features IDEs like PyCharm have. Another problem is that a Databricks notebook comes with some local state which is not present on my computer. How can I edit notebooks ...

Latest Reply
SimonD_Morias
New Contributor II
  • 0 kudos

The documents are out for databricks-connect: https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html. I've also written up a few limitations I have found - some with workarounds: https://datathirst.net/blog/2019/3/7/databricks-co...
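
A minimal sketch of the databricks-connect flow the reply points to (workspace URL, token, and cluster id are supplied interactively during configuration):

# pip install databricks-connect
# databricks-connect configure   # prompts for workspace URL, token, cluster id
from pyspark.sql import SparkSession

# With databricks-connect configured, this session targets the remote
# cluster, so code written in PyCharm executes on Databricks.
spark = SparkSession.builder.getOrCreate()
print(spark.range(10).count())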

  • 0 kudos
3 More Replies