cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Seenu45
by New Contributor II
  • 6690 Views
  • 3 replies
  • 1 kudos

Resolved! JavaPackage' object is not callable :: Mlean

Hi Folks, We are working on production Databricks project using Mleap. when run below code on databricks, it throws error like " 'JavaPackage' object is not callable" code :import mleap.pyspark from mleap.pyspark.spark_support import SimpleSparkSer...

  • 6690 Views
  • 3 replies
  • 1 kudos
Latest Reply
Seenu45
New Contributor II
  • 1 kudos

Thanks syamspr. it is working now.

  • 1 kudos
2 More Replies
pepevo
by New Contributor III
  • 18013 Views
  • 10 replies
  • 0 kudos

Resolved! How to convert column type from decimal to date in sparksql

I need to convert column type from decimal to date in sparksql when the format is not yyyy-mm-dd? A table contains column data declared as decimal (38,0) and data is in yyyymmdd format and I am unable to run sql queries on it in databrick notebook. ...

  • 18013 Views
  • 10 replies
  • 0 kudos
Latest Reply
pepevo
New Contributor III
  • 0 kudos

thank you Tom. I made it work already.

  • 0 kudos
9 More Replies
ArielHerrera
by New Contributor II
  • 21215 Views
  • 5 replies
  • 2 kudos

Resolved! How to display SHAP plots?

I am looking to display SHAP plots, here is the code:import xgboost import shap shap.initjs() # load JS visualization code to notebookX,y = shap.datasets.boston() # train XGBoost model model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatri...

0693f000007OoIfAAK
  • 21215 Views
  • 5 replies
  • 2 kudos
Latest Reply
lrnzcig
New Contributor II
  • 2 kudos

As @Vinh dqvinh87​  noted, the accepted solution only works for force_plot. For other plots, the following trick works for me:import matplotlib.pyplot as plt p = shap.summary_plot(shap_values, test_df, show=False) display(p)

  • 2 kudos
4 More Replies
Muthu145
by New Contributor
  • 16256 Views
  • 3 replies
  • 0 kudos

KNN classifier on Spark

Hi Team , Can you please help me in implementing KNN classifer in pyspark using distributed architecture and processing the dataset. Even I want to validate the KNN model with the testing dataset. I tried to use scikit learn but the program is runn...

  • 16256 Views
  • 3 replies
  • 0 kudos
Latest Reply
SouravSaha
New Contributor II
  • 0 kudos

Hey, about about using NEC Frovedis (https://github.com/frovedis/frovedis) framework for the same. Refer: https://github.com/frovedis/frovedis/blob/master/src/foreign_if/python/examples/unsupervised_knn_demo.py It works on a distributed framework (...

  • 0 kudos
2 More Replies
zhaoxuan210
by New Contributor
  • 25584 Views
  • 1 replies
  • 0 kudos

How can I read all the files in a folder on S3 into several pandas dataframes?

import pandas as pd import glob path = "s3://somewhere/" # use your path all_files = glob.glob(path + "/*.csv") print(all_files) li = [] for filename in all_files: dfi = pd.read_csv(filename,names =['acct_id', 'SOR_ID'], dtype={'acct_id':str,...

  • 25584 Views
  • 1 replies
  • 0 kudos
Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @zhaoxuan210, Please go through the below answer,https://stackoverflow.com/questions/52855221/reading-multiple-csv-files-from-s3-bucket-with-boto3

  • 0 kudos
Van-DuyetLe
by New Contributor
  • 45565 Views
  • 5 replies
  • 3 kudos

What's the difference between Interactive Clusters and Job Cluster?

I am new to databricks. I would like to know what is the difference between Interactive Clusters and Job Cluster? There are no official document now.

  • 45565 Views
  • 5 replies
  • 3 kudos
Latest Reply
Forum_Admin
Contributor
  • 3 kudos

Sports news Football news International football news Football news Thai football news, Thai football Follow news, know sports news at Siamsportnews

  • 3 kudos
4 More Replies
User16301467532
by New Contributor II
  • 25762 Views
  • 9 replies
  • 1 kudos

How can I change the parquet compression algorithm from gzip to something else?

Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.

  • 25762 Views
  • 9 replies
  • 1 kudos
Latest Reply
ZhenZeng
New Contributor II
  • 1 kudos

spark.sql("set spark.sql.parquet.compression.codec=gzip");

  • 1 kudos
8 More Replies
Venkata_Krishna
by New Contributor
  • 10540 Views
  • 1 replies
  • 0 kudos

convert string dataframe column MM/dd/yyyy hh:mm:ss AM/PM to timestamp MM-dd-yyyy hh:mm:ss

How to convert string 6/3/2019 5:06:00 AM to timestamp in 24 hour format MM-dd-yyyy hh:mm:ss in python spark.

  • 10540 Views
  • 1 replies
  • 0 kudos
Latest Reply
lee
Contributor
  • 0 kudos

You would use a combination of the functions: pyspark.sql.functions.from_unixtime(timestamp, format='yyyy-MM-dd HH:mm:ss') (documentation) and pyspark.sql.functions.unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss') (documentation)from pysp...

  • 0 kudos
MithuWagh
by New Contributor
  • 9249 Views
  • 1 replies
  • 0 kudos

How to deal with column name with .(dot) in pyspark dataframe??

We are streaming data from kafka source with json but in some column we are getting .(dot) in column names.streaming json data: df1 = df.selectExpr("CAST(value AS STRING)") {"pNum":"A14","from":"telecom","payload":{"TARGET":"1","COUNTRY":"India"...

  • 9249 Views
  • 1 replies
  • 0 kudos
Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Mithu Wagh you can use backticks to enclose the column name.df.select("`col0.1`")

  • 0 kudos
KrisMusial
by New Contributor
  • 7024 Views
  • 2 replies
  • 0 kudos

Resolved! Saving to parquet with SaveMode.Overwrite throws exception

Hello, I'm trying to save DataFrame in parquet with SaveMode.Overwrite with no success. I minimized the code and reproduced the issue with the following two cells: > case class MyClass(val fld1: Integer, val fld2: Integer) > > val lst1 = sc.paralle...

  • 7024 Views
  • 2 replies
  • 0 kudos
Latest Reply
Guru421421
New Contributor II
  • 0 kudos

results.select("ValidationTable", "Results","Description","CreatedBy","ModifiedBy","CreatedDate","ModifiedDate").write.mode('overwrite').save("

  • 0 kudos
1 More Replies
NandhaKumar
by New Contributor II
  • 7215 Views
  • 3 replies
  • 0 kudos

How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .

I have created a databricks in azure. I have created a cluster for python 3. I am creating a job using spark-submit parameters. How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in ...

  • 7215 Views
  • 3 replies
  • 0 kudos
Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Nandha Kumar,please go through the below docs to pass python files as job,https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask

  • 0 kudos
2 More Replies
cfregly
by Contributor
  • 5347 Views
  • 4 replies
  • 0 kudos
  • 5347 Views
  • 4 replies
  • 0 kudos
Latest Reply
GeethGovindSrin
New Contributor II
  • 0 kudos

@cfregly​ : For DataFrames, you can use the following code for using groupBy without aggregations.Df.groupBy(Df["column_name"]).agg({})

  • 0 kudos
3 More Replies
tourist_on_road
by New Contributor
  • 7023 Views
  • 1 replies
  • 0 kudos

How to read binary data in pyspark

I'm reading binary file http://snap.stanford.edu/data/amazon/productGraph/image_features/image_features.b using pyspark.from io importStringIO import array img_embedding_file = sc.binaryRecords("s3://bucket/image_features.b",4106)def mapper(featur...

  • 7023 Views
  • 1 replies
  • 0 kudos
Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @tourist_on_road, please go through the below spark docs,https://spark.apache.org/docs/2.3.0/api/python/pyspark.html#pyspark.SparkContext.binaryFiles

  • 0 kudos
MikeK_
by New Contributor II
  • 15209 Views
  • 1 replies
  • 0 kudos

Resolved! SQL variables in a notebook

Hi, In an SQL notebook, using this link: https://docs.databricks.com/spark/latest/spark-sql/language-manual/set.html I managed to figure out to set values and how to get the value. SET my_val=10; //saves the value 10 for key my_val SET my_val; //dis...

  • 15209 Views
  • 1 replies
  • 0 kudos
Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @Mike K.., you can do this with widgets and getArgument. Here's a small example of what that might look like: https://community.databricks.com/s/feed/0D53f00001HKHZfCAP

  • 0 kudos

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now
Labels