Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

xiaozy
by New Contributor
  • 1878 Views
  • 1 replies
  • 1 kudos
Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @xiaojun wang, please check the blog and let us know if this helps you: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
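For quick reference, a minimal window-function sketch in the spirit of that blog post, using toy data and the notebook's built-in spark session:

    from pyspark.sql import Window
    from pyspark.sql.functions import rank, col

    # Rank values within each group, highest first.
    df = spark.createDataFrame([("a", 1), ("a", 3), ("b", 2)], ["grp", "val"])
    w = Window.partitionBy("grp").orderBy(col("val").desc())
    df.withColumn("rnk", rank().over(w)).show()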

Maser_AZ
by New Contributor II
  • 18182 Views
  • 4 replies
  • 1 kudos

Resolved! How to fix TypeError: __init__() got an unexpected keyword argument 'max_iter'?

# Create the model using sklearn (don't worry about the parameters for now):
model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, max_iter=3000)
# Train/fit the model to the train-part of the dataset:
model.fit(X_train, y_train)
ERROR: Typ...

Latest Reply
Fantomas_nl
New Contributor II
  • 1 kudos

Replacing max_iter with n_iter resolves the error. Thanks! It is a bit unusual to expect errors like this with this type of solution from Microsoft. As if it could not be prevented.
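A minimal sketch of that fix, assuming an older scikit-learn release where the parameter is still named n_iter (newer releases accept max_iter instead); the toy arrays stand in for the poster's train split:

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    # Toy training data in place of the original X_train / y_train.
    X_train = np.random.rand(100, 3)
    y_train = X_train @ np.array([1.0, 2.0, 3.0])

    # On older scikit-learn versions the iteration count is n_iter, not max_iter.
    model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, n_iter=3000)
    model.fit(X_train, y_train)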

3 More Replies
User15787040559
by Databricks Employee
  • 2506 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax do we support in Spark SQL?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Spark 3 has experimental support for ANSI. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html
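As a small sketch, ANSI mode can be toggled per session with a single configuration flag (run in a notebook cell with the built-in spark session):

    # Enable Spark 3's experimental ANSI mode for the current session.
    spark.conf.set("spark.sql.ansi.enabled", "true")

    # Under ANSI mode this invalid cast raises an error; with the flag off it returns NULL.
    try:
        spark.sql("SELECT CAST('abc' AS INT)").show()
    except Exception as e:
        print("ANSI mode rejected the invalid cast:", e)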

1 More Replies
haseebkhan1421
by New Contributor
  • 3192 Views
  • 1 replies
  • 3 kudos

How can I create a column on the fly that has the same value for all rows in a Spark SQL query?

I have a SQL query which I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly and has the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 3 kudos

The correct syntax would be: SELECT 'Goal' AS Type, Value FROM table
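A quick PySpark sketch of the same idea using lit(), with a toy stand-in for the poster's table:

    from pyspark.sql.functions import lit

    # Toy DataFrame with a Value column.
    df = spark.createDataFrame([(10,), (20,)], ["Value"])

    # Equivalent to: SELECT 'Goal' AS Type, Value FROM table
    df.select(lit("Goal").alias("Type"), "Value").show()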

Kotofosonline
by New Contributor III
  • 1534 Views
  • 1 replies
  • 0 kudos

Bug report: Date type with year less than 1000 (years 1-999) in a Spark SQL WHERE clause [solved]

Hi, I noticed unexpected behavior for the Date type. If the year value is less than 1000, then filtering does not work. Steps: create table test (date Date); insert into test values ('0001-01-01'); select * from test where date = '0001-01-01'. This returns 0 rows...

Latest Reply
Kotofosonline
New Contributor III
  • 0 kudos

Hm, seems to work now.
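For reference, the steps from the report as a runnable notebook cell (the table name comes from the post):

    # Reproduce the report: a DATE value with a year below 1000, then filter on it.
    spark.sql("CREATE TABLE IF NOT EXISTS test (date DATE)")
    spark.sql("INSERT INTO test VALUES ('0001-01-01')")
    spark.sql("SELECT * FROM test WHERE date = '0001-01-01'").show()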

daniil_terentye
by New Contributor III
  • 3774 Views
  • 3 replies
  • 0 kudos

EXISTS statement works incorrectly

Hi everybody. It looks like the EXISTS statement works incorrectly. If I execute the following statement in SQL Server, it returns one row, as it should: WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY...

Latest Reply
daniil_terentye
New Contributor III
  • 0 kudos

In newer versions of Spark it's possible to use ANTI JOIN and SEMI JOIN. It looks like this: WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY' AS name ), b AS ( SELECT 'a@b.com' AS user_username, 'Super Co...
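A runnable sketch of the SEMI JOIN idea with toy DataFrames standing in for the truncated CTEs (column names are illustrative):

    # A semi join keeps the left-side rows that have at least one match on the right.
    companies = spark.createDataFrame(
        [("1", "Super Company"), ("2", "SUPER COMPANY")], ["id", "name"])
    users = spark.createDataFrame(
        [("a@b.com", "Super Company")], ["user_username", "company_name"])

    companies.join(users, companies.name == users.company_name, "left_semi").show()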

2 More Replies
User16783853501
by Databricks Employee
  • 2501 Views
  • 2 replies
  • 1 kudos

Using Spark SQL, or particularly %sql in a Databricks notebook, is there a way to use pagination, offset, or skip?

Using Spark SQL, or particularly %sql in a Databricks notebook, is there a way to use pagination, offset, or skip?

Latest Reply
sajith_appukutt
Databricks Employee
  • 1 kudos

There is no OFFSET support yet. Here are a few possible workarounds. If your data is all in one partition (rarely the case), you could create a column with monotonically_increasing_id and apply filter conditions. If there are multiple partitions...
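A sketch of the first workaround; coalesce(1) is added so the single-partition assumption actually holds and the generated ids are contiguous:

    from pyspark.sql.functions import monotonically_increasing_id

    # Number the rows, then filter out one "page" of them.
    df = spark.range(0, 1000).coalesce(1).withColumn("row_id", monotonically_increasing_id())
    page = df.filter((df.row_id >= 100) & (df.row_id < 150))   # rows 100-149
    page.show()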

1 More Replies
User16826994223
by Databricks Employee
  • 1215 Views
  • 1 replies
  • 0 kudos

Timestamp changes in Spark SQL

Hi team, is there a way to change the current timestamp from the current time zone to a different time zone?

Latest Reply
Srikanth_Gupta_
Databricks Employee
  • 0 kudos

import sqlContext.implicits._
import org.apache.spark.sql.functions._
inputDF.select(
  unix_timestamp($"unix_timestamp").alias("unix_timestamp"),
  from_utc_timestamp($"unix_timestamp".cast(DataTypes.TimestampType), "UTC").alias("UTC"),
  from_utc_tim...
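A hedged PySpark sketch of the same idea: from_utc_timestamp shifts a timestamp interpreted as UTC into another zone (the zone name is just an example):

    from pyspark.sql.functions import current_timestamp, from_utc_timestamp

    df = spark.range(1).select(current_timestamp().alias("ts_utc"))
    df.select(
        "ts_utc",
        from_utc_timestamp("ts_utc", "America/Los_Angeles").alias("ts_la")
    ).show(truncate=False)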

cfregly
by Contributor
  • 7340 Views
  • 5 replies
  • 0 kudos
Latest Reply
srisre111
New Contributor II
  • 0 kudos

I am trying to store a dataframe as a table in Databricks and am encountering the following error; can someone help? "TypeError: field date: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>"

4 More Replies
dhanunjaya
by New Contributor II
  • 10218 Views
  • 6 replies
  • 0 kudos

How to remove empty rows from a data frame

Let's assume I have 10 columns in a data frame, and all 10 columns have empty values for 100 rows out of 200 rows. How can I skip the empty rows?

Latest Reply
GaryDiaz
New Contributor II
  • 0 kudos

You can try this: df.na.drop(how="all"). This will remove a row only if all of its columns are null or NaN.
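A runnable sketch of that suggestion with toy data:

    # Only the first row is entirely null, so it is the only one dropped.
    data = [(None, None), ("a", 1), (None, 2)]
    df = spark.createDataFrame(data, ["col1", "col2"])
    df.na.drop(how="all").show()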

5 More Replies
cfregly
by Contributor
  • 5424 Views
  • 4 replies
  • 0 kudos
Latest Reply
GeethGovindSrin
New Contributor II
  • 0 kudos

@cfregly: For DataFrames, you can use the following code to use groupBy without aggregations: Df.groupBy(Df["column_name"]).agg({})
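A runnable sketch of that pattern with toy data; the empty agg leaves one row per distinct grouping key:

    df = spark.createDataFrame([("x", 1), ("x", 2), ("y", 3)], ["column_name", "v"])
    df.groupBy(df["column_name"]).agg({}).show()   # rows: x, y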

3 More Replies
RohiniMathur
by New Contributor II
  • 19251 Views
  • 1 replies
  • 0 kudos

Resolved! Length value of a column in PySpark

Hello, I am using PySpark 2.12. After creating a DataFrame, can we measure the length value for each row? For example, I am measuring the length of the value in column 2. Input file: |TYCO|1303| |EMC |120989| |VOLVO|102329| |BMW|130157| |FORD|004| Output ...

Latest Reply
lee
Databricks Employee
  • 0 kudos

You can use the length function for this:
from pyspark.sql.functions import length
mock_data = [('TYCO', '1303'), ('EMC', '120989'), ('VOLVO', '102329'), ('BMW', '130157'), ('FORD', '004')]
df = spark.createDataFrame(mock_data, ['col1', 'col2'])
df2 = d...
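A hedged completion of the truncated snippet, assuming the intent was to add a length column (the name col2_length is illustrative, not from the original reply):

    from pyspark.sql.functions import length

    mock_data = [('TYCO', '1303'), ('EMC', '120989'), ('VOLVO', '102329'), ('BMW', '130157'), ('FORD', '004')]
    df = spark.createDataFrame(mock_data, ['col1', 'col2'])
    df2 = df.withColumn('col2_length', length(df['col2']))
    df2.show()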

Maser_AZ
by New Contributor II
  • 18791 Views
  • 1 replies
  • 0 kudos

NameError: name 'col' is not defined

I'm executing the code below using Python in a notebook, and it appears that the col() function is not being recognized. I want to know whether the col() function belongs to any specific DataFrame library or Python library. I don't want to use pyspark...

Latest Reply
MOHAN_KUMARL_N
New Contributor II
  • 0 kudos

@mudassar45@gmail.com: as the documentation describes, this is a generic column not yet associated with a DataFrame. Please refer to the code below: display(peopleDF.select("firstName").filter("firstName = 'An'"))
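A runnable version of that reply's approach, with peopleDF as a toy stand-in; note that col() itself lives in pyspark.sql.functions, so from pyspark.sql.functions import col would also resolve the NameError:

    # String column references work without importing col(); display() is the Databricks notebook helper.
    peopleDF = spark.createDataFrame([("An", 30), ("Bo", 25)], ["firstName", "age"])
    display(peopleDF.select("firstName").filter("firstName = 'An'"))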

AnilKumar
by New Contributor II
  • 12555 Views
  • 4 replies
  • 0 kudos

How to solve column header issues in Spark SQL data frame

My code:
val name = sc.textFile("/FileStore/tables/employeenames.csv")
case class x(ID: String, Employee_name: String)
val namePairRDD = name.map(_.split(",")).map(x => (x(0), x(1).trim.toString)).toDF("ID", "Employee_name")
namePairRDD.createOrRe...

Latest Reply
evan_matthews1
New Contributor II
  • 0 kudos

Hi, I have the opposite issue. When I run an SQL query through the bulk download as per the standard prc fobasx notebook, the first row of data somehow gets attached to the column headers. When I import the CSV file into R using read_csv, R thinks ...

3 More Replies