Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

xiaozy
by New Contributor
  • 1498 Views
  • 1 reply
  • 1 kudos
Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @xiaojun wang, please check the blog and let us know if this helps you: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

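
The linked post covers Spark SQL window functions; a minimal PySpark sketch in that spirit (the data, column names, and the session object spark are illustrative assumptions, not from this thread):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rank rows within each key by value.
df = spark.createDataFrame([("a", 1), ("a", 3), ("b", 2)], ["key", "value"])
w = Window.partitionBy("key").orderBy("value")
df.withColumn("rank", F.rank().over(w)).show()
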
MudassarA
by New Contributor II
  • 15417 Views
  • 4 replies
  • 1 kudos

Resolved! How to fix TypeError: __init__() got an unexpected keyword argument 'max_iter'?

# Create the model using sklearn (don't worry about the parameters for now): model = SGDRegressor(loss='squared_loss', verbose=0, eta0=0.0003, max_iter=3000) Train/fit the model to the train-part of the dataset: model.fit(X_train, y_train) ERROR: Typ...

Latest Reply
Fantomas_nl
New Contributor II
  • 1 kudos

Replacing max_iter with n_iter resolves the error. Thanks! It is a bit unusual to see errors like this in this type of solution from Microsoft, as if it could not have been prevented.

3 More Replies
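
For context, a hedged sketch of the version-dependent fix: scikit-learn renamed n_iter to max_iter in 0.19, so the working keyword depends on the installed version. The training data below is synthetic, and the loss parameter is omitted because its accepted values also changed across releases.

import numpy as np
from sklearn.linear_model import SGDRegressor

# Illustrative training data.
X_train = np.random.rand(100, 3)
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + 0.1 * np.random.randn(100)

# scikit-learn >= 0.19 accepts max_iter; older releases used n_iter instead.
try:
    model = SGDRegressor(verbose=0, eta0=0.0003, max_iter=3000)
except TypeError:
    model = SGDRegressor(verbose=0, eta0=0.0003, n_iter=3000)

model.fit(X_train, y_train)
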
User15787040559
by Databricks Employee
  • 1664 Views
  • 2 replies
  • 0 kudos

What subset of MySQL SQL syntax does Spark SQL support?

https://spark.apache.org/docs/latest/sql-ref-syntax.html

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Spark 3 has experimental support for ANSI compliance. Read more here: https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html

1 More Replies
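
A quick sketch of the ANSI mode mentioned above (assuming a Spark 3 session named spark): with spark.sql.ansi.enabled set, invalid casts raise errors instead of silently returning NULL.

# Toggle Spark 3's ANSI compliance mode (off by default).
spark.conf.set("spark.sql.ansi.enabled", "true")

# Raises an error under ANSI mode; returns NULL with ANSI off.
spark.sql("SELECT CAST('abc' AS INT)").show()
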
haseebkhan1421
by New Contributor
  • 2584 Views
  • 1 reply
  • 3 kudos

How can I create a column on the fly with the same value for all rows in a Spark SQL query?

I have a SQL query which I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly, which has the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

The correct syntax would be: SELECT 'Goal' AS Type, Value FROM table

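
The DataFrame-API equivalent uses lit() to attach a constant column; the table name below is a hypothetical stand-in.

from pyspark.sql import functions as F

# lit() gives every row the same literal value.
df = spark.table("tableName")  # hypothetical table
df.select(F.lit("Goal").alias("Type"), "Value").show()
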
Kotofosonline
by New Contributor III
  • 1200 Views
  • 1 reply
  • 0 kudos

Bug Report: Date type with year less than 1000 (years 1-999) in Spark SQL WHERE clause [solved]

Hi, I noticed unexpected behavior for the Date type. If the year value is less than 1000, then filtering does not work. Steps: create table test (date Date); insert into test values ('0001-01-01'); select * from test where date = '0001-01-01' Returns 0 rows....

Latest Reply
Kotofosonline
New Contributor III
  • 0 kudos

Hm, seems to work now.

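
For reference, the report's repro steps as a runnable sketch (assuming a Spark session named spark); per the reply above, on current runtimes the filter returns the row as expected.

spark.sql("CREATE TABLE test (date DATE)")
spark.sql("INSERT INTO test VALUES ('0001-01-01')")
spark.sql("SELECT * FROM test WHERE date = '0001-01-01'").show()
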
daniil_terentye
by New Contributor III
  • 2850 Views
  • 3 replies
  • 0 kudos

EXISTS statement works incorrectly

Hi everybody. Looks like the EXISTS statement works incorrectly. If I execute the following statement in SQL Server it returns one row, as it should: WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY...

Latest Reply
daniil_terentye
New Contributor III
  • 0 kudos

In newer versions of Spark it's possible to use ANTI JOIN and SEMI JOIN. It looks this way: WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY' AS name ), b AS ( SELECT 'a@b.com' AS user_username, 'Super Co...

2 More Replies
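
A hedged sketch of the suggested rewrite, with the truncated parts of the reply filled in with illustrative values: LEFT SEMI JOIN keeps rows of a that have a match in b (like EXISTS), while LEFT ANTI JOIN keeps rows without a match (like NOT EXISTS).

spark.sql("""
WITH a AS (
  SELECT '1' AS id, 'Super Company' AS name
  UNION
  SELECT '2' AS id, 'SUPER COMPANY' AS name
),
b AS (
  SELECT 'a@b.com' AS user_username, 'Super Company' AS company_name
)
SELECT a.* FROM a LEFT SEMI JOIN b ON a.name = b.company_name
""").show()
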
User16783853501
by Databricks Employee
  • 1993 Views
  • 2 replies
  • 1 kudos

Using Spark SQL, or particularly %sql in a Databricks notebook, is there a way to use pagination, offset, or skip?


Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

There is no OFFSET support yet. Here are a few possible workarounds. If your data is all in one partition (rarely the case), you could create a column with monotonically_increasing_id and apply filter conditions. If there are multiple partitions...

1 More Replies
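
A sketch of the multi-partition variant of that workaround (data and page boundaries are illustrative): number the rows with a window function, then filter to the desired slice.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.range(100).withColumn("value", F.rand())

# Assign a stable row number, then keep rows 21-30 ("page 3" of 10 rows).
w = Window.orderBy("id")
page = (df.withColumn("row_num", F.row_number().over(w))
          .filter((F.col("row_num") > 20) & (F.col("row_num") <= 30)))
page.show()
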
User16826994223
by Honored Contributor III
  • 923 Views
  • 1 reply
  • 0 kudos

Timestamp changes in Spark SQL

Hi Team, is there a way to change the current timestamp from the current time zone to a different time zone?

Latest Reply
Srikanth_Gupta_
Databricks Employee
  • 0 kudos

import sqlContext.implicits._
import org.apache.spark.sql.functions._

inputDF.select(
  unix_timestamp($"unix_timestamp").alias("unix_timestamp"),
  from_utc_timestamp($"unix_timestamp".cast(DataTypes.TimestampType), "UTC").alias("UTC"),
  from_utc_tim...

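
A PySpark sketch of the same idea, under the assumption that from_utc_timestamp/to_utc_timestamp cover the asked-for conversion:

from pyspark.sql import functions as F

df = spark.sql("SELECT current_timestamp() AS ts")
df.select(
    "ts",
    # Interpret ts as Pacific time and shift it to UTC.
    F.to_utc_timestamp("ts", "America/Los_Angeles").alias("utc_if_ts_is_pacific"),
    # Interpret ts as UTC and shift it to India Standard Time.
    F.from_utc_timestamp("ts", "Asia/Kolkata").alias("ist_if_ts_is_utc"),
).show(truncate=False)
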
cfregly
by Contributor
  • 5809 Views
  • 5 replies
  • 0 kudos
Latest Reply
srisre111
New Contributor II
  • 0 kudos

I am trying to store a dataframe as a table in Databricks and encountering the following error, can someone help? "typeerror: field date: can not merge type <class 'pyspark.sql.types.stringtype'> and <class 'pyspark.sql.types.doubletype'>"

4 More Replies
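
That merge error usually means schema inference saw conflicting types in one field. A hedged sketch of one workaround, normalizing the column and supplying an explicit schema (data and table name are hypothetical):

from pyspark.sql.types import StructType, StructField, StringType

# Mixed string/double values in the same field trip up schema inference.
rows = [("2021-01-01",), (20210101.0,)]
schema = StructType([StructField("date", StringType(), True)])

# Normalize every value to one type before building the DataFrame.
df = spark.createDataFrame([(str(r[0]),) for r in rows], schema)
df.write.saveAsTable("my_table")  # hypothetical table name
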
dhanunjaya
by New Contributor II
  • 8221 Views
  • 6 replies
  • 0 kudos

How to remove empty rows from the data frame

Let's assume I have 10 columns in a data frame, and all 10 columns have empty values for 100 rows out of 200 rows. How can I skip the empty rows?

Latest Reply
GaryDiaz
New Contributor II
  • 0 kudos

You can try this: df.na.drop(how="all"). This will remove a row only if all of its columns are null or NaN.

5 More Replies
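
A quick sketch of that suggestion on illustrative data: with how="all", a row is dropped only when every column is null.

df = spark.createDataFrame([(None, None), ("a", 1), (None, 2)], ["col1", "col2"])

# Drops only the (None, None) row; (None, 2) survives because col2 is set.
df.na.drop(how="all").show()
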
cfregly
by Contributor
  • 4392 Views
  • 4 replies
  • 0 kudos
Latest Reply
GeethGovindSrin
New Contributor II
  • 0 kudos

@cfregly: For DataFrames, you can use the following code for using groupBy without aggregations: Df.groupBy(Df["column_name"]).agg({})

3 More Replies
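
A sketch of that pattern on illustrative data; the empty aggregation dict leaves just the distinct grouping keys (df.select("k").distinct() is the more idiomatic equivalent).

df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["k", "v"])

# groupBy with an empty aggregation dict returns one row per distinct key.
df.groupBy(df["k"]).agg({}).show()
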
RohiniMathur
by New Contributor II
  • 17117 Views
  • 1 reply
  • 0 kudos

Resolved! Length Value of a column in pyspark

Hello, I am using PySpark 2.12. After creating a DataFrame, can we measure the length value for each row? For example: I am measuring the length of a value in column 2. Input file: |TYCO|1303| |EMC |120989| |VOLVO|102329| |BMW|130157| |FORD|004| Output ...

Latest Reply
lee
Contributor
  • 0 kudos

You can use the length function for this:

from pyspark.sql.functions import length
mock_data = [('TYCO', '1303'), ('EMC', '120989'), ('VOLVO', '102329'), ('BMW', '130157'), ('FORD', '004')]
df = spark.createDataFrame(mock_data, ['col1', 'col2'])
df2 = d...

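
A completed version of the truncated snippet above; the final withColumn line is our completion of the reply's intent, not its verbatim ending.

from pyspark.sql.functions import length

mock_data = [('TYCO', '1303'), ('EMC', '120989'), ('VOLVO', '102329'), ('BMW', '130157'), ('FORD', '004')]
df = spark.createDataFrame(mock_data, ['col1', 'col2'])

# Add a column holding the string length of col2.
df2 = df.withColumn('col2_length', length('col2'))
df2.show()
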
MudassarA
by New Contributor II
  • 16724 Views
  • 1 reply
  • 0 kudos

NameError: name 'col' is not defined

I'm executing the below code using Python in a notebook, and it appears that the col() function is not getting recognized. I want to know if the col() function belongs to any specific DataFrame library or Python library. I don't want to use pyspark...

Latest Reply
MOHAN_KUMARL_N
New Contributor II
  • 0 kudos

@mudassar45@gmail.com As the documentation describes, col refers to a generic column not yet associated with a DataFrame. Please refer to the code below: display(peopleDF.select("firstName").filter("firstName = 'An'"))

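
For completeness, a sketch showing where col() comes from (sample data is illustrative): it lives in pyspark.sql.functions and must be imported before use.

from pyspark.sql.functions import col

peopleDF = spark.createDataFrame([("An", 30), ("Bo", 25)], ["firstName", "age"])
peopleDF.select(col("firstName")).filter(col("firstName") == "An").show()
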
AnilKumar
by New Contributor II
  • 10338 Views
  • 4 replies
  • 0 kudos

How to solve column header issues in Spark SQL data frame

My code:

val name = sc.textFile("/FileStore/tables/employeenames.csv")
case class x(ID: String, Employee_name: String)
val namePairRDD = name.map(_.split(",")).map(x => (x(0), x(1).trim.toString)).toDF("ID", "Employee_name")
namePairRDD.createOrRe...

Latest Reply
evan_matthews1
New Contributor II
  • 0 kudos

Hi, I have the opposite issue. When I run an SQL query through the bulk download as per the standard prc fobasx notebook, the first row of data somehow gets attached to the column headers. When I import the csv file into R using read_csv, R thinks ...

3 More Replies
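
A hedged sketch touching both the original question and this reply: telling Spark's CSV reader that the first line holds headers keeps it out of the data (path reused from the original post; whether this fits the R workflow is an assumption).

# header=true makes Spark use the first CSV line as column names instead of data.
df = spark.read.option("header", "true").csv("/FileStore/tables/employeenames.csv")
df.show()
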