Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MudassarA
by New Contributor II
  • 14865 Views
  • 1 replies
  • 0 kudos

NameError: name 'col' is not defined

I'm executing the code below in a Python notebook, and it appears that the col() function is not recognized. I want to know whether the col() function belongs to a specific DataFrame library or Python library. I don't want to use pyspark...

Latest Reply
MOHAN_KUMARL_N
New Contributor II
  • 0 kudos

@mudassar45@gmail.com As the documentation describes, col() returns a generic column that is not yet associated with a DataFrame. Please refer to the code below: display(peopleDF.select("firstName").filter("firstName = 'An'"))

AnilKumar
by New Contributor II
  • 8590 Views
  • 4 replies
  • 0 kudos

How to solve column header issues in Spark SQL data frame

My code : val name = sc.textFile("/FileStore/tables/employeenames.csv") case class x(ID:String,Employee_name:String) val namePairRDD = name.map(_.split(",")).map(x => (x(0), x(1).trim.toString)).toDF("ID", "Employee_name") namePairRDD.createOrRe...

Latest Reply
evan_matthews1
New Contributor II
  • 0 kudos

Hi, I have the opposite issue. When I run an SQL query through the bulk download as per the standard prc fobasx notebook, the first row of data somehow gets attached to the column headers. When I import the CSV file into R using read_csv, R thinks ...

mashaye
by New Contributor
  • 20732 Views
  • 6 replies
  • 2 kudos

How can I call a stored procedure in Spark Sql?

I have seen the following code: val url = "jdbc:mysql://yourIP:yourPort/test?user=yourUsername&password=yourPassword" val df = sqlContext .read .format("jdbc") .option("url", url) .option("dbtable", "people") .load() But I ...

Latest Reply
j500sut
New Contributor III
  • 2 kudos

This doesn't seem to be supported. There is an alternative, but it requires using pyodbc and adding it to your init script. Details can be found here: https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark I hav...

martinch
by New Contributor II
  • 12422 Views
  • 4 replies
  • 0 kudos

DROP TABLE IF EXISTS does not work

When I try to run the command spark.sql("DROP TABLE IF EXISTS table_to_drop") and the table does not exist, I get the following error: AnalysisException: "Table or view 'table_to_drop' not found in database 'null';;\nDropTableCommand `table_to_drop...

Latest Reply
StevenWilliams
New Contributor II
  • 0 kudos

I agree that this is a usability bug. The documentation clearly states that if the optional IF EXISTS flag is provided, the statement will do nothing: https://docs.databricks.com/spark/latest/spark-sql/language-manual/drop-table.html Drop Table ...

rishigc
by New Contributor
  • 14098 Views
  • 1 replies
  • 0 kudos

Split a row into multiple rows based on a column value in Spark SQL

Hi, I am trying to split a record in a table into 2 records based on a column value. Please refer to the sample below. The input table displays the 3 types of Product and their price. Notice that for a specific Product (row) only its corresponding col...

Latest Reply
mathan_pillai
Valued Contributor
  • 0 kudos

Hi @rishigc You can use something like the code below. SELECT explode(arrays_zip(split(Product, '[+]'), split(Price, '[+]'))) AS product_and_price FROM df or df.withColumn("product_and_price", explode(arrays_zip(split(col("Product"), "[+]"), split(col("Price"), "[+]")))).select( ...

cfregly
by Contributor
  • 16982 Views
  • 15 replies
  • 0 kudos
Latest Reply
wildhogg
New Contributor II
  • 0 kudos

Well, after a little research, I found the post below. Hopefully this will help: "registerTempTable(): registerTempTable() creates an in-memory table that is scoped to the cluster in which it was created. The data is stored using Hive's high...

PranjalThapar
by New Contributor
  • 6779 Views
  • 4 replies
  • 0 kudos

Splitting Date into Year, Month and Day, with inconsistent delimiters

I am trying to split my Date column, which is currently a String type, into 3 columns: Year, Month and Day. I use (PySpark): split_date=pyspark.sql.functions.split(df['Date'], '-') df= df.withColumn('Year', split_date.getItem(0)) df= df.wit...

Latest Reply
youssefassouli
New Contributor II
  • 0 kudos

Thank you so much, that was helpful.

shampa
by New Contributor
  • 4731 Views
  • 1 replies
  • 0 kudos

How can we compare two DataFrames in Spark Scala to find the differences between these 2 files (which column and which value)?

I have two files, and I created two DataFrames, prod1 and prod2, from them. I need to find the records, with column names and values, that do not match between the two DataFrames. id_sk is the primary key. All the columns are of string data type. dataframe 1 (prod1) id_...

Latest Reply
manojlukhi
New Contributor II
  • 0 kudos

Use a full outer join in Spark SQL.

senthilkumar
by New Contributor
  • 16829 Views
  • 1 replies
  • 0 kudos

How does a filter condition work in a Spark DataFrame?

I have a table in HBase with 1 billion records. I want to filter the records based on a certain condition (by date). For example: Dataframe.filter(col(date) === todayDate). Will the filter be applied only after all records from the table have been loaded into me...

Latest Reply
muk1
New Contributor II
  • 0 kudos

Hello @senthil kumar, To pass external values to the filter (or where) transformations, you can use the "lit" function in the following way: Dataframe.filter(col(date) == lit(todayDate)). I don't know if that helps. Be careful with the schema inferred by th...

kelleyrw
by New Contributor II
  • 9464 Views
  • 7 replies
  • 0 kudos

Resolved! How do I register a UDF that returns an array of tuples in scala/spark?

I'm relatively new to Scala. In the past, I was able to do the following in Python: def foo(p1, p2): import datetime as dt dt.datetime(2014, 4, 17, 12, 34) result = [ (1, "1", 1.1, dt.datetime(2014, 4, 17, 1, 0)), (2, "2", 2...

Latest Reply
__max
New Contributor III
  • 0 kudos

Hello, Just in case, here is an example of the proposed solution above: import org.apache.spark.sql.functions._ import org.apache.spark.sql.expressions._ import org.apache.spark.sql.types._ val data = Seq(("A", Seq((3,4),(5,6),(7,10))), ("B", Seq((-1,...

dheeraj
by New Contributor II
  • 4978 Views
  • 3 replies
  • 0 kudos

How to calculate the percentile of a column in a DataFrame in Spark?

I am trying to calculate the percentile of a column in a DataFrame. I can't find any percentile_approx function in Spark's aggregation functions. For example, in Hive we have percentile_approx, and we can use it in the following way: hiveContext.sql("select per...

Latest Reply
amandaphy
New Contributor II
  • 0 kudos

You can try using df.registerTempTable("tmp_tbl") val newDF = sql(/* do something with tmp_tbl */) // and continue using newDF

johnmcauley
by New Contributor II
  • 9728 Views
  • 2 replies
  • 0 kudos

How do I escape a query string in Spark SQL?

Hey all, I am trying to filter on a string, but the string has a single quote. How do I escape the string in Scala? I have tried an old version of StringEscapeUtils but had no luck. Sorry if this is a silly question; I'm new to Scala. import org.apache.commons.lan...

Latest Reply
antoniosarco
New Contributor II
  • 0 kudos

Generally, when you deal with an apostrophe, you replace the single quote (') with two single quotes (''). Antonio

cfregly
by Contributor
  • 10501 Views
  • 1 replies
  • 0 kudos
Latest Reply
cfregly
Contributor
  • 0 kudos

Sorted Data
If your data is sorted using either sort() or ORDER BY, these operations will be deterministic and return either the 1st element using first()/head() or the top-n using head(n)/take(n). show()/show(n) return Unit (void) and will print up to...
