Data Engineering

Forum Posts

RaghuMundru
by New Contributor III
  • 26919 Views
  • 15 replies
  • 0 kudos

Resolved! I am running simple count and I am getting an error

Here is the error that I am getting when I run the following query: statement = sqlContext.sql("SELECT count(*) FROM ARDATA_2015_09_01").show() Py4JJavaError Traceback (most rec...

14 More Replies
pepevo
by New Contributor III
  • 10095 Views
  • 10 replies
  • 0 kudos

Resolved! How to convert column type from decimal to date in sparksql

I need to convert a column type from decimal to date in Spark SQL when the format is not yyyy-mm-dd. A table contains a column declared as decimal (38,0) whose data is in yyyymmdd format, and I am unable to run SQL queries on it in a Databricks notebook. ...
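
A minimal sketch of one way to approach this conversion, assuming Spark 2.2 or later, where `to_date` accepts a format pattern. The table and column names (`my_table`, `dt_dec`) and the helper itself are hypothetical. The helper only builds the query string, which would then be passed to `spark.sql(...)` in a notebook.

```python
def yyyymmdd_decimal_to_date_sql(table, column):
    """Build a Spark SQL query that reads a DECIMAL(38,0) yyyymmdd column as a DATE.

    The decimal is cast to STRING first (scale 0, so no decimal point appears),
    then parsed with the 'yyyyMMdd' pattern.
    """
    return (
        f"SELECT to_date(CAST({column} AS STRING), 'yyyyMMdd') AS {column}_as_date "
        f"FROM {table}"
    )


# Hypothetical names, for illustration only.
query = yyyymmdd_decimal_to_date_sql("my_table", "dt_dec")
print(query)
# In a Databricks notebook: spark.sql(query).show()
```

Casting to a string before parsing matters here, because the format pattern applies to text rather than to the numeric value.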

Latest Reply
pepevo
New Contributor III
  • 0 kudos

thank you Tom. I made it work already.

9 More Replies
User16301467532
by New Contributor II
  • 16871 Views
  • 9 replies
  • 1 kudos

How can I change the parquet compression algorithm from gzip to something else?

Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.

Latest Reply
ZhenZeng
New Contributor II
  • 1 kudos

spark.sql("set spark.sql.parquet.compression.codec=gzip");
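
Note that the line above sets the codec to gzip. To switch to snappy, as the question asks, the same setting takes a different value. A minimal sketch, assuming a `SparkSession` named `spark` as provided in Databricks notebooks; both helper names are hypothetical, and which codec names are accepted depends on the Spark version.

```python
def set_parquet_codec(spark, codec="snappy"):
    """Set the session-wide default compression codec for Parquet writes."""
    spark.conf.set("spark.sql.parquet.compression.codec", codec)


def write_parquet_with_codec(df, path, codec="snappy"):
    """Override the codec for a single write, leaving the session default alone."""
    df.write.option("compression", codec).parquet(path)


# In a notebook (spark and df are assumed to exist there):
#   set_parquet_codec(spark)                      # all later Parquet writes use snappy
#   write_parquet_with_codec(df, "/tmp/out.parquet")  # hypothetical output path
```

The per-write option is handy when only one output should differ from the session default.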

8 More Replies
MikeK_
by New Contributor II
  • 13081 Views
  • 1 reply
  • 0 kudos

Resolved! SQL variables in a notebook

Hi, In an SQL notebook, using this link: https://docs.databricks.com/spark/latest/spark-sql/language-manual/set.html I managed to figure out how to set values and how to get the value. SET my_val=10; //saves the value 10 for key my_val SET my_val; //dis...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Mike K.., you can do this with widgets and getArgument. Here's a small example of what that might look like: https://community.databricks.com/s/feed/0D53f00001HKHZfCAP
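
A rough sketch of the widget approach mentioned above, assuming a Databricks notebook where `dbutils` is available as a built-in. The widget name `my_val` mirrors the question; the helper function is hypothetical.

```python
def define_and_read_widget(dbutils, name="my_val", default="10"):
    """Create a text widget with a default value and return its current value.

    Widget values come back as strings, so convert them where a number is needed.
    """
    dbutils.widgets.text(name, default)
    return dbutils.widgets.get(name)


# In a Python cell of a Databricks notebook (dbutils is assumed to exist there):
#   my_val = int(define_and_read_widget(dbutils))
```

From a SQL cell, the same widget should then be readable with `getArgument`, for example `SELECT getArgument("my_val")`.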

tripplehay777
by New Contributor
  • 10459 Views
  • 1 reply
  • 0 kudos

How can I create a Table from a CSV file with first column with data in dictionary format (JSON like)?

I have a csv file with the first column containing data in dictionary form (keys: value). [see below] I tried to create a table by uploading the csv file directly to databricks but the file can't be read. Is there a way for me to flatten or conver...

Latest Reply
MaxStruever
New Contributor II
  • 0 kudos

This is apparently a known issue. Databricks has its own CSV format handler that can handle this: https://github.com/databricks/spark-csv SQL API: the CSV data source for Spark can infer data types: CREATE TABLE cars USING com.databricks.spark.csv OP...

martinch
by New Contributor II
  • 8411 Views
  • 4 replies
  • 0 kudos

DROP TABLE IF EXISTS does not work

When I try to run the command spark.sql("DROP TABLE IF EXISTS table_to_drop") and the table does not exist, I get the following error: AnalysisException: "Table or view 'table_to_drop' not found in database 'null';;\nDropTableCommand `table_to_drop...

Latest Reply
StevenWilliams
New Contributor II
  • 0 kudos

I agree about this being a usability bug. Documentation clearly states that if the optional flag "IF EXISTS" is provided, the statement will do nothing. https://docs.databricks.com/spark/latest/spark-sql/language-manual/drop-table.html Drop Table ...

3 More Replies
rishigc
by New Contributor
  • 12132 Views
  • 1 reply
  • 0 kudos

Split a row into multiple rows based on a column value in Spark SQL

Hi, I am trying to split a record in a table to 2 records based on a column value. Please refer to the sample below. The input table displays the 3 types of Product and their price. Notice that for a specific Product (row) only its corresponding col...

Latest Reply
mathan_pillai
Valued Contributor
  • 0 kudos

Hi @rishigc, you can use something like the following. SELECT explode(arrays_zip(split(Product, '[+]'), split(Price, '[+]'))) AS product_and_price FROM df or df.withColumn("product_and_price", explode(arrays_zip(split(Product, '[+]'), split(Price, '[+]')))).select( ...
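
To make the intent of this suggestion concrete, here is a plain-Python model of what `split`, `arrays_zip` and `explode` do to a single row, assuming (as the reply does) that `Product` and `Price` hold '+'-delimited values. The sample row is invented for illustration and is not the poster's data; the function name is hypothetical.

```python
def explode_row(product, price, sep="+"):
    """Return one (product, price) pair per delimited element.

    This mirrors explode(arrays_zip(split(...), split(...))) for one input row.
    Python's zip stops at the shorter list, so uneven inputs lose the extras here.
    """
    products = product.split(sep)
    prices = price.split(sep)
    return list(zip(products, prices))


rows = explode_row("A+B", "100+200")
print(rows)  # [('A', '100'), ('B', '200')]
```

Unlike `str.split` in Python, Spark's `split` treats its delimiter as a regular expression, so a literal `+` has to be escaped or wrapped in a character class there.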

dan11
by New Contributor II
  • 2384 Views
  • 4 replies
  • 1 kudos

sql delete?

Hello databricks people, I started working with databricks today. I have a sql script which I developed with sqlite3 on a laptop. I want to port the script to databricks. I started with two sql statements: select count(prop_id) from prop0; del...

Latest Reply
Bill_Chambers
Contributor II
  • 1 kudos

Hey Dan, good to hear you're getting started with Databricks. This is not a limitation of Databricks; it's a restriction built into Spark itself. Spark is not a data store; it's a distributed computation framework. Therefore deleting data would be un...

3 More Replies
Tamara
by New Contributor III
  • 8772 Views
  • 8 replies
  • 1 kudos

Resolved! Can I connect to a MS SQL server table in Databricks account?

I'd like to access a table on a MS SQL Server (Microsoft). Is it possible from Databricks? To my understanding, the syntax is something like this (in a SQL Notebook): CREATE TEMPORARY TABLE jdbcTable USING org.apache.spark.sql.jdbc OPTIONS ( url...
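
A hedged sketch of how the connection options could be assembled for Spark's JDBC data source, assuming the Microsoft SQL Server JDBC driver is available on the cluster. The host, database, table and credentials are placeholders, and the helper name is hypothetical.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Return the option dict for Spark's JDBC data source against SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Placeholder connection details, for illustration only.
opts = sqlserver_jdbc_options("myhost.example.com", "mydb", "dbo.mytable", "me", "secret")
print(opts["url"])
# In a notebook: df = spark.read.format("jdbc").options(**opts).load()
```

Keeping the options in one dict makes it easier to load the password from a secret store instead of hard-coding it.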

7 More Replies
semihcandoken
by New Contributor
  • 13642 Views
  • 4 replies
  • 0 kudos

How to convert column type from str to date in sparksql when the format is not yyyy-mm-dd?

I imported a large csv file into databricks as a table. I am able to run sql queries on it in a databricks notebook. In my table, I have a column that contains date information in the mm/dd/yyyy format: 12/29/2015, 12/30/2015, etc... Databricks impo...

Latest Reply
ShubhamGupta187
New Contributor II
  • 0 kudos

@josephpconley would it be safe to cast a column that contains null values?

3 More Replies
max522over
by New Contributor II
  • 12497 Views
  • 3 replies
  • 0 kudos

Resolved! I've set the partition mode to nonstrict in hive but spark is not seeing it

I've got a table I want to add some data to, and it's partitioned. I want to use dynamic partitioning but I get this error: org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off ...
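
The accepted answer is not shown in this listing, so the following is only a sketch of one common fix: apply the Hive settings on the Spark session itself rather than in Hive. It assumes a `SparkSession` named `spark` (on older versions, `sqlContext.sql` plays the same role); the helper name is hypothetical.

```python
def enable_dynamic_partitioning(spark):
    """Allow INSERTs where every partition column is resolved dynamically."""
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")


# In a notebook (spark is assumed to exist there):
#   enable_dynamic_partitioning(spark)
#   spark.sql("INSERT INTO ... PARTITION (...) SELECT ...")  # statement elided
```

The settings are session-scoped, so they need to run in the same session as the INSERT.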

Latest Reply
max522over
New Contributor II
  • 0 kudos

I got it working. This was exactly what I needed. Thank you @Peyman Mohajerian

2 More Replies
dan11
by New Contributor II
  • 3284 Views
  • 1 reply
  • 1 kudos

sql: how to convert datatype of column?

Bricklayers, I want to port this sql statement from sqlite to databricks: select cast(myage as number) as my_integer_age from ages; Does databricks allow me to do something like this?

Latest Reply
raela
New Contributor III
  • 1 kudos

@dan11 We don't support number in Spark SQL. Try using int, double, float, and your query should be fine. To run SQL in a notebook, just prepend any cell with %sql. %sql select cast(myage as double) as my_integer_age from ages;

Anonymous
by Not applicable
  • 10108 Views
  • 2 replies
  • 0 kudos

How can I use display() in a python notebook with pyspark.sql.Row Objects, e.g. after calling the first() operation on a DataFrame?

I'm trying to display() the results from calling first() on a DataFrame, but display() doesn't work with pyspark.sql.Row objects. How can I display this result?

Latest Reply
dnchari
New Contributor II
  • 0 kudos

Use take()
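
Another option, sketched here under the assumption that `display()` needs a DataFrame rather than a `Row`, is to keep the result as a one-row DataFrame with `limit(1)`. The helper name is hypothetical, and `display` is the Databricks notebook built-in.

```python
def first_row_for_display(df):
    """Return a one-row DataFrame instead of a Row, so display() can render it."""
    return df.limit(1)


# In a Databricks notebook (df is assumed to exist there):
#   display(first_row_for_display(df))
```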

1 More Replies