Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

bkr
by New Contributor
  • 6850 Views
  • 1 reply
  • 0 kudos

How to move files of the same extension in the Databricks file system?

I am facing a file not found exception when I am trying to move files with * in DBFS. Here both source and destination directories are in DBFS. I have the source file named "test_sample.csv" available in a dbfs directory and I am using the command li...

Latest Reply
ricardo_portill
New Contributor III
  • 0 kudos

@bkr, you can reference the file name using dbutils and then pass this to the move command. Here's an example for this in Scala:
val fileNm = dbutils.fs.ls("/usr/krishna/sample").map(_.name).filter(r => r.startsWith("test"))(0)
val fileLoc = "dbfs:/...

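A runnable sketch of the same approach in Python, for reference. The source directory comes from the thread; the destination directory and the "test" prefix filter are assumptions. dbutils is the built-in Databricks notebook utility, so no import is needed.

src_dir = "dbfs:/usr/krishna/sample"
dst_dir = "dbfs:/usr/krishna/archive"  # hypothetical destination

# dbutils.fs.mv does not expand wildcards, so list the directory and filter in Python.
for f in dbutils.fs.ls(src_dir):
    if f.name.startswith("test"):
        dbutils.fs.mv(f.path, dst_dir + "/" + f.name)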
rlgarris
by Databricks Employee
  • 9777 Views
  • 5 replies
  • 0 kudos

Resolved! How do I get a cartesian product of a huge dataset?

A cartesian product is a common operation to get the cross product of two tables. For example, say you have a list of customers and a list of your product catalog and want to get the cross product of all customer - product combinations. Cartesian pr...

Latest Reply
Forum_Admin
Contributor
  • 0 kudos

Hi buddies, it is great written piece entirely defined, continue the good work constantly.

4 More Replies
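Since the accepted answer is not shown in the preview, here is a minimal sketch of a cross product in PySpark, with made-up customer and product DataFrames; crossJoin has been the explicit operator for this since Spark 2.1.

customers = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["customer_id", "name"])
products = spark.createDataFrame([(10, "Widget"), (20, "Gadget")], ["product_id", "product"])

# crossJoin makes the cartesian product explicit instead of relying on a join with no condition.
combos = customers.crossJoin(products)
combos.show()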
Mahesha999
by New Contributor II
  • 6153 Views
  • 3 replies
  • 0 kudos

Resolving NoClassDefFoundError: org/apache/spark/Logging exception

I was trying out the hbase-spark connector. To start with, I am trying out this code. My pom dependencies are:
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version...

Latest Reply
User16301467518
New Contributor II
  • 0 kudos

The alpha of hbase-spark you're using depends on Spark 1.6 -- see hbase-spark/pom.xml:L33 -- so you'll probably have to stick with 1.6 if you want to use that published jar. For reasons I don't understand hbase-spark was removed in the last couple o...

2 More Replies
semihcandoken
by New Contributor
  • 17681 Views
  • 4 replies
  • 0 kudos

How to convert a column from string to date in Spark SQL when the format is not yyyy-mm-dd?

I imported a large csv file into Databricks as a table. I am able to run sql queries on it in a Databricks notebook. In my table, I have a column that contains date information in the mm/dd/yyyy format: 12/29/2015, 12/30/2015, etc... Databricks impo...

Latest Reply
ShubhamGupta187
New Contributor II
  • 0 kudos

@josephpconley would it be safe to cast a column that contains null values?

3 More Replies
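A sketch of one way to do the conversion in PySpark, assuming a hypothetical date_str column. to_date with an explicit pattern returns NULL for NULL (or unparseable) input, which also speaks to the follow-up question about null values.

from pyspark.sql import functions as F

df = spark.createDataFrame([("12/29/2015",), ("12/30/2015",), (None,)], ["date_str"])

# Parse the mm/dd/yyyy strings into a proper DateType column.
df = df.withColumn("date", F.to_date("date_str", "MM/dd/yyyy"))
df.show()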
Young_TackPark
by New Contributor
  • 19959 Views
  • 2 replies
  • 0 kudos

upload local files into DBFS

I am using Databricks Notebook Community Edition (2.36) and want to upload a local file into DBFS. Is there any simple Hadoop-like command such as "hadoop fs -put ..."? Any help would be appreciated.

Latest Reply
sushrutt_12
New Contributor II
  • 0 kudos

Python 2.7:
import urllib
urllib.urlretrieve("https://github.com/sushrutt12/DataSets/blob/master/final_chris.zip", "/tmp/chris_data.zip")
dbutils.fs.mv("file:/tmp/chris_data.zip", "dbfs:/data/chris_data.zip")
Python 3.x:
import urllib.request
url...

1 More Replies
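A Python 3 sketch of the same pattern, with a placeholder URL and paths since the snippet above is truncated: download to the driver's local disk, then copy the file into DBFS.

import urllib.request

# Download to the driver's local filesystem, then copy into DBFS.
urllib.request.urlretrieve("https://example.com/data.zip", "/tmp/data.zip")  # placeholder URL
dbutils.fs.cp("file:/tmp/data.zip", "dbfs:/data/data.zip")  # placeholder target path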
ArvindShyamsund
by New Contributor II
  • 12456 Views
  • 12 replies
  • 0 kudos

Resolved! Custom line separator

I see that https://github.com/apache/spark/pull/18581 will enable defining custom Line Separators for many sources, including CSV. Apart from waiting on this PR to make it into the main Databricks runtime, is there any other alternative to support d...

Latest Reply
DanielTomes
New Contributor II
  • 0 kudos

You can use newAPIHadoopFile. SCALA:
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
val conf = new Configuration
conf.s...

11 More Replies
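The same idea in PySpark, with a made-up input path and "|" as the custom record delimiter: the delimiter is passed to Hadoop's TextInputFormat through the job configuration.

conf = {"textinputformat.record.delimiter": "|"}  # custom line separator
rdd = sc.newAPIHadoopFile(
    "/path/to/input.txt",  # hypothetical path
    "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text",
    conf=conf,
)
records = rdd.map(lambda kv: kv[1])  # keep only the text value of each (offset, text) pair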
max522over
by New Contributor II
  • 18393 Views
  • 3 replies
  • 0 kudos

Resolved! I've set the partition mode to nonstrict in hive but spark is not seeing it

I've got a table I want to add some data to and it's partitioned. I want to use dynamic partitioning, but I get this error: org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off ...

Latest Reply
max522over
New Contributor II
  • 0 kudos

I got it working. This was exactly what I needed. Thank you @Peyman Mohajerian

2 More Replies
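For reference, a sketch of applying the nonstrict setting from within the Spark session itself, which is typically what is needed when the Hive-side setting is not picked up; the table and column names below are placeholders.

spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Fully dynamic partition insert (placeholder table/column names).
spark.sql("""
  INSERT INTO TABLE my_partitioned_table PARTITION (dt)
  SELECT col1, col2, dt FROM staging_table
""")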
PrasadGaikwad
by New Contributor
  • 11284 Views
  • 0 replies
  • 0 kudos

TypeError: Column is not iterable when using more than one column in withColumn()

I am trying to find the quarter start date from a date column. I get the expected result when I write it using selectExpr(), but when I add the same logic in .withColumn() I get TypeError: Column is not iterable. I am using a workaround as follows: workarou...

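Since the thread has no replies, here is a sketch of one way to avoid the error, using a hypothetical order_date column. date_trunc takes its format as a plain Python string, so it composes cleanly inside withColumn; passing a Column where a plain value is expected is the usual cause of "Column is not iterable".

from pyspark.sql import functions as F

df = spark.createDataFrame([("2017-05-20",)], ["order_date"])  # hypothetical data

# Truncate to the first day of the quarter; cast back to date for a clean result.
df = df.withColumn(
    "quarter_start",
    F.date_trunc("quarter", F.col("order_date").cast("timestamp")).cast("date"),
)
df.show()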
kkarthik
by New Contributor
  • 6438 Views
  • 1 reply
  • 0 kudos

I want to split a dataframe into one-week date ranges, with each week's data in a different column.

DF:
Q   Date(yyyy-mm-dd)
q1  2017-10-01
q2  2017-10-03
q1  2017-10-09
q3  2017-10-06
q2  2017-10-01
q1  2017-10-13
Q1  2017-10-02
Q3  2017-10-21
Q4  2017-10-17
Q5  2017-10-20
Q4  2017-10-31
Q2  2017-10-27
Q5  2017-10-01
Dataframe: ...

Latest Reply
User16857281974
Contributor
  • 0 kudos

It should just be a matter of applying the correct set of transformations: you can start by adding the week-of-year to each record with the command pyspark.sql.functions.weekofyear(..) and name it something like weekOfYear. See https://spark.apache.or...

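A sketch of the rest of that recipe, recreating a few rows of the thread's data: tag each row with its week of year, then pivot so each week becomes its own column.

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [("q1", "2017-10-01"), ("q2", "2017-10-03"), ("q1", "2017-10-09"), ("q3", "2017-10-06")],
    ["Q", "date"],
).withColumn("date", F.to_date("date"))

# One column per week of year; each cell collects that week's dates for the Q value.
weekly = (
    df.withColumn("weekOfYear", F.weekofyear("date"))
      .groupBy("Q")
      .pivot("weekOfYear")
      .agg(F.collect_list("date"))
)
weekly.show(truncate=False)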
SethuSrinivasan
by New Contributor II
  • 36864 Views
  • 0 replies
  • 2 kudos

Requesting support for "SELECT TOP n from Table"

In a notebook, it looks like if I need to select the top N rows, I can rely on the "LIMIT" keyword. It would be nice if you could support "TOP" as well. The current approach to select 10 rows: select * from table1 LIMIT 10. Requesting TOP support: SELECT TOP 10 *...

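In the meantime, both the SQL LIMIT form from the post and its DataFrame equivalent cover the same need (table1 is the post's example table name):

top10_sql = spark.sql("SELECT * FROM table1 LIMIT 10")  # SQL form
top10_df = spark.table("table1").limit(10)              # DataFrame API form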
XinZodl
by New Contributor III
  • 19125 Views
  • 3 replies
  • 1 kudos

Resolved! How to parse a file with newline character, escaped with \ and not quoted

Hi! I am facing an issue when reading and parsing a CSV file. Some records have a newline symbol, "escaped" by a \, and that record is not quoted. The file might look like this: Line1field1;Line1field2.1 \ Line1field2.2;Line1field3; Line2FIeld1;...

Latest Reply
XinZodl
New Contributor III
  • 1 kudos

The solution is "sparkContext.wholeTextFiles".

2 More Replies
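A sketch of how that solution can be applied in PySpark, assuming a made-up input path and the ;-separated fields from the example: read each file whole, undo the backslash-escaped newlines, then split into records and fields.

raw = sc.wholeTextFiles("/path/to/input")  # hypothetical path; yields (path, full file content)

records = (
    raw.map(lambda kv: kv[1])                                          # keep the file content
       .flatMap(lambda text: text.replace("\\\n", " ").splitlines())   # undo escaped newlines
       .map(lambda line: line.split(";"))                              # split into fields
)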
kelleyrw
by New Contributor II
  • 13920 Views
  • 7 replies
  • 0 kudos

Resolved! How do I register a UDF that returns an array of tuples in scala/spark?

I'm relatively new to Scala. In the past, I was able to do the following in Python:
def foo(p1, p2):
    import datetime as dt
    dt.datetime(2014, 4, 17, 12, 34)
    result = [
        (1, "1", 1.1, dt.datetime(2014, 4, 17, 1, 0)),
        (2, "2", 2...

Latest Reply
__max
New Contributor III
  • 0 kudos

Hello, just in case, here is an example for the proposed solution above:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.types._
val data = Seq(("A", Seq((3,4),(5,6),(7,10))), ("B", Seq((-1,...

6 More Replies
samalexg
by New Contributor III
  • 21953 Views
  • 13 replies
  • 1 kudos

How to add environment variable

Instead of setting the AWS accessKey and secretKey in hadoopConfiguration, I would like to add those as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. How can I do that in Databricks?

Latest Reply
jric
New Contributor II
  • 1 kudos

It is possible! I was able to confirm that the following post's "Best" answer works: https://forums.databricks.com/questions/11116/how-to-set-an-environment-variable.html FYI for @Miklos Christine and @Mike Trewartha

12 More Replies
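If the goal is only to make the variables visible to driver-side Python code, a minimal notebook-level sketch is below; the key values are placeholders, and executors or the JVM will not see variables set this way (those need cluster-level environment variable configuration).

import os

# Affects only the driver's Python process (and its child processes).
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-access-key>"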
KiranRastogi
by New Contributor
  • 43839 Views
  • 2 replies
  • 2 kudos

Pandas dataframe to a table

I want to write a pandas dataframe to a table; how can I do this? The write command is not working, please help.

Latest Reply
amy_wang
New Contributor II
  • 2 kudos

Hey Kiran, just taking a stab in the dark, but do you want to convert the Pandas DataFrame to a Spark DataFrame and then write out the Spark DataFrame as a non-temporary SQL table?
import pandas as pd
## Create Pandas Frame
pd_df = pd.DataFrame({u'20...

1 More Replies
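A complete version of that idea, with a made-up pandas frame and table name since the reply above is truncated:

import pandas as pd

pd_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})  # placeholder data

# Convert to a Spark DataFrame, then save it as a (non-temporary) managed table.
spark_df = spark.createDataFrame(pd_df)
spark_df.write.mode("overwrite").saveAsTable("my_table")  # placeholder table name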
