Topics with Label: Data Ingestion & connectivity

Forum Posts

Sorted by:

by Nik • New Contributor III

09-04-2018 10:03:05 AM

21784 Views
19 replies
0 kudos

write from a Dataframe to a CSV file, CSV file is blank

Hi i am reading from a text file from a blob val sparkDF = spark.read.format(file_type) .option("header", "true") .option("inferSchema", "true") .option("delimiter", file_delimiter) .load(wasbs_string + "/" + PR_FileName) Then i test my Datafra...

Data Engineering

21784 Views
19 replies
0 kudos

09-04-2018 10:03:05 AM

View Replies

Latest Reply

nl09
New Contributor II

06-25-2020 9:15:52 AM

0 kudos

Create temp folder inside output folder. Copy file part-00000* with the file name to output folder. Delete the temp folder. Python code snippet to do the same. fpath=output+'/'+'temp' def file_exists(path): try: dbutils.fs.ls(path) return...

0 kudos

06-25-2020 9:15:52 AM

18 More Replies

by ubsingh • New Contributor II

11-07-2019 3:44:50 AM

13320 Views
3 replies
1 kudos

Resolved! I want to create a function in azure Databricks notebook to send a email, based on a filter. Any leads are appriciated.

I have no idea from where to start

Data Engineering

13320 Views
3 replies
1 kudos

11-07-2019 3:44:50 AM

View Replies

Latest Reply

ubsingh
New Contributor II

11-13-2019 1:05:26 AM

1 kudos

Thanks for you help @leedabee. I will go through second option, First one is not applicable in my case.

1 kudos

11-13-2019 1:05:26 AM

2 More Replies

by Seenu45 • New Contributor II

10-29-2019 7:31:50 AM

7839 Views
3 replies
1 kudos

Resolved! JavaPackage' object is not callable :: Mlean

Hi Folks, We are working on production Databricks project using Mleap. when run below code on databricks, it throws error like " 'JavaPackage' object is not callable" code :import mleap.pyspark from mleap.pyspark.spark_support import SimpleSparkSer...

Data Engineering

7839 Views
3 replies
1 kudos

10-29-2019 7:31:50 AM

View Replies

Latest Reply

Seenu45
New Contributor II

10-30-2019 1:41:57 AM

1 kudos

Thanks syamspr. it is working now.

1 kudos

10-30-2019 1:41:57 AM

2 More Replies

by Van-DuyetLe • New Contributor

03-21-2018 12:38:32 AM

49337 Views
5 replies
3 kudos

What's the difference between Interactive Clusters and Job Cluster?

I am new to databricks. I would like to know what is the difference between Interactive Clusters and Job Cluster? There are no official document now.

Data Engineering

49337 Views
5 replies
3 kudos

03-21-2018 12:38:32 AM

View Replies

Latest Reply

Forum_Admin
Databricks Employee

01-19-2020 6:53:28 AM

3 kudos

Sports news Football news International football news Football news Thai football news, Thai football Follow news, know sports news at Siamsportnews

3 kudos

01-19-2020 6:53:28 AM

4 More Replies

by NandhaKumar • New Contributor II

11-14-2019 2:34:57 AM

7931 Views
3 replies
0 kudos

How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .

I have created a databricks in azure. I have created a cluster for python 3. I am creating a job using spark-submit parameters. How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in ...

Data Engineering

7931 Views
3 replies
0 kudos

11-14-2019 2:34:57 AM

View Replies

Latest Reply

shyam_9
Databricks Employee

11-17-2019 9:46:20 PM

0 kudos

Hi @Nandha Kumar,please go through the below docs to pass python files as job,https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask

0 kudos

11-17-2019 9:46:20 PM

2 More Replies

by AnandJ_Kadhi • New Contributor II

08-18-2017 5:47:44 AM

8604 Views
2 replies
1 kudos

Handle comma inside cell of CSV

We are using spark-csv_2.10 > version 1.5.0 and reading the csv file column which contains comma " , " as one of the character. The problem we are facing is like that it treats the rest of line after the comma as new column and data is not interpre...

Data Engineering

8604 Views
2 replies
1 kudos

08-18-2017 5:47:44 AM

View Replies

Latest Reply

User16857282152
Databricks Employee

11-01-2019 10:27:53 AM

1 kudos

Take a look here for options, http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframereader#pyspark.sql.DataFrameReader.csv If a csv file has commas then the tradition is to quote the string that contains the comma, In ...

1 kudos

11-01-2019 10:27:53 AM

1 More Replies

by rba76 • New Contributor

10-21-2019 1:42:56 AM

22203 Views
2 replies
0 kudos

Python spark.read.text Path does not exist

Dear all, I want to read files with python from a storage account. I followed this instruction https://docs.microsoft.com/en-us/azure/azure-databricks/store-secrets-azure-key-vault. This is my python code: dbutils.fs.mount(source = "wasbs://contain...

Data Engineering

22203 Views
2 replies
0 kudos

10-21-2019 1:42:56 AM

View Replies

Latest Reply

PRADEEPCHEEKATL
New Contributor II

10-25-2019 3:30:52 AM

0 kudos

@rba76 Make sure helloworld.txt file exists in the container1 folderI'm able to view the text file using the same commands as follows:Mount Blob Storage:dbutils.fs.mount( source = "wasbs://sampledata@azure.blob.core.windows.net/Azure", mount_po...

0 kudos

10-25-2019 3:30:52 AM

1 More Replies

by desai_n_3 • New Contributor II

10-03-2019 1:41:05 AM

18875 Views
6 replies
0 kudos

Cannot Convert Column to Bool Error - When Converting dataframe column which is in string to date type in python

Hi All, I am trying to convert a dataframe column which is in the format of string to date type format yyyy-MM-DD? I have written a sql query and stored it in dataframe. df3 = sqlContext.sql(sqlString2) df3.withColumn(df3['CalDay'],pd.to_datetime(df...

Data Engineering

18875 Views
6 replies
0 kudos

10-03-2019 1:41:05 AM

View Replies

Latest Reply

JoshuaJames
New Contributor II

10-24-2019 11:06:21 PM

0 kudos

Registered to post this so forgive the formatting nightmare This is a python databricks script function that allows you to convert from string to datetime or date and utilising coalescefrom pyspark.sql.functions import coalesce, to_date def to_dat...

0 kudos

10-24-2019 11:06:21 PM

5 More Replies

by akj2784 • New Contributor II

09-19-2019 12:05:10 AM

10259 Views
5 replies
0 kudos

How to create a dataframe with the files from S3 bucket

I have connected my S3 bucket from databricks. Using the following command : import urllib import urllib.parse ACCESS_KEY = "Test" SECRET_KEY = "Test" ENCODED_SECRET_KEY = urllib.parse.quote(SECRET_KEY, "") AWS_BUCKET_NAME = "Test" MOUNT_NAME = "...

Data Engineering

10259 Views
5 replies
0 kudos

09-19-2019 12:05:10 AM

View Replies

Latest Reply

shyam_9
Databricks Employee

09-19-2019 12:13:35 AM

0 kudos

Hi @akj2784,Please go through Databricks documentation on working with files in S3,https://docs.databricks.com/spark/latest/data-sources/aws/amazon-s3.html#mount-s3-buckets-with-dbfs

0 kudos

09-19-2019 12:13:35 AM

4 More Replies

by AdityaDeshpande • New Contributor II

08-25-2019 4:47:41 AM

6758 Views
2 replies
0 kudos

How to maintain Primary Key Column in Databricks Delta Multi Cluster environment

I am trying to replicate the SQL DB like feature of maintaining the Primary Keys in Databrciks Delta approach where the data is being written to Blob Storage such as ADLS2 oe AWS S3. I want a Auto Incremented Primary key feature using Databricks Del...

Data Engineering

6758 Views
2 replies
0 kudos

08-25-2019 4:47:41 AM

View Replies

Latest Reply

girivaratharaja
New Contributor III

08-26-2019 7:13:23 AM

0 kudos

Hi @Aditya Deshpande There is no locking mechanism of PK in Delta. You can use row_number() function on the df and save using delta and do a distinct() before the write.

0 kudos

08-26-2019 7:13:23 AM

1 More Replies

by Maser_AZ • New Contributor II

08-21-2019 3:15:22 PM

19541 Views
1 replies
0 kudos

NameError: name 'col' is not defined

I m executing the below code and using Pyhton in notebook and it appears that the col() function is not getting recognized . I want to know if the col() function belongs to any specific Dataframe library or Python library .I dont want to use pyspark...

Data Engineering

19541 Views
1 replies
0 kudos

08-21-2019 3:15:22 PM

View Replies

Latest Reply

MOHAN_KUMARL_N
New Contributor II

08-22-2019 2:18:12 AM

0 kudos

@mudassar45@gmail.com as the document describe generic column not yet associated. Please refer the below code. display(peopleDF.select("firstName").filter("firstName = 'An'"))

0 kudos

08-22-2019 2:18:12 AM

by Tamara • New Contributor III

11-03-2015 4:01:50 AM

18459 Views
8 replies
2 kudos

Resolved! Can I connect to a MS SQL server table in Databricks account?

I'd like to access a table on a MS SQL Server (Microsoft). Is it possible from Databricks? To my understanding, the syntax is something like this (in a SQL Notebook): CREATE TEMPORARY TABLE jdbcTable USING org.apache.spark.sql.jdbc OPTIONS ( url...

Data Engineering

18459 Views
8 replies
2 kudos

11-03-2015 4:01:50 AM

View Replies

Latest Reply

JohnSmith091
New Contributor II

11-27-2018 1:19:31 AM

2 kudos

Thanks for the trick that you have shared with us. I am really amazed to use this informational post. If you are facing MacBook error like MacBook Pro won't turn on black screen then click the link.

2 kudos

11-27-2018 1:19:31 AM

7 More Replies

by bkr • New Contributor

06-08-2018 6:29:45 AM

7186 Views
1 replies
0 kudos

How to move files of same extension in databricks files system?

I am facing file not found exception when i am trying to move the file with * in DBFS. Here both source and destination directories are in DBFS. I have the source file named "test_sample.csv" available in dbfs directory and i am using the command li...

Data Engineering

7186 Views
1 replies
0 kudos

06-08-2018 6:29:45 AM

View Replies

Latest Reply

ricardo_portill
Databricks Employee

06-08-2018 9:45:33 AM

0 kudos

@bkr, you can reference the file name using dbutils and then pass this to the move command. Here's an example for this in Scala: val fileNm = dbutils.fs.ls("/usr/krishna/sample").map(_.name).filter(r => r.startsWith("test"))(0) val fileLoc = "dbfs:/...

0 kudos

06-08-2018 9:45:33 AM

by XinZodl • New Contributor III

11-03-2017 12:01:16 AM

22872 Views
3 replies
1 kudos

Resolved! How to parse a file with newline character, escaped with \ and not quoted

Hi! I am facing an issue when reading and parsing a CSV file. Some records have a newline symbol, "escaped" by a \, and that record not being quoted. The file might look like this: Line1field1;Line1field2.1 \ Line1field2.2;Line1field3; Line2FIeld1;...

Data Engineering

22872 Views
3 replies
1 kudos

11-03-2017 12:01:16 AM

View Replies

Latest Reply

XinZodl
New Contributor III

11-07-2017 11:59:09 PM

1 kudos

Solution is "sparkContext.wholeTextFiles"

1 kudos

11-07-2017 11:59:09 PM

2 More Replies

by letsflykite • New Contributor II

07-31-2015 10:25:03 PM

21052 Views
2 replies
1 kudos

How to increase spark.kryoserializer.buffer.max

when I join two dataframes, I got the following error. org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1 Serialization trace: values (org.apache.spark.sql.catalyst.expressions.GenericRow) otherEle...

Data Engineering

21052 Views
2 replies
1 kudos

07-31-2015 10:25:03 PM

View Replies

Latest Reply

Jose_Maria_Tala
New Contributor II

08-03-2017 6:34:42 AM

1 kudos

val conf = new SparkConf() ... conf.set("spark.kryoserializer.buffer.max.mb", "512") ...

1 kudos

08-03-2017 6:34:42 AM

1 More Replies