Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Venky
by New Contributor III
  • 74675 Views
  • 18 replies
  • 19 kudos

Resolved! I am trying to read a CSV file using Databricks and I am getting an error like: FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/world_bank.csv'

Latest Reply
Alexis
New Contributor III
  • 19 kudos

Hi, you can try:
my_df = spark.read.format("csv")
    .option("inferSchema", "true")   # to get the types from your data
    .option("sep", ",")              # if your file is using "," as separator
    .option("header", "true")        # if you...
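The forum truncates the snippet; a complete version of that pattern might look like the following sketch (the path is the one from the original question, and note that spark.read takes the dbfs:/ URI, while the /dbfs/... form in the error is the local fuse path used by non-Spark readers):

my_df = (
    spark.read.format("csv")
    .option("inferSchema", "true")   # infer column types from the data
    .option("sep", ",")              # "," is the field separator
    .option("header", "true")        # first row holds the column names
    .load("dbfs:/FileStore/tables/world_bank.csv")
)
my_df.show(5)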

17 More Replies
RantoB
by Valued Contributor
  • 21405 Views
  • 7 replies
  • 4 kudos

Resolved! read csv directly from url with pyspark

I would like to load a CSV file directly into a Spark DataFrame in Databricks. I tried the following code:
url = "https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-national-tr/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_fo...

Latest Reply
MartinIsti
New Contributor III
  • 4 kudos

I know it's a 2-year-old thread, but I needed to find a solution to this very thing today. I had one notebook using SparkContext:
from pyspark import SparkFiles
from pyspark.sql.functions import *
sc.addFile(url)
But according to the runtime 14 release n...
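The reply is truncated, but the SparkFiles pattern it refers to is a known one; a sketch (the URL is shortened to a placeholder, the basename passed to get() is assumed, and sc requires a runtime where SparkContext is still exposed, which the reply says changed around runtime 14):

from pyspark import SparkFiles

url = "https://example.org/data.csv"      # placeholder for the opendata URL above
sc.addFile(url)                           # download and distribute the file
local_path = "file://" + SparkFiles.get("data.csv")   # basename of the URL
df = spark.read.csv(local_path, header=True, inferSchema=True)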

6 More Replies
Jayanth746
by New Contributor III
  • 14963 Views
  • 9 replies
  • 4 kudos

Kafka unable to read client.keystore.jks.

Below is the error we have received when trying to read the stream:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks
Caused by: java.nio.file.NoSuchFileException: /dbfs...

Latest Reply
mwoods
New Contributor III
  • 4 kudos

Ok, scrub that - the problem in my case was that I was using the 14.0 Databricks Runtime, which appears to have a bug relating to abfss paths here. Switching back to the 13.3 LTS release resolved it for me. So if you're in the same boat finding abfss...
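For reference, a minimal structured-streaming Kafka read with an SSL keystore looks roughly like this (a sketch: broker, topic, and password are placeholders, the keystore path is the one from the question, and the kafka.-prefixed options are passed through to the Kafka client):

keystore_password = "changeit"  # placeholder secret
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9093")   # placeholder broker
    .option("subscribe", "my_topic")                      # placeholder topic
    .option("kafka.security.protocol", "SSL")
    .option("kafka.ssl.keystore.location", "/dbfs/FileStore/Certs/client.keystore.jks")
    .option("kafka.ssl.keystore.password", keystore_password)
    .load()
)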

8 More Replies
CrisCampos
by New Contributor II
  • 3168 Views
  • 1 reply
  • 1 kudos

How to load a "pickle/joblib" file on Databricks

Hi Community, I am trying to load a joblib file on Databricks, but it doesn't seem to be working. I'm getting an error message: "Incompatible format detected". Any idea how to load this type of file on Databricks? Thanks!

Latest Reply
tapash-db
Databricks Employee
  • 1 kudos

You can import the joblib/joblibspark packages to load joblib files.
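A minimal sketch of that route (the path is a placeholder; the "Incompatible format detected" message usually means a Spark/Delta reader was pointed at the file, whereas a joblib artifact should be opened with joblib itself through the /dbfs fuse path):

import joblib

# Load a joblib artifact via the local /dbfs fuse mount
# (placeholder path; the file must have been written with joblib.dump).
model = joblib.load("/dbfs/FileStore/models/my_model.joblib")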

carlosjrestr
by New Contributor III
  • 3034 Views
  • 1 reply
  • 1 kudos

Does Unity Catalog on Azure require premium blob storage tier?

From the docs I read: "Create a storage container where the metastore's managed table data will be stored. This storage container must be in a Premium performance Azure Data Lake Storage Gen2 account in the same region as the workspaces you want to us...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Carlos Restrepo, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to o...

Tracy_
by New Contributor II
  • 9310 Views
  • 5 replies
  • 0 kudos

Incorrect reading of CSV with inferSchema

Hi All, there is a CSV with a column ID (format: 8 digits & "D" at the end). When trying to read the CSV with .option("inferSchema", "true"), it returns the ID as a double and trims the "D". Is there any idea (apart from inferSchema=False) to get the correct ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...
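The accepted fix isn't quoted in this digest; a common workaround for the underlying question is to bypass inference for that column and declare it as a string in an explicit schema, for example (path and column names are assumed):

from pyspark.sql.types import StructType, StructField, StringType

# Keep the ID column as text so the trailing "D" survives; declare the
# remaining columns with their real types in place of the comment below.
schema = StructType([
    StructField("ID", StringType(), True),
    # StructField(..., ..., True),
])
df = spark.read.csv("dbfs:/path/to/data.csv", header=True, schema=schema)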

4 More Replies
ima94
by New Contributor II
  • 4693 Views
  • 1 reply
  • 1 kudos

read cdm error: java.util.NoSuchElementException: None.get

Hi all, I'm trying to read a CDM file and get the error in the image (I replaced the names with uppercase). Any ideas on how to solve it? Thank you!

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @imma marra, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

Fred_F
by New Contributor III
  • 6892 Views
  • 5 replies
  • 5 kudos

JDBC connection timeout on workflow cluster

Hi there, I've a batch process configured in a workflow which fails due to a JDBC timeout on a Postgres DB. I checked the JDBC connection configuration and it seems to work when I query a table and do a df.show() in the process, and it displays th...

Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

Hi @Fred Foucart, the above code looks good to me. Can you try the below code as well?
spark.read \
    .format("jdbc") \
    .option("url", f"jdbc:postgresql://{host}/{database}") \
    .option("driver", "org.postgresql.Driver") \
    .option("user", username) ...
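For the timeout itself, the PostgreSQL JDBC driver accepts connectTimeout and socketTimeout parameters on the URL, so a fuller version of the snippet might be (a sketch; host, database, credentials, and table are placeholders):

host, database = "db-host", "mydb"        # placeholders
username, password = "user", "secret"     # placeholders

df = (
    spark.read.format("jdbc")
    .option("url", f"jdbc:postgresql://{host}/{database}?connectTimeout=60&socketTimeout=600")
    .option("driver", "org.postgresql.Driver")
    .option("user", username)
    .option("password", password)
    .option("dbtable", "public.my_table")  # placeholder table
    .load()
)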

4 More Replies
BkP
by Contributor
  • 2109 Views
  • 2 replies
  • 3 kudos

Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment

Hi All, we are developing a new Scala/Java program which needs to read & process, in parallel, the raw data stored in the source ADLS (which is a Databricks environment), as the volume of the source data is very high (GBs & TBs). What kind of connection ...

Latest Reply
BkP
Contributor
  • 3 kudos

Hello experts, any advice on this question? Tagging some folks from whom I have received answers before. Please help with this requirement or tag someone who can help: @Kaniz Fatma, @Vartika Nain, @Bilal Aslam
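No answer is recorded in this digest; one commonly used pattern for the requirement is to read the bronze Delta files straight from ADLS with open-source Spark plus the delta-spark package (sketched in Python to match the rest of this page, though the same options apply from Scala; account, container, path, and credentials are placeholders):

from pyspark.sql import SparkSession

storage_key = "<access-key>"  # placeholder credential
spark = (
    SparkSession.builder.appName("bronze-reader")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("fs.azure.account.key.myaccount.dfs.core.windows.net", storage_key)
    .getOrCreate()
)
df = spark.read.format("delta").load(
    "abfss://bronze@myaccount.dfs.core.windows.net/raw/my_table"  # placeholder path
)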

1 More Replies
tanjil
by New Contributor III
  • 6728 Views
  • 6 replies
  • 6 kudos

Resolved! Read and transform CSVs in parallel.

I need to read and transform several CSV files and then append them to a single data frame. I am able to do this in Databricks using simple for loops, but I would like to speed this up. Below is the rough structure of my code:
for filepath in all_file...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hi @tanjil, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
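The accepted solution isn't quoted in this digest, but the usual way to avoid the Python for-loop is to hand spark.read the whole list of paths at once, so Spark reads the files in parallel into a single DataFrame (a sketch; all_files stands in for the list from the question):

from pyspark.sql.functions import input_file_name

all_files = ["dbfs:/data/a.csv", "dbfs:/data/b.csv"]  # placeholder list
# One call reads every file; Spark parallelizes across files and partitions.
df = spark.read.csv(all_files, header=True, inferSchema=True)
# Optional: record which file each row came from.
df = df.withColumn("source_file", input_file_name())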

5 More Replies
LearningDatabri
by Contributor II
  • 6837 Views
  • 7 replies
  • 2 kudos

Resolved! Unable to read file from S3

I tried to read a file from S3, but got the below error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 4 times, most recent failure: Lost task 0.3 in stage 53.0 (TID 82, xx.xx.xx.xx, executor 0): com...

Latest Reply
Sivaprasad1
Valued Contributor II
  • 2 kudos

Which DBR version are you using? Could you please test it with a different DBR version, probably DBR 9.x?

6 More Replies
sannycse
by New Contributor II
  • 3589 Views
  • 4 replies
  • 6 kudos

Resolved! Read the CSV file as shown in the description

Project_Details.csv:
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find, for each project, the list of employees working on it?
Output:
ProjectNo|employeeNo
100|[1,2]
101|...

Latest Reply
User16764241763
Honored Contributor
  • 6 kudos

@SANJEEV BANDRU, you can simply do this. Just change the file path:
CREATE TEMPORARY VIEW readcsv
USING CSV
OPTIONS (path "dbfs:/docs/test.csv", header "true", delimiter "|", mode "FAILFAST");
SELECT ProjectNo, collect_list(EmployeeNo) Employees
FROM re...
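The SQL is truncated above; an equivalent PySpark version of the same idea is (a sketch using the path and column names from the thread):

from pyspark.sql.functions import collect_list

df = spark.read.csv("dbfs:/docs/test.csv", header=True, sep="|")
result = df.groupBy("ProjectNo").agg(collect_list("EmployeeNo").alias("Employees"))
result.show()  # expected: 100 -> [1, 2], 101 -> [3, 1, 4]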

3 More Replies
Orianh
by Valued Contributor II
  • 7255 Views
  • 4 replies
  • 2 kudos

Resolved! Read JSON with backslash.

Hello guys, I'm trying to read a JSON file which contains backslashes, and I've failed to read it via PySpark. I've tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but PySpark fails to read it as te...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.
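No solution is quoted in this digest; one workaround consistent with the idea in the question is to load the file as plain text, normalize the backslashes, and parse the cleaned strings as JSON (a sketch; the path is a placeholder, and whether it helps depends on why the text read failed for the poster):

from pyspark.sql.functions import regexp_replace

raw = spark.read.text("dbfs:/path/to/data.json")   # one JSON record per line assumed
cleaned = raw.select(regexp_replace("value", r"\\", "/").alias("value"))
df = spark.read.json(cleaned.rdd.map(lambda row: row.value))   # parse repaired strings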

3 More Replies
User16783853906
by Contributor III
  • 2494 Views
  • 2 replies
  • 0 kudos

VACUUM during read/write

Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
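For reference, a typical call looks like this (a sketch with a placeholder table name; the RETAIN clause is optional, and by default Delta refuses to vacuum files younger than the 7-day retention window, which is what protects concurrent readers and writers):

# VACUUM only removes files no longer referenced and older than the retention
# window, so jobs working against recent table versions are unaffected.
spark.sql("VACUUM my_schema.my_table RETAIN 168 HOURS")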

1 More Replies