Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Kaijser (New Contributor II)
  • 1914 Views
  • 1 reply
  • 2 kudos

Installing private python Azure DevOps repository without revealing personal access token in pyproject.toml

I want to install a .whl file on my Databricks cluster which includes a private Azure DevOps repository as a dependency in its pyproject.toml file, i.e.:[project] name = "test" description = "test_description." version = "0.1.0" authors = [ { name ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Aaron Kaijser, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.

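One commonly suggested pattern (a hedged sketch, not a confirmed fix from this thread): keep the token out of pyproject.toml entirely and inject the private index at install time from a secret scope. The scope name, key, and feed URL below are hypothetical placeholders.

```python
# Sketch only: the secret scope "devops"/key "pat" and the feed URL are
# hypothetical; the point is never to hardcode the PAT in pyproject.toml.
import os
import subprocess

pat = dbutils.secrets.get(scope="devops", key="pat")  # available in notebooks
os.environ["PIP_EXTRA_INDEX_URL"] = (
    f"https://build:{pat}@pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/"
)
# pip now resolves the private dependency from the extra index, so the
# wheel's pyproject.toml can list it by name alone.
subprocess.check_call(
    ["pip", "install", "/dbfs/FileStore/wheels/test-0.1.0-py3-none-any.whl"]
)
```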
by Ajay-Pandey (Esteemed Contributor III)
  • 12155 Views
  • 7 replies
  • 11 kudos

Resolved! Unzip Files

Hi all, I am trying to unzip a file in Databricks but am facing an issue. Please help me if you have any docs or code to share.

Latest Reply
vivek_rawat
New Contributor III
  • 11 kudos

Hey Ajay, you can follow this module to unzip your zip file. To give you a brief idea: it will unzip your file directly into your driver node's storage. So if your compressed data is inside DBFS, then you first have to move that to the driver node and...

6 More Replies
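A minimal sketch of the flow described in the reply, with hypothetical paths: copy the archive from DBFS to the driver's local disk, extract it there, then copy the results back.

```python
# Hypothetical paths; zipfile only sees the driver's local filesystem,
# so stage the archive out of DBFS via the /dbfs FUSE mount first.
import shutil
import zipfile

shutil.copy("/dbfs/mnt/raw/archive.zip", "/tmp/archive.zip")  # DBFS -> driver
with zipfile.ZipFile("/tmp/archive.zip") as zf:
    zf.extractall("/tmp/archive")                             # extract locally
shutil.copytree("/tmp/archive", "/dbfs/mnt/raw/archive")      # driver -> DBFS
```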
by f2008700 (New Contributor III)
  • 15665 Views
  • 6 replies
  • 7 kudos

Configuring average parquet file size

I have S3 as a data source containing a sample TPC dataset (10G, 100G). I want to convert that into Parquet files with an average size of about ~256 MiB. What configuration parameter can I use to set that? I also need the data to be partitioned. And withi...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Vikas Goel, we haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to o...

5 More Replies
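For reference, one commonly used approach (a hedged sketch; Spark has no direct "average file size" setting for plain Parquet writes): estimate rows per 256 MiB from a profile of the data and cap output files with maxRecordsPerFile. Paths and the partition column are hypothetical.

```python
# Hypothetical paths/column; assumes profiling showed ~1 KiB per row on
# disk, so ~262,144 rows lands near the 256 MiB target per file.
rows_per_file = (256 * 1024 * 1024) // 1024

df = spark.read.parquet("s3://bucket/tpc-input/")
(df.write
   .option("maxRecordsPerFile", rows_per_file)  # caps each output file
   .partitionBy("order_date")                   # hypothetical partition column
   .mode("overwrite")
   .parquet("s3://bucket/tpc-parquet/"))
```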
by Enzo_Bahrami (New Contributor III)
  • 3154 Views
  • 2 replies
  • 0 kudos

Resolved! Input File Path from Autoloader in Delta Live Tables

Hello everyone! I was wondering if there is any way to get the subdirectories in which the file resides while loading using Autoloader with DLT. For example: def customer(): return (  spark.readStream.format('cloudfiles')    .option('clou...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Parsa Bahraminejad, we haven't heard from you since the last response from @Vigneshraja Palaniraj, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be...

1 More Replies
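The usual answer here is the file-metadata column; a hedged sketch (table name, format, and paths are hypothetical):

```python
# _metadata.file_path exposes the full source path of each ingested file;
# subdirectories can be parsed out of it downstream.
import dlt
from pyspark.sql.functions import col

@dlt.table
def customer():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")       # hypothetical format
        .load("/mnt/landing/customers/")           # hypothetical path
        .withColumn("source_file", col("_metadata.file_path"))
    )
```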
by cnjrules (New Contributor III)
  • 3095 Views
  • 3 replies
  • 0 kudos

Resolved! Reference file name when using COPY INTO?

When using the COPY INTO statement, is it possible to reference the current file name in the SELECT statement? A generic example is shown below; hoping I can log the file name in the target table. COPY INTO my_table FROM (SELECT key, index, textData, ...

Latest Reply
cnjrules
New Contributor III
  • 0 kudos

Found the info I was looking for on the page below: https://docs.databricks.com/ingestion/file-metadata-column.html

2 More Replies
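Based on that docs page, a hedged sketch of what the statement can look like (table, path, and columns are hypothetical):

```python
# _metadata is selectable inside COPY INTO on recent runtimes, which
# lets the source file name land in the target table.
spark.sql("""
  COPY INTO my_table
  FROM (
    SELECT key, index, textData,
           _metadata.file_path AS source_file
    FROM 's3://bucket/raw/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")
```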
by nyehia (Contributor)
  • 13789 Views
  • 19 replies
  • 1 kudos

Can not access SQL files in the Shared workspace

Hey, we have an issue: we can access the SQL files whenever the notebook is in the repo path, but whenever the CICD pipeline imports the repo notebooks and SQL files to the shared workspace, we can list the SQL files but cannot read them. We cha...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Nermin Yehia, yes, since you are moving the files to a different location manually, just set Can Manage permissions on the target and that should take care of everything.

18 More Replies
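One way to script that grant (a hedged sketch, not taken from the thread, using the Databricks Permissions REST API; the host, token, group, and directory_id are hypothetical placeholders):

```python
# Grants CAN_MANAGE on the imported directory so the copied SQL files
# stay readable; directory_id comes from the Workspace get-status API.
import requests

host = "https://<workspace>.azuredatabricks.net"
directory_id = "1234567890"  # hypothetical

resp = requests.patch(
    f"{host}/api/2.0/permissions/directories/{directory_id}",
    headers={"Authorization": "Bearer <token>"},
    json={"access_control_list": [
        {"group_name": "data-engineers", "permission_level": "CAN_MANAGE"}
    ]},
)
resp.raise_for_status()
```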
by Praveen (New Contributor II)
  • 9569 Views
  • 8 replies
  • 1 kudos

Resolved! Pass Typesafe config file to the Spark Submit Job

Hello everyone! I am trying to pass a Typesafe config file to the spark-submit task and print the details in the config file. Code: import org.slf4j.{Logger, LoggerFactory} import com.typesafe.config.{Config, ConfigFactory} import org.apache.spa...

Latest Reply
source2sea
Contributor
  • 1 kudos

I've experienced similar issues; please help to answer how to get this working. I've tried using the below as either a /dbfs/mnt/blah path or a dbfs:/mnt/blah path, in either spark_submit_task or spark_jar_task (via cluster spark_conf for Java options); no su...

7 More Replies
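One pattern that is often suggested for this (a hedged sketch, not a confirmed fix from the thread): distribute the .conf file alongside the job and point Typesafe at it through the driver's Java options. All paths and the class name below are hypothetical.

```python
# Hypothetical spark_submit_task payload for the Jobs API: ship the
# config with --files and set -Dconfig.file on the driver.
spark_submit_task = {
    "spark_submit_task": {
        "parameters": [
            "--files", "dbfs:/mnt/conf/app.conf",
            "--conf",
            "spark.driver.extraJavaOptions=-Dconfig.file=/dbfs/mnt/conf/app.conf",
            "--class", "com.example.Main",          # hypothetical main class
            "dbfs:/mnt/jars/app-assembly.jar",      # hypothetical jar path
        ]
    }
}
```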
by andrew0117 (Contributor)
  • 5402 Views
  • 4 replies
  • 0 kudos

Resolved! partition on a csv file

When I use SQL code like "create table myTable (column1 string, column2 string) using csv options('delimiter' = ',', 'header' = 'true') location 'pathToCsv'" to create a table from a single CSV file stored in a folder within an Azure Data Lake contai...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

Hi @andrew li, when you specify a path with the LOCATION keyword, Spark will consider that to be an EXTERNAL table. So when you drop the table, your underlying data, if any, will not be cleared. So in your case, as this is an external table, your folder s...

3 More Replies
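A short illustration of the point in the reply (hypothetical path):

```python
# LOCATION makes the table external: DROP TABLE removes only the
# metastore entry, leaving the CSV folder in the lake untouched.
spark.sql("""
  CREATE TABLE myTable (column1 STRING, column2 STRING)
  USING CSV
  OPTIONS ('delimiter' = ',', 'header' = 'true')
  LOCATION 'abfss://container@account.dfs.core.windows.net/path/to/csv'
""")

spark.sql("DROP TABLE myTable")  # files under LOCATION remain
```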
by andrew0117 (Contributor)
  • 5180 Views
  • 6 replies
  • 2 kudos

Index a DataFrame from a CSV file based on the file's original order (not based on any specific column, but on the entire row) using Spark

How to guarantee the index always follows the file's original order, no matter what? Currently, I'm using val df = spark.read.options(Map("header" -> "true", "inferSchema" -> "true")).csv("filePath").withColumn("index", monotonically_increasing...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

monotonically_increasing_id will not, as it only guarantees that every partition gets separate IDs. What is the whole code? Do you load a directory with a lot of CSVs? What does "original order" mean? Are the CSVs ordered by file creation date, by file name? o...

5 More Replies
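For a single file, one hedged workaround (shown in PySpark; the path is hypothetical) is to force one partition so zipWithIndex assigns ids in the order rows are read:

```python
# Only safe for a single CSV: coalesce(1) yields one deterministic scan
# order, and zipWithIndex numbers rows in that order.
from pyspark.sql import Row

indexed = (spark.read
           .options(header="true", inferSchema="true")
           .csv("dbfs:/mnt/data/file.csv")      # hypothetical path
           .coalesce(1)
           .rdd
           .zipWithIndex())

df = indexed.map(lambda p: Row(**p[0].asDict(), index=p[1])).toDF()
```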
by Sagacious (New Contributor II)
  • 15999 Views
  • 5 replies
  • 0 kudos

How to upload large files to Databricks, and how to unzip files successfully?

I have two JSON files, one ~3 GB and one ~5 GB. I am unable to upload them to Databricks Community Edition as they exceed the max allowed uploadable file size (~2 GB). If I zip them I am able to upload them, but I am also having issues figuring out ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sage Olson, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

4 More Replies
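One workaround that is sometimes suggested (a hedged sketch, not from the thread): split each JSON into chunks under the upload limit, upload the parts, then reassemble them on the driver. Paths are hypothetical, and the driver needs enough local disk for the result.

```python
# Locally, before upload (shell):  split -b 1900m big.json big.json.part-
# After uploading the parts to DBFS, stitch them back together:
import glob
import shutil

with open("/tmp/big.json", "wb") as out:
    for part in sorted(glob.glob("/dbfs/FileStore/parts/big.json.part-*")):
        with open(part, "rb") as f:
            shutil.copyfileobj(f, out)

shutil.copy("/tmp/big.json", "/dbfs/FileStore/big.json")  # back to DBFS
```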
by Tonny_Stark (New Contributor III)
  • 3227 Views
  • 3 replies
  • 0 kudos

FileNotFoundError: [Errno 2] No such file or directory:

I have the following error in Databricks when I want to unzip files: FileNotFoundError: [Errno 2] No such file or directory. But the file is there; I already tried several ways and nothing works. I have tried modifying by placing /dbfs/mnt/dbfs/mnt/d...

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Alfredo Vallejos, then your file is a tar.gz file, right? Have you tried the tar command instead of unzip?

2 More Replies
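A minimal sketch of that suggestion in Python (paths are hypothetical): tar-aware tooling plus the /dbfs FUSE path.

```python
# tarfile handles .tar.gz archives that zip tools reject; extract to
# driver-local storage first, then move results where needed.
import tarfile

with tarfile.open("/dbfs/mnt/raw/data.tar.gz", mode="r:gz") as tf:
    tf.extractall("/tmp/data")
```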
by Larrio (New Contributor III)
  • 7029 Views
  • 6 replies
  • 3 kudos

Autoloader - understanding missing file after schema update.

Hello, concerning Autoloader (based on https://docs.databricks.com/ingestion/auto-loader/schema.html), so far what I understand is that when it detects a schema update, the stream fails and I have to rerun it to make it work; that's OK. But once I rerun it, ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Lucien Arrio, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thank...

5 More Replies
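For context, a hedged sketch of the relevant options from the linked docs (paths are hypothetical): with addNewColumns the stream still stops on a schema change, but on restart it resumes from the checkpoint, so files seen during the failed batch are reprocessed rather than skipped.

```python
# Hypothetical paths; schemaLocation tracks inferred schema versions,
# and the checkpoint guarantees no file is lost across the restart.
(spark.readStream.format("cloudFiles")
   .option("cloudFiles.format", "csv")
   .option("cloudFiles.schemaLocation", "/mnt/chk/schema")
   .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
   .load("/mnt/landing/")
   .writeStream
   .option("checkpointLocation", "/mnt/chk/stream")
   .start("/mnt/bronze/events"))
```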
by uv (New Contributor II)
  • 5995 Views
  • 3 replies
  • 2 kudos

Parquet to csv delta file

Hi Team, I have a parquet file in an S3 bucket which is a Delta file. I am able to read it, but I am unable to write it as a CSV file. Getting the following error when I am trying to write: A transaction log for Databricks Delta was found at `s3://path/a...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @yuvesh kotiala, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...

2 More Replies
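That error usually means the folder is a Delta table being read as plain Parquet; a hedged sketch of the usual fix (paths are hypothetical):

```python
# Read with the delta reader instead of .parquet(), then write CSV.
df = spark.read.format("delta").load("s3://bucket/delta-table/")

(df.coalesce(1)                       # optional: single CSV output file
   .write.mode("overwrite")
   .option("header", "true")
   .csv("s3://bucket/csv-out/"))
```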
by Akshith_Rajesh (New Contributor III)
  • 2462 Views
  • 4 replies
  • 1 kudos

Does Databricks lock the file in ADLS Gen2 before writing (append) to it? If yes, how can we detect that the file is locked?

I have a requirement: I am running 2 notebooks in parallel, and I want to overwrite the file in parallel. If 2 notebooks try to overwrite the file at the same time, will I lose data because of overwriting the file at the same time? I want to overwr...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rajesh Akshith, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...

3 More Replies
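For what it's worth, a hedged sketch of the approach usually recommended instead of file locks (not a confirmed answer from the thread; the path is hypothetical): write to a Delta table, whose optimistic concurrency control makes concurrent appends safe.

```python
# Concurrent appends to the same Delta table do not conflict, unlike
# two writers overwriting the same raw file in ADLS Gen2.
(df.write.format("delta")
   .mode("append")
   .save("abfss://container@account.dfs.core.windows.net/shared/table"))
```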
by Galdino (New Contributor II)
  • 4833 Views
  • 3 replies
  • 1 kudos

How to read JSON from BytesIO with PySpark?

I want to read a JSON from an IO variable using PySpark. My code using pandas: io = BytesIO(); ftp.retrbinary('RETR ' + file_name, io.write); io.seek(0); # with pandas: df = pd.read_json(io). What I tried using PySpark, but it doesn't work: io = BytesIO() ftp.retrbinary('...

Latest Reply
Erik_L
Contributor II
  • 1 kudos

Just use pandas and follow with spark.createDataFrame(df)

2 More Replies
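Expanding the reply into a runnable shape (ftp and file_name are from the question; the rest is a straightforward pandas-to-Spark handoff):

```python
# pandas accepts the in-memory file object directly; Spark then takes
# the pandas DataFrame via createDataFrame.
from io import BytesIO
import pandas as pd

io = BytesIO()
ftp.retrbinary("RETR " + file_name, io.write)  # ftp/file_name per the question
io.seek(0)

pdf = pd.read_json(io)
df = spark.createDataFrame(pdf)
```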