Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

draculla1208
by New Contributor
  • 1259 Views
  • 0 replies
  • 0 kudos

Able to read .hdf files but not able to write to .hdf files from worker nodes and save to dbfs

I have a set of .hdf files that I want to distribute and read on worker nodes in a Databricks environment using PySpark. I am able to read the .hdf files on the worker nodes and get the data from them. The next requirement is that now each worker node ...

THIAM_HUATTAN
by Valued Contributor
  • 4045 Views
  • 3 replies
  • 3 kudos

Using R, how do we write csv file to say dbfs:/tmp?

Let us say I already have the data 'TotalData':
write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)
I do not see any error from the above. When I list files below:
%fs ls /tmp
I do not see any files written there. Why?

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

Hi Thiam, Thank you for reaching out to us. In this case it seems that you have written a file to the OS /tmp and tried to fetch the same folder in DBFS.
Written >> /tmp/TotalData.csv
Reading >> /dbfs/tmp/TotalData.csv
Please try to execute write.csv wit...
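The distinction Cedric describes can be sketched in Python: a dbfs:/ URI corresponds to a path under the /dbfs FUSE mount, while a bare /tmp/... path is the driver's local OS filesystem. The helper below is hypothetical (not a Databricks API), just to illustrate the mapping:

```python
# Hypothetical helper illustrating the OS-path vs. DBFS-path mix-up:
# a file written to the OS path /tmp/... is only visible through DBFS
# if it was written under the /dbfs FUSE mount in the first place.

def dbfs_to_local(path: str) -> str:
    """Translate a dbfs:/ URI to its /dbfs FUSE mount path."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    return path  # already a local OS path

print(dbfs_to_local("dbfs:/tmp/TotalData.csv"))  # /dbfs/tmp/TotalData.csv
```

In R, writing to /dbfs/tmp/TotalData.csv instead of /tmp/TotalData.csv should therefore make the file visible to %fs ls /tmp.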

2 More Replies
Erik
by Valued Contributor III
  • 2822 Views
  • 2 replies
  • 2 kudos

Resolved! Can we have the powerbi connector step into "hive_metastore" automatically?

We are distributing .pbids files providing the connection info to Databricks. They contain options passed to the "Databricks.Catalogs" function implementing the connection to Databricks. It is my understanding that Databricks has made this together wi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Erik Parmann​, does @Hubert Dudek​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
data_boy_2022
by New Contributor III
  • 14205 Views
  • 7 replies
  • 3 kudos

Data ingest of csv files from S3 using Autoloader is slow

I have 150k small csv files (~50Mb) stored in S3 which I want to load into a delta table. All CSV files are stored in the following structure in S3:
bucket/folder/name_00000000_00000100.csv
bucket/folder/name_00000100_00000200.csv
This is the code I use ...

Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hi @Jan R​, hope all is well! Just wanted to check in if you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

6 More Replies
weldermartins
by Honored Contributor
  • 3933 Views
  • 5 replies
  • 13 kudos

Hello everyone, I have a directory with 40 files. File names are divided into prefixes. I need to rename the prefix k3241 according to the name in the...

Hello everyone, I have a directory with 40 files. File names are divided into prefixes. I need to rename the prefix k3241 according to the name in the last prefix. I even managed to insert the csv extension at the end of the file, but renaming files ba...

Latest Reply
Anonymous
Not applicable
  • 13 kudos

Hi @welder martins​, how are you doing? Thank you for posting that question. We are glad you could resolve the issue. Would you want to mark an answer as the best solution? Cheers
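As a rough illustration of the renaming task (assuming names like k3241.DATA.csv, which is a guess at the actual scheme), plain Python can strip the fixed prefix; on DBFS the same loop would use dbutils.fs.mv instead of os.rename:

```python
import os
import tempfile

def rename_by_last_prefix(directory: str) -> None:
    """Strip the 'k3241.' prefix, keeping the rest of the name
    (an assumed naming scheme; adjust the split to the real file names)."""
    for fname in os.listdir(directory):
        if fname.startswith("k3241."):
            os.rename(os.path.join(directory, fname),
                      os.path.join(directory, fname[len("k3241."):]))

# Tiny demonstration in a temporary directory
d = tempfile.mkdtemp()
open(os.path.join(d, "k3241.DATA.csv"), "w").close()
rename_by_last_prefix(d)
print(os.listdir(d))  # ['DATA.csv']
```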

4 More Replies
Constantine
by Contributor III
  • 1775 Views
  • 2 replies
  • 3 kudos

Resolved! Can't view files of different types in databricks

I am reading a Kafka input using Spark Streaming on Databricks and trying to deserialize it. The input is in the form of Thrift. I want to create a file in .thrift format to provide the schema but am unable to do it. Even if I create the file locally and...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @John Constantine​, just checking if you still need help. If you do, please share as many details and logs as possible, so we will be able to help better.

1 More Replies
StephanieAlba
by Databricks Employee
  • 2778 Views
  • 1 reply
  • 6 kudos

Resolved! Is it possible to use Autoloader with a daily update file structure?

We get new files from a third-party each day. The files could be the same or different. However, each day all csv files arrive in the same dated folder. Is it possible to use Autoloader on this structure? We want each csv file to be a table that gets ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

@Stephanie Rivera​, you can use pathGlobFilter, but you will need a separate Autoloader stream for each type of file.
df_alert = spark.readStream.format("cloudFiles") \
  .option("cloudFiles.format", "binaryFile") \
  .option("pathGlobFilter", "alert.csv") \
  .load...
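pathGlobFilter selects file names with glob-style patterns. Python's fnmatch has similar semantics, so it can be used to preview which names a pattern would select (a rough sketch of the matching idea, not the Autoloader code path itself):

```python
from fnmatch import fnmatch

# Preview which file names a glob pattern would select.
files = ["alert.csv", "alert_2022.csv", "metrics.csv"]
pattern = "alert*.csv"
matched = [f for f in files if fnmatch(f, pattern)]
print(matched)  # ['alert.csv', 'alert_2022.csv']
```

With one stream per pattern (alert*.csv, metrics*.csv, ...), each file type lands in its own table, as the reply suggests.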

Direo
by Contributor II
  • 12839 Views
  • 2 replies
  • 3 kudos
Latest Reply
User16873043212
New Contributor III
  • 3 kudos

@Direo Direo​, yeah, this is a location inside your DBFS. You have full control over it; Databricks does not delete anything you keep in this location.

1 More Replies
tomsyouruncle
by New Contributor III
  • 20033 Views
  • 14 replies
  • 3 kudos

How do I enable support for arbitrary files in Databricks Repos? Public Preview feature doesn't appear in admin console.

"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...

Latest Reply
kahing_cheung
Databricks Employee
  • 3 kudos

What environment is your deployment in?

13 More Replies
al_joe
by Contributor
  • 4139 Views
  • 2 replies
  • 0 kudos

Where / how does DBFS store files?

I tried to use %fs head to print the contents of a CSV file used in a training:
%fs head "/mnt/path/file.csv"
but got an error saying cannot head a directory!? Then I did %fs ls on the same CSV file and got a list of 4 files under a directory named as a ...

Latest Reply
User16753725182
Databricks Employee
  • 0 kudos

Hi @Al Jo​, are you still seeing the error while printing the contents of the CSV file?
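What the poster saw is expected behavior: Spark writes a "file.csv" output path as a directory of part files, which is why %fs head fails and %fs ls shows several files. A pure-Python sketch of that layout, simulated in a temp directory rather than DBFS:

```python
import csv
import glob
import os
import tempfile

# Simulate a Spark-style output "file": a directory containing part files.
out_dir = os.path.join(tempfile.mkdtemp(), "file.csv")
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000.csv"), "w", newline="") as f:
    csv.writer(f).writerows([["a", "1"], ["b", "2"]])

# Reading the "file" means concatenating its part files.
rows = []
for part in sorted(glob.glob(os.path.join(out_dir, "part-*.csv"))):
    with open(part, newline="") as f:
        rows.extend(csv.reader(f))
print(rows)  # [['a', '1'], ['b', '2']]
```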

1 More Replies
BenzDriver
by New Contributor II
  • 2455 Views
  • 2 replies
  • 1 kudos

Resolved! SQL command FSCK is not found

Hello there, I currently have the problem of deleted files still being in the transaction log when trying to query a delta table. What I found was this statement:
%sql FSCK REPAIR TABLE table_name [DRY RUN]
But using it returned the following error: Error in ...

Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

Remove the square brackets and try executing the command:
%sql
FSCK REPAIR TABLE table_name DRY RUN

1 More Replies
MichaelO
by New Contributor III
  • 13008 Views
  • 2 replies
  • 2 kudos

Resolved! Transfer files saved in filestore to either the workspace or to a repo

I built a machine learning model:
lr = LinearRegression()
lr.fit(X_train, y_train)
which I can save to the filestore by:
filename = "/dbfs/FileStore/lr_model.pkl"
with open(filename, 'wb') as f:
    pickle.dump(lr, f)
Ideally, I wanted to save the model ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

The Workspace and Repos are not fully available via DBFS, as they have separate access rights. It is better to use MLflow for your models, as it is like Git but for ML. I think using MLOps you can then put your model into Git as well.
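The pickle pattern from the question works for any path the driver can reach; below is a minimal, self-contained sketch with a stand-in model object and a temp directory in place of /dbfs/FileStore:

```python
import os
import pickle
import tempfile

model = {"coef": [0.5, 1.5]}  # stand-in for the fitted LinearRegression

# On Databricks this path would be e.g. /dbfs/FileStore/lr_model.pkl
filename = os.path.join(tempfile.mkdtemp(), "lr_model.pkl")
with open(filename, "wb") as f:
    pickle.dump(model, f)

with open(filename, "rb") as f:
    restored = pickle.load(f)
print(restored)  # {'coef': [0.5, 1.5]}
```

As Hubert notes, for real models MLflow (e.g. mlflow.sklearn.log_model) is typically the more idiomatic route on Databricks, since it versions the model and its environment.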

1 More Replies
CleverAnjos
by New Contributor III
  • 6245 Views
  • 5 replies
  • 3 kudos

Resolved! Best way of loading several csv files in a table

What would be the best way of loading several files like these into a single table to be consumed?
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-10.csv
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-11.csv
https://s3.amazonaws...

Latest Reply
CleverAnjos
New Contributor III
  • 3 kudos

Thanks Kaniz, I already have the files. I was asking about the best way to load them.
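In Spark, spark.read.csv accepts a list of paths (or a glob pattern), so all monthly files can be loaded into one DataFrame and written out as a single table. The underlying idea can be sketched in plain Python with the csv module, with temp files standing in for the S3 objects:

```python
import csv
import glob
import os
import tempfile

# Create two small monthly files standing in for the S3 objects.
d = tempfile.mkdtemp()
for month, row in [("10", ["trip1"]), ("11", ["trip2"])]:
    path = os.path.join(d, f"yellow_tripdata_2019-{month}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([row])

# One "table" from many files: iterate a glob and collect all rows.
all_rows = []
for path in sorted(glob.glob(os.path.join(d, "yellow_tripdata_*.csv"))):
    with open(path, newline="") as f:
        all_rows.extend(csv.reader(f))
print(all_rows)  # [['trip1'], ['trip2']]
```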

4 More Replies
wyzer
by Contributor II
  • 3480 Views
  • 2 replies
  • 4 kudos

Resolved! How to show the properties of the folders/files from DBFS?

Hello, how do we show the properties of the folders/files from DBFS? Currently I am using this command:
display(dbutils.fs.ls("dbfs:/"))
But it only shows: path, name, size. How to show these properties?: CreatedBy (Name), CreatedOn (Date), ModifiedBy (Name), Modi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

The only idea is to use the %sh magic command, but there is no owner name (just root).
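DBFS does not appear to track CreatedBy/ModifiedBy users, so those properties are not available from dbutils.fs.ls. Through the /dbfs FUSE mount, however, standard os.stat gives at least size and modification time; sketched here on a temporary file rather than a real DBFS path:

```python
import datetime
import os
import tempfile

# On Databricks the path would be e.g. /dbfs/tmp/example.txt;
# a temp file is used here so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("hello")

st = os.stat(path)
modified = datetime.datetime.fromtimestamp(st.st_mtime)
print(st.st_size)  # 5 (bytes)
print(modified)    # last modification time as a datetime
```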

1 More Replies