Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

alejandrofm
by Valued Contributor
  • 2160 Views
  • 4 replies
  • 2 kudos

Resolved! Orphan (?) files on Databricks S3 bucket

Hi, I'm seeing a lot of empty (and not) directories on routes like: xxxxxx.jobs/FileStore/job-actionstats/, xxxxxx.jobs/FileStore/job-result/, xxxxxx.jobs/command-results/. Can I create a lifecycle to delete old objects (files/directories)? How many days? w...

Latest Reply
alejandrofm
Valued Contributor
  • 2 kudos

Hi! I didn't know that; purging right now. Is there a way to schedule that so logs are retained for less time? Maybe I want to keep only the last 7 days for everything? Thanks!

3 More Replies
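For the S3 side of this, a lifecycle rule can expire objects under the relevant prefixes. A minimal sketch with boto3, assuming AWS credentials are configured; the bucket name and prefix are placeholders, and the 7-day retention is just the figure the poster mentions:

```python
import boto3

s3 = boto3.client("s3")

# hypothetical bucket and prefix; expire objects 7 days after creation.
# note: this call REPLACES any existing lifecycle configuration on the bucket.
s3.put_bucket_lifecycle_configuration(
    Bucket="xxxxxx-jobs-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-job-results",
                "Filter": {"Prefix": "FileStore/job-result/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            }
        ]
    },
)
```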
draculla1208
by New Contributor
  • 998 Views
  • 0 replies
  • 0 kudos

Able to read .hdf files but not able to write .hdf files from worker nodes and save to DBFS

I have a set of .hdf files that I want to distribute and read on Worker nodes under Databricks environment using PySpark. I am able to read .hdf files on worker nodes and get the data from the files. The next requirement is that now each worker node ...

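This post has no replies, but one pattern that may apply (a sketch, not a confirmed fix for the poster's setup): h5py can only write to a local POSIX path, so write each partition's file to the worker's local disk first, then copy it to DBFS through the /dbfs FUSE mount. This assumes h5py is installed on all nodes and the FUSE mount is available on the workers:

```python
import os
import shutil
import tempfile
import uuid

import h5py  # assumed installed cluster-wide, e.g. as a cluster library

df = spark.range(0, 1000, numPartitions=4)  # toy data; substitute the real DataFrame

def write_partition_to_hdf(rows):
    values = [r.id for r in rows]
    if not values:
        return
    # h5py needs a local path; write to the worker's scratch disk first
    local_path = os.path.join(tempfile.gettempdir(), f"part-{uuid.uuid4().hex}.hdf")
    with h5py.File(local_path, "w") as f:
        f.create_dataset("values", data=values)
    # /dbfs is the FUSE view of DBFS; the copy lands in shared storage
    shutil.copy(local_path, f"/dbfs/tmp/hdf_out/{os.path.basename(local_path)}")

dbutils.fs.mkdirs("dbfs:/tmp/hdf_out/")
df.foreachPartition(write_partition_to_hdf)
```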
Erik
by Valued Contributor II
  • 2130 Views
  • 2 replies
  • 2 kudos

Resolved! Can we have the Power BI connector step into "hive_metastore" automatically?

We are distributing pbids files providing the connection info to Databricks. It contains options passed to the "Databricks.Catalogs" function implementing the connection to Databricks. It is my understanding that Databricks has made this together wi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Erik Parmann, does @Hubert Dudek's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
data_boy_2022
by New Contributor III
  • 8062 Views
  • 8 replies
  • 3 kudos

Data ingest of csv files from S3 using Autoloader is slow

I have 150k small CSV files (~50 MB) stored in S3 which I want to load into a Delta table. All CSV files are stored in the following structure in S3: bucket/folder/name_00000000_00000100.csv, bucket/folder/name_00000100_00000200.csv. This is the code I use ...

[Attachments: Cluster Metrics, SparkUI_DAG, SparkUI_Job]
Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hi @Jan R, hope all is well! Just wanted to check in whether you were able to resolve your issue and, if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

7 More Replies
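With this many small files, the directory listing itself is often the bottleneck. A sketch of the two knobs most commonly suggested for this shape of data: file notifications instead of re-listing, and an explicit per-trigger batch size. Paths, schema location, and the table name are placeholders, and the option values are starting points to tune, not recommendations:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    # use S3 event notifications instead of re-listing 150k objects each trigger
    # (requires permissions to set up the SNS/SQS resources)
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.maxFilesPerTrigger", 10000)
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "dbfs:/tmp/ingest/schema/")
    .load("s3://bucket/folder/")
)

(
    df.writeStream
    .option("checkpointLocation", "dbfs:/tmp/ingest/checkpoint/")
    .trigger(availableNow=True)  # drain the backlog, then stop (recent runtimes)
    .toTable("my_delta_table")
)
```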
StephanieAlba
by Valued Contributor III
  • 2358 Views
  • 2 replies
  • 6 kudos

Resolved! Is it possible to use Autoloader with a daily update file structure?

We get new files from a third-party each day. The files could be the same or different. However, each day all csv files arrive in the same dated folder. Is it possible to use autoloader on this structure? We want each csv file to be a table that gets ...

[Attachments: The folders, In the folders]
Latest Reply
Kaniz_Fatma
Community Manager
  • 6 kudos

Hi @Stephanie Rivera, just a friendly follow-up. Do you still need help, or did @Hubert Dudek (Customer)'s response help you find the solution? Please let us know.

1 More Replies
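As a sketch of one way to handle dated folders (the layout s3://bucket/landing/<date>/*.csv is assumed from the post, and all paths are placeholders): point Auto Loader at the parent directory with a glob so each new day's folder is discovered automatically, and keep the source file path for downstream routing:

```python
from pyspark.sql.functions import input_file_name

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "dbfs:/tmp/daily/schema/")
    # the glob covers every dated folder, including ones that don't exist yet
    .load("s3://bucket/landing/*/")
    .withColumn("source_file", input_file_name())
)
```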
weldermartins
by Honored Contributor
  • 3229 Views
  • 5 replies
  • 13 kudos

Hello everyone, I have a directory with 40 files. File names are divided into prefixes. I need to rename the prefix k3241 according to the name in the...

Hello everyone, I have a directory with 40 files. File names are divided into prefixes. I need to rename the prefix k3241 according to the name in the last prefix. I even managed to insert the csv extension at the end of the file, but renaming files ba...

[Attachment: Template]
Latest Reply
Anonymous
Not applicable
  • 13 kudos

Hi @welder martins, how are you doing? Thank you for posting that question. We are glad you could resolve the issue. Would you like to mark an answer as the best solution? Cheers

4 More Replies
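Since the post is truncated, the exact naming rule is unclear; below is only a hedged sketch of the general move-based rename pattern (DBFS has no rename primitive, so dbutils.fs.mv is the usual tool). The directory, the "." delimiter, and the segment positions are all assumptions:

```python
src_dir = "dbfs:/mnt/landing/"  # placeholder directory

for f in dbutils.fs.ls(src_dir):
    parts = f.name.split(".")  # assumed delimiter between the name's prefixes
    if parts[0] == "k3241":
        # assumed rule: replace the first prefix with the last one, add .csv
        new_name = ".".join([parts[-1]] + parts[1:-1]) + ".csv"
        dbutils.fs.mv(f.path, src_dir + new_name)
```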
Constantine
by Contributor III
  • 1434 Views
  • 2 replies
  • 3 kudos

Resolved! Can't view files of different types in Databricks

I am reading a Kafka input using Spark Streaming on databricks and trying to deserialize it. The input is in the form of thrift. I want to create a file of .thrift format to provide schema but am unable to do it. Even if I create the file locally and...

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Hi @John Constantine, just checking whether you still need help. If you do, please share as many details and logs as possible so we can help you better.

1 More Replies
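For the narrower sub-problem of materialising a .thrift schema file in the workspace, dbutils.fs.put can write arbitrary text to DBFS; the path and the schema body below are made up for illustration:

```python
# write a small text file (here, a hypothetical Thrift IDL) to DBFS;
# the third argument overwrites any existing file at that path
dbutils.fs.put(
    "dbfs:/schemas/event.thrift",
    """
struct Event {
  1: i64 ts
  2: string payload
}
""",
    True,
)
```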
tomsyouruncle
by New Contributor III
  • 15997 Views
  • 20 replies
  • 4 kudos

Resolved! How do I enable support for arbitrary files in Databricks Repos? Public Preview feature doesn't appear in admin console.

"Arbitrary files in Databricks Repos", allowing not just notebooks to be added to repos, is in Public Preview. I've tried to activate it following the instructions in the above link but the option doesn't appear in Admin Console. Minimum requirements...

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @Tom Turner, an admin can enable this feature as follows:
1. Go to the Admin Console.
2. Click the Workspace Settings tab.
3. In the Repos section, click the Files in Repos toggle.
After the feature has been enabled, you must restart your cluster and refresh...

19 More Replies
al_joe
by Contributor
  • 3231 Views
  • 3 replies
  • 1 kudos

Resolved! Where / how does DBFS store files?

I tried to use %fs head to print the contents of a CSV file used in a training: %fs head "/mnt/path/file.csv" but got an error saying "cannot head a directory"!? Then I did %fs ls on the same CSV file and got a list of 4 files under a directory named as a ...

Latest Reply
User16753725182
Contributor III
  • 1 kudos

Hi @Al Jo, are you still seeing the error while printing the contents of the CSV file?

2 More Replies
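What the poster is seeing is almost certainly normal Spark behaviour: a "CSV file" written by Spark is really a directory containing part files (plus _SUCCESS markers), so head must target a part file, not the directory. A sketch, with the path as a placeholder:

```python
# list the directory Spark wrote, find an actual data file, and head it
files = dbutils.fs.ls("dbfs:/mnt/path/file.csv")
part = next(f.path for f in files if f.name.startswith("part-"))
print(dbutils.fs.head(part, 1024))  # first 1 KB of the part file
```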
BenzDriver
by New Contributor II
  • 1978 Views
  • 2 replies
  • 1 kudos

Resolved! SQL command FSCK is not found

Hello there, I currently have the problem of deleted files still being in the transaction log when trying to query a Delta table. What I found was this statement: %sql FSCK REPAIR TABLE table_name [DRY RUN] But using it returned the following error: Error in ...

Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

Remove the square brackets and try executing the command: %sql FSCK REPAIR TABLE table_name DRY RUN

1 More Replies
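To make the convention explicit: the square brackets in the documentation mark DRY RUN as an optional clause; they are not part of the syntax. The same command can also be run from a Python cell (table_name is a placeholder):

```python
# brackets in the docs denote an optional clause, not literal syntax
spark.sql("FSCK REPAIR TABLE table_name DRY RUN").show(truncate=False)
```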
MichaelO
by New Contributor III
  • 12087 Views
  • 3 replies
  • 2 kudos

Resolved! Transfer files saved in filestore to either the workspace or to a repo

I built a machine learning model: lr = LinearRegression(); lr.fit(X_train, y_train) which I can save to the FileStore by: filename = "/dbfs/FileStore/lr_model.pkl"; with open(filename, 'wb') as f: pickle.dump(lr, f). Ideally, I wanted to save the model ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Michael Okelola, when you store the file in DBFS (/FileStore/...), it's in your account (data plane), while notebooks, etc. are in the Databricks account (control plane). By design, you can't import non-code objects into a workspace. But Repos ...

2 More Replies
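With Files in Repos enabled, one hedged way to move the pickle is a plain file copy across the FUSE mounts; the repo path is a placeholder, and /Workspace access from notebook code depends on the runtime version:

```python
import shutil

# both sides are FUSE paths: /dbfs for DBFS, /Workspace for repos (when enabled)
shutil.copy(
    "/dbfs/FileStore/lr_model.pkl",
    "/Workspace/Repos/someone@example.com/my-repo/lr_model.pkl",  # placeholder
)
```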
CleverAnjos
by New Contributor III
  • 5065 Views
  • 8 replies
  • 5 kudos

Resolved! Best way of loading several CSV files into a table

What would be the best way of loading several files like these in a single table to be consumed? https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-10.csv https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-11.csv https://s3.amazonaws...

Latest Reply
CleverAnjos
New Contributor III
  • 5 kudos

Thanks Kaniz, I already have the files. I was asking about the best way to load them.

7 More Replies
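One common pattern, as a sketch: pass all paths to a single read and write one Delta table. This assumes the files have first been landed somewhere the cluster can read (the bucket below is a placeholder), since Spark does not read the public HTTPS links directly:

```python
paths = [
    "s3://my-bucket/tripdata/yellow_tripdata_2019-10.csv",  # placeholder copies
    "s3://my-bucket/tripdata/yellow_tripdata_2019-11.csv",
]

# one read over all files, then a single managed Delta table
df = spark.read.csv(paths, header=True, inferSchema=True)
df.write.format("delta").mode("overwrite").saveAsTable("yellow_tripdata")
```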
wyzer
by Contributor II
  • 2943 Views
  • 2 replies
  • 4 kudos

Resolved! How to show the properties of the folders/files from DBFS?

Hello, how to show the properties of the folders/files from DBFS? Currently I am using this command: display(dbutils.fs.ls("dbfs:/")) But it only shows: path, name, size. How to show these properties?: CreatedBy (Name), CreatedOn (Date), ModifiedBy (Name), Modi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

One idea is to use the %sh magic command, but there is no name (just root).

1 More Replies
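A hedged sketch of that workaround via the FUSE mount: POSIX metadata exposes size and timestamps, but DBFS does not track CreatedBy/ModifiedBy, so the user names asked for are not recoverable this way. The path is a placeholder:

```python
import datetime
import os

root = "/dbfs/mnt/data"  # placeholder; /dbfs is the FUSE view of DBFS
for name in os.listdir(root):
    st = os.stat(os.path.join(root, name))
    mtime = datetime.datetime.fromtimestamp(st.st_mtime)
    print(name, st.st_size, mtime)  # name, size in bytes, last-modified time
```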