Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

a2_ish
by New Contributor II
  • 1930 Views
  • 1 reply
  • 0 kudos

Where are Delta Lake files stored for a given path?

I have the code below, which works for the path shown but fails when the path is an Azure storage account path. I have enough access to write to and update the storage account. I would like to know what I am doing wrong, and for the path below that works, how can I phys...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ankit Kumar: The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write and update the storage account, but it's possible t...

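For context, a minimal sketch of the usual fix, assuming ADLS Gen2 with an account key held in a Databricks secret scope; the storage account, container, secret scope, and `df` below are placeholders, not details from the thread:

```python
# Hypothetical names: replace the storage account, container, and secret scope.
storage_account = "mystorageacct"
container = "mycontainer"

# Authenticate to ADLS Gen2 before touching the abfss:// path.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/delta/events"

# The Delta table's parquet data files and _delta_log folder are stored under `path`.
df.write.format("delta").mode("overwrite").save(path)  # df is an assumed existing DataFrame

# Read it back to confirm the permissions actually work.
spark.read.format("delta").load(path).show(5)
```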
Tonny_Stark
by New Contributor II
  • 10212 Views
  • 6 replies
  • 0 kudos

FileNotFoundError: [Errno 2] No such file or directory when I try to unzip .tar or .zip files

Hello, how are you? I have a small problem. I need to unzip some .zip, .tar, and .gz files, each of which may contain multiple files. When trying to unzip the .zip files I got this error: FileNotFoundError: [Errno 2] No such file or directory: but the files are in ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Alfredo Vallejos, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...

5 More Replies
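A common cause of this error is passing a dbfs:/ URI to Python's file APIs, which only see the driver's local filesystem. A minimal sketch, assuming the archives live on DBFS and the /dbfs fuse mount is available (the paths are hypothetical):

```python
import tarfile
import zipfile

# Python's zipfile/tarfile run on the driver, so use the local /dbfs
# fuse prefix rather than the dbfs:/ URI.
src = "/dbfs/FileStore/archives/data.zip"
dst = "/dbfs/FileStore/extracted/"

with zipfile.ZipFile(src, "r") as zf:
    zf.extractall(dst)

# Same idea for .tar.gz archives.
with tarfile.open("/dbfs/FileStore/archives/data.tar.gz", "r:gz") as tf:
    tf.extractall(dst)
```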
Sagacious
by New Contributor II
  • 12825 Views
  • 5 replies
  • 0 kudos

How to upload large files to Databricks, and how to unzip files successfully?

I have two JSON files, one ~3 GB and one ~5 GB. I am unable to upload them to Databricks Community Edition as they exceed the maximum allowed upload file size (~2 GB). If I zip them I am able to upload them, but I am also having issues figuring out ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Sage Olson, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

4 More Replies
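One hedged workaround for the zipped-upload route: extract through the /dbfs fuse mount (where the runtime allows it), then let Spark read the multi-gigabyte JSON so it never has to fit in driver memory. The paths below are hypothetical:

```python
import zipfile

# Extract the uploaded archive via the local /dbfs fuse path.
with zipfile.ZipFile("/dbfs/FileStore/uploads/big.json.zip", "r") as zf:
    zf.extractall("/dbfs/FileStore/uploads/")

# Read the extracted JSON with Spark, which streams and parallelizes the parse.
df = spark.read.json("dbfs:/FileStore/uploads/big.json")
df.printSchema()
```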
seboz123
by New Contributor II
  • 2166 Views
  • 3 replies
  • 0 kudos

Display HTML from DBFS files

Hi, I want to display some content from DBFS inside my notebook. Let's say I have an image under /dbfs/mnt/test-bucket/test-custom/sample.png. I want to embed it in my notebook HTML output like this: displayHTML("""<img src='/dbfs/mnt/test-bucket/te...

Latest Reply
seboz123
New Contributor II
  • 0 kudos

Hi @Vidula Khanna, unfortunately not. I can access the file through the notebook via, e.g., !ls /dbfs/mnt/test-bucket/test-custom/, but it cannot be displayed via displayHTML; I get the 401.

2 More Replies
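A workaround that often avoids the 401 is to inline the image as a base64 data URI, so the browser never fetches the dbfs path at all. A minimal sketch using the path from the question:

```python
import base64

# Read the image bytes via the local /dbfs fuse path.
with open("/dbfs/mnt/test-bucket/test-custom/sample.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Embed as a data URI; no HTTP request to DBFS, hence no 401.
displayHTML(f'<img src="data:image/png;base64,{encoded}"/>')
```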
FabriceDeseyn
by Contributor
  • 5683 Views
  • 5 replies
  • 1 kudos

Resolved! Autoloader directory listing not listing all files

Hi community, I have an Autoloader pipeline running with the following configuration. Unfortunately, it does not detect all files (see the query definition below). The folder that needs to be read has 38,246 files that all have the same schema and structure:...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Fabrice Deseyn, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

4 More Replies
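For reference, a hedged sketch of a directory-listing Autoloader stream; when files go missing, the usual suspects are a stale checkpoint or `cloudFiles.includeExistingFiles` being disabled. All paths and table names below are placeholders:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "dbfs:/schemas/my_table")
    # Pick up files that already existed when the stream first started.
    .option("cloudFiles.includeExistingFiles", "true")
    .load("dbfs:/mnt/landing/my_table/")
)

(
    df.writeStream
    # Pointing at a fresh checkpoint forces a full re-listing if state is suspect.
    .option("checkpointLocation", "dbfs:/checkpoints/my_table")
    .trigger(availableNow=True)
    .toTable("bronze.my_table")
)
```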
MikeJohnsonZa
by New Contributor
  • 2240 Views
  • 3 replies
  • 0 kudos

Resolved! Importing irregularly formatted JSON files

Hi, I'm importing a large collection of JSON files. The problem is that they are not what I would expect a well-formatted JSON file to be (although probably still valid); each file consists of only a single record that looks something like this (this i...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Michael Johnson, I would like to share the following notebook, which contains examples of how to process complex data types like JSON. Please check the following link and let us know if you still need help: https://docs.databricks.com/optimization...

2 More Replies
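For single-record, pretty-printed files like this, the multiLine option is usually the key. A minimal sketch with a hypothetical path and a hypothetical nested `payload` column:

```python
# Each file holds one (possibly multi-line) record, so disable the
# default line-per-record JSON parsing with the multiLine option.
raw = spark.read.option("multiLine", "true").json("dbfs:/mnt/raw/records/*.json")
raw.printSchema()

# Flatten one nested struct level for illustration; adapt to the real structure.
flat = raw.select("*", "payload.*").drop("payload")
```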
kkawka1
by New Contributor III
  • 5955 Views
  • 8 replies
  • 10 kudos

Resolved! Removing files saved in the root FileStore

We have just started working with Databricks in one of my university modules, and the lecturers gave us a set of commands to practice saving data in the FileStore. One of the commands was the following: dbutils.fs.cp("/databricks-datasets/weathh...

Latest Reply
Kaniz_Fatma
Community Manager
  • 10 kudos

Hi @Konrad Kawka, are you using the Community Edition?

7 More Replies
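For completeness, a minimal sketch of the cleanup itself, assuming the practice command copied data under a FileStore path like the hypothetical one below:

```python
# Remove the copied folder from the FileStore root; True makes it recursive.
dbutils.fs.rm("dbfs:/FileStore/weather_data", True)

# Verify what is left.
display(dbutils.fs.ls("dbfs:/FileStore/"))
```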
Prototype998
by New Contributor III
  • 3672 Views
  • 5 replies
  • 2 kudos

Resolved! Reading multiple CSV files using pathos.multiprocessing

I'm using PySpark and Pathos to read numerous CSV files and create many DataFrames, but I keep getting this problem. Code for the same: from pathos.multiprocessing import ProcessingPool; def readCsv(path): return spark.read.csv(path, header=True); csv_file_list = ...

Latest Reply
Prototype998
New Contributor III
  • 2 kudos

@Ajay Pandey @Rishabh Pandey

4 More Replies
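The underlying issue is usually that a SparkSession cannot be pickled into pathos worker processes. A hedged sketch of two alternatives, with hypothetical paths:

```python
# Option 1: pass all paths to a single read; Spark parallelizes it itself.
csv_file_list = [
    "dbfs:/mnt/data/a.csv",
    "dbfs:/mnt/data/b.csv",
    "dbfs:/mnt/data/c.csv",
]
df = spark.read.csv(csv_file_list, header=True, inferSchema=True)

# Option 2: if a separate DataFrame per file is really needed, threads work
# because they share the driver's SparkSession (separate processes do not).
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as pool:
    dfs = list(pool.map(lambda p: spark.read.csv(p, header=True), csv_file_list))
```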
KVNARK
by Honored Contributor II
  • 1035 Views
  • 1 reply
  • 5 kudos

Resolved! Trigger another .py file by using 2 .py files.

Hi, I have 3 .py files: a.py, b.py, and c.py. After joining a.py and b.py, based on the output that I get, I need to trigger c.py.

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Hi @KVNARK, refer to the link below; it will help with this: Link

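A minimal sketch of one way to wire this up with the standard library; the file names come from the question, but the "SUCCESS" marker is an assumed convention, not something from the thread:

```python
import subprocess

# Run a.py and b.py and capture their output.
out_a = subprocess.run(["python", "a.py"], capture_output=True, text=True, check=True)
out_b = subprocess.run(["python", "b.py"], capture_output=True, text=True, check=True)

# Trigger c.py only when the combined output meets the condition
# ("SUCCESS" is a placeholder for whatever the real output check is).
if "SUCCESS" in out_a.stdout and "SUCCESS" in out_b.stdout:
    subprocess.run(["python", "c.py"], check=True)
```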
avenu
by New Contributor
  • 1795 Views
  • 1 reply
  • 0 kudos

AutoLoader - process multiple files

I need to process files with different schemas arriving in different folders in ADLS using Autoloader. Do I need to start a separate read stream for each file type/folder, or can this be handled using a single stream? When I tried using a single stream, ...

Latest Reply
Wassim
New Contributor III
  • 0 kudos

As you are talking about different schemas, perhaps schemaEvolutionMode, inferColumnTypes, or schemaHints may help? Check out this, from the 32-minute mark onward: https://youtu.be/8a38Fv9cpd8. Hope it helps; do let us know how you solve it if you can.

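Since a single Autoloader stream infers a single schema, the common pattern is one stream per folder, which can still be driven from one notebook. A hedged sketch with placeholder paths and table names:

```python
sources = {
    "dbfs:/mnt/landing/orders/": "bronze.orders",
    "dbfs:/mnt/landing/customers/": "bronze.customers",
}

# One readStream/writeStream pair per folder, each with its own
# schema location and checkpoint.
for path, table in sources.items():
    (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", f"dbfs:/schemas/{table}")
        .option("cloudFiles.inferColumnTypes", "true")
        .load(path)
        .writeStream
        .option("checkpointLocation", f"dbfs:/checkpoints/{table}")
        .trigger(availableNow=True)
        .toTable(table)
    )
```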
db-avengers2rul
by Contributor II
  • 6768 Views
  • 2 replies
  • 0 kudos

Resolved! Delete files from the directory

Is there a way to delete files recursively using a command in notebooks? In the directory below I have many kinds of files, like .txt, .png, and .jpg, but I only want to delete the .csv files, for example dbfs:/FileStore/*.csv

Latest Reply
UmaMahesh1
Honored Contributor III
  • 0 kudos

Hi @Rakesh Reddy Gopidi, you can use the os module to iterate over a directory. By looping over the directory, you can check whether each file name ends with .csv using .endswith(".csv"). After fetching all the matching files, you can remove them. Hope this helps. Cheers.

1 More Reply
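A sketch of the same idea using dbutils.fs rather than the os module, since dbfs:/ paths are involved (the directory is taken from the question):

```python
# List the directory, keep only .csv entries, and delete them one by one.
for entry in dbutils.fs.ls("dbfs:/FileStore/"):
    if entry.path.endswith(".csv"):
        dbutils.fs.rm(entry.path)
```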
Gerhard
by New Contributor III
  • 1278 Views
  • 0 replies
  • 1 kudos

Read proprietary files and transform contents to a table - error resilient process needed

We do have data stored in HDF5 files in a "proprietary" way. This data needs to be read, converted, and transformed before it can be inserted into a Delta table. All of this transformation is done in a custom Python function that takes the HDF5 file an...

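A hedged sketch of one error-resilient pattern for this kind of job: `parse_hdf5` stands in for the proprietary logic, the paths and target table are hypothetical, and h5py is assumed to be installed on the cluster:

```python
import h5py

def parse_hdf5(local_path):
    # Placeholder for the custom, proprietary parsing logic.
    with h5py.File(local_path, "r") as f:
        return [{"name": key, "rows": len(f[key])} for key in f.keys()]

ok, failed = [], []
for path in ["/dbfs/mnt/raw/a.h5", "/dbfs/mnt/raw/b.h5"]:
    try:
        ok.extend(parse_hdf5(path))
    except Exception as e:
        # Record the failure and keep going instead of failing the whole batch.
        failed.append({"path": path, "error": str(e)})

# Good rows land in the Delta table; `failed` can be logged or retried.
spark.createDataFrame(ok).write.format("delta").mode("append").saveAsTable("bronze.hdf5_data")
```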
THIAM_HUATTAN
by Valued Contributor
  • 3313 Views
  • 5 replies
  • 4 kudos

Resolved! Using R, how do we write a CSV file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData': write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE). I do not see any error from the above. But when I list files with %fs ls /tmp, I do not see any files written there. Why?

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @THIAM HUAT TAN, we haven't heard from you since the last response from @Cedric Law Hing Ping, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community, as it can be helpful to oth...

4 More Replies
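The likely explanation behind threads like this one: R's write.csv writes to the driver's local /tmp, while %fs ls /tmp lists DBFS, which is a different filesystem. A sketch of how to confirm and fix that from a Python cell (the copy step is the assumption here):

```python
# The CSV written by R lives on the driver's local disk, under file:/.
display(dbutils.fs.ls("file:/tmp/"))

# Copy it into DBFS so that %fs ls /tmp can see it.
dbutils.fs.cp("file:/tmp/TotalData.csv", "dbfs:/tmp/TotalData.csv")
```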
-werners-
by Esteemed Contributor III
  • 2356 Views
  • 2 replies
  • 17 kudos

Autoloader: how to avoid overlap in files

I'm thinking of using Autoloader to process files being put on our data lake. Let's say, for example, a Parquet file is written every 15 minutes. These files, however, contain overlapping data. Now, every 2 hours I want to process the new data (Autoloader) and...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 17 kudos

What about foreachBatch and then MERGE? Alternatively, run another process that cleans the updates using a window function, as you said.

1 More Reply
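A hedged sketch of the foreachBatch-plus-MERGE suggestion; the key column, table names, and paths are placeholders:

```python
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # Deduplicate within the micro-batch, then MERGE into the target,
    # so overlapping files cannot produce duplicate rows.
    deduped = batch_df.dropDuplicates(["id"])  # "id" is an assumed key column
    target = DeltaTable.forName(spark, "silver.events")  # hypothetical table
    (
        target.alias("t")
        .merge(deduped.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "dbfs:/schemas/events")
    .load("dbfs:/mnt/landing/events/")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "dbfs:/checkpoints/events")
    .trigger(availableNow=True)
    .start()
)
```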
elgeo
by Valued Contributor II
  • 2155 Views
  • 0 replies
  • 5 kudos

Clean up _delta_log files

Hello experts. We are trying to clarify how to clean up the large number of files accumulating in the _delta_log folder (JSON, CRC, and checkpoint files). We went through the related posts in the forum and followed the below: SET spark.da...

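For reference, log retention is normally controlled with table properties rather than by deleting files manually; a minimal sketch with a hypothetical table name (expired entries are only removed when Delta writes a new checkpoint, not immediately after the properties are set):

```python
# Shorten how long old _delta_log entries are kept and let Delta clean them up.
spark.sql("""
    ALTER TABLE my_db.my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days',
        'delta.enableExpiredLogCleanup' = 'true'
    )
""")
```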