Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RiyazAli
by Valued Contributor
  • 3090 Views
  • 4 replies
  • 3 kudos

Resolved! Where do files downloaded with wget get stored in Databricks?

Hey Team! All I'm trying to do is download a CSV file stored on S3 and read it using Spark. Here's what I mean: !wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv If I download this "yellow_tripdata_2020-01.csv", where exactly it wo...

Latest Reply
RiyazAli
Valued Contributor
  • 3 kudos

Hi @Kaniz Fatma, thanks for the reminder. Hey @Hubert Dudek, thank you very much for your prompt response. Initially, I was using urllib3 to 'GET' the data residing at the URL, so I wanted an alternative for the same. Unfortunately, requests libr...
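
As a rough illustration of the question in this thread: a notebook shell command such as wget, or a Python download given a relative file name, writes into the current working directory of the machine that runs it, which on Databricks is the driver node's local disk rather than DBFS. The sketch below uses only the standard library; the helper name download_to_cwd is hypothetical and not taken from the thread.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_to_cwd(url: str) -> str:
    """Download `url` into the current working directory and return the absolute path."""
    # Use the last path segment of the URL as the file name, as wget does by default.
    file_name = os.path.basename(urlparse(url).path)
    urllib.request.urlretrieve(url, file_name)
    # abspath resolves the relative name against the current working directory.
    return os.path.abspath(file_name)
```

Printing the returned path shows exactly where the file landed; to read it with Spark it would typically need to be copied to DBFS or referenced with a file: scheme path.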

3 More Replies
RantoB
by Valued Contributor
  • 5440 Views
  • 19 replies
  • 7 kudos

Resolved! Unzipping the same file twice does not execute

Hi, I need to unzip some files that are ingested, but when I unzip the same zipped file twice, the unzip command does not execute. As suggested in the documentation, I did: import urllib urllib.request.urlretrieve("https://resources.lendingclub.com/L...

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @Bertrand BURCKER, create a script.sh and copy the script into the directory where the data.zip archive is. This script works with any archive name and any CSV name. #!/bin/bash   currLoc="$PWD" path="${currLoc}"   cd ${currLoc}   #EXTRACT ...
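
An alternative to a shell script, sketched here with Python's standard zipfile module rather than the truncated script above: extractall overwrites files that already exist, so running it twice on the same archive behaves the same both times. The function name extract_archive and its arguments are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(zip_path: str, target_dir: str) -> List[str]:
    """Extract every member of `zip_path` into `target_dir` and return the member names."""
    # Create the target directory if it does not exist yet.
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Existing files with the same names are overwritten without prompting.
        archive.extractall(target_dir)
        return archive.namelist()
```

Because no interactive prompt is involved, the call is safe to repeat in a notebook cell that gets re-run.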

18 More Replies
Jiri_Koutny
by New Contributor III
  • 4923 Views
  • 8 replies
  • 4 kudos

Resolved! Programmatic access to Files in Repos

Hi, we are testing the new Files support in Databricks Repos. Is there a way to programmatically read notebooks? Thanks

Latest Reply
User16871418122
Contributor III
  • 4 kudos

Hi @Jiri Koutny, these files should be synced to your remote repository anyway (Git, Bitbucket, GitLab, etc.). The APIs of version control tools, the Git API for example, might help you achieve what you want. https://stackoverflow.com/questions/38491722/r...

7 More Replies
Sarvagna_Mahaka
by New Contributor III
  • 11158 Views
  • 6 replies
  • 8 kudos

Resolved! Exporting csv files from Databricks

I'm trying to export a CSV file from my Databricks workspace to my laptop. I have followed the steps below:
1. Installed the Databricks CLI
2. Generated a token in Azure Databricks
3. databricks configure --token
5. Token:xxxxxxxxxxxxxxxxxxxxxxxxxx
6. databrick...

Latest Reply
User16871418122
Contributor III
  • 8 kudos

Hi @Sarvagna Mahakali, there is an easier hack: a) You can save the results locally on disk and create a hyperlink for downloading the CSV. You can copy the file to this location: dbfs:/FileStore/table1_good_2020_12_18_07_07_19.csv b) Then download with...
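
A small sketch of how a download link for a FileStore file could be built. It assumes the commonly documented mapping in which files under dbfs:/FileStore/ are served by the workspace under /files/; the helper name filestore_download_url and the example workspace URL are hypothetical, and some workspaces may additionally require an ?o=<workspace-id> query parameter.

```python
from urllib.parse import quote


def filestore_download_url(workspace_url: str, dbfs_path: str) -> str:
    """Map a dbfs:/FileStore/... path to a browser download URL (assumed /files/ mapping)."""
    prefix = "dbfs:/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only paths under dbfs:/FileStore/ can be served this way")
    # Everything after FileStore/ is appended to the /files/ endpoint.
    relative = dbfs_path[len(prefix):]
    return f"{workspace_url.rstrip('/')}/files/{quote(relative)}"


# Example with a placeholder workspace URL:
print(filestore_download_url(
    "https://example.cloud.databricks.com",
    "dbfs:/FileStore/table1_good_2020_12_18_07_07_19.csv",
))
```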

5 More Replies
pine
by New Contributor III
  • 2500 Views
  • 5 replies
  • 4 kudos

Resolved! Databricks fails writing after writing ~30 files

Good day, Copy of https://stackoverflow.com/questions/69974301/looping-through-files-in-databricks-fails I have 100 files of CSV data on an ADLS Gen1 store. I want to do some processing on them and save the results to the same drive, in a different directory. def look...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

Was anything actually created by the script in the directory <my_output_dir>? The best approach would be to permanently mount the ADLS storage and use an Azure app for that. In Azure, please go to App registrations and register an app with a name such as "databricks_mount". Ad...

4 More Replies
Jreco
by Contributor
  • 2714 Views
  • 3 replies
  • 5 kudos

Resolved! Reference py file from a notebook

Hi All, I'm trying to reference a py file from a notebook following this documentation: Files in repo. I downloaded and added the files to my repo, and when I try to run the notebook, the module is not recognized. Any idea why this is happening? Thanks ...

Latest Reply
-werners-
Esteemed Contributor III
  • 5 kudos

In this topic you can find some more info: https://community.databricks.com/s/question/0D53f00001Pp5EhCAJ The docs are not that clear.

2 More Replies
snoeprol
by New Contributor II
  • 3666 Views
  • 4 replies
  • 2 kudos

Resolved! Unable to open files with python, but filesystem shows files exist

Dear community, I have the following problem: I have uploaded a file of an ML model and have transferred it to the directory with %fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk' When I now check ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

dbfs:/dbfs/ is displayed, so maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" in the left menu to check what is in the DBFS file system more easily.
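
To make the path mix-up concrete: %fs and dbutils.fs interpret paths relative to the DBFS root, while Python's open() sees the driver's local filesystem, where DBFS is exposed under /dbfs. A %fs destination that already starts with /dbfs/ therefore ends up one level deeper than intended. The helper below is a hypothetical sketch of that translation, not a Databricks API.

```python
def dbfs_uri_to_local_path(path: str) -> str:
    """Translate a path as used by %fs or dbutils.fs into the path Python's open() would need."""
    # Drop the optional dbfs: scheme so both spellings are handled the same way.
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    # DBFS is exposed on the driver under the /dbfs mount point.
    return "/dbfs/" + path.lstrip("/")


# The destination used in the question resolves to a doubled prefix:
print(dbfs_uri_to_local_path("/dbfs/mnt/group22/Tree_point_classification-1.dlpk"))
```

Under this mapping, moving the file to '/mnt/group22/...' with %fs would make it reachable from Python at '/dbfs/mnt/group22/...'.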

3 More Replies
Anonymous
by Not applicable
  • 1862 Views
  • 2 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You can disable the download button for notebook results, which exports results as CSV, from Admin Console -> Workspace Settings -> Advanced section.

1 More Replies
User16765131552
by Contributor III
  • 1080 Views
  • 1 reply
  • 1 kudos

Resolved! Saving Files Location

If someone saves a flat file from a cell without specifying any location, where does it save?

Latest Reply
User16765131552
Contributor III
  • 1 kudos

In this case they are writing to a directory on the driver.

User16826992666
by Valued Contributor
  • 904 Views
  • 1 reply
  • 0 kudos

How do I know if the number of files is causing performance issues?

I have read and heard that having too many small files can cause performance problems when reading large data sets. But how do I know if that is an issue I am facing?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The Databricks SQL endpoint has a query history section that provides additional information to debug and tune queries. One such metric under execution details is the number of files read. For ETL and data science workloads, you could use the Spark UI of the ...
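
As a complementary check that is not part of the reply above, one could also count the files in a dataset directory and see how many fall below some size. This is a minimal sketch assuming the data is reachable through a local path (for example under /dbfs); the function name file_size_summary and the 1 MiB default threshold are arbitrary assumptions.

```python
import os
from statistics import median


def file_size_summary(root: str, small_threshold_bytes: int = 1024 * 1024) -> dict:
    """Walk `root` and summarize how many files exist and how many are below the threshold."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return {
        "file_count": len(sizes),
        "small_file_count": sum(1 for size in sizes if size < small_threshold_bytes),
        "median_bytes": median(sizes) if sizes else 0,
        "total_bytes": sum(sizes),
    }
```

A large file_count combined with a small median size relative to total_bytes is one hint that many small files are being read.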

User16826994223
by Honored Contributor III
  • 1338 Views
  • 1 reply
  • 0 kudos

How to get the files with a prefix in PySpark from an S3 bucket?

I have different files in my S3 bucket. Now I want to get the files that start with cop_

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

You are referencing a FileInfo object, not a string, when calling .startswith(). The filename is a property of the FileInfo object, so filename.name.startswith('cop_') should work.
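
The point of the reply can be sketched as follows. Because dbutils is only available inside Databricks, the FileInfo namedtuple here is a stand-in with just the path and name fields, and the bucket name and helper files_with_prefix are hypothetical.

```python
from collections import namedtuple

# Stand-in for the FileInfo entries returned by dbutils.fs.ls (assumption: only path and name matter here).
FileInfo = namedtuple("FileInfo", ["path", "name"])


def files_with_prefix(entries, prefix):
    """Return the paths of entries whose file name starts with `prefix`."""
    # startswith is called on the name string, not on the FileInfo object itself.
    return [entry.path for entry in entries if entry.name.startswith(prefix)]


listing = [
    FileInfo("s3://my-bucket/cop_2021.csv", "cop_2021.csv"),
    FileInfo("s3://my-bucket/other.csv", "other.csv"),
]
print(files_with_prefix(listing, "cop_"))
```

Inside a notebook, the listing would come from dbutils.fs.ls on the bucket path instead of the hand-built list.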
