05-18-2023 07:56 AM
Hello,
I am somewhat new to Databricks and am trying to build a Q&A application based on a collection of documents. I need to move .pdf and .docx files from my local machine to storage in Databricks and eventually a document store. My questions are:
- What is the most effective method for uploading my documents to Databricks storage?
- What are the best document stores to use to build this application?
Thank you!
David
- Labels: Data Science, Local Machine, Question
Accepted Solutions
05-18-2023 09:25 AM
Hi all,
I took an initial stab at the first task (uploading the documents) with some success using the Databricks CLI. Here are the steps:
- Open a Command/Anaconda prompt and run: `pip install databricks-cli`
- In the Databricks console, go to "User Settings" and then "Access Tokens" to generate a personal access token; save it in a .txt file or somewhere safe
- Back in the prompt, run `databricks configure --token`, then enter your Databricks host URL (https://<host-name>.cloud.databricks.com/) and the token you just generated when prompted
- Verify access by listing the contents of the DBFS root: `databricks fs ls dbfs:/`
- In the prompt, navigate to the local directory containing the files you want to transfer
- Create a target directory in DBFS, for example: `databricks fs mkdirs dbfs:/new_dir`
- Copy the files over recursively: `databricks fs cp -r C:/Users/source_dir dbfs:/new_dir`
This should put all your files into Databricks.
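For anyone who would rather script the upload than use the CLI, the same transfer can be done against the DBFS 2.0 REST API (`create` / `add-block` / `close` endpoints) with just the Python standard library. This is a minimal sketch, not a definitive implementation: the host, token, and file names below are placeholders, and it assumes the documented constraint that each `add-block` payload is base64-encoded and at most 1 MB.

```python
import base64
import json
import urllib.request


def chunked(data: bytes, size: int = 1024 * 1024):
    """Split a byte string into fixed-size chunks (add-block accepts <= 1 MB per call)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def _post(host: str, token: str, endpoint: str, payload: dict) -> dict:
    """POST a JSON payload to a DBFS API endpoint and return the parsed response."""
    req = urllib.request.Request(
        f"{host}/api/2.0/dbfs/{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upload_file(host: str, token: str, local_path: str, dbfs_path: str) -> None:
    """Stream one local file to DBFS via the create / add-block / close handle flow."""
    handle = _post(host, token, "create",
                   {"path": dbfs_path, "overwrite": True})["handle"]
    with open(local_path, "rb") as f:
        data = f.read()
    for chunk in chunked(data):
        _post(host, token, "add-block",
              {"handle": handle, "data": base64.b64encode(chunk).decode()})
    _post(host, token, "close", {"handle": handle})


# Example usage (placeholders -- substitute your own workspace URL, token, and paths):
# upload_file("https://<host-name>.cloud.databricks.com", "<access-token>",
#             "C:/Users/source_dir/report.pdf", "/new_dir/report.pdf")
```

The hypothetical `report.pdf` path is just for illustration; loop `upload_file` over a directory listing to mirror the CLI's `cp -r` behavior.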