<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issues loading .txt files from DBFS into Langchain TextLoader() in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4015#M176</link>
    <description>&lt;P&gt;Try the following.&lt;/P&gt;&lt;P&gt;Python components need the '/dbfs' prefix in the path. Because you are using the output of dbutils.fs.ls, your paths carry the 'dbfs:' prefix instead.&lt;/P&gt;&lt;P&gt;Replace &lt;B&gt;loader = TextLoader(i[0])&lt;/B&gt; with &lt;B&gt;loader = TextLoader(i[0].replace('dbfs:','/dbfs'))&lt;/B&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 24 May 2023 17:55:26 GMT</pubDate>
    <dc:creator>venkatcrc</dc:creator>
    <dc:date>2023-05-24T17:55:26Z</dc:date>
    <item>
      <title>Issues loading .txt files from DBFS into Langchain TextLoader()</title>
      <link>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4014#M175</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am building a Langchain QA application in Databricks. I currently have 13 .txt files in DBFS and am trying to read them iteratively with TextLoader(), chunk them with Langchain's RecursiveCharacterTextSplitter(), and then add them to a Chroma database. When I run this on my local machine, there is no problem, but the application does not seem to accept files loaded from DBFS.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="guru_error"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/165i2EB7CED1BAB03368/image-size/large?v=v2&amp;amp;px=999" role="button" title="guru_error" alt="guru_error" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot 2023-05-19 171751"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/161i44BF991F8A154061/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2023-05-19 171751" alt="Screenshot 2023-05-19 171751" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I have also tried reading the files in as string objects and passing those to TextLoader(), but that does not work either.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Has anyone found a workaround for this?&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2023 15:58:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4014#M175</guid>
      <dc:creator>David_K93</dc:creator>
      <dc:date>2023-05-24T15:58:24Z</dc:date>
    </item>
    <item>
      <title>Re: Issues loading .txt files from DBFS into Langchain TextLoader()</title>
      <link>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4015#M176</link>
      <description>&lt;P&gt;Try the following.&lt;/P&gt;&lt;P&gt;Python components need the '/dbfs' prefix in the path. Because you are using the output of dbutils.fs.ls, your paths carry the 'dbfs:' prefix instead.&lt;/P&gt;&lt;P&gt;Replace &lt;B&gt;loader = TextLoader(i[0])&lt;/B&gt; with &lt;B&gt;loader = TextLoader(i[0].replace('dbfs:','/dbfs'))&lt;/B&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2023 17:55:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4015#M176</guid>
      <dc:creator>venkatcrc</dc:creator>
      <dc:date>2023-05-24T17:55:26Z</dc:date>
    </item>
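    <!--
    Editorial sketch: the reply above rests on the point that dbutils.fs.ls returns 'dbfs:/...' URIs, while plain Python file APIs expect the '/dbfs/...' mount path. A minimal helper for that conversion might look like the following. The function name to_local_dbfs_path and the example path are hypothetical, chosen only for illustration; this is a sketch under those assumptions, not an official Databricks or Langchain utility.

```python
def to_local_dbfs_path(uri):
    """Map a 'dbfs:/...' URI to the '/dbfs/...' path that local file APIs expect."""
    prefix = "dbfs:"
    # Only rewrite a leading scheme; any other path is returned unchanged.
    if uri.startswith(prefix):
        return "/dbfs" + uri[len(prefix):]
    return uri


# Hypothetical example path, not taken from the thread.
print(to_local_dbfs_path("dbfs:/FileStore/docs/a.txt"))  # prints /dbfs/FileStore/docs/a.txt
```

    Checking for the prefix with startswith, rather than calling str.replace, rewrites only the leading scheme, so a path that happens to contain 'dbfs:' somewhere else is left alone.
    -->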
    <item>
      <title>Re: Issues loading .txt files from DBFS into Langchain TextLoader()</title>
      <link>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4016#M177</link>
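      <description>&lt;P&gt;I ended up tinkering around and found that I needed to use the os package to access the files through a '/dbfs/' file path:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;import os&lt;/P&gt;&lt;P&gt;# Import paths as in the 2023 Langchain releases; newer versions may differ&lt;/P&gt;&lt;P&gt;from langchain.document_loaders import TextLoader&lt;/P&gt;&lt;P&gt;from langchain.text_splitter import RecursiveCharacterTextSplitter&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# dir_ls is the '/dbfs/' directory path that holds the .txt files&lt;/P&gt;&lt;P&gt;# Iterate through the directory of docs, load and split each one, then add it to the total list&lt;/P&gt;&lt;P&gt;txt_ls = []&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;for i in os.listdir(dir_ls):&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; filename = os.path.join(dir_ls, i)&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; loader = TextLoader(filename)&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; documents = loader.load()&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; texts = text_splitter.split_documents(documents)&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; # txt_ls ends up as a list of per-file chunk lists&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; txt_ls.append(texts)&lt;/P&gt;</description>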
      <pubDate>Wed, 24 May 2023 17:57:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4016#M177</guid>
      <dc:creator>David_K93</dc:creator>
      <dc:date>2023-05-24T17:57:52Z</dc:date>
    </item>
    <item>
      <title>Re: Issues loading .txt files from DBFS into Langchain TextLoader()</title>
      <link>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4017#M178</link>
      <description>&lt;P&gt;Hi @David Kersey&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Mon, 29 May 2023 00:33:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/issues-loading-txt-files-from-dbfs-into-langchain-textloader/m-p/4017#M178</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-05-29T00:33:29Z</dc:date>
    </item>
  </channel>
</rss>

