<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic dbdemo LLAM CHATBOT RAG in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/dbdemo-llam-chatbot-rag/m-p/71121#M3053</link>
    <description>&lt;P&gt;I have an issue when running the code below, using the default dbdemos in the advanced preparation. I have reduced&amp;nbsp;&lt;SPAN&gt;chunk_size and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;max_batch_size and am running the code on suitable compute resources. Could anyone help with this, please?&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;(spark.readStream.table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;pdf_raw&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.withColumn(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, F.explode(read_as_chunk(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;)))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.withColumn(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, get_embedding(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.selectExpr(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;path as url&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.writeStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.trigger(&lt;/SPAN&gt;&lt;SPAN&gt;availableNow&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;checkpointLocation&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, 
&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'dbfs:&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;volume_folder&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;/checkpoints/pdf_chunk_1'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_pdf_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;).awaitTermination())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;#Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;if&lt;/SPAN&gt;&lt;SPAN&gt; table_exists(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;catalog&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;db&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.databricks_documentation'&lt;/SPAN&gt;&lt;SPAN&gt;):&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;(spark.readStream.option(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;skipChangeCommits&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;true&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;).table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;#skip changes for more stable demo&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.withColumn(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, 
get_embedding(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.select(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;url&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.writeStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.trigger(&lt;/SPAN&gt;&lt;SPAN&gt;availableNow&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;checkpointLocation&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'dbfs:&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;volume_folder&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;/checkpoints/docs_chunks'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_pdf_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;).awaitTermination())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;#.option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/raw_docs')&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The error is below:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;terminated with exception: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 75) (100.64.4.187 executor 0): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python error: Segmentation fault Current 
thread 0x00007f08e830f280 (most recent call first): File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1399 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1395 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 61 in _compile_and_register_class File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1375 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/region_similarity_calculator.py", line 76 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/__init__.py", line 21 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen 
importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/anchors.py", line 34 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/efficientdet.py", line 23 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/__init__.py", line 1 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in 
_find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/layoutmodel.py", line 28 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/__init__.py", line 16 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1128 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/__init__.py", line 17 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen 
importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1128 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1128 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/detectron2.py", line 5 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/base.py", line 5 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File 
"/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/inference/layout.py", line 27 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/ocr.py", line 17 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 75 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/auto.py", line 80 in 
&amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/databricks/spark/python/pyspark/serializers.py", line 572 in loads ... Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_linux, psutil._psutil_posix, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, 
pandas._libs.json, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, regex._regex, tornado.speedups, lxml._elementpath, lxml.etree, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, yaml._yaml, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, 
scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, sklearn.__check_build._check_build, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.utils._random, sklearn.utils._seq_dataset, 
sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, PIL._imaging, _cffi_backend, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, PIL._imagingft, google._upb._message (total: 246) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:589) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:574) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:123) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.execution.python.BatchIterator.hasNext(ArrowEvalPythonExec.scala:42) at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream(PythonArrowInput.scala:160) at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream$(PythonArrowInput.scala:153) at org.apache.spark.sql.execution.python.BaseArrowPythonRunner.writeNextBatchToArrowStream(ArrowPythonRunner.scala:34) at org.apache.spark.sql.execution.python.PythonArrowInput$ArrowWriter.writeNextInputToStream(PythonArrowInput.scala:126) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.writeAdditionalInputToPythonWorker(PythonRunner.scala:815) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:735) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:539) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$3(FileFormatWriter.scala:363) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:190) at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:155) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103) at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108) at scala.util.Using$.resource(Using.scala:269) at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:149) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:101) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:986) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:106) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:989) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:877) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:724) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104) ... 
56 more Driver stacktrace: SQLSTATE: XXKST&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;&amp;lt;command-548824972635656&amp;gt;, line 8&lt;/SPAN&gt; &lt;SPAN&gt;1&lt;/SPAN&gt; (spark&lt;SPAN&gt;.&lt;/SPAN&gt;readStream&lt;SPAN&gt;.&lt;/SPAN&gt;table(&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;pdf_raw&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;) &lt;SPAN&gt;2&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;withColumn(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, F&lt;SPAN&gt;.&lt;/SPAN&gt;explode(read_as_chunk(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;))) &lt;SPAN&gt;3&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;withColumn(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, get_embedding(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)) &lt;SPAN&gt;4&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;selectExpr(&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;path as url&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;, &lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;, &lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;) &lt;SPAN&gt;5&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;writeStream &lt;SPAN&gt;6&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;trigger(availableNow&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;True&lt;/SPAN&gt;) &lt;SPAN&gt;7&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;option(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;checkpointLocation&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, &lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;dbfs:&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;volume_folder&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;/checkpoints/pdf_chunk_1&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;) &lt;SPAN class=""&gt;----&amp;gt; 8&lt;/SPAN&gt; 
&lt;SPAN&gt;.&lt;/SPAN&gt;table(&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_pdf_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;awaitTermination()) &lt;SPAN&gt;10&lt;/SPAN&gt; &lt;SPAN&gt;#Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)&lt;/SPAN&gt; &lt;SPAN&gt;11&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; table_exists(&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;catalog&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;db&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.databricks_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;):&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;HR /&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/errors/exceptions/captured.py:254&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;capture_sql_exception.&amp;lt;locals&amp;gt;.deco&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*a, **kw)&lt;/SPAN&gt; &lt;SPAN&gt;250&lt;/SPAN&gt; converted &lt;SPAN&gt;=&lt;/SPAN&gt; convert_exception(e&lt;SPAN&gt;.&lt;/SPAN&gt;java_exception) &lt;SPAN&gt;251&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; &lt;SPAN class=""&gt;not&lt;/SPAN&gt; &lt;SPAN&gt;isinstance&lt;/SPAN&gt;(converted, UnknownException): &lt;SPAN&gt;252&lt;/SPAN&gt; &lt;SPAN&gt;# Hide where the exception came from that shows a non-Pythonic&lt;/SPAN&gt; &lt;SPAN&gt;253&lt;/SPAN&gt; &lt;SPAN&gt;# JVM exception message.&lt;/SPAN&gt; &lt;SPAN class=""&gt;--&amp;gt; 254&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; converted &lt;SPAN class=""&gt;from&lt;/SPAN&gt; &lt;SPAN class=""&gt;None&lt;/SPAN&gt; &lt;SPAN&gt;255&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;:&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Fri, 31 May 2024 03:33:53 GMT</pubDate>
    <dc:creator>hossam</dc:creator>
    <dc:date>2024-05-31T03:33:53Z</dc:date>
    <item>
      <title>dbdemo LLAM CHATBOT RAG</title>
      <link>https://community.databricks.com/t5/get-started-discussions/dbdemo-llam-chatbot-rag/m-p/71121#M3053</link>
      <description>&lt;P&gt;I have an issue when running the code below, using the default dbdemos in the advanced preparation. I have reduced &lt;SPAN&gt;chunk_size and &lt;/SPAN&gt;&lt;SPAN&gt;max_batch_size, and I am running the code on adequate compute resources. Could anyone help with this, please?&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;(spark.readStream.table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;pdf_raw&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.withColumn(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, F.explode(read_as_chunk(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;)))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.withColumn(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, get_embedding(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.selectExpr(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;path as url&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.writeStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.trigger(&lt;/SPAN&gt;&lt;SPAN&gt;availableNow&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;checkpointLocation&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, 
&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'dbfs:&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;volume_folder&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;/checkpoints/pdf_chunk_1'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_pdf_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;).awaitTermination())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;#Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;if&lt;/SPAN&gt;&lt;SPAN&gt; table_exists(&lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;catalog&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;db&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.databricks_documentation'&lt;/SPAN&gt;&lt;SPAN&gt;):&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;(spark.readStream.option(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;skipChangeCommits&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;true&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;).table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;#skip changes for more stable demo&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.withColumn(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, 
get_embedding(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.select(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;url&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.writeStream&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.trigger(&lt;/SPAN&gt;&lt;SPAN&gt;availableNow&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.option(&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;checkpointLocation&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'dbfs:&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;&lt;SPAN&gt;volume_folder&lt;/SPAN&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;/checkpoints/docs_chunks'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.table(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_pdf_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;).awaitTermination())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;#.option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/raw_docs')&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The error is below:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;terminated with exception: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 75) (100.64.4.187 executor 0): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python error: Segmentation fault Current 
thread 0x00007f08e830f280 (most recent call first): File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1399 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1395 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 61 in _compile_and_register_class File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1375 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/region_similarity_calculator.py", line 76 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/__init__.py", line 21 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen 
importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/anchors.py", line 34 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/efficientdet.py", line 23 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/__init__.py", line 1 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in 
_find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/layoutmodel.py", line 28 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/__init__.py", line 16 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1128 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/__init__.py", line 17 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen 
importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1128 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1128 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/detectron2.py", line 5 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/base.py", line 5 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File 
"/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/inference/layout.py", line 27 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/ocr.py", line 17 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 75 in &amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/auto.py", line 80 in 
&amp;lt;module&amp;gt; File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 241 in _call_with_frames_removed File "&amp;lt;frozen importlib._bootstrap_external&amp;gt;", line 940 in exec_module File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 690 in _load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1149 in _find_and_load_unlocked File "&amp;lt;frozen importlib._bootstrap&amp;gt;", line 1178 in _find_and_load File "/databricks/spark/python/pyspark/serializers.py", line 572 in loads ... Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_linux, psutil._psutil_posix, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, 
pandas._libs.json, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, regex._regex, tornado.speedups, lxml._elementpath, lxml.etree, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, yaml._yaml, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, 
scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, sklearn.__check_build._check_build, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.utils._random, sklearn.utils._seq_dataset, 
sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, PIL._imaging, _cffi_backend, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, PIL._imagingft, google._upb._message (total: 246) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:589) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:574) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:123) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.execution.python.BatchIterator.hasNext(ArrowEvalPythonExec.scala:42) at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream(PythonArrowInput.scala:160) at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream$(PythonArrowInput.scala:153) at org.apache.spark.sql.execution.python.BaseArrowPythonRunner.writeNextBatchToArrowStream(ArrowPythonRunner.scala:34) at org.apache.spark.sql.execution.python.PythonArrowInput$ArrowWriter.writeNextInputToStream(PythonArrowInput.scala:126) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.writeAdditionalInputToPythonWorker(PythonRunner.scala:815) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:735) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:539) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$3(FileFormatWriter.scala:363) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:190) at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:155) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103) at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108) at scala.util.Using$.resource(Using.scala:269) at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:149) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:101) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:986) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:106) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:989) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:877) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:724) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104) ... 
56 more Driver stacktrace: SQLSTATE: XXKST&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;&amp;lt;command-548824972635656&amp;gt;, line 8&lt;/SPAN&gt; &lt;SPAN&gt;1&lt;/SPAN&gt; (spark&lt;SPAN&gt;.&lt;/SPAN&gt;readStream&lt;SPAN&gt;.&lt;/SPAN&gt;table(&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;pdf_raw&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;) &lt;SPAN&gt;2&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;withColumn(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, F&lt;SPAN&gt;.&lt;/SPAN&gt;explode(read_as_chunk(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;))) &lt;SPAN&gt;3&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;withColumn(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, get_embedding(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)) &lt;SPAN&gt;4&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;selectExpr(&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;path as url&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;, &lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;content&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;, &lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;embedding&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;) &lt;SPAN&gt;5&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;writeStream &lt;SPAN&gt;6&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;trigger(availableNow&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN class=""&gt;True&lt;/SPAN&gt;) &lt;SPAN&gt;7&lt;/SPAN&gt; &lt;SPAN&gt;.&lt;/SPAN&gt;option(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;checkpointLocation&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;, &lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;dbfs:&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;volume_folder&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;/checkpoints/pdf_chunk_1&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;) &lt;SPAN class=""&gt;----&amp;gt; 8&lt;/SPAN&gt; 
&lt;SPAN&gt;.&lt;/SPAN&gt;table(&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;databricks_pdf_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;awaitTermination()) &lt;SPAN&gt;10&lt;/SPAN&gt; &lt;SPAN&gt;#Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)&lt;/SPAN&gt; &lt;SPAN&gt;11&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; table_exists(&lt;SPAN&gt;f&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;catalog&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;db&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;.databricks_documentation&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;/SPAN&gt;):&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;HR /&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/spark/python/pyspark/errors/exceptions/captured.py:254&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;capture_sql_exception.&amp;lt;locals&amp;gt;.deco&lt;/SPAN&gt;&lt;SPAN class=""&gt;(*a, **kw)&lt;/SPAN&gt; &lt;SPAN&gt;250&lt;/SPAN&gt; converted &lt;SPAN&gt;=&lt;/SPAN&gt; convert_exception(e&lt;SPAN&gt;.&lt;/SPAN&gt;java_exception) &lt;SPAN&gt;251&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; &lt;SPAN class=""&gt;not&lt;/SPAN&gt; &lt;SPAN&gt;isinstance&lt;/SPAN&gt;(converted, UnknownException): &lt;SPAN&gt;252&lt;/SPAN&gt; &lt;SPAN&gt;# Hide where the exception came from that shows a non-Pythonic&lt;/SPAN&gt; &lt;SPAN&gt;253&lt;/SPAN&gt; &lt;SPAN&gt;# JVM exception message.&lt;/SPAN&gt; &lt;SPAN class=""&gt;--&amp;gt; 254&lt;/SPAN&gt; &lt;SPAN class=""&gt;raise&lt;/SPAN&gt; converted &lt;SPAN class=""&gt;from&lt;/SPAN&gt; &lt;SPAN class=""&gt;None&lt;/SPAN&gt; &lt;SPAN&gt;255&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;:&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 31 May 2024 03:33:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/dbdemo-llam-chatbot-rag/m-p/71121#M3053</guid>
      <dc:creator>hossam</dc:creator>
      <dc:date>2024-05-31T03:33:53Z</dc:date>
    </item>
  </channel>
</rss>

