
dbdemo LLAM CHATBOT RAG

hossam
New Contributor

I get an error when running the code below from the advanced data preparation step of the default dbdemos. I have already reduced chunk_size and max_batch_size, and I am running the code on suitably sized compute. Could anyone help with this, please?

(spark.readStream.table('pdf_raw')
      .withColumn("content", F.explode(read_as_chunk("content")))
      .withColumn("embedding", get_embedding("content"))
      .selectExpr('path as url', 'content', 'embedding')
  .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/pdf_chunk_1')
    .table('databricks_pdf_documentation').awaitTermination())

#Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)
if table_exists(f'{catalog}.{db}.databricks_documentation'):
  (spark.readStream.option("skipChangeCommits", "true").table('databricks_documentation') #skip changes for more stable demo
      .withColumn('embedding', get_embedding("content"))
      .select('url', 'content', 'embedding')
  .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/docs_chunks')
    .table('databricks_pdf_documentation').awaitTermination())
 

#.option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/raw_docs')
 
The error is below:
terminated with exception: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 75) (100.64.4.187 executor 0): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python error: Segmentation fault Current thread 0x00007f08e830f280 (most recent call first): File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1399 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1395 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 61 in _compile_and_register_class File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1375 in script File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/region_similarity_calculator.py", line 76 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File 
"/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/__init__.py", line 21 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/anchors.py", line 34 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/efficientdet.py", line 23 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/__init__.py", line 1 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen 
importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/layoutmodel.py", line 28 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/__init__.py", line 16 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/__init__.py", line 17 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap>", line 1128 in 
_find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/detectron2.py", line 5 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/base.py", line 5 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/inference/layout.py", line 27 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File 
"/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/ocr.py", line 17 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 75 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/auto.py", line 80 in <module> File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 940 in exec_module File "<frozen importlib._bootstrap>", line 690 in _load_unlocked File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1178 in _find_and_load File "/databricks/spark/python/pyspark/serializers.py", line 572 in loads ... 
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_linux, psutil._psutil_posix, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, regex._regex, tornado.speedups, lxml._elementpath, lxml.etree, 
sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, yaml._yaml, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, 
scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, sklearn.__check_build._check_build, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, 
sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, PIL._imaging, _cffi_backend, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, PIL._imagingft, google._upb._message (total: 246) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:589) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:574) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:123) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.execution.python.BatchIterator.hasNext(ArrowEvalPythonExec.scala:42) at 
org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream(PythonArrowInput.scala:160) at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream$(PythonArrowInput.scala:153) at org.apache.spark.sql.execution.python.BaseArrowPythonRunner.writeNextBatchToArrowStream(ArrowPythonRunner.scala:34) at org.apache.spark.sql.execution.python.PythonArrowInput$ArrowWriter.writeNextInputToStream(PythonArrowInput.scala:126) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.writeAdditionalInputToPythonWorker(PythonRunner.scala:815) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:735) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:539) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$3(FileFormatWriter.scala:363) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82) at 
com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:190) at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:155) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103) at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108) at scala.util.Using$.resource(Using.scala:269) at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:149) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:101) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:986) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:106) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:989) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:877) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.io.IOException: Connection 
reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:724) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104) ... 56 more Driver stacktrace: SQLSTATE: XXKST
File <command-548824972635656>, line 8
      1 (spark.readStream.table('pdf_raw')
      2 .withColumn("content", F.explode(read_as_chunk("content")))
      3 .withColumn("embedding", get_embedding("content"))
      4 .selectExpr('path as url', 'content', 'embedding')
      5 .writeStream
      6 .trigger(availableNow=True)
      7 .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/pdf_chunk_1')
----> 8 .table('databricks_pdf_documentation').awaitTermination())
     10 #Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)
     11 if table_exists(f'{catalog}.{db}.databricks_documentation'):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:254, in capture_sql_exception.<locals>.deco(*a, **kw)
    250 converted = convert_exception(e.java_exception)
    251 if not isinstance(converted, UnknownException):
    252     # Hide where the exception came from that shows a non-Pythonic
    253     # JVM exception message.
--> 254     raise converted from None
    255 else:

Kaniz_Fatma
Community Manager

Hi @hossam

  • Enable detailed logging to get more information about the failure. Look for additional error messages or stack traces.
  • If possible, run the code in a local development environment (outside Databricks) to isolate the issue.
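
One possible way to isolate the failure: the traceback in the question shows the segmentation fault happening while `unstructured.partition.auto` is being imported, down through `unstructured_inference`, `layoutparser` and `effdet` into `torch.jit.script`. Importing each package of that chain in a separate Python process can show which import crashes without taking down the notebook or the Spark worker. The sketch below is only an illustration and is not part of the dbdemos notebook. The helper names `import_in_subprocess` and `first_failing_import` are hypothetical, and the approach assumes it runs in the same Python environment as the failing worker.

```python
import subprocess
import sys


def import_in_subprocess(module_name: str, timeout_s: int = 300) -> tuple[int, str]:
    """Import `module_name` in a fresh interpreter and report how it exited.

    Returns (return_code, stderr). On POSIX systems a negative return code
    means the child was killed by a signal (for example -11 for SIGSEGV).
    """
    # "-X faulthandler" makes the child print a Python traceback to stderr
    # if it dies from a fatal signal such as a segmentation fault.
    cmd = [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.returncode, result.stderr


def first_failing_import(module_names):
    """Try each module in order and return (name, return_code, stderr)
    for the first one that fails to import, or None if all succeed."""
    for name in module_names:
        code, err = import_in_subprocess(name)
        if code != 0:
            return name, code, err
    return None
```

For the traceback above, calling `first_failing_import(["torch", "effdet", "layoutparser", "unstructured_inference", "unstructured.partition.auto"])` checks the chain from the innermost package outwards. A negative return code for one of them would suggest that the crash comes from that package's import (for example an incompatible torch/effdet combination) rather than from chunk_size or max_batch_size, but that is a hypothesis to verify, not a confirmed diagnosis.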