dbdemos LLM RAG chatbot

hossam

I have an issue when running the code below from the default dbdemos advanced data preparation notebook. I have already reduced chunk_size and max_batch_size, and I am running it on adequate compute resources. Could anyone help with this, please?

# read_as_chunk, get_embedding, table_exists, catalog, db and volume_folder come from earlier cells of the demo notebook (not shown).
from pyspark.sql import functions as F

(spark.readStream.table('pdf_raw')
    .withColumn("content", F.explode(read_as_chunk("content")))
    .withColumn("embedding", get_embedding("content"))
    .selectExpr('path as url', 'content', 'embedding')
    .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/pdf_chunk_1')
    .table('databricks_pdf_documentation').awaitTermination())

# Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)
if table_exists(f'{catalog}.{db}.databricks_documentation'):
    (spark.readStream.option("skipChangeCommits", "true").table('databricks_documentation')  # skip changes for a more stable demo
        .withColumn('embedding', get_embedding("content"))
        .select('url', 'content', 'embedding')
        .writeStream
        .trigger(availableNow=True)
        .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/docs_chunks')
        .table('databricks_pdf_documentation').awaitTermination())

# .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/raw_docs')
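For context on the two parameters mentioned above: assuming the demo version in use, chunk_size sets how large each text chunk produced by read_as_chunk is, and max_batch_size caps how many chunks get_embedding sends per request to the embedding endpoint. The sketch below is a simplified illustration under those assumptions, not the demo's code: it splits on characters instead of using the demo's sentence/token splitter, and fake_embed is a hypothetical stand-in for the model-serving call.

```python
# Simplified sketch (assumption): character-based chunking and a fake embedding call,
# used only to show what chunk_size and max_batch_size control.
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into windows of at most chunk_size characters, overlapping by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    # Stop before a trailing window that would only repeat the overlap of the previous one.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed_in_batches(
    chunks: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    max_batch_size: int,
) -> List[List[float]]:
    """Call embed_batch on consecutive slices of at most max_batch_size chunks and concatenate."""
    embeddings: List[List[float]] = []
    for start in range(0, len(chunks), max_batch_size):
        embeddings.extend(embed_batch(chunks[start:start + max_batch_size]))
    return embeddings


# Hypothetical stand-in for the embedding endpoint: one 1-dimensional "vector" per input.
calls: List[int] = []


def fake_embed(batch: List[str]) -> List[List[float]]:
    calls.append(len(batch))  # record each request's batch size
    return [[float(len(s))] for s in batch]


chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
vectors = embed_in_batches(chunks, fake_embed, max_batch_size=2)
print(chunks)  # ['abcd', 'defg', 'ghij']
print(calls)   # [2, 1]
```

Lowering either value only reduces the work per chunk or per request; it does not change which Python modules the worker has to import before any rows are processed.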
The error is below:
terminated with exception: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 75) (100.64.4.187 executor 0): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python error: Segmentation fault

Current thread 0x00007f08e830f280 (most recent call first):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1399 in script
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1395 in script
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 1007 in try_compile_fn
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_recursive.py", line 61 in _compile_and_register_class
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/torch/jit/_script.py", line 1375 in script
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/region_similarity_calculator.py", line 76 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/object_detection/__init__.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/anchors.py", line 34 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/efficientdet.py", line 23 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/effdet/__init__.py", line 1 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/layoutmodel.py", line 28 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/effdet/__init__.py", line 16 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/layoutparser/models/__init__.py", line 17 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/detectron2.py", line 5 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/models/base.py", line 5 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured_inference/inference/layout.py", line 27 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/ocr.py", line 17 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 75 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-c3e45618-a396-4dc5-a497-80956ab4f6fd/lib/python3.11/site-packages/unstructured/partition/auto.py", line 80 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "/databricks/spark/python/pyspark/serializers.py", line 572 in loads
  ...
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_linux, psutil._psutil_posix, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, regex._regex, tornado.speedups, lxml._elementpath, lxml.etree, 
sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, yaml._yaml, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, 
scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, sklearn.__check_build._check_build, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.utils._random, sklearn.utils._seq_dataset, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, 
sklearn.metrics._pairwise_fast, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, PIL._imaging, _cffi_backend, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, matplotlib._c_internal_utils, matplotlib._path, kiwisolver._cext, matplotlib._image, PIL._imagingft, google._upb._message (total: 246)

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:589)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:574)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:123)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.python.BatchIterator.hasNext(ArrowEvalPythonExec.scala:42)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream(PythonArrowInput.scala:160)
    at org.apache.spark.sql.execution.python.BasicPythonArrowInput.writeNextBatchToArrowStream$(PythonArrowInput.scala:153)
    at org.apache.spark.sql.execution.python.BaseArrowPythonRunner.writeNextBatchToArrowStream(ArrowPythonRunner.scala:34)
    at org.apache.spark.sql.execution.python.PythonArrowInput$ArrowWriter.writeNextInputToStream(PythonArrowInput.scala:126)
    at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.writeAdditionalInputToPythonWorker(PythonRunner.scala:815)
    at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:735)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:539)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$3(FileFormatWriter.scala:363)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:190)
    at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:155)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:149)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:101)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:986)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:106)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:989)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:877)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.spark.api.python.BasePythonRunner$ReaderInputStream.read(PythonRunner.scala:724)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:104)
    ... 56 more
Driver stacktrace: SQLSTATE: XXKST
File <command-548824972635656>, line 8
      1 (spark.readStream.table('pdf_raw')
      2 .withColumn("content", F.explode(read_as_chunk("content")))
      3 .withColumn("embedding", get_embedding("content"))
      4 .selectExpr('path as url', 'content', 'embedding')
      5 .writeStream
      6 .trigger(availableNow=True)
      7 .option("checkpointLocation", f'dbfs:{volume_folder}/checkpoints/pdf_chunk_1')
----> 8 .table('databricks_pdf_documentation').awaitTermination())
     10 #Let's also add our documentation web page from the simple demo (make sure you run the quickstart demo first)
     11 if table_exists(f'{catalog}.{db}.databricks_documentation'):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:254, in capture_sql_exception.<locals>.deco(*a, **kw)
    250 converted = convert_exception(e.java_exception)
    251 if not isinstance(converted, UnknownException):
    252     # Hide where the exception came from that shows a non-Pythonic
    253     # JVM exception message.
--> 254     raise converted from None
    255 else:
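Read from the bottom up, the Python-side trace shows the worker crashing inside pyspark/serializers.py loads (most likely while deserializing the pickled UDF), which imports unstructured.partition.auto, then unstructured.partition.pdf and ocr, then unstructured_inference, layoutparser and effdet; effdet calls torch.jit.script at import time, and that is where the segmentation fault happens. This points to a native crash during import rather than something that depends on data volume. A minimal sketch for checking whether an import crashes on its own, assuming a POSIX system (where a negative return code means the child was killed by that signal); run_isolated is a hypothetical helper, not part of the demo:

```python
# Minimal sketch (assumes POSIX): run a snippet in a separate interpreter so that a
# native crash such as a segfault is reported instead of killing the calling process.
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a child Python process and describe how it exited."""
    proc = subprocess.run(
        # -X faulthandler makes the child print a Python traceback on a fatal signal,
        # similar to the "Fatal Python error: Segmentation fault" dump above.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the signal that killed the child.
        return f"crashed with {signal.Signals(-proc.returncode).name}"
    lines = proc.stderr.strip().splitlines()
    last_line = lines[-1] if lines else ""
    return f"failed with exit code {proc.returncode}: {last_line}"


print(run_isolated("import json"))  # an ordinary import
# Simulated crash: the child sends SIGSEGV to itself, standing in for a real segfault.
print(run_isolated("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"))
```

On the cluster, the same helper could be called as run_isolated("import unstructured.partition.auto") from a notebook cell. That runs on the driver, while the trace shows the crash on executor 0, so a clean result on the driver does not fully rule out a problem on the workers.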