<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: databricks-connect 13.1.0 limitations in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49760#M1632</link>
    <description>&lt;P&gt;Seemingly our issue doesn't contain a "larger than max" error. Out of curiosity I tried to double the maxRecordsPerBatch to 20000 and reduced it to 200 and it didn't appear to help.&lt;/P&gt;&lt;P&gt;```&lt;BR /&gt;Fatal Python error: Segmentation fault&lt;/P&gt;&lt;P&gt;Thread 0x0000ffff7bfff120 (most recent call first):&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/client/core.py", line 1537 in _ping_handler&lt;BR /&gt;File "/usr/lib/python3.10/threading.py", line 953 in run&lt;BR /&gt;File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner&lt;BR /&gt;File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap&lt;/P&gt;&lt;P&gt;Current thread 0x0000ffffb5415420 (most recent call first):&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/google/protobuf/message.py", line 126 in CopyFrom&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 524 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 608 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 799 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 702 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 118 in to_proto&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/dataframe.py", line 1654 in toPandas&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 138 in run&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 195 in _run_stdbscan&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 243 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File 
"/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 292 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/outages/outage_estimation.py", line 74 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/overrides/sierra_leone/outage_estimation.py", line 23 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/kpis/outage.py", line 551 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/kpis/outage.py", line 168 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "&amp;lt;stdin&amp;gt;", line 1 in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_linux, psutil._psutil_posix, charset_normalizer.md, grpc._cython.cygrpc, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, 
pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, google._upb._message, yaml._yaml, shapely.lib, shapely._geos, shapely._geometry_helpers, pyproj._compat, pyproj._datadir, pyproj._network, pyproj._geod, pyproj.list, pyproj._crs, pyproj.database, pyproj._transformer, pyproj._sync, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation (total: 110)&lt;BR /&gt;/usr/local/bin/pyspark: line 60: 232 Segmentation fault 
$PYSPARK_DRIVER_PYTHON&lt;BR /&gt;```&lt;/P&gt;</description>
    <pubDate>Mon, 23 Oct 2023 19:46:05 GMT</pubDate>
    <dc:creator>jackson-nline</dc:creator>
    <dc:date>2023-10-23T19:46:05Z</dc:date>
    <item>
      <title>databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/37096#M405</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Excited to see the new release of databricks-connect, I started writing unit tests that run PySpark on a Databricks cluster through databricks-connect.&lt;/P&gt;&lt;P&gt;After some successful basic unit tests, I tried a longer chain of transformations on a dataframe, including forward fills, simple arithmetic, and linear regression slope calculations via a pandas UDF. Nothing fancy. Then, when running a test, I got the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;E           pyspark.errors.exceptions.connect.SparkConnectGrpcException: &amp;lt;_MultiThreadedRendezvous of RPC that terminated with:
E           	status = StatusCode.UNKNOWN
E           	details = ""
E           	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"", grpc_status:2, created_time:"2023-07-06T13:29:00.033340701+00:00"}"&lt;/LI-CODE&gt;&lt;P&gt;I do not get this error when I remove one simple column (a constant literal), and I do not get this error either if I run the same code on Databricks directly.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The error seems to point to grpc and a limitation of databricks-connect. Has anyone encountered this, and is there a place where we can check what current limitations of databricks-connect are?&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jul 2023 14:24:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/37096#M405</guid>
      <dc:creator>thibault</dc:creator>
      <dc:date>2023-07-06T14:24:53Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/37099#M406</link>
      <description>&lt;P&gt;Oftentimes, just writing a question helps resolve it. For anyone facing issues with databricks-connect that don't show up when using Databricks directly, here is a list of limitations (RTFM to me):&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect#limitations" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect#limitations&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In particular, watch the size of the dataframe. databricks-connect doesn't support dataframes larger than 128 MB, which is not much. Hopefully future releases will allow larger dataframes.&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jul 2023 14:56:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/37099#M406</guid>
      <dc:creator>thibault</dc:creator>
      <dc:date>2023-07-06T14:56:48Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/37101#M408</link>
      <description>&lt;P&gt;Well, my bad. I thought this was the issue, but I then reduced the number of rows so that the size dropped below 1 MB, and it still failed with the same error. So I still don't know why this fails with databricks-connect, even though I have checked that all the Spark functions I use support Spark Connect.&lt;/P&gt;&lt;P&gt;If anyone has any ideas, please share.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jul 2023 15:56:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/37101#M408</guid>
      <dc:creator>thibault</dc:creator>
      <dc:date>2023-07-06T15:56:25Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/39839#M714</link>
      <description>&lt;P&gt;To add a bit more here: I don't even think that the 128 MB limit is really a limit. You can set "spark.connect.grpc.maxInboundMessageSize" to a larger value and also override the default limit on the client side by using a custom gRPC ChannelBuilder.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2023 10:30:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/39839#M714</guid>
      <dc:creator>jrand</dc:creator>
      <dc:date>2023-08-14T10:30:27Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/41696#M811</link>
      <description>&lt;P&gt;I have the same issue, but with &lt;STRONG&gt;databricks-connect==13.2.1&lt;/STRONG&gt;&lt;BR /&gt;Code to reproduce:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;col_num = 49
data = [tuple([f'val_{n}' for n in range(1, col_num + 1)])]
df = spark.createDataFrame(data=data)
for i in range(1, len(data[0]) + 1):
    df = df.withColumnRenamed(f'_{i}', f'col_{i}')
df.printSchema()&lt;/LI-CODE&gt;&lt;P&gt;The error is a bit different though:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;pyspark.errors.exceptions.connect.SparkConnectGrpcException: &amp;lt;_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = ""
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2023-08-26T12:06:04.41660107+02:00", grpc_status:2, grpc_message:""}"
&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;And if col_num == 48, everything is fine. If col_num &amp;gt; 49, I get a segfault without any error message.&lt;/P&gt;</description>
      <pubDate>Sat, 26 Aug 2023 10:28:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/41696#M811</guid>
      <dc:creator>safonov</dc:creator>
      <dc:date>2023-08-26T10:28:07Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/41702#M812</link>
      <description>&lt;P&gt;Looks like it hits, rather ungracefully, some limit on nested messages in protobuf.&lt;BR /&gt;&lt;BR /&gt;I think so because this script generates a flat plan (as opposed to my example in the previous message) and works fine:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import functions as f

col_num = 1000
data = [tuple([f'val_{n}' for n in range(1, col_num + 1)])]
df = spark.createDataFrame(data=data)
# Collect all renames and apply them in a single select, so the plan stays flat
columns_renaming = []
for i in range(1, len(data[0]) + 1):
    columns_renaming.append(
        f.col(f'_{i}').alias(f'col_{i}')
    )
df = df.select(*columns_renaming)
df.printSchema()&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;But databricks-connect could handle this somehow, or at least prevent the segfault, I suppose. And documented limitations would be nice.&lt;/P&gt;</description>
      <pubDate>Sat, 26 Aug 2023 11:51:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/41702#M812</guid>
      <dc:creator>safonov</dc:creator>
      <dc:date>2023-08-26T11:51:20Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/44712#M1073</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/86419"&gt;@jrand&lt;/a&gt;, can you shed some light on where and how you set this setting?&lt;BR /&gt;I'm hitting the same issue right when we were starting to get excited about databricks-connect.&lt;BR /&gt;I can see the setting in the spark-connect documentation but not in the databricks-connect one, and I'm unsure where I can override it.&lt;BR /&gt;&lt;BR /&gt;I'm on 13.3.0.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance,&lt;BR /&gt;dsa&lt;/P&gt;</description>
      <pubDate>Thu, 14 Sep 2023 07:27:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/44712#M1073</guid>
      <dc:creator>dsa</dc:creator>
      <dc:date>2023-09-14T07:27:59Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49417#M1592</link>
      <description>&lt;P&gt;We are having what appears to be the same segfault issue when running some of our larger chained functions (likely with quite large plans). We can reliably trigger it by looping over `withColumns` to increase the plan size. Our case is also a segfault and also ends up in the protobuf library. Besides refactoring to flatten the plan as much as possible, has anyone had success with increasing `spark.connect.grpc.maxInboundMessageSize`?&lt;/P&gt;</description>
      <pubDate>Tue, 17 Oct 2023 19:53:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49417#M1592</guid>
      <dc:creator>jackson-nline</dc:creator>
      <dc:date>2023-10-17T19:53:22Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49630#M1621</link>
      <description>&lt;P&gt;I doubled the `spark.connect.grpc.maxInboundMessageSize` parameter to 256 MB, but that didn't appear to resolve anything.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Oct 2023 18:12:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49630#M1621</guid>
      <dc:creator>jackson-nline</dc:creator>
      <dc:date>2023-10-20T18:12:20Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49715#M1627</link>
      <description>&lt;P&gt;We got the following info from Databricks support, which might be of interest to you:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. &lt;U&gt;“Received message larger than max” error for files with row size smaller than 128 MB&lt;/U&gt; &lt;/STRONG&gt;&lt;SPAN&gt;(For example, the parquet file you provided)&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;We were able to mitigate this issue by adjusting maxRecordsPerBatch.&lt;/LI&gt;&lt;LI&gt;The engineering team has prepared and merged a fix which should resolve this issue.&lt;/LI&gt;&lt;LI&gt;After the fix is deployed, any file containing&lt;STRONG&gt; a row size &lt;/STRONG&gt;smaller than 128 MB will not receive this error.&lt;/LI&gt;&lt;LI&gt;This fix will be part of our next maintenance release (the next tentative maintenance is scheduled between the 23rd and 29th of Oct, in multiple stages). Meanwhile, you can use the &lt;STRONG&gt;maxRecordsPerBatch&lt;/STRONG&gt; config as a mitigation.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;2. &lt;U&gt;“Received message larger than max” error for files with row size larger than 128 MB.&lt;/U&gt;&lt;/STRONG&gt;&lt;SPAN&gt; (For example, the binary file where rows are &amp;gt; 128 MB)&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;We do not officially support rows larger than 128 MB.&lt;/LI&gt;&lt;LI&gt;This request has been taken as a feature request and has been added to our engineering team's backlog for the next quarter to decide if we can allow the user to change GRPC_DEFAULT_OPTIONS.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;So as far as I understand it, you currently can't change the gRPC options when using databricks-connect.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2023 07:08:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49715#M1627</guid>
      <dc:creator>dsa</dc:creator>
      <dc:date>2023-10-23T07:08:37Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49760#M1632</link>
      <description>&lt;P&gt;Seemingly our issue doesn't contain a "larger than max" error. Out of curiosity I tried to double the maxRecordsPerBatch to 20000 and reduced it to 200 and it didn't appear to help.&lt;/P&gt;&lt;P&gt;```&lt;BR /&gt;Fatal Python error: Segmentation fault&lt;/P&gt;&lt;P&gt;Thread 0x0000ffff7bfff120 (most recent call first):&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/client/core.py", line 1537 in _ping_handler&lt;BR /&gt;File "/usr/lib/python3.10/threading.py", line 953 in run&lt;BR /&gt;File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner&lt;BR /&gt;File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap&lt;/P&gt;&lt;P&gt;Current thread 0x0000ffffb5415420 (most recent call first):&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/google/protobuf/message.py", line 126 in CopyFrom&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 524 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 608 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 799 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 702 in plan&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/plan.py", line 118 in to_proto&lt;BR /&gt;File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/connect/dataframe.py", line 1654 in toPandas&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 138 in run&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 195 in _run_stdbscan&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 243 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File 
"/home/powerwatch-data-analysis/pwdata/analysis/clustering.py", line 292 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/analysis/outages/outage_estimation.py", line 74 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/overrides/sierra_leone/outage_estimation.py", line 23 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/kpis/outage.py", line 551 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/kpis/outage.py", line 168 in process&lt;BR /&gt;File "/home/powerwatch-data-analysis/pwdata/pipeline/core/pipeline.py", line 184 in __new__&lt;BR /&gt;File "&amp;lt;stdin&amp;gt;", line 1 in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, psutil._psutil_linux, psutil._psutil_posix, charset_normalizer.md, grpc._cython.cygrpc, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, 
pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, google._upb._message, yaml._yaml, shapely.lib, shapely._geos, shapely._geometry_helpers, pyproj._compat, pyproj._datadir, pyproj._network, pyproj._geod, pyproj.list, pyproj._crs, pyproj.database, pyproj._transformer, pyproj._sync, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.linalg._flinalg, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation (total: 110)&lt;BR /&gt;/usr/local/bin/pyspark: line 60: 232 Segmentation fault 
$PYSPARK_DRIVER_PYTHON&lt;BR /&gt;```&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2023 19:46:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/49760#M1632</guid>
      <dc:creator>jackson-nline</dc:creator>
      <dc:date>2023-10-23T19:46:05Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect 13.1.0 limitations</title>
      <link>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/58106#M2311</link>
      <description>&lt;P&gt;This is the PR that introduced the configurable limit: &lt;A href="https://github.com/apache/spark/pull/40447/files" target="_blank"&gt;https://github.com/apache/spark/pull/40447/files&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jan 2024 11:10:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/databricks-connect-13-1-0-limitations/m-p/58106#M2311</guid>
      <dc:creator>jrand</dc:creator>
      <dc:date>2024-01-22T11:10:57Z</dc:date>
    </item>
  </channel>
</rss>

