<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Directories added to the Python sys.path do not always work fine on executors for shared access in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130550#M48825</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/14792"&gt;@-werners-&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I am keen to use shared mode because of cluster templates imposed by the DevOps team. I don't have full control over the clusters or the access mode I can use.&lt;BR /&gt;There is a solution proposed by&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;and I will see how it works in my case.&lt;/P&gt;</description>
    <pubDate>Tue, 02 Sep 2025 20:17:28 GMT</pubDate>
    <dc:creator>Yousry_Ibrahim</dc:creator>
    <dc:date>2025-09-02T20:17:28Z</dc:date>
    <item>
      <title>Directories added to the Python sys.path do not always work fine on executors for shared access mod</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130438#M48795</link>
      <description>&lt;P&gt;Let's assume we have a workspace folder containing two Python files.&lt;/P&gt;&lt;P&gt;module1 with a simple addition function:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def add_numbers(a, b):
  return a + b&lt;/LI-CODE&gt;&lt;P&gt;module2 with a dummy PySpark custom data source:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.datasource import DataSource, DataSourceReader, InputPartition

class DummyDataSourceReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def partitions(self):
        partitions = [InputPartition(x) for x in ["a", "b", "c"]]
        return partitions

    def read(self, partition: InputPartition):
        for row in range(5):
            item = [row, partition.value]
            yield tuple(item)

class DummyDataSource(DataSource):
    def __init__(self, options):
        self.options = options

    @classmethod
    def name(cls):
        return "dummy"

    def schema(self):
        self.options["schema"] = "value int, group string"
        return self.options["schema"]
    
    def reader(self, schema):
        return DummyDataSourceReader(schema, self.options)&lt;/LI-CODE&gt;&lt;P&gt;Now let's create a fresh notebook and connect it to a &lt;STRONG&gt;multi-node&lt;/STRONG&gt; cluster on DBR 16.4 LTS with single-user (dedicated) access mode.&lt;/P&gt;&lt;P&gt;The first step is to add the location of the two modules to the Python sys.path.&lt;BR /&gt;Consuming code from module1 via a UDF then works fine.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Yousry_Ibrahim_3-1756774969049.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19557i313ACF03E1145483/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Yousry_Ibrahim_3-1756774969049.png" alt="Yousry_Ibrahim_3-1756774969049.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The custom data source works fine as well.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Yousry_Ibrahim_1-1756774189101.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19555i03ACE739804852C4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Yousry_Ibrahim_1-1756774189101.png" alt="Yousry_Ibrahim_1-1756774189101.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Let's switch gears and change the cluster to standard (i.e. shared) access mode.&lt;/P&gt;&lt;P&gt;The UDF still works fine, but the custom data source breaks with a ModuleNotFoundError.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Yousry_Ibrahim_2-1756774813091.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19556i7A55D35FD032C799/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Yousry_Ibrahim_2-1756774813091.png" alt="Yousry_Ibrahim_2-1756774813091.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I will leave the whole stack trace at the end of this message.&lt;/P&gt;&lt;P&gt;The &lt;A href="https://docs.databricks.com/aws/en/files/workspace-modules" target="_blank" rel="noopener"&gt;docs&lt;/A&gt; state:&lt;BR /&gt;&lt;U&gt;&lt;EM&gt;In&amp;nbsp;Databricks Runtime&amp;nbsp;13.3 LTS and above, directories added to the Python&amp;nbsp;sys.path, or directories that are structured as&amp;nbsp;&lt;A href="https://docs.python.org/3/tutorial/modules.html#packages" target="_blank" rel="noopener noreferrer"&gt;Python packages&lt;/A&gt;, are automatically distributed to all executors in the cluster. In&amp;nbsp;Databricks Runtime&amp;nbsp;12.2 LTS and below, libraries added to the&amp;nbsp;sys.path&amp;nbsp;must be explicitly installed on executors.&lt;/EM&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am using DBR 16.4 LTS, so I expect any modules registered via sys.path to be available on all executors, and that is what happens in the UDF case. Unfortunately, it does not work with the PySpark custom data source, at least in shared access mode.&lt;BR /&gt;&lt;BR /&gt;Is there anything I am doing wrong or missing, or is there a way to make things work in shared access mode?&lt;BR /&gt;P.S. Due to some other factors, I cannot use DABs or build my code into wheels, etc. 
Prefer to have the clean OOTB behaviour that is found with the UDF.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Stack trace:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;[&lt;A class="" href="https://docs.databricks.com/error-messages/error-classes.html#python_data_source_error" target="_blank" rel="noopener noreferrer"&gt;PYTHON_DATA_SOURCE_ERROR&lt;/A&gt;] Failed to create Python data source instance: Traceback (most recent call last): File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length return self.loads(obj) ^^^^^^^^^^^^^^^ File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads return cloudpickle.loads(obj, encoding=encoding) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ModuleNotFoundError: No module named 'module2' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/databricks/spark/python/pyspark/sql/worker/create_data_source.py", line 86, in main data_source_cls = read_command(pickleSer, infile) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/spark/python/pyspark/worker_util.py", line 71, in read_command command = serializer._read_with_length(file) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length raise SerializationError("Caused by " + traceback.format_exc()) pyspark.serializers.SerializationError: Caused by Traceback (most recent call last): File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length return self.loads(obj) ^^^^^^^^^^^^^^^ File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads return cloudpickle.loads(obj, encoding=encoding) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ModuleNotFoundError: No module named 'module2' SQLSTATE: 38000 JVM stacktrace: org.apache.spark.sql.AnalysisException at org.apache.spark.sql.errors.QueryCompilationErrors$.pythonDataSourceError(QueryCompilationErrors.scala:2972) at 
org.apache.spark.sql.execution.datasources.v2.python.UserDefinedPythonDataSourceRunner.receiveFromPython(UserDefinedPythonDataSource.scala:322) at org.apache.spark.sql.execution.datasources.v2.python.UserDefinedPythonDataSourceRunner.receiveFromPython(UserDefinedPythonDataSource.scala:287) at org.apache.spark.sql.execution.python.PythonPlannerRunner.runInPython(PythonPlannerRunner.scala:201) at org.apache.spark.sql.execution.datasources.v2.python.UserDefinedPythonDataSource.createDataSourceInPython(UserDefinedPythonDataSource.scala:72) at org.apache.spark.sql.execution.datasources.v2.python.PythonDataSourceV2.getOrCreateDataSourceInPython(PythonDataSourceV2.scala:50) at org.apache.spark.sql.execution.datasources.v2.python.PythonDataSourceV2.inferSchema(PythonDataSourceV2.scala:60) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:104) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:250) at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.$anonfun$applyOrElse$1(ResolveDataSource.scala:96) at scala.Option.flatMap(Option.scala:271) at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.applyOrElse(ResolveDataSource.scala:94) at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.applyOrElse(ResolveDataSource.scala:58) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:141) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:85) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:141) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:418) at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:137) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:133) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:42) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:114) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:113) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:42) at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.apply(ResolveDataSource.scala:58) at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.apply(ResolveDataSource.scala:56) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$16(RuleExecutor.scala:480) at org.apache.spark.sql.catalyst.rules.RecoverableRuleExecutionHelper.processRule(RuleExecutor.scala:629) at org.apache.spark.sql.catalyst.rules.RecoverableRuleExecutionHelper.processRule$(RuleExecutor.scala:613) at org.apache.spark.sql.catalyst.rules.RuleExecutor.processRule(RuleExecutor.scala:131) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$15(RuleExecutor.scala:480) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$14(RuleExecutor.scala:479) at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) at scala.collection.immutable.List.foldLeft(List.scala:91) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$13(RuleExecutor.scala:475) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:452) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$22(RuleExecutor.scala:585) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$22$adapted(RuleExecutor.scala:585) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:585) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:349) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeSameContext(Analyzer.scala:507) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:500) at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:406) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:500) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:425) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:341) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:219) at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:341) at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.resolveInFixedPoint(HybridAnalyzer.scala:252) at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.$anonfun$apply$1(HybridAnalyzer.scala:96) at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.withTrackedAnalyzerBridgeState(HybridAnalyzer.scala:131) at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.apply(HybridAnalyzer.scala:87) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:487) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:425) at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:487) at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$3(QueryExecution.scala:308) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:548) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$6(QueryExecution.scala:703) at org.apache.spark.sql.execution.SQLExecution$.withExecutionPhase(SQLExecution.scala:152) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$5(QueryExecution.scala:703) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1342) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:696) at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:692) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1462) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:692) at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$2(QueryExecution.scala:295) at com.databricks.sql.util.MemoryTrackerHelper.withMemoryTracking(MemoryTrackerHelper.scala:80) at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$1(QueryExecution.scala:294) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1684) at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1745) at org.apache.spark.util.LazyTry.get(LazyTry.scala:58) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:340) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:274) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:108) at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1462) at org.apache.spark.sql.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1469) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94) at org.apache.spark.sql.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1469) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:106) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:265) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:224) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformReadRel(SparkConnectPlanner.scala:1811) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformRelation$1(SparkConnectPlanner.scala:190) at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$8(SessionHolder.scala:619) at org.apache.spark.sql.connect.service.SessionHolder.measureSubtreeRelationNodes(SessionHolder.scala:635) at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$6(SessionHolder.scala:618) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.connect.service.SessionHolder.usePlanCache(SessionHolder.scala:616) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:185) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.transformRelation$1(SparkConnectAnalyzeHandler.scala:119) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.process(SparkConnectAnalyzeHandler.scala:213) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$3(SparkConnectAnalyzeHandler.scala:104) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$3$adapted(SparkConnectAnalyzeHandler.scala:66) at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:464) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1462) at 
org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:464) at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97) at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:90) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:241) at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:89) at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:463) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$1(SparkConnectAnalyzeHandler.scala:66) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$1$adapted(SparkConnectAnalyzeHandler.scala:51) at com.databricks.spark.connect.logging.rpc.SparkConnectRpcMetricsCollectorUtils$.collectMetrics(SparkConnectRpcMetricsCollector.scala:258) at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.handle(SparkConnectAnalyzeHandler.scala:50) at org.apache.spark.sql.connect.service.SparkConnectService.analyzePlan(SparkConnectService.scala:109) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:801) at grpc_shaded.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at grpc_shaded.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) at grpc_shaded.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) at grpc_shaded.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) at grpc_shaded.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) at grpc_shaded.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) at 
grpc_shaded.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.$anonfun$onHalfClose$1(AuthenticationInterceptor.scala:381) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104) at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$3(RequestContext.scala:337) at com.databricks.spark.connect.service.RequestContext$.com$databricks$spark$connect$service$RequestContext$$withLocalProperties(RequestContext.scala:537) at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$2(RequestContext.scala:337) at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49) at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289) at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47) at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44) at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:30) at com.databricks.spark.util.UniverseAttributionContextWrapper.withValue(AttributionContextUtils.scala:242) at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$1(RequestContext.scala:336) at com.databricks.spark.connect.service.RequestContext.withContext(RequestContext.scala:349) at com.databricks.spark.connect.service.RequestContext.runWith(RequestContext.scala:329) at 
com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.onHalfClose(AuthenticationInterceptor.scala:381) at grpc_shaded.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:351) at grpc_shaded.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861) at grpc_shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at grpc_shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:165) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63) at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$6(SparkThreadLocalForwardingThreadPoolExecutor.scala:119) at com.databricks.sql.transaction.tahoe.mst.MSTThreadHelper$.runWithMstTxnId(MSTThreadHelper.scala:57) at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$5(SparkThreadLocalForwardingThreadPoolExecutor.scala:118) at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48) at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$4(SparkThreadLocalForwardingThreadPoolExecutor.scala:117) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51) at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:116) at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:93) at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:162) at 
org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:165) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.lang.Thread.run(Thread.java:840)&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;&lt;A target="_blank"&gt;&amp;lt;command-5056821242561777&amp;gt;&lt;/A&gt;, line 3&lt;/SPAN&gt; &lt;SPAN&gt;1&lt;/SPAN&gt; &lt;SPAN class=""&gt;from&lt;/SPAN&gt; &lt;SPAN class=""&gt;module2&lt;/SPAN&gt; &lt;SPAN class=""&gt;import&lt;/SPAN&gt; DummyDataSource &lt;SPAN&gt;2&lt;/SPAN&gt; spark&lt;SPAN&gt;.&lt;/SPAN&gt;dataSource&lt;SPAN&gt;.&lt;/SPAN&gt;register(DummyDataSource) &lt;SPAN class=""&gt;----&amp;gt; 3&lt;/SPAN&gt; spark&lt;SPAN&gt;.&lt;/SPAN&gt;read&lt;SPAN&gt;.&lt;/SPAN&gt;format(&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;dummy&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;)&lt;SPAN&gt;.&lt;/SPAN&gt;load()&lt;SPAN&gt;.&lt;/SPAN&gt;display()&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python_shell/lib/dbruntime/monkey_patches.py:73&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;apply_dataframe_display_patch.&amp;lt;locals&amp;gt;.df_display&lt;/SPAN&gt;&lt;SPAN class=""&gt;(df, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;69&lt;/SPAN&gt; &lt;SPAN class=""&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;df_display&lt;/SPAN&gt;(df, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs): &lt;SPAN&gt;70&lt;/SPAN&gt; &lt;SPAN&gt;"""&lt;/SPAN&gt; &lt;SPAN&gt;71&lt;/SPAN&gt; &lt;SPAN&gt;df.display() is an alias for display(df). 
Run help(display) for more information.&lt;/SPAN&gt; &lt;SPAN&gt;72&lt;/SPAN&gt; &lt;SPAN&gt;"""&lt;/SPAN&gt; &lt;SPAN class=""&gt;---&amp;gt; 73&lt;/SPAN&gt; display(df, &lt;SPAN&gt;*&lt;/SPAN&gt;args, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs)&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/databricks/python_shell/lib/dbruntime/display.py:133&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;Display.display&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, input, *args, **kwargs)&lt;/SPAN&gt; &lt;SPAN&gt;131&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;display_connect_table(&lt;SPAN&gt;input&lt;/SPAN&gt;, &lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;kwargs) &lt;SPAN&gt;132&lt;/SPAN&gt; &lt;SPAN class=""&gt;elif&lt;/SPAN&gt; &lt;SPAN&gt;isinstance&lt;/SPAN&gt;(&lt;SPAN&gt;input&lt;/SPAN&gt;, ConnectDataFrame): &lt;SPAN class=""&gt;--&amp;gt; 133&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; &lt;SPAN&gt;input&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;isStreaming: &lt;SPAN&gt;134&lt;/SPAN&gt; handleStreamingConnectDataFramePy4j(&lt;SPAN&gt;input&lt;/SPAN&gt;, &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;entry_point, kwargs) &lt;SPAN&gt;135&lt;/SPAN&gt; &lt;SPAN class=""&gt;else&lt;/SPAN&gt;:&lt;/DIV&gt;&lt;DIV class=""&gt;File &lt;SPAN class=""&gt;/usr/lib/python3.12/functools.py:995&lt;/SPAN&gt;, in &lt;SPAN class=""&gt;cached_property.__get__&lt;/SPAN&gt;&lt;SPAN class=""&gt;(self, instance, owner)&lt;/SPAN&gt; &lt;SPAN&gt;993&lt;/SPAN&gt; val &lt;SPAN&gt;=&lt;/SPAN&gt; cache&lt;SPAN&gt;.&lt;/SPAN&gt;get(&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;attrname, _NOT_FOUND) &lt;SPAN&gt;994&lt;/SPAN&gt; &lt;SPAN class=""&gt;if&lt;/SPAN&gt; val &lt;SPAN class=""&gt;is&lt;/SPAN&gt; _NOT_FOUND: &lt;SPAN class=""&gt;--&amp;gt; 995&lt;/SPAN&gt; val &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;func(instance) &lt;SPAN&gt;996&lt;/SPAN&gt; &lt;SPAN class=""&gt;try&lt;/SPAN&gt;: 
&lt;SPAN&gt;997&lt;/SPAN&gt; cache[&lt;SPAN&gt;self&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;attrname] &lt;SPAN&gt;=&lt;/SPAN&gt; val&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;HR /&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 02 Sep 2025 01:05:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130438#M48795</guid>
      <dc:creator>Yousry_Ibrahim</dc:creator>
      <dc:date>2025-09-02T01:05:08Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130518#M48818</link>
      <description>&lt;P&gt;Shared access mode does not offer the same functionality as single-user mode.&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/compute/standard-limitations" target="_self"&gt;https://docs.databricks.com/aws/en/compute/standard-limitations&lt;/A&gt;&lt;BR /&gt;Your issue is not specifically mentioned, but there is a real chance it is caused by the access mode.&lt;BR /&gt;Is there a particular reason why you want to use shared mode?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 13:01:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130518#M48818</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-09-02T13:01:01Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130548#M48824</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182071"&gt;@Yousry_Ibrahim&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think this could be a serialization problem. I recreated your scenario on a cluster with shared access mode and got the same error:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1756843086870.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19585i8B60499A7CEFBB7B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1756843086870.png" alt="szymon_dybczak_0-1756843086870.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;But look what happens if I use an absolute import instead of sys.path.append. Now it works:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_1-1756843211873.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19586i78D17D071CE03F35/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_1-1756843211873.png" alt="szymon_dybczak_1-1756843211873.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
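One way to see why this looks like a serialization problem: Python pickles a class defined in an importable module by reference (module name plus class name), so the process that unpickles it must be able to import that module itself. The sketch below reproduces the ModuleNotFoundError from the stack trace outside Spark, using only the standard-library pickle module. It is an illustration under assumptions, not the Databricks implementation: the temporary directory stands in for the workspace folder, the one-line module2.py is a hypothetical stand-in for the real data source, and removing the directory from sys.path simulates a worker process that never received it. cloudpickle, which appears in the stack trace, is expected to treat classes from importable modules the same way.

```python
import importlib
import os
import pickle
import sys
import tempfile


def demo_pickle_by_reference():
    """Return the error message raised when unpickling without the module path."""
    with tempfile.TemporaryDirectory() as module_dir:
        # Hypothetical stand-in for the workspace folder that holds module2.py.
        with open(os.path.join(module_dir, "module2.py"), "w") as f:
            f.write("class DummyDataSource:\n    pass\n")

        # "Driver" side: the directory is on sys.path, so the import works
        # and the class can be pickled (only its module and name are stored).
        sys.path.append(module_dir)
        importlib.invalidate_caches()
        module2 = importlib.import_module("module2")
        payload = pickle.dumps(module2.DummyDataSource)

        # "Worker" side (simulated): the path was never added there,
        # so unpickling has to import module2 and fails.
        sys.path.remove(module_dir)
        del sys.modules["module2"]
        importlib.invalidate_caches()
        try:
            pickle.loads(payload)
        except ModuleNotFoundError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(demo_pickle_by_reference())
```

Calling demo_pickle_by_reference() returns the message of the ModuleNotFoundError raised on the simulated worker side. If the same mechanism is at play on the cluster, any Python process that deserializes the data source class needs the module directory to be importable, whichever way that is arranged.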
      <pubDate>Tue, 02 Sep 2025 20:15:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130548#M48824</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-02T20:15:15Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130550#M48825</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/14792"&gt;@-werners-&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I am keen to use shared mode because of cluster templates imposed by the DevOps team. I don't have full control over the clusters or the access mode I can use.&lt;BR /&gt;There is a solution proposed by&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;and I will see how it works in my case.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 20:17:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130550#M48825</guid>
      <dc:creator>Yousry_Ibrahim</dc:creator>
      <dc:date>2025-09-02T20:17:28Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130551#M48826</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;, it is a great idea.&lt;BR /&gt;I will give it a go, but in my case the notebook and the modules live in completely different hierarchies, so there will be some sort of relative path handling.&lt;BR /&gt;The&amp;nbsp;&lt;STRONG&gt;sys.path.append&amp;nbsp;&lt;/STRONG&gt;is declared by Databricks to be reflected on the executors as well. It works fine with UDFs but not for custom data sources at least on dedicated mode.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 20:20:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130551#M48826</guid>
      <dc:creator>Yousry_Ibrahim</dc:creator>
      <dc:date>2025-09-02T20:20:55Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130552#M48827</link>
      <description>&lt;P&gt;No problem&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182071"&gt;@Yousry_Ibrahim&lt;/a&gt;&amp;nbsp;. But I agree with you and this is something that I also wonder about. As you wrote, this path should also be distributed to the workers. However, as we can see, this is not happening.&lt;BR /&gt;In the documentation, there is no mention anywhere that in shared access mode the path will not be added to the executors.&lt;BR /&gt;So in theory what you're trying to do should work.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 20:25:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130552#M48827</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-02T20:25:31Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130562#M48834</link>
      <description>&lt;P class=""&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182071"&gt;@Yousry_Ibrahim&lt;/a&gt; This is a known limitation with PySpark custom data sources in shared access mode. The issue is that custom data sources serialize differently from UDFs.&lt;/P&gt;&lt;H2&gt;Root Cause&lt;/H2&gt;&lt;P class=""&gt;Custom data sources use cloudpickle serialization, which doesn't properly capture the sys.path modifications in shared clusters. UDFs work because they use a different serialization path.&lt;/P&gt;&lt;H2&gt;Solution 1: Add Module to Spark Files&lt;/H2&gt;&lt;P&gt;# Add the module file explicitly to the Spark context&lt;BR /&gt;spark.sparkContext.addPyFile("/Workspace/path/to/module2.py")&lt;/P&gt;&lt;P&gt;# Now import and register&lt;BR /&gt;from module2 import DummyDataSource&lt;BR /&gt;spark.dataSource.register(DummyDataSource)&lt;BR /&gt;spark.read.format("dummy"). load(). display()&lt;/P&gt;&lt;H2&gt;Solution 2: Package as an Init-Script&lt;/H2&gt;&lt;P class=""&gt;Create an init script that adds your modules to the Python path on all nodes:&lt;BR /&gt;#!/bin/bash&lt;BR /&gt;# cluster-init.sh&lt;BR /&gt;echo "export PYTHONPATH=/Workspace/your_modules:$PYTHONPATH" &amp;gt;&amp;gt; /databricks/spark/conf/spark-env.sh&lt;/P&gt;&lt;H2&gt;Solution 3: Inline the Data Source (Temporary Fix)&lt;/H2&gt;&lt;P class=""&gt;For testing, define the data source directly in the notebook:&lt;BR /&gt;# Define the classes in the notebook itself&lt;BR /&gt;exec(open('/Workspace/path/to/module2.py').read())&lt;BR /&gt;spark.dataSource.register(DummyDataSource)&lt;/P&gt;&lt;H2&gt;Why This Happens&lt;/H2&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;STRONG&gt;UDFs&lt;/STRONG&gt;: Executed in Python worker processes that inherit sys.path&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Data Sources&lt;/STRONG&gt;: Serialized at the driver and deserialized at executors without sys.path context&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Shared Mode&lt;/STRONG&gt;: 
Additional isolation prevents path propagation&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;The&amp;nbsp;addPyFile approach is the most reliable for shared clusters. It ensures the module is distributed to all executors before deserialization.&lt;/P&gt;&lt;P class=""&gt;Have you tried Solution 1? It should work immediately without a cluster restart.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 21:04:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130562#M48834</guid>
      <dc:creator>ck7007</dc:creator>
      <dc:date>2025-09-02T21:04:55Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130564#M48835</link>
      <description>&lt;P&gt;If this is a well known limitation then share some links with us. For LLMs every problem is well known I guess &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 21:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130564#M48835</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-02T21:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130569#M48837</link>
      <description>&lt;P&gt;Hi all,&lt;BR /&gt;Thanks for the feedback and proposed ideas.&lt;BR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp; Your idea of relative imports works when the module is hosted in a child directory of the currently running notebook. It does not work if we need to go up one or two directories and navigate from there.&lt;BR /&gt;&lt;BR /&gt;The error in that case is "&lt;SPAN class=""&gt;ImportError: &lt;/SPAN&gt;&lt;SPAN&gt;attempted relative import with no known parent package".&amp;nbsp;&lt;BR /&gt;I will accept the solution proposed by&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/180185"&gt;@ck7007&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Practically speaking, I have to stick with things like "%run notebook" due to some other limitations.&lt;BR /&gt;Regarding the proposed options:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN&gt;"spark.sparkContext.addPyFile" is also not supported on shared clusters&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;The CI templates I am bound to use do not allow tweaking init scripts easily&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;I tried the "exec" method before and it works, but it feels a bit hacky and will get complicated if we have modules depending on other modules and so on. It would quickly become tricky.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;SPAN&gt;If I had better control, I would have just used dedicated mode and off you go.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks all.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Sep 2025 00:07:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/130569#M48837</guid>
      <dc:creator>Yousry_Ibrahim</dc:creator>
      <dc:date>2025-09-03T00:07:45Z</dc:date>
    </item>
    <item>
      <title>Re: Directories added to the Python sys.path do not always work fine on executors for shared access</title>
      <link>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/151317#M53625</link>
      <description>&lt;P&gt;I've confirmed on Serverless version 5 that a pandas_udf that calls assert "mypath" in sys.path raises an AssertionError immediately after I've added mypath on the driver. So, the workers are NOT getting updated (at least not immediately) by the sys.path.append(mypath) execution.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 18:18:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/directories-added-to-the-python-sys-path-do-not-always-work-fine/m-p/151317#M53625</guid>
      <dc:creator>lprevost</dc:creator>
      <dc:date>2026-03-18T18:18:53Z</dc:date>
    </item>
  </channel>
</rss>

