<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Does Databricks support XLA compilation for TensorFlow models? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34185#M24955</link>
    <description>&lt;P&gt;I don't think this is specific to Databricks, but rather to TensorFlow. See &lt;A href="https://stackoverflow.com/questions/68614547/tensorflow-libdevice-not-found-why-is-it-not-found-in-the-searched-path" target="_blank"&gt;https://stackoverflow.com/questions/68614547/tensorflow-libdevice-not-found-why-is-it-not-found-in-the-searched-path&lt;/A&gt; for a possibly relevant solution.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't see evidence that this is related to libcupti.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Dec 2021 02:20:00 GMT</pubDate>
    <dc:creator>sean_owen</dc:creator>
    <dc:date>2021-12-17T02:20:00Z</dc:date>
    <item>
      <title>Does Databricks support XLA compilation for TensorFlow models?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34181#M24951</link>
      <description>&lt;P&gt;I am defining a sequential Keras model using tensorflow.keras&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Runtime&lt;/B&gt;: Databricks ML 8.3&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Cluster&lt;/B&gt;: Standard NC24 with 4 GPUs per node.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;To enable XLA compilation, I set the following flag:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;tf.config.optimizer.set_jit(True)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Here is the output when I try to train the model:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;lt;command-4238178162238395&amp;gt; in train_distributed_tf(train_count, val_count, params)&lt;/P&gt;&lt;P&gt;     18                 metrics=['mean_absolute_error', 'mean_absolute_percentage_error'])&lt;/P&gt;&lt;P&gt;     19 &lt;/P&gt;&lt;P&gt;---&amp;gt; 20   history = model.fit(&lt;/P&gt;&lt;P&gt;     21     distributed_train,&lt;/P&gt;&lt;P&gt;     22     epochs=EPOCHS,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/utils/autologging_utils/safety.py in safe_patch_function(*args, **kwargs)&lt;/P&gt;&lt;P&gt;    485 &lt;/P&gt;&lt;P&gt;    486                     if patch_is_class:&lt;/P&gt;&lt;P&gt;--&amp;gt; 487                         patch_function.call(call_original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    488                     else:&lt;/P&gt;&lt;P&gt;    489                         patch_function(call_original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/utils/autologging_utils/safety.py in call(cls, original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    151     @classmethod&lt;/P&gt;&lt;P&gt;    152     def call(cls, original, *args, **kwargs):&lt;/P&gt;&lt;P&gt;--&amp;gt; 153         return cls().__call__(original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    154 &lt;/P&gt;&lt;P&gt;    155     def __call__(self, original, *args, 
**kwargs):&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/utils/autologging_utils/safety.py in __call__(self, original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    162                 # Regardless of what happens during the `_on_exception` callback, reraise&lt;/P&gt;&lt;P&gt;    163                 # the original implementation exception once the callback completes&lt;/P&gt;&lt;P&gt;--&amp;gt; 164                 raise e&lt;/P&gt;&lt;P&gt;    165 &lt;/P&gt;&lt;P&gt;    166 &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/utils/autologging_utils/safety.py in __call__(self, original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    155     def __call__(self, original, *args, **kwargs):&lt;/P&gt;&lt;P&gt;    156         try:&lt;/P&gt;&lt;P&gt;--&amp;gt; 157             return self._patch_implementation(original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    158         except (Exception, KeyboardInterrupt) as e:&lt;/P&gt;&lt;P&gt;    159             try:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/utils/autologging_utils/safety.py in _patch_implementation(self, original, *args, **kwargs)&lt;/P&gt;&lt;P&gt;    214                     self.managed_run = try_mlflow_log(create_managed_run)&lt;/P&gt;&lt;P&gt;    215 &lt;/P&gt;&lt;P&gt;--&amp;gt; 216                 result = super(PatchWithManagedRun, self)._patch_implementation(&lt;/P&gt;&lt;P&gt;    217                     original, *args, **kwargs&lt;/P&gt;&lt;P&gt;    218                 )&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/tensorflow.py in _patch_implementation(self, original, inst, *args, **kwargs)&lt;/P&gt;&lt;P&gt;   1086                 _log_early_stop_callback_params(early_stop_callback)&lt;/P&gt;&lt;P&gt;   1087 &lt;/P&gt;&lt;P&gt;-&amp;gt; 1088                 history = original(inst, *args, **kwargs)&lt;/P&gt;&lt;P&gt;   1089 &lt;/P&gt;&lt;P&gt;  
 1090                 _log_early_stop_callback_metrics(early_stop_callback, history, metrics_logger)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/mlflow/utils/autologging_utils/safety.py in call_original(*og_args, **og_kwargs)&lt;/P&gt;&lt;P&gt;    443                                 disable_warnings=False, reroute_warnings=False,&lt;/P&gt;&lt;P&gt;    444                             ):&lt;/P&gt;&lt;P&gt;--&amp;gt; 445                                 original_result = original(*og_args, **og_kwargs)&lt;/P&gt;&lt;P&gt;    446 &lt;/P&gt;&lt;P&gt;    447                             try_log_autologging_event(&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)&lt;/P&gt;&lt;P&gt;   1098                 _r=1):&lt;/P&gt;&lt;P&gt;   1099               callbacks.on_train_batch_begin(step)&lt;/P&gt;&lt;P&gt;-&amp;gt; 1100               tmp_logs = self.train_function(iterator)&lt;/P&gt;&lt;P&gt;   1101               if data_handler.should_sync:&lt;/P&gt;&lt;P&gt;   1102                 context.async_wait()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)&lt;/P&gt;&lt;P&gt;    826     tracing_count = self.experimental_get_tracing_count()&lt;/P&gt;&lt;P&gt;    827     with trace.Trace(self._name) as tm:&lt;/P&gt;&lt;P&gt;--&amp;gt; 828       result = self._call(*args, **kwds)&lt;/P&gt;&lt;P&gt;    829       compiler = "xla" if self._experimental_compile else "nonXla"&lt;/P&gt;&lt;P&gt;    830       new_tracing_count = 
self.experimental_get_tracing_count()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)&lt;/P&gt;&lt;P&gt;    886         # Lifting succeeded, so variables are initialized and we can run the&lt;/P&gt;&lt;P&gt;    887         # stateless function.&lt;/P&gt;&lt;P&gt;--&amp;gt; 888         return self._stateless_fn(*args, **kwds)&lt;/P&gt;&lt;P&gt;    889     else:&lt;/P&gt;&lt;P&gt;    890       _, _, _, filtered_flat_args = \&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)&lt;/P&gt;&lt;P&gt;   2940       (graph_function,&lt;/P&gt;&lt;P&gt;   2941        filtered_flat_args) = self._maybe_define_function(args, kwargs)&lt;/P&gt;&lt;P&gt;-&amp;gt; 2942     return graph_function._call_flat(&lt;/P&gt;&lt;P&gt;   2943         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access&lt;/P&gt;&lt;P&gt;   2944 &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)&lt;/P&gt;&lt;P&gt;   1916         and executing_eagerly):&lt;/P&gt;&lt;P&gt;   1917       # No tape is watching; skip to running the function.&lt;/P&gt;&lt;P&gt;-&amp;gt; 1918       return self._build_call_outputs(self._inference_function.call(&lt;/P&gt;&lt;P&gt;   1919           ctx, args, cancellation_manager=cancellation_manager))&lt;/P&gt;&lt;P&gt;   1920     forward_backward = self._select_forward_and_backward_functions(&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)&lt;/P&gt;&lt;P&gt;    553       with _InterpolateFunctionError(self):&lt;/P&gt;&lt;P&gt;    554         if cancellation_manager is 
None:&lt;/P&gt;&lt;P&gt;--&amp;gt; 555           outputs = execute.execute(&lt;/P&gt;&lt;P&gt;    556               str(self.signature.name),&lt;/P&gt;&lt;P&gt;    557               num_outputs=self._num_outputs,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/databricks/python/lib/python3.8/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)&lt;/P&gt;&lt;P&gt;     57   try:&lt;/P&gt;&lt;P&gt;     58     ctx.ensure_initialized()&lt;/P&gt;&lt;P&gt;---&amp;gt; 59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,&lt;/P&gt;&lt;P&gt;     60                                         inputs, attrs, num_outputs)&lt;/P&gt;&lt;P&gt;     61   except core._NotOkStatusException as e:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;InternalError: 5 root error(s) found.&lt;/P&gt;&lt;P&gt;  (0) Internal:  libdevice not found at ./libdevice.10.bc&lt;/P&gt;&lt;P&gt;	 [[{{node cluster_3_1/xla_compile}}]]&lt;/P&gt;&lt;P&gt;	 [[div_no_nan_33/ReadVariableOp_3/_318]]&lt;/P&gt;&lt;P&gt;  (1) Internal:  libdevice not found at ./libdevice.10.bc&lt;/P&gt;&lt;P&gt;	 [[{{node cluster_3_1/xla_compile}}]]&lt;/P&gt;&lt;P&gt;  (2) Internal:  libdevice not found at ./libdevice.10.bc&lt;/P&gt;&lt;P&gt;	 [[{{node cluster_3_1/xla_compile}}]]&lt;/P&gt;&lt;P&gt;	 [[div_no_nan/_825]]&lt;/P&gt;&lt;P&gt;  (3) Internal:  libdevice not found at ./libdevice.10.bc&lt;/P&gt;&lt;P&gt;	 [[{{node cluster_3_1/xla_compile}}]]&lt;/P&gt;&lt;P&gt;	 [[div_no_nan_26/AddN/_272]]&lt;/P&gt;&lt;P&gt;  (4) Internal:  libdevice not found at ./libdevice.10.bc&lt;/P&gt;&lt;P&gt;	 [[{{node cluster_3_1/xla_compile}}]]&lt;/P&gt;&lt;P&gt;	 [[div_no_nan/_821]]&lt;/P&gt;&lt;P&gt;0 successful operations.&lt;/P&gt;&lt;P&gt;0 derived errors ignored. 
[Op:__inference_train_function_2599244]&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Function call stack:&lt;/P&gt;&lt;P&gt;train_function -&amp;gt; train_function -&amp;gt; train_function -&amp;gt; train_function -&amp;gt; train_function&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Nov 2021 18:36:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34181#M24951</guid>
      <dc:creator>ray21</dc:creator>
      <dc:date>2021-11-30T18:36:59Z</dc:date>
    </item>
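The `libdevice not found at ./libdevice.10.bc` errors in the traceback above typically mean XLA cannot locate the CUDA toolkit. A commonly suggested workaround (the fix discussed in the Stack Overflow thread cited later in this topic) is to point XLA at the CUDA root before TensorFlow initializes. A minimal sketch; the `/usr/local/cuda` path is an assumption and may differ on a Databricks node:

```python
import os

# XLA resolves libdevice.10.bc relative to its CUDA "data dir".
# Set this BEFORE TensorFlow initializes. /usr/local/cuda is an assumed
# path -- verify that <path>/nvvm/libdevice/libdevice.10.bc exists on
# the cluster before relying on it.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/local/cuda"

# Afterwards, enable XLA JIT compilation as in the post above:
# import tensorflow as tf
# tf.config.optimizer.set_jit(True)

print(os.environ["XLA_FLAGS"])
```

On Databricks, an environment variable like this would usually be set via the cluster's Spark environment config or an init script so it is in place before the Python process imports TensorFlow.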
    <item>
      <title>Re: Does Databricks support XLA compilation for TensorFlow models?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34183#M24953</link>
      <description>&lt;P&gt;Hi @Revanth Pentyala​&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please try with a DBR 7.3 ML cluster?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It seems the cupti library was deprecated starting from DBR 7.6.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/release-notes/runtime/7.6ml.html#deprecations" target="_blank"&gt;https://docs.databricks.com/release-notes/runtime/7.6ml.html#deprecations&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It seems the cupti version (9) that comes with Ubuntu is not compatible with CUDA (11). The workaround would be to install a compatible cupti package (11) through an init script.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, for now you can try DBR 7.3 ML to see if it works there.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Mathan&lt;/P&gt;</description>
      <pubDate>Thu, 09 Dec 2021 00:30:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34183#M24953</guid>
      <dc:creator>mathan_pillai</dc:creator>
      <dc:date>2021-12-09T00:30:13Z</dc:date>
    </item>
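Before installing anything through an init script as suggested above, it can help to confirm which CUDA libraries the node actually has. A small diagnostic sketch; the search roots are assumed common Ubuntu install locations and may differ on a Databricks runtime:

```python
import glob
import os

def find_cuda_libs(roots):
    """Recursively search the given directories for libdevice and
    libcupti files, the two libraries at issue in this thread."""
    hits = []
    for root in roots:
        hits += glob.glob(os.path.join(root, "**", "libdevice*.bc"), recursive=True)
        hits += glob.glob(os.path.join(root, "**", "libcupti*"), recursive=True)
    return sorted(hits)

if __name__ == "__main__":
    # Assumed common install locations on Ubuntu-based images.
    print(find_cuda_libs(["/usr/local/cuda", "/usr/lib/x86_64-linux-gnu"]))
```

Running this in a notebook cell (or as part of the init script, with output redirected to a log) shows whether a CUDA-11-compatible libcupti is present before and after the workaround.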
    <item>
      <title>Re: Does Databricks support XLA compilation for TensorFlow models?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34184#M24954</link>
      <description>&lt;P&gt;Hi @Revanth Pentyala​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did Mathan's response help you solve your question/issue? If it did, please mark it as "best" so it can be moved to the top and help others.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Dec 2021 23:23:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34184#M24954</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-12-10T23:23:05Z</dc:date>
    </item>
    <item>
      <title>Re: Does Databricks support XLA compilation for TensorFlow models?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34185#M24955</link>
      <description>&lt;P&gt;I don't think this is specific to Databricks, but rather to TensorFlow. See &lt;A href="https://stackoverflow.com/questions/68614547/tensorflow-libdevice-not-found-why-is-it-not-found-in-the-searched-path" target="_blank"&gt;https://stackoverflow.com/questions/68614547/tensorflow-libdevice-not-found-why-is-it-not-found-in-the-searched-path&lt;/A&gt; for a possibly relevant solution.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't see evidence that this is related to libcupti.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Dec 2021 02:20:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-databricks-support-xla-compilation-for-tensorflow-models/m-p/34185#M24955</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2021-12-17T02:20:00Z</dc:date>
    </item>
  </channel>
</rss>

