Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have the Databricks VS code extension setup to develop and run jobs remotely. (with Databricks Connect).I enjoy working on notebooks within the native Databricks workspace, especially for exploratory work because I can execute blocks of code step b...
hi @Debayan in the https://learn.microsoft.com/en-us/azure/architecture/databricks-monitoring/application-logs. there is a github repository mentioned https://github.com/mspnp/spark-monitoring ? That repository is marked as maintainance mode. Just...
Make sure to watch the following video https://www.youtube.com/watch?v=DkzwFTC7WWsThis section lists the requirements for Databricks Connect.Only Databricks Runtime 13.0 ML and Databricks Runtime 13.0 are supported.Only clusters that are compatible w...
Connecting several Databricks tables to a Plotly Dash application. Can't seem to find much documentation on the differences between SQL Connector and Connect. Currently using the Connect approach to read tables into pyspark dataframes, is one better ...
Hirun below code line at spark-shell through db-connectIt throw exception:java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance o...
I have a Databricks cluster configured with an instance profile to assume role when accessing an AWS S3 bucket.Accessing the bucket from the notebook using the cluster works properly (the instance profile can assume role to access the bucket).However...
Hello, @lsoewito - My name is Piper, and I'm a moderator for the Databricks community. Welcome and thank you for coming to us with your question. I'm sorry to hear that you're having trouble. Let's give your peers a chance to answer your question. W...
Has anyone successfully used Petastorm + Databricks-Connect + Delta Lake?The use case is being able to use DeltaLake as a data store regardless of whether I want to use the databricks workspace or not for my training tasks.I'm using a cloud-hosted ju...
because its janky or why? I don't need it for customer facing production. More so for if I'm using my own HPC or local workstation, but I want to access data from delta lake. Figured it was easier/preferable to setting up my own spark environment loc...
Hi all,I currently have a cluster configured in databricks with spark-xml (version com.databricks:spark-xml_2.12:0.13.0) which was installed using Maven. The spark-xml library itself works fine with Pyspark when I am using it in a notebook within th...
@Sean Owen I do not believe I have. Do you have any documentation on how to install spark-xml locally? I have tried the following with no luck. IS this what you are referring to?PYSPARK_HOME/bin/pyspark --packages com.databricks:spark-xml_2.12:0.13....
I'd like to edit Databricks notebooks locally using my favorite editor, and then use Databricks Connect to run the notebook remotely on a Databricks cluster that I usually access via the web interface.I run "databricks-connect configure" , as suggest...