Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi ,I have a Databricks job that results in a dashboard post run , I'm able to download the dashboard as HTML from the view job runs page , but I want to automate the process , so I tried using the Databricks API , but it says {"error_code":"INVALID_...
Hi All,I set up a DLT pipeline to create 58 bronze tables and a subsequent DLT live table that joins the 58 bronze tables created in the first step. The pipeline runs successfully most times.My issue is that the pipeline fails once every 3/4 runs say...
@jose_gonzalez @Retired_mod - Missed to update the group on the fix. Reached out to Databricks to understand and it was identified that the threads call that i was making was causing the issue. After i removed it - i don't see it happening.
With no change in code, i've noticed that my DLT initialization fails and then an automatic rerun succeeds. Can someone help me understand this behavior. Thank you.
@jose_gonzalez - Missed to update the group on the fix. Reached out to Databricks to understand and it was identified that the threads call that i was making was causing the issue. After i removed it - i don't see it happening.
I have a scheduled job (running in continuous mode) with the following code``` ( spark .readStream .option("checkpointLocation", databricks_checkpoint_location) .option("readChangeFeed", "true") .option("startingVersion", VERSION + 1)...
Hi @Kit Yam Tse Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
Background:I am attempting to download the google cloud sdk on Databricks. The end goal is to be able to use the sdk to transfer files from a Google Cloud Bucket to Azure Blob Storage using Databricks. (If you have any other ideas for this transfer p...
Thanks you for the response!2 Questions:1. How would you create a cluster with the custom requirements for the google cloud sdk? Is that still possible for a Unity Catalog enabled cluster with Shared Access Mode?2. Is a script action the same as a cl...
Hi Team,I am working on migration from Sql server to databricks environment.I encounter a challenge where Databricks and sql server giving different results for date difference function. Can you please help?--SQL SERVERSELECT DATEDIFF(MONTH , '2007-0...
While I was pretty sure it has to do with T-SQL not following ANSI standards, I could not actually tell you what exactly the difference is. So I asked chatgpt and here we go:The difference between DATEDIFF(month, date1, date2) in T-SQL and ANSI SQL ...
Performance Issues experienced when cluster was upgraded from 10.4 LTS to 11.3 LTS , The notebooks were running fine with the existing cluster. Soon after the upgrade the notebooks started to fail or exhaust memory executors etc . Any suggestions?
Hi all,I started using Azure Spot VMs by switching on the spot option when creating a cluster, however in the Azure billing dashboard, after some months of using spot instances, I only have OnDemand PurchaseType. Does someone guess what could be happ...
staticDataFrame = spark.read.format("csv")\ .option("header", "true").option("inferSchema", "true").load("/FileStore/tables/Consumption_2019/*.csv")
when above, I need an option to skip say first 4 lines on each CSV file, How do I do that?
The option... .option("skipRows", <number of rows to skip>) ...works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...
Hi,we are exploring the use of Databricks Statement Execution API for sharing the data through API to different consumer applications, however we have a security requirement to configure TLS Mutual Authentication to limit the consumer application t...
Hello,I am trying to create a permanent UDF from a Python file with dependencies that are not part of the standard Python library.How do I make use of CREATE FUNCTION (External) [1] to create a permanent function in Databricks, using a Python file th...
I have an external location setup "auth_kafka" which is mapped to an abfss url:abfss://{container}@{account}.dfs.core.windows.net/auth/kafkaand, critically, is marked as readonly.Using dbutils.fs I can successfully read the files (i.e. the ls and hea...
I have a single user cluster and I have created a workflow which will read excel file from Azure storage account. For reading excel file I am using com.crealytics:spark-excel_2.13:3.4.1_0.19.0 library on single user all-purpose cluster.I have alread...
Hi @Retired_mod Can you ellaborate few more things:1. When spark-shell installs any maven package, what is the default location where it downloads the jar file ?2. As far as I know default location for jars is "/databricks/jars/" from where spark pic...
Only permissions I can see are select and this gives access to data and that is very unwanted. I only want users to see the metadata, like table/view/column names and descriptions/comments and location and such but not to see any data.
@Uma Maheswara Rao Desula , @Geeta Sai Boddu and @S S ,Thank you for the responses. I have gotten answer from Databricks and it seems this is not possible and this is something that is investigated as a capability.
I have facing a error when I am trying to read data from any MongoDB collection using MongoDB Spark Connector v10.x on Databricks v13.x.The below error appear to start at line #113 of MongoDB Spark Connector Library (v10.2.0): java.lang.NoSuchMethod...