Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I'm trying to create a task where the source is a Python script located in a remote GitLab repo. I'm following the instructions HERE, and this is how I have the task set up. However, no matter what path I specify, all I get is the error below: Cannot read ...
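Without the full error it's hard to say, but a common pitfall is the script path: for Git-sourced tasks it should be relative to the repo root. A hedged sketch of what a Jobs API 2.1 payload for this kind of task can look like (the repo URL, branch, file path, and cluster ID below are placeholders, not values from the post):
```
# Sketch of Jobs API 2.1 settings for running a Python script from a
# remote GitLab repo; every identifier below is a placeholder.
job_settings = {
    "name": "gitlab-script-job",
    "git_source": {
        "git_url": "https://gitlab.com/my-group/my-repo",  # placeholder
        "git_provider": "gitLab",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "run_script",
            "spark_python_task": {
                "python_file": "jobs/my_script.py",  # relative to repo root
                "source": "GIT",
            },
            "existing_cluster_id": "1234-567890-abcdef12",  # placeholder
        }
    ],
}
```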
We have recently been upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to Log Analytics using the repository available at https://github.c...
Hi, I have a Databricks job that produces a dashboard after each run. I'm able to download the dashboard as HTML from the view job runs page, but I want to automate the process, so I tried using the Databricks API. However, it says {"error_code":"INVALID_...
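For reference, the Jobs API endpoint that returns a run's views (including dashboards) as HTML is GET /api/2.0/jobs/runs/export, which applies to notebook task runs. A minimal sketch, where the host, token, and run ID are placeholders:
```
import requests

# Sketch: export the dashboard views of a notebook task run as HTML files.
# Host, token, and run_id are placeholders, not values from the post.
host = "https://<workspace-host>"
resp = requests.get(
    f"{host}/api/2.0/jobs/runs/export",
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"run_id": 12345, "views_to_export": "DASHBOARDS"},
)
resp.raise_for_status()
for view in resp.json().get("views", []):
    with open(f"{view['name']}.html", "w") as f:
        f.write(view["content"])
```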
Hi All, I set up a DLT pipeline to create 58 bronze tables and a subsequent DLT live table that joins the 58 bronze tables created in the first step. The pipeline runs successfully most of the time. My issue is that the pipeline fails once every three or four runs, say...
@jose_gonzalez @Retired_mod - I missed updating the group on the fix. I reached out to Databricks, and it was identified that the threading call I was making was causing the issue. After I removed it, I don't see it happening.
With no change in code, I've noticed that my DLT initialization fails and then an automatic rerun succeeds. Can someone help me understand this behavior? Thank you.
@jose_gonzalez - I missed updating the group on the fix. I reached out to Databricks, and it was identified that the threading call I was making was causing the issue. After I removed it, I don't see it happening.
I have a scheduled job (running in continuous mode) with the following code:
```
(
    spark
    .readStream
    .option("checkpointLocation", databricks_checkpoint_location)
    .option("readChangeFeed", "true")
    .option("startingVersion", VERSION + 1)
    ...
```
Background: I am attempting to download the Google Cloud SDK on Databricks. The end goal is to be able to use the SDK to transfer files from a Google Cloud bucket to Azure Blob Storage using Databricks. (If you have any other ideas for this transfer p...
Thank you for the response! Two questions: 1. How would you create a cluster with the custom requirements for the Google Cloud SDK? Is that still possible for a Unity Catalog enabled cluster with Shared Access Mode? 2. Is a script action the same as a cl...
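On Databricks, the usual mechanism for installing system software on every node is a cluster-scoped init script. A sketch of staging one that installs the Google Cloud SDK (the DBFS path and install flags are illustrative assumptions, not from the thread):
```
# Sketch: stage a cluster-scoped init script that installs the Google
# Cloud SDK on each node; the path and flags below are assumptions.
dbutils.fs.put(
    "dbfs:/init-scripts/install-gcloud.sh",
    """#!/bin/bash
# Google's official installer, run non-interactively
curl -sSL https://sdk.cloud.google.com | bash -s -- --disable-prompts --install-dir=/opt
""",
    overwrite=True,
)
```
The script would then be referenced under the cluster's Advanced options > Init scripts. Note that clusters in Shared access mode restrict init scripts (they may need to be allowlisted by an admin), so this is worth verifying against your Unity Catalog setup.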
I have been searching for a way to set up backfilling with Auto Loader using an option like a "start_date" or "end_date". I am working on ingesting a massive file system, but I don't want to ingest everything from the beginning. I have a start date t...
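One option worth trying is the modifiedAfter/modifiedBefore file-source options, which bound ingestion by file modification time. A sketch, where the format, path, and timestamps are illustrative assumptions:
```
# Sketch: Auto Loader stream bounded by file modification time.
# The format, path, and timestamps below are illustrative assumptions.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("modifiedAfter", "2023-01-01T00:00:00")   # acts like a start_date
    .option("modifiedBefore", "2023-06-30T23:59:59")  # acts like an end_date
    .load("/mnt/raw/source-files/")
)
```
Note that this filters on storage modification timestamps, not on any dates embedded in directory names.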
Hi Team, I am working on a migration from SQL Server to the Databricks environment. I have run into a case where Databricks and SQL Server return different results for the date-difference function. Can you please help?
--SQL SERVER
SELECT DATEDIFF(MONTH , '2007-0...
While I was pretty sure it has to do with T-SQL not following ANSI standards, I could not actually tell you what exactly the difference is. So I asked ChatGPT, and here we go: the difference between DATEDIFF(month, date1, date2) in T-SQL and ANSI SQL ...
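The behavioral difference is easy to demonstrate in Spark with months_between, which measures elapsed (fractional) months rather than counting calendar-month boundaries the way T-SQL's DATEDIFF does. A sketch, with sample dates chosen to sit on either side of a month boundary:
```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("2007-05-31", "2007-06-01")], ["start", "end"])

# Spark's months_between returns elapsed time in fractional months, so one
# day across a month boundary is ~0.03 months. T-SQL's
# DATEDIFF(MONTH, '2007-05-31', '2007-06-01') returns 1, because it counts
# month boundaries crossed, not elapsed time.
df.select(
    F.months_between(F.to_date("end"), F.to_date("start")).alias("frac_months")
).show()
```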
We experienced performance issues when our cluster was upgraded from 10.4 LTS to 11.3 LTS. The notebooks were running fine on the existing cluster; soon after the upgrade, they started to fail, exhaust executor memory, etc. Any suggestions?
Hi all, I started using Azure Spot VMs by switching on the spot option when creating a cluster. However, after some months of using spot instances, the Azure billing dashboard only shows the OnDemand PurchaseType. Can anyone guess what could be happ...
```
staticDataFrame = spark.read.format("csv")\
    .option("header", "true")\
    .option("inferSchema", "true")\
    .load("/FileStore/tables/Consumption_2019/*.csv")
```
Given the above, I need an option to skip, say, the first 4 lines of each CSV file. How do I do that?
The option .option("skipRows", <number of rows to skip>) works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...
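A sketch of how it is typically used; skipRows appears to be a Databricks-runtime CSV option rather than an open-source Spark one, which would explain its absence from that page:
```
# Sketch: skip the first 4 physical lines of each CSV file before the
# header row is parsed; skipRows appears to be Databricks-specific.
df = (
    spark.read.format("csv")
    .option("skipRows", 4)
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/FileStore/tables/Consumption_2019/*.csv")
)
```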
Hello, great people. I am new to Databricks and PySpark. How can I create a new column called "sub_total" where I group by "category", "subcategory", and monthly sales value? I'd appreciate your help.
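A sketch under assumed names (df, category, subcategory, month, and sales are illustrative, not from the post): a window sum adds the subtotal as a new column without collapsing the rows.
```
from pyspark.sql import Window
from pyspark.sql import functions as F

# Sketch: add a "sub_total" column holding the sales sum per
# category/subcategory/month group; the column names are assumptions.
w = Window.partitionBy("category", "subcategory", "month")
df = df.withColumn("sub_total", F.sum("sales").over(w))
```
If one row per group is wanted instead, df.groupBy("category", "subcategory", "month").agg(F.sum("sales").alias("sub_total")) does the aggregation directly.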
How do I connect to Azure Analysis Services from Databricks? I need to process the tabular model from Databricks. I tried to use adodbapi, but the connection fails with the error message "windows com error dispatch adodb.connection". Please help.
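For context, adodbapi drives Windows COM objects, which don't exist on Databricks' Linux nodes, hence the dispatch error. One COM-free route to process (refresh) a tabular model is the Azure Analysis Services asynchronous refresh REST API. A hedged sketch, where the region, server, model, and token acquisition are all placeholders:
```
import requests

# Sketch: trigger a Full refresh of a tabular model via the AAS REST API.
# Region, server, model, and the AAD token below are all placeholders; the
# token must be issued for the https://*.asazure.windows.net resource.
region = "westus"
server = "my-aas-server"
model = "MyTabularModel"
token = "<azure-ad-access-token>"

resp = requests.post(
    f"https://{region}.asazure.windows.net/servers/{server}/models/{model}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
    json={"Type": "Full", "MaxParallelism": 2},
)
resp.raise_for_status()
print(resp.headers.get("Location"))  # URL to poll for refresh status
```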
I now have another use case: to run DAX against an Azure Analysis Services model from AWS Databricks. I tried the suggestion above from Jun Yang, and it errors out after 30 seconds with the exception "Login timeout is expired".
Hi, we are exploring the use of the Databricks Statement Execution API for sharing data through an API with different consumer applications. However, we have a security requirement to configure TLS mutual authentication to limit the consumer application t...
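For reference, the base call the post refers to looks like the sketch below (host, token, and warehouse ID are placeholders). The mutual-TLS constraint is not a parameter of this API itself, so teams commonly look at fronting the endpoint with a gateway or proxy that terminates mTLS.
```
import requests

# Sketch: a minimal Statement Execution API request; the host, token,
# and warehouse_id below are placeholders, not values from the post.
host = "https://<workspace-host>"
resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "SELECT 1 AS ok",
        "wait_timeout": "30s",
    },
)
print(resp.json().get("status"))
```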