Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ChingizK
by New Contributor III
  • 1673 Views
  • 0 replies
  • 0 kudos

Use Python code from a remote Git repository

I'm trying to create a task where the source is a Python script located in a remote GitLab repo. I'm following the instructions HERE and this is how I have the task set up: However, no matter what path I specify, all I get is the error below: Cannot read ...

Ravikumashi
by Contributor
  • 1822 Views
  • 3 replies
  • 0 kudos

Resolved! Issue with Logging Spark Events to LogAnalytics after Upgrading to Databricks 11.3 LTS

We have recently been in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to LogAnalytics using the repository available at https://github.c...

Latest Reply
swethaNandan
Contributor
  • 0 kudos

Hi Ravikumashi, can you please raise a ticket with us so that we can look deeper into the issue?

2 More Replies
Skr7
by New Contributor II
  • 2397 Views
  • 0 replies
  • 0 kudos

Scheduled job output export

Hi, I have a Databricks job that produces a dashboard after each run. I'm able to download the dashboard as HTML from the job runs page, but I want to automate the process, so I tried using the Databricks API, but it says {"error_code":"INVALID_...

Data Engineering
data engineering
Manjula_Ganesap
by Contributor
  • 3829 Views
  • 2 replies
  • 1 kudos

Resolved! Delta Live Table pipeline failure - Table missing

Hi All, I set up a DLT pipeline to create 58 bronze tables and a subsequent DLT live table that joins the 58 bronze tables created in the first step. The pipeline runs successfully most of the time. My issue is that the pipeline fails once every 3/4 runs say...

Latest Reply
Manjula_Ganesap
Contributor
  • 1 kudos

@jose_gonzalez @Retired_mod - I forgot to update the group on the fix. I reached out to Databricks to understand the failure, and they identified that the threads call I was making was causing the issue. After I removed it, I no longer see it happening.

1 More Replies
Manjula_Ganesap
by Contributor
  • 1655 Views
  • 2 replies
  • 1 kudos

Delta Live Table (DLT) Initialization fails frequently

With no change in code, I've noticed that my DLT initialization fails and then an automatic rerun succeeds. Can someone help me understand this behavior? Thank you.

Latest Reply
Manjula_Ganesap
Contributor
  • 1 kudos

@jose_gonzalez - I forgot to update the group on the fix. I reached out to Databricks to understand the failure, and they identified that the threads call I was making was causing the issue. After I removed it, I no longer see it happening.

1 More Replies
Kit
by New Contributor III
  • 4406 Views
  • 2 replies
  • 1 kudos

How to use checkpoint with change data feed

I have a scheduled job (running in continuous mode) with the following code:
```
(
    spark
    .readStream
    .option("checkpointLocation", databricks_checkpoint_location)
    .option("readChangeFeed", "true")
    .option("startingVersion", VERSION + 1)...
```
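For anyone landing on this question, here is a minimal sketch of one common way to pair a checkpoint with the change data feed. It is not taken from the thread. It assumes the checkpoint location is set on the write side of the query, and that `startingVersion` only matters the first time the query starts (after that the checkpoint decides where to resume). The function names, table names, and parameters are hypothetical.

```python
def change_feed_read_options(starting_version):
    """Build the reader options for a Delta change data feed stream.

    startingVersion is assumed to apply only when the checkpoint is empty;
    on restarts the stream should resume from the checkpointed offsets.
    """
    return {
        "readChangeFeed": "true",
        "startingVersion": str(starting_version),
    }


def start_change_feed_stream(spark, source_table, target_table,
                             checkpoint_location, starting_version):
    """Start a streaming query that copies change rows to another table.

    `spark` is an existing SparkSession; all other arguments are
    placeholders for illustration.
    """
    changes = (
        spark.readStream.format("delta")
        .options(**change_feed_read_options(starting_version))
        .table(source_table)
    )
    return (
        changes.writeStream.format("delta")
        # The checkpoint belongs to the streaming write, not the read.
        .option("checkpointLocation", checkpoint_location)
        .toTable(target_table)
    )
```

With this layout there should be no need to track `VERSION + 1` by hand between runs, though that is worth confirming against your own job before removing any bookkeeping.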

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kit Yam Tse, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

1 More Replies
editter
by New Contributor II
  • 2012 Views
  • 1 replies
  • 1 kudos

Unable to open a file in dbfs. Trying to move files from Google Bucket to Azure Blob Storage

Background: I am attempting to download the Google Cloud SDK on Databricks. The end goal is to be able to use the SDK to transfer files from a Google Cloud bucket to Azure Blob Storage using Databricks. (If you have any other ideas for this transfer p...

Data Engineering
dbfs
Google Cloud SDK
pyspark
tarfile
Latest Reply
editter
New Contributor II
  • 1 kudos

Thank you for the response! Two questions:
1. How would you create a cluster with the custom requirements for the Google Cloud SDK? Is that still possible for a Unity Catalog enabled cluster with Shared Access Mode?
2. Is a script action the same as a cl...

kmorton
by New Contributor
  • 1381 Views
  • 0 replies
  • 0 kudos

Autoloader start and end date for ingestion

I have been searching for a way to set up backfilling using autoloader with an option to set a "start_date" or "end_date". I am working on ingesting a massive file system but I don't want to ingest everything from the beginning. I have a start date t...

Data Engineering
autoloader
backfill
ETL
ingestion
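One possible approach, sketched here since the thread has no replies yet: limit ingestion by file modification time. This assumes your runtime's Auto Loader accepts the generic file source options `modifiedAfter` and `modifiedBefore`, and it filters on when a file was last modified, not on any date encoded in the path. The function names, paths, and dates are hypothetical.

```python
from datetime import datetime


def autoloader_window_options(file_format, start=None, end=None):
    """Build Auto Loader options restricted to a modification-time window.

    Assumption: modifiedAfter / modifiedBefore are honored by the
    cloudFiles source on the runtime in use.
    """
    options = {"cloudFiles.format": file_format}
    if start is not None:
        options["modifiedAfter"] = start.strftime("%Y-%m-%dT%H:%M:%S")
    if end is not None:
        options["modifiedBefore"] = end.strftime("%Y-%m-%dT%H:%M:%S")
    return options


def read_window(spark, source_path, schema_location, file_format,
                start=None, end=None):
    """Return a streaming DataFrame over files inside the window.

    `spark` is an existing SparkSession; the paths are placeholders.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_window_options(file_format, start, end))
        .option("cloudFiles.schemaLocation", schema_location)
        .load(source_path)
    )
```

For example, `autoloader_window_options("json", datetime(2024, 1, 1))` would skip files last modified before 2024, provided the assumption above holds.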
AMadan
by New Contributor II
  • 4880 Views
  • 1 replies
  • 0 kudos

Date difference in Months

Hi Team, I am working on a migration from SQL Server to a Databricks environment. I encountered a challenge where Databricks and SQL Server give different results for the date difference function. Can you please help? --SQL SERVER SELECT DATEDIFF(MONTH , '2007-0...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

While I was pretty sure it has to do with T-SQL not following ANSI standards, I could not actually tell you what exactly the difference is. So I asked ChatGPT and here we go: The difference between DATEDIFF(month, date1, date2) in T-SQL and ANSI SQL ...
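To make the two behaviors concrete, here is a minimal plain-Python sketch that is not taken from the thread. It relies on the documented T-SQL behavior, where `DATEDIFF(MONTH, ...)` counts the month boundaries crossed, and contrasts it with a simplified "whole months elapsed" count, which is one way an engine can arrive at a smaller number. The function names and sample dates are hypothetical, and the second function ignores end-of-month special cases, so verify it against your Databricks runtime before relying on it.

```python
from datetime import date


def month_boundaries_crossed(start, end):
    """Count calendar month boundaries between two dates (T-SQL style)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def whole_months_elapsed(start, end):
    """Count only fully elapsed months (simplified illustration).

    Starts from the boundary count and removes the last month when the
    day of month has not yet been reached.
    """
    months = month_boundaries_crossed(start, end)
    if months > 0 and end.day < start.day:
        months -= 1
    elif months < 0 and end.day > start.day:
        months += 1
    return months


# One day apart, but the dates straddle a month boundary.
start, end = date(2024, 1, 31), date(2024, 2, 1)
print(month_boundaries_crossed(start, end))  # 1
print(whole_months_elapsed(start, end))      # 0
```

When migrating, a query that depends on the boundary-counting result can be rewritten as year and month arithmetic so that it does not depend on how the target engine rounds partial months.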

alvaro_databric
by New Contributor III
  • 1450 Views
  • 0 replies
  • 0 kudos

Azure Databricks Spot Cost

Hi all, I started using Azure Spot VMs by switching on the spot option when creating a cluster. However, in the Azure billing dashboard, after some months of using spot instances, I only see the OnDemand PurchaseType. Can anyone guess what could be happ...

THIAM_HUATTAN
by Valued Contributor
  • 39948 Views
  • 8 replies
  • 2 kudos

Skip number of rows when reading CSV files

staticDataFrame = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/Consumption_2019/*.csv") With the code above, I need an option to skip, say, the first 4 lines of each CSV file. How do I do that?

Latest Reply
Michael_Appiah
Contributor
  • 2 kudos

The option... .option("skipRows", <number of rows to skip>) ...works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...
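A minimal sketch of how the option from this reply could be applied to the original read. It assumes the runtime honors `skipRows` (reported as working on Databricks in this thread, and not listed in the open-source Spark docs) and that the header is taken from the first line after the skipped rows. The helper names and sample data are hypothetical; the second helper is a plain-Python illustration of the intended result, not a Spark replacement.

```python
import csv
import io


def read_csv_skipping_rows(spark, path, rows_to_skip):
    """Read CSV files while skipping leading rows in each file.

    Assumption: the skipRows option is supported by the runtime in use.
    `spark` is an existing SparkSession.
    """
    return (
        spark.read.format("csv")
        .option("skipRows", rows_to_skip)
        .option("header", "true")
        .option("inferSchema", "true")
        .load(path)
    )


def parse_csv_skipping_lines(text, lines_to_skip):
    """Plain-Python equivalent for a single file held in memory.

    Drops the leading lines, then treats the next line as the header.
    """
    lines = io.StringIO(text).readlines()[lines_to_skip:]
    return list(csv.DictReader(lines))


sample = "report\ngenerated\nby\nmeter\nid,kwh\n1,10\n2,20\n"
print(parse_csv_skipping_lines(sample, 4))
```

For the question above, the call would look like `read_csv_skipping_rows(spark, "/FileStore/tables/Consumption_2019/*.csv", 4)`.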

7 More Replies
omfspartan
by New Contributor III
  • 6665 Views
  • 3 replies
  • 0 kudos

Resolved! Connect and process Azure analysis services

How do I connect to Azure Analysis Services from Databricks? I need to process the tabular model from Databricks. I tried to use adodbapi, but the connection fails with the error message "windows com error dispatch adodb.connection". Please help.

Latest Reply
omfspartan
New Contributor III
  • 0 kudos

I now have another use case: "to run dax against Azure Analysis Services model" from AWS Databricks. I tried the suggestion above from "Jun Yang", and it errors out after 30 seconds with the exception "Login timeout is expired".

2 More Replies
rsamant07
by New Contributor III
  • 943 Views
  • 0 replies
  • 0 kudos

TLS Mutual Authentication for Databricks API

Hi, we are exploring the use of the Databricks Statement Execution API for sharing data through an API with different consumer applications. However, we have a security requirement to configure TLS Mutual Authentication to limit the consumer application t...

