Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ravikumashi
by Contributor
  • 1474 Views
  • 3 replies
  • 0 kudos

Resolved! Issue with Logging Spark Events to LogAnalytics after Upgrading to Databricks 11.3 LTS

We are in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to LogAnalytics using the repository available at https://github.c...

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Ravikumashi, can you please raise a ticket with us so that we can look deeper into the issue?

2 More Replies
Manjula_Ganesap
by Contributor
  • 2977 Views
  • 4 replies
  • 1 kudos

Resolved! Delta Live Table pipeline failure - Table missing

Hi all, I set up a DLT pipeline to create 58 bronze tables and a subsequent DLT live table that joins the 58 bronze tables created in the first step. The pipeline runs successfully most times. My issue is that the pipeline fails once every 3/4 runs say...

Latest Reply
Manjula_Ganesap
Contributor
  • 1 kudos

@jose_gonzalez @Kaniz_Fatma - I missed updating the group on the fix. I reached out to Databricks to understand, and it was identified that the threads call I was making was causing the issue. After I removed it, I don't see it happening.

3 More Replies
Manjula_Ganesap
by Contributor
  • 1276 Views
  • 3 replies
  • 1 kudos

Delta Live Table (DLT) Initialization fails frequently

With no change in code, I've noticed that my DLT initialization fails and then an automatic rerun succeeds. Can someone help me understand this behavior? Thank you.

Latest Reply
Manjula_Ganesap
Contributor
  • 1 kudos

@jose_gonzalez - I missed updating the group on the fix. I reached out to Databricks to understand, and it was identified that the threads call I was making was causing the issue. After I removed it, I don't see it happening.

2 More Replies
Kit
by New Contributor III
  • 3301 Views
  • 3 replies
  • 1 kudos

How to use checkpoint with change data feed

I have a scheduled job (running in continuous mode) with the following code``` ( spark .readStream .option("checkpointLocation", databricks_checkpoint_location) .option("readChangeFeed", "true") .option("startingVersion", VERSION + 1)...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kit Yam Tse, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

2 More Replies
editter
by New Contributor II
  • 1551 Views
  • 2 replies
  • 2 kudos

Unable to open a file in dbfs. Trying to move files from Google Bucket to Azure Blob Storage

Background: I am attempting to download the Google Cloud SDK on Databricks. The end goal is to be able to use the SDK to transfer files from a Google Cloud Bucket to Azure Blob Storage using Databricks. (If you have any other ideas for this transfer p...

Data Engineering
dbfs
Google Cloud SDK
pyspark
tarfile
Latest Reply
editter
New Contributor II
  • 2 kudos

Thank you for the response! Two questions: 1. How would you create a cluster with the custom requirements for the Google Cloud SDK? Is that still possible for a Unity Catalog enabled cluster with Shared Access Mode? 2. Is a script action the same as a cl...

1 More Replies
AMadan
by New Contributor II
  • 3218 Views
  • 1 replies
  • 0 kudos

Date difference in Months

Hi team, I am working on a migration from SQL Server to the Databricks environment. I have encountered a challenge where Databricks and SQL Server give different results for the date-difference function. Can you please help?
-- SQL Server
SELECT DATEDIFF(MONTH , '2007-0...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

While I was pretty sure it has to do with T-SQL not following ANSI standards, I could not actually tell you what exactly the difference is. So I asked ChatGPT, and here we go: The difference between DATEDIFF(month, date1, date2) in T-SQL and ANSI SQL ...

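The mismatch in this thread comes down to counting semantics: T-SQL's DATEDIFF(MONTH, ...) counts month boundaries crossed and ignores the day component, while Spark's months_between is day-aware. A minimal off-cluster sketch, using the dates from the T-SQL example; both helper functions are illustrative stand-ins, not engine code:

```python
# Illustrative re-implementations of the two counting rules.
from datetime import date

def tsql_datediff_month(d1: date, d2: date) -> int:
    # T-SQL DATEDIFF(MONTH, d1, d2): month boundaries crossed, day ignored
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

def approx_months_between(d2: date, d1: date) -> float:
    # Rough imitation of Spark's months_between: whole months plus a
    # day-based fraction over 31 (Spark also special-cases month ends)
    whole = (d2.year - d1.year) * 12 + (d2.month - d1.month)
    return whole + (d2.day - d1.day) / 31.0

print(tsql_datediff_month(date(2007, 5, 31), date(2007, 6, 1)))               # 1
print(round(approx_months_between(date(2007, 6, 1), date(2007, 5, 31)), 4))   # 0.0323
```

Adjacent dates straddling a month boundary thus yield 1 under the T-SQL rule but only a small fraction under the day-aware rule.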
alvaro_databric
by New Contributor III
  • 942 Views
  • 1 replies
  • 0 kudos

Azure Databricks Spot Cost

Hi all, I started using Azure Spot VMs by switching on the spot option when creating a cluster. However, in the Azure billing dashboard, after some months of using spot instances, I only have the OnDemand PurchaseType. Does anyone have a guess as to what could be happ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

There are a few possibilities as to why you only see the OnDemand PurchaseType in your Azure billing dashboard: 1. Spot instances were not available: If there were not enough spot instances available at the time of your request, Azure would have automatical...

ChingizK
by New Contributor III
  • 1402 Views
  • 1 replies
  • 0 kudos

Resolved! Hyperopt Error: There are no evaluation tasks, cannot return argmin of task losses.

The trials succeed when the cell in the notebook is executed manually. However, the same process fails when executed as a Workflow: the error simply says that there's an issue with the objective function. However, how can that be the case if I'm able t...

Data Engineering
hyperopt
Workflows
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ChingizK, Error message: "Exception: There are no evaluation tasks, cannot return argmin of task losses". This error occurs when there are no successful evaluations of the objective function. Possible reasons for the error when running the co...

data_turtle
by New Contributor
  • 953 Views
  • 1 replies
  • 0 kudos

How do I get AWS costs from my SQL Warehouses?

Hi, how do I find the AWS-associated costs from my Databricks SQL warehouse usage? I tried using tags, but they didn't show up in AWS Cost Explorer. My use case: I am running some dbt-on-Databricks jobs and I want to find the cost for certain jobs....

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

 Hi @data_turtle, To find the AWS-associated costs from your Databricks SQL warehouse usage, you can tag clusters and pools, and these tags propagate both to detailed DBU usage reports and to AWS EC2 and AWS EBS instances for cost analysis. However,...

kazinahian
by New Contributor III
  • 2171 Views
  • 1 replies
  • 1 kudos

Resolved! How can I create a new calculated field in databricks by using pyspark.

Hello, great people. I am new to Databricks and PySpark. How can I create a new column called "sub_total", grouped by "category", "subcategory", and "monthly" sales value? I appreciate your help.

Data Engineering
calculation
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @kazinahian, To create a new column called "sub_total" where you want to group by "category", "subcategory", and "monthly" sales value, you can use the groupBy().applyInPandas() function in PySpark. This function implements the "split-apply-combin...

rsamant07
by New Contributor III
  • 726 Views
  • 1 replies
  • 0 kudos

TLS Mutual Authentication for Databricks API

Hi, we are exploring the use of the Databricks Statement Execution API for sharing data through an API to different consumer applications. However, we have a security requirement to configure TLS Mutual Authentication to limit the consumer application t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @rsamant07, I recommend contacting Databricks support by filing a support ticket.  

THIAM_HUATTAN
by Valued Contributor
  • 31751 Views
  • 8 replies
  • 2 kudos

Skip number of rows when reading CSV files

staticDataFrame = spark.read.format("csv")\
  .option("header", "true").option("inferSchema", "true")\
  .load("/FileStore/tables/Consumption_2019/*.csv")
Given the above, I need an option to skip, say, the first 4 lines of each CSV file. How do I do that?

Latest Reply
Michael_Appiah
New Contributor III
  • 2 kudos

The option... .option("skipRows", <number of rows to skip>) ...works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...

7 More Replies
omfspartan
by New Contributor III
  • 5305 Views
  • 3 replies
  • 0 kudos

Resolved! Connect and process Azure analysis services

How do I connect to Azure Analysis Services from Databricks? I need to process the tabular model from Databricks. I tried to use adodbapi; while connecting, it fails with the error message "windows com error dispatch adodb.connection". Please help.

Latest Reply
omfspartan
New Contributor III
  • 0 kudos

I now have another use case, "to run DAX against an Azure Analysis Services model", from AWS Databricks. I tried the suggestion above from "Jun Yang", and it errors out after 30 seconds with the exception "Login timeout is expired".

2 More Replies
IvanK
by New Contributor III
  • 2371 Views
  • 2 replies
  • 0 kudos

Register permanent UDF from Python file

Hello, I am trying to create a permanent UDF from a Python file with dependencies that are not part of the standard Python library. How do I make use of CREATE FUNCTION (External) [1] to create a permanent function in Databricks, using a Python file th...

Data Engineering
Create function
python
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @IvanK, pyspark.sql.functions.udf is a method in PySpark that allows you to create user-defined functions (UDFs). These UDFs can be used to perform operations that are not defined in Spark. Here is a general way to create a UDF in Databric...

1 More Replies
mwoods
by New Contributor III
  • 5035 Views
  • 2 replies
  • 0 kudos

dbutils.fs.cp requires write permissions on the source

I have an external location set up, "auth_kafka", which is mapped to an abfss URL: abfss://{container}@{account}.dfs.core.windows.net/auth/kafka and, critically, is marked as read-only. Using dbutils.fs I can successfully read the files (i.e. the ls and hea...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @mwoods, In Python, you can use the pickle module to serialize and deserialize Python object structures. You can save your variables to a file with pickle.dump() and then load them in another notebook using pickle.load(). Here's how you can do it: I...

1 More Replies
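The pickle round-trip the reply describes can be sketched as follows; the path is a local placeholder (on Databricks it would typically be a DBFS or Volumes path), and both halves run in one script here:

```python
# "Notebook 1" writes the variables, "notebook 2" reads them back.
import os, pickle, tempfile

path = os.path.join(tempfile.mkdtemp(), "state.pkl")
variables = {"threshold": 0.75, "labels": ["a", "b"]}

with open(path, "wb") as f:      # notebook 1: serialize
    pickle.dump(variables, f)

with open(path, "rb") as f:      # notebook 2: deserialize
    restored = pickle.load(f)

print(restored == variables)     # True
```

For data exchanged between clusters or runtimes, a format like Delta or Parquet is usually safer than pickle, which is Python-version-sensitive and unsafe to load from untrusted sources.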
