Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

alvaro_databric
by New Contributor III
  • 1160 Views
  • 1 reply
  • 0 kudos

Azure Databricks Spot Cost

Hi all, I started using Azure Spot VMs by switching on the spot option when creating a cluster. However, in the Azure billing dashboard, after some months of using spot instances, I only have the OnDemand PurchaseType. Does anyone have a guess what could be happ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

There are a few possibilities as to why you only see the OnDemand PurchaseType in your Azure billing dashboard:
1. Spot instances were not available: if there were not enough spot instances available at the time of your request, Azure would have automatical...
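As a hedged illustration of the spot setup the reply describes, here is a minimal Python sketch of a cluster spec that requests Azure spot VMs with fallback to on-demand via the Clusters API. The cluster name, runtime, and VM size are assumptions; with SPOT_WITH_FALLBACK_AZURE, hours served by fallback capacity bill as on-demand, which is one way the OnDemand PurchaseType can dominate.

```python
# Minimal sketch (assumed names and values): a Clusters API spec requesting
# Azure spot VMs. When spot capacity is unavailable, SPOT_WITH_FALLBACK_AZURE
# silently falls back to on-demand VMs, which bill as OnDemand PurchaseType.
cluster_spec = {
    "cluster_name": "spot-example",        # hypothetical name
    "spark_version": "13.3.x-scala2.12",   # assumed runtime
    "node_type_id": "Standard_DS3_v2",     # assumed VM size
    "num_workers": 2,
    "azure_attributes": {
        "first_on_demand": 1,              # keep the driver on-demand
        "availability": "SPOT_WITH_FALLBACK_AZURE",
        "spot_bid_max_price": -1,          # -1 = bid up to the on-demand price
    },
}
```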

ChingizK
by New Contributor III
  • 1697 Views
  • 1 reply
  • 0 kudos

Resolved! Hyperopt Error: There are no evaluation tasks, cannot return argmin of task losses.

The trials succeed when the cell in the notebook is executed manually. However, the same process fails when executed as a Workflow. The error simply says that there's an issue with the objective function. But how can that be the case if I'm able t...

Data Engineering
hyperopt
Workflows
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ChingizK, the error message "Exception: There are no evaluation tasks, cannot return argmin of task losses" occurs when there are no successful evaluations of the objective function. Possible reasons for the error when running the co...
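For context, a minimal, hedged sketch of an objective function written so Hyperopt always receives a recorded result; if every trial raises (or none run), fmin has no evaluations to take an argmin over and fails with exactly this error. The search space and loss are stand-ins, not the poster's code.

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, STATUS_FAIL, Trials

def objective(params):
    try:
        loss = (params["x"] - 3.0) ** 2   # stand-in for real train/validate logic
        return {"loss": loss, "status": STATUS_OK}
    except Exception:
        # Report the failure instead of raising, so the trial is still recorded.
        # If *all* trials end up failed, fmin cannot compute an argmin of losses.
        return {"status": STATUS_FAIL}

trials = Trials()
best = fmin(fn=objective,
            space={"x": hp.uniform("x", -10, 10)},
            algo=tpe.suggest,
            max_evals=20,
            trials=trials)
print(best)
```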

data_turtle
by New Contributor
  • 1104 Views
  • 1 reply
  • 0 kudos

How do I get AWS costs from my SQL Warehouses?

Hi, how do I find the AWS-associated costs from my Databricks SQL warehouse usage? I tried using tags, but they didn't show up in the AWS Cost Explorer. My use case: I am running some DBT - Databricks jobs and I want to find the cost for certain jobs....

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

 Hi @data_turtle, To find the AWS-associated costs from your Databricks SQL warehouse usage, you can tag clusters and pools, and these tags propagate both to detailed DBU usage reports and to AWS EC2 and AWS EBS instances for cost analysis. However,...
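As a rough sketch of the tag-based attribution the reply describes: once warehouses or clusters carry a custom tag, DBU usage can be grouped by that tag on the Databricks side. The system.billing.usage table, its custom_tags column, and the job_name tag are assumptions here; the underlying AWS infrastructure cost still has to be joined in from Cost Explorer.

```python
# Hypothetical query: group DBU usage by an assumed custom tag "job_name".
# Assumes system tables are enabled and the tag was applied to the warehouse.
usage = spark.sql("""
    SELECT custom_tags['job_name'] AS job_name,
           usage_date,
           SUM(usage_quantity)     AS dbus
    FROM system.billing.usage
    WHERE custom_tags['job_name'] IS NOT NULL
    GROUP BY 1, 2
    ORDER BY usage_date
""")
usage.show()
```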

kazinahian
by New Contributor III
  • 2592 Views
  • 1 reply
  • 1 kudos

Resolved! How can I create a new calculated field in Databricks using PySpark?

Hello, great people. I am new to Databricks and PySpark. How can I create a new column called "sub_total", where I group by "category", "subcategory", and "monthly" sales value? I'd appreciate your help.

Data Engineering
calculation
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @kazinahian, To create a new column called "sub_total" where you want to group by "category", "subcategory", and "monthly" sales value, you can use the groupBy().applyInPandas() function in PySpark. This function implements the "split-apply-combin...
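Before reaching for applyInPandas, a plain groupBy().agg() may be all this needs when "sub_total" is a simple sum. A minimal sketch, with the column names assumed from the question:

```python
from pyspark.sql import functions as F

# Assumed schema: category, subcategory, monthly, sales
sub_total_df = (
    df.groupBy("category", "subcategory", "monthly")
      .agg(F.sum("sales").alias("sub_total"))
)
sub_total_df.show()
```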

rsamant07
by New Contributor III
  • 871 Views
  • 1 reply
  • 0 kudos

TLS Mutual Authentication for Databricks API

Hi, we are exploring the use of the Databricks Statement Execution API for sharing data through an API with different consumer applications. However, we have a security requirement to configure TLS Mutual Authentication to limit the consumer application t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @rsamant07, I recommend contacting Databricks support by filing a support ticket.  

THIAM_HUATTAN
by Valued Contributor
  • 38075 Views
  • 8 replies
  • 2 kudos

Skip number of rows when reading CSV files

staticDataFrame = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("/FileStore/tables/Consumption_2019/*.csv")

With the above, I need an option to skip, say, the first 4 lines of each CSV file. How do I do that?

Latest Reply
Michael_Appiah
Contributor
  • 2 kudos

The option... .option("skipRows", <number of rows to skip>) ...works for me as well. However, I am surprised that the official Spark doc does not list it as a CSV Data Source Option: https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data...
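For reference, a short sketch of the option in context, reusing the read from the original question; since skipRows is absent from the documented CSV options, treat it as version-dependent and verify on your runtime:

```python
# Assumes a recent Spark/Databricks runtime where the (undocumented) CSV
# option "skipRows" is honored; the first 4 lines of each file are dropped
# before the header row is parsed.
staticDataFrame = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("skipRows", 4)
    .load("/FileStore/tables/Consumption_2019/*.csv")
)
```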

7 More Replies
omfspartan
by New Contributor III
  • 6212 Views
  • 3 replies
  • 0 kudos

Resolved! Connect and process Azure analysis services

How do I connect to Azure Analysis Services from Databricks? I need to process the tabular model from Databricks. I tried to use adodbapi; while connecting, it fails with the error message "windows com error dispatch adodb.connection". Please help.

Latest Reply
omfspartan
New Contributor III
  • 0 kudos

I got another use case now: running DAX against an Azure Analysis Services model from AWS Databricks. I tried the suggestion above from "Jun Yang", and it errors out after 30 seconds with the exception "Login timeout is expired".

2 More Replies
IvanK
by New Contributor III
  • 2774 Views
  • 2 replies
  • 0 kudos

Register permanent UDF from Python file

Hello, I am trying to create a permanent UDF from a Python file with dependencies that are not part of the standard Python library. How do I make use of CREATE FUNCTION (External) [1] to create a permanent function in Databricks, using a Python file th...

Data Engineering
Create function
python
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @IvanK, pyspark.sql.functions.udf is a method in PySpark that allows you to create User-Defined Functions (UDFs). These UDFs can be used to perform operations that are not defined in Spark. Here is a general way to create a UDF in Databric...
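A minimal sketch of the session-scoped pattern the reply starts to describe (the function and names are illustrative). Note this registers the UDF only for the current session; it is a stepping stone toward, not a substitute for, the permanent CREATE FUNCTION the question asks about.

```python
from pyspark.sql.types import StringType

def shout(s):                     # hypothetical helper from a Python file
    return s.upper() if s is not None else None

# Register for SQL use in this session (not permanent).
spark.udf.register("shout", shout, StringType())
spark.sql("SELECT shout('hello') AS result").show()
```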

1 More Replies
mwoods
by New Contributor III
  • 8373 Views
  • 2 replies
  • 0 kudos

dbutils.fs.cp requires write permissions on the source

I have an external location set up, "auth_kafka", which is mapped to an abfss URL: abfss://{container}@{account}.dfs.core.windows.net/auth/kafka and, critically, is marked as read-only. Using dbutils.fs I can successfully read the files (i.e. the ls and hea...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @mwoods, in Python you can use the pickle module to serialize and de-serialize Python object structures. You can save your variables to a file with pickle.dump() and then load them in another notebook using pickle.load(). Here's how you can do it: I...
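A hedged sketch of the pickle round-trip the reply outlines; the /dbfs path and variable names are illustrative, and note this addresses passing variables between notebooks rather than the read-only external location in the original question.

```python
import pickle

state = {"threshold": 0.75, "labels": ["a", "b"]}   # hypothetical variables

# Notebook 1: serialize the variables to a shared DBFS path.
with open("/dbfs/tmp/state.pkl", "wb") as f:
    pickle.dump(state, f)

# Notebook 2: load them back.
with open("/dbfs/tmp/state.pkl", "rb") as f:
    restored = pickle.load(f)
```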

1 More Replies
MauiWarrior
by New Contributor
  • 5054 Views
  • 1 reply
  • 0 kudos

Installing fpp3 R package on Databricks

In an R notebook I am running: install.packages('fpp3', dependencies = TRUE) and getting back errors: ERROR: dependency 'vctrs' is not available for package 'slider'. I then install 'vctrs' and it again generates a similar error that some package is...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @MauiWarrior , Yes, you can install all required packages at once. You are facing this issue because some packages have dependencies on other packages and they need to be installed in a specific order. You can use the install.packages function w...

Gilg
by Contributor II
  • 4773 Views
  • 1 reply
  • 1 kudos

Resolved! Pivot in Databricks SQL

Hi Team, I have a table that has a key column (column name) and a value column (value of the column name). These values are generated dynamically and I want to pivot the table. Question 1: Is there a way that we can do this without specifying all the col...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Gilg, In Databricks, performing a pivot operation is impossible without specifying all the columns in the expression list. This is because the pivot operation needs to know exactly which columns to pivot on. However, you can dynamically generate ...
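As a hedged illustration of the dynamic route: in PySpark (as opposed to the SQL PIVOT clause), calling pivot() without a value list makes Spark discover the distinct keys at runtime, so no column list is written by hand. Column names are assumed from the question, and the discovery pass costs an extra scan.

```python
from pyspark.sql import functions as F

# Assumed schema: id, key, value. Omitting the value list in pivot()
# lets Spark infer the distinct keys (one extra pass over the data).
pivoted = (
    df.groupBy("id")
      .pivot("key")
      .agg(F.first("value"))
)
pivoted.show()
```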

Kaniz_Fatma
by Community Manager
  • 15732 Views
  • 1 reply
  • 0 kudos

Repos

How to transfer repos from one Databricks environment to another Databricks environment without using git?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

To transfer repos from one Databricks environment to another without using Git, you can use the dbx sync command. Here are the steps:
1. Identify the name of the Databricks Repo you want to transfer to your current Databricks workspace.
2. On your l...

berserkersap
by Contributor
  • 3593 Views
  • 1 reply
  • 2 kudos

Speed Up JDBC Write from Databricks Notebook to MS SQL Server

Hello everyone, I have a use case where I need to write a Delta table from Databricks to a SQL Server table using PySpark / Python / Spark SQL. The Delta table I am writing contains around 3 million records and the SQL Server table is neither partitione...

Data Engineering
JDBC
MS SQL Server
pyspark
Table Write
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @berserkersap, partition the DataFrame before writing to take advantage of parallelism and speed up the process.
Example: df.repartition(10).write.mode("overwrite").option("truncate", True).jdbc(Url, "dbtable", properties=properties)
Tune the JDB...
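Expanding the reply's one-liner into a hedged sketch: the URL, credentials, table name, and batch size below are placeholders. Repartitioning sets the number of parallel JDBC connections, and batchsize controls how many rows each round-trip inserts.

```python
jdbc_url = "jdbc:sqlserver://<host>:1433;databaseName=<db>"   # placeholder
properties = {
    "user": "<user>",                                          # placeholder
    "password": "<password>",                                  # placeholder
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

(
    df.repartition(10)                # 10 partitions -> 10 parallel connections
      .write.mode("overwrite")
      .option("truncate", True)       # truncate instead of drop/recreate
      .option("batchsize", 10000)     # rows per JDBC batch (assumed value)
      .jdbc(jdbc_url, "dbo.target_table", properties=properties)
)
```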

nikhilkumawat
by New Contributor III
  • 6386 Views
  • 4 replies
  • 1 kudos

Install maven package on job cluster

I have a single-user cluster and I have created a workflow which will read an Excel file from an Azure storage account. For reading the Excel file I am using the com.crealytics:spark-excel_2.13:3.4.1_0.19.0 library on a single-user all-purpose cluster. I have alread...

Latest Reply
nikhilkumawat
New Contributor III
  • 1 kudos

Hi @Kaniz_Fatma, can you elaborate on a few more things:
1. When spark-shell installs a Maven package, what is the default location where it downloads the JAR file?
2. As far as I know, the default location for JARs is "/databricks/jars/", from where Spark pic...

3 More Replies
AtanuC
by New Contributor
  • 9648 Views
  • 1 reply
  • 1 kudos

OOP programming in PySpark on the Databricks platform

Hello experts, I have a doubt, so I need your advice and opinion on the query below. Is OOP a good choice of programming style for distributed data processing, like PySpark on the Databricks platform? If not, then what is, and what kind of challenges could b...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @AtanuC, Object-Oriented Programming (OOP) is not typically the best choice for distributed data processing tasks like those handled by PySpark on the Databricks platform. The main reason is that OOP is based on the concept of "objects" which can...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group