Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ashish (New Contributor II) · 5156 Views · 5 replies · 3 kudos

Resolved! Cost of individual jobs running on a shared Databricks cluster

Hi all, I am working on a requirement where I need to calculate the cost of each Spark job individually on a shared Azure/AWS Databricks cluster. There can be multiple jobs running on the cluster in parallel. Cost needs to be calculated after job comple...

Latest Reply: Kaniz_Fatma (Community Manager)

Hi @Ashish Kardam, do @werners's or @Alex Ott's replies answer your question?

4 More Replies
daschl (Contributor) · 8142 Views · 23 replies · 13 kudos

Resolved! NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser on Databricks Cloud (but not on Spark Directly)

Hi, I'm working for Couchbase on the Couchbase Spark Connector and noticed something weird that I haven't been able to get to the bottom of so far. For query DataFrames we use the Datasource v2 API, and we delegate the JSON parsing to the org.apache.sp...

Latest Reply: daschl (Contributor)

Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code so we don't have to rely on the Databricks classloader. It seems to work in my testing and will be released wi...

22 More Replies
Nilave (New Contributor III) · 3582 Views · 4 replies · 2 kudos

Resolved! Solution for API hosted on Databricks

I'm using Azure Databricks Python notebooks. We are preparing a front end that displays the Databricks tables and queries them via an API. Is there a solution from Databricks to host callable APIs for querying its tables and sending the result as a response to the fro...

Latest Reply: Nilave (New Contributor III)

@Prabakar Ammeappin Thanks for the link. Also, I was wondering whether, for a web page front end, it would be more effective to query from a SQL database or from Azure Databricks tables. If from an Azure SQL database, is there any efficient way to sync the tables from Az...

3 More Replies
JD2 (Contributor) · 3559 Views · 6 replies · 4 kudos

Resolved! Databricks Delta Table

Hello: I am new to Databricks and need a little help with Delta table creation. I am having great difficulty understanding how to create a Delta table. Do I need to create an S3 bucket for a Delta table? If YES, then do I have to mount on the mountpoint...

Latest Reply: mathan_pillai (Valued Contributor)

Hi Jay, I would suggest starting with a managed Delta table. Please run a simple command: CREATE TABLE events(id long) USING DELTA. This will create a managed Delta table called "events". Then run %sql describe extended events. The above command ...
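A minimal sketch of those steps from a Python notebook cell, assuming the table name events from the reply:

# Create a managed Delta table; Databricks manages the storage location,
# so no S3 bucket or mount point is needed up front.
spark.sql("CREATE TABLE IF NOT EXISTS events (id LONG) USING DELTA")

# Show table details, including the "Location" row where the data lives.
spark.sql("DESCRIBE EXTENDED events").show(truncate=False)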

5 More Replies
Autel (New Contributor II) · 2908 Views · 4 replies · 1 kudo

Resolved! Concurrent update to the same Hive or Delta Lake table

Hi, I'm interested to know whether multiple executors can append to the same Hive table using saveAsTable or insertInto in Spark SQL. Will that cause any data corruption? What configuration do I need to enable concurrent writes to the same Hive table? What about the s...

Latest Reply: Kaniz_Fatma (Community Manager)

Hi @Weide Zhang, does @werners's reply answer your question?
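For background, and not from the thread itself: Delta Lake uses optimistic concurrency control, so concurrent blind appends to the same Delta table generally commit without corrupting data, while plain Hive tables give no such guarantee. A minimal sketch, with events_log as a placeholder table name:

from pyspark.sql import Row

# Each concurrent job writes its own batch; Delta's optimistic concurrency
# control serializes the commits, so appends do not corrupt the table.
df = spark.createDataFrame([Row(id=1, source="job_a")])
df.write.format("delta").mode("append").saveAsTable("events_log")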

3 More Replies
cbynum (New Contributor III) · 2444 Views · 4 replies · 1 kudo

Resolved! Terraform authentication with SSO enabled

After enabling SSO on my account, I no longer have any way to run my Terraform for provisioning AWS workspaces, because username/password authentication is disabled. Is there a workaround for this?

Latest Reply: cbynum (New Contributor III)

Never mind, the account owner credentials do work, but I had to add the account owner to all of the workspaces. Terraform didn't give me an informative error; it just hung forever when applying.

3 More Replies
BorislavBlagoev (Valued Contributor III) · 2479 Views · 5 replies · 4 kudos
Latest Reply: Kaniz_Fatma (Community Manager)

Hi @Borislav Blagoev, VACUUM cleans up files associated with a table. Note: this command works differently depending on whether you're working on a Delta or Apache Spark table. To vacuum a Delta table (Delta Lake on Databricks), recursively vacuum directo...
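A minimal sketch of the command being described, assuming a Delta table named events (a placeholder name) and the default retention:

# Delete files no longer referenced by the Delta transaction log and
# older than the retention threshold (default 7 days).
spark.sql("VACUUM events")

# The retention window can also be set explicitly, in hours.
spark.sql("VACUUM events RETAIN 168 HOURS")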

4 More Replies
Anonymous (Not applicable) · 5449 Views · 3 replies · 5 kudos

Cluster does not have proper permissions to view DBFS mount point to Azure ADLS Gen 2.

I've created other mount points and am now trying to use the OAuth method. I'm able to define the mount point using the OAuth mount to ADLS Gen 2 storage. I've created an App Registration with a secret, added the App Registration as Contributor to the ...

Latest Reply: Gerbastanovic (New Contributor II)

Also check whether you set the right permissions for the app on the container's ACL: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
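For reference, a minimal sketch of the OAuth mount the question describes; the container, storage account, tenant ID, and secret scope names are placeholders:

# OAuth (service principal) mount to ADLS Gen2. The app registration needs
# an RBAC role such as Storage Blob Data Contributor and, if ACLs are used,
# execute/read permissions on the container paths.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="my-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/adls",
    extra_configs=configs,
)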

2 More Replies
Erik (Valued Contributor II) · 2284 Views · 6 replies · 2 kudos

Resolved! Power BI Databricks connector should import column descriptions

I posted this idea on ideas.powerbi.com as well, but it is quite unclear to me whether the Power BI Databricks connector is in fact made by MS or by Databricks, so I am posting it here as well! It is possible to add comments/descriptions to Databricks database ...

Latest Reply: Atanu (Esteemed Contributor)

@Erik Parmann, the connector is a collaborative product of MS and Databricks. But I feel this is a nice feature to add. I would request you to raise a feature request with us here: https://ideas.databricks.com/. Our product team definitely will take a look a...
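For reference, the column descriptions the post refers to can be set in SQL; a minimal sketch, with placeholder table and column names:

# Table- and column-level comments are what a connector would need to
# surface as descriptions; both statements are standard Databricks SQL.
spark.sql("COMMENT ON TABLE sales IS 'Daily sales facts'")
spark.sql("ALTER TABLE sales ALTER COLUMN amount COMMENT 'Net amount in EUR'")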

5 More Replies
Matt_Johnston (New Contributor III) · 3416 Views · 4 replies · 4 kudos

Resolved! Disk Type in Azure Databricks

Hi there, how are the disk tiers determined in Azure Databricks? We are currently using a pool with Standard DS3 v2 virtual machines, all with Premium SSD disks. Is there a way to change the tier of the disks? Thanks

Latest Reply: Atanu (Esteemed Contributor)

I think we do not have an option to change the disk type at this moment, but I would request you to raise a feature request through Azure support if you are an Azure Databricks user. If AWS, you can do the same from https://docs.databricks.com/res...

3 More Replies
Shridhar (New Contributor) · 13753 Views · 2 replies · 2 kudos

Resolved! Load multiple csv files into a dataframe in order

I can load multiple csv files by doing something like:

paths = ["file_1", "file_2", "file_3"]
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load(paths))

But this doesn't seem to preserve the...

Latest Reply: Jaswanth_Saniko (New Contributor III)

val diamonds = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/11.csv", "/FileStore/tables/12.csv", "/FileStore/tables/13.csv")

display(diamonds)

This is working for me @Shridhar
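Note that Spark does not guarantee row order across input files. One workaround, a sketch not taken from the thread and assuming the file names are distinct, is to tag each row with its source file and sort by the file's position in the original list:

from pyspark.sql import functions as F

paths = ["file_1", "file_2", "file_3"]

# input_file_name() records the source URI of every row.
df = (spark.read.format("csv")
      .option("header", "true")
      .load(paths)
      .withColumn("_src", F.input_file_name()))

# Rank rows by the position of their source file in `paths` (matching on
# the file name suffix), sort, then drop the helper columns.
file_rank = F.coalesce(
    *[F.when(F.col("_src").endswith(p), F.lit(i)) for i, p in enumerate(paths)]
)
df_ordered = (df.withColumn("_rank", file_rank)
                .orderBy("_rank")
                .drop("_src", "_rank"))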

1 More Replies
Reza (New Contributor III) · 2506 Views · 2 replies · 0 kudos

Resolved! Can we order the widgets?

I have two text widgets (dbutils.widgets.text). One is called "start date" and the other is "end date". When I create them, they are shown in alphabetical order (end_date, start_date). Is there any way to set the order when we create the...

Latest Reply: Atanu (Esteemed Contributor)

All the available options are listed at https://docs.databricks.com/notebooks/widgets.html, I think, @Reza Rajabi, but we can cross-check.
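One common workaround, sketched here rather than taken from the thread: widgets render in alphabetical order of their internal name, so a numeric prefix on the name fixes the order while the label keeps the display text readable:

# The first argument (the widget name) controls the sort order; the third
# argument is the label shown in the notebook UI.
dbutils.widgets.text("1_start_date", "", "start date")
dbutils.widgets.text("2_end_date", "", "end date")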

1 More Replies
timothy_uk (New Contributor III) · 1933 Views · 4 replies · 0 kudos

Resolved! Zombie .NET Spark Databricks Job (CoarseGrainedExecutorBackend)

Hi all. Environment: Nodes: Standard_E8s_v3; Databricks Runtime: 9.0; .NET for Apache Spark 2.0.0. I'm invoking spark-submit to run a .NET Spark job hosted in Azure Databricks. The job is written in C#/.NET, with its only transformation and action reading a C...

Latest Reply: jose_gonzalez (Moderator)

Hi @Timothy Lin, I would recommend not using spark.stop() or System.exit(0) in your code, because it explicitly stops the Spark context but the graceful shutdown and handshake with Databricks' job service does not happen.

3 More Replies
Braxx (Contributor II) · 4150 Views · 4 replies · 3 kudos

Resolved! spark.read excel with formula

For some reason Spark is not reading the data correctly from an xlsx file in the column with a formula. I am reading it from blob storage. Consider this simple data set: the column "color" has formulas for all the cells, like =VLOOKUP(A4,C3:D5,2,0). In case...

Latest Reply: -werners- (Esteemed Contributor III)

The formula itself is probably what is actually stored in the Excel file; Excel translates this to NA. I only know of setErrorCellsToFallbackValues, but I doubt whether it is applicable in your case here. You could use a matching function (regexp, for example) to d...
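For reference, a minimal sketch of reading an xlsx file with the community spark-excel data source, to which the setErrorCellsToFallbackValues option mentioned above belongs; the path is a placeholder and the option's availability depends on the connector version:

# Read the workbook with the com.crealytics spark-excel connector.
# setErrorCellsToFallbackValues maps Excel error cells such as #N/A to
# fallback values instead of failing the read (check your version).
df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("setErrorCellsToFallbackValues", "true")
      .load("/mnt/blob/simple_dataset.xlsx"))
df.show()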

3 More Replies
chandan_a_v (Valued Contributor) · 2653 Views · 8 replies · 4 kudos

Resolved! Spark Error : RScript (1243) terminated unexpectedly: Cannot call r___RBuffer__initialize().

grid_slice %>%
  sdf_copy_to(
    sc = sc,
    name = "grid_slice",
    overwrite = TRUE
  ) %>%
  sdf_repartition(
    partitions = min(n_executors * 3, NROW(grid_slice)),
    partition_by = "variable"
  ) %>%
  spark_apply(
    f = slice_data_wrapper,
    columns = c(
      variable...

Latest Reply: chandan_a_v (Valued Contributor)

Hi @Kaniz Fatma, did you find any solution? Please let us know.

7 More Replies
