Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mradula
by New Contributor
  • 1451 Views
  • 0 replies
  • 0 kudos

Displaying queried data from Azure Blob storage mounted to Databricks is slow

I have mounted my Azure Blob Storage JSON file (around 18 GB) to Databricks and am trying to perform a simple count operation on it. I notice that this takes 14 minutes in the Community Edition. Seeking answers on whether this is...

14 min count
SM
by New Contributor III
  • 9241 Views
  • 8 replies
  • 3 kudos

Resolved! Delta Live Tables has duplicates created by multiple workers

Hello, I am working with Delta Live Tables. I am trying to create a DLT from a combination of DataFrames from a 'for loop', which are unioned, and then the DLT is created over the unioned DataFrame. However, I noticed that the delta table has duplicates. An...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Shikha Mathew​ - Does your last answer mean that your issue is resolved? Would you be happy to mark whichever answer helped as best? Or, if it wasn't a specific one, would you tell us what worked?

7 More Replies
Direo
by Contributor II
  • 2810 Views
  • 3 replies
  • 4 kudos

Resolved! Which cluster mode should I choose for most efficient graph modelling?

Is there a difference between cluster modes in this case? Could it be that GraphX would work better on a single-node cluster than on a standard cluster or a high-concurrency cluster (for multiple users)? Would a less concurrent cluster be more efficient for graph mod...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Direo Direo​ - What do you think of these answers? If either of them stands out as best, would you please mark it that way? If you have more questions, please, bring them on!

2 More Replies
baatchus
by New Contributor III
  • 5538 Views
  • 4 replies
  • 0 kudos

Resolved! parameterize azure storage account name in spark cluster config databricks

Wondering if it is possible to parameterize the Azure storage account name part of the Spark cluster config in Databricks? I have a working example where the values reference secret scopes: spark.hadoop.fs.azure.account.oauth2.client.id.<azurestorageacc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Fantastic! Thanks for letting us know!

3 More Replies
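The conf keys quoted in the excerpt embed the account name, so they can be parameterized with ordinary string formatting while keeping the secret-scope references as values. A sketch (the account name, scope, and key names below are hypothetical; `{{secrets/<scope>/<key>}}` is the Databricks secret-reference syntax):

```python
# Hypothetical account name supplied as a parameter (e.g. from a job widget).
storage_account = "mystorageaccount"
suffix = f"{storage_account}.dfs.core.windows.net"

# Per-account Spark conf entries, with secret-scope references as values.
spark_conf = {
    f"spark.hadoop.fs.azure.account.oauth2.client.id.{suffix}":
        "{{secrets/my-scope/client-id}}",
    f"spark.hadoop.fs.azure.account.oauth2.client.secret.{suffix}":
        "{{secrets/my-scope/client-secret}}",
}
```

A dict like this could then feed cluster creation via the API or be pasted into the cluster's Spark config.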
tarente
by New Contributor III
  • 15505 Views
  • 8 replies
  • 4 kudos

Resolved! Pyspark - how to save the schema of a csv file in a delta table's column

How to save the schema of a csv file in a delta table's column? In a previous project implemented in Databricks using Scala notebooks, we stored the schema of csv files as a "json string" in a SQL Server table. When we needed to read or write the csv a...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@tarente - Thanks for letting us know.

7 More Replies
TufanRakshit
by New Contributor
  • 5280 Views
  • 1 replies
  • 1 kudos
Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

@Tufan Rakshit​: Could you please provide a sample example / pseudocode for the above requirement that you are trying to implement?

Jeff1
by Contributor II
  • 3464 Views
  • 3 replies
  • 5 kudos

Resolved! How to convert Data Chr Strings to Date Strings

New to Databricks and working in R code. I have a data frame with a date field that is a chr string and need to convert it to a date field. Tried the standard as.Date(x, format = "%Y-%m-%d"), then tried the dplyr::mutate function and th...

Latest Reply
Jeff1
Contributor II
  • 5 kudos

Based upon the initial response I went with: my_data.frame <- my_data.frame %>% mutate(date = to_date(data.frame_variable, "yyyy-mm-dd"))

2 More Replies
shrikant_kulkar
by New Contributor III
  • 1530 Views
  • 0 replies
  • 0 kudos

autoloader schema inference date column

I have hire_date and term_dates in the "MM/dd/YYYY" format in the underlying csv files. The schema hint "cloudFiles.schemaHints" : "Hire_Date Date, Term_Date Date" pushes the data into the _rescued_data column due to conversion failure. I am looking for a solution to c...

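Assuming Auto Loader is parsing the hinted DATE columns with Spark's default `yyyy-MM-dd` pattern, passing the CSV reader's `dateFormat` option alongside the schema hints is one way to keep `MM/dd/yyyy` values out of `_rescued_data`. A sketch (the load path and option placement are illustrative):

```python
# Option names follow the Auto Loader / Spark CSV reader documentation;
# dateFormat tells the CSV parser how to read the hinted DATE columns, so
# MM/dd/yyyy values parse instead of falling into _rescued_data.
autoloader_options = {
    "cloudFiles.format": "csv",
    "cloudFiles.schemaHints": "Hire_Date DATE, Term_Date DATE",
    "dateFormat": "MM/dd/yyyy",
}

# On Databricks (path is illustrative):
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options)
#       .load("/mnt/hr/csv/"))
```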
Rex
by New Contributor III
  • 24564 Views
  • 7 replies
  • 3 kudos

Resolved! Cannot connect to Databricks SQL Endpoint using PHP and ODBC

I am trying to connect to our Databricks SQL endpoint using PHP in a Docker container. I set up my Docker container to download and configure the ODBC driver as specified here: https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#install-and-c...

Latest Reply
Rex
New Contributor III
  • 3 kudos

The problem was that the Databricks SQL driver does not yet support ARM, which my laptop and Docker container were building for. See ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/opt/simba/spark/lib/64/libsparkodbc_sb64.so' : file not ...

6 More Replies
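Given the resolution above (x86_64-only driver binaries on an ARM host), one workaround is to build and run the container for amd64 under emulation. A sketch; the image name is illustrative:

```shell
# The Simba/Databricks ODBC driver ships x86_64-only shared libraries, so on
# an Apple Silicon (ARM) host, force the image platform to amd64.
docker build --platform linux/amd64 -t php-dbsql .
docker run --rm --platform linux/amd64 php-dbsql
```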
Suman
by New Contributor III
  • 4963 Views
  • 4 replies
  • 2 kudos

Change Data Feed functionality from SQL Endpoint

I am trying to run a command to retrieve change data from a SQL endpoint. It throws the error below: "The input query contains unsupported data source(s). Only csv, json, avro, delta, parquet, orc, text data sources are supported on Databricks SQL." But th...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

It is a separate runtime: https://docs.microsoft.com/en-us/azure/databricks/sql/release-notes/#channels

3 More Replies
HQJaTu
by New Contributor III
  • 10286 Views
  • 10 replies
  • 1 kudos

Resolved! Azure Databricks container runtime broken in 9.1 LTS, how to fix?

For stability, I've stuck with LTS. Last Friday my containers stopped working with error message:Py4JException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last): File "/databricks/spark/python/lib/py4j-...

Latest Reply
HQJaTu
New Contributor III
  • 1 kudos

This is getting worse. Now JDBC write to SQL is failing for the same reason. I haven't yet found a solution for this. Am I not supposed to use containers? Python? This is not cool.

9 More Replies
DavideCagnoni
by Contributor
  • 7017 Views
  • 4 replies
  • 1 kudos

How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` dataframe, and try to e.g. scatterplot two columns, what I obtain is a subset of the desired points. For example, if I try to plot two columns from a table with 1000000 rows, I only see some of the data - i...

Latest Reply
DavideCagnoni
Contributor
  • 1 kudos

@Kaniz Fatma​ The problem is not about performance or plotly. It is about the pandas_on_spark dataframe arbitrarily subsampling the input data when plotting, without notifying the user about it. While subsampling is comprehensible and maybe even nece...

3 More Replies
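The subsampling discussed above is governed by pyspark.pandas plotting options: top-n-based plots are capped by `plotting.max_rows`, and sample-based plots (scatter among them) are sampled by `plotting.sample_ratio`. A sketch of raising the caps (the values are illustrative; option names are as documented for pyspark.pandas):

```python
# pyspark.pandas limits plot input: top-n plots use at most
# "plotting.max_rows" rows, and sample-based plots (e.g. scatter) sample a
# fraction given by "plotting.sample_ratio". Raising both makes the plot
# reflect (much) more of the dataframe, at the cost of driver memory.
PLOT_OPTIONS = {
    "plotting.max_rows": 1_000_000,
    "plotting.sample_ratio": 1.0,  # use all rows for sample-based plots
}

# On a cluster with pyspark available:
# import pyspark.pandas as ps
# for key, value in PLOT_OPTIONS.items():
#     ps.set_option(key, value)
```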
SailajaB
by Databricks Partner
  • 12456 Views
  • 12 replies
  • 4 kudos

Resolved! JSON validation fails after writing PySpark dataframe to JSON format

Hi. We have to convert a transformed dataframe to JSON format, so we used write with the json format on top of the final dataframe. But when we validate the output JSON, it is not in proper JSON format. Could you please provide your suggestio...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Sailaja B​ - Does @Aman Sehgal​'s most recent answer help solve the problem? If it does, would you be happy to mark their answer as best?

11 More Replies
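A likely cause of the validation failure described above: Spark's `DataFrameWriter.json` emits newline-delimited JSON (one object per line), which is valid JSON Lines but not a single JSON document, so generic validators reject it. A pure-Python sketch of the difference and a conversion (the sample records are made up):

```python
import json

# What DataFrameWriter.json actually produces: one JSON object per line
# (JSON Lines), not a single JSON array.
ndjson = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Each individual line is valid JSON...
records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]

# ...and wrapping the parsed records yields one valid JSON document.
as_array = json.dumps(records)
```

If a single JSON array file is required, collecting the records and dumping them once (as above) or post-processing the part files are the usual routes.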