Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mradula
by New Contributor
  • 1451 Views
  • 0 replies
  • 0 kudos

Displaying queried data from Azure Blob storage mounted to Databricks is slow

I have mounted my Azure Blob Storage JSON file (around 18 GB) to Databricks and am trying to perform a simple count operation on it. I notice that this takes 14 minutes in the Community Edition. Seeking answers on whether this is...

14 min count
SM
by New Contributor III
  • 9241 Views
  • 8 replies
  • 3 kudos

Resolved! Delta Live Tables has duplicates created by multiple workers

Hello, I am working with Delta Live Tables. I am trying to create a DLT from a combination of DataFrames from a 'for loop', which are unioned, and then the DLT is created over the unioned DataFrame. However, I noticed that the delta table has duplicates. An...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Shikha Mathew​ - Does your last answer mean that your issue is resolved? Would you be happy to mark whichever answer helped as best? Or, if it wasn't a specific one, would you tell us what worked?

7 More Replies
Direo
by Contributor II
  • 2810 Views
  • 3 replies
  • 4 kudos

Resolved! Which cluster mode should I choose for most efficient graph modelling?

Is there a difference between cluster modes in this case? Could it be that GraphX would work better on a single-node cluster than on a standard cluster or a high-concurrency cluster (for multiple users)? Would a less concurrent cluster be more efficient for graph mod...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Direo Direo​ - What do you think of these answers? If either of them stands out as best, would you please mark it that way? If you have more questions, please, bring them on!

2 More Replies
baatchus
by New Contributor III
  • 5538 Views
  • 4 replies
  • 0 kudos

Resolved! parameterize azure storage account name in spark cluster config databricks

Wondering if it is possible to parameterize the Azure storage account name part of the Spark cluster config in Databricks? I have a working example where the values reference secret scopes: spark.hadoop.fs.azure.account.oauth2.client.id.<azurestorageacc...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Fantastic! Thanks for letting us know!

3 More Replies
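The conf keys quoted in the excerpt embed the account name, so they can be parameterized with ordinary string formatting while keeping the secret-scope references as values. A sketch (the account name, scope, and key names below are hypothetical; `{{secrets/<scope>/<key>}}` is the Databricks secret-reference syntax):

```python
# Hypothetical account name supplied as a parameter (e.g. from a job widget).
storage_account = "mystorageaccount"
suffix = f"{storage_account}.dfs.core.windows.net"

# Per-account Spark conf entries, with secret-scope references as values.
spark_conf = {
    f"spark.hadoop.fs.azure.account.oauth2.client.id.{suffix}":
        "{{secrets/my-scope/client-id}}",
    f"spark.hadoop.fs.azure.account.oauth2.client.secret.{suffix}":
        "{{secrets/my-scope/client-secret}}",
}
```

A dict like this could then feed cluster creation via the API or be pasted into the cluster's Spark config.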
tarente
by New Contributor III
  • 15505 Views
  • 8 replies
  • 4 kudos

Resolved! Pyspark - how to save the schema of a csv file in a delta table's column

How to save the schema of a csv file in a delta table's column? In a previous project implemented in Databricks using Scala notebooks, we stored the schema of csv files as a "json string" in a SQL Server table. When we needed to read or write the csv a...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@tarente - Thanks for letting us know.

7 More Replies
TufanRakshit
by New Contributor
  • 5280 Views
  • 1 replies
  • 1 kudos
Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

@Tufan Rakshit​: Could you please provide a sample example / pseudocode for the above requirement that you are trying to implement?

Jeff1
by Contributor II
  • 3464 Views
  • 3 replies
  • 5 kudos

Resolved! How to convert Data Chr Strings to Date Strings

New to Databricks and working in R code. I have a data frame with a date field that is a chr string and need to convert it to a date field. Tried the standard as.Date(x, format = "%Y-%m-%d"), then tried the dplyr::mutate function and th...

Latest Reply
Jeff1
Contributor II
  • 5 kudos

Based upon the initial response I went with: my_data.frame <- my_data.frame %>% mutate(date = to_date(data.frame_variable, "yyyy-mm-dd"))

2 More Replies
shrikant_kulkar
by New Contributor III
  • 1530 Views
  • 0 replies
  • 0 kudos

autoloader schema inference date column

I have hire_date and term_dates in the "MM/dd/YYYY" format in the underlying csv files. The schema hint "cloudFiles.schemaHints" : "Hire_Date Date, Term_Date Date" pushes the data into the _rescued_data column due to conversion failure. I am looking for a solution to c...

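Assuming Auto Loader is parsing the hinted DATE columns with Spark's default `yyyy-MM-dd` pattern, passing the CSV reader's `dateFormat` option alongside the schema hints is one way to keep `MM/dd/yyyy` values out of `_rescued_data`. A sketch (the load path and option placement are illustrative):

```python
# Option names follow the Auto Loader / Spark CSV reader documentation;
# dateFormat tells the CSV parser how to read the hinted DATE columns, so
# MM/dd/yyyy values parse instead of falling into _rescued_data.
autoloader_options = {
    "cloudFiles.format": "csv",
    "cloudFiles.schemaHints": "Hire_Date DATE, Term_Date DATE",
    "dateFormat": "MM/dd/yyyy",
}

# On Databricks (path is illustrative):
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options)
#       .load("/mnt/hr/csv/"))
```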
Rex
by New Contributor III
  • 24564 Views
  • 7 replies
  • 3 kudos

Resolved! Cannot connect to Databricks SQL Endpoint using PHP and ODBC

I am trying to connect to our Databricks SQL endpoint using PHP in a Docker container. I set up my Docker container to download and configure the ODBC driver as specified here: https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#install-and-c...

Latest Reply
Rex
New Contributor III
  • 3 kudos

The problem was that the Databricks SQL driver does not yet support ARM, which my laptop and Docker container were building for. See ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/opt/simba/spark/lib/64/libsparkodbc_sb64.so' : file not ...

6 More Replies
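Given the resolution above (x86_64-only driver binaries on an ARM host), one workaround is to build and run the container for amd64 under emulation. A sketch; the image name is illustrative:

```shell
# The Simba/Databricks ODBC driver ships x86_64-only shared libraries, so on
# an Apple Silicon (ARM) host, force the image platform to amd64.
docker build --platform linux/amd64 -t php-dbsql .
docker run --rm --platform linux/amd64 php-dbsql
```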
Suman
by New Contributor III
  • 4963 Views
  • 4 replies
  • 2 kudos

Change Data Feed functionality from SQL Endpoint

I am trying to run a command to retrieve change data from a SQL endpoint. It throws the error below: "The input query contains unsupported data source(s). Only csv, json, avro, delta, parquet, orc, text data sources are supported on Databricks SQL." But th...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

It is a separate runtime: https://docs.microsoft.com/en-us/azure/databricks/sql/release-notes/#channels

3 More Replies
HQJaTu
by New Contributor III
  • 10286 Views
  • 10 replies
  • 1 kudos

Resolved! Azure Databricks container runtime broken in 9.1 LTS, how to fix?

For stability, I've stuck with LTS. Last Friday my containers stopped working with error message:Py4JException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last): File "/databricks/spark/python/lib/py4j-...

Latest Reply
HQJaTu
New Contributor III
  • 1 kudos

This is getting worse. Now JDBC write to SQL is failing for the same reason. I haven't yet found a solution for this. Am I not supposed to use containers? Python? This is not cool.

9 More Replies
DavideCagnoni
by Contributor
  • 7017 Views
  • 4 replies
  • 1 kudos

How to force pandas_on_spark plots to use all dataframe data?

When I load a table as a `pandas_on_spark` dataframe, and try to e.g. scatterplot two columns, what I obtain is a subset of the desired points. For example, if I try to plot two columns from a table with 1000000 rows, I only see some of the data - i...

Latest Reply
DavideCagnoni
Contributor
  • 1 kudos

@Kaniz Fatma​ The problem is not about performance or plotly. It is about the pandas_on_spark dataframe arbitrarily subsampling the input data when plotting, without notifying the user about it. While subsampling is comprehensible and maybe even nece...

3 More Replies
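The subsampling discussed above is governed by pyspark.pandas plotting options: top-n-based plots are capped by `plotting.max_rows`, and sample-based plots (scatter among them) are sampled by `plotting.sample_ratio`. A sketch of raising the caps (the values are illustrative; option names are as documented for pyspark.pandas):

```python
# pyspark.pandas limits plot input: top-n plots use at most
# "plotting.max_rows" rows, and sample-based plots (e.g. scatter) sample a
# fraction given by "plotting.sample_ratio". Raising both makes the plot
# reflect (much) more of the dataframe, at the cost of driver memory.
PLOT_OPTIONS = {
    "plotting.max_rows": 1_000_000,
    "plotting.sample_ratio": 1.0,  # use all rows for sample-based plots
}

# On a cluster with pyspark available:
# import pyspark.pandas as ps
# for key, value in PLOT_OPTIONS.items():
#     ps.set_option(key, value)
```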
SailajaB
by Databricks Partner
  • 12456 Views
  • 12 replies
  • 4 kudos

Resolved! JSON validation fails after writing PySpark dataframe to JSON format

Hi. We have to convert a transformed dataframe to JSON format, so we used write with the json format on top of the final dataframe. But when we validate the output JSON, it is not in proper JSON format. Could you please provide your suggestio...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Sailaja B​ - Does @Aman Sehgal​'s most recent answer help solve the problem? If it does, would you be happy to mark their answer as best?

11 More Replies
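A likely cause of the validation failure described above: Spark's `DataFrameWriter.json` emits newline-delimited JSON (one object per line), which is valid JSON Lines but not a single JSON document, so generic validators reject it. A pure-Python sketch of the difference and a conversion (the sample records are made up):

```python
import json

# What DataFrameWriter.json actually produces: one JSON object per line
# (JSON Lines), not a single JSON array.
ndjson = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Each individual line is valid JSON...
records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]

# ...and wrapping the parsed records yields one valid JSON document.
as_array = json.dumps(records)
```

If a single JSON array file is required, collecting the records and dumping them once (as above) or post-processing the part files are the usual routes.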