Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...
Here's an example for a predefined schema. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...
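Since the snippet above is cut off, here is a minimal sketch of the pattern it describes, assuming a hypothetical target table, CSV path, and column names (none of these come from the original post):

```python
# Sketch: COPY INTO with a predefined schema. Table name, path, and
# columns are hypothetical placeholders.
spark.sql("""
  CREATE TABLE IF NOT EXISTS copy_into_bronze_test (
    id   BIGINT,
    name STRING,
    ts   TIMESTAMP
  ) USING DELTA
""")

# CAST each CSV column inside the SELECT of COPY INTO so the incoming
# (all-string) CSV data matches the predefined table schema.
spark.sql("""
  COPY INTO copy_into_bronze_test
  FROM (
    SELECT CAST(id AS BIGINT)    AS id,
           CAST(name AS STRING)  AS name,
           CAST(ts AS TIMESTAMP) AS ts
    FROM 'abfss://container@account.dfs.core.windows.net/raw/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")
```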
Is it possible to use the MERGE command when the source file is Parquet and the destination is a Delta table? Or must both be Delta files? Currently I'm using code that transforms the Parquet file into Delta first, and it works, but I want to avoid this transformation. T...
Hi @Ales ventus, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others...
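For what it's worth, only the MERGE target has to be a Delta table; the source can be any DataFrame or temp view, including one read straight from Parquet. A minimal sketch, assuming hypothetical paths, table names, and an `id` join key:

```python
# Hypothetical path/table/key: merge a Parquet source directly into a
# Delta target without converting the source to Delta first.
src = spark.read.parquet("/mnt/landing/updates.parquet")
src.createOrReplaceTempView("updates_src")

spark.sql("""
  MERGE INTO target_delta_table AS t
  USING updates_src AS s
    ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```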
The documentation states: "You can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each extra column." What does it mean for the "effectiveness of the locality to drop" with each extra co...
@Ashwin Bhaskar: Z-ordering is a technique to improve the performance of queries that involve filtering and grouping on specific columns in a large distributed database. When a table is z-ordered on a certain column or set of columns, the data is so...
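As a concrete illustration of the comma-separated list the docs mention (table and column names below are hypothetical): the more columns you add, the less tightly any single column's values are clustered within each file, which is the "locality drop" being asked about.

```python
# Hypothetical table/columns: Z-order on the columns most often used in
# filters. Each extra column dilutes how well any one column is clustered,
# so keep the list short.
spark.sql("OPTIMIZE events ZORDER BY (event_date, user_id)")
```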
Hello, I have a daily loading process for a Delta table that has an 'optimize table' step at the end. The OPTIMIZE operation used to take about 5 minutes, but now takes about 3.5 hours. One thing I noticed from DESCRIBE HISTORY is the operationMetric...
This is most likely because more files became eligible for compaction (optimize). By default there is a limit of 50 files or so per partition, below which the partition doesn't qualify for optimize. Only if there are 50+ files within a partition the...
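If the slowdown is indeed many newly eligible files, one way to keep the daily step bounded is to scope OPTIMIZE to the partitions the load actually touched (the WHERE clause must filter on partition columns). A sketch assuming a hypothetical table partitioned by `load_date`:

```python
# Hypothetical table partitioned by `load_date`: restrict OPTIMIZE to
# recently written partitions instead of compacting the whole table.
spark.sql("""
  OPTIMIZE daily_table
  WHERE load_date >= date_sub(current_date(), 1)
""")

# Inspect how many files each OPTIMIZE touched via its operationMetrics.
spark.sql("DESCRIBE HISTORY daily_table").select(
    "timestamp", "operation", "operationMetrics"
).show(truncate=False)
```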
Hello experts. We are trying to execute an INSERT command with fewer columns than the target table: INSERT INTO table_name (col1, col2, col10) SELECT col1, col2, col10 FROM table_name2. However, the above fails with: Error in SQL statement: DeltaAnalysisExce...
Hi @ELENI GEORGOUSI, yes. When you are doing an insert, your provided schema should match the target schema, or else it will throw an error. But you can still insert the data using another approach: create a DataFrame with your data having fewer colu...
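A sketch of the workaround being described, with the table and column names taken from the question but the rest hypothetical. Delta allows an appended DataFrame to contain a subset of the target's columns; the missing columns are filled with NULL.

```python
# Build a DataFrame containing only the columns you have, then append it.
# Delta fills the target columns absent from the DataFrame with NULL.
subset_df = spark.table("table_name2").select("col1", "col2", "col10")

subset_df.write.format("delta").mode("append").saveAsTable("table_name")
```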
I'm running Databricks Runtime 10.4 on GCP. I'm running a structured stream trying to process historical files in a Delta table on GCP cloud storage. This source Delta table is big but maintained with OPTIMIZE. The stream repartitions, which seems to b...
Hi @Dwight Branscombe, hope all is well! Just wanted to check in to see whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...
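Since the thread was never resolved here, one set of knobs worth knowing about for this scenario: the Delta streaming source supports maxFilesPerTrigger and maxBytesPerTrigger to bound micro-batch size when backfilling a large, compacted table. A sketch with hypothetical GCS paths:

```python
# Hypothetical paths: rate-limit a Delta-source stream so large compacted
# historical files don't produce oversized micro-batches.
stream = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)    # cap files per micro-batch
    .option("maxBytesPerTrigger", "10g")  # soft cap on bytes per batch
    .load("gs://bucket/path/source_table")
)

(stream.writeStream.format("delta")
    .option("checkpointLocation", "gs://bucket/path/_checkpoints/job")
    .start("gs://bucket/path/target_table"))
```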
Hi everyone, for data modeling documentation (dimensional / ER diagrams), is there any tool available that can connect to Databricks / the data lake, read the table structure directly, and also update the structure of a table whenever there is an addition ...
Hi @Kaniz Fatma, @Prabakar Ammeappin: thanks for the reply and information. Yes, I am able to connect DBeaver to Databricks using JDBC and the link provided (sorry for the delay in updating; I had to try it on a trial version of Enterprise D...
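For anyone scripting this rather than using a GUI tool: the same table structure a modeling tool reads over JDBC can be pulled with the databricks-sql-connector. The hostname, HTTP path, token, and table name below are placeholders, not values from this thread.

```python
# Sketch: read column metadata over the same SQL endpoint a modeling tool
# (e.g. DBeaver) uses. Connection values are placeholders.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="<workspace-host>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("DESCRIBE TABLE my_schema.my_table")
        for row in cur.fetchall():
            print(row[0], row[1], row[2])  # col_name, data_type, comment
```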
We are ingesting our data from ADLS into Databricks as Delta tables. After the raw layer we need to refer to a control/mapping layer which defines certain logic/measure definitions. This would be incorporated in the subsequent silver or gold layer. This co...
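The post is cut off, but a common shape for this is joining the raw data to the control/mapping table while building the silver layer. A sketch with hypothetical table, key, and column names:

```python
from pyspark.sql import functions as F

# Hypothetical tables: apply the control/mapping layer to the raw (bronze)
# data while building the silver table.
bronze = spark.table("bronze.events")
mapping = spark.table("control.measure_mapping")  # e.g. code -> measure rules

silver = (
    bronze.join(F.broadcast(mapping), on="measure_code", how="left")
          .withColumn("measure_value",
                      F.col("raw_value") * F.col("conversion_factor"))
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")
```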