Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

aladda
by Databricks Employee
  • 3808 Views
  • 2 replies
  • 0 kudos

Resolved! How do I use the COPY INTO command to copy data into a Delta table? Looking for examples where you want to have a pre-defined schema

I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...

Latest Reply
aladda
Databricks Employee
  • 0 kudos

Here's an example for a predefined schema. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...

1 More Replies
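
To flesh out the truncated reply above, here is a hedged sketch of the same pattern in a Databricks notebook (the table name, landing path, and columns are made up for illustration, and `spark` is assumed to be the notebook session): define the target schema first, then CAST the raw CSV columns inside the SELECT of COPY INTO.

```python
# Hedged sketch only: table name, path, and columns are hypothetical.
# Assumes a Databricks notebook where `spark` is already defined.

# 1. Create the target Delta table with the desired (pre-defined) schema.
spark.sql("""
  CREATE OR REPLACE TABLE copy_into_bronze_example (
    id   INT,
    name STRING,
    ts   TIMESTAMP
  )
""")

# 2. COPY INTO the table, casting the raw CSV columns (_c0, _c1, ...) to the
#    target schema inside the SELECT, as the reply suggests.
spark.sql("""
  COPY INTO copy_into_bronze_example
  FROM (
    SELECT CAST(_c0 AS INT)       AS id,
           CAST(_c1 AS STRING)    AS name,
           CAST(_c2 AS TIMESTAMP) AS ts
    FROM '/path/to/landing/csv/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'false')
""")
```
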
alesventus
by Contributor
  • 2683 Views
  • 1 reply
  • 2 kudos

PySpark: merge Parquet and Delta files

Is it possible to use the MERGE command when the source file is Parquet and the destination is Delta? Or must both be Delta files? Currently, I'm using this code where I transform Parquet into Delta, and it works, but I want to avoid this transformation. T...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ales ventus, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others...

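
For anyone landing here later: MERGE only requires the target to be a Delta table; the source can be any DataFrame, so a Parquet file can be read and merged directly without first converting it to Delta. A hedged sketch (the paths, join key, and `spark` session are assumptions):

```python
from delta.tables import DeltaTable

# Hedged sketch only: paths and the join key are hypothetical.
# Assumes a Databricks notebook where `spark` is already defined.

# Source: plain Parquet read as a DataFrame (no conversion to Delta needed).
source_df = spark.read.parquet("/path/to/source/parquet/")

# Target: must be a Delta table.
target = DeltaTable.forPath(spark, "/path/to/target/delta/")

(
    target.alias("t")
    .merge(source_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```
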
Ismail1
by New Contributor III
  • 1628 Views
  • 2 replies
  • 0 kudos

Can an HMS-managed table be upgraded to Unity Catalog?

As the question states, I am not getting the option to upgrade managed tables to UC. Is that possible? I can't find anything in the documentation.

Latest Reply
Ismail1
New Contributor III
  • 0 kudos

In case anyone else ever faced the same issue

1 More Replies
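
The accepted fix above is truncated, but for readers hitting the same wall: the SYNC-based upgrade path targets external tables, so an HMS managed table is usually copied into Unity Catalog with CTAS or CLONE instead. A hedged sketch (catalog, schema, and table names are hypothetical):

```python
# Hedged sketch only: catalog, schema, and table names are hypothetical.
# Assumes a Databricks notebook with Unity Catalog enabled and `spark` defined.

# Copy the HMS managed table into a Unity Catalog table (DEEP CLONE keeps
# the data and schema; CREATE TABLE ... AS SELECT works as well).
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.default.my_table
  DEEP CLONE hive_metastore.default.my_table
""")

# Sanity-check the new UC table before repointing jobs and retiring the
# old hive_metastore copy.
spark.sql("DESCRIBE EXTENDED main.default.my_table").show(truncate=False)
```
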
qwerty1
by Contributor
  • 1586 Views
  • 1 reply
  • 1 kudos

Resolved! What is the disadvantage of using multiple Z-Order columns?

The documentation states: "You can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each extra column." What does it mean for "effectiveness of the locality to drop" with each extra co...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Ashwin Bhaskar: Z-ordering is a technique to improve the performance of queries that involve filtering and grouping on specific columns in a large distributed database. When a table is z-ordered on a certain column or set of columns, the data is so...

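
To make the trade-off concrete, a hedged sketch (table and column names are hypothetical): every extra ZORDER BY column dilutes how tightly any single column's values are co-located in the rewritten files, so data skipping on each individual column becomes weaker. Favor the one or two columns you filter on most.

```python
# Hedged sketch only: table and column names are hypothetical.
# Assumes a Databricks notebook where `spark` is already defined.

# Z-order on the columns most frequently used in filters; keep the list
# short, since each extra column weakens the clustering of the others.
spark.sql("OPTIMIZE events ZORDER BY (event_date, user_id)")

# Queries that filter on the Z-ordered columns can then skip most files.
spark.sql("""
  SELECT count(*) FROM events
  WHERE event_date = '2023-06-01' AND user_id = 42
""").show()
```
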
khh2023
by New Contributor
  • 1350 Views
  • 1 reply
  • 0 kudos

Optimize operation with big increase in numRemovedFiles/numRemovedBytes/numAddedFiles/numAddedBytes

Hello, I have a daily loading process for a Delta table that has an 'optimize table' step at the end. The optimize operation used to take about 5 minutes, but now takes about 3.5 hours. One thing I noticed from 'describe history' is the operationMetric...

Latest Reply
mathan_pillai
Databricks Employee
  • 0 kudos

This is most likely because more files became eligible for compaction (optimize). By default there is a limit of 50 files or so per partition, below which the partition doesn't qualify for optimize. Only if there are 50+ files within a partition the...

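
For anyone diagnosing the same slowdown, the operation metrics mentioned in the question can be pulled out of DESCRIBE HISTORY to confirm whether a much larger batch of files suddenly became eligible for compaction. A hedged sketch (the table name is hypothetical):

```python
from pyspark.sql import functions as F

# Hedged sketch only: the table name is hypothetical.
# Assumes a Databricks notebook where `spark` is already defined.

history = spark.sql("DESCRIBE HISTORY my_delta_table")

# Compare how many files/bytes each OPTIMIZE run rewrote.
(
    history
    .filter(F.col("operation") == "OPTIMIZE")
    .select(
        "version",
        "timestamp",
        F.col("operationMetrics").getItem("numRemovedFiles").alias("numRemovedFiles"),
        F.col("operationMetrics").getItem("numAddedFiles").alias("numAddedFiles"),
        F.col("operationMetrics").getItem("numRemovedBytes").alias("numRemovedBytes"),
        F.col("operationMetrics").getItem("numAddedBytes").alias("numAddedBytes"),
    )
    .show(truncate=False)
)
```
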
elgeo
by Valued Contributor II
  • 4526 Views
  • 1 reply
  • 4 kudos

Resolved! Insert into delta table fails

Hello experts. We are trying to execute an insert command with fewer columns than the target table: INSERT INTO table_name (col1, col2, col10) SELECT col1, col2, col10 FROM table_name2. However, the above fails with: Error in SQL statement: DeltaAnalysisExce...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 4 kudos

Hi @ELENI GEORGOUSI, yes. When you are doing an insert, your provided schema should match the target schema, else it will throw an error. But you can still insert the data using another approach: create a dataframe with your data having fewer colu...

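
A hedged sketch of the DataFrame workaround described in the reply (table and column names follow the question; everything else is illustrative): build a DataFrame with only the columns you have, then append it; columns present in the target Delta table but missing from the DataFrame are written as NULL.

```python
# Hedged sketch only: table and column names follow the question above.
# Assumes a Databricks notebook where `spark` is already defined.

# Select only the columns you want to populate.
subset_df = spark.table("table_name2").select("col1", "col2", "col10")

# Append to the target Delta table; columns missing from the DataFrame
# are filled with NULL instead of raising a schema-mismatch error.
subset_df.write.format("delta").mode("append").saveAsTable("table_name")
```
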
elementalM
by New Contributor III
  • 2659 Views
  • 4 replies
  • 4 kudos

Catch-up Structured Stream hangs on last step of write job to Delta sink using toTable

I'm running Databricks Runtime 10.4 on GCP. I'm running a structured stream trying to process historical files in a Delta table on GCP cloud storage. This source Delta table is big but maintained with OPTIMIZE. The stream repartitions, which seems to b...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Dwight Branscombe, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you....

3 More Replies
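
The thread does not record a resolution, but for readers with a similar catch-up backlog, one commonly used knob is to bound the size of each micro-batch on the Delta source rather than letting the first trigger swallow the whole history. A hedged sketch of that kind of pipeline (paths, table name, and the trigger limit are hypothetical):

```python
# Hedged sketch only: paths, table name, and limits are hypothetical.
# Assumes a Databricks notebook where `spark` is already defined.

stream = (
    spark.readStream.format("delta")
    # Bound each catch-up micro-batch instead of processing the entire
    # backlog in one enormous trigger.
    .option("maxFilesPerTrigger", 500)
    .load("/path/to/source/delta/")
)

query = (
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/path/to/checkpoints/backfill")
    .outputMode("append")
    .toTable("target_table")
)
```
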
Nachappa
by New Contributor III
  • 5237 Views
  • 4 replies
  • 6 kudos

Data model tool to connect to Databricks or Data lake?

Hi everyone, for data modeling documentation (dimensional/ER diagrams), is there any tool available which can connect to Databricks/a data lake and read the table structure directly, and also update the structure of the table whenever there is an addition ...

Latest Reply
Nachappa
New Contributor III
  • 6 kudos

Hi @Kaniz Fatma, @Prabakar Ammeappin: Thanks for the reply and information. Yes, I am able to connect DBeaver to Databricks using JDBC and the link provided (sorry for the delay in updating, as I had to try on a trial version of Enterprise D...

3 More Replies
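
As a side note on the JDBC route mentioned in the reply: any client that can reach a Databricks SQL warehouse can read table structure with plain SQL, which is what most modeling tools do under the hood. A hedged sketch using the databricks-sql-connector Python package (hostname, HTTP path, token, and table name are placeholders):

```python
# Hedged sketch only: connection details and table name are placeholders.
# Requires: pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<sql-warehouse-http-path>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        # Read the table structure the same way a modeling tool would.
        cursor.execute("DESCRIBE TABLE my_catalog.my_schema.my_table")
        for col_name, data_type, comment in cursor.fetchall():
            print(col_name, data_type, comment)
```
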
MattM
by New Contributor III
  • 1723 Views
  • 1 reply
  • 2 kudos

Mapping Control data - Maintained by Business User

We are ingesting our data from ADLS into Databricks as Delta tables. After the raw layer, we need to refer to a control/mapping layer which defines certain logic/measure definitions. This would be incorporated into the subsequent silver or gold layer. This co...

Latest Reply
MattM
New Contributor III
  • 2 kudos

Thanks for your response. Can a business user, without the help of any script, modify rows in the table after loading it one time from CSV files?
