Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...
Here's an example for a predefined schema. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...
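Since the snippet above is cut off, here is a minimal sketch of the pattern it describes, assuming a hypothetical target table, CSV path, and column names (none of these come from the original post):

```python
# Sketch: COPY INTO with a predefined schema. Table name, path, and
# columns are hypothetical placeholders.
spark.sql("""
  CREATE TABLE IF NOT EXISTS copy_into_bronze_test (
    id   BIGINT,
    name STRING,
    ts   TIMESTAMP
  ) USING DELTA
""")

# CAST each CSV column inside the SELECT of COPY INTO so the incoming
# (all-string) CSV data matches the predefined table schema.
spark.sql("""
  COPY INTO copy_into_bronze_test
  FROM (
    SELECT CAST(id AS BIGINT)    AS id,
           CAST(name AS STRING)  AS name,
           CAST(ts AS TIMESTAMP) AS ts
    FROM 'abfss://container@account.dfs.core.windows.net/raw/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")
```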
Is it possible to use the MERGE command when the source file is Parquet and the destination is a Delta table? Or must both be Delta files? Currently I'm using code that transforms the Parquet file into Delta first, and it works, but I want to avoid this transformation. T...
Hi @Ales ventus, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others...
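For what it's worth, only the MERGE target has to be a Delta table; the source can be any DataFrame or temp view, including one read straight from Parquet. A minimal sketch, assuming hypothetical paths, table names, and an `id` join key:

```python
# Hypothetical path/table/key: merge a Parquet source directly into a
# Delta target without converting the source to Delta first.
src = spark.read.parquet("/mnt/landing/updates.parquet")
src.createOrReplaceTempView("updates_src")

spark.sql("""
  MERGE INTO target_delta_table AS t
  USING updates_src AS s
    ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```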
The documentation states: "You can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each extra column." What does it mean for the "effectiveness of the locality to drop" with each extra co...
@Ashwin Bhaskar: Z-ordering is a technique to improve the performance of queries that involve filtering and grouping on specific columns in a large distributed database. When a table is z-ordered on a certain column or set of columns, the data is so...
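As a concrete illustration of the comma-separated list the docs mention (table and column names below are hypothetical): the more columns you add, the less tightly any single column's values are clustered within each file, which is the "locality drop" being asked about.

```python
# Hypothetical table/columns: Z-order on the columns most often used in
# filters. Each extra column dilutes how well any one column is clustered,
# so keep the list short.
spark.sql("OPTIMIZE events ZORDER BY (event_date, user_id)")
```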
Hello, I have a daily loading process for a Delta table that has an 'optimize table' step at the end. The OPTIMIZE operation used to take about 5 minutes, but now takes about 3.5 hours. One thing I noticed from DESCRIBE HISTORY is the operationMetric...
This is most likely because more files became eligible for compaction (optimize). By default there is a limit of 50 files or so per partition, below which the partition doesn't qualify for optimize. Only if there are 50+ files within a partition the...
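If the slowdown is indeed many newly eligible files, one way to keep the daily step bounded is to scope OPTIMIZE to the partitions the load actually touched (the WHERE clause must filter on partition columns). A sketch assuming a hypothetical table partitioned by `load_date`:

```python
# Hypothetical table partitioned by `load_date`: restrict OPTIMIZE to
# recently written partitions instead of compacting the whole table.
spark.sql("""
  OPTIMIZE daily_table
  WHERE load_date >= date_sub(current_date(), 1)
""")

# Inspect how many files each OPTIMIZE touched via its operationMetrics.
spark.sql("DESCRIBE HISTORY daily_table").select(
    "timestamp", "operation", "operationMetrics"
).show(truncate=False)
```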
Hello experts. We are trying to execute an INSERT command with fewer columns than the target table: INSERT INTO table_name (col1, col2, col10) SELECT col1, col2, col10 FROM table_name2. However, the above fails with: Error in SQL statement: DeltaAnalysisExce...
Hi @ELENI GEORGOUSI, yes. When you are doing an insert, your provided schema should match the target schema, or else it will throw an error. But you can still insert the data using another approach: create a DataFrame with your data having fewer colu...
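A sketch of the workaround being described, with the table and column names taken from the question but the rest hypothetical. Delta allows an appended DataFrame to contain a subset of the target's columns; the missing columns are filled with NULL.

```python
# Build a DataFrame containing only the columns you have, then append it.
# Delta fills the target columns absent from the DataFrame with NULL.
subset_df = spark.table("table_name2").select("col1", "col2", "col10")

subset_df.write.format("delta").mode("append").saveAsTable("table_name")
```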
I'm running Databricks Runtime 10.4 on GCP. I'm running a structured stream trying to process historical files in a Delta table on GCP cloud storage. This source Delta table is big but maintained with OPTIMIZE. The stream repartitions, which seems to b...
Hi @Dwight Branscombe, hope all is well! Just wanted to check in to see whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you...
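Since the thread was never resolved here, one set of knobs worth knowing about for this scenario: the Delta streaming source supports maxFilesPerTrigger and maxBytesPerTrigger to bound micro-batch size when backfilling a large, compacted table. A sketch with hypothetical GCS paths:

```python
# Hypothetical paths: rate-limit a Delta-source stream so large compacted
# historical files don't produce oversized micro-batches.
stream = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)    # cap files per micro-batch
    .option("maxBytesPerTrigger", "10g")  # soft cap on bytes per batch
    .load("gs://bucket/path/source_table")
)

(stream.writeStream.format("delta")
    .option("checkpointLocation", "gs://bucket/path/_checkpoints/job")
    .start("gs://bucket/path/target_table"))
```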
Hi everyone, for data modeling documentation (dimensional / ER diagrams), is there any tool available that can connect to Databricks / the data lake, read the table structure directly, and also update the structure of a table whenever there is an addition ...
Hi @Kaniz Fatma, @Prabakar Ammeappin: thanks for the reply and information. Yes, I am able to connect DBeaver to Databricks using JDBC and the link provided (sorry for the delay in updating; I had to try it on a trial version of Enterprise D...
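For anyone scripting this rather than using a GUI tool: the same table structure a modeling tool reads over JDBC can be pulled with the databricks-sql-connector. The hostname, HTTP path, token, and table name below are placeholders, not values from this thread.

```python
# Sketch: read column metadata over the same SQL endpoint a modeling tool
# (e.g. DBeaver) uses. Connection values are placeholders.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="<workspace-host>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("DESCRIBE TABLE my_schema.my_table")
        for row in cur.fetchall():
            print(row[0], row[1], row[2])  # col_name, data_type, comment
```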
We are ingesting our data from ADLS into Databricks as Delta tables. After the raw layer we need to refer to a control/mapping layer which defines certain logic/measure definitions. This would be incorporated in the subsequent silver or gold layer. This co...
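The post is cut off, but a common shape for this is joining the raw data to the control/mapping table while building the silver layer. A sketch with hypothetical table, key, and column names:

```python
from pyspark.sql import functions as F

# Hypothetical tables: apply the control/mapping layer to the raw (bronze)
# data while building the silver table.
bronze = spark.table("bronze.events")
mapping = spark.table("control.measure_mapping")  # e.g. code -> measure rules

silver = (
    bronze.join(F.broadcast(mapping), on="measure_code", how="left")
          .withColumn("measure_value",
                      F.col("raw_value") * F.col("conversion_factor"))
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")
```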