Machine Learning

Forum Posts

aladda
by Honored Contributor II
  • 1658 Views
  • 2 replies
  • 0 kudos

Resolved! How do I use the Copy Into command to copy data into a Delta Table? Looking for examples where you want to have a pre-defined schema

I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

Here's an example for a predefined schema. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...

1 More Replies
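To make the CAST trick in that reply concrete, here is a minimal sketch run from a Python notebook via spark.sql. The table name echoes the reply, but the schema, source path, and CSV options are illustrative assumptions, not taken from the thread.

```python
# Minimal sketch of the reply's approach; schema, path, and options are
# illustrative assumptions.

# 1. Create the target table with the desired (pre-defined) schema.
spark.sql("""
    CREATE TABLE IF NOT EXISTS copy_into_bronze_test (
        id       BIGINT,
        name     STRING,
        event_ts TIMESTAMP
    ) USING DELTA
""")

# 2. COPY INTO, CASTing each CSV column to the target type in the SELECT.
spark.sql("""
    COPY INTO copy_into_bronze_test
    FROM (
        SELECT CAST(id AS BIGINT)           AS id,
               CAST(name AS STRING)         AS name,
               CAST(event_ts AS TIMESTAMP)  AS event_ts
        FROM '/mnt/landing/csv/'
    )
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true')
""")
```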
User16869510359
by Esteemed Contributor
  • 1474 Views
  • 2 replies
  • 0 kudos
Latest Reply
N_M
New Contributor III
  • 0 kudos

How does COPY INTO work with table restore? I made some tests, and the restore method does NOT restore the COPY INTO load history (the "key-store" values) of the target at the specific version, which means that the data that came after the chosen version cannot be inserted (unless ...

1 More Replies
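For readers hitting the same behavior, a hedged sketch of the usual workaround: RESTORE rewinds the table data, but COPY INTO's load history still remembers files ingested after the restored version, so re-ingesting them typically requires force. The table name, version, and path below are illustrative.

```python
# Hedged sketch (names, version, and path are illustrative assumptions):
# after RESTORE, re-run COPY INTO with force=true so files that the load
# history already marks as ingested are loaded again.
spark.sql("RESTORE TABLE my_target TO VERSION AS OF 5")

spark.sql("""
    COPY INTO my_target
    FROM '/mnt/landing/csv/'
    FILEFORMAT = CSV
    COPY_OPTIONS ('force' = 'true')
""")
```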
SRK
by Contributor III
  • 4334 Views
  • 6 replies
  • 3 kudos

How to apply Primary Key constraint in Delta Live Table?

In this blog I can see that for dimension and fact tables, the primary key constraint has been applied. Following is the example: -- Store dimension CREATE OR REPLACE TABLE dim_store( store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, business_key ...

Latest Reply
Oliver_Angelil
Valued Contributor II
  • 3 kudos

@SRK Please see a copy of this answer on Stack Overflow here. You can use DLT Expectations to have this check (see my previous answer if you're using SQL and not Python): @dlt.table(name="table1",) def create_df(): schema = T.StructType([T.StructField("i...

5 More Replies
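To flesh out that truncated snippet, here is a minimal sketch of the expectations approach: Delta Live Tables does not enforce PRIMARY KEY constraints, but an expectation can drop rows that would violate one. The schema and source path are illustrative assumptions.

```python
import dlt
import pyspark.sql.types as T

# Minimal sketch of the DLT Expectations approach from the reply; the
# schema and source path are illustrative, not from the thread.
@dlt.table(name="table1")
@dlt.expect_or_drop("pk_not_null", "id IS NOT NULL")  # stand-in for a PK check
def create_df():
    schema = T.StructType([
        T.StructField("id", T.IntegerType(), False),
        T.StructField("business_key", T.StringType(), True),
    ])
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .schema(schema)
        .load("/mnt/landing/stores/")
    )
```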
thushar
by Contributor
  • 1676 Views
  • 3 replies
  • 2 kudos

MetadataChangedException Exception in databricks

Reading around 20 text files from ADLS, doing some transformations, and after that these files are written back to ADLS as a single Delta file (all operations are in parallel through the thread pool). Here, from 20 threads, it is writing to a single f...

Latest Reply
naga_databricks
Contributor
  • 2 kudos

I have seen this problem with the Identity column causing concurrency issues, but you seem to be getting a similar error when writing to files. I don't completely know your use case here, but I would advise retrying this operation by managing...

2 More Replies
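As a hedged illustration of that retry advice, the sketch below wraps each thread's Delta write in a simple retry loop with exponential backoff. The path, retry budget, and backoff policy are illustrative assumptions.

```python
import time
from delta.exceptions import MetadataChangedException

# Hedged sketch of the retry advice above; path, retry budget, and
# backoff policy are illustrative assumptions.
def write_with_retry(df, path, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            df.write.format("delta").mode("append").save(path)
            return
        except MetadataChangedException:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # back off before the next attempt
```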
alesventus
by New Contributor III
  • 1382 Views
  • 2 replies
  • 3 kudos

Pyspark Merge parquet and delta file

Is it possible to use the MERGE command when the source file is parquet and the destination is a Delta table? Or must both files be Delta files? Currently, I'm using this code: I transform the parquet into Delta and it works, but I want to avoid this transformation. T...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ales ventus, We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to others...

1 More Replies
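For what it's worth, a short sketch of how the conversion can be skipped: MERGE only requires the target to be a Delta table; the source can be any DataFrame, including one read straight from parquet. Paths and the join key below are illustrative assumptions.

```python
from delta.tables import DeltaTable

# Sketch: the MERGE source can be a plain DataFrame read from parquet;
# only the target must be Delta. Paths and join key are illustrative.
source_df = spark.read.parquet("/mnt/landing/updates/")
target = DeltaTable.forPath(spark, "/mnt/delta/accounts")

(target.alias("t")
    .merge(source_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```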
vittal
by New Contributor
  • 648 Views
  • 1 reply
  • 0 kudos

Getting errors in DLT Pipeline while using ML Model

I am getting the following error when I try to run ML models in a Delta Live Tables pipeline: File "/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-55c61-9b898-2c4b6-d/mlflow/envs/virtualenv_envs/mlflow-888f8c9b966409e6bddca3894244b4df9d1f94c1/lib/pyth...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@Vittal Pai - In general, please follow the below steps for the MLflow CLI error. Step 1: Set up an API token and create secrets as mentioned in the document below: https://docs.databricks.com/machine-learning/manage-model-lifecycle/multiple-workspaces.h...

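As a hedged sketch of Step 1 from that reply: once the token is stored in a secret scope, MLflow can be pointed at the remote registry before the model is loaded. The scope name ("modelregistry"), key prefix ("prod"), and model URI below are illustrative assumptions.

```python
import mlflow

# Hedged sketch of Step 1 above: the secret scope and key prefix holding
# the remote workspace host/token, and the model URI, are illustrative.
mlflow.set_registry_uri("databricks://modelregistry:prod")
model = mlflow.pyfunc.load_model("models:/my_model/Production")
```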
lurban
by New Contributor
  • 614 Views
  • 1 reply
  • 0 kudos

CloudFilesIllegalStateException: Found mismatched event: key old_file_path doesn't have the prefix: new_file_path

My team currently uses Auto Loader and Delta Live Tables to process incremental data from ADLS storage. We need to keep the same table and history, but switch the file path to a different location in storage. When I test a file path change, I rec...

Latest Reply
DD_Sharma
New Contributor III
  • 0 kudos

Auto Loader doesn't support changing the source path for a running job, so if you change your source path, your stream fails. However, if you really want to change the path, you can do so by using a new checkpoint ...

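A hedged sketch of that workaround: point the stream at the new path with a fresh checkpoint, optionally skipping files that were already processed at the old location. All paths, formats, and table names below are illustrative assumptions.

```python
# Hedged sketch of the reply's workaround; paths, format, and table name
# are illustrative assumptions.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # avoid re-ingesting files already processed at the old location
    .option("cloudFiles.includeExistingFiles", "false")
    .load("abfss://container@account.dfs.core.windows.net/new/path")
)

(
    df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/new_path_v1")  # new checkpoint
    .toTable("bronze.events")
)
```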
Anonymous
by Not applicable
  • 509 Views
  • 2 replies
  • 3 kudos

www.databricks.com

Hello Dolly: Democratizing the magic of ChatGPT with open models. Databricks has just released a groundbreaking new blog post introducing Dolly, an open-source language model with the potential to transform the way we interact with technology. From cha...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Let's get candid! Let me know your initial thoughts about LLMs, ChatGPT, and Dolly.

1 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 2367 Views
  • 2 replies
  • 36 kudos

Delta Lake vs Data Lake in Databricks

Delta Lake is an open-source storage layer that sits on top of existing data lake storage, such as Azure Data Lake Store or Amazon S3. It provides a more robust and scalable alternative to traditional data lake st...

Latest Reply
Meghala
Valued Contributor II
  • 36 kudos

This is very informative and I understood a lot from it, so thank you, @Aviral Bhardwaj, sir.

1 More Replies
elgeo
by Valued Contributor II
  • 2855 Views
  • 1 reply
  • 4 kudos

Resolved! Insert into delta table fails

Hello experts. We are trying to execute an INSERT command with fewer columns than the target table: INSERT INTO table_name (col1, col2, col10) SELECT col1, col2, col10 FROM table_name2. However, the above fails with: Error in SQL statement: DeltaAnalysisExce...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 4 kudos

Hi @ELENI GEORGOUSI, Yes. When you are doing an insert, your provided schema should match the target schema, else it will throw an error. But you can still insert the data using another approach: create a dataframe with your data having less colu...

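To complete that truncated suggestion, a minimal sketch: append a DataFrame containing only the columns you have, and Delta fills the omitted nullable target columns with NULL. The table and column names mirror the question; everything else is an assumption.

```python
# Minimal sketch of the reply's workaround: append a DataFrame with only
# the available columns; omitted nullable target columns become NULL.
subset_df = spark.table("table_name2").select("col1", "col2", "col10")
subset_df.write.format("delta").mode("append").saveAsTable("table_name")
```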
MA
by New Contributor II
  • 593 Views
  • 1 reply
  • 4 kudos

Stream data from Delta tables replicated with Fivetran into DLT

I'm attempting to stream into a DLT pipeline with data replicated from Fivetran directly into Delta tables in a different database from the one the DLT pipeline uses. This is not an aggregate, and I don't want to recompute the entire data model eac...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @M A, Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks!

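While waiting on answers, a hedged sketch of the usual pattern: a DLT table can stream incrementally from a Delta table in another database, so the full model need not be recomputed. The database and table names below are illustrative assumptions.

```python
import dlt

# Hedged sketch: stream incrementally from a Delta table that Fivetran
# maintains in another database; names are illustrative assumptions.
@dlt.table(name="orders_bronze")
def orders_bronze():
    return spark.readStream.table("fivetran_db.orders")
```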
Shuvi
by New Contributor III
  • 1236 Views
  • 3 replies
  • 5 kudos

Resolved! What is the use case of having Azure Synapse(DWH) and Delta Lake ( Gold) given we can connect BI to delta directly

The curated zone is pushed to a cloud data warehouse such as Synapse Dedicated SQL Pools, which then acts as a serving layer for BI tools and analysts. I believe we can have models in the gold layer and have BI connect to this layer, or we can have serverless ...

Latest Reply
Shuvi
New Contributor III
  • 5 kudos

Thank you. So for a large workload, where we need a lot of optimization, we might need Synapse, but for a small/medium workload we can stick with Delta tables.

2 More Replies
vaver_3
by New Contributor III
  • 8940 Views
  • 1 reply
  • 5 kudos

Resolved! ingest a .csv file with spaces in column names using Delta Live into a streaming table

How do I ingest a .csv file with spaces in column names using Delta Live Tables into a streaming table? All of the fields should be read using the DLT Auto Loader's default behavior for .csv files - as strings. Running the pipeline gives me an error about in...

Latest Reply
vaver_3
New Contributor III
  • 5 kudos

After additional googling on "withColumnRenamed", I was able to replace all spaces in column names with "_" all at once by using select and alias instead: @dlt.view( comment="" ) def vw_raw(): return ( spark.readStream.format("cloudF...

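Here is a hedged completion of that truncated snippet: the rename happens in one select with aliases, as the reply describes, while the source path and CSV option are illustrative assumptions.

```python
import dlt
import pyspark.sql.functions as F

# Hedged completion of the truncated reply; the source path and options
# are illustrative assumptions.
@dlt.view(comment="Raw CSV whose column names contain spaces")
def vw_raw():
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/mnt/landing/raw_csv/")
    )
    # one select with aliases: replace every space with "_"
    return df.select(
        [F.col(f"`{c}`").alias(c.replace(" ", "_")) for c in df.columns]
    )
```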
amits
by New Contributor III
  • 1916 Views
  • 8 replies
  • 5 kudos

Tableau extract creation frozen

Heya, I'm having an issue with extract creation from a Delta Lake table. Tableau is frozen on "Rows retrieved: X" for too long. I actually succeeded in creating the first extract but saw I was missing a column. I went ahead and did a full rewrite - even...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @Amit Steiner, We haven't heard from you since the last response from @Prabakar Ammeappin, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. ...

7 More Replies
MadelynM
by New Contributor III
  • 1161 Views
  • 1 reply
  • 7 kudos

2021-07 Webinar: Hassle-Free Data Ingestion

Thanks to everyone who joined the Hassle-Free Data Ingestion webinar. You can access the on-demand recording here. We're sharing a subset of the phenomenal questions asked and answered throughout the session. You'll find Ingestion Q&A listed first, f...

Latest Reply
Emily_S
New Contributor III
  • 7 kudos

Check out Part 2 of this Data Ingestion webinar to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.
