Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

aladda
by Databricks Employee
  • 3823 Views
  • 2 replies
  • 0 kudos

Resolved! How do I use the Copy Into command to copy data into a Delta Table? Looking for examples where you want to have a pre-defined schema

I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...

Latest Reply
aladda
Databricks Employee
  • 0 kudos

Here's an example for a predefined schema. Using COPY INTO with a predefined table schema: the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...
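The CAST-in-SELECT trick described in the reply can be sketched as a small Python helper that assembles the statement text. This is only an illustration under stated assumptions: the table name, source path, and column list are hypothetical placeholders, not the names from the truncated example, and the target Delta table is assumed to already exist with the desired schema.

```python
def build_copy_into(table, source_path, columns, header=True):
    """Build a COPY INTO statement that casts CSV columns to target types.

    columns is a list of (column_name, sql_type) pairs.
    All names passed in below are hypothetical placeholders.
    """
    # One CAST per column, so the CSV strings match the predefined schema.
    select_list = ",\n    ".join(
        f"CAST({name} AS {sql_type}) AS {name}" for name, sql_type in columns
    )
    header_opt = "true" if header else "false"
    return (
        f"COPY INTO {table}\n"
        f"FROM (\n"
        f"  SELECT\n    {select_list}\n"
        f"  FROM '{source_path}'\n"
        f")\n"
        f"FILEFORMAT = CSV\n"
        f"FORMAT_OPTIONS ('header' = '{header_opt}')"
    )


statement = build_copy_into(
    "bronze_example",        # hypothetical target table
    "/tmp/example/csv/",     # hypothetical source directory
    [("id", "INT"), ("amount", "DOUBLE"), ("event_date", "DATE")],
)
print(statement)
```

In a notebook the resulting string could be passed to `spark.sql(statement)`; that call is left out here so the sketch runs without a cluster.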

1 More Replies
brickster_2018
by Databricks Employee
  • 2495 Views
  • 2 replies
  • 0 kudos
Latest Reply
N_M
Contributor
  • 0 kudos

How does COPY_INTO work with table restore? I made some tests, and the restore method does NOT restore the key-store values of the target at the specific version, which means that the data that came after the chosen version cannot be inserted (unless ...

1 More Replies
SRK
by Contributor III
  • 8471 Views
  • 6 replies
  • 3 kudos

How to apply Primary Key constraint in Delta Live Table?

In this blog I can see that the primary key constraint has been applied for dimension and fact tables. Following is the example:
-- Store dimension
CREATE OR REPLACE TABLE dim_store(
  store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  business_key ...

Latest Reply
Oliver_Angelil
Valued Contributor II
  • 3 kudos

@SRK Please see a copy of this answer on Stack Overflow here. You can use DLT Expectations to have this check (see my previous answer if you're using SQL and not Python):
@dlt.table(name="table1",)
def create_df():
    schema = T.StructType([T.StructField("i...

5 More Replies
thushar
by Contributor
  • 4578 Views
  • 3 replies
  • 2 kudos

MetadataChangedException Exception in databricks

Reading around 20 text files from ADLS, doing some transformations, and after that these files are written back to ADLS as a single delta file (all operations are in parallel through the thread pool). Here from 20 threads, it is writing to a single f...

Latest Reply
naga_databricks
Contributor
  • 2 kudos

I have seen this problem with an identity column causing concurrency issues, but you seem to be getting a similar error when writing to files. I don't completely know your use case here, but would advise retrying this operation by managing...
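One common way to retry a conflicting write is exponential backoff with jitter. The sketch below is an assumption about what such a retry could look like, not the approach from the truncated reply: `ConcurrentWriteError` and `flaky_write` are hypothetical stand-ins for the real exception and the real Delta write.

```python
import random
import time


class ConcurrentWriteError(Exception):
    """Hypothetical stand-in for the concurrency exception raised by a write."""


def retry_with_backoff(operation, retryable=(Exception,), max_attempts=5,
                       base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out threads that failed at the same moment.
            sleep(delay + random.uniform(0, delay))


# Simulated write that conflicts twice, then succeeds (hypothetical).
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConcurrentWriteError("simulated conflict")
    return "committed"


recorded_delays = []  # capture delays instead of actually sleeping
result = retry_with_backoff(
    flaky_write,
    retryable=(ConcurrentWriteError,),
    sleep=recorded_delays.append,
)
print(result, attempts["count"])
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` would apply, and `retryable` would name the actual exception class raised by the write.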

2 More Replies
alesventus
by Contributor
  • 2691 Views
  • 1 replies
  • 2 kudos

Pyspark Merge parquet and delta file

Is it possible to use the merge command when the source file is Parquet and the destination file is Delta? Or must both files be Delta files? Currently, I'm using this code, which transforms the Parquet into Delta, and it works. But I want to avoid this transformation. T...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ales ventus We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others...

vittal
by New Contributor
  • 1198 Views
  • 1 replies
  • 0 kudos

Getting errors in DLT Pipeline while using ML Model

I am getting the following error when I try to run ML models in a Delta Live Tables pipeline: File "/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-55c61-9b898-2c4b6-d/mlflow/envs/virtualenv_envs/mlflow-888f8c9b966409e6bddca3894244b4df9d1f94c1/lib/pyth...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@Vittal Pai - In general, please follow the steps below for the mlflow CLI error. Step 1: Set up an API token and create secrets as described in the document below: https://docs.databricks.com/machine-learning/manage-model-lifecycle/multiple-workspaces.h...

lurban
by New Contributor
  • 1244 Views
  • 1 replies
  • 0 kudos

CloudFilesIllegalStateException: Found mismatched event: key old_file_path doesn't have the prefix: new_file_path

My team currently uses Auto Loader and Delta Live Tables to process incremental data from ADLS storage. We need to keep the same table and history, but switch the file path to a different location in storage. When I test a file path change, I rec...

Latest Reply
DD_Sharma
New Contributor III
  • 0 kudos

Auto Loader doesn't support changing the source path for a running job, so if you change your source path, your stream fails. However, if you really want to change the path, you can do so by using the new checkpoint ...

Anonymous
by Not applicable
  • 1202 Views
  • 2 replies
  • 3 kudos

www.databricks.com

Hello Dolly: Democratizing the magic of ChatGPT with open models. Databricks has just released a groundbreaking new blog post exploring ChatGPT, an open-source language model with the potential to transform the way we interact with technology. From cha...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Let's get candid! Let me know your initial thoughts about LLMs, ChatGPT, and Dolly.

1 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 4932 Views
  • 2 replies
  • 36 kudos

Delta Lake vs. Data Lake in Databricks

Delta Lake is an open-source storage layer that sits on top of existing data lake storage, such as Azure Data Lake Store or Amazon S3. It provides a more robust and scalable alternative to traditional data lake st...

Latest Reply
Meghala
Valued Contributor II
  • 36 kudos

This is very informative and I learned a lot from it, so thank you @Aviral Bhardwaj sir.

1 More Replies
elgeo
by Valued Contributor II
  • 4532 Views
  • 1 replies
  • 4 kudos

Resolved! Insert into delta table fails

Hello experts. We are trying to execute an insert command with fewer columns than the target table:
Insert into table_name( col1, col2, col10)
Select col1, col2, col10
from table_name2
However the above fails with:
Error in SQL statement: DeltaAnalysisExce...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 4 kudos

Hi @ELENI GEORGOUSI Yes. When you are doing an insert, the schema you provide should match the target schema, or else it throws an error. But you can still insert the data using another approach. Create a dataframe with your data having fewer colu...

MA
by New Contributor II
  • 1314 Views
  • 1 replies
  • 4 kudos

Stream data from Delta tables replicated with Fivetran into DLT

I'm attempting to stream into a DLT pipeline with data replicated from Fivetran directly into Delta tables in another database than the one that the DLT pipeline uses. This is not an aggregate, and I don't want to recompute the entire data model eac...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @M A Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks

Shuvi
by New Contributor III
  • 2498 Views
  • 3 replies
  • 5 kudos

Resolved! What is the use case of having Azure Synapse (DWH) and Delta Lake (Gold), given we can connect BI to Delta directly?

The curated zone is pushed to a cloud data warehouse such as Synapse Dedicated SQL Pools, which then acts as a serving layer for BI tools and analysts. I believe we can have models in the gold layer and have BI connect to this layer, or we can have serverless ...

Latest Reply
Shuvi
New Contributor III
  • 5 kudos

Thank you. So for a large workload, where we need a lot of optimization, we might need Synapse, but for a small/medium workload, we might stick with Delta tables.

2 More Replies
vaver_3
by New Contributor III
  • 15486 Views
  • 1 replies
  • 5 kudos

Resolved! ingest a .csv file with spaces in column names using Delta Live into a streaming table

How do I ingest a .csv file with spaces in column names using Delta Live into a streaming table? All of the fields should be read using the default behavior for .csv files in DLT Auto Loader: as strings. Running the pipeline gives me an error about in...

Latest Reply
vaver_3
New Contributor III
  • 5 kudos

After additional googling on "withColumnRenamed", I was able to replace all spaces in column names with "_" all at once by using select and alias instead:
@dlt.view( comment="" )
def vw_raw():
    return ( spark.readStream.format("cloudF...
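The renaming step in this reply can be illustrated without Spark. The sketch below only computes the old-to-new name mapping in plain Python; the column names are hypothetical, and the trailing comment shows how the mapping might feed a select-and-alias call, which is an assumption about the truncated code rather than a copy of it.

```python
def sanitize_column_names(columns):
    """Map each original column name to a copy with spaces replaced by '_'."""
    return {name: name.replace(" ", "_") for name in columns}


# Hypothetical column names as they might appear in a CSV header.
renames = sanitize_column_names(["Order ID", "Customer Name", "total"])
print(renames)

# With a PySpark DataFrame `df` (not created here), the mapping could be
# applied in a single pass using select and alias:
#   from pyspark.sql.functions import col
#   df.select([col(f"`{old}`").alias(new) for old, new in renames.items()])
```

Doing the rename in one `select` avoids chaining a `withColumnRenamed` call per column.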

amits
by New Contributor III
  • 4001 Views
  • 6 replies
  • 4 kudos

Tableau extract creation frozen

Heya, I'm having an issue with extract creation from a Delta Lake table. Tableau is frozen on "Rows retrieved: X" for too long. I actually succeeded in creating the first extract but saw I was missing a column. I went ahead and did a full rewrite -even...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

@Amit Steiner What is the size of the table? Do you see any error, or does Tableau freeze without any error? I believe this to be more of a Tableau-related issue than a Databricks one. What is the version of Tableau that you are using? What is the conne...

5 More Replies
MadelynM
by Databricks Employee
  • 1764 Views
  • 1 replies
  • 7 kudos

2021-07-Webinar--Hassle-Free-Data-Ingestion-Social-1200x628

Thanks to everyone who joined the Hassle-Free Data Ingestion webinar. You can access the on-demand recording here. We're sharing a subset of the phenomenal questions asked and answered throughout the session. You'll find Ingestion Q&A listed first, f...

Latest Reply
Emily_S
New Contributor III
  • 7 kudos

Check out Part 2 of this Data Ingestion webinar to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.
