by
aladda
• Honored Contributor II
- 1658 Views
- 2 replies
- 0 kudos
I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...
Latest Reply
Here's an example for a predefined schema. Using COPY INTO with a predefined table schema: the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...
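The CAST-in-SELECT trick from the reply can be sketched roughly as follows. This is not the poster's code (their example is truncated above); the helper `build_copy_into`, the table name `bronze_events`, the source path, and the column names are all hypothetical. The helper only builds the SQL text, which would then need to be run in a Databricks SQL session.

```python
def build_copy_into(table: str, source_path: str, schema: dict) -> str:
    """Build a COPY INTO statement that casts each CSV column to a target type.

    All names passed in are illustrative. CSV columns arrive as strings,
    so each one is cast explicitly in the inner SELECT.
    """
    casts = ",\n    ".join(
        f"CAST({col} AS {dtype}) AS {col}" for col, dtype in schema.items()
    )
    return (
        f"COPY INTO {table}\n"
        f"FROM (\n  SELECT\n    {casts}\n  FROM '{source_path}'\n)\n"
        "FILEFORMAT = CSV\n"
        "FORMAT_OPTIONS ('header' = 'true')"
    )


# Hypothetical table, path, and columns, for illustration only.
sql = build_copy_into(
    "bronze_events",
    "/tmp/landing/events/",
    {"id": "INT", "event_ts": "TIMESTAMP"},
)
print(sql)
```

The `'header' = 'true'` option is assumed here so that the CSV columns can be referenced by name in the SELECT.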
1 More Replies
- 4334 Views
- 6 replies
- 3 kudos
In this blog I can see that the primary key constraint has been applied to the dimension and fact tables. The following is the example: -- Store dimension CREATE OR REPLACE TABLE dim_store( store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, business_key ...
Latest Reply
@SRK Please see a copy of this answer on Stack Overflow here. You can use DLT Expectations to have this check (see my previous answer if you're using SQL and not Python): @dlt.table(name="table1",) def create_df(): schema = T.StructType([T.StructField("i...
5 More Replies
- 1676 Views
- 3 replies
- 2 kudos
Reading around 20 text files from ADLS, doing some transformations, and after that these files are written back to ADLS as a single delta file (all operations are in parallel through the thread pool). Here from 20 threads, it is writing to a single f...
Latest Reply
I have seen this problem with an identity column causing concurrency issues, but you seem to be getting a similar error when writing to files. I don't completely know your use case here, but I would advise retrying this operation by managing...
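The reply is truncated, so the following is only one generic way to retry a conflicting write, not necessarily what the author had in mind. It is a minimal sketch of retry with exponential backoff and jitter; `ConcurrentWriteError`, `retry_write`, and `flaky_write` are hypothetical names, and in a real job the `except` clause would catch whichever concurrency exception the write actually raises.

```python
import random
import time


class ConcurrentWriteError(Exception):
    """Hypothetical stand-in for the concurrency error raised by the write."""


def retry_write(write_fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write_fn, retrying with exponential backoff and jitter on conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except ConcurrentWriteError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Delay doubles each attempt; jitter keeps threads from retrying in lockstep.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))


# Simulated write that conflicts twice and then succeeds.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConcurrentWriteError("simulated conflict")
    return "written"


result = retry_write(flaky_write, sleep=lambda seconds: None)  # skip real sleeping in the demo
```

With many threads writing to one target, serializing the writes (for example, collecting results and writing once) may avoid the conflicts altogether; retrying only masks them.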
2 More Replies
- 1382 Views
- 2 replies
- 3 kudos
Is it possible to use the merge command when the source file is Parquet and the destination file is Delta? Or must both files be Delta files? Currently, I'm using this code to transform Parquet into Delta, and it works. But I want to avoid this transformation. T...
Latest Reply
Hi @Ales ventus, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others...
1 More Replies
- 648 Views
- 1 reply
- 0 kudos
I am getting the following error when I try to run ML models in a Delta Live Tables pipeline: File "/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-55c61-9b898-2c4b6-d/mlflow/envs/virtualenv_envs/mlflow-888f8c9b966409e6bddca3894244b4df9d1f94c1/lib/pyth...
Latest Reply
@Vittal Pai - In general, please follow the steps below for the mlflow CLI error. Step 1: Set up an API token and create secrets as described in the document below: https://docs.databricks.com/machine-learning/manage-model-lifecycle/multiple-workspaces.h...
- 614 Views
- 1 reply
- 0 kudos
My team currently uses Auto Loader and Delta Live Tables to process incremental data from ADLS storage. We need to keep the same table and history but switch the file path to a different location in storage. When I test a file path change, I rec...
Latest Reply
Auto Loader doesn't support changing the source path of a running job, so if you change the source path, your stream fails. However, if you really want to change the path, you can do so by using the new checkpoint ...
- 509 Views
- 2 replies
- 3 kudos
Hello Dolly: Democratizing the magic of ChatGPT with open models. Databricks has just released a groundbreaking new blog post exploring Dolly, an open-source language model with the potential to transform the way we interact with technology. From cha...
Latest Reply
Let's get candid! Let me know your initial thoughts about LLMs, ChatGPT, and Dolly.
1 More Replies
- 2367 Views
- 2 replies
- 36 kudos
Delta Lake vs. Data Lake in Databricks. Delta Lake is an open-source storage layer that sits on top of existing data lake storage, such as Azure Data Lake Store or Amazon S3. It provides a more robust and scalable alternative to traditional data lake st...
Latest Reply
This is very informative and I learned a lot from it, so thank you, @Aviral Bhardwaj sir.
1 More Replies
by
elgeo
• Valued Contributor II
- 2855 Views
- 1 reply
- 4 kudos
Hello experts. We are trying to execute an insert command with fewer columns than the target table: INSERT INTO table_name (col1, col2, col10) SELECT col1, col2, col10 FROM table_name2. However, the above fails with: Error in SQL statement: DeltaAnalysisExce...
Latest Reply
Hi @ELENI GEORGOUSI, yes. When you are doing an insert, the schema you provide should match the target schema; otherwise it throws an error. But you can still insert the data using another approach. Create a dataframe with your data having fewer colu...
by
MA
• New Contributor II
- 593 Views
- 1 reply
- 4 kudos
I'm attempting to stream into a DLT pipeline with data replicated from Fivetran directly into Delta tables in a different database from the one that the DLT pipeline uses. This is not an aggregate, and I don't want to recompute the entire data model eac...
Latest Reply
Hi @M A, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks.
by
Shuvi
• New Contributor III
- 1236 Views
- 3 replies
- 5 kudos
The curated zone is pushed to a cloud data warehouse such as Synapse Dedicated SQL Pools, which then acts as a serving layer for BI tools and analysts. I believe we can have models in the gold layer and have BI connect to this layer or we can have serverless ...
Latest Reply
Shuvi
New Contributor III
Thank you. So for a large workload, where we need a lot of optimization, we might need Synapse, but for a small or medium workload, we might stick to Delta tables.
2 More Replies
- 8940 Views
- 1 reply
- 5 kudos
How do I ingest a .csv file with spaces in column names using Delta Live Tables into a streaming table? All of the fields should be read using the default behavior for .csv files in DLT Auto Loader: as strings. Running the pipeline gives me an error about in...
Latest Reply
After additional googling on "withColumnRenamed", I was able to replace all spaces in column names with "_" all at once by using select and alias instead:
@dlt.view(
    comment=""
)
def vw_raw():
    return (
        spark.readStream.format("cloudF...
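Since the reply's code is cut off, here is a rough sketch of the select-and-alias idea it describes. The helper names `sanitize` and `renamed_columns` are hypothetical, and `df` is assumed to be a PySpark DataFrame (streaming or batch); this is an illustration of the technique, not the poster's actual view definition.

```python
def sanitize(name: str) -> str:
    """Replace spaces in a column name with underscores."""
    return name.replace(" ", "_")


def renamed_columns(df):
    """Return select() expressions that alias every column to a space-free name.

    Assumes df is a PySpark DataFrame. pyspark is imported lazily so that
    sanitize() can be used on its own without Spark installed.
    """
    from pyspark.sql.functions import col

    # Backticks let Spark resolve source column names that contain spaces.
    return [col(f"`{c}`").alias(sanitize(c)) for c in df.columns]
```

Inside the view function, the result of the read could then be passed through `df.select(renamed_columns(df))`, which renames every column in a single projection rather than chaining one `withColumnRenamed` call per column.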
by
amits
• New Contributor III
- 1916 Views
- 8 replies
- 5 kudos
Heya, I'm having an issue with extract creation from a Delta Lake table. Tableau is frozen on "Rows retrieved: X" for too long. I actually succeeded in creating the first extract but saw I was missing a column. I went ahead and did a full rewrite - even...
Latest Reply
Hi @Amit Steiner, we haven't heard from you since the last response from @Prabakar Ammeappin, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community, as it can be helpful to others. ...
7 More Replies
- 1161 Views
- 1 reply
- 7 kudos
Thanks to everyone who joined the Hassle-Free Data Ingestion webinar. You can access the on-demand recording here. We're sharing a subset of the phenomenal questions asked and answered throughout the session. You'll find Ingestion Q&A listed first, f...
Latest Reply
Check out Part 2 of this Data Ingestion webinar to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.