- 10260 Views
- 4 replies
- 2 kudos
Reading around 20 text files from ADLS, doing some transformations, and then writing these files back to ADLS as a single Delta file (all operations run in parallel through a thread pool). Here, from 20 threads, it is writing to a single f...
Latest Reply
How can we import the exception "MetadataChangedException"? Or does Databricks recommend catching a generic Exception and parsing the error string?
3 More Replies
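A minimal sketch of catching the Delta concurrency exceptions directly, assuming the delta-spark Python package is available on the cluster; the function, DataFrame, and path names are placeholders. The exception classes live in delta.exceptions, so parsing the error string should not be necessary:

```python
import time
from delta.exceptions import ConcurrentAppendException, MetadataChangedException

def write_partition(df, target_path, retries=3):
    """Append one thread's DataFrame to the shared Delta path, retrying on conflicts."""
    for attempt in range(retries):
        try:
            df.write.format("delta").mode("append").save(target_path)
            return
        except (MetadataChangedException, ConcurrentAppendException):
            # Another thread's commit won the race; back off briefly and retry.
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Gave up writing to {target_path} after {retries} conflicts")
```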
by
aladda
• Databricks Employee
- 4171 Views
- 2 replies
- 0 kudos
I've reviewed the COPY INTO docs here - https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-copy-into.html#examples but there's only one simple example. Looking for some additional examples that show loading data from CSV - with ...
Latest Reply
Here's an example for a predefined schema. Using COPY INTO with a predefined table schema – the trick here is to CAST the CSV dataset into your desired schema in the SELECT statement of COPY INTO. Example below: %sql CREATE OR REPLACE TABLE copy_into_bronze_te...
1 More Replies
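A fuller sketch of that pattern, with a hypothetical target table name, ADLS landing path, and column names as placeholders: define the table with the schema you want, then CAST each CSV column inside the SELECT of COPY INTO.

```python
# Hedged sketch with placeholder table, path, and column names.
spark.sql("""
    CREATE TABLE IF NOT EXISTS copy_into_bronze_test (
        id        BIGINT,
        event_ts  TIMESTAMP,
        amount    DECIMAL(18, 2)
    )
""")

spark.sql("""
    COPY INTO copy_into_bronze_test
    FROM (
        SELECT CAST(id AS BIGINT)             AS id,
               CAST(event_ts AS TIMESTAMP)    AS event_ts,
               CAST(amount AS DECIMAL(18, 2)) AS amount
        FROM 'abfss://landing@mystorageaccount.dfs.core.windows.net/csv/'
    )
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true')
""")
```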
- 9857 Views
- 6 replies
- 3 kudos
In this blog, I can see that the primary key constraint has been applied to dimension and fact tables. Following is the example: -- Store dimension CREATE OR REPLACE TABLE dim_store( store_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, business_key ...
Latest Reply
@SRK Please see a copy of this answer on Stack Overflow here. You can use DLT expectations to have this check (see my previous answer if you're using SQL and not Python): @dlt.table(name="table1") def create_df(): schema = T.StructType([T.StructField("i...
5 More Replies
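A minimal sketch of that idea, assuming hypothetical table and column names (dim_store_raw, store_id): since the PRIMARY KEY clause is informational, DLT expectations can do the actual enforcement of not-null and uniqueness on the key.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="dim_store_clean")
@dlt.expect_or_drop("pk_not_null", "store_id IS NOT NULL")
def dim_store_clean():
    # Drop any rows whose key is null before they reach the dimension.
    return dlt.read("dim_store_raw")

@dlt.table(name="dim_store_pk_check")
@dlt.expect_or_fail("pk_unique", "dup_count = 1")
def dim_store_pk_check():
    # One row per key with its duplicate count; the expectation fails the update on duplicates.
    return (
        dlt.read("dim_store_clean")
        .groupBy("store_id")
        .agg(F.count("*").alias("dup_count"))
    )
```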
- 3055 Views
- 1 reply
- 2 kudos
Is it possible to use the MERGE command when the source file is Parquet and the destination is Delta, or must both be Delta files? Currently I'm using this code where I transform the Parquet into Delta and it works, but I want to avoid this transformation. T...
Latest Reply
Hi @Ales ventus, we haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others...
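On the original question: the MERGE source only needs to be a DataFrame or view, so only the target has to be a Delta table and the Parquet files can be read directly. A minimal sketch, assuming hypothetical paths and a join key called id:

```python
from delta.tables import DeltaTable

# Hedged sketch: the storage paths and the "id" join key are placeholders.
source_df = spark.read.parquet("abfss://landing@mystorage.dfs.core.windows.net/updates/")
target = DeltaTable.forPath(spark, "abfss://lake@mystorage.dfs.core.windows.net/delta/events")

(
    target.alias("t")
    .merge(source_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```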
- 1370 Views
- 1 reply
- 0 kudos
I am getting the following error when I try to run ML models in a Delta Live Tables pipeline: File "/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-55c61-9b898-2c4b6-d/mlflow/envs/virtualenv_envs/mlflow-888f8c9b966409e6bddca3894244b4df9d1f94c1/lib/pyth...
Latest Reply
@Vittal Pai - In general, please follow the steps below for the MLflow CLI error. Step 1: set up an API token and create secrets as mentioned in the document below: https://docs.databricks.com/machine-learning/manage-model-lifecycle/multiple-workspaces.h...
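Once those secrets exist, a minimal sketch of pointing MLflow at the registry from inside the pipeline; the secret scope name ("registry"), key prefix ("modelreg"), and model name are placeholders:

```python
import mlflow

# The scope/prefix must match the secrets created in Step 1 (host, token, workspace ID).
mlflow.set_registry_uri("databricks://registry:modelreg")

# Loading a registered model then works as usual, e.g. for scoring inside the pipeline.
model = mlflow.pyfunc.load_model("models:/my_model/Production")
```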
- 1484 Views
- 1 reply
- 0 kudos
My team currently uses Auto Loader and Delta Live Tables to process incremental data from ADLS storage. We need to keep the same table and history, but switch the filepath to a different location in storage. When I test a filepath change, I rec...
Latest Reply
Auto Loader doesn't support changing the source path for a running job, so if you change your source path, your stream fails because the source path has changed. However, if you really want to change the path, you can do it by using a new checkpoint ...
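A minimal sketch of that approach with plain Auto Loader, assuming hypothetical ADLS paths and table names: restart the stream against the new source path with a fresh checkpoint location, while keeping the same target table so its data and history are preserved.

```python
# Hedged sketch: storage paths, schema location, and table name are placeholders.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "abfss://meta@mystorage.dfs.core.windows.net/schemas/events")
    .load("abfss://raw@mystorage.dfs.core.windows.net/new_location/")
)

(
    df.writeStream
    .option("checkpointLocation", "abfss://meta@mystorage.dfs.core.windows.net/checkpoints/events_v2")  # new checkpoint
    .trigger(availableNow=True)
    .toTable("bronze.events")  # same target table as before, so history is preserved
)
```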
- 1358 Views
- 2 replies
- 3 kudos
Hello Dolly: Democratizing the magic of ChatGPT with open models. Databricks has just released a groundbreaking new blog post introducing Dolly, an open-source language model with the potential to transform the way we interact with technology. From cha...
Latest Reply
Let's get candid! Let me know your initial thoughts about LLMs, ChatGPT, and Dolly.
1 More Replies
- 6854 Views
- 2 replies
- 36 kudos
Delta Lake vs. data lake in Databricks: Delta Lake is an open-source storage layer that sits on top of existing data lake storage, such as Azure Data Lake Storage or Amazon S3. It provides a more robust and scalable alternative to traditional data lake st...
Latest Reply
This is very informative and I understood a lot from it, so thank you, @Aviral Bhardwaj sir.
1 More Replies
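To make the difference concrete, a small sketch assuming a hypothetical ADLS path: a Delta write is still Parquet files in the lake, but the transaction log it adds is what provides ACID commits, schema enforcement, and time travel on top of plain data lake storage.

```python
# Hedged sketch: the storage path and sample data are placeholders.
path = "abfss://lake@mystorage.dfs.core.windows.net/delta/customers"

df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.write.format("delta").mode("overwrite").save(path)

# Time travel: read the table as of an earlier version recorded in the transaction log.
previous = spark.read.format("delta").option("versionAsOf", 0).load(path)
```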
by
elgeo
• Valued Contributor II
- 5094 Views
- 1 reply
- 4 kudos
Hello experts. We are trying to execute an insert command with fewer columns than the target table: INSERT INTO table_name (col1, col2, col10) SELECT col1, col2, col10 FROM table_name2. However, the above fails with: Error in SQL statement: DeltaAnalysisExce...
Latest Reply
Hi @ELENI GEORGOUSI, yes. When you are doing an insert, your provided schema should match the target schema, or else it will throw an error. But you can still insert the data using another approach: create a dataframe with your data having fewer colu...
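A minimal sketch of that approach, reusing the (hypothetical) names from the question: select only the columns you have and append the DataFrame; the target's remaining columns are filled with NULL.

```python
# Hedged sketch: table and column names are the placeholders from the question.
subset_df = spark.table("table_name2").select("col1", "col2", "col10")

# Columns missing from the DataFrame are populated with NULL in the Delta target.
subset_df.write.format("delta").mode("append").saveAsTable("table_name")
```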
by
MA
• New Contributor II
- 1550 Views
- 1 reply
- 4 kudos
I'm attempting to stream into a DLT pipeline with data replicated from Fivetran directly into Delta tables in a different database from the one that the DLT pipeline uses. This is not an aggregate, and I don't want to recompute the entire data model eac...
Latest Reply
Hi @M A, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first. Otherwise, Bricksters will get back to you soon. Thanks!
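On the question itself, a minimal sketch assuming a hypothetical Fivetran-replicated table fivetran_db.orders that lives outside the pipeline's own database: a streaming read of that Delta table lets DLT process new commits incrementally instead of recomputing the whole model.

```python
import dlt

@dlt.table(name="orders_bronze")
def orders_bronze():
    return (
        spark.readStream
        # Fivetran applies updates/deletes; Delta streaming sources expect append-only
        # data, so skipping change commits (or handling them with CDC) may be needed.
        .option("skipChangeCommits", "true")
        .table("fivetran_db.orders")
    )
```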
by
Shuvi
• New Contributor III
- 2890 Views
- 3 replies
- 5 kudos
The curated zone is pushed to a cloud data warehouse such as Synapse Dedicated SQL Pools, which then acts as a serving layer for BI tools and analysts. I believe we can have models in the gold layer and have BI connect to this layer, or we can have serverless ...
Latest Reply
Shuvi
New Contributor III
Thank you. So for a large workload where we need a lot of optimization we might need Synapse, but for a small or medium workload we might stick with Delta tables.
2 More Replies
- 16341 Views
- 1 reply
- 5 kudos
How do I ingest a .csv file with spaces in column names into a streaming table using Delta Live Tables? All of the fields should be read using the DLT Auto Loader default behavior for .csv files - as strings. Running the pipeline gives me an error about in...
Latest Reply
After additional googling on "withColumnRenamed", I was able to replace all spaces in column names with "_" all at once by using select and alias instead:
@dlt.view(
    comment=""
)
def vw_raw():
    return (
        spark.readStream.format("cloudF...
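Filling out that truncated snippet as a hedged sketch (the landing path and CSV options are placeholders): read everything as strings with Auto Loader, then rename every column in one select by aliasing each one with its spaces replaced by underscores.

```python
import dlt
from pyspark.sql import functions as F

@dlt.view(comment="Raw CSV with spaces stripped from column names")
def vw_raw():
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .option("cloudFiles.inferColumnTypes", "false")  # keep the default: all columns as strings
        .load("abfss://landing@mystorage.dfs.core.windows.net/csv/")
    )
    # Backticks let us reference the original names that contain spaces.
    return df.select(
        [F.col(f"`{c}`").alias(c.replace(" ", "_")) for c in df.columns]
    )
```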
by
amits
• New Contributor III
- 4509 Views
- 6 replies
- 4 kudos
Heya, I'm having an issue with extract creation from a Delta Lake table. Tableau is frozen on "Rows retrieved: X" for too long. I actually succeeded in creating the first extract but saw I was missing a column. I went ahead and did a full rewrite - even...
Latest Reply
@Amit Steiner, what is the size of the table? Do you see any error, or does Tableau freeze without any error? I believe this to be more of a Tableau-related issue than a Databricks one. What version of Tableau are you using? What is the conne...
5 More Replies
- 1929 Views
- 1 reply
- 7 kudos
Thanks to everyone who joined the Hassle-Free Data Ingestion webinar. You can access the on-demand recording here. We're sharing a subset of the phenomenal questions asked and answered throughout the session. You'll find Ingestion Q&A listed first, f...
Latest Reply
Check out Part 2 of this Data Ingestion webinar to find out how to easily ingest semi-structured data at scale into your Delta Lake, including how to use Databricks Auto Loader to ingest JSON data into Delta Lake.