Data Engineering

Forum Posts

tom_shaffner
by New Contributor III
  • 5435 Views
  • 3 replies
  • 2 kudos

How to take only the most recent record from a variable number of tables in a stream

Short version: I need a way to take only the most recent record from a variable number of tables in a stream. This is a relatively easy problem in SQL or Python pandas (group by and take the newest), but in a stream I keep hitting blocks. I could do i...

Latest Reply
Håkon_Åmdal
New Contributor III
  • 2 kudos

Did you try storing it all to a Delta table with a MERGE INTO [1]? You can optionally specify a condition on "WHEN MATCHED" such that you only insert if the timestamp is newer. [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/del...

  • 2 kudos
2 More Replies
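
A minimal sketch of the conditional MERGE INTO suggested in the reply above. The table names (`latest_records`, `updates`) and column names (`id`, `event_ts`) are hypothetical and not taken from the thread; the helper only builds the SQL text and does not run it.

```python
def build_latest_record_merge(target: str, source: str, key: str, ts_col: str) -> str:
    """Build a Delta Lake MERGE statement that overwrites a row only when the
    incoming row has a newer timestamp, and inserts rows with unseen keys."""
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        # The extra condition on WHEN MATCHED keeps older updates from winning.
        f"WHEN MATCHED AND s.{ts_col} > t.{ts_col} THEN UPDATE SET *\n"
        f"WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical names, for illustration only.
sql = build_latest_record_merge("latest_records", "updates", "id", "event_ts")
print(sql)
```

In a stream, a statement like this would typically be executed once per micro-batch (for example from a `foreachBatch` handler). If a batch can contain several rows for the same key, it likely needs to be reduced to one row per key first, because MERGE rejects multiple source rows matching the same target row.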
tom_shaffner
by New Contributor III
  • 5316 Views
  • 2 replies
  • 3 kudos

"Detected a data update", what changed?

In streaming flows I periodically get a "Detected a data update" error. This error generally seems to indicate that something has changed in the source table schema, but it's not immediately apparent what. In one case yesterday I pulled the source tab...

Latest Reply
tom_shaffner
New Contributor III
  • 3 kudos

@Kaniz Fatma, Thanks, that helps. I was assuming this warning indicated a schema evolution, and based on what you say it likely wasn't and I just have to turn on ignoreChanges any time I have a stream from a table that receives updates/upserts. To b...

  • 3 kudos
1 More Replies
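
A minimal sketch of the `ignoreChanges` setting discussed in the reply above, assuming the source is a Delta table read with Structured Streaming. The function name and the `table_path` argument are hypothetical; `spark` is expected to be an existing SparkSession supplied by the caller.

```python
def read_delta_stream_ignoring_changes(spark, table_path: str):
    """Open a streaming read on a Delta table whose files may be rewritten
    by updates or upserts, instead of failing with "Detected a data update"."""
    return (
        spark.readStream
        .format("delta")
        # Tolerate rewritten files in the source table.
        .option("ignoreChanges", "true")
        .load(table_path)
    )
```

With this option, rows from rewritten files can be emitted again, so downstream logic should probably be prepared to handle duplicates.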
Development
by New Contributor III
  • 2783 Views
  • 8 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi All, We are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load a data volume of 15 million records into it, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on the DataFrame, and when we try to load data into the Delta table or write data to a Parquet file we face a serialization issue ....

  • 5 kudos
7 More Replies
Rahul_Samant
by Contributor
  • 7375 Views
  • 5 replies
  • 3 kudos

Resolved! Bucketing on Delta Tables

Getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. I have had to fall back to a Parquet table due to this for some use cases. Is there any alternative for this? I have...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Rahul Samant, we checked internally on this. Due to certain limitations, bucketing is not supported on Delta tables. The only alternative to bucketing is to leverage Z-ordering; below is the link for reference: https://docs.databricks.com/de...

  • 3 kudos
4 More Replies
athjain
by New Contributor III
  • 1751 Views
  • 5 replies
  • 9 kudos

Resolved! Control visibility of delta tables at sql endpoint

Hi Community, Let's take a scenario where the data from S3 is read to create Delta tables and then stored on DBFS, and then to query these Delta tables we used a mysql endpoint from where all the Delta tables are visible, but we need to control which all ...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hey @Athlestan Jain, just checking in. Do you think you were able to find a solution to your problem from the above answers? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!

  • 9 kudos
4 More Replies
PJ
by New Contributor III
  • 1108 Views
  • 3 replies
  • 3 kudos

Resolved! How should you optimize <1GB delta tables?

I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...

Latest Reply
PJ
New Contributor III
  • 3 kudos

Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a Delta table that lives under 1 file that's roughly 211 MB. And I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high cardina...

  • 3 kudos
2 More Replies
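
A minimal sketch of the OPTIMIZE with ZORDER combination mentioned in the reply above. The table name `events` and the column `customer_id` are hypothetical placeholders for a frequently filtered, high-cardinality column; the helper only builds the SQL text and does not run it.

```python
from typing import List


def build_optimize_zorder(table: str, zorder_cols: List[str]) -> str:
    """Build a Delta Lake OPTIMIZE statement that co-locates rows by the
    given columns so filters on them can skip more data."""
    if not zorder_cols:
        raise ValueError("at least one ZORDER column is required")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


# Hypothetical names, for illustration only.
stmt = build_optimize_zorder("events", ["customer_id"])
print(stmt)
```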
Constantine
by Contributor III
  • 2027 Views
  • 2 replies
  • 5 kudos

Resolved! Unable to create a partitioned table on s3 data

I write data to S3 like data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table like CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; whi...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @John Constantine, Did the above suggestions provided by @Hubert Dudek help your case?

  • 5 kudos
1 More Replies
Constantine
by Contributor III
  • 1358 Views
  • 2 replies
  • 5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs etc. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; Th...

Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @John Constantine, Did you try the above suggestions?

  • 5 kudos
1 More Replies
Databricks_7045
by New Contributor III
  • 2403 Views
  • 2 replies
  • 4 kudos

Resolved! Connecting Delta Tables from any Tools

Hi Team, To access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rajesh Vinukonda, Hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?

  • 4 kudos
1 More Replies
Shehan92
by New Contributor II
  • 2275 Views
  • 3 replies
  • 4 kudos

Resolved! Error in accessing Delta Tables

I'm getting the attached error when accessing Delta Lake tables in the Databricks workspace. Summary of error: Could not connect to md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com:3306 : Connection reset. Attached detailed error.

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @Shehan Madusanka, Are you still seeing the error or were you able to resolve it?

  • 4 kudos
2 More Replies
Abel_Martinez
by Contributor
  • 1101 Views
  • 3 replies
  • 1 kudos

Resolved! Create Databricks service account

Hi all, I need to create service account users who can only query some Delta tables. I guess I do that by creating the user and granting SELECT rights on the desired tables. But Databricks requests a mail account for these users. Is there a way to cr...

Latest Reply
Abel_Martinez
Contributor
  • 1 kudos

Hi @Kaniz Fatma, I've checked the link, but the standard method requires a mailbox and user creation using the SCIM API looks too complicated. I solved the issue: I created a mailbox for the service account and I created the user using that mailbox....

  • 1 kudos
2 More Replies
prasadvaze
by Valued Contributor II
  • 12717 Views
  • 8 replies
  • 3 kudos

Resolved! How to make delta table column values case-insensitive?

We have many Delta tables with string columns as the unique key (PK in a traditional relational DB) and we don't want to insert a new row when the key value differs only in case. It's a lot of code change to use the upper/lower function on column value compare (in ...

Latest Reply
lizou
Contributor II
  • 3 kudos

Well, the unintended benefit is that I am now using int/bigint as surrogate keys for all tables (preferred in DW). All joins are made on integer data types. Query efficiency is also improved. The string matching using upper() is done only on ETL when com...

  • 3 kudos
7 More Replies
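
An in-memory sketch of the approach described in the reply above: compare string keys with upper() once, during ETL, and hand out integer surrogate keys that all later joins can use. The class name `SurrogateKeyMap` and the sample keys are hypothetical; a real pipeline would persist this mapping in a table rather than in a Python dict.

```python
from typing import Dict


class SurrogateKeyMap:
    """Assign one integer surrogate key per natural key, ignoring case."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def key_for(self, natural_key: str) -> int:
        # Case is normalized only here, at load time.
        normalized = natural_key.upper()
        if normalized not in self._ids:
            # Surrogate keys are assigned sequentially, starting at 1.
            self._ids[normalized] = len(self._ids) + 1
        return self._ids[normalized]


keys = SurrogateKeyMap()
ids = [keys.key_for(k) for k in ["Abc-1", "ABC-1", "xyz"]]
print(ids)  # [1, 1, 2]
```

Keys that differ only in case resolve to the same integer, so no duplicate row is created for them and downstream joins never need to call upper() again.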
AzureDatabricks
by New Contributor III
  • 6071 Views
  • 7 replies
  • 2 kudos

Resolved! Can we store 300 million records and what is the preferable compute type and config?

How can we persist 300 million records? What is the best option to persist data: Databricks Hive metastore/Azure storage/Delta table? What limitations do Databricks Delta tables have in terms of data? We have a use case where testers should be...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

You can certainly store 300 million records without any problem. The best option kinda depends on the use case. If you want to do a lot of online querying on the table, I suggest using Delta Lake, which is optimized (using z-order, bloom filter, par...

  • 2 kudos
6 More Replies
SRS
by New Contributor II
  • 2151 Views
  • 3 replies
  • 5 kudos

Resolved! Delta Tables incremental backup method

Hello, Has anyone tried to create an incremental backup of Delta tables? What I mean is to load into the backup storage only the latest Parquet files that are part of the Delta table and to refresh the _delta_log folder, instead of copying the whole files aga...

Latest Reply
jose_gonzalez
Moderator
  • 5 kudos

Hi @Stefan Stegaru, You can use Delta time travel to query the data that was just added on a specific version. Then, like @Hubert Dudek mentioned, you can copy over this subset of data to a new table or a new location. You will need to do a deep...

  • 5 kudos
2 More Replies
Anonymous
by Not applicable
  • 965 Views
  • 2 replies
  • 0 kudos

Resolved! What are the advantages of using Delta if I am using MLflow? How is Delta useful for DS/ML use cases?

I am already using MLflow. What benefit would Delta provide me, since I am not really working on data engineering workloads?

Latest Reply
Sebastian
Contributor
  • 0 kudos

The most important aspect is that your experiment can track the version of the data table. So during audits you will be able to trace back why a specific prediction was made.

  • 0 kudos
1 More Replies