Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tom_shaffner
by New Contributor III
  • 11243 Views
  • 3 replies
  • 2 kudos

How to take only the most recent record from a variable number of tables in a stream

Short version: I need a way to take only the most recent record from a variable number of tables in a stream. This is a relatively easy problem in SQL or Python pandas (group by and take the newest), but in a stream I keep hitting blocks. I could do i...

temp" data-fileid="0698Y00000JF9NlQAL
  • 11243 Views
  • 3 replies
  • 2 kudos
Latest Reply
Håkon_Åmdal
New Contributor III
  • 2 kudos

Did you try storing it all to a Delta table with a MERGE INTO [1]? You can optionally specify a condition on "WHEN MATCHED" such that you only insert if the timestamp is newer.
[1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/del...
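A minimal sketch of that approach in a streaming job, applying the MERGE per micro-batch via foreachBatch (the source path, target table latest_records, key column id, timestamp column ts, and checkpoint location are all hypothetical):

    # A per-micro-batch MERGE keeps only the newest record per key in the target.
    from pyspark.sql import functions as F, Window

    def upsert_latest(batch_df, batch_id):
        # Deduplicate within the batch first so MERGE sees one row per key.
        latest = (batch_df
                  .withColumn("rn", F.row_number().over(
                      Window.partitionBy("id").orderBy(F.col("ts").desc())))
                  .filter("rn = 1").drop("rn"))
        latest.createOrReplaceTempView("updates")
        spark.sql("""
            MERGE INTO latest_records AS t
            USING updates AS s
            ON t.id = s.id
            WHEN MATCHED AND s.ts > t.ts THEN UPDATE SET *
            WHEN NOT MATCHED THEN INSERT *
        """)

    (spark.readStream.format("delta").load(source_path)
        .writeStream
        .foreachBatch(upsert_latest)
        .option("checkpointLocation", "/tmp/checkpoints/latest")  # hypothetical
        .start())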

2 More Replies
tom_shaffner
by New Contributor III
  • 10668 Views
  • 1 reply
  • 2 kudos

"Detected a data update", what changed?

In streaming flows I periodically get a "Detected a data update" error. This error generally seems to indicate that something has changed in the source table schema, but it's not immediately apparent what. In one case yesterday I pulled the source tab...

Latest Reply
tom_shaffner
New Contributor III
  • 2 kudos

@Kaniz Fatma, thanks, that helps. I was assuming this warning indicated a schema evolution, and based on what you say it likely wasn't; I just have to turn on ignoreChanges any time I have a stream from a table that receives updates/upserts. To b...
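For reference, a hedged sketch of enabling that option on the stream (the table path is hypothetical):

    # ignoreChanges lets the stream proceed past update/upsert commits in the source.
    # Note: updated rows are re-emitted downstream, so consumers must tolerate duplicates.
    stream = (spark.readStream.format("delta")
              .option("ignoreChanges", "true")
              .load("/mnt/source_table"))  # hypothetical path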

Rahul_Samant
by Contributor
  • 12727 Views
  • 4 replies
  • 4 kudos

Resolved! Bucketing on Delta Tables

Getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. Have had to fall back to Parquet tables for some use cases due to this. Is there any alternative for this? I have...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rahul Samant, we checked internally on this: due to certain limitations, bucketing is not supported on Delta tables. The only alternative to bucketing is to leverage Z-ordering; below is the link for reference: https://docs.databricks.com/de...
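For reference, a hedged sketch of the Z-ordering alternative (the table and column names are hypothetical):

    # Cluster data files by the high-selectivity column(s) you filter or join on,
    # so file-level statistics can prune reads much like bucketing would.
    spark.sql("OPTIMIZE sales_fact ZORDER BY (customer_id)")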

3 More Replies
Development
by New Contributor III
  • 5631 Views
  • 5 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi All, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load 15 million records, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple, I found the root cause: it's serialization. We are using a UDF to derive a column on a DataFrame; when we try to load data into the Delta table or write data into a Parquet file, we face a serialization issue ...
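Built-in Spark SQL functions avoid the Python serialization cost that UDFs incur, so one common fix is replacing the UDF where an equivalent built-in exists; a hedged sketch (the column names and transformation are hypothetical):

    from pyspark.sql import functions as F

    # Before: a Python UDF forces row-by-row serialization between the JVM and Python.
    # derived = F.udf(lambda s: s.strip().upper())(F.col("raw_code"))

    # After: the same logic with built-ins runs entirely inside the JVM.
    df = df.withColumn("derived_code", F.upper(F.trim(F.col("raw_code"))))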

4 More Replies
athjain
by New Contributor III
  • 3483 Views
  • 5 replies
  • 9 kudos

Resolved! Control visibility of delta tables at sql endpoint

Hi Community, let's take a scenario where data from S3 is read to create Delta tables that are then stored on DBFS, and to query these Delta tables we used the SQL endpoint, from which all the Delta tables are visible; but we need to control which all ...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hey @Athlestan Jain, just checking in. Do you think you were able to find a solution to your problem from the above answers? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thank you!

4 More Replies
PJ
by New Contributor III
  • 2639 Views
  • 3 replies
  • 3 kudos

Resolved! How should you optimize <1GB delta tables?

I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...

Latest Reply
PJ
New Contributor III
  • 3 kudos

Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a Delta table that lives in a single file of roughly 211 MB. And I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high cardina...

2 More Replies
Databricks_7045
by New Contributor III
  • 4212 Views
  • 2 replies
  • 4 kudos

Resolved! Connecting to Delta Tables from Other Tools

Hi Team, to access SQL tables we use tools like TOAD and SQL Server Management Studio (SSMS). Is there any tool to connect to and access Databricks Delta tables? Please let us know. Thank you.

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rajesh Vinukonda​ Hope you are doing well. Thanks for sending in your question. Were you able to find a solution to your query?

1 More Replies
Constantine
by Contributor III
  • 3629 Views
  • 1 reply
  • 5 kudos

Resolved! Unable to create a partitioned table on s3 data

I write data to S3 like
  data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location)
and create a partitioned table like
  CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location};
whi...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@John Constantine, in CREATE TABLE you need to specify the fields:
  CREATE TABLE IF NOT EXISTS demo_table (column_a STRING, number INT) USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location};
and when you save data before creating ...
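A hedged end-to-end sketch of the fix, using the reply's two-column schema (the DataFrame df and the S3 path are hypothetical):

    s3_location = "s3://my-bucket/demo_table"  # hypothetical location

    # Write the data partitioned the same way the table will be declared.
    df.write.format("delta").mode("append").partitionBy("column_a").save(s3_location)

    # Declare the table with explicit fields so the partition column is known.
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS demo_table (column_a STRING, number INT)
        USING DELTA
        PARTITIONED BY (column_a)
        LOCATION '{s3_location}'
    """)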

Constantine
by Contributor III
  • 2368 Views
  • 1 reply
  • 5 kudos

Resolved! Delta Table created on s3 has all null values

I have data in a Spark DataFrame and I write it to an S3 location. It has some complex datatypes like structs, etc. When I create the table on top of the S3 location by using CREATE TABLE IF NOT EXISTS table_name USING DELTA LOCATION 's3://.../...'; th...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@John Constantine, try to load it as a DataFrame (spark.read.format("delta").load(path)) and validate what is loading. It could be easier to mount the S3 location as a folder to ensure that all data is there (dbutils or %fs to check) and that the connection is workin...
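A hedged sketch of that validation step (the path is hypothetical):

    path = "s3://my-bucket/my_table"  # hypothetical location

    df = spark.read.format("delta").load(path)
    df.printSchema()              # confirm struct columns came through as expected
    df.show(5, truncate=False)    # eyeball a few rows for unexpected nulls

    # Check that the data files and the _delta_log directory are actually present.
    display(dbutils.fs.ls(path))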

Abel_Martinez
by Contributor
  • 2171 Views
  • 1 reply
  • 1 kudos

Resolved! Create Databricks service account

Hi all, I need to create service account users who can only query some Delta tables. I guess I do that by creating the user and granting SELECT rights on the desired tables. But Databricks requires a mail account for these users. Is there a way to cr...

Latest Reply
Abel_Martinez
Contributor
  • 1 kudos

Hi @Kaniz Fatma, I've checked the link, but the standard method requires a mailbox and user creation using the SCIM API looks too complicated. I solved the issue: I created a mailbox for the service account and created the user with that mailbox....

Shehan92
by New Contributor II
  • 4242 Views
  • 2 replies
  • 4 kudos

Resolved! Error in accessing Delta Tables

I'm getting the attached error when accessing Delta Lake tables in the Databricks workspace. Summary of error: Could not connect to md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com:3306 : Connection reset. Detailed error attached.

Latest Reply
brickster_2018
Databricks Employee
  • 4 kudos

Caused by: java.sql.SQLNonTransientConnectionException: Could not connect to md1n4trqmokgnhr.csnrqwqko4ho.ap-southeast-1.rds.amazonaws.com:3306 : Connection reset at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:...

1 More Replies
prasadvaze
by Valued Contributor II
  • 23681 Views
  • 7 replies
  • 3 kudos

Resolved! How to make delta table column values case-insensitive?

We have many Delta tables with string columns as the unique key (the PK in a traditional relational DB), and we don't want to insert a new row when the key value differs only in case. It's a lot of code change to use upper/lower functions on column value comparisons (in ...

Latest Reply
lizou
Contributor II
  • 3 kudos

Well, the unintended benefit is that I am now using int/bigint surrogate keys for all tables (preferred in a DW). All joins are made on integer data types, and query efficiency has also improved. The string matching using upper() is done only in ETL when com...
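A hedged sketch of that ETL-time, case-insensitive key match (the table and column names are hypothetical):

    # Match incoming natural keys case-insensitively; rows that already exist
    # under a different casing are not inserted again.
    spark.sql("""
        MERGE INTO dim_customer AS t
        USING staged_customers AS s
        ON upper(t.natural_key) = upper(s.natural_key)
        WHEN NOT MATCHED THEN INSERT *
    """)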

6 More Replies
AzureDatabricks
by New Contributor III
  • 10026 Views
  • 7 replies
  • 2 kudos

Resolved! Can we store 300 million records and what is the preferable compute type and config?

How can we persist 300 million records? What is the best option to persist the data: the Databricks Hive metastore, Azure storage, or Delta tables? What limitations do Databricks Delta tables have in terms of data volume? We have a use case where testers should be...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

You can certainly store 300 million records without any problem. The best option kinda depends on the use case. If you want to do a lot of online querying on the table, I suggest using Delta Lake, which is optimized (using Z-order, bloom filter, par...
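For reference, a hedged sketch of the bloom filter index mentioned (Databricks SQL; the table, column, and option values are hypothetical):

    # A bloom filter index helps skip files when filtering on a high-cardinality column.
    spark.sql("""
        CREATE BLOOMFILTER INDEX ON TABLE events
        FOR COLUMNS(user_id OPTIONS (fpp = 0.1, numItems = 300000000))
    """)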

6 More Replies
SRS
by New Contributor II
  • 3983 Views
  • 3 replies
  • 5 kudos

Resolved! Delta Tables incremental backup method

Hello, has anyone tried to create an incremental backup of Delta tables? What I mean is to load into the backup storage only the latest Parquet files that are part of the Delta table, and to refresh the _delta_log folder, instead of copying all the files aga...

Latest Reply
jose_gonzalez
Databricks Employee
  • 5 kudos

Hi @Stefan Stegaru, you can use Delta time travel to query the data that was just added in a specific version. Then, as @Hubert Dudek mentioned, you can copy over this subset of data to a new table or a new location. You will need to do a deep...
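A hedged sketch of isolating the rows added between two versions with time travel, plus the deep clone the reply alludes to (the path, table names, and version numbers are hypothetical):

    # Rows present in version 11 but not in version 10 of the table.
    v_new = spark.read.format("delta").option("versionAsOf", 11).load(path)
    v_old = spark.read.format("delta").option("versionAsOf", 10).load(path)
    added_rows = v_new.subtract(v_old)

    # A deep clone copies data files as well as metadata to the backup target.
    spark.sql("CREATE OR REPLACE TABLE backup.events_bkp DEEP CLONE main.events")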

2 More Replies
Anonymous
by Not applicable
  • 1820 Views
  • 2 replies
  • 0 kudos

Resolved! What are the advantages of using Delta if I am using MLflow? How is Delta useful for DS/ML use cases?

I am already using MLflow. What benefit would Delta provide me, since I am not really working on data engineering workloads?

Latest Reply
Sebastian
Contributor
  • 0 kudos

The most important aspect is that your experiment can track the version of the data table, so during audits you will be able to trace back why a specific prediction was made.
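A hedged sketch of recording a Delta table's version alongside an MLflow run (the table path and parameter name are hypothetical):

    import mlflow
    from delta.tables import DeltaTable

    # Latest commit version of the training table at the time of the run.
    version = DeltaTable.forPath(spark, table_path).history(1).collect()[0]["version"]

    with mlflow.start_run():
        mlflow.log_param("training_data_version", version)  # hypothetical param name
        # ... train and log the model as usual ...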

1 More Replies