Data Engineering

Performance issue with the Simba ODBC driver performing a simple INSERT into Delta Lake

Indra
New Contributor

Hi,

Our team is using the Simba ODBC driver to load data into Delta Lake. For a table with 3 columns, it took around 55 seconds to insert 15 records.

How can we improve transactional loading into Delta Lake? Is there an option in the Simba ODBC driver to leverage bulk loading into Delta Lake (very important for data migration)? Or is there a way to configure a Delta Lake table to perform better for transactional workloads (very important for daily data synchronization from the source system to Delta Lake)?

Thanks

1 REPLY

Anonymous
Not applicable

@Indra Limena:

There are several ways to improve transactional loading into Delta Lake:

  1. Write to Delta Lake through its native Spark APIs instead of a generic third-party ODBC driver such as Simba. The native write path is optimized for Delta Lake and performs bulk writes, which can significantly improve performance.
  2. Load data in batches instead of inserting one record at a time, so the whole batch lands in a single Delta commit. This can also significantly improve performance (see the first sketch after this list).
  3. Use Delta Lake's streaming support to load data as soon as it becomes available. This is useful when data must land in near real time (see the second sketch after this list).
  4. Partition your Delta Lake tables by a key column that you frequently filter on. This improves query performance by reducing the amount of data that needs to be scanned.
  5. Use Delta Lake's Z-Ordering feature to physically organize the data in the table by one or more columns. This lets Delta Lake skip entire files or partitions that don't contain the relevant data (items 4 and 5 are combined in the third sketch after this list).
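For item 2, a minimal PySpark sketch of batch loading, assuming a Spark runtime with Delta Lake available; the table and column names are illustrative. Most of the cost of row-by-row inserts is per-commit overhead, which a single batched append avoids:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Gather the incoming records into one DataFrame first...
rows = [(1, "alice", "2023-01-01"), (2, "bob", "2023-01-02")]
df = spark.createDataFrame(rows, ["id", "name", "load_date"])

# ...then append them in a single operation: one Delta commit for the
# whole batch, instead of one commit per inserted row.
df.write.format("delta").mode("append").saveAsTable("target_table")
```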
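For item 3, a sketch of a streaming load, assuming JSON files arrive in a hypothetical landing path; the checkpoint location is what makes the stream restartable with exactly-once delivery into the Delta table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("json")
    .schema("id INT, name STRING, load_date STRING")  # file streams need an explicit schema
    .load("/mnt/landing/source_system/")              # hypothetical landing path
)

(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/target_table")
    .outputMode("append")
    .toTable("target_table")
)
```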
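For items 4 and 5, a sketch of a partitioned Delta table plus Z-ordering; the partition column (load_date) and Z-order column (id) are illustrative and should match your most common filters:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS target_table (
        id INT,
        name STRING,
        load_date DATE
    )
    USING DELTA
    PARTITIONED BY (load_date)
""")

# Z-ordering clusters the data files by the given column(s) so that
# queries filtering on them can skip files entirely.
spark.sql("OPTIMIZE target_table ZORDER BY (id)")
```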

As for the Simba ODBC driver, it's possible that there is an option to leverage bulk loading, but you would need to consult the documentation or contact the vendor to find out. However, even if there is an option to use bulk loading, it may not be as optimized as the native Delta Lake connector or the bulk insert API.
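If you have to stay on ODBC, one driver-agnostic improvement is to send the whole batch as a single multi-row INSERT, so Delta Lake performs one commit instead of one per row. A sketch using pyodbc, assuming an existing DSN named "Databricks" and an illustrative table; whether the driver additionally offers a true bulk or array-load mode is a question for its documentation:

```python
import pyodbc

conn = pyodbc.connect("DSN=Databricks", autocommit=True)
cursor = conn.cursor()

rows = [(1, "alice", "2023-01-01"), (2, "bob", "2023-01-02")]

# Build "INSERT ... VALUES (?, ?, ?), (?, ?, ?), ..." for the batch and
# bind all parameters in one call: a single statement, a single commit.
placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
params = [value for row in rows for value in row]
cursor.execute(
    f"INSERT INTO target_table (id, name, load_date) VALUES {placeholders}",
    params,
)

cursor.close()
conn.close()
```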

In general, if you're looking to perform bulk data migration or daily data synchronization from a source system to Delta Lake, it's recommended to use a tool that is optimized for that use case, such as Apache NiFi or Apache Airflow. These tools can handle large volumes of data and provide mechanisms for efficient and reliable data transfer.
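For the daily-synchronization case specifically, the usual Delta Lake pattern is to land the day's changed rows in a staging table and apply them with a single MERGE (upsert). A sketch, with hypothetical table names daily_changes and target_table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Upsert today's extract into the target in one transaction:
# matching keys are updated, new keys are inserted.
spark.sql("""
    MERGE INTO target_table AS t
    USING daily_changes AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```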
