@Indra Limena:
There are several ways to improve transactional loading into Delta Lake:
- Write through Delta Lake's native Spark connector instead of row-by-row inserts over a third-party ODBC driver like Simba. The native write path is optimized for Delta Lake and commits bulk appends as single transactions, which can significantly improve performance.
- Load data in batches instead of inserting one record at a time, for example by appending a whole DataFrame in one write. This can also significantly improve performance (see the batch-write sketch after this list).
- Use Delta Lake's Structured Streaming support to load data in near real time as it arrives. This is useful when records need to land in the table as soon as they become available (a streaming sketch follows the batch example below).
- Partition your Delta Lake tables by a key column that you frequently use to filter the data. This can improve query performance by reducing the amount of data that needs to be scanned.
- Use Delta Lake's Z-Ordering feature (OPTIMIZE ... ZORDER BY) to physically co-locate the data in the table's files based on one or more columns. This can further improve query performance by letting Delta Lake skip files that don't contain the relevant values (a sketch covering partitioning and Z-Ordering appears after the streaming example).
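Here is a minimal sketch of a batch append write, assuming a Spark environment with the Delta Lake package available and hypothetical paths (/staging/orders_extract.csv for the source extract, /delta/orders for the table):

```python
from pyspark.sql import SparkSession

# Spark session with Delta Lake enabled; assumes the Delta package is on the
# classpath, e.g. spark-submit --packages io.delta:delta-core_2.12:2.4.0
spark = (
    SparkSession.builder
    .appName("delta-batch-load")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical daily extract produced by the source system.
batch_df = spark.read.option("header", "true").csv("/staging/orders_extract.csv")

# One append commits the whole batch as a single Delta transaction,
# instead of thousands of per-row inserts over ODBC.
(
    batch_df.write
    .format("delta")
    .mode("append")
    .save("/delta/orders")
)
```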
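And a hedged sketch of a streaming load with Structured Streaming, assuming the same Spark session, a hypothetical /staging/incoming directory that the source system drops JSON files into, and a known schema. The checkpoint location lets the stream resume where it left off:

```python
# File-based streaming sources require an explicit schema; here we reuse
# the schema of the batch extract read above.
stream_df = (
    spark.readStream
    .format("json")
    .schema(batch_df.schema)
    .load("/staging/incoming")
)

# Continuous appends into the same Delta table as new files arrive.
query = (
    stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/delta/orders/_checkpoints/ingest")
    .start("/delta/orders")
)
```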
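Finally, a sketch of partitioning plus Z-Ordering; order_date and customer_id are hypothetical columns, and OPTIMIZE ... ZORDER BY needs Delta Lake 2.0+ or Databricks Runtime:

```python
# Write the table partitioned by a column that queries commonly filter on.
(
    batch_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("/delta/orders_partitioned")
)

# Co-locate rows with similar customer_id values so data skipping can
# prune files that don't match a filter on that column.
spark.sql("OPTIMIZE delta.`/delta/orders_partitioned` ZORDER BY (customer_id)")
```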
As for the Simba ODBC driver, it may offer a bulk-loading option, but you would need to check its documentation or contact the vendor to confirm. Even if it does, it is unlikely to be as efficient as writing through the native Delta Lake path or batched DataFrame appends.
In general, if you're doing a bulk migration or a daily synchronization from a source system into Delta Lake, it's better to use tooling built for that job, such as Apache NiFi for moving the data or Apache Airflow for scheduling and monitoring the loads. These tools handle large data volumes and provide mechanisms for efficient, reliable transfer (a minimal Airflow scheduling sketch follows).
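As a rough illustration rather than a prescribed setup, a daily Airflow DAG that kicks off a Spark batch load like the one above might look as follows; the dag_id, script path, and package version are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_delta_sync",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # one sync run per day
    catchup=False,
) as dag:
    # Placeholder spark-submit command; point it at your own load script.
    load_daily_extract = BashOperator(
        task_id="load_daily_extract",
        bash_command=(
            "spark-submit --packages io.delta:delta-core_2.12:2.4.0 "
            "/jobs/load_orders_to_delta.py --date {{ ds }}"
        ),
    )
```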