cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Community Discussions
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

validating record count at SQL server database tabbles with migrated azure data lake gen2

sai_sathya
New Contributor III

we are migrating out project from on-premise to azure , so on-premise database is the SQL server that we are using and azure data lake gen2 is the storage location where store data currently and so far we are currently validating record count of each table manually from sql server database tables and similarly we write pyspark code in databricks to write those data as parquet file and we validate record count from pyspark manually every time which is time consuming

is that possible to execute this process to make it automated in order to save time ?

can this be done by using pyspark code or is there any other solution?

2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @sai_sathyaAutomating the process of validating record counts from SQL Server database tables and Azure data lake Gen2 can be done with PySpark code.
- PySpark script can connect to the SQL Server database, retrieve record counts for each table, write the data to Azure data lake Gen2 as a Parquet file, and validate the record count.
- The script can be scheduled to run automatically at specific intervals using Databricks jobs.
- Databricks jobs allow the scheduling of notebooks or JARs to automate the data validation process.
- The provided PySpark code demonstrates how to structure the code for this process.
- The code includes steps to connect to SQL Server, retrieve record counts, write data to Azure data lake Gen2, read the Parquet file, and validate record counts.
- The code can be customized by replacing the placeholders <hostname><port><database><table><username><password>, and <path_to_parquet_file> with actual values.

sai_sathya
New Contributor III

how much i ever tried doing that, im using my local system for checking and there are no firewalls or any security blocks this is the error message that i keep recieving and unable to fix it  : 
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host SATHYA, port 1433 has failed. Error: "SATHYA. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".