I have created an external table over Azure Data Lake Storage Gen2.
The container has about 200K JSON files.
The table matching the structure of the JSON files is created with:
```
CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table(
ComponentInfo STRUCT<ComponentHost: STRING, ComponentId: STRING, ComponentName: STRING, ComponentVersion: STRING, SubSystem: STRING>,
CorrelationId STRING,
Event STRUCT<Category: STRING, EventName: STRING, MessageId: STRING, PublishTime: STRING, SubCategory: STRING>,
References STRUCT<CorrelationId: STRING>)
USING org.apache.spark.sql.json OPTIONS ('multiLine' = 'true')
LOCATION 'dbfs:/mnt/mnt'
```
Counting takes a very long time to run and is still at stage 62 with 754 tasks. Loading the top 200 rows is fine, but is there an incorrect setup here that needs to be addressed? I have worked with Spark on AWS and cut an INSERT OVERWRITE query's runtime in half, so I am wondering if there is a better way to set this up.
Should it be partitioned? If so, the sketch below is roughly what I had in mind.
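This is only a sketch of what I mean by partitioning, not something I have run. `EventDate` is a hypothetical column: it does not exist in the JSON today, so I would have to derive it from `Event.PublishTime` and rewrite the files into an `EventDate=.../` folder layout first.
```
-- Hypothetical partitioned variant of the table. EventDate is an assumed,
-- derived column; the files would need to be laid out as
-- .../EventDate=2023-01-01/... before this table could see the partitions.
CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table_partitioned (
  ComponentInfo STRUCT<ComponentHost: STRING, ComponentId: STRING, ComponentName: STRING, ComponentVersion: STRING, SubSystem: STRING>,
  CorrelationId STRING,
  Event STRUCT<Category: STRING, EventName: STRING, MessageId: STRING, PublishTime: STRING, SubCategory: STRING>,
  References STRUCT<CorrelationId: STRING>,
  EventDate DATE)
USING org.apache.spark.sql.json
OPTIONS ('multiLine' = 'true')
PARTITIONED BY (EventDate)
LOCATION 'dbfs:/mnt/mnt-partitioned'
```
followed by `MSCK REPAIR TABLE dbo.table_partitioned` to register the partitions, if I understand that correctly. Is that the right direction, or is the sheer file count/format the real problem?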
Also, the Databricks workspace is in US East and the storage account is in US West 2 - could that be the culprit?
For reference, the count query is simply:
```
SELECT COUNT(*) FROM dbo.table
```