Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Count on External Table to Azure Data Storage is taking too long

enavuio
New Contributor II

I have created an external table over Azure Data Lake Storage Gen2.

The container holds about 200K JSON files.

The table definition that maps the structure of the JSON files is:

```
CREATE EXTERNAL TABLE IF NOT EXISTS dbo.table (
  ComponentInfo STRUCT<ComponentHost: STRING, ComponentId: STRING, ComponentName: STRING, ComponentVersion: STRING, SubSystem: STRING>,
  CorrelationId STRING,
  Event STRUCT<Category: STRING, EventName: STRING, MessageId: STRING, PublishTime: STRING, SubCategory: STRING>,
  References STRUCT<CorrelationId: STRING>)
USING org.apache.spark.sql.json OPTIONS ('multiLine' = 'true')
LOCATION 'dbfs:/mnt/mnt'
```

A count takes a very long time to run; it is still at stage 62 with 754 tasks. Loading the top 200 rows is fine, but is there an incorrect setup that needs to be addressed? I have worked with Spark on AWS and cut an INSERT OVERWRITE query to half its runtime, so I am wondering whether there is a better way to set this up.
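Why a full count over this many files is slow can be sketched in plain Python. The model below is a simplified re-implementation, assuming Spark's open-source file-scan packing logic (`FilePartition.maxSplitBytes` and `getFilePartitions`) with the default `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB); Databricks Runtime may behave differently. Every file is charged a fixed open cost, and multiLine JSON files cannot be split, so many tiny files turn into many tasks even when the total data is small. The helper names `max_split_bytes` and `pack_files`, the file sizes, and the parallelism of 64 are hypothetical, and the code does not call Spark.

```python
from typing import List

MB = 1024 * 1024


def max_split_bytes(file_sizes: List[int], parallelism: int,
                    max_partition_bytes: int = 128 * MB,
                    open_cost: int = 4 * MB) -> int:
    """Target bytes per scan task, modeled on Spark's open-source formula."""
    # Each file is charged its length plus a fixed "open cost".
    total = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total // parallelism
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def pack_files(file_sizes: List[int], parallelism: int,
               max_partition_bytes: int = 128 * MB,
               open_cost: int = 4 * MB) -> List[List[int]]:
    """Greedily group whole (unsplittable) files into partitions, one task each."""
    limit = max_split_bytes(file_sizes, parallelism, max_partition_bytes, open_cost)
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    # Largest files first, mirroring Spark sorting splits before packing.
    for size in sorted(file_sizes, reverse=True):
        if current and current_size + size > limit:
            partitions.append(current)
            current, current_size = [], 0
        current_size += size + open_cost
        current.append(size)
    if current:
        partitions.append(current)
    return partitions


# Hypothetical comparison: the same ~2 GB as many tiny files vs. a few large ones.
print(len(pack_files([10 * 1024] * 200_000, parallelism=64)))  # 6250
print(len(pack_files([128_000_000] * 16, parallelism=64)))     # 16
```

Under these assumptions, the 4 MB open cost dominates 10 KB files, so only 32 files fit in each 128 MB task. This illustrates the small-files effect; it is not a prediction of this table's actual task counts.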

Should it be partitioned?
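On the partitioning question: partitioning helps queries that filter on the partition column, because Spark can then skip whole directories, while a bare `COUNT(*)` with no such filter still has to read every file. The sketch below models that pruning step in plain Python, assuming Hive-style `key=value` directories. The `event_date` column, the example paths, and the helper names `partition_values` and `files_to_scan` are hypothetical.

```python
from typing import Dict, List, Optional


def partition_values(path: str) -> Dict[str, str]:
    """Collect Hive-style key=value directory segments from a file path."""
    values: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = value
    return values


def files_to_scan(paths: List[str],
                  filters: Optional[Dict[str, str]] = None) -> List[str]:
    """Files a query must open, given equality filters on partition columns."""
    if not filters:
        # No predicate on a partition column (e.g. a bare COUNT(*)): read everything.
        return list(paths)
    return [
        p for p in paths
        if all(partition_values(p).get(col) == val for col, val in filters.items())
    ]


# Hypothetical layout with an assumed event_date partition column.
paths = [
    "dbfs:/mnt/mnt/event_date=2023-01-01/a.json",
    "dbfs:/mnt/mnt/event_date=2023-01-01/b.json",
    "dbfs:/mnt/mnt/event_date=2023-01-02/c.json",
]
print(len(files_to_scan(paths)))                                # 3
print(len(files_to_scan(paths, {"event_date": "2023-01-02"})))  # 1
```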

Also, the Databricks workspace is in US East and the storage account is in US West 2. Could that be a culprit?
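Whether the cross-region setup matters can be reasoned about with simple arithmetic: with many small files, request latency is paid roughly once per file, and each task reads its files one after another. The estimate below is a rough lower bound under those assumptions; it ignores bandwidth, file listing, and JSON parsing. The function name `latency_floor_seconds`, the 64 concurrent tasks, and both latency values are hypothetical placeholders, not measured US East to US West 2 figures.

```python
import math


def latency_floor_seconds(num_files: int, concurrent_tasks: int,
                          per_file_latency_s: float) -> float:
    """Rough lower bound on time spent waiting on per-file request latency.

    Assumes one round trip per file, files spread evenly over tasks that all
    run at once, and files within a task read sequentially.
    """
    files_per_task = math.ceil(num_files / concurrent_tasks)
    return files_per_task * per_file_latency_s


# Hypothetical inputs: 200K files, 64 concurrent tasks, two assumed latencies.
same_region = latency_floor_seconds(200_000, 64, 0.010)
cross_region = latency_floor_seconds(200_000, 64, 0.070)
print(f"{same_region:.1f} s vs {cross_region:.1f} s")
```

Under this model the penalty scales with files per task, so higher per-request latency hurts most when the data is spread over very many small files.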

```
select count(*) from dbo.table
```

2 replies

Debayan
Databricks Employee

Hi, transient network issues can be a problem; you can refer to https://learn.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-errors-issues?view=az... . It would also be better to raise an Azure support case so that the background network activity can be checked, if the whole environment is set up in Azure.

Anonymous

Hi @Ena Vu,

Hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!