Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have S3 as a data source containing a sample TPC dataset (10G, 100G). I want to convert that into Parquet files with an average size of about ~256 MiB. Which configuration parameter can I use to set that? I also need the data to be partitioned. And withi...
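One common lever for steering output file size is capping records per file and repartitioning before the write. Below is a minimal PySpark sketch, assuming hypothetical bucket paths and a hypothetical partition column `order_date`; the record cap depends on your row width, so tune it until files land near ~256 MiB:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

src = "s3://my-bucket/tpc/raw/"          # hypothetical source path
dst = "s3://my-bucket/tpc/parquet/"      # hypothetical destination path

df = spark.read.option("header", "true").csv(src)

# Cap rows per output file so large partitions get split into multiple
# files; tune this number to your row width until files are ~256 MiB.
spark.conf.set("spark.sql.files.maxRecordsPerFile", 5_000_000)

(df.repartition("order_date")            # colocate rows by partition value
   .write.mode("overwrite")
   .partitionBy("order_date")            # hypothetical partition column
   .parquet(dst))
```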
Hi @Vikas Goel We haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if their suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to o...
I have a folder structure at source such as:
/transaction/date_=2023-01-20/hr_=02/tras01.csv
/transaction/date_=2023-01-20/hr_=03/tras02.csv
where 'date_' and 'hr_' are my partitions and present in the dataset as well. But the streamReader does not read th...
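Streaming file sources don't infer partition columns from the directory names the way batch reads do; declaring them in a user-supplied schema and pointing the stream at the base directory usually surfaces them. A minimal sketch, assuming hypothetical data columns `txn_id` and `amount`:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.getOrCreate()

# Partition columns must be declared in the schema for a streaming
# source; unlike batch reads, readStream does not discover them.
schema = (StructType()
          .add("txn_id", StringType())   # hypothetical data column
          .add("amount", StringType())   # hypothetical data column
          .add("date_", StringType())    # partition column from the path
          .add("hr_", StringType()))     # partition column from the path

stream = (spark.readStream
          .format("csv")
          .schema(schema)
          .option("header", "true")
          .load("/transaction/"))        # base path above the partitions
```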
Hi @Ravi Vishwakarma Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
We didn't need to set partitions for our Delta tables as we didn't have many performance concerns, and Delta Lake's out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow conc...
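A Delta table's partitioning can't be changed in place; the usual route is to rewrite the table with the desired partition column. A minimal PySpark sketch, assuming a hypothetical table `sales` and partition column `region`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("sales")

# Rewrite the existing table with an explicit partition column.
# overwriteSchema is required because the partitioning metadata changes.
(df.write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .partitionBy("region")                # hypothetical partition column
   .saveAsTable("sales"))
```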
When we implemented concurrent updates on a table that does not have a partition column, we ran into ConcurrentAppendException [we ensured that the WHERE condition is different for each concurrent update statement]. So do we need to go by the partition approach ...
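Disjoint WHERE clauses alone are often not enough, because Delta's conflict detection works at file granularity on an unpartitioned table. Partitioning on the column the predicates filter on, and naming it explicitly in each condition, lets concurrent writers be proven disjoint. A minimal sketch, assuming a hypothetical table `events` partitioned by `region`:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dt = DeltaTable.forName(spark, "events")

# Each concurrent writer pins a distinct partition value in its
# condition, so the updates touch non-overlapping sets of files.
dt.update(
    condition="region = 'EU' AND status = 'pending'",  # hypothetical columns
    set={"status": "'done'"},
)
```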
Please check that both streaming queries don't use the same checkpoint. An auto-increment id can also cause problems, as it is kept in the schema. Schema evolution can also cause problems.
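In case it helps, here is one way to keep the checkpoints separate; a minimal sketch with hypothetical paths and table names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.readStream.table("source_events")   # hypothetical source table

# Each streaming query must own its checkpoint directory; sharing one
# mixes offsets and state between the two queries.
q1 = (df.writeStream
        .option("checkpointLocation", "/chk/query1")
        .toTable("target1"))
q2 = (df.writeStream
        .option("checkpointLocation", "/chk/query2")
        .toTable("target2"))
```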