Data Engineering

by andrew0117 • Contributor

01-05-2023 12:03:56 PM

5201 Views
6 replies
2 kudos

index a dataframe from a csv file based on the file's original order (not based on any specific column, based on the entire row) using spark

How can I guarantee that the index always follows the file's original order, no matter what? Currently, I'm using val df = spark.read.options(Map("header" -> "true", "inferSchema" -> "true")).csv("filePath").withColumn("index", monotonically_increasing...


Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-05-2023 1:39:33 PM

2 kudos

monotonically_increasing_id will not do that, as its purpose is to guarantee that every partition has separate ids. What is the whole code? Are you loading a directory with a lot of CSVs? What does "original order" mean? Are the CSVs ordered by file creation date, by file name? o...
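To illustrate why these ids are not a line number, here is a minimal pure-Python sketch of the id layout. It relies on the Spark documentation's description of the current implementation (partition ID in the upper 31 bits, record number within the partition in the lower 33 bits). The helper names and the two-partition input are hypothetical and exist only for this illustration; no Spark is involved.

```python
# Pure-Python simulation of the id layout described in the Spark docs:
# upper 31 bits = partition ID, lower 33 bits = record number in the partition.
# simulated_id / simulated_ids are hypothetical helpers, not Spark APIs.

RECORD_BITS = 33  # number of low bits reserved for the row position


def simulated_id(partition_index, row_in_partition):
    """Return the id a row would get for its partition and position."""
    return (partition_index << RECORD_BITS) + row_in_partition


def simulated_ids(partitions):
    """Assign ids to rows grouped by partition (input: a list of lists)."""
    return [
        (simulated_id(p, i), row)
        for p, rows in enumerate(partitions)
        for i, row in enumerate(rows)
    ]


# Assumed example: four rows that ended up in two partitions.
ids = simulated_ids([["row0", "row1"], ["row2", "row3"]])
for id_, row in ids:
    print(id_, row)
```

The ids are unique and increasing but jump at every partition boundary, so they reflect how the rows were partitioned rather than a consecutive position in the file.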



by ramankr48 • Contributor II

10-18-2022 4:08:43 AM

17732 Views
5 replies
8 kudos

Resolved! How to get all the tables name with a specific column or columns in a database?

Let's say there is a database db containing 700 tables, and we need to find the names of all tables in which the column "project_id" is present. This is just an example for understanding the question.


Latest Reply

Anonymous
Not applicable

10-18-2022 4:53:00 AM

8 kudos

databaseName = "db"
desiredColumn = "project_id"
database = spark.sql(f"show tables in {databaseName} ").collect()
tablenames = []
for row in database:
    cols = spark.table(row.tableName).columns
    if desiredColumn in cols:
        tablenames.append(row....
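The same idea can be sketched as a reusable function. This is not the reply author's code: it is a variant that uses `spark.catalog.listTables` instead of `show tables`, qualifies each table with the database name, and skips temporary views. The function name `tables_with_column` and its parameters are hypothetical, and an existing SparkSession is assumed to be passed in as `spark`.

```python
def tables_with_column(spark, database_name, desired_column):
    """Return the names of tables in database_name that have desired_column.

    Assumes `spark` is an active SparkSession supplied by the caller.
    """
    matches = []
    for table in spark.catalog.listTables(database_name):
        if table.isTemporary:
            # Temporary views are listed too but do not belong to the database.
            continue
        # Qualify the name so the lookup does not depend on the current database.
        qualified_name = f"{database_name}.{table.name}"
        if desired_column in spark.table(qualified_name).columns:
            matches.append(table.name)
    return matches
```

On a cluster this would be called as `tables_with_column(spark, "db", "project_id")`. The column match is an exact string comparison, as in the original snippet.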



Databricks Community
