Data Engineering

Forum Posts

by dispersion • New Contributor

02-13-2023 5:39:50 AM

2339 Views
2 replies
1 kudos

Running a large volume of SQL queries in Python notebooks. How can I minimise overheads/maintenance?

I have around 200 SQL queries I'd like to run in Databricks Python notebooks. I'd like to avoid creating an ETL process for each of the 200 SQL processes. Any suggestions on how to run the queries in a way that loops through them so I have minimum am...
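One way to avoid a separate ETL process per query is to keep all the queries in one collection and loop over it. The sketch below assumes the queries are held in a Python dict keyed by a name you choose; run_queries and fake_execute are hypothetical helpers, and in a real notebook you would pass spark.sql as the executor. A query that raises is recorded and skipped, so one bad query does not stop the rest.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_queries(
    queries: Dict[str, str],
    execute: Callable[[str], Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Run each named SQL query with `execute`, collecting results and failures.

    A query that raises is recorded and skipped so the remaining ones still run.
    """
    results: Dict[str, Any] = {}
    failures: List[Tuple[str, str]] = []
    for name, sql in queries.items():
        try:
            results[name] = execute(sql)
        except Exception as exc:  # record the error and move on to the next query
            failures.append((name, str(exc)))
    return results, failures


# In a Databricks notebook you would pass the real executor, e.g.
#   results, failures = run_queries(queries, spark.sql)
# and could load the queries from .sql files (the folder path is hypothetical):
#   from pathlib import Path
#   queries = {p.stem: p.read_text()
#              for p in sorted(Path("/Workspace/<your-folder>").glob("*.sql"))}


# Self-contained demo: a stub executor stands in for spark.sql.
def fake_execute(sql: str) -> str:
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError(f"syntax error in: {sql}")
    return f"ran: {sql}"


demo_results, demo_failures = run_queries(
    {"daily_sales": "SELECT 1", "broken": "SELEC 1"},
    fake_execute,
)
print(demo_results)   # {'daily_sales': 'ran: SELECT 1'}
print(demo_failures)  # [('broken', 'syntax error in: SELEC 1')]
```

Note that for a plain SELECT, spark.sql returns a lazy DataFrame, so the actual work happens only when you act on the result, for example by writing it to a table.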

Latest Reply

Anonymous
Not applicable

02-21-2023 2:17:23 AM

1 kudos

Hi @Chris French, hope all is well! Just wanted to check in to see whether you were able to resolve your issue. Would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

by SatishGunjal • New Contributor

07-19-2021 1:42:50 AM

3486 Views
1 replies
0 kudos

Data frame takes long time to print count of rows

We have a PySpark DataFrame with 50 million records. We can display records from it, but it takes around 10 minutes to print the shape of the DataFrame. We aim to use this data for modelling that will take some numerical features based on the final data fra...
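A PySpark DataFrame has no pandas-style shape attribute, so the usual equivalent is (df.count(), len(df.columns)). The column count comes from the schema and is effectively instant, whereas count() is an action that executes the query plan behind the DataFrame, including any joins or transformations that produced it. A minimal sketch follows; spark_shape and StubDataFrame are illustrative names, not part of PySpark. If the same DataFrame feeds several later actions, calling df.cache() first may avoid recomputing it each time.

```python
from typing import Any, Tuple


def spark_shape(df: Any) -> Tuple[int, int]:
    """Pandas-style (rows, columns) for a PySpark DataFrame.

    len(df.columns) only reads the schema and is cheap; df.count() is an
    action that executes the query plan, which is where the time goes.
    """
    return df.count(), len(df.columns)


# Typical notebook usage (df is your existing PySpark DataFrame):
#   df = df.cache()                  # optional: keep the data after the first action
#   n_rows, n_cols = spark_shape(df)
#   # later actions on df (e.g. building features) can then reuse the cached data


# Self-contained demo with a stand-in object that has the same two members.
class StubDataFrame:
    columns = ["id", "feature_1", "feature_2"]

    def count(self) -> int:
        return 4


demo_shape = spark_shape(StubDataFrame())
print(demo_shape)  # (4, 3)
```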

Latest Reply

Hanna08
New Contributor II

08-30-2022 2:44:20 AM

0 kudos

Thanks for the detailed explanation. For those who want to have constant technical support for their work processes, I recommend JD Young. Here is only the latest information about the update in the world of information technology solutions and cyber...

by MallikSunkara • New Contributor II

07-22-2019 7:45:12 AM

10035 Views
4 replies
0 kudos

How to pass arguments and variables to a Databricks Python activity from Azure Data Factory

Latest Reply

CristianIspan
New Contributor II

05-27-2021 6:43:58 AM

0 kudos

Try importing argv from sys. Then, if you have added the parameter correctly in Data Factory, you can get it in your Python script with argv[1] (index 0 is the file path).
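Building on the reply above, here is a minimal sketch of reading a value passed from Data Factory. It assumes, as the reply describes, that the activity's parameters arrive as positional command-line arguments; the helper get_pipeline_param and the run_date parameter are hypothetical.

```python
import sys
from typing import List, Optional


def get_pipeline_param(argv: Optional[List[str]] = None, position: int = 1) -> Optional[str]:
    """Return the command-line argument at `position`, or None if it is missing.

    argv[0] is the script's own path, so the first parameter passed by the
    Data Factory activity is at index 1. Values always arrive as strings.
    """
    args = sys.argv if argv is None else argv
    return args[position] if len(args) > position else None


if __name__ == "__main__":
    # "run_date" is a hypothetical parameter name; convert types yourself if needed.
    run_date = get_pipeline_param()
    print(f"run_date = {run_date}")
```

Returning None for a missing argument avoids an IndexError when the parameter was not configured. For more than one or two parameters, the standard library's argparse module is a more robust option.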

by SergeyIvanchuk • New Contributor

11-16-2018 2:06:02 PM

13211 Views
4 replies
0 kudos

Seaborn plot display in Databricks

I am using Seaborn version 0.7.1 and matplotlib version 1.5.3. The following code does not display a graph at the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.) import seaborn as sns sns.set(style="darkgrid") tips = sns.lo...

Latest Reply

AbbyLemon
New Contributor II

08-04-2020 2:58:33 PM

0 kudos

I found that you can create a comparison plot similar to what you get from seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to the 'Customize Plot' by clicking on the icon ...

by ubsingh • New Contributor II

11-07-2019 3:44:50 AM

12862 Views
3 replies
1 kudos

Resolved! I want to create a function in an Azure Databricks notebook to send an email based on a filter. Any leads are appreciated.

I have no idea where to start.
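One possible starting point, assuming you have access to an SMTP server: filter the rows, build the message with Python's standard email library, and send it with smtplib. The SMTP host, port, addresses, table name and secret scope below are hypothetical placeholders, and build_alert_email and send_email are illustrative helpers, not Databricks APIs.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict, List, Optional


def build_alert_email(
    rows: List[Dict[str, object]],
    column: str,
    threshold: float,
    sender: str,
    recipient: str,
) -> Optional[EmailMessage]:
    """Build an email listing the rows whose `column` value exceeds `threshold`.

    Returns None when no row matches, so the caller can skip sending.
    """
    matches = [row for row in rows if float(row[column]) > threshold]
    if not matches:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"{len(matches)} row(s) with {column} > {threshold}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(str(row) for row in matches))
    return msg


def send_email(msg: EmailMessage, host: str, port: int = 587,
               user: Optional[str] = None, password: Optional[str] = None) -> None:
    """Send `msg` via an SMTP server that supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user and password:
            server.login(user, password)
        server.send_message(msg)


# Notebook usage (table, column, addresses, host and secret scope are hypothetical;
# collect() only a small summary table, never a large one):
#   rows = [r.asDict() for r in spark.table("daily_summary").collect()]
#   msg = build_alert_email(rows, "amount", 1000, "alerts@example.com", "team@example.com")
#   if msg is not None:
#       send_email(msg, "smtp.example.com", user="alerts@example.com",
#                  password=dbutils.secrets.get(scope="smtp", key="password"))
```

Keeping message building separate from sending makes the filter logic easy to check without contacting a mail server.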

Latest Reply

ubsingh
New Contributor II

11-13-2019 1:05:26 AM

1 kudos

Thanks for your help, @leedabee. I will go with the second option; the first one is not applicable in my case.

by AmitSukralia • New Contributor

06-02-2019 4:22:04 AM

33915 Views
5 replies
0 kudos

Listing all files under an Azure Data Lake Gen2 container

I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can see the list of files in a folder (a container can have multiple levels of folder hierarchy) if I know the exact path of th...
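dbutils.fs.ls lists only one level of a mounted path, so one common approach is to walk the folder tree yourself. The sketch assumes each entry returned by dbutils.fs.ls has path, name and size, with folder names ending in "/"; the mount point is a placeholder and list_files_recursively is an illustrative helper. The ls function is passed in so the walk can be demonstrated without a cluster.

```python
from collections import namedtuple
from typing import Any, Callable, Iterable, List, Tuple


def list_files_recursively(root: str, ls: Callable[[str], Iterable[Any]]) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for every file under `root`, at any depth.

    `ls` must behave like dbutils.fs.ls: list one level and return entries with
    `path`, `name` and `size`, where directory names end with "/".
    """
    files: List[Tuple[str, int]] = []
    pending = [root]  # folders still to visit (explicit stack, no recursion limit)
    while pending:
        current = pending.pop()
        for entry in ls(current):
            if entry.name.endswith("/"):
                pending.append(entry.path)  # sub-folder: visit it later
            else:
                files.append((entry.path, entry.size))
    return files


# In a Databricks notebook (the mount point is hypothetical):
#   all_files = list_files_recursively("/mnt/<your-mount>/", dbutils.fs.ls)

# Self-contained demo: a dict-backed fake standing in for dbutils.fs.ls.
FakeInfo = namedtuple("FakeInfo", ["path", "name", "size"])
fake_tree = {
    "/mnt/demo/": [FakeInfo("/mnt/demo/a.csv", "a.csv", 10),
                   FakeInfo("/mnt/demo/sub/", "sub/", 0)],
    "/mnt/demo/sub/": [FakeInfo("/mnt/demo/sub/b.csv", "b.csv", 20)],
}
demo_files = list_files_recursively("/mnt/demo/", lambda path: fake_tree[path])
print(demo_files)  # [('/mnt/demo/a.csv', 10), ('/mnt/demo/sub/b.csv', 20)]
```

Because each folder needs its own ls call on the driver, very large trees can take a while to walk.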

Latest Reply

Balaji_su
New Contributor II

03-22-2020 10:04:37 AM

0 kudos

Attachments: stackoverflow.png, files.txt

by smanickam • New Contributor II

01-01-2020 10:08:46 PM

19996 Views
5 replies
3 kudos

com.databricks.sql.io.FileReadException: Error while reading file dbfs:

I ran the statement below and got the following error: %python data = sqlContext.read.parquet("/FileStore/tables/ganesh.parquet") display(data) Error: SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure:...

Latest Reply

MatthewSzafir
New Contributor III

02-19-2020 12:41:11 PM

3 kudos

I'm having a similar issue reading a JSON file. It is ~550MB compressed and is on a single line: val cfilename = "c_datafeed_20200128.json.gz" val events = spark.read.json(s"/mnt/c/input1/$cfilename") display(events) The filename is correct and t...

by asher • New Contributor II

06-27-2019 11:09:19 AM

9973 Views
1 replies
0 kudos

List all files in a Blob Container

I am trying to find a way to list all files, and related file sizes, in all folders and all subfolders. I guess these are called blobs in the Databricks world. Anyway, I can easily list all files, and related file sizes, in one single folder, but ...

Latest Reply

asher
New Contributor II

10-14-2019 1:38:26 PM

0 kudos

from azure.storage.blob import BlockBlobService block_blob_service = BlockBlobService(account_name='your_acct_name', account_key='your_acct_key') mylist = [] generator = block_blob_service.list_blobs('rawdata') for blob in generator: mylist.append(...
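BlockBlobService in the reply above comes from the legacy 2.x azure-storage-blob package; the current v12 SDK replaced it with clients such as ContainerClient. A minimal sketch with v12 follows, assuming the container is reachable with a connection string (for example one stored in a secret scope); list_container and blob_sizes are illustrative helpers.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def blob_sizes(blobs: Iterable[Any]) -> Tuple[List[Tuple[str, int]], int]:
    """Return (name, size_in_bytes) for each blob plus the total size.

    Works with any objects exposing `name` and `size`, such as the
    BlobProperties items yielded by ContainerClient.list_blobs().
    """
    listing = [(blob.name, blob.size) for blob in blobs]
    total = sum(size for _, size in listing)
    return listing, total


def list_container(conn_str: str, container: str) -> Tuple[List[Tuple[str, int]], int]:
    """List every blob in `container`, including those under nested virtual folders."""
    # Imported here so blob_sizes() can be used without the Azure SDK installed.
    from azure.storage.blob import ContainerClient

    client = ContainerClient.from_connection_string(conn_str, container)
    # list_blobs() is a flat listing: a blob named "2019/06/b.csv" is returned
    # directly, so no per-folder recursion is needed.
    return blob_sizes(client.list_blobs())


# Self-contained demo with stand-in objects instead of real BlobProperties.
demo_listing, demo_total = blob_sizes([
    SimpleNamespace(name="a.csv", size=100),
    SimpleNamespace(name="2019/06/b.csv", size=250),
])
print(demo_total)  # 350
```

In a notebook you would call list_container(conn_str, "rawdata") with your own connection string.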
