- 1530 Views
- 2 replies
- 1 kudos
I have around 200 SQL queries I'd like to run in Databricks Python notebooks. I'd like to avoid creating an ETL process for each of the 200 SQL processes. Any suggestions on how to run the queries in a way that loops through them so I have minimum am...
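A minimal sketch of one way to loop them, assuming the 200 statements are available as plain Spark SQL strings (the queries list and its contents here are illustrative placeholders):

# Run a batch of SQL statements from a single notebook loop.
queries = [
    "INSERT INTO reporting.daily_sales SELECT * FROM staging.sales",
    "INSERT INTO reporting.daily_returns SELECT * FROM staging.returns",
    # ...the remaining statements, or load them from a file or table...
]

for i, q in enumerate(queries):
    try:
        spark.sql(q)  # `spark` is predefined in Databricks notebooks
        print(f"query {i} succeeded")
    except Exception as e:
        # log and continue so one bad statement does not stop the batch
        print(f"query {i} failed: {e}")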
Latest Reply
Hi @Chris French, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...
- 2583 Views
- 1 replies
- 0 kudos
We have a PySpark DataFrame with 50 million records. We can display records from it, but it takes around 10 minutes to print the shape of the DataFrame. We aim to use this data for modelling that will take some numerical features based on the final data fra...
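For reference, a minimal sketch of how the "shape" is usually computed for a PySpark DataFrame (df is an illustrative name); the row count launches a full job, which is where the minutes go, while the column count is metadata only:

df.cache()                 # optional: keeps the scanned data around for later modelling steps
n_rows = df.count()        # triggers a full pass over all 50 million records
n_cols = len(df.columns)   # schema lookup only, effectively instant
print((n_rows, n_cols))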
Latest Reply
Thanks for the detailed explanation. For those who want constant technical support for their work processes, I recommend JD Young. Here is the latest information about updates in the world of information technology solutions and cyber...
- 8635 Views
- 4 replies
- 0 kudos
How do I pass arguments and variables to a Databricks Python activity from Azure Data Factory?
Latest Reply
Try importing argv from sys. Then, if you have the parameter added correctly in Data Factory, you can get it in your Python script by reading argv[1] (index 0 is the file path).
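A minimal sketch of that pattern (the parameter must be configured on the Data Factory activity; its position in argv matches the order you pass it):

from sys import argv

# Data Factory passes Databricks Python activity parameters as
# command-line arguments; argv[0] is the script path itself.
first_param = argv[1]
print(first_param)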
- 10364 Views
- 4 replies
- 0 kudos
I am using Seaborn version 0.7.1 and matplotlib version 1.5.3.
The following code does not display a graph in the end. Any idea how to resolve this? (It works in the Python CLI on my local computer.)
import seaborn as sns
sns.set(style="darkgrid")
tips = sns.lo...
Latest Reply
I found that you can create a comparison plot similar to what you get from seaborn by using display(sparkdf) and adding multiple columns to the 'Values' section while creating a 'Scatter plot'. You get to 'Customize Plot' by clicking on the icon ...
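As an alternative, a hedged sketch of rendering the seaborn figure itself with the notebook's display() function (assumes internet access for load_dataset; regplot is available in seaborn 0.7.x):

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")
fig, ax = plt.subplots()
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax)  # draw onto an explicit figure
display(fig)  # hand the matplotlib figure to Databricks to render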
- 11329 Views
- 3 replies
- 1 kudos
I have no idea where to start.
Latest Reply
Thanks for your help @leedabee. I will go through the second option; the first one is not applicable in my case.
- 27772 Views
- 5 replies
- 0 kudos
I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can see the list of files in a folder (a container can have multiple levels of folder hierarchy) if I know the exact path of th...
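A minimal recursive sketch with dbutils.fs.ls (the mount point /mnt/datalake is an illustrative placeholder):

def deep_ls(path):
    # Walk every folder level under `path` and yield the files.
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            yield from deep_ls(entry.path)
        else:
            yield entry

for f in deep_ls("/mnt/datalake"):
    print(f.path, f.size)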
- 16162 Views
- 5 replies
- 3 kudos
I ran the statement below and got an error.
%python
data = sqlContext.read.parquet("/FileStore/tables/ganesh.parquet")
display(data)
Error:
SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure:...
Latest Reply
I'm having a similar issue reading a JSON file. It is ~550 MB compressed and is on a single line:
val cfilename = "c_datafeed_20200128.json.gz"
val events = spark.read.json(s"/mnt/c/input1/$cfilename")
display(events)
The filename is correct and t...
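For reference, a hedged Python sketch of the usual workarounds (untested against this file): a gzipped file is not splittable, so a single task reads the whole thing, and if the file is one JSON document rather than JSON Lines it also needs the multiLine option:

cfilename = "c_datafeed_20200128.json.gz"
events = (spark.read
          .option("multiLine", "true")  # only if the file is a single JSON document
          .json(f"/mnt/c/input1/{cfilename}"))
events = events.repartition(64)  # spread rows across the cluster after the one-task read
display(events)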
by asher • New Contributor II
- 8625 Views
- 1 replies
- 0 kudos
I am trying to find a way to list all files, and their related file sizes, in all folders and all subfolders. I guess these are called blobs in the Databricks world. Anyway, I can easily list all files, and related file sizes, in one single folder, but ...
Latest Reply
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='your_acct_name', account_key='your_acct_key')
mylist = []
generator = block_blob_service.list_blobs('rawdata')
for blob in generator:
    mylist.append(blob.name)  # blob.name assumed; the original snippet was cut off here
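Since the question also asks for file sizes, a hedged extension of the same idea (legacy azure-storage SDK; account and container names are placeholders):

from azure.storage.blob import BlockBlobService

svc = BlockBlobService(account_name='your_acct_name', account_key='your_acct_key')
# list_blobs recurses through every folder level by default,
# and each blob carries its size in bytes on its properties
sizes = {blob.name: blob.properties.content_length
         for blob in svc.list_blobs('rawdata')}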