I have a list of dataframes (for this example, 2) and want to apply a for-loop to the list of frames to generate 2 new dataframes. To start, here is my starting dataframe, called df_final. First, I create 2 dataframes: df2_b2c_fast, df2_b2b_fast: for x i...
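The post is cut off, but here is a minimal sketch of the usual pattern: iterate over the frames and collect the transformed results in a dictionary. The dataframe contents and the added column are placeholders, not the poster's actual logic:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Stand-ins for df2_b2c_fast / df2_b2b_fast (hypothetical data for illustration)
df2_b2c_fast = spark.createDataFrame([("a", 1), ("b", 2)], ["id", "val"])
df2_b2b_fast = spark.createDataFrame([("c", 3), ("d", 4)], ["id", "val"])

new_frames = {}
for name, df in {"b2c": df2_b2c_fast, "b2b": df2_b2b_fast}.items():
    # Replace with the real per-frame transformation; this just tags each frame
    new_frames[name] = df.withColumn("segment", F.lit(name))

df_b2c_new, df_b2b_new = new_frames["b2c"], new_frames["b2b"]
```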
Conducting a security review or vendor assessment of Databricks and looking to learn more about our security features, compliance information, and privacy policies? You can find the latest on Databricks security features, architecture, compliance and ...
I ran into an issue when trying to use Auto Loader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked the files and they are good, not corrupted. Following is the issue: java.lang.IllegalArgumentException:...
I got the issue resolved. The issue was that, by mistake, we had duplicate columns in the schema files. Because of that, it was showing that error. However, the error message is totally misleading, which is why I wasn't able to rectify it sooner.
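Since the actual schema files aren't shown, here is a hedged sketch of a quick sanity check for duplicate field names before handing a schema to Auto Loader (the field names are placeholders):

```python
from collections import Counter
from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical schema with an accidental duplicate column, for illustration only
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("id", StringType()),  # duplicate that triggers a confusing error downstream
])

duplicates = [f for f, n in Counter(schema.fieldNames()).items() if n > 1]
if duplicates:
    raise ValueError(f"Duplicate columns in schema: {duplicates}")
```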
Hi, I have 10 workspaces linked to different departments. We have 4 users in total doing activity across these 10 workspaces. I want to get a list of which users are operating on which tables and what operations they have performed, and all in all ...
Hi Ranjit, for tables I believe it's hard, but if you want to combine all 10 workspaces you can use the Databricks API for cluster lists (https://docs.databricks.com/dev-tools/api/latest/index.html) and then you can check their IAM roles to understand w...
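The answer doesn't include code, so here is a minimal sketch of calling the Clusters List API across several workspaces over REST. The workspace URLs and tokens below are placeholders, not real values:

```python
import requests

# Hypothetical per-workspace URLs and personal access tokens; replace with your own
workspaces = {
    "https://dept1.cloud.databricks.com": "dapi-token-1",
    "https://dept2.cloud.databricks.com": "dapi-token-2",
}

for url, token in workspaces.items():
    resp = requests.get(
        f"{url}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    # Print who created each cluster in each workspace
    for cluster in resp.json().get("clusters", []):
        print(url, cluster["cluster_name"], cluster.get("creator_user_name"))
```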
Introduction: I would like to use the Alert feature to monitor job status (from a log table) in Databricks SQL. So, I have written a query in a query notebook (or object) to return results from the log table. Also, I have set up the alert object for monitoring and tri...
I am not seeing any direct option to export or version-control the alert object other than the migrate option. https://docs.databricks.com/sql/api/queries-dashboards.html - check this link; it might help you in another way.
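As a hedged sketch of what the linked Queries & Dashboards API makes possible, the following lists query definitions over REST so they can be committed to version control. It assumes the legacy preview endpoint from the linked docs, and the host and token are placeholders:

```python
import json
import requests

# Hypothetical workspace host and token; replace with your own values
host = "https://myworkspace.cloud.databricks.com"
token = "dapi-..."

resp = requests.get(
    f"{host}/api/2.0/preview/sql/queries",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Dump each query definition to a local JSON file so it can be committed to git
for q in resp.json().get("results", []):
    with open(f"{q['id']}.json", "w") as f:
        json.dump(q, f, indent=2)
```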
What is the scope of Data Governance in Databricks? How can we implement it, and is there any data limit for implementing it? I would also like to know more about the cost.
Ask your technical questions at Databricks Office Hours! November 16, 8:00 AM - 9:00 AM PT: Register Here. November 30, 11:00 AM - 12:00 PM PT: Register Here. Databricks Office Hours connects you directly with experts to answer all your Databricks quest...
Q&A Recap from 11/30 Office Hours. Q: What is the downside of using z-ordering and auto optimize? It seems like there could be a tradeoff with writing small files (whereas it is good at reading a larger file), is that true? A: By default, Delta Lake on ...
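For context, z-ordering is applied with the Delta OPTIMIZE command and auto optimize is enabled via table properties. A minimal sketch, with hypothetical table and column names:

```python
# Hypothetical table/column names; ZORDER BY co-locates related data in fewer files
spark.sql("OPTIMIZE my_db.events ZORDER BY (event_date, user_id)")

# Auto optimize can be enabled per table via Delta table properties
spark.sql("""
    ALTER TABLE my_db.events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")
```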
Scenario: I have a dataframe with more than 1000 rows, each row having a file path and a result data column. I need to loop through each row and write a file to the file path, with data from the result column. What is the easiest and most time-effective way ...
Hi, I agree with Werners; try to avoid looping over a PySpark DataFrame. If your dataframe is small, as you said, only about 1000 rows, you may consider using Pandas. Thanks.
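As a hedged sketch of the Pandas approach suggested above, you could collect the small dataframe to the driver and write each file in a plain Python loop. The column names "file_path" and "result" are assumptions for illustration, and `df` is the poster's dataframe:

```python
# Assumes a Spark dataframe `df` with columns "file_path" and "result"
pdf = df.select("file_path", "result").toPandas()

for row in pdf.itertuples(index=False):
    # Each row becomes one file on the driver; adapt for DBFS/ADLS paths as needed
    with open(row.file_path, "w") as f:
        f.write(str(row.result))
```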
I am trying to read a SQL file in the repo into a string. I have tried

with open("/Workspace/Repos/xx@***.com//file.sql", "r") as queryFile:
    queryText = queryFile.read()

And I get the following error: [Errno 1] Operation not permitted: '/Workspace/Repos/***@*...
Following are the details of the requirement:
1. I am using a Databricks notebook to read data from a Kafka topic and write it into an ADLS Gen2 container, i.e., my landing layer.
2. I am using Spark code to read data from Kafka and write into landing...
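The post is truncated, but as a hedged sketch of the described flow, a Structured Streaming job reading from Kafka and writing the raw payload to an ADLS Gen2 landing path might look like this (the topic, broker address, and storage paths are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder Kafka settings for illustration
kafka_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "my_topic")
    .option("startingOffsets", "earliest")
    .load()
)

# Persist the raw key/value payload to the landing layer as Delta
(
    kafka_df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "abfss://landing@myaccount.dfs.core.windows.net/_checkpoints/kafka")
    .start("abfss://landing@myaccount.dfs.core.windows.net/kafka/my_topic")
)
```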
Hi @Swapnil Kamle, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...
My CSV data looks like this:

‡‡companyId‡‡,‡‡empId‡‡,‡‡regionId‡‡,‡‡companyVersion‡‡,‡‡Question‡‡

I tried this code:

dff = spark.read.option("header", "true").option("inferSchema", "true").option("delimiter", "‡,").csv(f"/mnt/data/path/datafile.csv")

But I...
Hi @shamly pt, I took a slightly different approach, since I guess no one can be sure of the encoding of the data you showed. Sample data I took:

‡‡companyId‡‡,‡‡empId‡‡,‡‡regionId‡‡,‡‡companyVersion‡‡,‡‡Question‡‡
‡‡1‡‡,‡‡121212‡‡,‡‡R‡‡,‡‡1.0A‡‡,‡‡NA‡‡...
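The answer is cut off, so here is a hedged sketch of one way such an approach could go: read the file as plain text, strip the ‡ characters, then split the result as an ordinary comma-delimited line. The path is a placeholder and header handling is left out for brevity:

```python
from pyspark.sql import functions as F

# Read each line as raw text so the unusual ‡‡ quoting doesn't confuse the CSV parser
raw = spark.read.text("/mnt/data/path/datafile.csv")

# Remove the ‡ characters, leaving an ordinary comma-separated line
cleaned = raw.select(F.regexp_replace("value", "‡", "").alias("value"))

# Split into named columns matching the sample header
parts = F.split(cleaned["value"], ",")
dff = cleaned.select(
    parts.getItem(0).alias("companyId"),
    parts.getItem(1).alias("empId"),
    parts.getItem(2).alias("regionId"),
    parts.getItem(3).alias("companyVersion"),
    parts.getItem(4).alias("Question"),
)
```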
Only the GUI seems to allow granting 'account users' the SELECT and USE_SCHEMA permissions on catalogs. Terraform gives me an error. Here is my Terraform config:

resource "databricks_grants" "staging" {
  provider = databricks.workspace
  catalog  = databricks_catalog....
Hi @Andrei Radulescu-Banu, which version of the provider are you using? I checked the GitHub repo, and it should work: https://github.com/databricks/terraform-provider-databricks/blob/d65ef3518074a48e079080d94e1ab33a80bf7e0f/catalog/resource_grants.go#L1...
I'm trying to use Delta Live Tables, but even if I import the example notebooks I get a warning saying `ModuleNotFoundError: No module named 'dlt'`. If I try to install it via pip, it attempts to install a deep-learning framework of some sort. I checked ...
Here's the solution I came up with... Replace `import dlt` at the top of your first cell with the following:

try:
    import dlt  # When run in a pipeline, this package will exist (no way to import it here)
except ImportError:
    class dlt...
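The stub class is cut off above. As a hedged guess at its shape, a minimal no-op stand-in that lets the notebook parse and run interactively might look like this; it is purely hypothetical, not the original poster's code:

```python
try:
    import dlt  # available only when the notebook runs inside a DLT pipeline
except ImportError:
    class dlt:  # minimal stand-in so the notebook can be edited outside a pipeline
        @staticmethod
        def table(*args, **kwargs):
            # Acts as a pass-through decorator whether used bare or with arguments
            def wrapper(func):
                return func
            if args and callable(args[0]):
                return args[0]
            return wrapper
```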
I need to add a filter condition while ingesting data from a Cosmos Mongo DB using Databricks. I am using the below query to ingest data from a Cosmos collection:

df = spark.read \
    .format('com.mongodb.spark.sql.DefaultSource') \
    .option('uri', sourc...
Hi @Swapnil Sarkar, the error message means a stage name in your aggregation pipeline request wasn't recognised. The solution is to ensure that all aggregation pipeline stage names in your request are valid. This article describes common errors and ...
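For reference, here is a hedged sketch of pushing a filter down with the MongoDB Spark connector's `pipeline` option using a `$match` stage. The URI, database, collection, and field names are placeholders:

```python
import json

# Hypothetical $match stage; only valid stage names (e.g. $match, $project) are accepted
pipeline = json.dumps([{"$match": {"regionId": "R"}}])

df = (
    spark.read
    .format("com.mongodb.spark.sql.DefaultSource")
    .option("uri", "mongodb://<account>:<key>@<host>:10255/?ssl=true")
    .option("database", "mydb")
    .option("collection", "mycollection")
    .option("pipeline", pipeline)
    .load()
)
```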
I need to execute a union statement which is framed dynamically and stored in a string variable. I framed the union statement, but am stuck on executing it. Does anyone know how to execute a union statement stored in a string variable? I'm using p...
@Dineshkumar Gopalakrishnan, Python's exec() function can be used to execute a Python statement, which in your case could be the PySpark union statement. Refer to the sample code snippet below for reference.

df1 = spark.sparkContext.parallelize([(1, 2...
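The snippet above is truncated; here is a hedged, self-contained sketch of the exec() approach (the dataframe contents and variable names are illustrative only):

```python
# Two small example dataframes standing in for the truncated snippet
df1 = spark.sparkContext.parallelize([(1, 2), (3, 4)]).toDF(["a", "b"])
df2 = spark.sparkContext.parallelize([(5, 6), (7, 8)]).toDF(["a", "b"])

# A dynamically framed union statement held in a string
union_stmt = "result = df1.union(df2)"

# exec() runs the statement in the current namespace, binding `result`
exec(union_stmt)
result.show()
```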