Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am looking at the EXPLAIN EXTENDED plan for a statement. In the == Physical Plan == section, I go down to the FileScan node and see a lot of ellipses, like +- FileScan parquet schema.table[Time#8459,TagName#8460,Value#8461,Quality#8462,day#8...
Hi, sometimes I notice that running a query takes too long, even for simple queries, and the next time I run the same query it runs much faster. I have a cluster running (DBR 10.4 LTS • 5 workers) and it constantly has several workers. An example of a query is s...
Hi, I want to mount an unencrypted AWS EFS in AWS Databricks. When I do: mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-abcdef.efs.region.amazonaws.com:/ /mnt/efs-uncrypted I get this error: mount.nfs4: moun...
Hello, We are using Azure Databricks with a Standard DS14_V2 cluster with Runtime 9.1 LTS, Spark 3.1.2 and Scala 2.12, and we frequently face the issue below when running our ETL pipeline. As part of the operation that is failing there are several joins...
Hey man, Please use these configurations in your cluster and it will work: spark.sql.storeAssignmentPolicy LEGACY, spark.sql.parquet.binaryAsString true, spark.speculation false, spark.sql.legacy.timeParserPolicy LEGACY. If it doesn't work, let me know what problem...
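For readers who want to keep the four settings from the reply above in a script, here is a minimal sketch that parses "key value" lines (one setting per line, the layout commonly used in a cluster's Spark config field) into a dictionary. The helper name `parse_spark_conf` and the variable names are hypothetical; only the four keys and values come from the post.

```python
def parse_spark_conf(text: str) -> dict:
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first space only: the key never contains spaces.
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


# The four settings quoted in the reply, one per line.
SUGGESTED = """
spark.sql.storeAssignmentPolicy LEGACY
spark.sql.parquet.binaryAsString true
spark.speculation false
spark.sql.legacy.timeParserPolicy LEGACY
"""

settings = parse_spark_conf(SUGGESTED)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Whether each setting can be changed on a running session or has to be set on the cluster itself depends on the setting, so treat the dictionary as a checklist rather than something to apply blindly.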
I have a list of dataframes (for this example 2) and want to apply a for-loop to the list of frames to generate 2 new dataframes. To start, here is my starting dataframe called df_final: First, I create 2 dataframes: df2_b2c_fast, df2_b2b_fast: for x i...
Conducting a security review or vendor assessment of Databricks and looking to learn more about our security features, compliance information, and privacy policies? You can find the latest on Databricks security features, architecture, compliance and ...
I ran into an issue when trying to use Auto Loader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked that the files are good and not corrupted. Following is the issue: java.lang.IllegalArgumentException:...
I got the issue resolved. The issue was that, by mistake, we had duplicate columns in the schema files, which caused the error. However, the error message is totally misleading, which is why we weren't able to rectify it.
Hi, I have 10 workspaces linked to different departments. We have 4 users overall doing some activity on these 10 workspaces. I want to get the list of users, which tables they are operating on, and what operations they have performed, and all in all ...
Hi Ranjit, for tables, I believe it's hard, but if you want to combine all 10 workspaces you can use the Databricks API for cluster lists https://docs.databricks.com/dev-tools/api/latest/index.html and then you can check their IAM roles to understand w...
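As a rough sketch of the cluster-list suggestion in the reply above, the snippet below builds a request against the Clusters API list endpoint (`/api/2.0/clusters/list`) for one workspace and extracts cluster IDs and names from the response. The function names are hypothetical, and the workspace URL and token are placeholders you would replace per workspace; error handling and pagination are left out.

```python
import json
import urllib.request


def build_clusters_list_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the clusters list endpoint of one workspace."""
    url = host.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def list_clusters(host: str, token: str) -> list:
    """Return (cluster_id, cluster_name) pairs for one workspace."""
    with urllib.request.urlopen(build_clusters_list_request(host, token)) as resp:
        payload = json.load(resp)
    return [
        (c.get("cluster_id"), c.get("cluster_name"))
        for c in payload.get("clusters", [])
    ]


# Placeholder values: substitute a real workspace URL and access token.
example_request = build_clusters_list_request(
    "https://example.cloud.databricks.com/", "PLACEHOLDER_TOKEN"
)
print(example_request.full_url)
```

Calling `list_clusters` once per workspace (with that workspace's URL and token) and concatenating the results would give a combined view across all 10 workspaces.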
Introduction: I would like to use the Alert feature to monitor job status (from a log table) in Databricks-SQL. So, I have written a query in a query notebook (or object) to return results from the log table. Also, I have set the alert object for monitoring and tri...
I am not seeing any direct option to export or version control the alert object other than the migrate option. Check this link, it might help you in another way: https://docs.databricks.com/sql/api/queries-dashboards.html
Scope of data governance in Databricks: how can we implement it, and is there any data limit for implementing it? I would also like to know more about the cost.
Ask your technical questions at Databricks Office Hours! November 16 - 8:00 AM - 9:00 AM PT: Register Here. November 30 - 11:00 AM - 12:00 PM PT: Register Here. Databricks Office Hours connects you directly with experts to answer all your Databricks quest...
Q&A Recap from 11/30 Office Hours. Q: What is the downside of using z-ordering and auto optimize? It seems like there could be a tradeoff with writing small files (whereas it is good at reading a larger file), is that true? A: By default, Delta Lake on ...
Scenario: I have a dataframe with more than 1000 rows, each row having a file path and a result data column. I need to loop through each row and write files to the file path, with data from the result column. What is the easiest and most time-effective way ...
Hi, I agree with Werners: try to avoid loops with a PySpark DataFrame. If your dataframe is small, as you said, only about 1000 rows, you may consider using Pandas. Thanks.
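One possible shape for the small-data approach suggested above: once the roughly 1000 rows are on the driver as plain Python records, each file can be written with ordinary file I/O, optionally in a thread pool. This is only a sketch. The column names `file_path` and `result` are assumptions (the post does not name them), and the demo rows below are made up and written to a temporary directory.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_one(row: dict) -> str:
    """Write one row's result text to its file path, creating parent folders."""
    path = Path(row["file_path"])  # assumed column name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(row["result"], encoding="utf-8")  # assumed column name
    return str(path)


def write_all(rows, max_workers: int = 8) -> list:
    """Write every row's file using a small thread pool; returns written paths in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(write_one, rows))


# Demo with made-up rows. With a real small Spark DataFrame, the records
# could come from something like df.toPandas().to_dict("records").
base = Path(tempfile.mkdtemp())
rows = [
    {"file_path": str(base / "a" / "one.txt"), "result": "first"},
    {"file_path": str(base / "b" / "two.txt"), "result": "second"},
]
written = write_all(rows)
print(written)
```

Threads help here because the work is I/O-bound. For much larger dataframes, collecting everything to the driver would no longer be appropriate.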