Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I have an init script that works from a DBFS location during cluster startup, but when the same shell script file is placed on an ABFSS location (ADLS Gen 2 storage) I get the following init script failure error and the cluster is unable to start. E...
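A common cause is that the cluster cannot authenticate to ADLS Gen 2 before it fetches the init script. Below is a minimal sketch of the relevant cluster-spec fields, assuming service-principal (OAuth) authentication; the storage account, container, secret scope, and script path are all placeholders:

```python
# Hypothetical cluster-spec fragment (all names are placeholders) pairing an
# ABFSS init script with the Spark configs the cluster needs to reach ADLS
# Gen 2 at boot time via a service principal.
cluster_spec = {
    "init_scripts": [
        {"abfss": {"destination": "abfss://scripts@mystorageacct.dfs.core.windows.net/init/setup.sh"}}
    ],
    "spark_conf": {
        "fs.azure.account.auth.type.mystorageacct.dfs.core.windows.net": "OAuth",
        "fs.azure.account.oauth.provider.type.mystorageacct.dfs.core.windows.net":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id.mystorageacct.dfs.core.windows.net": "<application-id>",
        "fs.azure.account.oauth2.client.secret.mystorageacct.dfs.core.windows.net":
            "{{secrets/my-scope/sp-secret}}",  # secret scope/key are placeholders
        "fs.azure.account.oauth2.client.endpoint.mystorageacct.dfs.core.windows.net":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    },
}
```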
Hi @Saravana KJ, I'm sorry you could not find a solution to your problem in the answers provided. Our community strives to provide helpful and accurate information, but sometimes an immediate solution may only be available for some issues. I suggest pr...
Hi @imma marra, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
Hi, I am trying to use the approxQuantile() function to populate a list that I made, yet somehow, whenever I run the code it's as if the list is empty and there are no values in it. Code is written as below: @dlt.table(name = "customer_order_silv...
Maybe try to use (and first test in a separate notebook) the standard df = spark.read.table("customer_order_silver") to calculate approxQuantile. Of course, you need to ensure that customer_order_silver has a target location in the catalog, so read us...
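For example, as a quick sanity check outside the DLT pipeline (assuming the pipeline publishes customer_order_silver to a catalog target; the column name order_amount is a placeholder):

```python
# Minimal sketch: compute quantiles on the published table rather than inside
# the @dlt.table function, where the list would be built at graph-definition
# time instead of from materialized data.
df = spark.read.table("customer_order_silver")

# approxQuantile returns a plain Python list of floats; relativeError=0.01
# trades a little accuracy for speed on large tables.
quantiles = df.approxQuantile("order_amount", [0.25, 0.5, 0.75], 0.01)
print(quantiles)  # e.g. [q1, median, q3]
```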
Hi all, I have a table in MongoDB Atlas that I am trying to read continuously into memory, and I will eventually write that data out. However, when I look at the in-memory table it doesn't have the correct schema. Code here: from pyspark.sql.types impo...
Hi @sharonbjehome, this has to be checked thoroughly via a support ticket. Did you follow https://docs.databricks.com/external-data/mongodb.html? Also, could you please check with MongoDB support? Was this working before?
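In the meantime, a minimal sketch of pinning the schema explicitly, assuming the MongoDB Spark connector v10+ (format "mongodb"); the connection URI, database, collection, and fields below are placeholders for your own:

```python
# Supplying an explicit schema stops the connector from inferring a wrong one
# from a sample of documents.
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("_id", StringType(), True),    # placeholder fields --
    StructField("name", StringType(), True),   # replace with your
    StructField("price", DoubleType(), True),  # collection's actual layout
])

df = (
    spark.read.format("mongodb")
    .option("connection.uri", "<your-atlas-connection-uri>")  # placeholder
    .option("database", "mydb")        # placeholder
    .option("collection", "mycoll")    # placeholder
    .schema(schema)                    # enforce the schema explicitly
    .load()
)
df.printSchema()
```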
Hi, I am getting this error increasingly while loading VDS to Dremio. Do you know how I can avoid it? Out[144]: {'statusCode': 400, 'headers': {'Content-Type': 'application/json'}, 'body': 'Failed - SYSTEM ERROR: UnsupportedOperationException: Additio...
Hi, I tried to pull the report with the query below:
%python
df = quickbasePull('b5zj8k_pbz5_0_cd5h4wbbp77n4nvp56b4u','bqmnP8jm7',24)
but it is giving me the error "report too large". Then I tried the below:
%python
import pyqb
from pyspark.sql import *
import pandas as pd
qbc ...
There are a lot of datasets available in /databricks-datasets/ that you can look through. You'll have to turn them into a table so that you can access them in AutoML. There are datasets associated with the Spark Definitive Guide and Learning Spark ...
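For example, a minimal sketch of registering one sample dataset as a table that AutoML can select (the path and table name are illustrative; browse /databricks-datasets/ to find others):

```python
# Load a sample dataset and save it as a managed table so it shows up in the
# AutoML table picker. Path, separator, and table name are illustrative.
df = spark.read.csv(
    "/databricks-datasets/wine-quality/winequality-red.csv",
    header=True, inferSchema=True, sep=";",
)
df.write.mode("overwrite").saveAsTable("default.wine_quality")
```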
Hi, I'm interested to know: if multiple executors append to the same Hive table using saveAsTable or insertInto in Spark SQL, will that cause any data corruption? What configuration do I need to enable concurrent writes to the same Hive table? What about the s...
The Hive table will not like this, as the underlying data is in Parquet format, which is not ACID compliant. Delta Lake, however, is: https://docs.delta.io/0.5.0/concurrency-control.html You can see that inserts do not give conflicts.
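A minimal sketch of switching the append target to Delta, where concurrent inserts are mediated by optimistic concurrency control (the table and view names are placeholders):

```python
# Write as Delta instead of a Parquet-backed Hive table; concurrent appends
# from multiple jobs are then resolved by Delta's optimistic concurrency
# control rather than corrupting the data.
df.write.format("delta").mode("append").saveAsTable("default.events_delta")

# Equivalent SQL form once the Delta table exists ("staging_view" is a
# placeholder source):
spark.sql("INSERT INTO default.events_delta SELECT * FROM staging_view")
```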
Hi, I have a daily scheduled job which processes the data and writes it as Parquet files in a specific folder structure like root_folder/{CountryCode}/parquetfiles, where each day the job writes new data for a country code under that country code's folder. I am...
Most external consumers will read the partition as a column when they are properly configured (for example, Azure Data Factory or Power BI). The only way around it is to duplicate the column under another name (you cannot reuse the same name, as it will generate conf...
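A minimal sketch of that workaround: duplicate CountryCode under a placeholder name (country_code_part here) that is used only for partitioning, so the original column stays inside the data files:

```python
from pyspark.sql import functions as F

# Partition by a copy of the column so CountryCode itself survives in the
# Parquet files. Note the output folders become Hive-style
# country_code_part=<value> directories under root_folder.
(df.withColumn("country_code_part", F.col("CountryCode"))
   .write.mode("append")
   .partitionBy("country_code_part")
   .parquet("root_folder"))
```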