Dear Community- Get ready to mark your calendars for the upcoming Databricks Community Social event! Happening on June 16th, 2023, this event promises to be the ultimate monthly gathering for everyone in the Databricks Community.Join us for an hour ...
Dear Community, Have you enrolled into the New Large Language Model Courses with edX yet?As Large Language Model (LLM) applications disrupt countless industries, generative AI is becoming an important foundational technology. The demand for LLM-based...
I have scheduled an examination on 1st June 2023 and due to personal reason, I have cancelled the examination on 26th May 2023 (more than 72 hours) but I am yet to receive the refund amount. In the auto generated mail it is mentioned that the refund ...
Hello everyone, I have an intermittent issue when trying to create a Delta table for the first time in Databricks: all the data gets converted into parquet at the specified location but the _delta_log is not created or, if created, it's left empty, t...
Can you list (display) the folder location "deltaLocation"? what files do you see here? have you try to use a new location for testing? do you get the same behavior?
I was updated some scripts when I all of the sudden got a few "internal server errors". I refreshed the webpage a couple of times and now I am unable to login to databricks.When I try to sign in it thinks for a few seconds and then I am rerouted back...
I have a delta table whose size will increases gradually now we have around 1.5 crores of rows while running vacuum command on that table i am getting the below error.ERROR: Job aborted due to stage failure: Task 7 in stage 491.0 failed 4 times, most...
Do you have access to the Executor 7 logs? is there a high GC or some other events that is making the heartbeat timeout? would you be able to check the failed stages?
Does anyone have a recommendation for something along the lines of Pydoc that can be used to aggregate docstrings and the like into documentation pages?I tried Pydoc and it failed because of the magic commands in my repo.
Sphinx could be an option here. It parses and render Databricks notebooks in the documentation. You might want to look into it to see if it fits your needs. However, it may not handle magic commands very well, and it assumes your notebooks are export...
I am working on converting manual global init scripts into a terraform IaC process for multiple environments. Within terraform, we are using the resource "databricks_global_init_script" and set the content_base64 with the following:base64encoded(<<-...
i have a delta table partitioned by a Date column , I'm trying to use the alter table drop partition command but get ALTER TABLE DROP PARTITION` is not supported for Delta tables erroris there a way to do it?
Hi @Hanan Segal Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
My use-case is to process a dataset worth 100s of partitions in concurrency. The data is partitioned, and they are disjointed. I was facing ConcurrentAppendException due to S3 not supporting the “put-if-absent” consistency guarantee. From Delta Lake ...
Hi, You can refer to https://docs.databricks.com/optimizations/isolation-level.html#conflict-exceptions and recheck if everything is alright. Please let us know if this helps, also please tag @Debayan with your next response which will notify me, Th...
Is there a built-in utility function, e.g., dbutils, that can convert between path strings that start with "dbfs:" and "/dbfs"?Some operations, e.g, copying from one location in DBFS to another using dbutils.fs.cp() expect the path starting with "/db...
Hi @Fijoy Vadakkumpadan Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best a...
I have the following code in a notebook. It is randomly giving me the error, "At least one column must be specified for the table." The error occurs (if at all it occurs) only on the first run after attaching to a cluster.Cluster details:Summary5-1...
We had a databricks job that has strange behavior,when we passing 'output_path' to function saveAsTextFile and not output_path variable the data saved to the following path: s3://dev-databricks-hy1-rootbucket/nvirginiaprod/3219117805926709/output_pa...
I suspect you provided a dbfs path to save the data hence the data saved under your workspace root bucket.For the workspace root bucket, databricks workspace will interact with databricks credential to make sure databricks has access to it and able t...
I am unable to create bloom filter index on my tableCREATE BLOOMFILTER INDEX ON TABLE my_namespace.foo FOR COLUMNS (id OPTIONS (fpp = 0.1, numItems = 6000000))Gives the errorAnalysisException: Table `spark_catalog`.`my_namespace`.`foo` did not specif...
Hi, You can refer to https://issues.apache.org/jira/browse/SPARK-27617 for the above error. Please let us know if this helps, also please tag @Debayan with your next response which will notify me, Thank you!
I am reading the source table which gets updated every day. It is usually append/merge with updates and is occasionally overwritten for other reasons. df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('starting...