- 871 Views
- 1 replies
- 0 kudos
I have a DataFrame stored in Delta format in ADLS. When I try to append new updated rows to that Delta lake, is there any way I can delete the old existing record in Delta and add the new updated record? There is a uni...
Latest Reply
To achieve this, you should use a merge command that updates existing rows matched on the unique ID. This will update the rows that already exist and insert the rows that do not. If you want to do it manually, you could delete rows using the DE...
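A minimal sketch of such a merge in Scala, assuming a Delta table at a placeholder ADLS path, a unique key column named id, and a DataFrame updatesDf holding the incoming rows:
%scala
import io.delta.tables.DeltaTable

// updatesDf is assumed to hold the new/updated rows; the path and the
// key column name "id" are placeholders for your own values.
val target = DeltaTable.forPath(spark, "abfss://container@account.dfs.core.windows.net/delta/my_table")

target.as("t")
  .merge(updatesDf.as("u"), "t.id = u.id")
  .whenMatched().updateAll()    // replace existing rows that share the unique ID
  .whenNotMatched().insertAll() // insert rows that do not exist yet
  .execute()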
- 569 Views
- 0 replies
- 1 kudos
I am trying to follow this tutorial about Databricks SQL Analytics (https://docs.microsoft.com/en-us/azure/databricks/sql/get-started/admin-quickstart), but when I create my Databricks workspace I do not have the icon at the bottom of the sidebar to access ...
- 665 Views
- 0 replies
- 0 kudos
I need to know if there is a way to delete a user from Databricks by email only, using the SCIM API. As of now I can see it can only delete a user by ID, which means I need to first retrieve the ID of the user and then use it to delete. I am using this api ...
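As far as I know there is no single delete-by-email endpoint, but the two calls can be chained. A rough Scala sketch, assuming a placeholder workspace URL, a token in the DATABRICKS_TOKEN environment variable, and a placeholder email (a real script should parse the JSON with a proper library):
%scala
import java.net.{URI, URLEncoder}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

val host  = "https://example.cloud.databricks.com" // placeholder workspace URL
val token = sys.env("DATABRICKS_TOKEN")
val email = "user@example.com"                     // placeholder user

val client = HttpClient.newHttpClient()

// Step 1: resolve the user's SCIM id by filtering on userName (the email).
val filter = URLEncoder.encode(s"""userName eq "$email"""", "UTF-8")
val lookup = HttpRequest.newBuilder()
  .uri(URI.create(s"$host/api/2.0/preview/scim/v2/Users?filter=$filter"))
  .header("Authorization", s"Bearer $token")
  .GET().build()
val body = client.send(lookup, HttpResponse.BodyHandlers.ofString()).body()

// Crude id extraction for the sketch; use a JSON library in real code.
val id = "\"id\"\\s*:\\s*\"([^\"]+)\"".r.findFirstMatchIn(body).map(_.group(1)).get

// Step 2: delete the user by the resolved id.
val delete = HttpRequest.newBuilder()
  .uri(URI.create(s"$host/api/2.0/preview/scim/v2/Users/$id"))
  .header("Authorization", s"Bearer $token")
  .DELETE().build()
client.send(delete, HttpResponse.BodyHandlers.ofString())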
- 4659 Views
- 1 replies
- 0 kudos
I am looking at the memory utilization of the executors and I see that the heap utilization of the executor is far less than what is reported in Ganglia. Why does Ganglia report incorrect memory details?
Latest Reply
Ganglia reports memory utilization at the system level. Say, for example, the JVM has an Xmx value of 100 GB. At some point it will occupy 100 GB, and then a garbage collection will clear off the heap. Once the GC frees up the memory, th...
- 414 Views
- 0 replies
- 0 kudos
I want to read the last-modified datetime of the files in the data lake in a Databricks script. If I could read it efficiently as a column when reading data from the data lake, it would be perfect. Thank you :)
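Two hedged options in Scala, assuming a recent Databricks Runtime and a placeholder path: dbutils.fs.ls exposes a modificationTime field on newer runtimes, and the hidden _metadata column can surface the timestamp as a regular column on newer Spark versions:
%scala
import org.apache.spark.sql.functions.col

// Option 1: list files and read FileInfo.modificationTime (epoch milliseconds).
dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/data/")
  .foreach(f => println(s"${f.path} -> ${new java.sql.Timestamp(f.modificationTime)}"))

// Option 2: select the hidden _metadata column so the modification time
// arrives as a column alongside the data.
val df = spark.read.format("parquet")
  .load("abfss://container@account.dfs.core.windows.net/data/")
  .select(col("*"), col("_metadata.file_modification_time"))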
- 1172 Views
- 0 replies
- 1 kudos
I work with Parquet files stored in AWS S3 buckets. They are multiple TB in size and partitioned by a numeric column containing integer values between 1 and 200; call it my_partition. I read in and perform compute actions on this data in Databricks w...
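The question is cut off, but for a dataset partitioned on my_partition, filtering on that column lets Spark prune partitions instead of scanning the full multi-TB set. A sketch with a placeholder bucket path and arbitrary partition values:
%scala
import org.apache.spark.sql.functions.col

// Partition pruning: only the matching my_partition directories are read.
val df = spark.read.parquet("s3://my-bucket/data/")
  .filter(col("my_partition").isin(1, 2, 3))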
- 1632 Views
- 1 replies
- 0 kudos
I have ad-hoc, one-time streaming queries where I believe checkpointing won't add any value. Should I still use checkpointing?
Latest Reply
It's not mandatory, but the strong recommendation is to use checkpointing for streaming irrespective of your use case. This is because the default checkpoint location can accumulate a lot of files over time, as there is no graceful, guaranteed cleaning in pla...
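A short sketch of pinning an explicit checkpoint location (paths are placeholders), so state does not pile up in a default location you never clean:
%scala
val query = df.writeStream
  .format("delta")
  .option("checkpointLocation", "abfss://container@account.dfs.core.windows.net/checkpoints/my_query")
  .start("abfss://container@account.dfs.core.windows.net/tables/my_table")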
- 819 Views
- 2 replies
- 0 kudos
It's preferable to use Spark Streaming (with Delta) for batch workloads rather than regular batch. With the Trigger.Once trigger, whenever the streaming job is started it will process whatever is available in the source (Kafka/Kinesis/file system) and ...
Latest Reply
The streaming checkpoint mechanism is independent of the trigger type. The way checkpointing works is that it creates an offset file when processing a batch, and once the batch is completed it creates a commit file for that batch in the checkpoint director...
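A minimal sketch of the Trigger.Once pattern described above, with placeholder Delta paths; the checkpoint records the offsets and commits regardless of the trigger type:
%scala
import org.apache.spark.sql.streaming.Trigger

spark.readStream.format("delta")
  .load("abfss://container@account.dfs.core.windows.net/source")
  .writeStream
  .format("delta")
  .trigger(Trigger.Once()) // process what is available, then stop
  .option("checkpointLocation", "abfss://container@account.dfs.core.windows.net/checkpoints/batch_job")
  .start("abfss://container@account.dfs.core.windows.net/target")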
- 565 Views
- 1 replies
- 0 kudos
I have an S3-SQS workload. Is it possible to migrate the workload to Auto Loader without downtime? What are the migration guidelines?
Latest Reply
The SQS queue used by the existing application can be utilized by Auto Loader, thereby ensuring minimal downtime.
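A hedged sketch of pointing Auto Loader's file-notification mode at an existing queue via cloudFiles.queueUrl (the queue URL, source format, and paths are placeholders):
%scala
val df = spark.readStream
  .format("cloudFiles")
  .option("cloudFiles.format", "json")
  .option("cloudFiles.useNotifications", "true")
  .option("cloudFiles.queueUrl", "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue")
  .load("s3://my-bucket/input/")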
- 2659 Views
- 1 replies
- 0 kudos
I have a Delta table that had schema changes in multiple commits. I want to track all the schema changes that happened on the Delta table. DESCRIBE HISTORY is not useful here, as it only logs the schema changes made by ALTER TABLE operations.
Latest Reply
When a write operation is performed with columns added, we do not explicitly show that in the DESCRIBE HISTORY output. Only an entry is made for the write, and the operationParameters show nothing about schema evolution, whereas if we d...
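A small sketch of the difference (dfWithNewColumn and the table path are placeholders): the mergeSchema write only produces a WRITE entry, while the explicit ALTER TABLE shows up as its own operation in DESCRIBE HISTORY:
%scala
// Schema evolution during a write: recorded only as a WRITE operation.
dfWithNewColumn.write.format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .save("/delta/events")

// Explicit DDL: recorded as an ADD COLUMNS operation in the history.
spark.sql("ALTER TABLE delta.`/delta/events` ADD COLUMNS (another_col STRING)")
spark.sql("DESCRIBE HISTORY delta.`/delta/events`").show(false)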
- 12847 Views
- 1 replies
- 0 kudos
How can I change the log level of the Spark driver and executor processes?
Latest Reply
Change the log level of the driver:
%scala
spark.sparkContext.setLogLevel("DEBUG")
spark.sparkContext.setLogLevel("INFO")

Change the log level of a particular package in the driver logs:
%scala
org.apache.log4j.Logger.getLogger("shaded.databricks.v201809...
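The executor side is trickier, since each executor runs its own JVM and setLogLevel only affects the driver. One common (hedged) workaround is to set the level inside a task so the call runs on the executors themselves:
%scala
// Runs on the executors; coverage depends on how tasks are scheduled.
spark.sparkContext.parallelize(0 until 100, 100).foreachPartition { _ =>
  org.apache.log4j.LogManager.getRootLogger.setLevel(org.apache.log4j.Level.DEBUG)
}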