Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Log4J: part 2. Apparently log4j 2.15 is still vulnerable: https://www.lunasec.io/docs/blog/log4j-zero-day-update-on-cve-2021-45046/ So better use version 2.16. But as mentioned in several topics: Databricks does not use an impacted version.
Hello, I have set up my storage account on Azure with ADLS Gen2 and have succeeded in saving the delta table to it; from there I have created my delta table on Databricks. From there I am unable to display the summary of my delta table unde...
Hello, Following Hubert's comment, in order to create a delta table on Databricks from Azure, I had to use the CLONE argument in order to copy the data plus the metadata of my delta table on Azure. In order to set up the connection between Databricks and A...
Is there a way for non admin (at workspace level) or users without having (SELECT, MODIFY on ANY File) to create tables (unmanaged/external) even though they are owner of the database in which they want to create tables in a Table Access Controlled c...
Grant privileges on all the explain tables to non admin user as ... where BIADMIN is the non admin user who wants to generate explain plans.
Hi All, I am trying to install GStreamer using the instructions in the link below. https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-pyth...
Hi Community, We got an email from our IT Team regarding the Apache Log4J vulnerability. Just wanted to understand if our implementation will be affected by this or not. We are using the following library or package in our notebooks: import org.apache.log4...
Hi All, I'm just wondering when exactly the billing time starts for a Databricks cluster? Is starting time included? If cluster creation takes 3 minutes and query execution only 2, will I pay for 2 or 5? Thanks in advance! MC
Billing for Databricks DBUs starts when the Spark Context becomes available. Billing for the cloud provider starts when the request for compute is received and the VMs are starting up.
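As a rough illustration of the answer above applied to the 3-minute startup and 2-minute query from the question, here is a minimal sketch. It assumes the Spark Context only becomes available at the end of cluster creation, and it ignores termination time and any billing granularity; the function name `billed_minutes` is hypothetical and not part of any Databricks API.

```python
def billed_minutes(startup_min: float, query_min: float) -> dict:
    """Estimate billed minutes under the stated assumptions.

    Assumption: the Spark Context becomes available only once cluster
    creation has finished, so DBU billing covers the query time only.
    Cloud provider billing covers startup plus query time, because the
    VMs are billed from the moment the compute request is received.
    """
    return {
        "dbu_minutes": query_min,
        "cloud_vm_minutes": startup_min + query_min,
    }


# The scenario from the question: 3 minutes of startup, 2 minutes of query.
estimate = billed_minutes(startup_min=3, query_min=2)
print(estimate)
```

Under these assumptions the DBU charge would cover about 2 minutes and the cloud provider charge about 5; real invoices can differ depending on how each side rounds usage.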
Hi Everyone, can someone help with creating a custom queue for Auto Loader as given here, as the default FlushWithClose event is not getting created when my data is uploaded to blob as given in the Azure DB docs: cloudFiles.queueName: The name of the Azure queue. If...
You need to set up the notification service for Blob/ADLS like here: https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-gen2.html#cloud-resource-management setUpNotificationServices will return the queue name, which can later be used in au...
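A minimal sketch of how an existing queue name might be passed to Auto Loader once the notification service exists. The helper `autoloader_queue_options`, the queue name, and the connection string placeholder are all hypothetical; the option keys are the documented `cloudFiles.*` options, but check them against the Auto Loader docs for your runtime version.

```python
def autoloader_queue_options(queue_name: str,
                             connection_string: str,
                             file_format: str = "json") -> dict:
    """Build Auto Loader options for file notification mode with a custom queue.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "cloudFiles.format": file_format,
        # Use file notification mode instead of directory listing.
        "cloudFiles.useNotifications": "true",
        # Consume events from an already existing Azure queue.
        "cloudFiles.queueName": queue_name,
        # Connection string for the storage account that hosts the queue.
        "cloudFiles.connectionString": connection_string,
    }


options = autoloader_queue_options("my-autoloader-queue", "<connection-string>")

# On a Databricks cluster, where `spark` is predefined, the options could be
# applied roughly like this (schema and path are placeholders):
# df = (spark.readStream
#           .format("cloudFiles")
#           .options(**options)
#           .schema(my_schema)
#           .load("<input-path>"))
```

Keeping the options in a plain dictionary makes them easy to inspect before the stream is started.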
dayofweek: https://docs.databricks.com/sql/language-manual/functions/dayofweek.html weekday: https://docs.databricks.com/sql/language-manual/functions/weekday.html According to the documentation, they both are synonym functions. But when I use it I n...
That's correct: for weekday Monday=0, for dayofweek Sunday=1. You can also look for documentation here: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.dayofweek.html https://spark.apache.org/docs/latest/api/sql/index...
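The two numbering schemes can be compared without a cluster. This is a plain-Python sketch that mimics the conventions described above (weekday: Monday=0 through Sunday=6; dayofweek: Sunday=1 through Saturday=7); the helper names `spark_weekday` and `spark_dayofweek` are hypothetical and only emulate the SQL functions.

```python
from datetime import date


def spark_weekday(d: date) -> int:
    """Emulate SQL weekday(): Monday=0, ..., Sunday=6."""
    return d.weekday()


def spark_dayofweek(d: date) -> int:
    """Emulate SQL dayofweek(): Sunday=1, Monday=2, ..., Saturday=7."""
    return (d.weekday() + 1) % 7 + 1


monday = date(2021, 12, 20)  # a Monday
sunday = date(2021, 12, 19)  # a Sunday

print(spark_weekday(monday), spark_dayofweek(monday))
print(spark_weekday(sunday), spark_dayofweek(sunday))
```

So the two functions return different numbers for the same date: they differ both in which day starts the week and in whether counting starts at 0 or 1.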
Hi, Is the Databricks platform affected by CVE-2021-44228? Is there any action that needs to be taken by Databricks customers related to CVE-2021-44228?
You can use audit logs to fetch this data. Query: %sql SELECT DISTINCT userIdentity.email, sourceIPAddress FROM audit_logs WHERE serviceName = "accounts" AND actionName LIKE "%login%" Please find below the docs to analyse the audit logs: https://docs.databric...
I have this notebook which is scheduled by Data Factory on a daily basis. It worked fine up to today. All of a sudden I keep on getting a NullPointerException when writing the data. After some searching online, I disabled AQE. But this does not help. Th...
After some tests it seems that if I run the notebook on an interactive cluster, I only get 80% of load (Ganglia metrics).If I run the same notebook on a job cluster with the same VM types etc (so the only difference is interactive vs job), I get over...
Question - When you set a recurring job to simply update a notebook, does Databricks clear the state of the notebook prior to executing the notebook? If not, can I configure it to make sure it clears the state before running?
What happens if we change the logic for the delta live tables and we do an incremental update? Does the table get reset (refreshed) automatically, or would it only apply the logic to new incoming data? Would we have to trigger a reset in this case?
Here is my finding on when to refresh (reset) the table: If it is a complete table, all the changes are applied automatically. If the table is an incremental table, you need to do a manual reset (full refresh).
Hi All, I am new to Databricks and am writing my first program. Note: Code Shown Below: I am creating a table with 3 columns to store data. 2 of the columns will be appended in from data that I have in another table. When I run my append query into the...
Hi Hubert, Your answer moves me closer to being able to update a 26 field MMR_Restated table in pieces as the correct field values are calculated through the process. I have been looking for a way to be able to update in "pieces"...... 2 fie...