How to set just the right processingTime for readStream to maximize performance? Which factors does it depend on, and is there a way to measure this?
Thanks @Ajay Pandey and @Nandini N for your answers. I wanted to know more about what I should do to tune it properly. Should I try different processing times (1, 5, 10, 30, 60 seconds) and see how each affects the running job in terms of time and CPU/me...
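On the "is there a way to measure this" part: each Structured Streaming query exposes `query.lastProgress` (and `recentProgress`), whose JSON includes `durationMs.triggerExecution`, `inputRowsPerSecond`, and `processedRowsPerSecond`. A rough sketch of how you might evaluate a candidate trigger interval from those snapshots — `keeping_up` is a hypothetical helper, not a Spark API:

```python
# Hypothetical helper: given a list of lastProgress dicts and a candidate
# processingTime interval, check two signals that you are keeping up:
# 1) each micro-batch finishes within the interval, and
# 2) the processed rate is not below the input rate.
# Field names follow the Structured Streaming progress JSON.

def keeping_up(progress_events, trigger_interval_ms):
    for p in progress_events:
        exec_ms = p["durationMs"]["triggerExecution"]
        if exec_ms > trigger_interval_ms:
            return False  # batches take longer than the interval -> falling behind
        if p.get("inputRowsPerSecond", 0) > p.get("processedRowsPerSecond", 0):
            return False  # input arrives faster than it is processed
    return True

# Example progress snapshots (shape only; the numbers are made up):
events = [
    {"durationMs": {"triggerExecution": 4200},
     "inputRowsPerSecond": 1000.0, "processedRowsPerSecond": 1200.0},
    {"durationMs": {"triggerExecution": 3800},
     "inputRowsPerSecond": 950.0, "processedRowsPerSecond": 1100.0},
]
print(keeping_up(events, trigger_interval_ms=5000))  # True for this sample
```

If this returns False for your interval, the interval is shorter than the work each batch needs; raising it (or adding resources) is the usual next step.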
I have an Avro schema for my Kafka topic, and the schema has defaults. I would like to exclude the defaulted columns on the Databricks side and just let them default to an empty array. Sample Avro below; trying not to provide the UserFields because I can't...
Hi @Adam Rink, please go through the following documentation and let me know if it helps: https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#example-with-schema-registry
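On the original question of letting defaulted fields (like UserFields) fall back to their defaults: in Avro, defaults are applied during schema resolution when the reader schema has a field that the writer schema omits. So one hedged approach is to produce with a writer schema that strips the defaulted fields. A minimal sketch with a hypothetical helper (plain Python, no Spark):

```python
import json

# Hypothetical helper: remove every field that declares a "default" from a
# record schema, so a producer can omit those columns and let the reader's
# defaults (e.g. an empty array) fill them in during schema resolution.

def drop_defaulted_fields(schema_json):
    schema = json.loads(schema_json)
    schema["fields"] = [f for f in schema["fields"] if "default" not in f]
    return json.dumps(schema)

writer = drop_defaulted_fields(json.dumps({
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "UserFields", "type": {"type": "array", "items": "string"},
         "default": []},
    ],
}))
print(json.loads(writer)["fields"])  # only the "id" field remains
```

Whether this fits depends on your schema-registry compatibility settings, so treat it as a sketch rather than a drop-in fix.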
Hello everyone, I am thrilled to announce that we have our first winner for the raffle contest: @Uma Maheswara Rao Desula. Please join me in congratulating him on this remarkable achievement! UmaMahesh, your dedication and hard work have paid off, and...
@Uma Maheswara Rao Desula​ Congratulations on this well deserved win!! Can't wait for you to meet our Community peers at the Data + AI Summit 2023 in SFO.
The course "Apache Spark™ Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's databricks configuration does not allow me to mount to such repositories, bu...
Hi @Tim Tremper, The specific dataset you mentioned, "training/ecommerce/events/events.parquet", is in Parquet format, but you can easily convert it into a CSV format using Apache Spark™ on Databricks. Here's a step-by-step guide to convert the Parqu...
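The conversion described above can be sketched as a small function. This is a minimal sketch assuming you already have a SparkSession (`spark` on Databricks) and read access to the source; the paths in the usage comment are placeholders:

```python
# Minimal sketch: read a Parquet dataset and rewrite it as CSV with a header.
# Assumes an existing SparkSession is passed in (on Databricks, use `spark`).

def parquet_to_csv(spark, src_path, dst_path):
    df = spark.read.parquet(src_path)
    (df.coalesce(1)                       # single output file; drop for large data
       .write.option("header", "true")
       .mode("overwrite")
       .csv(dst_path))

# Usage on Databricks (paths are placeholders):
# parquet_to_csv(spark,
#                "<path-to>/training/ecommerce/events/events.parquet",
#                "<output-dir>/events_csv")
```

Note that `coalesce(1)` forces everything through one task, which is fine for course-sized data but should be removed for anything large.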
I am a Databricks Certified Data Engineer Associate, I want my company to become Databricks Consulting Partner. What is the number of Databricks certified employees required for that?
Hi @Sagnik Mandal​, As of my knowledge cutoff in September 2021, to become a Databricks Consulting Partner, your company needs to have at least two employees with Databricks certifications. However, requirements may change over time or vary based on ...
Hello, I am new to Databricks usage. I am trying to create some complex transformations in a Delta table pipeline. I have some tables in streaming mode collecting data from S3, then the silver layer starts the transformation, but it seem...
Hi @Anasse Berahab​, You might be experiencing a synchronization issue between your silver and gold layers in your Delta Lake pipeline. To address this, you can use trigger and awaitTermination options to control the execution of your streaming queri...
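The trigger/awaitTermination pattern mentioned above can be sketched like this. It is a hedged sketch, not a definitive implementation: the table names are placeholders, it assumes an existing SparkSession, and it uses the `availableNow` trigger (Spark 3.3+) so the silver-to-gold step drains whatever is available and then stops, keeping the layers in sync:

```python
# Sketch: process all currently available silver data into gold, then stop,
# so downstream logic only runs once the gold table is fully updated.
# "silver_events" / "gold_events" are placeholder table names.

def run_silver_to_gold(spark, checkpoint_dir):
    query = (spark.readStream
             .table("silver_events")
             .writeStream
             .trigger(availableNow=True)                  # drain available data, then stop
             .option("checkpointLocation", checkpoint_dir)
             .toTable("gold_events"))
    query.awaitTermination()   # block until this batch is fully processed
    return query
```

With `awaitTermination()` after each stage, the gold query only starts once the silver query has finished writing, which avoids the gold layer reading half-updated silver data.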
Hi Support Team, We want to connect to tables in Azure Databricks via Power BI. We are able to connect via Power BI Desktop, but when we try to publish the same, the associated dataset does not refresh and throws an error from Powerbi.com. It...
Hi Noor, Thank you so much for your response. Please see the details of the error message below. I just got to know that Power BI and Azure Databricks are in different tenants. Do you think that causes any issues? Do we need VNet peering to be configur...
I am not able to create jobs via the Jobs API in Databricks. Error: INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it, then loaded the same JSON file and tried to create the job via the API, but got the above erro...
@Pritam Arya​ I had the same problem today. In order to use the JSON that you can get from the GUI in an existing job, in a request to the Jobs API, you want to use just the JSON that is the value of the settings key.
@keenan_jones7​ I had the same problem today. It looks like you've copied and pasted the JSON that Databricks displays in the GUI when you select View JSON from the dropdown menu when viewing a job.In order to use that JSON in a request to the Jobs ...
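The fix described in the two answers above can be sketched in a few lines: the GUI's "View JSON" output wraps the job definition in metadata (`job_id`, timestamps, and a `settings` key), but the Jobs API create endpoint expects only the settings object. `extract_job_settings` is a hypothetical helper name:

```python
import json

# Hypothetical helper: pull the value of the "settings" key out of the JSON
# that the Databricks GUI shows under "View JSON" for an existing job.

def extract_job_settings(gui_json):
    doc = json.loads(gui_json)
    return doc["settings"]

# Illustrative GUI output (field values are made up):
gui_output = json.dumps({
    "job_id": 123,
    "created_time": 1680000000000,
    "settings": {"name": "my_job", "tasks": []},
})
payload = extract_job_settings(gui_output)
print(payload)  # {'name': 'my_job', 'tasks': []}
```

You would then POST `payload` as the request body to the `/api/2.1/jobs/create` endpoint of your workspace; sending the full GUI JSON instead is what triggers the "Job settings must be specified" error.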
Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: incrementally load data from Table A as a batch; if the pipeline has previously...
Hi @Valentin Rosca, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...
Hello everyone, I'm using DLT (Delta Live Tables) and I've implemented some Change Data Capture for deduplication purposes. Now I am creating a downstream table that will read the DLT as a stream (dlt.read_stream("<tablename>")). I keep receiving thi...
In DLT read_stream, we can't use ignoreChanges / ignoreDeletes. These configs help avoid the failures, but they actually ignore the operations performed on the upstream table. So you need to manually perform the deletes or updates in the downstrea...
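A related option worth knowing about is `skipChangeCommits` (available on Databricks Delta streaming reads in DBR 12.1+), which lets the stream ignore commits that only rewrite existing data instead of failing, with the same caveat that you must then propagate those updates/deletes yourself. A hedged sketch, written as a plain function; in a DLT notebook you would wrap this in `@dlt.table`, and the table name here is a placeholder:

```python
# Sketch: stream from a CDC-maintained Delta table without failing on
# update/delete commits. skipChangeCommits makes the stream skip commits
# that only rewrite data -- those changes must be reconciled downstream
# by you. Assumes an existing SparkSession is passed in.

def read_upstream_skipping_changes(spark, table_name="upstream_cdc_table"):
    return (spark.readStream
            .option("skipChangeCommits", "true")
            .table(table_name))
```

Whether this is acceptable depends on whether your downstream table can tolerate missing the rewritten rows; if not, the manual reconciliation described above is still needed.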
I want to create a cluster policy that is referenced by most of our repos/jobs so we have one place to update whenever there is a spark version change or when we need to add additional spark configurations. I figured cluster policies might be a good ...
Hi @Colter Nattrass, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...
I created a schema with that route as a managed location (abfss://~~@~~.dfs.core.windows.net/dejeong/). However, I dropped the schema with the cascade option, also entered the Azure portal and deleted the path directly, and made it again (abfss://~~@~~....
Hi @jin park, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...
I am using the delta format and occasionally get the following error: "xx.parquet referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement" FS...
## Delta check when a file was added
%scala
// Replace oldestVersionAvailable / newestVersionAvailable and the path with your values
(oldestVersionAvailable to newestVersionAvailable).map { version =>
  var df = spark.read
    .json(f"<delta-table-location>/_delta_log/$version%020d.json")
    .where("add is not null")
    .select("add.path")
var ...
I have started getting an error message when running the following optimize command: deltaTable.optimize().executeCompaction() Error: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Number of records changed after Optimi...
@Dean Lovelace: The error message suggests that the number of records in the Delta table changed after the optimize() command was run. The optimize() command is used to improve the performance of Delta tables by removing small files and compacting l...