Data Engineering

Forum Posts

Ogi
by New Contributor II
  • 690 Views
  • 4 replies
  • 1 kudos

Setting right processingTime

How do I set just the right processingTime for readStream to maximize performance? Which factors does it depend on, and is there a way to measure this?

Latest Reply
Ogi
New Contributor II
  • 1 kudos

Thanks @Ajay Pandey and @Nandini N for your answers. I wanted to know more about what I should do in order to do it properly. Should I change processing times (1, 5, 10, 30, 60 seconds) and see how it affects the running job in terms of time and CPU/me...
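A minimal sketch (not from this thread; the table names and checkpoint path are placeholders) of how a processingTime trigger is set on a Structured Streaming query, so each candidate interval can be benchmarked against job duration and CPU/memory use:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream_df = spark.readStream.format("delta").table("source_table")  # placeholder source table

query = (stream_df.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/demo")  # placeholder path
         .trigger(processingTime="30 seconds")  # try 1, 5, 10, 30, 60 seconds and compare
         .toTable("target_table"))  # placeholder sink table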

3 More Replies
AdamRink
by New Contributor III
  • 1008 Views
  • 3 replies
  • 0 kudos

Resolved! Apply Avro defaults when writing to Confluent Kafka

I have an Avro schema for my Kafka topic, and that schema defines defaults. I would like to exclude the defaulted columns from Databricks and just let them default to an empty array. Sample Avro below; I'm trying not to provide the UserFields because I can't...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Adam Rink, please go through the following documentation and let me know if it helps: https://docs.databricks.com/spark/latest/structured-streaming/avro-dataframe.html#example-with-schema-registry
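For illustration only (not taken from the linked page; the DataFrame, schema, brokers, and topic are placeholders), a sketch of the to_avro-to-Kafka write pattern that page covers, using an explicit Avro schema string. Note that Avro field defaults are generally applied when a reader resolves a record that lacks the field, not injected at write time:

from pyspark.sql.avro.functions import to_avro
from pyspark.sql.functions import struct

avro_schema = """
{
  "type": "record", "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "UserFields", "type": {"type": "array", "items": "string"}, "default": []}
  ]
}
"""

(df.select(to_avro(struct("*"), avro_schema).alias("value"))  # df: placeholder DataFrame
   .write
   .format("kafka")
   .option("kafka.bootstrap.servers", "broker:9092")  # placeholder brokers
   .option("topic", "my_topic")                        # placeholder topic
   .save())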

2 More Replies
Anonymous
by Not applicable
  • 612 Views
  • 2 replies
  • 4 kudos

Hello Everyone, I am thrilled to announce that we have our first winner for the raffle contest - @Uma Maheswara Rao Desula​ Please join me in congratu...

Hello Everyone, I am thrilled to announce that we have our first winner for the raffle contest - @Uma Maheswara Rao Desula. Please join me in congratulating him on this remarkable achievement! UmaMahesh, your dedication and hard work have paid off, and...

Latest Reply
Sujitha
Community Manager
  • 4 kudos

@Uma Maheswara Rao Desula Congratulations on this well-deserved win! Can't wait for you to meet our Community peers at the Data + AI Summit 2023 in SFO.

1 More Replies
Tim_T
by New Contributor
  • 551 Views
  • 1 replies
  • 0 kudos

Are training/ecommerce data tables available as CSVs?

The course "Apache Sparkâ„¢ Programming with Databricks" requires data sources such as training/ecommerce/events/events.parquet. Are these available as CSV files? My company's databricks configuration does not allow me to mount to such repositories, bu...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Tim Tremper, The specific dataset you mentioned, "training/ecommerce/events/events.parquet", is in Parquet format, but you can easily convert it into a CSV format using Apache Spark™ on Databricks. Here's a step-by-step guide to convert the Parqu...
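A rough sketch of that conversion (the input path assumes the course's mount layout; the output path is a placeholder):

df = spark.read.parquet("dbfs:/mnt/training/ecommerce/events/events.parquet")  # assumed mount path

(df.coalesce(1)                      # single CSV part file for easier download
   .write
   .mode("overwrite")
   .option("header", True)
   .csv("dbfs:/tmp/ecommerce_events_csv"))  # placeholder output location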

sagnikml
by New Contributor III
  • 1708 Views
  • 1 replies
  • 3 kudos

How to become Databricks Consulting Partner?

I am a Databricks Certified Data Engineer Associate, and I want my company to become a Databricks Consulting Partner. How many Databricks-certified employees are required for that?

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Sagnik Mandal​, As of my knowledge cutoff in September 2021, to become a Databricks Consulting Partner, your company needs to have at least two employees with Databricks certifications. However, requirements may change over time or vary based on ...

anasse
by New Contributor II
  • 511 Views
  • 1 replies
  • 0 kudos

Delta table cannot reach into info from previous table in the pipeline

Hello, I am new to Databricks. I am trying to create some complex transformations in a Delta table pipeline. I have some tables in streaming mode to collect data from an S3 bucket, then the silver layer to start the transformation, but it seem...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Anasse Berahab​, You might be experiencing a synchronization issue between your silver and gold layers in your Delta Lake pipeline. To address this, you can use trigger and awaitTermination options to control the execution of your streaming queri...
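A hedged sketch of that pattern (table names and checkpoint path are placeholders; assumes a runtime recent enough to support the availableNow trigger): finish the silver write before the gold layer reads it.

silver_query = (spark.readStream.table("bronze_events")              # placeholder source
                .writeStream
                .option("checkpointLocation", "/tmp/checkpoints/silver")
                .trigger(availableNow=True)   # process the available backlog, then stop
                .toTable("silver_events"))    # placeholder target

silver_query.awaitTermination()               # block until the silver update completes

gold_df = spark.read.table("silver_events")   # the gold transformation now sees fresh data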

rohit8491
by New Contributor III
  • 2613 Views
  • 4 replies
  • 8 kudos

Azure Databricks Connectivity with Power BI Cloud - Firewall Whitelisting

Hi Support Team, We want to connect to tables in Azure Databricks via Power BI. We are able to connect via Power BI Desktop, but when we try to publish the same, we can see the associated dataset does not refresh and throws an error from Powerbi.com. It...

Latest Reply
rohit8491
New Contributor III
  • 8 kudos

Hi Noor, Thank you so much for your response. Please see the details below for the error message. I just got to know that Power BI and Azure Databricks are in different tenants. Do you think that causes any issues? Do we need VNet peering to be configur...

3 More Replies
Pritam
by New Contributor II
  • 1561 Views
  • 3 replies
  • 0 kudos

Not able create Job via Jobs api in databricks

I am not able to create jobs via the Jobs API in Databricks. Error = INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it, loaded the same JSON file, and tried to create the job via the API, but got the above erro...

Latest Reply
rAlex
New Contributor II
  • 0 kudos

@Pritam Arya I had the same problem today. In order to use the JSON that you can get from the GUI of an existing job in a request to the Jobs API, you want to use just the JSON that is the value of the settings key.
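To illustrate what that means (all values below are made up), the GUI export wraps the job definition, but jobs/create expects only the inner settings object:

exported = {
    "job_id": 123,                           # wrapper fields added by the export
    "creator_user_name": "someone@example.com",
    "settings": {                            # only this object goes to jobs/create
        "name": "my-job",
        # ...tasks, clusters, schedule, etc.
    },
}
create_payload = exported["settings"]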

2 More Replies
keenan_jones7
by New Contributor II
  • 8628 Views
  • 3 replies
  • 5 kudos

Cannot create job through Jobs API

import requests
import json

instance_id = 'abcd.azuredatabricks.net'
api_version = '/api/2.0'
api_command = '/jobs/create'
url = f"https://{instance_id}{api_version}{api_command}"
headers = {'Authorization': 'Bearer myToken'}
params = { "settings...

Latest Reply
rAlex
New Contributor II
  • 5 kudos

@keenan_jones7 I had the same problem today. It looks like you've copied and pasted the JSON that Databricks displays in the GUI when you select View JSON from the dropdown menu when viewing a job. In order to use that JSON in a request to the Jobs ...
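A hedged sketch of the fix described above, sending only the settings object as the request body (the host, token, and file name are placeholders):

import json
import requests

with open("job_export.json") as f:           # hypothetical file saved from "View JSON"
    exported = json.load(f)

resp = requests.post(
    "https://abcd.azuredatabricks.net/api/2.0/jobs/create",
    headers={"Authorization": "Bearer myToken"},
    json=exported["settings"],               # send the settings object as the JSON body
)
resp.raise_for_status()
print(resp.json())                           # should include the new job_id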

2 More Replies
Valentin1
by New Contributor III
  • 3711 Views
  • 5 replies
  • 1 kudos

Delta Live Tables Incremental Batch Loads & Failure Recovery

Hello Databricks community, I'm working on a pipeline and would like to implement a common use case using Delta Live Tables. The pipeline should include the following steps: Incrementally load data from Table A as a batch. If the pipeline has previously...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Valentin Rosca Hope all is well! Just wanted to check in to see if you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

4 More Replies
adrianlwn
by New Contributor III
  • 6655 Views
  • 18 replies
  • 17 kudos

How to activate ignoreChanges in Delta Live Table read_stream ?

Hello everyone, I'm using DLT (Delta Live Tables) and I've implemented some Change Data Capture for deduplication purposes. Now I am creating a downstream table that will read the DLT as a stream (dlt.read_stream("<tablename>")). I keep receiving thi...

Latest Reply
gopínath
New Contributor II
  • 17 kudos

In DLT read_stream, we can't use ignoreChanges / ignoreDeletes. These are the configs that help avoid the failures, but they actually ignore the operations done on the upstream. So you need to manually perform the deletes or updates in the downstrea...

17 More Replies
Colter
by New Contributor II
  • 965 Views
  • 3 replies
  • 0 kudos

Is there a way to use cluster policies within the Jobs API to define cluster configuration rather than in the Jobs API itself?

I want to create a cluster policy that is referenced by most of our repos/jobs so we have one place to update whenever there is a spark version change or when we need to add additional spark configurations. I figured cluster policies might be a good ...
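One hedged sketch of what that could look like (the policy id, notebook path, and job name are placeholders; assumes the new_cluster spec accepts policy_id and apply_policy_default_values):

job_settings = {
    "name": "policy-governed-job",
    "tasks": [{
        "task_key": "main",
        "notebook_task": {"notebook_path": "/Repos/team/project/main"},  # placeholder
        "new_cluster": {
            "policy_id": "ABC123DEF456",           # placeholder cluster policy id
            "apply_policy_default_values": True,   # let the policy supply Spark version/confs
            "num_workers": 2,
        },
    }],
}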

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Colter Nattrass Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

2 More Replies
tototox
by New Contributor III
  • 1299 Views
  • 3 replies
  • 2 kudos

dbutils.fs.ls overlaps with managed storage error

I created a schema with that route as a managed location (abfss://~~@~~.dfs.core.windows.net/dejeong/). However, I dropped the schema with the cascade option, and I also entered the Azure portal and deleted the path directly, and made it again (abfss://~~@~~....

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @jin park Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

2 More Replies
Dean_Lovelace
by New Contributor III
  • 1203 Views
  • 3 replies
  • 4 kudos

What is the Pyspark equivalent of FSCK REPAIR TABLE?

I am using the delta format and occasionally get the following error: "xx.parquet referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement" FS...

Latest Reply
shan_chandra
Honored Contributor III
  • 4 kudos

## Delta check when a file was added
%scala
(oldest-version-available to newest-version-available).map { version =>
  var df = spark.read.json(f"<delta-table-location>/_delta_log/$version%020d.json")
    .where("add is not null")
    .select("add.path")
  var ...
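On the original question itself, one hedged option is simply to issue the SQL command from PySpark rather than looking for a separate DataFrame API (the table name is a placeholder):

spark.sql("FSCK REPAIR TABLE my_schema.my_table DRY RUN").show()  # preview files missing from storage
spark.sql("FSCK REPAIR TABLE my_schema.my_table")                 # remove the missing file entries from the log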

2 More Replies
Dean_Lovelace
by New Contributor III
  • 2016 Views
  • 3 replies
  • 0 kudos

Delta Table Optimize Error

I have started getting an error message when running the following optimize command: deltaTable.optimize().executeCompaction(). Error: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Number of records changed after Optimi...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Dean Lovelace: The error message suggests that the number of records in the Delta table changed after the optimize() command was run. The optimize() command is used to improve the performance of Delta tables by removing small files and compacting l...

2 More Replies