Dear Community, hope you are doing well. For the last couple of days I have been seeing a very strange issue with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error INTERNAL_ERROR: Communication lost with driver. Clu...
Hello @Debayan, I am facing the same issue while running a Delta Live Tables pipeline. The job runs in production but is not working in dev. I have tried increasing the worker nodes, but to no avail. Can you please help with this?
Question about Spark checkpoints and offsets in a running stream: when the stream started I needed tons of partitions, so we set it with spark.conf to 5000. As expected, the offsets in the checkpoint contain this info and the job used this value. Then we'...
@Jose Gonzalez​ thanks for that information! This is super useful. I was struggling to understand why my stream was still using 200 partitions. This is quite a pain for me because changing the checkpoint will re-insert all data from the source. Do you know where this can...
Hello Team, I am trying to use the libraries below in Databricks, but they are not supported:

import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants

Please advise the correct library names. Regards, Rohit
Hi @Rohit Kulkarni​, we haven't heard from you since the last response from @Akash Bhat​​, and I was checking back to see if their suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to oth...
Hi Team, I am building a DLT pipeline and planning to use APPLY CHANGES from Bronze to Silver. In the bronze table, a column holds a JSON value. This value contains questions and answers as key-value pairs and can change depending on the list of questions h...
Hi @Gilg,
Question 1: APPLY CHANGES in Delta Live Tables is designed to handle changes in the data. However, it does not inherently understand the structure of JSON data. It treats a JSON column as a single value and can detect changes in the JSON co...
What is the problem? I am getting this error every time I run a Python notebook in my Repo in Databricks. Background: the notebook where I am getting the error creates a dataframe, and the last step writes the dataframe to a Delta ...
Hi @Sara Corral​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
User sessions automatically time out after six hours of idle time. As @Kunal Gaurav​ mentioned, this is not configurable; please raise a feature request if you have a requirement to configure it. Now, in Azure you could configure the AAD refresh token ...
I'm on the Demo and Lab in the DataFrames section. I've imported the dbc into my company cluster and have run "%run ./Includes/Classroom-Setup" successfully. When I run the first SQL command %sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/m...
I had the same issue and solved it like this: in the Includes folder there is a reset notebook; run its first command, which unmounts all mounted databases. Go back to the ASP 1.2 notebook and run the %run ./Includes/Classroom-Setup code block. Then run ...
When I try to read an SFTP (CSV) file in Databricks, I get the error "JSchException: Algorithm negotiation fail". Code: var df = spark.read.options(Map("header"->"true","host"->"20.118.190.30","username"->"user","password"->"pass","fileForm...
@MonishKumar could you provide the entire exception? From the one-line error message, I suspect this is because the cipher suites required by the SFTP server are not available on the cluster. You can run the below to get the cipher suites the SFTP server require...
Hi all, in the minimal example below you can see that executing a merge statement triggers recomputation of a persisted dataframe. How does this happen?

from delta.tables import DeltaTable
table_name = "hive_metastore.default.test_table"
# initializ...
Hi @FabriceDeseyn , The recomputation of the persisted dataframe occurs due to the nature of the merge operation in Delta Lake. When a merge operation is executed, it triggers a re-evaluation of the DataFrame, which includes re-computing any persiste...
I have DLT code that creates 40+ bronze tables. The tables are created on top of the latest parquet files for each of those tables. While executing the pipeline, I sometimes notice that the graph is different from the regular one I see. I do not under...
@Kaniz - Thank you for your response. There is no change in the table dependencies. The code to create the individual raw tables looks like this: the input is always the same 40 tables, with only the underlying parquet files changing. I can't un...
We are in the process of implementing a data mesh in our organization. When trying to help the different teams produce raw data, the absolute majority want to do this through their APIs. We tried an implementation where we made web requests directly fro...
Hi @samuraidjakk ,
- Databricks provides a REST API that allows for programmatically managing the platform
- Optimized connectors are available for many data formats and cloud services
- Auto Loader can efficiently process new data files as they arriv...
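A common middle ground for the API-producing teams is to keep their side tiny: land each API response as a file in cloud storage and let Auto Loader do the incremental ingestion. A sketch with purely hypothetical names, using a local path where s3:// or abfss:// would go in practice:

```python
import datetime
import json
import pathlib

def land_response(payload: dict, landing_dir: str) -> pathlib.Path:
    """Write one API response as a timestamped JSON file.

    Auto Loader (format 'cloudFiles') pointed at landing_dir then picks up
    each new file exactly once; this function is plain Python and knows
    nothing about Spark.
    """
    ts = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S%f")
    out = pathlib.Path(landing_dir) / f"batch_{ts}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(payload))
    return out
```

On the Databricks side, a streaming read with format("cloudFiles") over the landing directory completes the pipeline; the producing teams only ever write files and never talk to Spark directly.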
Hi guys, I have a question regarding this merge step. I am a beginner with Databricks, trying to do some study in data warehousing, but I couldn't figure it out by myself and need your help with it. Appreciate your help in advance. I got this questi...
Hi @Julie285720, The merge operation in Databricks upserts data from a source table into a target table.
In your case, the source table people10mupdates and the target table is people10m.
The merge operation works as follows:
- It matches each recor...
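If it helps to see the matched/not-matched logic with Spark stripped away entirely, here is the same upsert semantics in plain Python (an illustration only, not Delta code; the dict contents are made up):

```python
def upsert(target: dict, source: dict) -> dict:
    """MERGE semantics: matched keys are updated, unmatched keys inserted."""
    merged = dict(target)
    for key, row in source.items():
        # WHEN MATCHED THEN UPDATE SET * / WHEN NOT MATCHED THEN INSERT *
        merged[key] = row
    return merged

people10m = {1: "Ann", 2: "Bob"}             # target table, keyed by id
people10mupdates = {2: "Robert", 3: "Cara"}  # source table, keyed by id

result = upsert(people10m, people10mupdates)
# result: {1: 'Ann', 2: 'Robert', 3: 'Cara'}
```

Key 2 existed in the target, so it is updated; key 3 did not, so it is inserted; key 1 is untouched. Delta's MERGE does the same, just declaratively and at scale.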
I would like to know why I am getting this error when I try to earn badges for Lakehouse Fundamentals. I can't access the quiz page. Can you please help with this? I am getting the below error:

403 FORBIDDEN
You don't have permission to access this page
2023-08-...
I created a job with the CLI, but I cannot set its permissions with the CLI; I had to use the REST API to set permissions: https://docs.databricks.com/api/workspace/permissions/set. Below is my command on Windows to set permissions: curl -X PUT https://my-workspace-url.azureda...
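For completeness, the same PUT can be issued from Python with the standard library, which avoids Windows curl quoting headaches. The workspace URL, job id, token and ACL entry below are placeholders; the endpoint shape (PUT /api/2.0/permissions/jobs/{job_id}) is the one from the linked docs:

```python
import json
import urllib.request

workspace = "https://my-workspace-url.azuredatabricks.net"  # placeholder
job_id = "123"                                              # placeholder
token = "dapiXXXX"                                          # placeholder PAT

payload = {
    "access_control_list": [
        {"user_name": "someone@example.com", "permission_level": "CAN_MANAGE"}
    ]
}

req = urllib.request.Request(
    url=f"{workspace}/api/2.0/permissions/jobs/{job_id}",
    data=json.dumps(payload).encode("utf-8"),
    method="PUT",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Note that PUT on this endpoint replaces the full ACL, while PATCH (the update endpoint) only adds to it, so be careful not to drop existing grants.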
for the below query:

select base64(
  aes_encrypt(
    '00',
    to_binary(secret('secret-scope', 'data-protection-secret'), 'BASE64')
  )
)

when the value to encrypt is 2 characters, the return is always "REDACTED_POSSIBLE_SECRET_ACC...