Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi All, In our ETL framework we have four layers: Raw, Foundation, Trusted & Unified. In Raw we are copying the file in JSON format from a source, using an ADF pipeline. In the next layer (i.e. Foundation) we are flattening the JSON files and converting t...
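Flattening nested JSON in the Foundation layer is usually done in PySpark with nested-column selection and `explode`; as a language-agnostic illustration of the core idea (pure Python, with a hypothetical record not taken from the original post), a recursive flatten might look like:

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten a nested JSON object into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Hypothetical source record for illustration only
record = json.loads('{"id": 1, "meta": {"source": "adf", "batch": {"seq": 7}}}')
print(flatten(record))
```

The same dot-separated naming is what you get in Spark when selecting `col("meta.source")` from a struct column.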
Hi @DataBricks_User9 Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...
spark.sql('UPDATE kct SET ABC_basevalue = kct.ABC_KBETR * ABS(T5.UKUSA) FROM cond_azure_try_delta kct INNER JOIN VW_TCU_delta AS T5 ON kct.WRS = T5.TCU AND kct.ABC_konwa=t5.fc INNER JOIN tcu_delta t ON(kct.WRS=t.FC) WHERE kct.marked="1"')
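Delta Lake's SQL `UPDATE` does not accept a `FROM`/`JOIN` clause, which is a likely reason the statement above fails; the usual rewrite is a `MERGE INTO` whose source is the join. A hedged sketch (table and column names taken from the post; the SQL is untested against a workspace, so treat it as a starting point rather than a drop-in fix):

```python
# Sketch only: the same update expressed as a Delta MERGE.
# Both original joins key on kct.WRS, so the VW_TCU_delta / tcu_delta
# join is folded into the MERGE source subquery on T5.TCU = t.FC.
merge_sql = """
MERGE INTO cond_azure_try_delta AS kct
USING (
  SELECT T5.TCU, T5.fc, T5.UKUSA
  FROM VW_TCU_delta AS T5
  INNER JOIN tcu_delta AS t ON T5.TCU = t.FC
) AS src
ON kct.WRS = src.TCU AND kct.ABC_konwa = src.fc
WHEN MATCHED AND kct.marked = '1' THEN
  UPDATE SET ABC_basevalue = kct.ABC_KBETR * ABS(src.UKUSA)
"""
# spark.sql(merge_sql)  # run inside a Databricks/Spark session
```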
Hi @sreekanth kesaram, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell u...
No matter what size of GPU cluster I create, the CUDA total capacity is always ~16 GB. Does anyone know what the issue is? The code I use to get the total capacity: torch.cuda.get_device_properties(0).total_memory
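A common explanation (an assumption here, since the post doesn't name the instance type) is that cluster size changes the number of workers and GPUs, not the memory of a single card: device `0` is one GPU (e.g. a 16 GB V100), so its `total_memory` is constant. A sketch that enumerates every visible device instead, guarded so it also runs on a machine without CUDA or PyTorch:

```python
def bytes_to_gib(n):
    """Convert a byte count to GiB, rounded to 2 decimal places."""
    return round(n / 1024**3, 2)

try:
    import torch
    if torch.cuda.is_available():
        # Report each GPU separately rather than only device 0
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {props.name}, {bytes_to_gib(props.total_memory)} GiB")
    else:
        print("CUDA not available on this machine")
except ImportError:
    print("torch not installed")
```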
Hi @Simon Zhang, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...
I'm using the Azure Event Hubs Connector https://github.com/Azure/azure-event-hubs-spark to connect to an Event Hub. When I install this library from Maven, everything works; I can access lib classes using the JVM: connection_string = "<connection_string>"
s...
Hi @blackcoffee AR, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...
For a KPI dashboard, we need to know the exact size of the data in a catalog and also all schemas inside the catalogs. What is the best way to do this? We tried to iterate over all tables and sum the sizeInBytes using the DESCRIBE DETAIL command for ...
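Iterating over the tables and summing `sizeInBytes` from `DESCRIBE DETAIL` is a workable approach (noting that it reflects the current Delta snapshot, not older versions still retained in storage). A sketch of the aggregation, with the workspace-only Spark calls commented out and hypothetical catalog/schema names (`my_catalog.my_schema`):

```python
def total_size_bytes(details):
    """Sum sizeInBytes over DESCRIBE DETAIL rows; treat missing values as 0."""
    return sum(row.get("sizeInBytes") or 0 for row in details)

# On Databricks (sketch, requires a workspace):
# details = []
# for t in spark.sql("SHOW TABLES IN my_catalog.my_schema").collect():
#     row = spark.sql(f"DESCRIBE DETAIL my_catalog.my_schema.{t.tableName}").first()
#     details.append(row.asDict())
# print(total_size_bytes(details))

# Illustration with hypothetical rows:
sample = [{"sizeInBytes": 1024}, {"sizeInBytes": 2048}, {"sizeInBytes": None}]
print(total_size_bytes(sample))  # 3072
```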
Hi @Anant Pingle, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...
Hello, may I know when I can expect the instructor-led Advanced Data Engineering with Databricks course for the Databricks Certified Data Engineer Professional to be available in 2023? Or any references to prepare would be great.
@sarath endluri I saw an ILT training through my Partners Academy, scheduled for April 24-27. Today I got an e-mail saying it was canceled because they needed to update the content. I asked for further info and was told it's going to be a while...
I am consuming an IoT stream with thousands of different signals using Structured Streaming. During processing of the stream, I need to know the previous timestamp and value for each signal in the micro batch. The signal stream is eventually written ...
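The per-signal "previous value" is essentially a lag over a window partitioned by signal and ordered by timestamp; since plain window functions aren't supported on streaming DataFrames, this is typically done with a stateful operator (e.g. `applyInPandasWithState`) or computed inside each micro batch via `foreachBatch`. The core lag logic, sketched in pure Python over a hypothetical batch of `(signal, ts, value)` records:

```python
def with_previous(batch):
    """For each (signal, ts, value) record, attach the previous (ts, value)
    of the same signal within the batch (None for the first record)."""
    last = {}   # signal -> (ts, value)
    out = []
    for signal, ts, value in sorted(batch, key=lambda r: (r[0], r[1])):
        out.append((signal, ts, value, last.get(signal)))
        last[signal] = (ts, value)
    return out

batch = [("temp", 1, 20.0), ("temp", 2, 21.5), ("rpm", 1, 900)]
for row in with_previous(batch):
    print(row)
```

In a real stateful implementation, the `last` dict would live in the per-group state store so the previous value survives across micro batches.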
@Suteja Kanuri Tried the above on a streaming DF, but facing the below error: AttributeError: 'DataFrame' object has no attribute 'groupByKey'. Can you please let me know the DBR runtime
Hi @Rituparna Das, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
I know scheduling options are there, but they don't consider interdependency, like "don't execute job2 unless job1 has been executed". I know script-based or API-based triggers are also there, but I am looking for UI-based triggers, like how we can orch...
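Databricks Workflows (multi-task jobs) supports exactly this in the UI: tasks within one job can declare "Depends on" relationships, so the second task only runs after the first succeeds. The same dependency expressed as a Jobs API 2.1 payload (the job name, task keys, and notebook paths below are hypothetical):

```json
{
  "name": "pipeline_with_dependency",
  "tasks": [
    { "task_key": "job1_task", "notebook_task": { "notebook_path": "/Jobs/job1" } },
    {
      "task_key": "job2_task",
      "depends_on": [ { "task_key": "job1_task" } ],
      "notebook_task": { "notebook_path": "/Jobs/job2" }
    }
  ]
}
```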
Hi @satyam rastogi, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
I want to make a connection to Databricks with KNIME. To do this I am using the "Create Databricks Environment" node. I have made the following configuration:
1. Installed the Databricks Simba JDBC driver
2. Made the necessary configuration in Create Databric...
Hi @Geethanjali Nataraj, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell...
Using Auto Loader, I'm reading daily data partitioned by well. The data has a specific schema, but if there's no value for a column, it isn't present in the JSON. For a specific column on a specific table I'm getting an error like: Cannot convert long ...
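When a column is absent from early files, schema inference may settle on one type and then hit a long-vs-string (or similar) conflict when later files disagree. Auto Loader supports pinning a column's type up front via the `cloudFiles.schemaHints` option; a sketch, where `my_column` and the input path are hypothetical placeholders:

```python
# Sketch: Auto Loader options pinning the ambiguous column's type.
# "my_column" and "/mnt/raw/wells/" are hypothetical placeholders.
autoloader_options = {
    "cloudFiles.format": "json",
    # schema hints stop inference from flipping between long and string
    "cloudFiles.schemaHints": "my_column STRING",
}

# On Databricks (requires a workspace):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/mnt/raw/wells/"))
```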
Hi @Jordan Fox, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...
Hi everyone!
I would like to know how Spark stops the connection when reading from a SQL database using the JDBC format.
Also, if there is a way to check when the connection is active, or to manually stop it, I would also like to know.
Thank you in advan...
Hi @João Peixoto, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...
Hi team, our application for partner onboarding was processed a year ago. The last email we received was from a Procurement Operations Specialist confirming our information had been added to the system. However, we never received any advice on the next...
Hi @Siddhesh Gaikwad, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...
I have mounted my path from Databricks to Azure ADLS Gen1 using an SPN as the service account. Until yesterday everything was OK, but today I see I can view all older deleted folders. I cannot see them in ADLS, but my Databricks dbutils.fs.ls() shows them....
Hi @pankaj bhatt, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...