Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Objective: Within the context of a Delta Live Table, I'm trying to merge two streaming aggregations, but I'm running into challenges. Is it possible to achieve such a join? Context: Assume - table trades stores a list of trades with their associated timestamps - tabl...
Hello, this is my first post here and I am a total beginner with Databricks and Spark. Working on an IoT cloud project with Azure, I'm looking to set up continuous stream processing of data. A current architecture already exists thanks to Stream Ana...
So the Event Hub creates files (JSON/CSV) on ADLS. You can read those files into Databricks with the `spark.read.csv` or `spark.read.json` method. If you want to read many files in one go, you can use wildcards, e.g. spark.read.json("/mnt/datalake/bronze/directory/*/*...
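To illustrate what the wildcard path buys you: each `*` matches any name at one directory level, so one pattern picks up files from many subfolders. The sketch below is only a local, stdlib analogue using Python's `glob` module, not how Spark resolves paths internally. The folder layout, the `read_json_lines` helper and the file names are hypothetical. It reads JSON Lines (one object per line), which is also what Spark's JSON reader expects by default unless the `multiLine` option is set.

```python
import glob
import json
import os
import tempfile


def read_json_lines(pattern):
    """Return all records from JSON Lines files whose paths match a glob pattern."""
    records = []
    for path in sorted(glob.glob(pattern)):  # '*' matches within a single path segment
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records


# Hypothetical bronze layout: <root>/<device>/part0.json
root = tempfile.mkdtemp()
for device, value in [("dev1", 1), ("dev2", 2)]:
    os.makedirs(os.path.join(root, device))
    with open(os.path.join(root, device, "part0.json"), "w", encoding="utf-8") as fh:
        fh.write(json.dumps({"device": device, "value": value}) + "\n")

# Same shape as the wildcard path in the post: one '*' per directory level
rows = read_json_lines(os.path.join(root, "*", "*.json"))
print(rows)
```

A pattern with too few `*` levels (e.g. only `root/*.json` here) simply matches nothing, which is a common reason a wildcard read comes back empty.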
I passed my Data Engineer Associate exam, but I am unable to see or download my certificate on credentials.databricks.com. I am using the same email as the one on Kryterion (webassessor.com/databricks). I can log in to Kryterion and see that I have passed the exam.
September 2022 Featured Member Interview: Aman Sehgal (@AmanSehgal). Pronouns: He/Him. Company: CyberCX. Job Title: Senior Data Engineer. Q: Could you give a brief description of your professional journey to date? A: I started my career as a software develope...
Thank you @Lindsay Olson and @Christy Seto for interviewing me and nominating me as this month's featured member. It's a pleasure to be a member of the Databricks community, and I'm looking forward to contributing more in the future. To all the community members...
I'm trying to create a Delta Live Table on top of JSON files placed in Azure Blob Storage. The JSON files contain whitespace in their column names. Instead of renaming the columns, I tried the `columnMapping` table property, which let me create the table with spaces, but the column ...
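If renaming turns out to be the simpler route, a small helper can normalize the names before the table is created. The sketch below assumes the invalid-character set that Delta Lake reports when column mapping is off (space and ` ,;{}()\n\t=`, to the best of my knowledge); `sanitize_column_name` is a hypothetical helper, not a Databricks API.

```python
import re

# Assumed set of characters Delta Lake rejects in column names without column mapping
INVALID_DELTA_CHARS = " ,;{}()\n\t="
_INVALID_RUN = re.compile("[" + re.escape(INVALID_DELTA_CHARS) + "]+")


def sanitize_column_name(name):
    """Trim surrounding whitespace, then replace each run of invalid characters with '_'."""
    return _INVALID_RUN.sub("_", name.strip())


print([sanitize_column_name(c) for c in ["Customer Name", "Unit Price ", "order (id)"]])
```

In PySpark the same function could be applied to every top-level column at once with `df.toDF(*[sanitize_column_name(c) for c in df.columns])`. The sketch does not touch nested struct fields and does not resolve collisions (e.g. `a b` and `a_b` both become `a_b`).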
I have a new (bronze) table that I want to write to. The initial table load (refresh) CSV file is placed in folder a, and the incremental change (inserts/updates/deletes) CSV files are placed in folder b. I've written a notebook that can load one OR t...
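The pattern described here (one full snapshot plus a stream of inserts, updates and deletes) boils down to replaying the changes, in order, on top of the snapshot. On Delta this is what a `MERGE INTO` or Delta Live Tables' `APPLY CHANGES INTO` handles; the pure-Python sketch below only illustrates the logic. The change-record shape (`op`, `id`, `seq`, `data`) and the `apply_changes` function are hypothetical.

```python
def apply_changes(snapshot, changes):
    """Replay insert/update/delete records, in 'seq' order, on top of an initial snapshot.

    snapshot: dict mapping primary key -> row dict (the initial full load, folder a)
    changes:  iterable of dicts with hypothetical keys 'op' ('I', 'U' or 'D'),
              'id' (primary key), 'seq' (ordering column) and 'data' (row values)
    """
    table = dict(snapshot)  # copy so the caller's snapshot dict is not modified
    # sorted() is stable, so records with equal 'seq' keep their input order
    for change in sorted(changes, key=lambda c: c["seq"]):
        key = change["id"]
        if change["op"] in ("I", "U"):
            table[key] = change["data"]  # upsert: insert a new key or overwrite an existing one
        elif change["op"] == "D":
            table.pop(key, None)  # delete if present; ignore keys that are not there
        else:
            raise ValueError(f"unknown op: {change['op']!r}")
    return table


# Hypothetical initial load plus out-of-order incremental changes
initial = {1: {"name": "alpha"}, 2: {"name": "beta"}}
changes = [
    {"op": "D", "id": 2, "seq": 2, "data": None},
    {"op": "U", "id": 1, "seq": 1, "data": {"name": "alpha v2"}},
    {"op": "I", "id": 3, "seq": 3, "data": {"name": "gamma"}},
]
current = apply_changes(initial, changes)
```

Sorting by a sequence column matters because incremental files do not necessarily arrive in the order the source changes happened.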
When I try to convert a notebook into a job, I frequently run into an issue with writing to the local filesystem. For this particular example, I did all my notebook testing with a byte stream for small files. When I tried to run it as a job, I used the me...
I was able to fix it. It was an issue with the nested files on the SFTP server: I had to ensure that the parent folders were being created as well. Splitting out the local path and the file name made it easier to ensure that the path existed with os.path.exists() and os.m...
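The fix described here (creating the parent folders before writing a nested file) can be done with the standard library alone. A minimal sketch, assuming a hypothetical `write_bytes_safely` helper and a made-up nested path that mirrors a remote folder structure; `os.makedirs(..., exist_ok=True)` creates every missing parent and does not fail if they already exist.

```python
import os
import tempfile


def write_bytes_safely(local_path, payload):
    """Write bytes to local_path, creating any missing parent directories first."""
    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if the folders already exist
    with open(local_path, "wb") as fh:
        fh.write(payload)
    return local_path


# Hypothetical nested path mirroring a remote SFTP folder structure
base = tempfile.mkdtemp()
target = os.path.join(base, "inbound", "2022", "08", "file.csv")
write_bytes_safely(target, b"id,value\n1,42\n")
```

Using `exist_ok=True` avoids a separate existence check and the race where another process creates the folder between the check and the `makedirs` call.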
Hello, I am trying to use the Metrics and Ganglia UI to better monitor the state of my clusters, but the visuals are not loading. I have tried opening them in Chrome and Microsoft Edge, and both show the same result. Is there something that I need to inst...
I don't know exactly what the issue was, but it seems to be related to some kind of network security. Apparently, my IT team had set up a separate VM and made changes so that this specific VM could use Ganglia. I ended up RDP-ing into ...
I am running a Delta Live Tables pipeline that explodes JSON docs into small Delta Live Tables. The docs can receive multiple updates over the lifecycle of the transaction. I am curating the data via a medallion architecture. When I run an API /update with {"...
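For readers unfamiliar with the "explode" step mentioned here: it turns one document holding an array into one row per array element, copying the parent key onto each row. The stdlib sketch below only mirrors the semantics of Spark's `explode` (not `explode_outer`, so empty or missing arrays yield no rows). The document shape, field names and the `explode_docs` helper are hypothetical.

```python
def explode_docs(docs, array_field, key_field):
    """Yield one flat row per element of each doc's array, copying the parent key.

    Like Spark's explode (not explode_outer), a doc whose array is missing,
    null or empty produces no rows. Array elements are assumed to be dicts.
    """
    for doc in docs:
        for element in doc.get(array_field) or []:
            row = {key_field: doc[key_field]}
            row.update(element)
            yield row


# Hypothetical transaction documents with a nested 'lines' array
docs = [
    {"doc_id": "A", "lines": [{"sku": "x", "qty": 1}, {"sku": "y", "qty": 2}]},
    {"doc_id": "B", "lines": []},
]
rows = list(explode_docs(docs, "lines", "doc_id"))
```

Note that when an updated version of a document is exploded again, all of its child rows are emitted again; carrying the parent key, as the sketch does, is what allows a later step to replace or deduplicate them.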
Hey there @Danny Aguirre, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!
Hi, I'm quite new here. I'm trying to deploy a Python file with the dbx command. The file depends on libraries that need to be installed. How can I deploy the file (together with its dependencies) to Databricks? Here are the commands I currently run: `db...
Hi @Di Lin, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
When I created a cluster on a newly deployed Azure Databricks workspace, it did not start and gave the message below: "Bootstrap Timeout" Please try again later, Instance bootstrap Timeout Failure message: Bootstrap script took too long and timeout. please try a...
Hi @Bin Ep, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!
The code works fine when data greater than the target date (>) is selected: SELECT xyz.ID, xyz.Gender, xyz.geography, xyz.code, xyz.delivery_status, abc.department_code FROM v.table1 as xyz left join y.table2 as abc on xyz.ID = abc.ID AND xyz.code = abc.cod...
Hi @Rishabh Shankar, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...
Hi, I'm receiving an error while logging in or signing up on Community Edition (CE): Error 503 first byte timeout / first byte timeout / Error 54113 / Details: cache-bom4725-BOM 1659706650 579764974 / Varnish cache server. (Screenshot attached.) Thanks, any help is appreciated from t...
Hi @Ghazanfar Uruj, does @Prabakar Ammeappin's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you. We'd love to hear from you. Thanks!
The schedule button isn't saving my schedule information for a Databricks SQL query. After I hit save and open the schedule again, it has reverted to 'Never'. According to the past executions pane, the query itself is not running according to the sched...
Hi @Wally Plourde, does @Rohit Rajendran's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!