Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I'm very new to Terraform. Currently, I'm trying to automate the service principal setup process using Terraform. Following this example, I successfully created a service principal and an access token. However, when I tried adding databricks_git_cr...
I have some Python code which takes parquet files from an ADLS Gen2 location and merges them into Delta tables (run as a scheduled workflow job). I have a try/except wrapper around this so that any files that fail get moved into a failed folder using dbu...
That's the problem - it's not being locked (or fs.mv() isn't checking/honoring the lock). The upload process/tool is a third-party external tool. I can see via the upload tool that the file upload is 'in progress'. I can also see the 0-byte destination file...
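The loop described in this thread might look roughly like the sketch below. The paths, target table, and join key are hypothetical, and `spark`/`dbutils` are assumed to be the objects a Databricks notebook provides.

```python
from delta.tables import DeltaTable

# Hypothetical locations and target table; spark and dbutils come from the notebook context.
incoming_dir = "abfss://container@account.dfs.core.windows.net/incoming/"
failed_dir = "abfss://container@account.dfs.core.windows.net/failed/"
target = DeltaTable.forName(spark, "bronze_table")

for file_info in dbutils.fs.ls(incoming_dir):
    try:
        src = spark.read.parquet(file_info.path)
        (target.alias("t")
               .merge(src.alias("s"), "t.id = s.id")  # hypothetical join key
               .whenMatchedUpdateAll()
               .whenNotMatchedInsertAll()
               .execute())
    except Exception as e:
        # Move the offending file aside so the rest of the batch keeps flowing;
        # this is exactly the step that can race with a still-in-progress upload.
        dbutils.fs.mv(file_info.path, failed_dir + file_info.name)
        print(f"Moved {file_info.name} to the failed folder: {e}")
```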
If you are using 'delta.columnMapping.mode' = 'name' on your table, I could not get it to work without that line. For the not-matched clause: WHEN NOT MATCHED THEN INSERT (columnname, columnName2) VALUES (columnname, columnName2) WHEN MATCHED THEN UPDAT...
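Written out, the explicit-column MERGE pattern this reply describes might look like the sketch below, run from a notebook; the table names target_tbl/source_tbl and the key column id are hypothetical, while columnname/columnName2 come from the post.

```python
# Hypothetical tables and key; mirrors the explicit column lists used when
# 'delta.columnMapping.mode' = 'name' is set on the target table.
spark.sql("""
    MERGE INTO target_tbl AS t
    USING source_tbl AS s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET
      t.columnname  = s.columnname,
      t.columnName2 = s.columnName2
    WHEN NOT MATCHED THEN INSERT (id, columnname, columnName2)
      VALUES (s.id, s.columnname, s.columnName2)
""")
```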
Platform: AWS Databricks; enabled "Turn on the new, updated code editor" in Notebook Settings; macOS 12.5.1; Firefox 104.0.2. When I attempt to drag a notebook cell to move it, the tab crashes and causes my computer to run out of memory. I profiled the tab to...
Check out some of the questions from fellow users during our last Office Hours. All these questions were answered live by a Databricks expert! Q: What's the best way of using a UDF in a class? A: You need to define your class and then register the func...
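A minimal sketch of the define-then-register approach from that answer, using a hypothetical TextCleaner class:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

class TextCleaner:  # hypothetical class
    def clean(self, value):
        return value.strip().lower() if value is not None else None

cleaner = TextCleaner()

# Wrap the bound method for DataFrame use, and register it for SQL use.
clean_udf = F.udf(cleaner.clean, StringType())
spark.udf.register("clean_text", cleaner.clean, StringType())

df = spark.createDataFrame([("  Hello ",), ("  WORLD  ",)], ["raw"])
df.select(clean_udf("raw").alias("cleaned")).show()
```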
Objective: within the context of a Delta Live Table, I'm trying to merge two streaming aggregations, but I run into challenges. Is it possible to achieve such a join? Context: assume - table trades stores a list of trades with their associated timestamps - tabl...
Hello, this is my first post here and I am a total beginner with Databricks and Spark. Working on an IoT cloud project with Azure, I'm looking to set up continuous stream processing of data. A current architecture already exists thanks to Stream Ana...
So the event hub creates files (JSON/CSV) on ADLS. You can read those files into Databricks with the spark.read.csv/json method. If you want to read many files in one go, you can use wildcards, e.g. spark.read.json("/mnt/datalake/bronze/directory/*/*...
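As a sketch of that answer (the mount point and the exact wildcard pattern are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

# Wildcards let a single read pick up every matching file under the hypothetical mount point.
df = spark.read.json("/mnt/datalake/bronze/directory/*/*.json")

# The CSV variant would be spark.read.csv(..., header=True) against the same kind of pattern.
df.printSchema()
df.show(5)
```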
I passed my DE associate exam, but I am unable to see/download my certificate on credentials.databricks.com. I am using the same email as the one on Kryterion at webassessor.com/databricks. I can log into Kryterion and see that I have passed the exam.
September 2022 Featured Member Interview Aman Sehgal - @AmanSehgal Pronouns: He, Him Company: CyberCX Job Title: Senior Data Engineer Could you give a brief description of your professional journey to date? A. I started my career as a software develope...
Thank you @Lindsay Olson and @Christy Seto for interviewing me and nominating me as this month's featured member. It's a pleasure to be a member of the Databricks Community and I'm looking forward to contributing more in the future. To all the community members...
I'm trying to create a Delta Live Table on top of JSON files placed in Azure Blob. The JSON files contain white spaces in column names; instead of renaming them, I tried the `columnMapping` table property, which let me create the table with spaces, but the column ...
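A minimal DLT sketch of the approach being tried, assuming a hypothetical blob path and table name; the table property is the `columnMapping` setting mentioned in the post.

```python
import dlt

@dlt.table(
    name="bronze_events",  # hypothetical table name
    table_properties={"delta.columnMapping.mode": "name"}  # lets column names keep their spaces
)
def bronze_events():
    # spark is provided by the DLT runtime; the landing path is hypothetical.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/blob/landing/json/")
    )
```

The renaming route the post mentions as the alternative would instead call something like `.withColumnRenamed("column name", "column_name")` on the returned DataFrame and skip the table property.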
I have a new (bronze) table that I want to write to - the initial table load (refresh) CSV file is placed in folder a, and the incremental change (insert/update/delete) CSV files are placed in folder b. I've written a notebook that can load one OR t...
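One way to parameterize the same notebook for both folders, sketched with hypothetical paths, a hypothetical load_type widget, and a hypothetical bronze_table target:

```python
# spark and dbutils come from the notebook context; paths and names are hypothetical.
load_type = dbutils.widgets.get("load_type")  # "initial" or "incremental"
source_path = "/mnt/landing/folder_a/" if load_type == "initial" else "/mnt/landing/folder_b/"

df = spark.read.option("header", "true").csv(source_path)

if load_type == "initial":
    # Full refresh: overwrite the bronze table from the one-off load file.
    df.write.format("delta").mode("overwrite").saveAsTable("bronze_table")
else:
    # Incremental: land the change files in bronze; a MERGE would be used instead
    # if updates/deletes need to be applied rather than simply appended.
    df.write.format("delta").mode("append").saveAsTable("bronze_table")
```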
When I try to convert a notebook into a job I frequently run into an issue with writing to the local filesystem. For this particular example, I did all my notebook testing with a bytestream for small files. When I tried to run it as a job, I used the me...
I was able to fix it. It was an issue with the nested files on the SFTP. I had to ensure that the parent folders were being created as well. Splitting out the local path and file made it easier to ensure that it existed with os.path.exists() and os.m...
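The fix described above might look roughly like this, with a hypothetical local staging path and file name:

```python
import os

# Hypothetical local staging directory mirroring the nested SFTP layout.
local_dir = "/tmp/sftp_staging/outbound/2022/09"
file_name = "export.csv"
local_path = os.path.join(local_dir, file_name)

# Create the parent folders first; on a job cluster they won't exist yet,
# which is what made the write fail when the notebook ran as a job.
if not os.path.exists(local_dir):
    os.makedirs(local_dir, exist_ok=True)

payload = b"id,value\n1,42\n"  # placeholder bytes standing in for the real file content
with open(local_path, "wb") as f:
    f.write(payload)
```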
Hello, I am trying to use the Metrics and Ganglia UI to better monitor the state of my clusters, but I am seeing that the visuals are not coming up. I have tried opening it in Chrome and Microsoft Edge, and it shows the same. Is there something that I need to inst...
I don't know exactly what the issue was, but it seems to be related to some kind of network security. Apparently, my IT team had set up a separate VM and made the changes for that specific VM to be able to use Ganglia from there. I ended up RDP-ing into ...
I am running a Delta Live Tables pipeline that explodes JSON docs into small Delta Live Tables. The docs can receive multiple updates over the lifecycle of the transaction. I am curating the data via a medallion architecture. When I run an API /update with {"...
Hey there @Danny Aguirre! Does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!