Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a DLT pipeline that reads data in S3 into an append-only bronze layer using Autoloader. The data sink needs to be changed to a new S3 bucket in a new account, and the data in the existing S3 bucket migrated to the new one. Will Autoloader still be ...
Hi SamAdams, how are you doing today? As per my understanding, you're on the right track here! When you change the S3 path for Autoloader, even if the files are exactly the same and just copied from the old bucket, Autoloader will treat them as new f...
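If it helps to see the behavior, here is a minimal PySpark sketch of an Auto Loader read against the migrated bucket; the bucket paths, schema/checkpoint locations, and target table are all hypothetical. Because Auto Loader tracks ingested files per checkpoint (and by path), pointing a fresh checkpoint at the new bucket will re-ingest every copied file unless you explicitly skip existing files:

```python
# Hypothetical Auto Loader read on the new bucket. Auto Loader records
# which file paths it has processed in the checkpoint, so copies of old
# files in a new bucket look brand new to a fresh stream.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Skip files already present at stream start (i.e., the migrated
    # history) and only pick up new arrivals:
    .option("cloudFiles.includeExistingFiles", "false")
    .option("cloudFiles.schemaLocation", "s3://new-bucket/_schemas/bronze")
    .load("s3://new-bucket/landing/")
)

(
    df.writeStream
    .option("checkpointLocation", "s3://new-bucket/_checkpoints/bronze")
    .toTable("bronze.events")
)
```

Whether skipping existing files is right depends on whether the bronze table already holds that history; if not, letting the backfill re-run and deduplicating downstream is the safer route.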
We have a Databricks job that will aggregate some data and create some data tables. This needs to be exported in PDF format. I have seen a few Python libraries that can generate PDFs, but was wondering if the PDF can be generated and dropped in a...
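One possible pattern (a sketch, not the only approach) is to render the aggregates with a Python PDF library such as fpdf2 and write the file to a Unity Catalog Volume, which jobs can reach as a normal filesystem path. The table, column, and Volume names below are made up:

```python
# Hypothetical sketch: turn an aggregated table into a simple PDF report
# with fpdf2 (pip install fpdf2) and drop it on a UC Volume.
from fpdf import FPDF

rows = spark.table("main.reports.daily_agg").limit(50).collect()  # made-up table

pdf = FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=10)
for r in rows:
    pdf.cell(0, 8, f"{r['metric']}: {r['value']}")  # made-up column names
    pdf.ln()

# Volumes appear as regular driver-local paths, so a plain write suffices.
pdf.output("/Volumes/main/reports/exports/daily_agg.pdf")
```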
Hi, I am trying to create a catalog and database in Databricks and it's not allowing me to; please suggest. Here is my code:
base_dir = "/mnt/files"
spark.sql(f"CREATE CATALOG IF NOT EXISTS dev")
spark.sql(f"CREATE DATABASE IF NOT EXISTS dev.demo_db")
First I ne...
I got a similar error trying to create a catalog with the "databricks.sdk" library. I resolved it by adding the "storage_root" parameter: w.catalogs.create(name=c.name, storage_root='s3://databricks-workspace-bucket/unity-catalog/426335709'). In my case all catalog...
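For reference, a minimal sketch with the Python SDK (the bucket path here is a made-up placeholder); without a metastore-level default storage root, catalog creation needs storage_root passed explicitly:

```python
# Hypothetical sketch using databricks-sdk; the S3 path is a placeholder.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # auth picked up from env vars or a config profile

w.catalogs.create(
    name="dev",
    storage_root="s3://my-uc-bucket/unity-catalog/dev",  # placeholder bucket
)
```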
Hi, One of our clients is asking to switch from our API feed to have weather data delivered automatically to their Cloud Storage. What steps do I need to take from my end? Do I need to join Databricks to do so? Thanks. Tom
I'm trying to install Maven libraries on the job cluster (non-interactive cluster) as part of a Databricks workflow. I've added the context in the cluster configuration as part of deployment, but I can't find the same in the post-deployment configurati...
I found the workaround. Below are the steps (a payload sketch follows the list):
1. Add the required library to the Allowed list at the workspace level (requires workspace/metastore admin access); you might need the coordinates groupId:artifactId:version.
2. At the task level, include under De...
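Under my reading of step 2, the task-level entry ends up as a libraries block on the task, equivalent to this Jobs API payload fragment (the coordinates, notebook path, and keys below are made up):

```python
# Hypothetical task spec fragment showing a Maven dependent library;
# this is what the task-level "Dependent libraries" field maps to.
task_spec = {
    "task_key": "ingest",                                          # made up
    "notebook_task": {"notebook_path": "/Workspace/jobs/ingest"},  # made up
    "job_cluster_key": "main",                                     # made up
    "libraries": [
        {"maven": {"coordinates": "com.example:my-connector_2.12:1.2.3"}}
    ],
}
```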
Hi all, I am able to deploy Databricks assets to the target workspace. Jobs and workflows can also be created successfully. But I have a special requirement: I need to copy the notebooks to the target folder on the Databricks workspace. Example: on local I have...
Please help me configure/choose the cluster configuration. I need to process and merge 6 million records into Azure SQL DB. At the end of the week, 9 billion records need to be processed and merged into Azure SQL DB, and a few transformations need to...
It will depend on the transformations and how you're loading them. Assuming it's mostly in Spark, I recommend starting small, using a job compute cluster with autoscaling enabled for cost efficiency. For daily loads (6 million records), a driver and 2...
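As a concrete starting point (a sketch only; the node type and DBR version below are placeholders to tune against your actual transformations), the daily load could begin with something like this Jobs API cluster spec:

```python
# Placeholder autoscaling job-cluster spec for the daily 6M-record load;
# widen max_workers for the weekly 9B-record run after profiling.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",               # placeholder DBR
    "node_type_id": "Standard_E8ds_v5",                # placeholder Azure VM
    "autoscale": {"min_workers": 2, "max_workers": 8},
}
```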
Hi everyone, I'm new to Databricks and working on the "Data Ingestion with Delta Lake" course. I encountered a permission error with the following query. Can anyone help with this? Thanks!
Hello @walgt!
Apologies for the inconvenience. This was a known issue, but it has now been fixed! You should now be able to run your query without any problems.
Thanks for your patience!
Hi, I am using Auto Loader to fetch some records stored in two files. Please see my code below. It fetches records from the two files correctly and then starts fetching NULL records. I attach option("cleanSource", ) to readStream. But it is ...
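For comparison, here is a sketch of what a cleanSource setup usually looks like (paths are hypothetical). Note that cleanSource only moves or deletes files after they are fully processed and a retention window passes, so it should not by itself produce NULL rows; those more often indicate a mismatch between the stream's schema and the file contents:

```python
# Hypothetical Auto Loader stream with source cleanup enabled.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/demo")
    # Move processed files out of the landing prefix instead of deleting:
    .option("cleanSource", "MOVE")
    .option("cleanSource.moveDestination", "s3://bucket/processed/")
    .load("s3://bucket/landing/")
)
```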
Hi all, I am managing one project in Databricks, with one more coming soon. Can anyone guide me on how to use Unity Catalog or any other method for this?
Hi Team, we updated our clusters' DBR version and later found that some of our jobs started failing. Now we want to revert to the previous DBR version, but we forgot which DBR version the jobs were running fine on. Is there any way ...
Hey @ayushmangal72, try using the Databricks Jobs Runs API (/api/2.2/jobs/runs/list) to fetch older run IDs for the job. Once you have the run_id, make a request to /api/2.2/jobs/runs/get. You'll be able to find the DBR version in the API r...
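A rough sketch of that flow in Python (host, token, and job_id are placeholders; exactly where spark_version appears in the response depends on whether the job uses shared job clusters or per-task clusters):

```python
# Hypothetical sketch: walk recent runs of a job and print the DBR
# (spark_version) each run's job clusters were configured with.
import requests

host = "https://<workspace-host>"              # placeholder
headers = {"Authorization": "Bearer <token>"}  # placeholder

runs = requests.get(
    f"{host}/api/2.2/jobs/runs/list",
    headers=headers,
    params={"job_id": 123, "limit": 25},       # placeholder job_id
).json()

for run in runs.get("runs", []):
    detail = requests.get(
        f"{host}/api/2.2/jobs/runs/get",
        headers=headers,
        params={"run_id": run["run_id"]},
    ).json()
    # For jobs with shared job clusters, the DBR shows up here:
    for jc in detail.get("job_clusters", []):
        print(run["run_id"], jc["new_cluster"].get("spark_version"))
```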
I am trying to run a job with (1) custom containers, and (2) via an instance pool. Here's the setup: the custom container is just the DBR-provided `databricksruntime/standard:12.2-LTS`; the instance pool is defined via the UI (see screenshot below). At ...
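For anyone comparing notes, my understanding is that the job's cluster spec would combine the two roughly as below (the pool ID is a placeholder), and that the Instance Pools API exposes a preloaded_docker_images field that pools used with containerized clusters generally need set:

```python
# Hypothetical job cluster spec mixing an instance pool with a custom
# container image; <pool-id> is a placeholder.
new_cluster = {
    "spark_version": "12.2.x-scala2.12",
    "instance_pool_id": "<pool-id>",
    "num_workers": 2,
    "docker_image": {"url": "databricksruntime/standard:12.2-LTS"},
}
```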
I have an issue with DAB where all the project files, starting from the root ., get deployed to the /files folder in the bundle. I would prefer to be able to deploy certain util notebooks, but not all the files of the project. I'm able to not deploy any ...