Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hey everyone, I'm using Auto Loader with Soda. I'm new to both. The idea is to ingest with quality checks into my silver table for every batch in a continuous ingestion. I tried to configure Soda as a str just like the docs show, but it seems that it keeps on t...
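A common pattern for this (not from the original post) is to run the checks inside foreachBatch, so every micro-batch of the continuous stream is validated before it lands in the silver table. A minimal sketch, assuming a Databricks notebook where spark is available; run_quality_checks, the table name, and the paths are hypothetical placeholders for the poster's Soda configuration:

from pyspark.sql import DataFrame

def run_quality_checks(batch_df: DataFrame) -> None:
    # Hypothetical placeholder: execute the configured Soda scan (or any other
    # check) against batch_df and raise or log if the checks fail.
    pass

def process_batch(batch_df: DataFrame, batch_id: int) -> None:
    # Validate the micro-batch first, then append it to the silver table.
    run_quality_checks(batch_df)
    batch_df.write.mode("append").saveAsTable("silver.events")  # hypothetical target table

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/checkpoints/silver_events_schema")  # hypothetical
    .load("/Volumes/landing/events")  # hypothetical source path
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/Volumes/checkpoints/silver_events")  # hypothetical
    .start())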
Hello, I am facing an issue with my workflow. I have a job (call it the main job) that, among other things, runs 5 concurrent tasks, which are defined as jobs (not notebooks). Each of these jobs is identical to the others (call them sub-job-1), with the only diff...
Hi all, just wanted to raise a question regarding Databricks workbooks and viewing the results in the cells. For the example provided in the screenshot, I want to view the results of an Excel formula that has been applied to a cell in our workbooks. Fo...
Hey guys, I've been looking for some docs on how Auto Loader manages a source outage. I am currently running the following code:
dfBronze = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .schema(json_schema_b...
Hi @sakuraDev, 1. Using the availableNow trigger to process all available data immediately and then stop the query. As you noticed, your data was processed once, and you now need to trigger the process again to pick up new files. 2. Changing the tr...
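For illustration, a minimal sketch of the two trigger options described above; the schema, paths, and table name are hypothetical stand-ins for the poster's, and only one of the two writeStream variants would be used at a time:

from pyspark.sql.types import StructType, StructField, StringType

bronze_schema = StructType([StructField("payload", StringType(), True)])  # stand-in schema

stream = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .schema(bronze_schema)
    .load("/mnt/landing/source"))  # hypothetical source path

# Option 1: process everything currently available, then stop; re-run to pick up new files.
(stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/bronze")  # hypothetical
    .trigger(availableNow=True)
    .toTable("bronze.events"))

# Option 2 (choose one, not both): keep the query running and poll for new files on an interval.
(stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/bronze")  # hypothetical
    .trigger(processingTime="5 minutes")
    .toTable("bronze.events"))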
We have to deliver a Databricks FinOps assessment project. I am trying to write a proposal for it. I haven't done one before. I have created a general process of how the assessment will look and then restructured it using GPT. Please give your feedba...
AI/BI Dashboards offer a robust solution for securely sharing visualizations and insights throughout your organization. You can easily share these dashboards with users within your Databricks workspace, across other workspaces in your organization, ...
Hi Rishabh, nice post! AI/BI Dashboards make it easy to share data securely within and across workspaces, even with view-only users. This way, everyone gets the right information while keeping things controlled. Excited to learn more about the key features! A...
Hi everyone, I am currently trying to enforce the following schema:
StructType([
    StructField("site", StringType(), True),
    StructField("meter", StringType(), True),
    StructField("device_time", StringType(), True),
    StructField("data", St...
Hi @sakuraDev, I'm afraid your assumption is wrong. Here you define the data field as a struct type, and the result is as expected. Once you have this column as a struct type, you can refer to nested objects using dot notation. So if you would like to get e...
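A small self-contained illustration of that dot notation; the nested field names value and unit are hypothetical, not taken from the post:

from pyspark.sql import functions as F

# Tiny example DataFrame with a struct column, mirroring the shape of the schema above.
df = spark.createDataFrame(
    [("site-1", "meter-A", ("42", "kWh"))],
    "site string, meter string, data struct<value:string, unit:string>",
)

df.select(
    "site",
    "meter",
    F.col("data.value").alias("value"),  # dot notation reaches into the struct
    F.col("data.unit").alias("unit"),
).show()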
Hi all, I need some help. Is there any way to trigger a Databricks SQL alert as an email notification to a group of users or individual users without the schedule option? We can add the email IDs in the destinations, but it will trigger an alert only if we s...
Hi there, can you provide a bit more detail? Why do you need email addresses if you don't send an alert? Are you trying to email when the job finishes, or do you want to send the results?
Hi all, I am looking for advice on the best approach when it comes to CI/CD in Databricks and repos in general. What would be the best approach: to have a main branch and branch off of it, or...? How will changes be propagated from dev to QA an...
Hi @Stellar, Setting up a robust CI/CD (Continuous Integration/Continuous Deployment) pipeline for Databricks involves thoughtful planning and adherence to best practices.
Let’s break down the key aspects:
Development Workflow:
Branching Strateg...
We are setting up new DLT pipelines using the DLT-Meta package. Everything goes well bringing our data in from Landing to our Bronze layer when we keep the onboarding JSON fairly vanilla. However, we are hitting an issue when using the cdc_app...
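For context, the CDC settings in a DLT-Meta onboarding JSON are, as far as I understand, translated into DLT's apply_changes API under the hood. A minimal sketch of that underlying call, with illustrative table, key, and sequence column names (this only runs inside a DLT pipeline, not as a standalone script):

import dlt

# Target streaming table that apply_changes will maintain.
dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers",      # a streaming table defined earlier in the pipeline
    keys=["customer_id"],           # key columns used to match incoming records
    sequence_by="event_timestamp",  # ordering column for late/out-of-order events
    stored_as_scd_type=1,           # 1 = overwrite in place, 2 = keep history
)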
I am trying to run a DLT job that uses GraphFrames, which is in the ML standard image. I am using it successfully in my job compute instances, but I'm running into problems trying to use it in a DLT job. Here are my overrides for the standard job c...
@Kaniz_Fatma - any chance I can get a definitive answer to this question? I know I can %pip install in DLT jobs, but GraphFrames requires a Maven-type install since it uses underlying Java/Scala modules/JAR files. A related question is whether there i...
I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...
Hi @ggsmith, if you use Delta Live Tables, then checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.
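For illustration, a minimal streaming DLT table defined with Auto Loader; the table name and source path are hypothetical. Note that no checkpoint location is set in the code, since the pipeline manages it under its storage location as described above:

import dlt

@dlt.table(name="orders_bronze", comment="Streaming ingest from cloud storage")
def orders_bronze():
    # No checkpointLocation or schemaLocation here; Delta Live Tables manages
    # these automatically for streaming tables in the pipeline.
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/landing/orders")  # hypothetical source path
    )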
I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to perform reads and writes on Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the below error, an...
@DBUser2 wrote: I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to perform reads and writes on Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the ...
Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fail to load, the entire pipeline fails even when there are no depende...
Thank you for sharing this, @Kaniz_Fatma. @dashawn, were you able to check Kaniz's docs? Do you still need help, or can you accept Kaniz's solution?
Hi all, I have a question regarding Workflows and queuing of job runs. I'm running into a case where jobs are running longer than expected, resulting in job runs being queued, which is expected and desired. However, in this particular case we only nee...