Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Has anyone else run into the issue where applying libraries through a compute policy just completely does not work? I'm trying to install some pretty basic Python libraries from PyPI (pytest and paramiko, for example) and it is failing on 13.3 and 14.3...
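A hedged workaround while the policy issue is investigated (this sidesteps the policy rather than fixing it): install the libraries notebook-scoped instead, for example

%pip install pytest paramiko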
Hello everyone, I'm very new to Delta Live Tables (and Delta Tables too), so please forgive me if this question has been asked here before. Some context: I have over 100M records stored in a Postgres table. I can connect to this table using the convent...
If I'm reading this right, you created a materialized view to prep your data in Postgres. You may not need to do that, and it will also limit your integration options. It puts more work on Postgres, usually creates more data to move, and will not as m...
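For reference, a minimal sketch of reading the Postgres table directly into a DLT table over JDBC; the connection details, table name, and partition column below are placeholders, and the numeric id column is assumed so the 100M-row read can be parallelised:

import dlt

@dlt.table(comment="Raw snapshot of the Postgres table, read over JDBC")
def postgres_raw():
    return (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://<host>:5432/<database>")  # placeholder connection
        .option("dbtable", "public.my_table")      # placeholder table name
        .option("user", "<username>")              # use a secret scope in practice
        .option("password", "<password>")
        .option("partitionColumn", "id")           # hypothetical numeric key
        .option("lowerBound", "1")
        .option("upperBound", "100000000")
        .option("numPartitions", "8")              # parallelise the large read
        .load()
    )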
Hello Team, I have a zip file in ADLS Gen 2. The folder I am using is mounted, and when I run the command dbutils.fs.ls(path) it lists all the files (including the zip I require). However, when I try to read the zip using the 'zipfile' module, it displays 'Fil...
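A likely cause, sketched under the assumption that the path passed to zipfile is the dbfs:/ URI returned by dbutils.fs.ls: Python's zipfile reads through the local filesystem, so a mounted path has to be addressed via the /dbfs FUSE prefix (mount and file name below are placeholders):

import zipfile

# dbutils.fs understands dbfs:/mnt/..., but plain Python I/O needs the /dbfs prefix
local_path = "/dbfs/mnt/mycontainer/archive.zip"

with zipfile.ZipFile(local_path) as zf:
    print(zf.namelist())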
I wanted to send variables from Control-M, which is used to call a Databricks job. The Databricks job is designed to call a notebook, and the notebook should use the attributes sent by Control-M. Can someone help me in this scenario o...
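One common pattern, sketched with a purely hypothetical parameter name: have Control-M pass the values as notebook parameters on the job, and read them in the notebook with widgets:

# Declare one widget per parameter the job passes in, then read its value
dbutils.widgets.text("run_date", "")
run_date = dbutils.widgets.get("run_date")
print(f"Received run_date={run_date}")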
I know I can make the workflow job retry automatically by adding the following properties in the YAML file: max_retries or min_retry_interval_millis. However, I cannot find similar attributes in any DLT pipeline document. When I ask Copilot it gives this ...
Hi @guangyi, in DLT you have the following two properties that you can set: pipelines.maxFlowRetryAttempts (type: int), the maximum number of attempts to retry a flow before failing a pipeline update when a retryable failure occurs. The default value is two. By...
Hi Community. I am an account admin but can't create a catalog on Databricks. Unity Catalog has been enabled. I don't even see the Create Catalog button. The assistant gave this advice; is it correct? To grant the necessary permissions, you can follow these ...
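For context, being an account admin alone typically isn't enough; catalog creation is governed by the CREATE CATALOG privilege on the metastore. A hedged sketch of the grant a metastore admin could run (the user name is a placeholder):

# Grant the privilege needed to create catalogs in this metastore
spark.sql("GRANT CREATE CATALOG ON METASTORE TO `user@example.com`")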
Hi, I created a view my_view in a schema project_schema in the Unity Catalog catalog catalog_dev; the view is a select * from a table my_table in my common_schema in the same catalog. I gave a service principal full grants on project_schema. It is an owner of the sc...
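A hedged sketch of the grants typically needed for the service principal to query the view (object names are taken from the post; the principal identifier is a placeholder). Note that with a view it is the view's owner, not the caller, who needs SELECT on the underlying my_table:

# Grants for the service principal to reach and read the view
spark.sql("GRANT USE CATALOG ON CATALOG catalog_dev TO `<sp-application-id>`")
spark.sql("GRANT USE SCHEMA ON SCHEMA catalog_dev.project_schema TO `<sp-application-id>`")
spark.sql("GRANT SELECT ON VIEW catalog_dev.project_schema.my_view TO `<sp-application-id>`")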
Hello, I would like to know the best way to insert Data Factory activity logs into my Databricks Delta table, so that I can use dashboards and build monitoring in Databricks itself. Can you help me? I would like, every 5 minutes, for all activity logs ...
How fancy do you want to go? You can send ADF diagnostic logs to an event hub and stream them into a Delta table in Databricks. Or you can send them to a storage account and build a workflow with a 5-minute interval that loads the storage blobs into...
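A sketch of the second option, assuming diagnostic settings already export the activity-run logs as JSON to a storage container; the container, storage account, checkpoint path, and target table below are placeholders. Scheduling the job every 5 minutes with trigger(availableNow=True) gives the requested cadence:

# Incrementally pick up new ADF diagnostic log files with Auto Loader
raw_logs = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/monitoring/schemas/adf_logs")
    .load("abfss://insights-logs-activityruns@<storage_account>.dfs.core.windows.net/")
)

(raw_logs.writeStream
    .option("checkpointLocation", "/Volumes/main/monitoring/checkpoints/adf_logs")
    .trigger(availableNow=True)   # process whatever is new, then stop
    .toTable("main.monitoring.adf_activity_logs"))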
Information in the post Speed Up Data Flow: Databricks and SAS | Databricks Blog led me to using the spark-sas7bdat package to read SAS files and save them to Delta for downstream processes, with great results. I was able to load very large files quickly that...
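For anyone landing here, a minimal sketch of that approach (file path and target table are placeholders), assuming the spark-sas7bdat package is attached to the cluster:

# Read a SAS dataset with the spark-sas7bdat data source and persist it as Delta
df = (spark.read
      .format("com.github.saurfang.sas.spark")
      .load("dbfs:/mnt/raw/my_dataset.sas7bdat"))

df.write.format("delta").mode("overwrite").saveAsTable("bronze.sas_dataset")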
Hi, I'm trying to run the notebooks but nothing happens. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and in 'Compute', pressing play there on the cluster gives the e...
This is a very common issue I see with Community Edition. I suppose the only workaround is to create a new cluster each time. More info on Stack Overflow: https://stackoverflow.com/questions/69072694/databricks-community-edition-cluster-wont-start
Due to dependencies, if one of our cells errors we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks depending on whether "Run all cells in this notebook" is selected from the header versus "Run All Below"....
Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails it should not run the rest of the cells.
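A sketch of one way to make this explicit inside the notebook (the table name and the check itself are hypothetical): raise an exception to fail the job run, or call dbutils.notebook.exit() to end the notebook early without running the remaining cells:

# Example guard in an early cell: stop the notebook if the dependency is not met
row_count = spark.table("bronze.orders").count()

if row_count == 0:
    dbutils.notebook.exit("No input data - skipping the remaining cells")

# Cells below only run when the guard above passes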
I'm seeking validation from experts regarding the data quality and consistency checks we're implementing as part of a data migration using Spark and Databricks. Our migration involves transferring data from Azure Data Lake to a different data lake. As...
Hi @Coders, I'd also consider some profiling checks for column stats and distributions just to be sure everything is consistent after the migration. Afterwards, you should consider the best practice of implementing some data quality validations on the ...
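A hedged sketch of that kind of profiling comparison (table names are placeholders): compute the row count and per-column null counts on both sides and assert they match:

from pyspark.sql import functions as F

def profile(table_name: str) -> dict:
    # Row count plus null count per column, returned as a comparable dict
    df = spark.table(table_name)
    nulls = df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
    ).first().asDict()
    return {"rows": df.count(), "nulls": nulls}

source = profile("source_catalog.sales.orders")
target = profile("target_catalog.sales.orders")
assert source == target, f"Post-migration mismatch: {source} vs {target}"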
We are trying to build a data quality process at the initial file or data ingestion level for bronze, add more specific business rules for silver, and business-related aggregates for the gold layer.
Hi @laksh! You could take a look at Rudol Data Quality; it has native Databricks integration and covers both basic and advanced data quality checks. Basic checks can be configured by non-technical roles using a no-code interface, but there's also the o...
Hi there! You could also take a look at Rudol; it has native Databricks support and covers data quality validations and data governance, enabling non-technical roles such as Business Analysts or Data Stewards to be part of data quality as well with no-...
Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, do we have any standard approach to perform/apply data quality rules on the bronze layer before proceeding further to the silver and gold layers?
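Outside of DLT expectations, one simple pattern is a rule-based split on the bronze table, sketched below with hypothetical table and column names: rows that pass the rules continue to silver, and rows that fail land in a quarantine table for review:

from pyspark.sql import functions as F

bronze = spark.table("bronze.events")
rules = F.col("event_id").isNotNull() & (F.col("amount") >= 0)

# Valid rows move forward; violations are kept aside for inspection
bronze.filter(rules).write.mode("append").saveAsTable("silver.events")
bronze.filter(~rules).write.mode("append").saveAsTable("quarantine.events")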