Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Kayla
by Valued Contributor II
  • 1484 Views
  • 1 reply
  • 0 kudos

Resolved! Compute Policy Does Not Install Libraries

Has anyone else run into the issue where applying libraries through a compute policy just completely does not work? I'm trying to install some pretty basic Python libraries from PyPI (pytest and paramiko, for example) and it is failing on 13.3 and 14.3...
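For anyone comparing setups, libraries can also be attached to a policy through the Cluster Policies API; below is a minimal, hypothetical sketch in Python (workspace URL, token, and policy definition are placeholders, and the endpoint/field names follow my reading of the REST API 2.0 docs rather than a confirmed fix for the behavior described above):

import requests

# Placeholders: substitute your workspace URL and a valid personal access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "name": "policy-with-libraries",
    # Minimal policy definition; real policies would constrain more fields.
    "definition": '{"spark_version": {"type": "unlimited"}}',
    # Libraries the policy should install on every cluster it governs.
    "libraries": [
        {"pypi": {"package": "pytest"}},
        {"pypi": {"package": "paramiko"}},
    ],
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())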

rk1994
by New Contributor
  • 2591 Views
  • 2 replies
  • 0 kudos

Incrementally ingesting from a static db into a Delta Table

Hello everyone, I'm very new to Delta Live Tables (and Delta Tables too), so please forgive me if this question has been asked here before. Some context: I have over 100M records stored in a Postgres table. I can connect to this table using the convent...
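For context, a common pattern for this kind of incremental pull is a watermark-based JDBC read; a minimal sketch, assuming a timestamp column such as updated_at and a target Delta table bronze.events (all names and connection details are illustrative, and spark is the notebook's built-in session):

from pyspark.sql import functions as F

jdbc_url = "jdbc:postgresql://<host>:5432/<db>"

# Highest watermark already ingested; fall back to the epoch on the first run.
row = spark.table("bronze.events").agg(F.max("updated_at")).collect()[0]
last_watermark = row[0] or "1970-01-01 00:00:00"

# Push the filter down to Postgres so only new rows cross the wire.
src_query = f"(SELECT * FROM events WHERE updated_at > '{last_watermark}') AS src"

incremental = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", src_query)
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

incremental.write.format("delta").mode("append").saveAsTable("bronze.events")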

Latest Reply
rdmeyersf
New Contributor II
  • 0 kudos

If I'm reading this right, you created a materialized view to prep your data in Postgres. You may not need to do that, and it will also limit your integration options. It puts more work on Postgres, usually creates more data to move, and will not as m...

1 More Replies
Gareema
by Contributor
  • 2732 Views
  • 3 replies
  • 1 kudos

Not able to unzip the zip file with mount and Unity Catalog

Hello Team, I have a zip file in ADLS Gen 2. The folder I am using is mounted, and when I run the command dbutils.fs.ls(path) it lists all the files (including the required zip). However, when I try to read the zip using the 'zipfile' module, it displays 'Fil...

[attached screenshot: Gareema_0-1721743802964.png]
Latest Reply
Witold
Honored Contributor
  • 1 kudos

@Gareema, since you're using UC, can you use Volumes instead? It basically replaces the old mount approach.
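To make that concrete, a minimal sketch of the Volumes approach (catalog/schema/volume names are placeholders): files in a UC Volume are exposed through a local filesystem path, so the standard zipfile module can open them directly, which is exactly what often fails on mounts:

import zipfile

# Hypothetical Volume paths; adjust to your catalog/schema/volume.
zip_path = "/Volumes/<catalog>/<schema>/<volume>/archive.zip"
extract_dir = "/Volumes/<catalog>/<schema>/<volume>/extracted"

with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(extract_dir)

# dbutils is a notebook global on Databricks.
print(dbutils.fs.ls(extract_dir))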

2 More Replies
Venkat369
by New Contributor II
  • 2953 Views
  • 4 replies
  • 2 kudos

How to send variables from Control-M to Databricks Jobs

I want to send variables from Control-M, which is used to call a Databricks job. The Databricks job is designed to call a notebook. The notebook should use the attributes sent by Control-M. Can someone help me in this scenario o...
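One pattern that fits this (a sketch, not Control-M-specific functionality): have the Control-M step call the Jobs API run-now endpoint with notebook_params, and read them in the notebook via widgets. The job ID, host, and parameter names below are placeholders:

import requests

# Control-M side: any shell or HTTP step can fire this request.
resp = requests.post(
    "https://<workspace>.cloud.databricks.com/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <token>"},
    json={
        "job_id": 123,  # placeholder job id
        "notebook_params": {"run_date": "2024-07-26", "env": "prod"},
    },
)
resp.raise_for_status()

# Notebook side: the parameters arrive as widgets.
# run_date = dbutils.widgets.get("run_date")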

Latest Reply
Witold
Honored Contributor
  • 2 kudos

@Venkat369 What is wrong with the link I provided? It actually shows you how to do it. If not, please be more precise.

3 More Replies
guangyi
by Contributor III
  • 2643 Views
  • 1 reply
  • 1 kudos

Resolved! Is there a way to let the DLT pipeline retry by itself?

I know I can make a workflow job retry automatically by adding the following properties in the YAML file: max_retries or min_retry_interval_millis. However, I cannot find similar attributes in any DLT pipeline document. When I ask Copilot it gives this ...

[attached screenshot: Screenshot 2024-07-26 at 14.03.50.png]
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @guangyi, in DLT you have the following two properties that you can set: pipelines.maxFlowRetryAttempts (type: int), the maximum number of attempts to retry a flow before failing a pipeline update when a retryable failure occurs. The default value is two. By...
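For readers skimming: unlike max_retries on a job task, this is set as a key/value pair in the pipeline's configuration. A minimal, illustrative fragment of a pipeline settings payload (the pipeline name and value are placeholders; only the property named in the reply above is shown, since the excerpt is truncated):

# Fragment of a DLT pipeline settings payload (e.g. for the Pipelines API).
pipeline_settings = {
    "name": "my-dlt-pipeline",
    "configuration": {
        # Retry a failed flow up to 3 times before failing the update.
        "pipelines.maxFlowRetryAttempts": "3",
    },
}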

WynanddB
by New Contributor III
  • 1376 Views
  • 1 reply
  • 1 kudos

Resolved! Can't create Catalog on Databricks on AWS

Hi Community. I am an account admin but can't create a catalog on Databricks. Unity Catalog has been enabled. I don't even see the Create Catalog button. The assistant gave this advice. Is it correct? To grant the necessary permissions, you can follow these ...
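For comparison, the usual shape of the fix is a metastore-level grant run by a metastore admin; a minimal sketch in SQL via PySpark (the principal and catalog names are placeholders, and this assumes the missing button is purely a permissions issue):

# Run as a metastore admin; `account users` is an illustrative principal.
spark.sql("GRANT CREATE CATALOG ON METASTORE TO `account users`")

# Afterwards the grantee can create catalogs.
spark.sql("CREATE CATALOG IF NOT EXISTS my_catalog")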

tomph
by New Contributor II
  • 9619 Views
  • 3 replies
  • 3 kudos

Cannot read from view if no access to underlying table

Hi, I created a view my_view in a schema project_schema in Unity Catalog catalog_dev that is a select * from a table my_table in my common_schema in the same catalog. I gave a service principal full grants on the project_schema. It is an owner of the sc...

Latest Reply
daniel_sahal
Databricks MVP
  • 3 kudos

@tomph That's how views work in most engines. You need to grant permissions on the underlying tables.
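Concretely, that means granting the reading principal access down the chain, not just on the view's schema; a sketch using the names from the question (Unity Catalog SQL, run via PySpark; `my_sp` is an illustrative principal name):

spark.sql("GRANT USE CATALOG ON CATALOG catalog_dev TO `my_sp`")
spark.sql("GRANT USE SCHEMA ON SCHEMA catalog_dev.common_schema TO `my_sp`")
spark.sql("GRANT SELECT ON TABLE catalog_dev.common_schema.my_table TO `my_sp`")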

2 More Replies
8b1tz
by Contributor
  • 14199 Views
  • 24 replies
  • 6 kudos

Resolved! ADF logs into Databricks

Hello, I would like to know the best way to insert Data Factory activity logs into my Databricks Delta table, so that I can build dashboards and set up monitoring in Databricks itself. Can you help me? I would like every 5 minutes for all activity logs ...

Latest Reply
jacovangelder
Databricks MVP
  • 6 kudos

How fancy do you want to go? You can send ADF diagnostic settings to an event hub and stream them into a Delta table in Databricks. Or you can send them to a storage account and build a workflow with a 5-minute interval that loads the storage blob into...
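A minimal sketch of the storage-account variant, using Auto Loader in a scheduled job (container, paths, and table names are placeholders):

# Incrementally load ADF diagnostic-log JSON files into a Delta table.
# spark is the notebook's built-in session; paths are illustrative.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/<cat>/<sch>/<vol>/_schemas/adf")
    .load("abfss://insights-logs-activityruns@<account>.dfs.core.windows.net/")
    .writeStream
    .option("checkpointLocation", "/Volumes/<cat>/<sch>/<vol>/_checkpoints/adf")
    .trigger(availableNow=True)  # run-to-completion; schedule every 5 minutes
    .toTable("monitoring.adf_activity_logs")
)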

23 More Replies
MKE
by New Contributor
  • 1966 Views
  • 0 replies
  • 0 kudos

Unity Catalog and SAS data using spark-sas7bdat

Information in this post Speed Up Data Flow: Databricks and SAS | Databricks Blog led me to using the spark-sas7bdat package to read SAS files and save to Delta for downstream processes, with great results. I was able to load very large files quickly that...
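For reference, the read pattern that post describes boils down to a couple of lines (a sketch: it assumes the spark-sas7bdat JAR is installed on the cluster, and the path and table names are placeholders):

# Read a SAS dataset with the spark-sas7bdat reader, then persist as Delta.
df = (
    spark.read.format("com.github.saurfang.sas.spark")
    .load("/Volumes/<catalog>/<schema>/<volume>/data.sas7bdat")
)
df.write.format("delta").mode("overwrite").saveAsTable("bronze.sas_data")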

gazzyjuruj
by Contributor II
  • 16247 Views
  • 5 replies
  • 12 kudos

Cluster start is currently disabled?

Hi, I'm trying to run the notebooks but it doesn't do any activity. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and in 'Compute', pressing play there on the clusters gives the e...

Latest Reply
mrp12
New Contributor III
  • 12 kudos

This is a very common issue I see with Community Edition. I suppose the only workaround is to create a new cluster each time. More info on Stack Overflow: https://stackoverflow.com/questions/69072694/databricks-community-edition-cluster-wont-start

4 More Replies
lei_armstrong
by New Contributor II
  • 14186 Views
  • 6 replies
  • 7 kudos

Resolved! Executing Notebooks - Run All Cells vs Run All Below

Due to dependencies, if one of our cells errors then we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks, depending on whether "Run all cells in this notebook" is selected from the header versus "Run All Below"....

Latest Reply
sukanya09
New Contributor II
  • 7 kudos

Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails it should not run the rest of the cells.
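Until there is a first-class setting, the usual workaround is to fail fast in the first cell so the job run (and everything below it) stops; a minimal sketch, with an illustrative precondition:

# Cell 1: validate a precondition; an uncaught exception fails the task,
# so in a job run the remaining cells never execute.
if spark.table("bronze.events").count() == 0:  # illustrative check
    raise RuntimeError("Precondition failed: stopping notebook execution")

# Alternatively, end the run gracefully with a status message:
# dbutils.notebook.exit("skipped downstream cells")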

5 More Replies
Coders
by New Contributor II
  • 4205 Views
  • 4 replies
  • 0 kudos

Feedback on the data quality and consistency checks in Spark

I'm seeking validation from experts regarding the data quality and consistency checks we're implementing as part of a data migration using Spark and Databricks. Our migration involves transferring data from Azure Data Lake to a different data lake. As...

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Hi @Coders, I'd also consider some profiling checks for column stats and distribution, just to be sure everything is consistent after the migration. Afterwards, you should consider the best practice of implementing some data quality validations on the ...
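A minimal sketch of that kind of post-migration profiling: compare row counts and a cheap whole-table digest between source and target (the table names are placeholders, and spark is the notebook's built-in session):

from pyspark.sql import functions as F

src = spark.table("source_catalog.schema.events")
tgt = spark.table("target_catalog.schema.events")

# Row-count parity.
assert src.count() == tgt.count(), "row counts differ"

def table_digest(df):
    # Order-independent content fingerprint: hash each row, then sum
    # (cast to decimal to avoid bigint overflow on large tables).
    return (
        df.select(F.xxhash64(*df.columns).alias("h"))
        .agg(F.sum(F.col("h").cast("decimal(38,0)")))
        .collect()[0][0]
    )

assert table_digest(src) == table_digest(tgt), "content digests differ"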

3 More Replies
laksh
by New Contributor II
  • 6179 Views
  • 5 replies
  • 3 kudos

What kind of data quality rules can be run using Unity Catalog

We are trying to build a data quality process at the initial file or data-ingestion level for bronze, add more specific business rules for silver, and business-related aggregates for the gold layer.

Latest Reply
joarobles
New Contributor III
  • 3 kudos

Hi @laksh! You could take a look at Rudol Data Quality; it has native Databricks integration and covers both basic and advanced data quality checks. Basic checks can be configured by non-technical roles using a no-code interface, but there's also the o...

4 More Replies
Phani1
by Databricks MVP
  • 10806 Views
  • 5 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, do we have any standard approach to perform/apply data quality rules on the bronze layer before proceeding further to the silver and gold layers?
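Outside DLT, one standard approach is Delta CHECK constraints for hard rules plus a quarantine table for soft ones; a minimal sketch (table names and rules are illustrative):

# Hard rule: reject bad writes at the table level.
spark.sql("""
    ALTER TABLE bronze.orders
    ADD CONSTRAINT valid_amount CHECK (amount >= 0)
""")

# Soft rule: divert bad rows to quarantine before promoting to silver.
df = spark.table("bronze.orders")
good = df.filter("amount >= 0 AND order_id IS NOT NULL")
bad = df.filter("amount < 0 OR order_id IS NULL")
bad.write.mode("append").saveAsTable("bronze.orders_quarantine")
good.write.mode("append").saveAsTable("silver.orders")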

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Looks nice! However, I don't see Databricks support in the docs.

4 More Replies
narendra11
by New Contributor
  • 2353 Views
  • 4 replies
  • 1 kudos

Resolved! Getting "Status code: 301 Moved Permanently" error

Getting this error while running cells: Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you ...

Latest Reply
stefano0929
New Contributor II
  • 1 kudos

Same problem, and I don't know how to solve it. Here's an example of a cell that has always worked correctly but stopped working yesterday:

# Compute the correlation matrix
correlation_matrix = data.corr()
# Set up the matplotlib figure
plt.figure(figsize=(14, ...

3 More Replies
