Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MKE
by New Contributor
  • 1275 Views
  • 0 replies
  • 0 kudos

Unity Catalog and SAS data using spark-sas7bdat

Information in the Databricks Blog post "Speed Up Data Flow: Databricks and SAS" led me to use the spark-sas7bdat package to read SAS files and save them to Delta for downstream processes, with great results. I was able to load very large files quickly that...

gazzyjuruj
by Contributor II
  • 14642 Views
  • 5 replies
  • 12 kudos

Cluster start is currently disabled?

Hi, I'm trying to run the notebooks, but nothing happens. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and on the 'Compute' page, pressing play on the clusters gives the e...

Latest Reply
mrp12
New Contributor III
  • 12 kudos

This is a very common issue I see with Community Edition. I suppose the only workaround is to create a new cluster each time. More info on Stack Overflow: https://stackoverflow.com/questions/69072694/databricks-community-edition-cluster-wont-start

lei_armstrong
by New Contributor II
  • 12051 Views
  • 6 replies
  • 7 kudos

Resolved! Executing Notebooks - Run All Cells vs Run All Below

Due to dependencies, if one of our cells errors, we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks, depending on whether "Run all cells in this notebook" is selected from the header or "Run All Below"...

Latest Reply
sukanya09
New Contributor II
  • 7 kudos

Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails, it should not run the rest of the cells.

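A note on the fail-fast behaviour asked about above: in a Databricks job, an exception that is not caught in a cell generally fails the run, so the simplest way to keep later cells from running is to let errors propagate (or raise one explicitly after a failed check) instead of catching and printing them. The plain-Python sketch below models that behaviour outside Databricks; `run_cells` is a hypothetical helper, not part of any Databricks API.

```python
from typing import Callable, List, Tuple


def run_cells(cells: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run named steps in order and stop at the first one that raises.

    Returns the names of the completed steps when everything succeeds.
    On failure, a RuntimeError naming the failed step is raised, with the
    original exception attached as __cause__; later steps never run.
    """
    completed: List[str] = []
    for name, cell in cells:
        try:
            cell()
        except Exception as exc:
            # Re-raise so the caller (e.g. a job runner) sees the failure.
            raise RuntimeError(f"step {name!r} failed; later steps skipped") from exc
        completed.append(name)
    return completed
```

The key design choice is that nothing swallows the exception: a `try/except` that only logs the error would let the remaining steps run, which is exactly the behaviour the posters want to avoid.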
Coders
by New Contributor II
  • 3348 Views
  • 4 replies
  • 0 kudos

Feedback on the data quality and consistency checks in Spark

I'm seeking validation from experts regarding the data quality and consistency checks we're implementing as part of a data migration using Spark and Databricks. Our migration involves transferring data from Azure Data Lake to a different data lake. As...

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Hi @Coders, I'd also consider some profiling checks for column stats and distribution, just to be sure everything is consistent after the migration. Afterwards, you should consider the best practice of implementing some data quality validations on the ...

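To make the profiling suggestion concrete, here is a minimal, framework-free sketch that compares per-column profiles (row count, null count, distinct count, min, max) of a source and a target dataset after a migration. It assumes each dataset fits in memory as a list of dicts; with Spark you would compute the same aggregates per column on both DataFrames and compare the results. The function names `profile` and `compare` are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def profile(rows: List[Row]) -> Dict[str, Dict[str, Any]]:
    """Compute simple per-column stats: nulls, distinct values, min and max."""
    columns = sorted({col for row in rows for col in row})
    stats: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        # A column missing from a row is counted as a null for that row.
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return stats


def compare(source: List[Row], target: List[Row]) -> List[str]:
    """Return human-readable differences between source and target profiles."""
    issues: List[str] = []
    if len(source) != len(target):
        issues.append(f"row count: source={len(source)} target={len(target)}")
    src, tgt = profile(source), profile(target)
    for col in sorted(set(src) | set(tgt)):
        if col not in tgt:
            issues.append(f"column {col!r} missing in target")
        elif col not in src:
            issues.append(f"column {col!r} missing in source")
        else:
            for metric, value in src[col].items():
                if tgt[col][metric] != value:
                    issues.append(f"{col}.{metric}: source={value} target={tgt[col][metric]}")
    return issues
```

An empty result means the two profiles match; exact equality suits counts, but aggregates such as sums or means of floating-point columns would normally be compared with a tolerance.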
laksh
by New Contributor II
  • 5123 Views
  • 5 replies
  • 3 kudos

What kinds of data quality rules can be run using Unity Catalog?

We are trying to build a data quality process at the initial file or data-ingestion level for bronze, add more specific business rules for silver, and business-related aggregates for the gold layer.

Latest Reply
joarobles
New Contributor III
  • 3 kudos

Hi @laksh! You could take a look at Rudol Data Quality; it has native Databricks integration and covers both basic and advanced data quality checks. Basic checks can be configured by non-technical roles using a no-code interface, but there's also the o...

William_Scardua
by Valued Contributor
  • 2223 Views
  • 1 replies
  • 1 kudos

What Data Quality Framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality Framework (or technique) you would recommend?

Data Engineering
dataquality
Latest Reply
joarobles
New Contributor III
  • 1 kudos

Hi there! You could also take a look at Rudol; it has native Databricks support and covers Data Quality validations and Data Governance, enabling non-technical roles such as Business Analysts or Data Stewards to be part of data quality as well with no-...

Phani1
by Valued Contributor II
  • 9655 Views
  • 5 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach to apply data quality rules on the bronze layer before proceeding to the silver and gold layers?

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Looks nice! However, I don't see Databricks support in the docs.

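One pattern for bronze-layer checks outside DLT, similar in spirit to DLT expectations, is to evaluate a set of named rules against each incoming row and route failing rows to a quarantine table instead of silently dropping them. The sketch below shows that split in plain Python; the rule names, the `apply_rules` helper, and the `_failed_rules` field are all hypothetical. In Spark the same idea is usually expressed as one boolean column per rule followed by two filtered writes.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Rule = Callable[[Row], bool]


def apply_rules(rows: List[Row], rules: Dict[str, Rule]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, quarantined); quarantined rows record which rules failed."""
    valid: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            # Keep the original row plus the reasons, so it can be inspected or reprocessed.
            quarantined.append({**row, "_failed_rules": failed})
        else:
            valid.append(row)
    return valid, quarantined


# Hypothetical bronze-layer rules: structural checks only; business rules belong in silver.
BRONZE_RULES: Dict[str, Rule] = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_is_number": lambda r: isinstance(r.get("amount"), (int, float)),
}
```

Keeping the failed rows, rather than discarding them, lets the silver layer consume only `valid` while someone reviews the quarantine output.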
narendra11
by New Contributor
  • 1616 Views
  • 4 replies
  • 1 kudos

Resolved! Getting "Status code: 301 Moved Permanently" error

Getting this error while running the cells: Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you ...

Latest Reply
stefano0929
New Contributor II
  • 1 kudos

Same problem, and I don't know how to solve it. Here is an example of a cell that has always worked correctly but stopped working yesterday:
# Compute the correlation matrix
correlation_matrix = data.corr()
# Set up the matplotlib figure
plt.figure(figsize=(14, ...

hayden_blair
by New Contributor III
  • 2151 Views
  • 3 replies
  • 3 kudos

Resolved! Delta Live Table automatic table removal and schema update

Hello, I made a Delta Live Tables workflow that created 3 streaming tables in Unity Catalog. I then removed the source code for the 3rd table from the workflow and reran it. After about a week, the 3rd streaming table is no longer available in unity cata...

Latest Reply
hayden_blair
New Contributor III
  • 3 kudos

This makes sense, @raphaelblg! Just to confirm my understanding, is the following statement true: If I remove the source code for a Unity Catalog DLT streaming table from a DLT pipeline and wait 7 days, that table will be dropped from Unity Catalog, an...

dpc
by New Contributor III
  • 1143 Views
  • 2 replies
  • 0 kudos

Returning and reusing the identity value

Hello, I have a table that has a column defined as an identity (BIGINT GENERATED ALWAYS AS IDENTITY). I will be inserting rows into this table in parallel. How can I get the identity value and use it within a pipeline? Parallel is relevant as there will be mult...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @dpc, what you're trying to achieve does not make sense in the context of identity columns. Look at the entry from the documentation below. So, the answer is: if you want to have concurrent transactions, don't use identity columns. Declaring an identity co...

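If the goal is a key that each parallel writer knows up front, so it can be reused later in the pipeline without reading it back from the table, one alternative to an identity column (not proposed in the thread itself) is a deterministic surrogate key derived from the row's natural key. The sketch below hashes the business-key parts with SHA-256; `surrogate_key` is a hypothetical helper, and it assumes the natural key is unique and stable.

```python
import hashlib
from typing import Any


def surrogate_key(*parts: Any, sep: str = "\x1f") -> str:
    """Deterministic surrogate key (64 hex chars) from natural-key parts.

    Every writer computes the same key for the same input, so no round trip
    to the table is needed. The unit-separator character keeps ("a", "bc")
    and ("ab", "c") apart, and None is encoded as a NUL marker so it differs
    from an empty string.
    """
    encoded = ["\x00" if p is None else str(p) for p in parts]
    return hashlib.sha256(sep.join(encoded).encode("utf-8")).hexdigest()
```

The trade-off: unlike an identity value, the key is not sequential and is a 64-character string; squeezing it into a BIGINT would mean truncating the hash and accepting a small collision risk.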
mbaas
by New Contributor III
  • 1859 Views
  • 4 replies
  • 4 kudos

Resolved! Temporary streaming tables (CDC)

I am currently using the `apply_changes` feature. I saw that for the regular decorator `dlt.table` you can create temporary tables. I do not see an option to use this feature with `dlt.create_streaming_table(`; in the SQL version it looks like it is su...

Latest Reply
Icassatti
New Contributor III
  • 4 kudos

Read these articles: "Delta Live Tables Python language reference - Azure Databricks | Microsoft Learn" and "The APPLY CHANGES APIs: Simplify change data capture with Delta Live Tables - Azure Databricks | Microsoft Learn". Even if you could define it as temporary, it ...

joaogilsa
by New Contributor II
  • 2679 Views
  • 3 replies
  • 1 kudos

Resolved! Delete folder using Databricks CLI

Hello, I am trying to delete a folder and its content using the Databricks CLI, but I'm getting the following error:
databricks workspace delete /Workspace/Users/XXX/XXX --profile DEFAULT --recursive true
Error: expected to have the absolute path of the not...

Latest Reply
joaogilsa
New Contributor II
  • 1 kudos

Thank you for the help, @szymon_dybczak, it worked!

FerArribas
by Contributor
  • 12097 Views
  • 4 replies
  • 6 kudos

Resolved! Redirect error when accessing the web app in Azure Databricks with a private front-end endpoint

I have created a workspace with a private endpoint in Azure following this guide: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/private-link Once I created the private link of type browser_authent...

Latest Reply
flomader
New Contributor II
  • 6 kudos

You don't need a CNAME record. Go to your private link resource in Azure and click on Settings > DNS Configuration. Make sure you have created private link A records for all the FQDNs listed under 'Custom DNS records'. You have most likely missed one ...

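A quick way to verify the A-record advice is to resolve each FQDN listed under 'Custom DNS records' from a machine inside the network and confirm that it returns a private IP address. The sketch below does this with the Python standard library; the FQDN list is something you would copy from the Azure portal (any names used with it are placeholders), `check_private_dns` is a hypothetical helper, and the resolver is injectable so the logic can be tested without real DNS.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable


def check_private_dns(
    fqdns: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Map each FQDN to 'ok', 'public <ip>' or 'unresolved'."""
    results: Dict[str, str] = {}
    for name in fqdns:
        try:
            ip = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = "unresolved"
            continue
        if ipaddress.ip_address(ip).is_private:
            results[name] = "ok"
        else:
            results[name] = f"public {ip}"
    return results
```

Any entry reported as `public ...` or `unresolved` is a candidate for the missing private DNS record.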
yvishal519
by Contributor
  • 1222 Views
  • 2 replies
  • 3 kudos

Resolved! Databricks DLT with Hive Metastore and ADLS Access Issues

We are currently working on Databricks DLT tables to transform data from bronze to silver. We have been specifically instructed not to use mount paths for accessing data from ADLS Gen2. To comply, I configured storage credentials and created an externa...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @yvishal519, since you're using the Hive metastore, you have no option other than mount points. Storage credentials and external locations are only supported in Unity Catalog.

helghe
by New Contributor II
  • 994 Views
  • 3 replies
  • 3 kudos

Unavailable system schemas

When I list the available schemas, I get the following: {"schemas":[{"schema":"storage","state":"AVAILABLE"},{"schema":"operational_data","state":"UNAVAILABLE"},{"schema":"access","state":"AVAILABLE"},{"schema":"billing","state":"ENABLE_COMPLETED"},{"s...

Latest Reply
hle
New Contributor II
  • 3 kudos

I have the same issue for the compute schema. The workspace is UC-enabled and I'm an account admin.

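Given a response shaped like the one in the post, a small script can list which system schemas are not yet usable. The sketch assumes the JSON structure visible above (`{"schemas": [{"schema": ..., "state": ...}]}`) and treats `AVAILABLE` and `ENABLE_COMPLETED` as usable states; that classification is an assumption, not documented behaviour, and `unusable_schemas` is a hypothetical helper.

```python
import json
from typing import List

# Assumption: these states mean the schema can be queried.
USABLE_STATES = {"AVAILABLE", "ENABLE_COMPLETED"}


def unusable_schemas(response_text: str) -> List[str]:
    """Return the sorted names of system schemas whose state is not in USABLE_STATES."""
    payload = json.loads(response_text)
    return sorted(
        entry["schema"]
        for entry in payload.get("schemas", [])
        if entry.get("state") not in USABLE_STATES
    )
```

Run against the full response from the listing call, this narrows the question down to the specific schemas (such as `operational_data` or `compute`) that still need to be enabled or investigated.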
