Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Neli
by New Contributor II
  • 33 Views
  • 0 replies
  • 0 kudos

Decrease frequency of Databricks Asset Bundle API

We are using DABs for our deployment and to invoke workflows. Behind the scenes, it calls the API below to get the status of a workflow. Currently it checks every few seconds. Is there a way to decrease this frequency from seconds to minutes?  GET /ap...

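The thread doesn't show a built-in flag for tuning the bundle CLI's poll interval. One workaround, sketched below, is to trigger the job and poll its state yourself at whatever cadence you want. This is a minimal illustration only: `get_state` stands in for the Jobs API call (`GET /api/2.1/jobs/runs/get`), and the terminal-state names and parameters are assumptions, not documented DAB behavior.

```python
import time

def wait_for_run(get_state, interval_s=60, timeout_s=3600):
    """Poll a job-state callable at a custom interval until it reports a
    terminal state. `get_state` stands in for whatever function fetches
    the run's life-cycle state from the Jobs API."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        # Terminal life-cycle states; adjust to match the actual API values.
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state
        time.sleep(interval_s)  # minutes instead of seconds, if you like
    raise TimeoutError("run did not finish within timeout_s")
```

With `interval_s=120` this polls every two minutes instead of every few seconds, at the cost of learning about completion a little later.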
RohitKulkarni
by Contributor
  • 208 Views
  • 11 replies
  • 4 kudos

Partial upload of 1.2 GB of data

Hello Team, I have a 1.2 GB file in txt format. I am trying to upload the data into an MS SQL Server database table, but only 10% of the data is able to upload. Example: Total records in the file: 51303483. Number of records inserted: 10224430. I am usi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @RohitKulkarni, Thank you for reaching out to our community! We're here to help. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

10 More Replies
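When a bulk load silently stops at ~10%, a common cause is a bad row or a driver error swallowed by the loader; inserting in explicit batches and logging each batch's offset narrows down exactly where it stops. A small sketch of the batching helper, assuming the batches would feed something like pyodbc's `executemany()` (that call is not shown here):

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items. Feeding each batch
    to one executemany() call, and logging the batch's offset, makes it
    obvious which rows never reached SQL Server."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

With ~51 million rows, a batch size in the tens of thousands keeps each transaction small enough that a failure points at a narrow range of lines in the source file.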
8b1tz
by New Contributor II
  • 736 Views
  • 24 replies
  • 6 kudos

Resolved! ADF logs into Databricks

Hello, I would like to know the best way to insert Data Factory activity logs into my Databricks delta table, so that I can use a dashboard and create monitoring in Databricks itself. Can you help me? I would like every 5 minutes for all activity logs ...

Latest Reply
jacovangelder
Contributor III
  • 6 kudos

How fancy do you want to go? You can send ADF diagnostic settings to an Event Hub and stream them into a delta table in Databricks. Or you can send them to a storage account and build a workflow with a 5-minute interval that loads the storage blob into...

23 More Replies
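Whichever route you pick (Event Hub stream or storage account plus a 5-minute workflow), the landing step is parsing the diagnostic-log JSON into columns before writing to delta. A sketch of flattening one record: the field names (`operationName`, `status`, `properties.Pipeline`) follow the usual Azure diagnostic-settings shape but are assumptions here; verify them against your own logs.

```python
import json

def flatten_adf_record(line):
    """Flatten one ADF diagnostic-log JSON line into a flat dict suitable
    for a delta table row. Field names are illustrative."""
    rec = json.loads(line)
    props = rec.get("properties") or {}
    return {
        "time": rec.get("time"),
        "activity": rec.get("operationName"),
        "status": rec.get("status"),
        "pipeline": props.get("Pipeline"),
    }
```

In a real pipeline the same mapping would be a `from_json` + `select` in Spark rather than per-line Python.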
YS1
by New Contributor III
  • 207 Views
  • 0 replies
  • 0 kudos

Discrepancy in Record Count in DLT Pipeline Data Quality Tab

Hello, I have set up a DLT pipeline where I ingest data from a Kafka topic into a table. Then, I create another table that filters records from the first table. However, I'm facing an issue: When I check the Data Quality tab for the second table, it sh...

[attached screenshot: YS1_0-1721938771644.png]
MKE
by New Contributor
  • 57 Views
  • 0 replies
  • 0 kudos

Unity Catalog and SAS data using spark-sas7bdat

Information in this post Speed Up Data Flow: Databricks and SAS | Databricks Blog led me to using the spark-sas7bdat package to read SAS files and save to delta for downstream processes, with great results. I was able to load very large files quickly that...

gazzyjuruj
by Contributor II
  • 7752 Views
  • 5 replies
  • 9 kudos

Cluster start is currently disabled?

Hi, I'm trying to run the notebooks but it doesn't do any activity. I had to create a cluster in order to start my code. Pressing the play button inside the notebook does nothing at all, and in 'Compute', pressing play there on the clusters gives the e...

Latest Reply
mrp12
New Contributor
  • 9 kudos

This is a very common issue I see with Community Edition. I suppose the only workaround is to create a new cluster each time. More info on Stack Overflow: https://stackoverflow.com/questions/69072694/databricks-community-edition-cluster-wont-start

4 More Replies
Sathish_itachi
by New Contributor
  • 2731 Views
  • 19 replies
  • 16 kudos

Resolved! Encountering an error while accessing dbfs root folders

The error displayed on the UI is: "dbfs file browser storagecontext com.databricks.backend.storage.storagecontexttype$dbfsroot$@4155a7bf for workspace 2818007466707254 is not set in the customerstorageinfo"

Latest Reply
Kaniz_Fatma
Community Manager
  • 16 kudos

This was a global issue, and it is fixed now!

18 More Replies
lei_armstrong
by New Contributor II
  • 7234 Views
  • 8 replies
  • 8 kudos

Resolved! Executing Notebooks - Run All Cells vs Run All Below

Due to dependencies, if one of our cells errors then we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks depending on whether "Run all cells in this notebook" is selected from the header versus "Run All Below"....

Latest Reply
sukanya09
New Contributor II
  • 8 kudos

Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails it should not run the rest of the cells.

7 More Replies
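For the job case asked about above: when a notebook runs as a job, an uncaught exception in a cell fails the run, and the cells after it do not execute. So a fail-fast guard at the top of dependent cells is usually enough. A minimal sketch (the helper name is illustrative, not a Databricks API):

```python
def require(condition, message):
    """Fail fast: raising here fails the cell, and in a job run the
    remaining cells are not executed."""
    if not condition:
        raise RuntimeError(message)
```

For example, cell 2 could start with `require(df is not None, "cell 1 did not produce data")` so downstream cells never run against missing inputs.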
Coders
by New Contributor II
  • 1666 Views
  • 5 replies
  • 0 kudos

Feedback on the data quality and consistency checks in Spark

I'm seeking validation from experts regarding the data quality and consistency checks we're implementing as part of a data migration using Spark and Databricks. Our migration involves transferring data from Azure Data Lake to a different data lake. As...

Latest Reply
joarobles
New Contributor
  • 0 kudos

Hi @Coders, I'd also consider some profiling checks for column stats and distribution, just to be sure everything is consistent after the migration. Afterwards, you should consider the best practice of implementing some data quality validations on the ...

4 More Replies
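The profiling checks suggested in the reply can start as simple per-column stats computed on both sides of the migration and then diffed. A toy sketch over plain Python rows, just to show the shape of the check; a real run would push the same aggregations (count, null count, min/max) down to Spark:

```python
def profile(rows, columns):
    """Minimal column profile: total count, null count, min, max.
    Compute this for source and target and compare the two dicts;
    any difference flags a column to investigate."""
    stats = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        stats[col] = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return stats
```

Comparing `profile(source_rows, cols) == profile(target_rows, cols)` is a cheap first-pass consistency check before row-level reconciliation.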
laksh
by New Contributor II
  • 2393 Views
  • 5 replies
  • 3 kudos

What kind of data quality rules can be run using Unity Catalog?

We are trying to build a data quality process: initial file-level or data-ingestion-level checks for bronze, more specific business rules for silver, and business-related aggregates for the gold layer.

Latest Reply
joarobles
New Contributor
  • 3 kudos

Hi @laksh! You could take a look at Rudol Data Quality; it has native Databricks integration and covers both basic and advanced data quality checks. Basic checks can be configured by non-technical roles using a no-code interface, but there's also the o...

4 More Replies
William_Scardua
by Valued Contributor
  • 1160 Views
  • 2 replies
  • 1 kudos

Which Data Quality Framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality Framework (or technique) you'd recommend?

Data Engineering
dataquality
Latest Reply
joarobles
New Contributor
  • 1 kudos

Hi there! You could also take a look at Rudol; it has native Databricks support and covers Data Quality validations and Data Governance, enabling non-technical roles such as Business Analysts or Data Stewards to be part of data quality as well, with no-...

1 More Replies
Phani1
by Valued Contributor
  • 4393 Views
  • 6 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, do we have any standard approach to perform/apply data quality rules on the bronze layer before proceeding further to the silver and gold layers?

Latest Reply
joarobles
New Contributor
  • 0 kudos

Looks nice! However, I don't see Databricks support in the docs.

5 More Replies
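Outside of DLT expectations, a common hand-rolled pattern for the bronze layer is a small rule set of named predicates, with failing rows routed to a quarantine table. A sketch in plain Python to show the shape of the pattern; in practice each predicate would become a Spark filter expression and the two outputs would be delta writes:

```python
def apply_rules(rows, rules):
    """Split rows into (passed, quarantined) using a list of
    (rule_name, predicate) pairs. Quarantined rows carry the names of
    the rules they failed, which makes triage and reprocessing easier."""
    passed, quarantined = [], []
    for row in rows:
        failures = [name for name, pred in rules if not pred(row)]
        if failures:
            quarantined.append({**row, "_failed_rules": failures})
        else:
            passed.append(row)
    return passed, quarantined
```

Keeping the rules as data (name + predicate) means the silver layer can add stricter business rules to the same engine without changing the plumbing.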
narendra11
by New Contributor
  • 222 Views
  • 5 replies
  • 1 kudos

Resolved! getting Status code: 301 Moved Permanently error

Getting this error while running the cells: Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @narendra11, This has been fixed now. Could you please confirm?  

4 More Replies
hayden_blair
by New Contributor III
  • 113 Views
  • 3 replies
  • 3 kudos

Resolved! Delta Live Table automatic table removal and schema update

Hello, I made a delta live table workflow that created 3 streaming tables in unity catalog. I then removed the source code for the 3rd table from the workflow and reran. After about a week, the 3rd streaming table is no longer available in unity cata...

Latest Reply
hayden_blair
New Contributor III
  • 3 kudos

This makes sense @raphaelblg! Just to confirm my understanding, is the following statement true: If I remove the source code for a unity catalog DLT streaming table from a DLT pipeline and wait 7 days, that table will be dropped from unity catalog, an...

2 More Replies
dpc
by New Contributor III
  • 53 Views
  • 2 replies
  • 0 kudos

Returning and reusing the identity value

Hello, I have a table that has a column defined as an identity (BIGINT GENERATED ALWAYS AS IDENTITY). I will be inserting rows into this table in parallel. How can I get the identity value and use it within a pipeline? Parallel is relevant as there will be mult...

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @dpc, What you're trying to achieve does not make sense in the context of identity columns. Look at the entry from the documentation below. So the answer is: if you want concurrent transactions, don't use identity columns. Declaring an identity co...

1 More Replies
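As the reply notes, identity columns constrain concurrent transactions on a Delta table. If parallel writers each need a key they can use immediately within the pipeline, the usual alternative is a coordination-free surrogate key such as a UUID, generated client-side before the insert. A minimal sketch; the trade-off is that UUIDs are not monotonically increasing the way identity values are:

```python
import uuid

def new_key():
    """Surrogate key that is safe for concurrent writers: it is generated
    client-side, so no table-level coordination (and no identity column)
    is needed, and the caller knows the key before the row is inserted."""
    return str(uuid.uuid4())
```

Each parallel task can call `new_key()` for its rows up front and carry the key through the rest of the pipeline, with no read-back from the table required.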
