Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

databicky
by Contributor II
  • 1772 Views
  • 4 replies
  • 0 kudos

How to optimize the runtime on a 10.4 cluster

I am loading 1 billion rows from a Spark DataFrame into a target table. On the 7.3 cluster it took 3 hours to complete, but after migrating to the 10.4 cluster it takes 8 hours. How can I reduce the duration?

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Mohammed sadamusean, could you provide more details on what you are doing? What type of transformations/actions are you running? What are your source and sink? Is it batch or streaming? All that information will help.

3 More Replies
RafikiT97
by New Contributor
  • 3103 Views
  • 3 replies
  • 0 kudos

Query Databricks from Power BI with Row Level Security

I am trying to apply RLS to the solution, but Power BI only connects to Databricks (DB) using a token, which can't be used in DB groups. Is there no other way to apply row-level security using Power BI?

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Daniel Gomes, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
farooqurrehman
by New Contributor
  • 1898 Views
  • 3 replies
  • 2 kudos

Unable to connect/read files from ADLS Gen2 using account key

It gives the error "[RequestId=5e57b66f-b69f-4e8b-8706-3fe5baeb77a0 ErrorClass=METASTORE_DOES_NOT_EXIST] No metastore assigned for the current workspace." when using the following code: spark.conf.set("fs.azure.account.key.mystorageaccount.dfs.core.windows.net", ...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @Farooq ur rehman, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
SRK
by Contributor III
  • 2574 Views
  • 5 replies
  • 0 kudos

Delta Live Tables data quality rules application.

I have a requirement where I need to apply an inverse DQ rule on a table to track the invalid data. For this I can use the following approach:
import dlt
rules = {}
quarantine_rules = {}
rules["valid_website"] = "(Website IS NOT NULL)"
rules["valid_locatio...
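One way to derive the inverse rule from a rules dictionary like the one in the post is to negate the conjunction of all valid-row expectations. The sketch below is plain Python, so it runs outside a pipeline. The second rule, valid_count, and its column are hypothetical stand-ins, since the post's second rule is cut off in the excerpt. How the resulting dictionary is wired into a pipeline (for example through an expectations decorator) is not shown and would need to follow the DLT documentation.

```python
# Valid-row expectations, keyed by rule name.
rules = {
    "valid_website": "(Website IS NOT NULL)",  # taken from the post
    "valid_count": "(Count > 0)",              # hypothetical second rule
}


def build_quarantine_rule(valid_rules):
    """Return one SQL predicate that is true when any valid-row rule fails."""
    combined = " AND ".join(valid_rules.values())
    return "NOT({0})".format(combined)


# A single inverse rule that flags rows violating at least one expectation.
quarantine_rules = {"invalid_record": build_quarantine_rule(rules)}
print(quarantine_rules["invalid_record"])
```

Keeping the inverse rule generated from the same dictionary means a new valid-row rule is picked up by the quarantine predicate without a second edit.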

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

You can get additional info from the DLT event log, which is stored in Delta format, so you can load it as a table: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#data-quality

4 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 851 Views
  • 1 replies
  • 10 kudos

Since Databricks Runtime 12.1, "WHEN NOT MATCHED BY SOURCE" was added to MERGE syntax. For example, using that option, we can quickly delete ...

Since Databricks Runtime 12.1, the MERGE syntax supports "WHEN NOT MATCHED BY SOURCE". For example, using that option, we can quickly delete all target rows that don't match any source row.
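As a rough illustration of the delete case described above, the sketch below builds such a MERGE statement as a string. The helper name and the table and column names are hypothetical; running the statement requires a cluster on Databricks Runtime 12.1 or later, which is only indicated in a comment here.

```python
def build_delete_unmatched_merge(target, source, key):
    """Build a MERGE statement that deletes target rows with no match in source."""
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN NOT MATCHED BY SOURCE THEN DELETE"
    )


# Hypothetical table and column names, for illustration only.
sql = build_delete_unmatched_merge("target_table", "source_table", "id")
print(sql)
# On a Databricks Runtime 12.1+ cluster this could be run with: spark.sql(sql)
```

The same clause can also carry an UPDATE action or an extra condition, so deleting is only one of the possible uses.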

Latest Reply
jose_gonzalez
Moderator
  • 10 kudos

Thank you for sharing @Hubert Dudek​ 

Daba
by New Contributor III
  • 5730 Views
  • 5 replies
  • 5 kudos

DLT streaming table and LEFT JOIN

I'm trying to build a gold-level streaming live table based on two streaming silver live tables with a left join. This attempt fails with the following error: "Append mode error: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not suppo...

Latest Reply
Daba
New Contributor III
  • 5 kudos

Thanks Fatma, I do understand the need for watermarks, but I'm just wondering if this is supported by SQL syntax?

4 More Replies
ackerman_chris
by New Contributor III
  • 2651 Views
  • 1 replies
  • 1 kudos

Resolved! Azure DevOps Git sync failed in Azure Databricks

Hello, I am currently attempting to set up a Git repo within Azure DevOps to use in my Azure Databricks workspace environment for various notebooks. I went through the process of creating a Personal Access Token (PAT) in DevOps, and have inputted the t...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Christopher Ackerman​, This error message usually occurs when there is an issue with authentication between Azure Databricks and Azure DevOps. One possible reason for this error is that the token was not granted the necessary permissions to acces...

phaezel
by New Contributor
  • 923 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @peter dhaeseleer​, As of my current knowledge in February 2023, I am unaware of any official announcement from Databricks regarding the availability of DLT Unity Catalog integration in preview.

zak
by New Contributor II
  • 3851 Views
  • 4 replies
  • 4 kudos

Resolved! Add custom metadata to an Avro file with PySpark

Hello, I need to add custom metadata to an Avro file. The Avro file contains data. We have tried to use "option" within the write function, but it is ignored without generating any error: df.write.format("avro").option("avro.codec", "snappy").option...

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @zakaria belamri​, You can add custom metadata to an Avro file in PySpark by creating an Avro schema with the custom metadata fields and passing it to the DataFrameWriter as an option. Here's an example code snippet that demonstrates how to do thi...
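The reply's own snippet is cut off in this excerpt, so the following is only a sketch of the general idea it describes, not a reconstruction of it. It builds an Avro schema as JSON with one custom attribute. The record name, field names, and the "owner" attribute are hypothetical. The Avro specification permits attributes it does not define as schema metadata, but whether a given writer keeps them in the output file is an assumption that should be verified by reading the file back. Note that this is schema-level metadata, which is not the same thing as extra key-value entries in the Avro file header.

```python
import json

# Avro record schema with one custom attribute ("owner").
# All names below are hypothetical and for illustration only.
schema = {
    "type": "record",
    "name": "Event",
    "owner": "data-platform-team",  # custom metadata attribute
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": ["null", "string"], "default": None},
    ],
}

avro_schema_json = json.dumps(schema)
print(avro_schema_json)
# Assumed usage on a cluster with the Avro data source available:
# df.write.format("avro").option("avroSchema", avro_schema_json).save(path)
```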

3 More Replies
JordanYaker
by Contributor
  • 1849 Views
  • 3 replies
  • 3 kudos

Resolved! What is the maximum number of workspaces per account using Databricks on AWS?

I've been looking through the documentation and I swear this used to be listed somewhere, but for the life of me I can't find it anymore.

Latest Reply
JordanYaker
Contributor
  • 3 kudos

Thanks @Kaniz Fatma​ 

2 More Replies
kkawka1
by New Contributor III
  • 5973 Views
  • 8 replies
  • 10 kudos

Resolved! Removing files saved in the root FileStore

We have just started working with Databricks in one of my university modules, and the lecturers gave us a set of commands to practice saving data in the FileStore. One of the commands was the following: dbutils.fs.cp("/databricks-datasets/weathh...

Latest Reply
Kaniz_Fatma
Community Manager
  • 10 kudos

Hi @Konrad Kawka, are you using the Community Edition?

7 More Replies
Shanthala
by New Contributor III
  • 1100 Views
  • 1 replies
  • 3 kudos

Workspace usage for the partners

We have 11 people working on the Data Engineering Associate certification using Data Engineering with Databricks V3. We just finished the Foundation one and are starting the Engineering journey. We are Registered partners and Data Engineering with Dat...

Latest Reply
youssefmrini
Honored Contributor III
  • 3 kudos

Hello Shanthala, you can send an email to partnerops@databricks.com, who will then provide information on how to set this up.

killjoy
by New Contributor III
  • 6354 Views
  • 7 replies
  • 0 kudos

Resolved! Pipeline failed while calling Databricks Notebook - Cluster Terminated

Hello, we have an Azure Data Factory pipeline running during the night, and one of the activities calls a Databricks Notebook with dynamic DatabricksInstancePoolId, ClusterVersion and Workers. Yesterday, it failed with the following error: Cluster...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Rita Fernandes, what are you trying to install in your init script? Only the ODBC driver, or some other libraries/dependencies?

6 More Replies
User16844924854
by New Contributor II
  • 807 Views
  • 1 replies
  • 1 kudos

Databricks Product Manager here. Would love to chat with anyone who has feedback on their experience onboarding to Databricks. Can't wait...

Databricks Product Manager here. Would love to chat with anyone who has feedback on their experience onboarding to Databricks. Can't wait to hear from you!

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Nice opportunity for our community influencers!

grazie
by Contributor
  • 2182 Views
  • 2 replies
  • 0 kudos

Resolved! Slack notification (webhook) failing

A POST to the Slack webhook from a local HTTP client works. The "Test" action on the "System Notifications" page in Databricks gives a 400 Bad Request response.
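For reference, a local check of this kind might look like the sketch below, which builds a JSON POST for a Slack incoming webhook using only the standard library. The helper name is hypothetical and the URL is a placeholder, not a working webhook. The request is constructed but not sent, and the final comment shows how it could be sent.

```python
import json
import urllib.request


def build_slack_request(webhook_url, message):
    """Build (but do not send) a JSON POST request for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real webhook URL is a secret and should not be hard-coded.
request = build_slack_request("https://hooks.slack.com/services/EXAMPLE", "Test")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

If a request like this succeeds locally while the in-product test fails, the notification destination's configuration is a reasonable next thing to check.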

Latest Reply
grazie
Contributor
  • 0 kudos

All in all, I got confused by a misconfiguration. A notification was set up as a webhook notification instead of a Slack notification by mistake in a particular workspace, which caused the confusion. So, "problem in chair, not computer". If anything, it...

1 More Replies
