Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

EmilioGC
by New Contributor III
  • 8299 Views
  • 5 replies
  • 7 kudos

Resolved! Why was SQL formatting removed inside spark.sql functions? Now it looks like a plain string.

Previously we were able to see SQL queries inside spark.sql() like this: But now it just looks like a plain string: I know it's not a big issue, but it's still annoying to have to code in SQL while having it all be blue. It makes debugging more cumber...

Latest Reply
jose_gonzalez
Databricks Employee
  • 7 kudos

Hi @Emilio Garza, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

  • 7 kudos
4 More Replies
Kash
by Contributor III
  • 6322 Views
  • 4 replies
  • 0 kudos

Creating a spot only single-node job compute cluster policy

Hi there, I need some help creating a new cluster policy that uses a single spot-instance server to complete a job. I want to set this up as job compute to reduce costs and also use 1 spot instance. The jobs I need to ETL are very short and c...
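
As a rough illustration of what such a policy could look like, here is a minimal sketch that builds a policy definition as a Python dict and serializes it to JSON. It assumes an AWS workspace (the post does not name the cloud), and the choice of which attributes to fix or hide is an assumption for illustration, not a confirmed answer from the thread. Check the attribute paths against the cluster policy reference before using it.

```python
import json

# Hypothetical policy: single-node, spot-only job compute on AWS.
# Every value below is an illustrative assumption, not taken from the thread.
single_node_spot_policy = {
    # Restrict the policy to job compute.
    "cluster_type": {"type": "fixed", "value": "job"},
    # Single-node clusters have no workers and run Spark in local mode.
    "num_workers": {"type": "fixed", "value": 0, "hidden": True},
    "spark_conf.spark.databricks.cluster.profile": {
        "type": "fixed", "value": "singleNode", "hidden": True,
    },
    "spark_conf.spark.master": {
        "type": "fixed", "value": "local[*]", "hidden": True,
    },
    "custom_tags.ResourceClass": {
        "type": "fixed", "value": "SingleNode", "hidden": True,
    },
    # Request spot capacity; first_on_demand = 0 keeps the driver off on-demand.
    "aws_attributes.availability": {"type": "fixed", "value": "SPOT"},
    "aws_attributes.first_on_demand": {"type": "fixed", "value": 0},
}

policy_json = json.dumps(single_node_spot_policy, indent=2)
print(policy_json)
```

The printed JSON is what would be pasted into the policy definition editor. With pure spot capacity the single node can be reclaimed mid-run, so short, retryable jobs fit this setup best.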

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Avkash Kana, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

  • 0 kudos
3 More Replies
databicky
by Contributor II
  • 2422 Views
  • 4 replies
  • 0 kudos

How to optimize the runtime on a 10.4 cluster

I am loading 1 billion records from a Spark DataFrame into a target table. On the 7.3 cluster it takes 3 hours to complete, but after migrating to the 10.4 cluster it takes 8 hours. How can I reduce the run time?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Mohammed sadamusean, could you provide more details on what you are doing? What type of transformations/actions are you running? What are your source and sink? Is it batch or streaming? All of that information will help.

  • 0 kudos
3 More Replies
RafikiT97
by New Contributor
  • 5095 Views
  • 3 replies
  • 0 kudos

Query Databricks from Power BI with Row Level Security

I am trying to apply RLS to the solution, but Power BI only connects to Databricks (DB) using a token, which can't be used in DB groups. Is there no other way to apply row-level security using Power BI?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Daniel Gomes, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

  • 0 kudos
2 More Replies
farooqurrehman
by New Contributor
  • 3186 Views
  • 3 replies
  • 2 kudos

Unable to connect/read files from ADLS Gen2 using account key

It gives the error "[RequestId=5e57b66f-b69f-4e8b-8706-3fe5baeb77a0 ErrorClass=METASTORE_DOES_NOT_EXIST] No metastore assigned for the current workspace." when using the following code: spark.conf.set("fs.azure.account.key.mystorageaccount.dfs.core.windows.net", ...
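
For reference, the configuration key in the snippet above follows a fixed pattern built from the storage account name. The sketch below only shows how that key and a matching abfss:// URI are composed. It does not address the METASTORE_DOES_NOT_EXIST error itself, and the helper names and the container and file names are hypothetical.

```python
def account_key_conf_name(storage_account: str) -> str:
    """Spark config key that holds the access key for an ADLS Gen2 account."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """URI for a path inside an ADLS Gen2 container."""
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


conf_name = account_key_conf_name("mystorageaccount")
# "data" and "raw/file.csv" are placeholder names for illustration.
uri = abfss_uri("data", "mystorageaccount", "/raw/file.csv")
print(conf_name)
print(uri)

# In a notebook, the key would typically come from a secret scope rather than
# being pasted in as text (the scope and key names here are placeholders):
#   spark.conf.set(conf_name, dbutils.secrets.get(scope="my-scope", key="my-key"))
#   df = spark.read.csv(uri)
```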

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Farooq ur rehman, just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

  • 2 kudos
2 More Replies
SRK
by Contributor III
  • 6867 Views
  • 5 replies
  • 0 kudos

Delta Live Tables data quality rules application.

I have a requirement where I need to apply an inverse DQ rule on a table to track the invalid data. For this I can use the following approach:
import dlt
rules = {}
quarantine_rules = {}
rules["valid_website"] = "(Website IS NOT NULL)"
rules["valid_locatio...
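
One common way to express an inverse rule is to negate the conjunction of all the valid-data rules, so a row matches the quarantine rule exactly when it fails at least one of them. The sketch below shows only that string construction in plain Python. The second rule and the `build_quarantine_rule` helper are hypothetical, and this is not necessarily the approach the poster goes on to describe.

```python
def build_quarantine_rule(rules: dict) -> str:
    """Return one SQL predicate that holds when a row fails any of the rules."""
    return "NOT({})".format(" AND ".join(rules.values()))


rules = {
    "valid_website": "(Website IS NOT NULL)",
    "valid_id": "(Id IS NOT NULL)",  # hypothetical second rule for illustration
}
quarantine_rules = {"invalid_record": build_quarantine_rule(rules)}
print(quarantine_rules["invalid_record"])

# Inside a DLT pipeline the two dictionaries could then drive two tables, e.g.
#   @dlt.expect_all_or_drop(rules)             -> keeps only valid rows
#   @dlt.expect_all_or_drop(quarantine_rules)  -> keeps only invalid rows
```

If a rule can evaluate to NULL for some rows, the negated predicate is also NULL for them, so rules written with IS NULL / IS NOT NULL checks are the safest fit for this pattern.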

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

You can get additional info from the DLT event log, which is stored in Delta format, so you can load it as a table: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#data-quality

  • 0 kudos
4 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1601 Views
  • 1 replies
  • 10 kudos

Databricks Runtime 12.1 added "WHEN NOT MATCHED BY SOURCE" to the MERGE syntax. For example, using that option, we can quickly delete ...

Databricks Runtime 12.1 added "WHEN NOT MATCHED BY SOURCE" to the MERGE syntax. For example, using that option, we can quickly delete all target rows that don't match any source row.
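
A minimal sketch of such a statement, built as a string in Python so it can be passed to spark.sql() in a notebook. The table names, the key column, and the helper function are placeholders, not taken from the post's screenshot.

```python
def delete_unmatched_merge_sql(target: str, source: str, key: str) -> str:
    """Build a MERGE that deletes target rows with no matching source row.

    The identifiers are interpolated directly, so pass trusted names only.
    """
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN NOT MATCHED BY SOURCE THEN DELETE"
    )


# Placeholder table and column names.
sql = delete_unmatched_merge_sql("target_table", "source_table", "id")
print(sql)

# On Databricks Runtime 12.1 or later, in a notebook:
#   spark.sql(sql)
```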

Latest Reply
jose_gonzalez
Databricks Employee
  • 10 kudos

Thank you for sharing @Hubert Dudek​ 

  • 10 kudos
Daba
by New Contributor III
  • 8139 Views
  • 3 replies
  • 4 kudos

DLT streaming table and LEFT JOIN

I'm trying to build a gold-level streaming live table based on two streaming silver live tables with a left join. This attempt fails with the following error: "Append mode error: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not suppo...
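
In Spark Structured Streaming, a stream-stream outer join generally needs watermarks on the inputs plus a time-range condition in the join, so the engine knows when an unmatched row can be emitted. The sketch below shows that shape with the DataFrame API. The column names, watermark delays, and one-hour bound are assumptions for illustration, and it does not show how the same thing would be written in DLT SQL.

```python
def time_bounded_condition(key: str, ts: str, max_delay: str) -> str:
    """Join condition: same key, and the right event falls within max_delay
    after the left event. 'l' and 'r' are the DataFrame aliases used below."""
    return (
        f"l.{key} = r.{key} AND r.{ts} >= l.{ts} "
        f"AND r.{ts} <= l.{ts} + interval {max_delay}"
    )


def watermarked_left_join(left_df, right_df, key="id", ts="event_time"):
    """Left outer join of two streaming DataFrames with watermarks on both sides."""
    # Imported lazily so this module can be loaded where Spark is not installed.
    from pyspark.sql.functions import expr

    left = left_df.withWatermark(ts, "10 minutes").alias("l")
    right = right_df.withWatermark(ts, "20 minutes").alias("r")
    return left.join(right, expr(time_bounded_condition(key, ts, "1 hour")), "leftOuter")


print(time_bounded_condition("id", "event_time", "1 hour"))
```

Unmatched left rows are only emitted once the watermark passes the end of the time range, so results for non-matching rows arrive with that delay.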

Latest Reply
Daba
New Contributor III
  • 4 kudos

Thanks Fatma, I do understand the need for watermarks, but I'm just wondering whether this is supported by SQL syntax.

  • 4 kudos
2 More Replies
zak
by New Contributor II
  • 4826 Views
  • 1 replies
  • 1 kudos

Add custom metadata to an Avro file with PySpark

Hello, I need to add custom metadata to an Avro file. The Avro file contains data. We have tried to use "option" within the write function, but it is ignored without raising any error. df.write.format("avro").option("avro.codec", "snappy").option...

JordanYaker
by Contributor
  • 2637 Views
  • 1 replies
  • 1 kudos

What is the maximum number of workspaces per account using Databricks on AWS?

I've been looking through the documentation and I swear this used to be listed somewhere, but for the life of me I can't find it anymore.

Latest Reply
JordanYaker
Contributor
  • 1 kudos

Thanks @Kaniz Fatma​ 

  • 1 kudos
kkawka1
by New Contributor III
  • 13615 Views
  • 7 replies
  • 10 kudos

Resolved! Removing files saved in the root FileStore

We have just started working with Databricks in one of my university modules, and the lecturers gave us a set of commands to practice saving data in the FileStore. One of the commands was the following: dbutils.fs.cp("/databricks-datasets/weathh...

Latest Reply
-werners-
Esteemed Contributor III
  • 10 kudos

You can delete files using the Data Explorer in the Databricks web UI. Another option is to use %fs or %sh in a notebook.
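
From Python, the same cleanup can be done with dbutils.fs.rm. The sketch below wraps it in a small guard that refuses paths outside /FileStore. The wrapper and its name are hypothetical, and dbutils is passed in as an argument because it only exists inside a Databricks notebook.

```python
def remove_from_filestore(dbutils, path: str, recurse: bool = False) -> bool:
    """Delete a file or folder under /FileStore using dbutils.fs.rm.

    Raises ValueError for /FileStore itself or any path outside it, so a
    typo cannot wipe unrelated data.
    """
    normalized = path[len("dbfs:"):] if path.startswith("dbfs:") else path
    inside = normalized.startswith("/FileStore/")
    is_root = normalized.rstrip("/") == "/FileStore"
    if not inside or is_root:
        raise ValueError(f"Refusing to delete outside /FileStore: {path}")
    return dbutils.fs.rm(path, recurse)


# In a notebook (the file name is a placeholder):
#   remove_from_filestore(dbutils, "/FileStore/my_copied_file.csv")
#   remove_from_filestore(dbutils, "/FileStore/my_folder/", recurse=True)
```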

  • 10 kudos
6 More Replies
Shanthala
by New Contributor III
  • 1805 Views
  • 1 replies
  • 3 kudos

Workspace usage for the partners

We have 11 people working on the Data Engineering Associate certification using Data Engineering with Databricks V3. We just finished the Foundation one and are starting the Engineering journey. We are Registered partners and Data Engineering with Dat...

Latest Reply
youssefmrini
Databricks Employee
  • 3 kudos

Hello Shanthala, you can send an email to partnerops@databricks.com, who will then provide information on how to set this up.

  • 3 kudos
killjoy
by New Contributor III
  • 9607 Views
  • 7 replies
  • 0 kudos

Resolved! Pipeline failed while calling Databricks Notebook - Cluster Terminated

Hello, we have an Azure Data Factory pipeline running during the night, and one of the activities calls a Databricks Notebook with dynamic DatabricksInstancePoolId, ClusterVersion and Workers. Yesterday, it failed with the following error: Cluster...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Rita Fernandes, what are you trying to install in your init script? Only the ODBC driver, or some other libraries/dependencies?

  • 0 kudos
6 More Replies
grazie
by Contributor
  • 3433 Views
  • 2 replies
  • 0 kudos

Resolved! slack notification (webhook) failing

A POST to the Slack webhook from a local HTTP client works. The "Test" action on the "System Notifications" page in Databricks gives the response 400 Bad Request.
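
For comparison, a local test of a Slack incoming webhook can be as small as the sketch below, which builds the POST request with the standard library. The URL is a placeholder and the helper name is hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a JSON POST for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real incoming-webhook URL is issued by Slack.
request = build_slack_request("https://hooks.slack.com/services/XXX/YYY/ZZZ", "Test")

# To actually send it:
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(response.status)
```

If this works locally while the Databricks "Test" action returns 400, the destination type configured in the workspace is worth checking first.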

Latest Reply
grazie
Contributor
  • 0 kudos

All in all, I got confused by a misconfiguration. A notification was set up as a webhook notification instead of a Slack notification by mistake in a particular workspace, which caused the confusion. So, "problem in chair, not computer". If anything, it...

  • 0 kudos
1 More Replies
