Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Danish11052000
by New Contributor III
  • 26 Views
  • 2 replies
  • 0 kudos

How to incrementally backup system.information_schema.table_privileges (no streaming, no unique keys)

I'm trying to incrementally back up system.information_schema.table_privileges but facing challenges:
  • No streaming support: Is streaming supported: False
  • No unique columns for MERGE: all columns contain common values, no natural key combination
  • No timest...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

information_schema is not a Delta Table, which is why you can't stream from it. They are basically views on top of the information coming straight from the control plane database. Also your query is actually going to be quite slow/expensive (you prob...

1 More Replies
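Lacking a natural key, one workaround (a sketch, not taken from the replies above) is to derive a synthetic key by hashing every column of each full snapshot and diffing consecutive snapshots; the same hash could then serve as a MERGE key. A minimal pure-Python illustration of the idea (column names are invented):

```python
import hashlib

def row_key(row: dict) -> str:
    """Derive a synthetic merge key by hashing all column values."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def incremental_diff(previous: list[dict], current: list[dict]):
    """Return rows added and removed between two full snapshots."""
    prev_keys = {row_key(r) for r in previous}
    curr_keys = {row_key(r) for r in current}
    added = [r for r in current if row_key(r) not in prev_keys]
    removed = [r for r in previous if row_key(r) not in curr_keys]
    return added, removed

prev = [{"grantee": "alice", "table": "t1", "privilege": "SELECT"}]
curr = [{"grantee": "alice", "table": "t1", "privilege": "SELECT"},
        {"grantee": "bob", "table": "t1", "privilege": "SELECT"}]
added, removed = incremental_diff(prev, curr)
print(len(added), len(removed))  # 1 0
```

In Spark the same hash could be computed with a `sha2(concat_ws(...))` expression over all columns, but as the reply notes, full scans of these views can be slow, so snapshot frequency matters.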
echol
by Visitor
  • 60 Views
  • 3 replies
  • 1 kudos

Redeploy Databricks Asset Bundle created by others

Hi everyone, our team is using Databricks Asset Bundles (DAB) with a customized template to develop data pipelines. We have a core team that maintains the shared infrastructure and templates, and multiple product teams that use this template to develo...

Latest Reply
pradeep_singh
New Contributor II
  • 1 kudos

Development mode has a purpose; it's not a limitation. It's meant to make sure developers can test their changes individually. If you plan to have this deployed by multiple users, you will have to deploy in production mode.

2 More Replies
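For reference, the development/production distinction above lives in the bundle's target configuration. A minimal `databricks.yml` sketch (target names and host are placeholders, not taken from the thread):

```yaml
# databricks.yml — target names and host are placeholders
targets:
  dev:
    # development mode prefixes deployed resources with the deploying
    # user's name and pauses schedules, so each developer tests in isolation
    mode: development
    default: true
  prod:
    # production mode deploys shared, unprefixed resources that any
    # authorized user or service principal can redeploy
    mode: production
    workspace:
      host: https://example.cloud.databricks.com
```

Deploying with `databricks bundle deploy -t prod` (typically from CI under a service principal) avoids the per-user ownership issues that development mode introduces by design.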
dpc
by Contributor II
  • 145 Views
  • 6 replies
  • 2 kudos

Using AD groups for object ownership

Databricks has a general issue with object ownership in that only the creator can delete objects. So, if I create a catalog, table, view, schema etc., I am the only person who can delete it. No good if it's a general table or view and some other developer ...

Latest Reply
dpc
Contributor II
  • 2 kudos

Hi. So, I've just tested this. If I create a schema and somebody else creates a table in that schema, I can drop their table. If they create a schema along with a table in that schema, then grant me ALL PRIVILEGES on the table, I cannot drop it as it says...

5 More Replies
Fox19
by New Contributor III
  • 49 Views
  • 4 replies
  • 1 kudos

CSV Ingestion using Autoloader with single variant column

I've been working on ingesting CSV files with varying schemas using Autoloader. The goal is to take the CSVs and ingest them into a bronze table that writes each record as a key-value mapping with only the relevant fields for that record. I also want to ...

Latest Reply
pradeep_singh
New Contributor II
  • 1 kudos

If I understand the problem correctly, you are getting extra keys for records from files where the keys don't actually exist. I was not able to reproduce this issue: I am getting different key-value pairs and no extra keys with null. Can you share ...

3 More Replies
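The "only the relevant fields per record" behavior discussed above can be illustrated outside Spark: parse each file against its own header and drop empty fields per record, so files with different schemas don't contribute null-valued keys. A plain-Python sketch (file contents and columns are invented):

```python
import csv
import io

def records_with_present_fields(csv_text: str) -> list[dict]:
    """Parse a CSV and keep, per record, only fields that are non-empty,
    so files with different schemas don't contribute null-valued keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: v for k, v in row.items() if v not in (None, "")}
            for row in reader]

file_a = "id,name\n1,alice\n"
file_b = "id,city\n2,paris\n"
rows = records_with_present_fields(file_a) + records_with_present_fields(file_b)
print(rows)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'city': 'paris'}]
```

In an Autoloader pipeline the equivalent step would run on the parsed rows before writing to the bronze table, so each record's map carries only the keys present in its source file.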
aranjan99
by Contributor
  • 41 Views
  • 2 replies
  • 1 kudos

System table missing primary keys?

This simple query takes 50 seconds for me on an X-Small warehouse:
select * from SYSTEM.access.workspaces_latest where workspace_id = '442224551661121'
Can the team comment on why querying on system tables takes so long? I also don't see any primary keys ...

Latest Reply
iyashk-DB
Databricks Employee
  • 1 kudos

System tables are a Databricks‑hosted, read‑only analytical store shared to your workspace via Delta Sharing; they aren’t modifiable (no indexes you can add), and the first read can have extra overhead on a very small warehouse. This can make “simple...

1 More Replies
petergriffin1
by New Contributor II
  • 1987 Views
  • 4 replies
  • 1 kudos

Resolved! Are you able to create an Iceberg table natively in Databricks?

Been trying to create an Iceberg table natively in Databricks with the cluster being on 16.4. I also have the Iceberg JAR file for Spark 3.5.2. Using a simple command such as:
%sql CREATE OR REPLACE TABLE catalog1.default.iceberg( a INT ) USING iceberg...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Databricks supports creating and working with Apache Iceberg tables natively under specific conditions. Managed Iceberg tables in Unity Catalog can be created directly using Databricks Runtime 16.4 LTS or newer. The necessary setup requires enabling ...

3 More Replies
souravroy1990
by New Contributor II
  • 44 Views
  • 2 replies
  • 2 kudos

Error in Column level tags creation in views via SQL

Hi, I'm trying to run this query using SQL on a DBR 17.3 cluster, but I get a syntax error:
ALTER VIEW catalog.schema.view ALTER COLUMN column_name SET TAGS (`METADATA` = `xyz`);
But the query below works:
SET TAG ON COLUMN catalog.schema.view.column_n...

Latest Reply
souravroy1990
New Contributor II
  • 2 kudos

Thanks for the clarification @szymon_dybczak. I have a follow-up question: if I have attached a tag to a view column and the same view is associated with a SHARE, will the recipient see the tag in the view, i.e. whether view column tags associated to shares a...

1 More Replies
liquibricks
by Contributor
  • 242 Views
  • 7 replies
  • 4 kudos

Declarative Pipeline error: Name 'kdf' is not defined. Did you mean: 'sdf'

We have a Lakeflow Spark Declarative Pipeline using the new PySpark Pipelines API. This was working fine until about 7am (Central European) this morning when the pipeline started failing with a PYTHON.NAME_ERROR: name 'kdf' is not defined. Did you me...

Latest Reply
zkaliszamisza
  • 4 kudos

For us it happened in westeurope around the same time.

6 More Replies
dpc
by Contributor II
  • 341 Views
  • 8 replies
  • 8 kudos

Case insensitive data

For all its positives, one of the first general issues we had with Databricks was case sensitivity. We have a lot of data-specific filters in our code. Problem is, we land and view data from lots of different case-insensitive source systems, e.g. SQL Se...

Latest Reply
dpc
Contributor II
  • 8 kudos

It works, but there's a scenario that causes an issue. If I create a schema with DEFAULT COLLATION UTF8_LCASE, then create a table, it marks all the string columns as UTF8_LCASE, which is fine and works. If I create the table in the newly created UTF8_LCA...

7 More Replies
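As a rough illustration of what the UTF8_LCASE collation discussed above changes: string comparison becomes case-insensitive, roughly equivalent to comparing casefolded values. A sketch of the comparison semantics, not the engine's actual implementation:

```python
def lcase_equal(a: str, b: str) -> bool:
    """Approximate a case-insensitive (UTF8_LCASE-style) string
    comparison: values differing only in case compare equal."""
    return a.casefold() == b.casefold()

# Filters written against one source system keep matching data landed
# from another system that uses different casing.
assert lcase_equal("Customer", "CUSTOMER")
assert not lcase_equal("Customer", "Customers")
```

Setting the collation at the schema level (so new tables inherit it) avoids sprinkling `lower()` calls through every filter, which is the pain point the original post describes.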
maddan80
by New Contributor II
  • 2679 Views
  • 6 replies
  • 3 kudos

Oracle Essbase connectivity

Team, I wanted to understand the best way of connecting to Oracle Essbase to ingest data into the Delta lake.

Latest Reply
hyaqoob
New Contributor II
  • 3 kudos

I am currently working with Essbase 21c and I need to pull data from Databricks through a SQL query. I was able to successfully set up a JDBC connection to Databricks, but when I try to create a data source using a SQL query, it gives me an error: "[Data...

5 More Replies
RIDBX
by Contributor
  • 39 Views
  • 2 replies
  • 1 kudos

Robust/complex scheduling with dependency within Databricks?

Thanks for reviewing my threads. I'd like to explore robust/complex scheduling with dependencies within Databricks. We know traditional scheduling frameworks allow ...

Latest Reply
pradeep_singh
New Contributor II
  • 1 kudos

Further reading:
  • SQL alert task: https://docs.databricks.com/aws/en/jobs/sql
  • If/else task: https://docs.databricks.com/aws/en/jobs/if-else
  • For each task: https://docs.databricks.com/aws/en/jobs/for-each
  • Run job task: https://docs.databricks.com/a...

1 More Replies
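The task-dependency ordering discussed in this thread (each Jobs task can declare the tasks it depends on) boils down to a topological sort of the task graph. A small sketch with an invented task graph, using the standard library:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task lists the tasks it depends on,
# mirroring the `depends_on` field of a Databricks Jobs task.
tasks = {
    "ingest": [],
    "transform": ["ingest"],
    "quality_check": ["transform"],
    "publish": ["transform", "quality_check"],
}

# static_order() yields tasks so every dependency runs before its dependents.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```

The if/else and for-each tasks linked above then add branching and fan-out on top of this dependency ordering, which covers most of what traditional schedulers provide.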
Adig
by New Contributor III
  • 8904 Views
  • 6 replies
  • 17 kudos

Generate Group Id for similar deduplicate values of a dataframe column.

Input DataFrame:
KeyName          KeyCompare       Source
PapasMrtemis     PapasMrtemis     S1
PapasMrtemis     Pappas, Mrtemis  S1
Pappas, Mrtemis  PapasMrtemis     S2
Pappas, Mrtemis  Pappas, Mrtemis  S2
Mich...

Latest Reply
rafaelpoyiadzi
  • 17 kudos

Hey. We’ve run into similar deduplication problems before. If the name differences are pretty minor (punctuation, spacing, small typos), fuzzy string matching can usually get you most of the way there. That kind of similarity-based clustering works f...

5 More Replies
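The fuzzy-matching approach the reply describes can be sketched in plain Python: normalize each value, mark pairs whose similarity ratio clears a threshold, and merge transitively with union-find so every member of a cluster gets the same group id. The threshold and normalization below are illustrative choices, not from the thread:

```python
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    # Strip punctuation/whitespace and lowercase before comparing.
    return "".join(ch for ch in s.lower() if ch.isalnum())

def group_ids(values: list[str], threshold: float = 0.85) -> dict[str, int]:
    """Assign the same group id to values whose normalized forms are
    similar; union-find merges transitively similar values."""
    parent = list(range(len(values)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            a, b = normalize(values[i]), normalize(values[j])
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                parent[find(j)] = find(i)

    roots: dict[int, int] = {}
    ids: dict[str, int] = {}
    for i, v in enumerate(values):
        ids[v] = roots.setdefault(find(i), len(roots))
    return ids

names = ["PapasMrtemis", "Pappas, Mrtemis", "Michel"]
print(group_ids(names))
```

The pairwise comparison is O(n²), so for large columns a blocking key (e.g. first letters of the normalized value) is typically used to limit candidate pairs before scoring.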
cdn_yyz_yul
by Contributor
  • 96 Views
  • 5 replies
  • 0 kudos

unionByName several streaming dataframes of different sources

Is the following type of union safe with Spark Structured Streaming? Union multiple streaming dataframes, each from a different source. Any better solution? For example:
df1 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table1") ...

Latest Reply
cdn_yyz_yul
Contributor
  • 0 kudos

Thanks @stbjelcevic, I am looking for a solution .... Let's say I already have:
df1 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table1")
df2 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table2")
df1a = df1.se...

4 More Replies
NathanE
by New Contributor II
  • 3727 Views
  • 2 replies
  • 1 kudos

Time travel on views

Hello, at my company we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application caches some data in-memory, and to avoid synchronization issues with the data on Databricks, we rely heavil...

Latest Reply
robert1213
Visitor
  • 1 kudos

Hi there, your use case for time travel on views is really interesting. I can see why being able to track historical versions of both views and their underlying tables would be crucial for an application that relies on caching and granular queries. Ri...

1 More Replies
shan-databricks
by New Contributor III
  • 72 Views
  • 1 reply
  • 0 kudos

What are the prerequisites for connecting Confluent Kafka with Databricks?

Please provide the prerequisites for connecting Confluent Kafka with Databricks, the different connection options, their respective advantages and disadvantages, and the best option for the deliverable. Thanks, Shanmugam

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @shan-databricks. Connecting Confluent Kafka with Databricks creates a powerful "data in motion" to "data at rest" architecture. Below are the prerequisites, connection methods, and strategic recommendations for your deliverable. 1. Prerequisites: Befo...

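As a concrete starting point for the connection options discussed above, a sketch of the core Spark Structured Streaming read options for Confluent Cloud. Host, topic, and credentials are placeholders, and the exact JAAS string should be verified against your cluster's Kafka client:

```python
# Minimal options for a Structured Streaming read from Confluent Cloud;
# with Spark available this dict would be passed via
# spark.readStream.format("kafka").options(**kafka_options).load()
kafka_options = {
    "kafka.bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "subscribe": "orders",
    "startingOffsets": "earliest",
    # Confluent Cloud requires SASL_SSL with an API key/secret
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    # on Databricks the Kafka client classes are shaded under "kafkashaded."
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        'required username="API_KEY" password="API_SECRET";'
    ),
}
print(sorted(kafka_options))
```

Network reachability from the cluster to the Confluent endpoint and storing the API secret in a Databricks secret scope (rather than inline, as shown here for brevity) are the usual remaining prerequisites.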