Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

cszczotka
by New Contributor III
  • 1238 Views
  • 4 replies
  • 0 kudos

Shallow clone and issue with MODIFY permission to source table

Hi, I'm running a shallow clone on external Delta tables. The shallow clone fails for source tables on which I don't have MODIFY permission, and I'm getting the exception below. I don't understand why MODIFY permission on the source table is required. Is there a...

Latest Reply
Amit_Dass_Chmp
New Contributor III
  • 0 kudos

Also check this documentation on access mode: Shallow clone for Unity Catalog tables | Databricks on AWS. Working with Unity Catalog shallow clones in Single User access mode, you must have permissions on the resources for the cloned table source as w...

3 More Replies
Maatari
by New Contributor III
  • 1182 Views
  • 2 replies
  • 1 kudos

Fixed interval micro-batches and AvailableNow Trigger

What is the fundamental difference between fixed interval micro-batches and the AvailableNow trigger, given that both can consume data in micro-batches of the desired size? Is the fundamental difference the fact that AvailableNow shut d...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Maatari, Fixed Interval Micro-batches are like clockwork, processing data at regular intervals without stopping. AvailableNow Trigger is more adaptive, consuming data as it becomes available and then gracefully shutting down. Regarding your confu...
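
The two trigger modes compared above could be sketched in PySpark roughly as follows. This is a minimal sketch, not the poster's code: the helper names `trigger_options` and `start_stream`, the default interval, and the checkpoint path and table name passed in are hypothetical. `processingTime` and `availableNow` are the keyword arguments accepted by `DataStreamWriter.trigger`. With `availableNow=True` the query processes the data available when it starts (possibly across several micro-batches) and then stops on its own, whereas a `processingTime` trigger keeps the query running and starts a micro-batch at each interval.

```python
def trigger_options(mode, interval="1 minute"):
    """Return keyword arguments for DataStreamWriter.trigger().

    mode="fixed_interval": the query keeps running and starts a
    micro-batch every `interval`.
    mode="available_now": the query processes what is available at start,
    possibly in several micro-batches, then stops on its own.
    """
    if mode == "fixed_interval":
        return {"processingTime": interval}
    if mode == "available_now":
        return {"availableNow": True}
    raise ValueError(f"unknown trigger mode: {mode!r}")


def start_stream(df, checkpoint_path, target_table, mode):
    """Start a streaming write with the chosen trigger.

    `df` must be a streaming DataFrame; the path and table name are
    supplied by the caller (hypothetical values in any example call).
    """
    return (
        df.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_options(mode))
        .toTable(target_table)
    )
```

In both modes the size of each micro-batch is usually limited on the source side, for example with the `maxFilesPerTrigger` option on a Delta source, so the trigger mainly decides when batches start and whether the query ever stops.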

1 More Replies
surband
by New Contributor III
  • 4375 Views
  • 9 replies
  • 0 kudos

Pulsar Streaming (Read) - Benchmarking Information

We are doing a first-time implementation of data streaming, reading from partitioned Pulsar topics into a Delta table managed by UC. We are unable to scale the job beyond about 40k msgs/sec; beyond that rate, the job fails. I'd imagine Databric...

Latest Reply
surband
New Contributor III
  • 0 kudos

Attached Grafana screenshots

8 More Replies
JacobKesinger
by New Contributor II
  • 2507 Views
  • 3 replies
  • 0 kudos

Resolved! Iterating over a pyspark.pandas.groupby.DataFrameGroupBy

I have a pyspark.pandas.frame.DataFrame object (obtained by calling `pandas_api` on a pyspark.sql.dataframe.DataFrame object). I have a complicated transformation that I would like to apply to this data, and in particular I would like to apply it in ...

Latest Reply
MichTalebzadeh
Valued Contributor
  • 0 kudos

Hi, the error indicates that Unity Catalog does not support Spark higher-order functions, such as those used in pandas_udf. This limitation likely comes from architectural or compatibility constraints. To resolve the issue, consider alternative ap...

2 More Replies
nileshtiwaari
by New Contributor
  • 496 Views
  • 1 reply
  • 0 kudos

Unity Catalog External Tables

What happens if I manually delete an external table's files from the storage account without dropping the table itself?

Latest Reply
mhiltner
Contributor III
  • 0 kudos

This change won't be registered in the metadata, so the table will still appear in Unity Catalog, but you'll get an error when trying to access it, because the table metadata will point to deleted files.

kazinahian
by New Contributor III
  • 1746 Views
  • 2 replies
  • 1 kudos

Resolved! Lowcode ETL in Databricks

Hello everyone, I work as a Business Intelligence practitioner, employing tools like Alteryx or various low-code solutions to construct ETL processes and develop data pipelines for my dashboards and reports. Currently, I'm delving into Azure Databrick...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @kazinahian,  In the Azure ecosystem, you have a few options for building ETL (Extract, Transform, Load) data pipelines, including low-code solutions. Let’s explore some relevant tools: Azure Data Factory: Purpose: Azure Data Factory is a clou...

1 More Replies
DataRonit
by New Contributor II
  • 620 Views
  • 1 reply
  • 1 kudos

My Databricks certified data engineer associate exam got suspended

Hi Team, my Databricks Certified Data Engineer Associate exam, which was scheduled for today, was suspended by the proctor over some false alarms. On my end, there was an internet disconnection issue for a couple of minutes. I was almost a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @DataRonit, Thank you for posting your concern on Community!   To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hour...

nggianno
by New Contributor III
  • 4040 Views
  • 5 replies
  • 2 kudos

How to enable Delta live tables serverless in Databricks?

I am trying to enable Serverless mode in Delta Live Tables, based on the official Databricks YouTube channel video "Delta Live Tables A to Z: Best practices for Modern Data Pipelines", but I cannot find it in my UI. Could you help me with...

Latest Reply
kols
New Contributor II
  • 2 kudos

Serverless DLT pipelines are currently in PrPr (Private Preview). Thus, you will not see this checkbox if you are not part of this PrPr. To learn about enabling Serverless DLT pipelines, contact your Databricks account team.

4 More Replies
Yashir
by New Contributor III
  • 3123 Views
  • 5 replies
  • 4 kudos

Is there a way to add Features descriptions for each of the features in a Feature Store table?

If not, I believe it would be beneficial, because feature tables contain engineered features whose calculation logic is worth documenting for the benefit of other data scientists. Also, even non-engineered features are many times no...

Latest Reply
deep_thought
Contributor
  • 4 kudos

I also would like to see support added for feature description get/set methods.

4 More Replies
kDev
by New Contributor
  • 11000 Views
  • 8 replies
  • 3 kudos

UnauthorizedAccessException: PERMISSION_DENIED: User does not have READ FILES on External Location

Our jobs have been running fine so far without any issues on a specific workspace. These jobs read data from files in Azure ADLS storage containers and don't use the Hive metastore data at all. Now we attached the Unity metastore to this workspace, created...

Latest Reply
Masha
New Contributor III
  • 3 kudos

@Wojciech_BUK I granted both in the GUI :) You can either search for the display name there (and then it uses the Managed Identity Object ID), or you can search directly for the value of the "Managed Identity Application ID", and then it works correctly! ...

7 More Replies
etum
by New Contributor II
  • 655 Views
  • 1 reply
  • 2 kudos

Importing JSON files when format is subject to evolution

Hi there, I'm reaching out for some assistance with importing JSON files into Databricks. I'm still a beginner, although I've gained experience working with various data import batches (CSV/JSON) for application monitoring. I'm currently facing a challenge...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @etum,  In JSON Schema, you can use the allOf keyword. It allows you to specify that the data must be valid against both the parent schema (the original schema) and the child schema (the new fields). This way, you ensure compatibility with both ol...

Husky
by New Contributor III
  • 10980 Views
  • 5 replies
  • 1 kudos

Resolved! Upload file from local file system to Unity Catalog Volume (via databricks-connect)

Context: IDE: IntelliJ 2023.3.2; Library: databricks-connect 13.3; Python: 3.10. Description: I develop notebooks and Python scripts locally in the IDE and I connect to the Spark cluster via databricks-connect for a better developer experience. I download a...

Latest Reply
lathaniel
New Contributor III
  • 1 kudos

Late to the discussion, but I too was looking for a way to do this _programmatically_, as opposed to through the UI. The solution I landed on was using the Python SDK (though you could assuredly do this using an API request instead if you're not in Python): w ...

4 More Replies
Hardy
by New Contributor III
  • 5404 Views
  • 5 replies
  • 4 kudos

The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption

I am trying to connect to SQL Server through JDBC from a Databricks notebook. Below is my notebook command: `val df = spark.read.jdbc(jdbcUrl, "[MyTableName]", connectionProperties)` `println(df.schema)` When I execute this command with DBR 10.4 LTS, it works fin...

Latest Reply
DBXC
Contributor
  • 4 kudos

Try adding the following parameters to your SQL connection string. It fixed my problem for 13.X and 12.X: `;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;`
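
As an illustration of the reply above, the two properties could be appended to an existing JDBC URL along these lines. This is only a sketch: the helper name `with_ssl_params` and the server and database names are hypothetical, and only the two property names and their values come from the reply.

```python
def with_ssl_params(jdbc_url, host_name_in_certificate="*.database.windows.net"):
    """Append the two TLS-related properties from the reply to a JDBC URL."""
    # Drop any trailing semicolon so the properties are joined by exactly one.
    base = jdbc_url.rstrip(";")
    return (
        f"{base};trustServerCertificate=true;"
        f"hostNameInCertificate={host_name_in_certificate};"
    )


# Hypothetical server and database names, for illustration only.
url = with_ssl_params(
    "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
)
print(url)
```

Note that `trustServerCertificate=true` tells the driver to skip validation of the server certificate, so it may be better suited to diagnosing the connection problem than to long-term use.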

4 More Replies
Clampazzo
by New Contributor II
  • 1199 Views
  • 3 replies
  • 0 kudos

Power BI RLS running extremely slowly with databricks

Hi everyone, I am brand new to Databricks and am setting up my first semantic model with RLS, and I have run into an unexpected problem. When I test my model with filters applied (which the RLS would handle later on), it runs extremely fast. I look...

Data Engineering
Power BI
sql
Latest Reply
KTheJoker
Contributor II
  • 0 kudos

Are you trying to use Power BI RLS rules on top of DirectQuery? Can you give an example of the rules you're trying to apply? Are they static roles, or dynamic roles based on the user's UPN/email being in the dataset?

2 More Replies
Mathias_Peters
by Contributor
  • 1072 Views
  • 2 replies
  • 0 kudos

Resolved! DLT table not picked in python notebook

Hi, I am a bit stumped atm bc I cannot figure out how to get a DLT table definition picked up in a Python notebook. 1. I created a new notebook in Python. 2. I added the following code: `%python` `import dlt` `from pyspark.sql.functions import *` `@dlt.table(...`

Latest Reply
Mathias_Peters
Contributor
  • 0 kudos

Ok, it seems that the default language of the notebook and the language of a particular cell can clash. If the default is set to Python, switching a cell to SQL won't work in DLT and vice versa. This is super unintuitive tbh.

1 More Replies
