Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a DLT pipeline hanging at INITIALIZING forever; it never stops. But I found the AnalysisException already happened at the beginning: pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or functi...
I'm trying to use Databricks Connect to run queries on Delta tables locally. However, SQL queries using spark.sql don't seem to work properly, even though spark.read.table works.
>>> from databricks.connect import DatabricksSession
>>> spark = Databric...
I am setting up a workflow with the UI. In the first task, a dynamic value for the next task's num_workers is calculated based on the actual data size. In the subsequent task, I'd like to use this calculated num_workers to update the job cluster's defau...
Nowadays we already use Auto Loader with a checkpoint location, but I still wanted to know if it is possible to read only the last updated file within a folder. I know it somewhat defeats the purpose of the checkpoint location. Another question: is it possibl...
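The question above asks how to pick out only the last updated file in a folder. Outside of Auto Loader, a minimal sketch of that selection in plain Python (using file modification time; the folder and file names below are hypothetical) could look like this:

```python
import os

def latest_file(folder: str) -> str:
    """Return the path of the most recently modified file in a folder."""
    paths = [os.path.join(folder, name) for name in os.listdir(folder)]
    files = [p for p in paths if os.path.isfile(p)]
    # max() over modification time picks the last-updated file
    return max(files, key=os.path.getmtime)
```

On Databricks you would apply the same idea to a listing of cloud-storage paths, but this loses the exactly-once bookkeeping that the Auto Loader checkpoint gives you.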
Hello, I'm trying to create a DLT pipeline where I read data as a streaming dataset from a Kafka source, save it in a table, and then filter, transform, and pivot the data. However, I've encountered an issue: DLT doesn't support pivoting, and using fo...
Hi @YS1,
As a workaround you can rewrite the pivot in SQL with CASE statements. Below is the pivot:

data = [
    ("ProductA", "North", 100),
    ("ProductA", "South", 150),
    ("ProductA", "East", 200),
    ("ProductA", "West", 250),
    ("ProductB", "North", 30...
Sorry for my very poor English and limited Databricks skills. At work, my boss asked me to perform liquid clustering on four columns of a Delta Lake table with 11 TB of data and over 80 columns, and I was estimating the resources and costs required to ...
I have a catalog (in Unity Catalog) containing multiple schemas. I need an AD group to have SELECT permission on all the schemas, so at the catalog level I granted SELECT to the AD group. Then, I need to revoke the permission on one particular schema in this cat...
I am writing a frontend webpage that will log into Databricks and allow the user to select datasets. I am new to frontend development, so there may be some things I am missing here, but I know that the Databricks SQL connector for JavaScript only wor...
@Debayan Mukherjee, are you suggesting reverting the OpenAPI version specified in https://docs.databricks.com/_extras/api-refs/jobs-2.1-aws.yaml from 3.1.0 to 3.0.3?
Actually, I have around 2000 SQL queries. I have to convert them to Databricks-supported SQL so that I can run them in the Databricks environment. So I want to know the list of all keywords, functions, or anything else that is different in Databricks SQL. Pl...
Hi @RishabhGarg,
You say SQL, but which dialect? Every provider has its own extensions to the ANSI SQL standard. For example, if you're coming from SQL Server, there is a TOP keyword to limit the rows.
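As one concrete illustration of a dialect difference: T-SQL's SELECT TOP n has no direct equivalent in Databricks SQL, which uses LIMIT instead. A minimal sketch of a mechanical rewrite for the simplest form of that pattern (it deliberately ignores TOP ... PERCENT, WITH TIES, and queries with ORDER BY subtleties) could look like this:

```python
import re

def top_to_limit(query: str) -> str:
    """Rewrite 'SELECT TOP n ...' (T-SQL) to 'SELECT ... LIMIT n'.

    Only handles the plain integer form; anything else is returned unchanged.
    """
    m = re.match(r"(?is)^\s*SELECT\s+TOP\s+(\d+)\s+(.*)$", query.strip())
    if not m:
        return query
    n, rest = m.group(1), m.group(2)
    # Drop a trailing semicolon so LIMIT attaches cleanly at the end
    return f"SELECT {rest.rstrip().rstrip(';')} LIMIT {n}"

print(top_to_limit("SELECT TOP 10 * FROM sales"))
```

For 2000 queries you would want a real SQL parser or a transpiler rather than regexes, but this shows the kind of keyword-level difference to catalogue per dialect.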
Hi,
I've tried to enable a table to test the new VARIANT data type (Public Preview). I used the ALTER command: ALTER TABLE tablexxxx SET TBLPROPERTIES ('delta.feature.variantType-preview' = 'supported') and I get the error [DELTA_UNSUPPORTED_FEATURES_IN_CONFI...
Hi,
Over here they explain attribute-based access controls, which I want to implement in my project, but I can't find the documentation or the option to create rules myself. Is this feature already available? https://www.databricks.com/dataaisummit...
Hi, could you include the fix for SPARK-46990 ([SPARK-46990] Regression: Unable to load empty Avro files emitted by Event Hubs - ASF JIRA (apache.org)) in Databricks 15.4? (15.4 is in the beta stage, so it might be the right time to include the fix.)
Has anyone else run into the issue where applying libraries through a compute policy just completely does not work? I'm trying to install some pretty basic Python libraries from PyPI (pytest and paramiko, for example), and it is failing on 13.3 and 14.3...
Hi @Kayla, In Databricks, compute policies control various aspects of cluster behavior.
When you add libraries to a policy:
- Users can't install or uninstall compute-scoped libraries on compute that uses this policy.
- Libraries configured through the UI...
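For reference, here is a minimal sketch of what a policy payload with policy-scoped PyPI libraries might look like, expressed as a Python dict. The field names follow my understanding of the Databricks Cluster Policies API and the policy name and Spark version are hypothetical; verify the exact shape against your workspace's API documentation before using it:

```python
import json

# Hypothetical compute-policy payload: a fixed Spark version plus
# two policy-scoped PyPI libraries (pytest and paramiko, as in the question).
policy = {
    "name": "team-policy-with-libs",  # hypothetical policy name
    # The policy definition itself is a JSON-encoded string of rules
    "definition": json.dumps({
        "spark_version": {"type": "fixed", "value": "14.3.x-scala2.12"},
    }),
    "libraries": [  # libraries applied to every cluster under this policy
        {"pypi": {"package": "pytest"}},
        {"pypi": {"package": "paramiko"}},
    ],
}

print(json.dumps(policy, indent=2))
```

If installs fail on 13.3/14.3 despite a payload like this, it is worth checking the cluster's event log and whether the runtime/access mode supports policy libraries at all.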
Hi all,
I've used mounts based on service principals, but users on shared clusters or the new serverless compute have problems with permissions to access resources on DBFS. Right now we have used clusters in single-user mode. What should be the best approach to...