Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RGSLCA
by Visitor
  • 30 Views
  • 2 replies
  • 1 kudos

Databricks Python stored procedures

Hi, I am using Databricks Runtime 17.3.x-scala2.13, but I am unable to create Python stored procedures (functions are possible, but they don't support a Spark session like the one below). Any thoughts or help would be much appreciated. [INVALID_STATEMENT_OR_CLAUSE]...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @RGSLCA, this video is a bit misleading; look at the comments section. This feature has not been released, and as of now you can only create stored procedures using SQL.

1 More Replies
Nmtc9to5
by New Contributor II
  • 106 Views
  • 2 replies
  • 0 kudos

Enable CDC in Lakeflow Connect Tables

Hello everyone, I'm implementing a project that requires AutoCDC workflows using Lakeflow. The architecture is as follows: data is ingested from a database using Lakeflow Connect, and then a declarative pipeline performs some transformations on this d...

Data Engineering
autocdc
change data capture
declarative pipelines
LakeFlow
lakeflow connect
Latest Reply
Satyam4u
New Contributor II
  • 0 kudos

The short answer is no, you don't need to enable it, because the tables generated by Lakeflow Connect already capture and stream CDC metadata natively. In fact, streaming tables in Delta Live Tables (DLT) and Lakeflow pipelines are built on top of an...

1 More Replies
Shanmugaraja
by New Contributor
  • 163 Views
  • 2 replies
  • 0 kudos

DLT pipeline compute policy ignores the VM series when an instance pool ID is used

Hi, in Lakeflow Spark Declarative Pipelines (formerly DLT) I'm trying to understand how the instance pool, cluster policy, and DLT pipeline interact, especially around instance type selection. I created an instance pool with Instance type: Standard_DS3_v2 ...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

I tried to reproduce this and it worked as expected for me. Check your pipeline JSON to make sure the "clusters" spec explicitly mentions the pool details. Example: "pipeline_type": "WORKSPACE", "name": "mojama-dlt-classic-demo", "cluste...
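The check suggested in this reply can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the spec below is a made-up example (the pipeline name, labels, and pool ID are hypothetical, not taken from the thread), and it assumes each entry under "clusters" carries its pool in an `instance_pool_id` field.

```python
import json


def clusters_missing_pool(pipeline_spec: dict) -> list[str]:
    """Return the labels of cluster entries that do not set instance_pool_id."""
    missing = []
    for index, cluster in enumerate(pipeline_spec.get("clusters", [])):
        if not cluster.get("instance_pool_id"):
            # Fall back to a positional name when the entry has no label.
            missing.append(cluster.get("label", f"clusters[{index}]"))
    return missing


# Hypothetical pipeline settings, for illustration only.
example_spec = json.loads("""
{
  "name": "example-pipeline",
  "clusters": [
    {"label": "default", "instance_pool_id": "pool-1234"},
    {"label": "maintenance"}
  ]
}
""")

print(clusters_missing_pool(example_spec))  # → ['maintenance']
```

Any label it prints is a cluster entry that would not draw from the pool, which is one place to look when the VM series differs from what the pool defines.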

1 More Replies
muaaz
by New Contributor III
  • 95 Views
  • 2 replies
  • 0 kudos

PostgreSQL ingestion source not supported in workspace when deploying Databricks Asset Bundle

I'm trying to deploy a Databricks Asset Bundle that creates a DLT/Lakeflow pipeline with a PostgreSQL ingestion source. The bundle builds successfully, uploads artifacts, and starts deploying resources, but the pipeline creation fails with the follow...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @muaaz, yes, that feature is in public preview. Usually, when something is in public preview you already have access to it, but in this case some steps are required to enroll a given workspace. So, as @balajij8 and the docs suggest -> conta...

1 More Replies
muaaz
by New Contributor III
  • 223 Views
  • 6 replies
  • 1 kudos

Resolved! Automate Lakeflow Connect to ingest 300 tables instead of creating pipelines manually

I have data in PostgreSQL and I'm using Lakeflow Connect via the UI to ingest it into Databricks streaming tables. Currently, each Lakeflow Connect pipeline only allows connecting one PostgreSQL table. I have around 300 tables, and creating pipelines manu...

Latest Reply
muaaz
New Contributor III
  • 1 kudos

Thanks @szymon_dybczak for your support.

5 More Replies
Rudr12
by Visitor
  • 61 Views
  • 0 replies
  • 0 kudos

Apache Spark Masterclass (In-Person, Bengaluru) | 6 June

Hi everyone, we're hosting the next session of our Data Engineering Masterclass Series focused on Apache Spark. This is an in-person, hands-on session for engineers interested in modern data engineering, distributed data processing, and real-world Spar...

malla_aayush
by Databricks Partner
  • 1081 Views
  • 3 replies
  • 2 kudos

Resolved! Not able to find lab for Data Engineering Learning Path

I am not able to find the Data Engineering learning path. I opened the Databricks Academy partner lab, which redirected me to Uplimit, where I also enrolled in an instructor-led course, but I am not able to see any labs.

Latest Reply
junaid-databrix
New Contributor III
  • 2 kudos

You are right, the self-paced e-learning courses do not include any labs. However, labs are available in the instructor-led courses on Uplimit. I recently enrolled in one, and here is how it worked for me: 1. On the Uplimit portal, enroll for an upc...

2 More Replies
mnissen1337
by New Contributor III
  • 244 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks SQL connection becomes stale in long-running app

I’m building a Databricks App that continuously queries a SQL Warehouse roughly every 30 seconds to retrieve updated data. To avoid the overhead of repeatedly opening new connections, I’m currently caching the Databricks SQL connection using lru_cache...

Latest Reply
balajij8
Contributor III
  • 0 kudos

The SQLAlchemy dialect is a wrapper for the native Databricks SQL connector. You can try passing the authentication configuration supported by the underlying SQL connector directly in the connect_args dictionary parameter of the SQLAlchemy engine....
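For the stale-connection problem described in the question, one alternative to a plain lru_cache is a small holder that rebuilds the connection once it is too old, or after a failed call. This is a minimal sketch, not the approach from the thread: `RefreshingConnection`, `max_age_seconds`, and the retry-once policy are all assumptions for illustration, and the factory is any zero-argument callable (in a real app it might wrap `databricks.sql.connect(...)` with your own settings).

```python
import time
from typing import Any, Callable, Optional


class RefreshingConnection:
    """Caches one connection and replaces it when it is too old or fails."""

    def __init__(
        self,
        factory: Callable[[], Any],
        max_age_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._max_age = max_age_seconds
        self._clock = clock
        self._conn: Optional[Any] = None
        self._created_at = 0.0

    def reset(self) -> None:
        """Close and forget the cached connection, ignoring close errors."""
        if self._conn is not None:
            try:
                self._conn.close()
            except Exception:
                pass  # The connection may already be dead.
            self._conn = None

    def get(self) -> Any:
        """Return the cached connection, creating a new one when needed."""
        expired = (
            self._conn is not None
            and self._clock() - self._created_at > self._max_age
        )
        if self._conn is None or expired:
            self.reset()
            self._conn = self._factory()
            self._created_at = self._clock()
        return self._conn

    def run(self, work: Callable[[Any], Any]) -> Any:
        """Run work(conn); on any error, reconnect and retry exactly once."""
        try:
            return work(self.get())
        except Exception:
            self.reset()
            return work(self.get())
```

A polling loop would then call something like `holder.run(lambda conn: fetch_rows(conn))` every 30 seconds, where `fetch_rows` is your own query function. Retrying only once keeps a genuinely broken warehouse from causing an endless reconnect loop.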

2 More Replies
thedatacrew
by Databricks Partner
  • 3369 Views
  • 8 replies
  • 1 kudos

Resolved! Delta Live Tables - skipChangeCommits in SQL

Hi, could anyone tell me whether the skipChangeCommits option is supported in SQL mode? I can use it successfully in Python, but it doesn't look like it is supported in SQL. It seems to be a glaring omission from the SQL support, or support for this will...

Latest Reply
moritzmeister
Databricks Employee
  • 1 kudos

This is now supported:
CREATE OR REFRESH STREAMING TABLE basic_st
AS SELECT * FROM STREAM samples.nyctaxi.trips WITH (SKIPCHANGECOMMITS);
Supported in runtime 17.3 and later.
Documentation: https://docs.databricks.com/aws/en/ldp/developer/sql-dev#create-...

7 More Replies
Rupa0503
by New Contributor II
  • 204 Views
  • 3 replies
  • 0 kudos

Liquid Clustering vs. Z-ordering

I want to understand the difference between Liquid Clustering and Z-ordering, and how each of them works.

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Rupa0503, In simple terms... both Liquid Clustering and Z-ordering are ways to improve data layout so Databricks can skip more irrelevant files during reads, but they are not the same thing. If I had to summarise it simply...  Z-ordering is the o...

2 More Replies
Sameera_Naureen
by New Contributor
  • 151 Views
  • 1 replies
  • 0 kudos

Internship

I am an aspiring data science student. I am very interested in Databricks and I am looking for internships. If anyone knows how to apply, please help.

Latest Reply
Sumit_7
Esteemed Contributor
  • 0 kudos

@Sameera_Naureen You can look for new opportunities at this link: https://www.databricks.com/company/careers/university-recruiting

Nkrom
by New Contributor
  • 298 Views
  • 4 replies
  • 0 kudos

Renaming a folder in ADLS is taking a lot of time

Hi, I have two folders, customer and customer_01, in an ADLS location. I need to rename customer_01 to customer and customer to customer_01. Both of these folders have lots of files. If I use dbutils.fs.mv, it takes a lot of time, something like 7 hours. Is...

Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Hi @Nkrom, I am happy to share the Azure REST API method! Using the Azure Python SDK is the fastest way to do this, but you can choose any other programming language. ADLS Gen2 uses a "Hierarchical Namespace" (HNS). When you use the Azure SDK ...
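The swap from the question could be sketched with the Azure Python SDK roughly as follows. Treat it as an illustration rather than the exact method from this reply: it assumes the `azure-storage-file-datalake` package and an account with hierarchical namespace enabled, where, as I understand it, a directory rename is a metadata operation rather than a per-file copy. `plan_swap`, `swap_directories`, and the temporary suffix are hypothetical names introduced here.

```python
def plan_swap(dir_a: str, dir_b: str, temp_suffix: str = "__swap_tmp") -> list[tuple[str, str]]:
    """Return the three (source, target) renames that exchange two directories."""
    temp = dir_a + temp_suffix
    return [(dir_a, temp), (dir_b, dir_a), (temp, dir_b)]


def swap_directories(account_url: str, file_system: str, dir_a: str, dir_b: str, credential) -> None:
    """Apply plan_swap against ADLS Gen2 using directory-level renames."""
    # Imported lazily so plan_swap can be used without the Azure SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    fs = service.get_file_system_client(file_system)
    for source, target in plan_swap(dir_a, dir_b):
        # rename_directory expects the new name as "<file system>/<path>".
        fs.get_directory_client(source).rename_directory(new_name=f"{file_system}/{target}")


print(plan_swap("customer", "customer_01"))
```

The swap goes through a temporary name because both target names are already taken. If the process stops between steps, one folder is left under the temporary name, so it is worth checking for that name before re-running.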

3 More Replies
Mario_D
by New Contributor III
  • 455 Views
  • 3 replies
  • 3 kudos

Resolved! Upstream column lineage missing from API call after some time

I ran the following piece of code on 2 occasions.
table_name = "full path of table"
lineage = w.api_client.do("GET", "/api/2.0/lineage-tracking/column-lineage", body={"table_name": table_name, "column_name": "column_x"})
u_lineage_df = spark.createDataFra...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @Mario_D, From what I can gather, this can happen, and it’s usually less about a restriction on calling the API itself and more about how lineage was captured or what the caller is allowed to see. A few common reasons are: The caller no longer has...

2 More Replies
Pat
by Esteemed Contributor
  • 172 Views
  • 1 replies
  • 1 kudos

Sync Tables: Unity Catalog to Lakebase - Materialized Views triggered mode

Hi, I am having trouble syncing materialized views into Lakebase in triggered mode, and I wonder whether this is actually possible.
create materialized view some_catalog.some_schema.some_table tblproperties( 'delta.enableChangeDataFeed' = '...

Latest Reply
ShamenParis
New Contributor II
  • 1 kudos

Hi @Pat, your research on this is spot on! I was really curious about this problem, so I tested it in my own environment and reproduced your exact issue.
MV creation:
Sync to Postgres (Snapshot):
Here is exactly why t...
