Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HW413
by New Contributor II
  • 883 Views
  • 4 replies
  • 3 kudos

Copy into checkpoint location not able to find

Hi All, I have been using COPYINTO for ingesting the data from managed volumes  and my destination is a managed delta table .I would like to know where is it storing the metadata information or a checkpoint location to maintain its idempotent feature...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @HW413, you won't find a checkpoint. COPY INTO does not use checkpoints the way Auto Loader or Spark Structured Streaming do. The COPY INTO command retrieves metadata about all files in the specified source directory/prefix. So, every time you run copy int...

3 More Replies
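To make the reply concrete: COPY INTO records which source files it has already loaded in the target Delta table's own metadata, so re-running the same command is idempotent without any user-visible checkpoint location. A minimal hedged sketch (catalog, schema, volume, and option values are all hypothetical):

```sql
-- Hypothetical names; COPY INTO skips files it has already loaded into this target
COPY INTO my_catalog.my_schema.my_delta_table
FROM '/Volumes/my_catalog/my_schema/my_volume/landing/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

Setting `COPY_OPTIONS ('force' = 'true')` overrides the skip behavior and reloads all files.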
DatabricksEngi1
by Contributor
  • 2121 Views
  • 7 replies
  • 1 kudos

run a Databricks notebook on serverless environment version 4 with Asset Bundles

Hi everyone, I’m working with Databricks Asset Bundles and running jobs that use notebooks (.ipynb). According to the documentation, it should be possible to set an environment version for serverless jobs. I want to force all of my notebook tasks to ru...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @DatabricksEngi1, when you're defining a job in DAB you're using the job mapping. One of the keys of the job mapping is called environments. This is the one you're looking for: Databricks Asset Bundles resources - Azure Databricks | Microsoft Learn

6 More Replies
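For reference, a minimal databricks.yml sketch of the environments mapping the reply points to. Job, task, key, and path names here are made up, and the exact spec fields may differ by CLI version, so verify against the current DAB documentation:

```yaml
# Hypothetical bundle fragment: pin serverless environment version 4 for a task
resources:
  jobs:
    my_serverless_job:
      environments:
        - environment_key: default
          spec:
            environment_version: "4"
      tasks:
        - task_key: run_notebook
          environment_key: default
          notebook_task:
            notebook_path: ../notebooks/my_notebook.ipynb
```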
aonurdemir
by Contributor
  • 1320 Views
  • 2 replies
  • 2 kudos

Resolved! Creating an SCD Type 2 Table with Auto CDC API (One-Time Load + Ongoing Updates)

Hello everyone, I’m working with two CDC tables: table_x with 23,467,761 rows (and growing), and table_y with 27,868,173,722 rows. My goal is to build an SCD Type 2 table (table_z) using the Auto CDC API. The workflow I’d like to achieve is: Initial Load: Populate table...

Latest Reply
aonurdemir
Contributor
  • 2 kudos

I have solved it with the name parameter, like this:
dlt.create_streaming_table(name="table_z")
dlt.create_auto_cdc_flow(name="backfill", target="table_z", source="table_y", keys=["user_id"], sequence_by=col("source_ts_ms"), ignore_null_updates=False, apply_as_...

1 More Replies
saicharandeepb
by Contributor
  • 1116 Views
  • 3 replies
  • 1 kudos

How to get Spark run-time and structured metrics before job completion?

Hi all, I’m trying to get Spark run-time metrics and Structured Streaming metrics by enabling cluster logging, and I now see several log folders. What I noticed is that the eventlog folder only gets populated after a job has completed. That makes it d...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

Did you try the above solution? Keep us updated.

2 More Replies
lizou1
by New Contributor III
  • 864 Views
  • 3 replies
  • 1 kudos

Resolved! serverless workflow Compute became unresponsive. Compute is likely out of memory.

I set up 10 notebooks to run at the same time in a serverless workflow and got this error: "serverless workflow Compute became unresponsive. Compute is likely out of memory." Is there a quota in serverless compute I can set in Azure Databricks? These notebooks are pr...

Latest Reply
lizou1
New Contributor III
  • 1 kudos

The issue is new, and the Azure cloud provider is also not quite sure of the details; we will get more info later.

2 More Replies
BenDataBricks
by New Contributor II
  • 3701 Views
  • 1 reply
  • 2 kudos

Register more redirect URIs for OAuth U2M

I am following this guide on allowing OAuth U2M for Azure Databricks. When I get to Step 2, I make a request to account.azuredatabricks.net and specify a redirect URI to receive a code. The redirect URI in the example is localhost:8020. If I change thi...

Latest Reply
AFox
Contributor
  • 2 kudos

You have to register a new OAuth application. See: Enable or disable partner OAuth applications and API: Create Custom OAuth App Integration

naveens
by Databricks Partner
  • 2502 Views
  • 1 reply
  • 1 kudos

Resolved! Power BI Service – OAuth2 Databricks Authentication Failing After Tenant Migration

Hi, we are working on a Power BI migration from INFY to TATA. I have a user TATA.nato@tata.com. a. With this user I am able to connect to Azure Databricks using Power BI Desktop in the INFY tenant. b. I am logged in as TATA.nato@tata.com and switched to tenant I...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

@naveens here are a few things that you can try. 1. Re-authenticate in Power BI Service: go to Power BI Service → Settings → Data Sources, locate the Databricks data source, click Edit Credentials, then choose OAuth2 and re-authenticate using the correct Azure AD...

eballinger
by Contributor
  • 1302 Views
  • 2 replies
  • 3 kudos

Resolved! Databricks shared folder area permissions issue

We have some notebook code that I would like to share with our team only in the "shared folder" area of Databricks. I know by default this area is meant as an area to share stuff with the entire organization, but from what I have read you should be abl...

Latest Reply
Isi
Honored Contributor III
  • 3 kudos

Hello @eballinger, in Databricks the users group (sometimes shown in the UI as "All workspace users") has default permissions that cannot be revoked at the top-level Shared folder (docs). So it looks like it’s not possible to create a folder under /Shared th...

1 More Replies
David_M
by Databricks Partner
  • 705 Views
  • 1 reply
  • 0 kudos

Databricks Lakeflow Connector for PostgreSQL on GCP Cloud

Lakeflow connection for Postgres. Hi all, I hope this message finds you well. I am currently trying to create a Lakeflow connection in Databricks for a PostgreSQL database hosted on Google Cloud Platform (GCP). However, when testing the connection, I am e...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @David_M, to better support you we’d need to clarify a few points. PostgreSQL location: is this PostgreSQL deployed inside a private VPC in GCP, or is it exposed through a public IP accessible from the internet? This is key to understanding what type ...

tyhatwar785
by Databricks Partner
  • 531 Views
  • 1 reply
  • 1 kudos

Solution Design Recommendation on Databricks

Hi Team, we need to design a pipeline in Databricks to: 1. Call a metadata API (returns XML per keyword), parse, and consolidate into a combined JSON. 2. Use this metadata to generate dynamic links for a second API, download ZIPs, unzip, and extract spe...

Latest Reply
nikhilmohod-nm
New Contributor III
  • 1 kudos

Hi @tyhatwar785. 1. Should metadata and file download be separate jobs/notebooks or combined? Keep them in separate notebooks, but orchestrate them under a single Databricks Job for better error handling and retries. 2. Cluster recommendations: start wit...

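As a rough sketch of step 1 of the pipeline above (parse per-keyword XML responses and consolidate them into one combined JSON document), using only the Python standard library. The element names, keywords, and URLs here are invented for illustration, not taken from the actual API:

```python
import json
import xml.etree.ElementTree as ET

def parse_metadata_xml(xml_text: str) -> dict:
    """Flatten one keyword's XML metadata response into a plain dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

def consolidate(responses: dict) -> str:
    """Merge per-keyword metadata dicts into a single JSON string."""
    combined = {kw: parse_metadata_xml(xml) for kw, xml in responses.items()}
    return json.dumps(combined, indent=2)

# Stand-ins for the per-keyword API responses (hypothetical schema)
responses = {
    "alpha": "<meta><id>1</id><link>https://example.com/a.zip</link></meta>",
    "beta": "<meta><id>2</id><link>https://example.com/b.zip</link></meta>",
}

combined_json = consolidate(responses)
print(combined_json)
```

In a real job, the combined JSON would then drive step 2 (building the download links for the second API).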
MGAutomation
by New Contributor
  • 647 Views
  • 2 replies
  • 0 kudos

How to connect to a local instance of SQL Server

How can I connect my Databricks AWS account to a local instance of SQL Server?

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hello @MGAutomation @szymon_dybczak, you may also need to open the firewall of your on-premises SQL Server to the CIDR range of your Databricks VPC. This ensures that the EC2 instances used by Databricks have valid IPs that can reach your database. If...

1 More Replies
pshuk
by New Contributor III
  • 1447 Views
  • 3 replies
  • 0 kudos

Access Databricks Volume through CLI

Hi, I am able to connect to DBFS and transfer files there or download from there. But when I change the path to Volumes, it doesn't work. Even though I created the volume, I still get this error message: Error: no such directory: /Volumes/bgem_dev/text_...

Latest Reply
nisarg0
New Contributor II
  • 0 kudos

@arpit 

2 More Replies
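For anyone hitting the same thing: with recent versions of the Databricks CLI, `databricks fs` commands address Unity Catalog volumes through the `dbfs:/Volumes/...` scheme rather than a bare `/Volumes/...` path. A hedged example with made-up catalog/schema/volume names (verify the syntax for your CLI version with `databricks fs --help`):

```shell
# Hypothetical names; note the dbfs:/ prefix in front of /Volumes
databricks fs ls dbfs:/Volumes/my_catalog/my_schema/my_volume/
databricks fs cp ./local_file.txt dbfs:/Volumes/my_catalog/my_schema/my_volume/local_file.txt
```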
tana_sakakimiya
by Contributor
  • 848 Views
  • 1 reply
  • 2 kudos

Resolved! What is "External tables backed by Delta Lake"?

Goal: event-driven updates without implementing a job triggered on file arrival. I see hope to incrementally update materialized views which have external tables as their sources. This is quite a game changer if it works for various data formats (since MV starte...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @tana_sakakimiya, yes, only external tables that are in Delta format are supported. Databricks supports other table formats, but to be able to use this particular feature your table needs to be in Delta format. But if you have Parquet files it's ...

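To make the Delta-only requirement concrete, here is a hedged SQL sketch (table names and storage paths are invented): files that already carry a Delta transaction log can be registered directly as an external table, while plain Parquet would first need converting.

```sql
-- Works: the files at LOCATION already have a Delta transaction log
CREATE TABLE my_catalog.my_schema.events_ext
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/events_delta';

-- Plain Parquet lacks that log; one option is converting it in place first
CONVERT TO DELTA parquet.`abfss://data@myaccount.dfs.core.windows.net/events_parquet`;
```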
andr3s
by New Contributor II
  • 43899 Views
  • 8 replies
  • 2 kudos

SSL_connect: certificate verify failed with Power BI

Hi, I'm getting an SSL_connect: certificate verify failed error with Power BI. Any ideas? Thanks in advance, Andres

Latest Reply
GaneshKrishnan
New Contributor II
  • 2 kudos

In a proxy setup, Power BI is not aware of the process a browser uses to fetch the intermediate certificate, hence it fails. Recent Power BI versions come with additional options such as "Automatic Proxy Discovery (Optional): Enabled" and "Implementation (optional): 2.0" (be...

7 More Replies
cpayne_vax
by New Contributor III
  • 28820 Views
  • 16 replies
  • 9 kudos

Resolved! Delta Live Tables: dynamic schema

Does anyone know if there's a way to specify an alternate Unity Catalog schema in a DLT workflow using the @dlt.table syntax? In my case, I’m looping through folders in Azure Data Lake Storage to ingest data. I’d like those folders to get created in different...

Latest Reply
surajitDE
Contributor
  • 9 kudos

If you add these settings in the pipeline JSON, the issue should get fixed:
"pipelines.setMigrationHints" = "true"
"pipelines.enableDPMForExistingPipeline" = "true"
I tried it on my side, and now it no longer throws the materialization error.

15 More Replies
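For clarity, those two keys belong in the pipeline's configuration map within its JSON settings. A minimal hedged fragment (the rest of the pipeline spec is omitted, and key availability may vary by release):

```json
{
  "configuration": {
    "pipelines.setMigrationHints": "true",
    "pipelines.enableDPMForExistingPipeline": "true"
  }
}
```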