Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by mrinmoygupta (New Contributor II) • 125 Views • 2 replies • 0 kudos

View alter permission for multiple members

Hi, I've set up the gold layer for my client by creating views. We're currently not using CI/CD for view deployment, but it's on the roadmap. Now I have another 2-3 people joining the team, and they will be making minor updates to the views as a result of change req...

Latest Reply by mrinmoygupta (New Contributor II) • 0 kudos

Hi @Alberto_Umana, thank you for your response. Unfortunately, we already have all three of those things in place. The problem remains: when a member of the group alters a view, they might become the new owner, which can restrict other members fro...
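One pattern that may mitigate this, sketched below with placeholder names (a hedged suggestion, not something confirmed in the thread): make the group itself the owner of each view, so an ALTER by any member does not leave the view owned by a single user.

# Hedged sketch: assign view ownership to the group rather than an individual,
# so every group member retains owner-level privileges after an ALTER.
# gold.reporting.my_view and data_engineers are placeholder names.
spark.sql("ALTER VIEW gold.reporting.my_view OWNER TO `data_engineers`")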

by stevenayers-bge (Contributor) • 740 Views • 1 reply • 0 kudos

Autoloader: Read old version of file. Read modification time is X, latest modification time is X

I'm receiving this error from Auto Loader. It seems to be stuck on this one file. I don't care when it was read and last modified, I just want to ingest it. Any ideas?

java.io.IOException: Read old version of file s3a://<file-path>.json. Read modificat...

Latest Reply by PotnuruSiva (Databricks Employee) • 0 kudos

@stevenayers-bge Autoloader is designed to work best with immutable files. If files are mutable (i.e., they can be updated), it is recommended to set cloudFiles.allowOverwrites = true to ensure that the latest version of the file is read. Please refe...
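For illustration, a minimal Auto Loader sketch with that option set; the format, paths, and schema location are placeholders, not values from the thread.

# Minimal sketch: cloudFiles.allowOverwrites lets Auto Loader pick up the
# latest version of a file that changed after it was first discovered.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.allowOverwrites", "true")
      .option("cloudFiles.schemaLocation", "s3a://my-bucket/_schemas/events")  # placeholder
      .load("s3a://my-bucket/events/"))  # placeholder path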

by Marcin_U (New Contributor II) • 1406 Views • 2 replies • 1 kudos

AutoLoader - handle spark write transactional (_SUCCESS file) on ADLS

The Spark write method (df.write.parquet) for parquet files is transactional: after the write is successful, a _SUCCESS file is created in the path where the parquet files were written. Is it possible to configure Auto Loader to load parquet files only in case when w...

Latest Reply by PotnuruSiva (Databricks Employee) • 1 kudos

@Marcin_U Please use the below option in the readStream to load only parquet files:

.option("pathGlobFilter", "*.parquet")

Please refer to the documentation: https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/options.html...
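In context, the option might look like the sketch below; the format, paths, and schema location are placeholders.

# Sketch: pathGlobFilter restricts discovery to *.parquet, so side-car files
# such as _SUCCESS markers are never picked up.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("pathGlobFilter", "*.parquet")
      .option("cloudFiles.schemaLocation", "abfss://container@account.dfs.core.windows.net/_schemas")  # placeholder
      .load("abfss://container@account.dfs.core.windows.net/landing/"))  # placeholder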

by GregTyndall (New Contributor II) • 298 Views • 2 replies • 0 kudos

Resolved! Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh?

"incrementalization_issues": [{"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MO...

Latest Reply by PotnuruSiva (Databricks Employee) • 0 kudos

@GregTyndall Yes, the current limit is 2 by default, but it can be increased up to 5 with the below flag added to the pipeline settings:

pipelines.enzyme.numberOfJoinsThreshold 5
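For reference, a sketch (an assumption about placement, not an excerpt from the thread) of where such a flag would typically live in the pipeline settings, expressed here as a Python dict mirroring the settings JSON:

# Fragment of DLT pipeline settings showing only the configuration block.
settings_fragment = {
    "configuration": {
        "pipelines.enzyme.numberOfJoinsThreshold": "5"
    }
}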

by DB_Five (New Contributor II) • 158 Views • 3 replies • 0 kudos

Connecting to AWS MSK from Databricks

Hello, I am new to AWS and MSK. I have created MSK in a VPC with a public subnet and I am trying to connect to it from Databricks on AWS. I see that MSK and Databricks are in two different VPCs. Do we need to create VPC peering to establish the connection be...

Latest Reply by DB_Five (New Contributor II) • 0 kudos

Hello, thanks for your quick update. I will continue with the setup. I also have one more question: it is mentioned in the setup that we have to have the below in the Kafka client properties. In addition, if you choose to configure your connection ...
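For context, a hypothetical sketch of the Kafka options a Spark reader often needs for a public MSK endpoint with IAM auth; it assumes the aws-msk-iam-auth library is installed on the cluster, and the broker address, port, and topic below are placeholders rather than values from this thread.

# Hypothetical sketch (assumes IAM auth and the aws-msk-iam-auth library).
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers",
              "b-1.mycluster.xxxx.kafka.us-east-1.amazonaws.com:9198")  # placeholder
      .option("subscribe", "my-topic")  # placeholder
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "AWS_MSK_IAM")
      .option("kafka.sasl.jaas.config",
              "software.amazon.msk.auth.iam.IAMLoginModule required;")
      .option("kafka.sasl.client.callback.handler.class",
              "software.amazon.msk.auth.iam.IAMClientCallbackHandler")
      .load())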

by sashikanth (New Contributor II) • 158 Views • 2 replies • 0 kudos

Updates are going in as inserts in a Databricks job

There is no code change in the Databricks job or notebook, but we have observed a malfunction in CDC: records that are meant to be updates are going in as inserts and causing duplicates. At the same time, we made sure the PKs based on which the merge ru...

Latest Reply by Alberto_Umana (Databricks Employee) • 0 kudos

Hi @sashikanth, I would recommend opening a case with us to further investigate this behavior, since it requires a code logic review and additional checks. Please refer to: https://docs.databricks.com/en/resources/support.html

by Youngwb (New Contributor II) • 678 Views • 1 reply • 1 kudos

Databricks ODBC driver takes a long time to list columns

I'm testing the performance of Databricks. When I use the ODBC driver to submit a query, I found it slower than a notebook, and it looks like the ODBC driver sends a "list columns" request to Databricks. 1. Is there a way to prevent the ODBC driver from sending...

Labels: Data Engineering, Databricks, deltalake, ODBC driver
Latest Reply by varunjaincse (New Contributor II) • 1 kudos

@Youngwb Did you fix this issue?

by mangel (New Contributor III) • 6891 Views • 7 replies • 3 kudos

Resolved! Delta Live Tables pivot error

I'm facing an error in Delta Live Tables when I want to pivot a table. The code to replicate the error is the following:

import pandas as pd
import pyspark.sql.functions as F

pdf = pd.DataFrame({"A": ["foo", "foo", "f...

Latest Reply by Michiel_Povre (New Contributor II) • 3 kudos

Hi, was it a specific design choice to not allow pivots in DLT? I'm under the impression DLT expects fixed table structures for a reason, but I don't understand the reason. Conceptually, I understand that fixed structures make lineage...
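For comparison, outside DLT a pivot's output schema can be made static by passing the pivot values explicitly, which illustrates the fixed-structure idea; a small self-contained sketch with illustrative data:

import pyspark.sql.functions as F

# Explicit pivot values ("one", "two") fix the output columns up front,
# instead of letting Spark infer them from the data at runtime.
df = spark.createDataFrame(
    [("foo", "one", 1), ("foo", "two", 2), ("bar", "one", 3)],
    ["A", "B", "C"],
)
pivoted = df.groupBy("A").pivot("B", ["one", "two"]).agg(F.sum("C"))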

by anantkharat (New Contributor II) • 333 Views • 2 replies • 1 kudos

Resolved! Getting

payload = {"clusters": [{"num_workers": 4}], "pipeline_id": pipeline_id}
update_url = f"{workspace_url}/api/2.0/pipelines/{pipeline_id}"
response = requests.put(update_url, headers=headers, json=payload)

For this, I'm getting the below output with status cod...

Labels: Data Engineering, Databricks, Delta Live Tables
Latest Reply by NandiniN (Databricks Employee) • 1 kudos

Thank you for accepting the solution, I am glad it worked.

by Filip (New Contributor II) • 3385 Views • 4 replies • 0 kudos

How to Assign a User Managed Identity to a DBR Cluster so I can use it for querying ADLSv2?

Hi, I'm trying to figure out if we can switch from Entra ID SPNs to User Assigned Managed Identities, and everything works except that I can't figure out how to access the lake files from a Python notebook. I've tried the code below and was running it on a ...

Latest Reply by kuniteru (New Contributor II) • 0 kudos

Hi, it can be accessed with the following code:

storageAccountName = "my-storage-account-name"
applicationClientId = "my-umi-client-id"
aadDirectoryId = "my-entra-tenant-id"
containerName = "my-lake-container"
spark.conf.set("fs.azure.account.auth.typ...
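For completeness, a hedged sketch of how those values typically map onto the hadoop-azure ABFS settings for a user-assigned managed identity (MsiTokenProvider); all values are placeholders, and this assumes the cluster VMs carry the identity.

# Hedged sketch: ABFS OAuth settings for a user-assigned managed identity.
storageAccountName = "mystorageaccount"  # placeholder
suffix = f"{storageAccountName}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.msi.tenant.{suffix}", "<my-entra-tenant-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", "<my-umi-client-id>")
df = spark.read.json(f"abfss://my-lake-container@{suffix}/path/to/data")  # placeholder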

by furqanraza (New Contributor II) • 249 Views • 2 replies • 0 kudos

ETL job execution failure in serverless compute

Hi, we are facing an issue when executing the ETL pipeline on serverless compute. In the ETL pipeline, for some users, a task gets stuck every time we run the job. However, a similar ETL pipeline works fine for other users. Furthermore, there are canceled...

Latest Reply by ozaaditya (New Contributor III) • 0 kudos

Hi, could you please share the error message or any logs you're seeing when the task gets stuck? This will help in diagnosing the issue and identifying potential solutions.

by NhanNguyen (Contributor III) • 1242 Views • 8 replies • 2 kudos

ConcurrentAppendException after enabling Liquid Clustering and row-level concurrency on a Delta table

Every time I run parallel jobs, they fail with this error: ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again. I did a lot of research and also created a liquid clustering table an...

Latest Reply by cgrant (Databricks Employee) • 2 kudos

Thanks for sharing. In the original screenshots I've noticed that you've set delta.isolationLevel to Serializable, which is the strongest (and most strict) level. Please try WriteSerializable, which is the default level.
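A minimal sketch of that change ("my_table" is a placeholder):

# Set the Delta table back to the default isolation level suggested above.
spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')")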

by JissMathew (New Contributor III) • 446 Views • 1 reply • 0 kudos

Auto Loader

from pyspark.sql.types import StructType, StructField, LongType, StringType, TimestampType
from pyspark.sql import functions as F, Window
from delta.tables import DeltaTable
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = l...

Latest Reply by RiyazAli (Valued Contributor) • 0 kudos

If you intend to capture data changes, take a look at this doc, which talks about change data feed in Databricks.
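A minimal sketch of the change data feed approach ("my_table" and the starting version are placeholders):

# Enable the change data feed on an existing Delta table, then read the
# row-level changes recorded from a given table version onward.
spark.sql("ALTER TABLE my_table SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 1)
           .table("my_table"))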

by nalindhar (New Contributor II) • 154 Views • 2 replies • 1 kudos

Historical Migration of Data from Delta Tables to Delta Live Tables

Our team is planning to migrate to the Delta Live Tables (DLT) framework for data ingestion. We currently have Delta tables populated with several years of data from ingested files and wish to avoid re-ingesting these files. What is the best approach...

Latest Reply by RiyazAli (Valued Contributor) • 1 kudos

Hey @nalindhar, I assume you want the target table to keep the same name once you declare your ETL operations in the DLT pipeline. If that's the case, begin by renaming your Delta table to indicate that it contains historical data; you can do this v...
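One way the rename step could look (a sketch; the table names are placeholders):

# Rename the existing Delta table to mark it as historical, freeing the
# original name for the DLT target.
spark.sql("ALTER TABLE my_catalog.my_schema.events "
          "RENAME TO my_catalog.my_schema.events_history")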

by kfloresip (New Contributor) • 129 Views • 1 reply • 0 kudos

Bloomberg API and Databricks

Dear Databricks Community, do you have any experience connecting Databricks (Python) to the Bloomberg Terminal to retrieve data in an automated way? I tried running the following code without success:

%python
%pip install blpapi

Thanks for your help, Kevi...

Latest Reply by filipniziol (Contributor III) • 0 kudos

Hi @kfloresip, as per the documentation you need to run:

%pip install --index-url=https://blpapi.bloomberg.com/repository/releases/python/simple/ blpapi

This means that the package is not available on the standard Python Package Index (PyPI) and Bloomberg h...

