Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Mits
by New Contributor II
  • 2198 Views
  • 4 replies
  • 3 kudos

Sending email alerts to a non-Databricks user

I am trying to send email alerts to a non-Databricks user. I am using the Alerts feature available in SQL. Can someone help me with the steps? Do I first need to add a Notification Destination through Admin settings and then use this newly added desti...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mitali Lad, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

3 More Replies
Phani1
by Valued Contributor II
  • 493 Views
  • 1 reply
  • 0 kudos

Integrating Azure Databricks with AAD

Hi Team, could you please provide the details/process for integrating Azure Databricks - Unity Catalog and AAD? Regards, Phani

Latest Reply
raphaelblg
Honored Contributor II
  • 0 kudos

Hello @Phani1, these doc pages might be useful for you: Set up and manage Unity Catalog; Sync users and groups from Microsoft Entra ID

ismaelhenzel
by New Contributor III
  • 638 Views
  • 1 reply
  • 1 kudos

Upsert into a Delta Lake table with merge when using row masking function

I'm using Databricks RLS functions on my tables, and I need to run some merges into them, but tables with RLS functions do not support merge operations (https://docs.databricks.com/en/data-governance/unity-catalog/row-and-column-filters.html#limitation...

Latest Reply
raphaelblg
Honored Contributor II
  • 1 kudos

Hi @ismaelhenzel, if you want to use the "MERGE INTO" SQL command, you must turn off RLS. This is by design.
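Since MERGE is unsupported while a row filter is attached, one workaround implied by the reply is: drop the filter, merge, then re-apply it. A minimal sketch as a helper that builds the Databricks SQL statements; all table, source, and filter-function names are hypothetical placeholders:

```python
def merge_with_rls_workaround(table, source, on, filter_fn, filter_col):
    """Build the SQL statements to merge into a table that has a row
    filter: drop the filter, run MERGE, then re-attach the filter.
    Every name here is caller-supplied (illustrative only)."""
    return [
        f"ALTER TABLE {table} DROP ROW FILTER",
        f"""MERGE INTO {table} AS t
USING {source} AS s
ON {on}
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *""",
        f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({filter_col})",
    ]

# On Databricks you would run each statement with spark.sql(stmt), e.g.:
# for stmt in merge_with_rls_workaround(
#         "main.sales.orders", "staging_orders", "t.id = s.id",
#         "main.sales.region_filter", "region"):   # hypothetical names
#     spark.sql(stmt)
```

Note the window between dropping and re-applying the filter: during the merge, queries against the table are unfiltered, so this should run under a tightly scoped principal.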

Mathias
by New Contributor II
  • 389 Views
  • 1 reply
  • 0 kudos

Delay rows coming into DLT pipeline

Background and requirements: We are reading data from our factory and storing it in a DLT table called telemetry with columns sensorid, timestamp, and value. We need to get rows where sensorid is “qrreader-x” and join with some other data from that sam...

Latest Reply
raphaelblg
Honored Contributor II
  • 0 kudos

Hi @Mathias, I'd say that watermarking might be a good solution for your use case. Please check "Control late data threshold with multiple watermark policy" in Structured Streaming. If you want to dig in further, there's also: Spark Structured Streami...
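The watermarking suggestion can be sketched as a stream-stream join. Column names (sensorid, timestamp) come from the post; the 10-minute late-data tolerance is an assumption to tune, and the PySpark part needs a running cluster (so it is wrapped in a function):

```python
def delay_to_seconds(delay):
    """Parse a watermark delay like '10 minutes' into seconds, as a way
    to reason about how long Spark keeps join state around."""
    amount, unit = delay.split()
    factor = {"seconds": 1, "minutes": 60, "hours": 3600}[unit]
    return int(amount) * factor

def join_with_watermark(telemetry, other, delay="10 minutes"):
    """Sketch: watermark both streaming DataFrames so Spark can expire
    old join state. Column names and the delay are assumptions."""
    from pyspark.sql import functions as F  # imported here: needs a cluster
    t = (telemetry
         .withWatermark("timestamp", delay)
         .filter(F.col("sensorid") == "qrreader-x"))
    o = other.withWatermark("timestamp", delay)
    # time-bounded equi-join keeps state finite on both sides
    cond = (t.sensorid == o.sensorid) & (
        o.timestamp.between(t.timestamp - F.expr(f"INTERVAL {delay}"),
                            t.timestamp + F.expr(f"INTERVAL {delay}")))
    return t.join(o, cond)
```

With "10 minutes" on both inputs, rows arriving more than ten minutes late are dropped from the join, which is the trade-off to weigh against unbounded state.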

EcuaCrisCar
by New Contributor III
  • 979 Views
  • 1 reply
  • 0 kudos

Sending a personalized message to email.

Greetings community, I am new to using Databricks and for some time I have tried some scripts in notebooks. I would like your help on a task: carry out a personalized mailing where, first, a query of the number of records in the test table is performe...

Data Engineering
SENDEMAIL SQL
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @EcuaCrisCar, to query the number of records in your test table, you can use SQL or DataFrame APIs in Databricks. Next, you'll need to check whether the record count falls within the specified range (80,000 to 90,000). If it does, proceed with the note...
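The count-then-gate logic in the reply can be sketched as a small helper; the bounds come from the reply, while the table and notebook names below are hypothetical:

```python
def count_in_range(count, low=80_000, high=90_000):
    """The reply's condition: proceed with the mailing only when the
    record count falls within [low, high] inclusive."""
    return low <= count <= high

# On Databricks (sketch; test_table and send_email_notebook are assumed names):
# n = spark.sql("SELECT COUNT(*) AS n FROM test_table").first()["n"]
# if count_in_range(n):
#     dbutils.notebook.run("send_email_notebook", 600)
```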

filipjankovic
by New Contributor
  • 3188 Views
  • 1 reply
  • 0 kudos

JSON string object with nested Array and Struct column to dataframe in pyspark

I am trying to convert a JSON string stored in a variable into a Spark dataframe without specifying a schema, because I have a big number of different tables, so it has to be dynamic. I managed to do it with sc.parallelize, but since we are moving to Uni...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @filipjankovic, since you have multiple tables and need dynamic schema inference, I recommend the following approach: schema inference from the JSON string: you can infer the schema from the JSON string and then create a DataFrame. Schema I...
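To make the idea concrete, here is a toy, pure-Python sketch of dynamic schema inference (mapping a parsed JSON value to a Spark DDL type string), followed by the pattern I'd actually expect to use on a Unity Catalog shared cluster, where sc.parallelize is unavailable. Both are illustrations, not the thread's confirmed solution:

```python
import json

def spark_ddl_schema(value):
    """Toy sketch: map a parsed JSON value to a Spark DDL type string,
    roughly what schema_of_json does for you on Databricks. Covers only
    the common cases; real pipelines should use F.schema_of_json."""
    if isinstance(value, bool):       # check bool before int (bool is an int)
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        inner = spark_ddl_schema(value[0]) if value else "string"
        return f"array<{inner}>"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{spark_ddl_schema(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    return "string"

# UC-shared-cluster-friendly pattern on Databricks (no sc.parallelize):
# from pyspark.sql import functions as F
# df = spark.createDataFrame([(json_str,)], "value string")
# schema = df.select(F.schema_of_json(F.lit(json_str))).first()[0]
# parsed = df.select(F.from_json("value", schema).alias("d")).select("d.*")
```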

NikhilK1998
by New Contributor II
  • 1214 Views
  • 1 reply
  • 1 kudos

Databricks Certification Exam Got Suspended. Require support for the same.

Hi, I applied for the Databricks Certified Data Engineer Professional certification on 5th July 2023. The test was going fine for me, but suddenly there was an alert from the system (I think I was at a proper angle in front of the camera and was genuinely givin...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @NikhilK1998, I'm sorry to hear your exam was suspended. Thank you for filing a ticket with our support team. Please allow the support team 24-48 hours to resolve it. In the meantime, you can review the following documentation: Room requirements; Beh...

Avinash_Narala
by Contributor
  • 789 Views
  • 1 reply
  • 0 kudos

Instance profile failure while installing Databricks Overwatch

Despite following the steps mentioned in the provided link to create an instance profile, we encountered a problem in step 6, where we couldn't successfully add the instance profile to Databricks (Step 6: Add the instance profile to Databricks). https:/...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Avinash_Narala, the error message you provided indicates that verification of the instance profile failed due to an AWS authorization issue. Specifically, the user associated with the assumed role arn:aws:sts::755231362028:assumed-role/databr...

MiBjorn
by New Contributor II
  • 735 Views
  • 2 replies
  • 1 kudos

Optimizing Data Insertion Speed for JSON Files in DLT Pipeline

Background: I'm working on a data pipeline to insert JSON files as quickly as possible. Here are the details of my setup: File size: 1.5-2 kB each. File volume: approximately 30,000 files per hour. Pipeline: using Databricks Delta Live Tables (DLT) in c...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @MiBjorn, confirm that you're using the appropriate DLT product edition (Core, Pro, or Advanced) based on your workload requirements. You'll receive an error message if your pipeline includes features that are not supported by the selected edit...
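For the many-small-files pattern (~30k JSON files/hour), Auto Loader settings are the usual lever. A sketch of plausible options; the batch-size value is an assumption to tune, and file notification mode needs extra cloud-side setup:

```python
def autoloader_options(path, max_files_per_trigger=10_000):
    """Assumed-reasonable Auto Loader settings for a high volume of tiny
    JSON files. Values are starting points to benchmark, not thread-
    confirmed recommendations."""
    return {
        "cloudFiles.format": "json",
        # cap how many files each micro-batch picks up
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
        # directory listing scales poorly at this file count; file
        # notification mode is usually cheaper (requires cloud setup)
        "cloudFiles.useNotifications": "true",
        "path": path,
    }

# In a DLT pipeline (sketch; table and path names are hypothetical):
# import dlt
# @dlt.table
# def telemetry_raw():
#     opts = autoloader_options("s3://bucket/telemetry/")
#     return (spark.readStream.format("cloudFiles")
#             .options(**{k: v for k, v in opts.items() if k != "path"})
#             .load(opts["path"]))
```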

1 More Reply
sukanya09
by New Contributor II
  • 788 Views
  • 1 reply
  • 0 kudos

Photon is not supported for a query

(1) LocalTableScan Output [11]: [path#23524, partitionValues#23525, size#23526L, modificationTime#23527L, dataChange#23528, stats#23529, tags#23530, deletionVector#23531, baseRowId#23532L, defaultRowCommitVersion#23533L, clusteringProvider#23534] Arg...

Data Engineering
Databricks
MERGE
Photon
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @sukanya09, the query you provided includes a LocalTableScan node, which Photon does not fully support. The specific node you mentioned has several attributes, such as path, partitionValues, size, modificationTime, and more. Unfortunately, Photon e...
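One way to see which operators fell back from Photon is to scan the physical plan text (e.g. from df.explain()) for operator names without the "Photon" prefix. This is a simple string heuristic over the plan dump, not an official API:

```python
import re

def non_photon_nodes(plan_text):
    """Heuristic: pull operator names out of an EXPLAIN FORMATTED dump
    (lines like '(1) LocalTableScan') and return the ones Photon did
    not take over (Photon operators carry a 'Photon' name prefix)."""
    ops = re.findall(r"\(\d+\)\s+(\w+)", plan_text)
    return [op for op in ops if not op.startswith("Photon")]

# Usage sketch on Databricks:
# plan = df._sc._jvm  # not needed: simpler to capture df.explain("formatted")
# In practice, run EXPLAIN FORMATTED <query> in SQL and paste the output:
# non_photon_nodes(plan_text)  # -> e.g. ['LocalTableScan']
```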

surband
by New Contributor III
  • 934 Views
  • 1 reply
  • 0 kudos

Databricks Run Notebook GitHub Action

The GitHub Action databricks/run-notebook, which deploys and runs a notebook from GitHub on Databricks, awaits the completion of the job. The Pulsar streaming job that I have is long-running, due to which the Action times out when the access token it uses to ...

Latest Reply
surband
New Contributor III
  • 0 kudos

https://github.com/databricks/run-notebook/issues/53#issue-2321682696

190809
by Contributor
  • 1737 Views
  • 3 replies
  • 2 kudos

Is there a way to add a date parameter to the jobs run API call?

Hi there, I am currently making a call to the Databricks API jobs run endpoint. I would like to make this call on a daily basis to get data on the jobs run in the past 24 hours and add this to my Delta table. Is there a way to set a GTE value in the A...

Latest Reply
AdrianC
New Contributor II
  • 2 kudos

Actually, the "start_time_to" parameter doesn't seem to work at all, neither alone nor together with "start_time_from" (whenever used, the API call returns nothing). I'd like to report this as an issue, as we want to automate our cluster usage monitorin...
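For reference, the Jobs API 2.1 runs/list endpoint does expose start_time_from / start_time_to filters, and they expect epoch milliseconds. A sketch that builds the last-24-hours query parameters (host and token below are placeholders):

```python
from datetime import datetime, timedelta, timezone

def runs_list_params(hours_back=24, limit=25):
    """Build query params for GET /api/2.1/jobs/runs/list.
    start_time_from / start_time_to are epoch *milliseconds*; sending
    seconds is an easy way to get an empty result set."""
    now = datetime.now(timezone.utc)
    since = now - timedelta(hours=hours_back)
    return {
        "start_time_from": int(since.timestamp() * 1000),
        "start_time_to": int(now.timestamp() * 1000),
        "limit": limit,
    }

# Usage sketch (workspace host and token are placeholders):
# import requests
# resp = requests.get(
#     "https://<workspace-host>/api/2.1/jobs/runs/list",
#     headers={"Authorization": "Bearer <token>"},
#     params=runs_list_params(),
# )
# runs = resp.json().get("runs", [])
```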

2 More Replies
Dicer
by Valued Contributor
  • 1544 Views
  • 3 replies
  • 0 kudos

Why Pandas on Spark can trigger `Driver is up but is not responsive, likely due to GC` ?

I am using the distributed Pandas on Spark, not the single-node Pandas. But when I try to run the following code to transform a data frame with 652 x 729803 data points, df_ps_pct = df.pandas_api().pct_change().to_spark(), it returns this error: ...

Latest Reply
anardinelli
Contributor
  • 0 kudos

Hi @Dicer, I don't think you have a problem with the workers; since you are running distributed Pandas, work is going to be parallelized either way. When the data is collected back to the driver, it might be overloaded (since the driver has to co...
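For clarity, pct_change is just (v[i] - v[i-1]) / v[i-1] per column; a plain-Python reference implementation, plus a sketch of doing the same with a Spark window function (an alternative technique, not the original poster's code) to keep the work on the executors. The column names 'ts' and 'value' are assumptions:

```python
def pct_change(values):
    """Reference implementation of pandas' pct_change on a list:
    (current - previous) / previous, with None for the first element."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out

# Equivalent Spark window-function version (sketch; stays distributed,
# no pandas_api round trip). Without partitionBy, orderBy pulls all rows
# into one partition, so partition per series/sensor on real data:
# from pyspark.sql import Window, functions as F
# w = Window.orderBy("ts")
# df2 = (df.withColumn("prev", F.lag("value").over(w))
#          .withColumn("pct", (F.col("value") - F.col("prev")) / F.col("prev")))
```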

2 More Replies
Benhcosta
by New Contributor
  • 838 Views
  • 1 reply
  • 0 kudos

Data Engineering

The Data and AI Summit provided great insight into the future of the Databricks platform and ideas for future utilization.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Benhcosta, we're thrilled to hear that you had a great experience at DAIS 2023! Your feedback is valuable to us, and we appreciate you taking the time to share it on the community platform. We wanted to let you know that the Databricks Communit...

SumitBhatia
by New Contributor
  • 940 Views
  • 1 reply
  • 0 kudos

URGENT: dbt Job Failing in Databricks - Azure Repo Access Denied (Service Principal)

I am encountering issues while running a Databricks job using a Microsoft Entra ID Service Principal. My workflow includes a task of type "dbt," which requires authentication and access to the Azure Repo containing my dbt project code. I have granted...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @SumitBhatia, you'll need to create a service principal in your Microsoft Entra ID (formerly Azure Active Directory) tenant. This service principal will represent your application and allow it to authenticate with Azure services. Make sure you hav...

