Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Erik
by Valued Contributor III
  • 19826 Views
  • 13 replies
  • 8 kudos

Grafana + databricks = True?

We have some timeseries in Databricks, and we are reading them into Power BI through SQL compute endpoints. For timeseries Power BI is ... not optimal. Earlier I have used Grafana with various backends, and quite like it, but I can't find any way to con...

Latest Reply
frugson
New Contributor II
  • 8 kudos

@Erik wrote:We have some timeseries in databricks, and we are reading them into powerbi through sql compute endpoints. For timeseries powerbi is ... not optimal. Earlier I have used grafana with various backends, and quite like it, but I cant find an...
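A Grafana data source for Databricks typically talks to a SQL warehouse over the same interface as any other client. A hedged sketch of that query path in Python, assuming the databricks-sql-connector package is installed; the host, HTTP path, and token arguments are placeholders, not real values:

```python
# Hedged sketch: querying a Databricks SQL warehouse the way a Grafana
# data-source plugin would. Assumes the databricks-sql-connector package;
# host, http_path, and token are placeholders.
def fetch_timeseries(host: str, http_path: str, token: str, query: str):
    from databricks import sql  # lazy import; pip install databricks-sql-connector
    with sql.connect(server_hostname=host, http_path=http_path,
                     access_token=token) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
```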

12 More Replies
ChristianRRL
by Honored Contributor
  • 2289 Views
  • 6 replies
  • 5 kudos

Resolved! Can schemaHints dynamically handle nested json structures?

Hi there, as I'm learning more about schemaHints, it seems like an incredibly useful way to unpack some of my json data. However, I've hit what is either a limitation of schemaHints or of my understanding of how to use it properly. Below I have an exa...

[screenshots attached]
Latest Reply
boitumelodikoko
Valued Contributor
  • 5 kudos

Hi @ChristianRRL, would you be able to share a small sample of your JSON file (with any sensitive data removed)? That way, I can try to replicate your use case and see if we can get schemaHints working across multiple nested fields without losing data...
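For reference, nested fields are usually hinted with a full STRUCT type rather than dotted paths. A hedged sketch with Auto Loader; the path, column, and field names below are hypothetical, not the poster's actual data:

```python
# Hedged sketch of Auto Loader with schemaHints on a nested column.
# The column and field names are hypothetical.
NESTED_HINT = "payload STRUCT<deviceId: STRING, readings: ARRAY<DOUBLE>>"

def read_with_hints(spark, path: str, hints: str = NESTED_HINT):
    # schemaHints overrides only the listed columns; the rest stay inferred.
    return (spark.readStream
                 .format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .option("cloudFiles.schemaHints", hints)
                 .load(path))
```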

5 More Replies
ebyhr
by New Contributor II
  • 12495 Views
  • 8 replies
  • 3 kudos

How to fix intermittent 503 errors in 10.4 LTS

I sometimes get the below error recently on version 10.4 LTS. Any solution to fix the intermittent failure? I added retry logic in our code, but the Databricks query succeeded (even though it threw an exception), which leads to the unexpected table statu...

Latest Reply
niteesh
New Contributor II
  • 3 kudos

Facing the same issue now. Were you able to find a fix?

7 More Replies
prakashhinduja2
by New Contributor
  • 724 Views
  • 2 replies
  • 1 kudos

Prakash Hinduja ~ How do I create an empty DataFrame in Databricks—are there multiple ways?

Hello, I'm Prakash Hinduja, an Indian-born financial advisor and consultant based in Geneva, Switzerland (Swiss). My career is focused on guiding high-net-worth individuals and business leaders through the intricate world of global investment and wea...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

Best Practices from Experience:
  • Use a predefined schema if you know your column types upfront—prevents errors when appending new data.
  • For ad-hoc exploration, toDF or createDataFrame([], None) works fine.
  • Always check printSchema()—it helps avoid silent t...
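A hedged illustration of the predefined-schema approach from the first point; the "events" schema below is hypothetical, and the function expects an existing SparkSession:

```python
# Hedged sketch: empty DataFrame with a predefined (DDL-format) schema.
# Column names and types are hypothetical.
EVENTS_SCHEMA = "event_id INT, event_name STRING, event_ts TIMESTAMP"

def empty_events_df(spark):
    # Fixing types up front means later appends cannot silently drift.
    return spark.createDataFrame([], schema=EVENTS_SCHEMA)
```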

1 More Replies
tonylax6
by New Contributor
  • 5186 Views
  • 1 reply
  • 0 kudos

Azure Databricks to Adobe Experience Platform

I'm using Azure Databricks and am attempting to stream near real-time data from Databricks into the Adobe Experience Platform to ingest into the AEP schema for profile enrichment. We are running into an issue with the API and streaming, so we are curr...

Data Engineering
Adobe
Adobe Experience Platform
CDP integration
Latest Reply
tsilverstrim
New Contributor II
  • 0 kudos

Hi Tony... there are several ways to accomplish this based on the non-functional requirements of your Use Case.  What does near real-time mean from a standpoint of signal to activation time from when the data is present in Databricks to when the acti...

slimbnsalah
by New Contributor II
  • 2127 Views
  • 2 replies
  • 0 kudos

Use Salesforce Lakeflow Connector with a Salesforce Connected App

Hello, I'm trying to use the new Salesforce Lakeflow connector to ingest data into my Databricks account. However, I see only the option to connect using a normal user, whereas I want to use a Salesforce App, just like how it is described here Run fede...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 0 kudos

@slimbnsalah Please select the connection type as Salesforce Data Cloud; you will then be asked for the details.

1 More Replies
ManojkMohan
by Honored Contributor II
  • 880 Views
  • 4 replies
  • 2 kudos

Resolved! Silver to Gold Layer | Running ML - Debug Help Needed

Problem I am solving:
  • Reads the raw sports data (IPL CSV) → bronze layer
  • Cleans and aggregates → silver layer
  • Summarizes team stats → gold layer
  • Prepares ML-ready features and trains a Random Forest classifier to predict match winners
Getting error: [PARS...

[screenshot attached]
Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 2 kudos

@ManojkMohan thanks for sharing this, I'm looking at starting an ML project in the coming weeks, I might have to bring this forward. Feeling motivated with that confusion matrix in your output. Congrats on getting it working! All the best, BS

3 More Replies
Srinivas5
by New Contributor II
  • 837 Views
  • 6 replies
  • 3 kudos

Jar File Upload To Workspace

Spoiler #dbfs I am unable to upload a jar file to DBFS on a job cluster as it's deprecated now. I need to upload it to the workspace and install it on the cluster; however, my jar size is 70 MB and I can't upload it through the API or CLI as the max size is 50 MB. Is there an alternati...

Latest Reply
Advika
Community Manager
  • 3 kudos

Hi @Srinivas5! Were you able to find a solution or approach that worked? If so, please mark the helpful reply as the Accepted Solution, or share your approach so others can benefit as well.

5 More Replies
ShankarM
by Contributor
  • 430 Views
  • 2 replies
  • 0 kudos

Notebook exposure

I have created a notebook as per client requirement. I have to migrate the notebook to the client env for testing with live data, but I do not want to expose the Databricks notebook code to the testers in the client env. Is there a way to package the not...

Latest Reply
WiliamRosa
Contributor III
  • 0 kudos

Hi @ShankarM, I’ve had to do something similar—packaging a Python class as a wheel. This documentation might help: https://docs.databricks.com/aws/en/dev-tools/bundles/python-wheel

1 More Replies
DatabricksEngi1
by Contributor
  • 1173 Views
  • 2 replies
  • 1 kudos

Resolved! databricks assets bundles issue

Hi all, I’m working with Databricks Asset Bundles (DAB) and trying to move from a single repository-level bundle to a structure where each workflow (folder under resources/jobs) has its own bundle.
  • My repository contains:
  • Shared src/variables.yml a...

Latest Reply
DatabricksEngi1
Contributor
  • 1 kudos

I solved it. For some reason, the Terraform folder created under the bundles wasn’t set up correctly. I copied it from a working bundle, and everything completed successfully.

1 More Replies
JPNP
by New Contributor
  • 961 Views
  • 3 replies
  • 1 kudos

Not able to create Secret scope in Azure Databricks

Hello, I am trying to create an Azure Key Vault-backed secret scope, but it is failing with the below error. I have tried to clear the cache, logged out, and used an incognito browser as well, but I am not able to create a scope. Can you please help here?

[screenshot attached]
Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

If the UI keeps failing with that vague error, the CLI approach suggested above is the best next step, since it usually gives a clearer error message. Also make sure that:The service principal you’re using to create the scope has Key Vault Administra...

2 More Replies
jar
by Contributor
  • 385 Views
  • 1 reply
  • 0 kudos

Excluding job update from DAB .yml deployment

Hi. We have a range of scheduled jobs and _one_ continuous job, all defined in .yml and deployed with DAB. The continuous job is paused by default and we use a scheduled job of a notebook to pause and unpause it so that it only runs during business ho...

Latest Reply
Yogesh_Verma_
Contributor II
  • 0 kudos

You’re running into this because DAB treats the YAML definition as the source of truth — so every time you redeploy, it will reset the job state (including the paused/running status) back to what’s defined in the file. Unfortunately, there isn’t curr...
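As a hedged illustration of what the pause/unpause notebook can send, assuming the Jobs 2.1 update endpoint (/api/2.1/jobs/update) and a continuous job; the job ID and field layout here are a sketch, not a verified contract:

```python
# Hedged sketch of a Jobs API "update" payload that flips the pause state
# of a continuous job. Assumes the Jobs 2.1 update endpoint; job_id is a
# placeholder.
def pause_payload(job_id: int, paused: bool) -> dict:
    status = "PAUSED" if paused else "UNPAUSED"
    return {"job_id": job_id,
            "new_settings": {"continuous": {"pause_status": status}}}
```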

karthik_p
by Esteemed Contributor
  • 15984 Views
  • 5 replies
  • 1 kudos

Does Delta Live Tables support identity columns?

We are able to test identity columns using SQL/Python, but when we try the same using DLT, we are not seeing values under the identity column. It is always empty for the column we created: "id BIGINT GENERATED ALWAYS AS IDENTITY"

Latest Reply
Gowrish
New Contributor II
  • 1 kudos

Hi, I see from the following Databricks documentation - https://docs.databricks.com/aws/en/dlt/limitations - it states the following, which gives the impression that you can define an identity column on a streaming table: Identity columns might be recom...
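For concreteness, the DDL under discussion looks like the sketch below; the table name and the second column are hypothetical, only the quoted "id" definition comes from the original post:

```python
# Hedged sketch of a streaming-table DDL with an identity column.
# Only the "id" line is from the original post; the rest is hypothetical.
STREAMING_TABLE_DDL = """
CREATE OR REFRESH STREAMING TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  payload STRING
)
"""
```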

4 More Replies
mtreigelman
by New Contributor III
  • 563 Views
  • 1 reply
  • 3 kudos

First Lakeflow (DLT) Pipeline Best Practice Question

Hi, I am writing my first streaming pipeline and trying to ensure it is set up to work as a "Lakeflow" pipeline. It is connecting an external Oracle database with some external Azure Blob storage data (all managed in the same Unity Catalog). The pipe...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 3 kudos

@mtreigelman thanks for providing the update. If you wouldn't mind, could you explain why you think the first way didn't work and why the second way did? Then you can mark your response as the solution to the question. I found this article to be useful ...

ck7007
by Contributor II
  • 666 Views
  • 1 reply
  • 2 kudos

Cost

Reduced Monthly Databricks Bill from $47K to $12.7K
The Problem: We were scanning 2.3TB for queries needing only 8GB of data.
Three Quick Wins
1. Multi-dimensional Partitioning (30% savings)
# Before
df.write.partitionBy("date").parquet(path)
# After-parti...
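The truncated "after" snippet presumably partitions on more than the date column. A hedged sketch of that multi-dimensional change; the extra column names are hypothetical, not the poster's actual scheme:

```python
# Hedged sketch of multi-dimensional partitioning; "region" and "event_type"
# are hypothetical columns standing in for the poster's truncated "after" code.
def write_multi_partitioned(df, path, cols=("date", "region", "event_type")):
    # Queries filtering on any of these columns can skip whole directories
    # instead of scanning the full table.
    df.write.partitionBy(*cols).parquet(path)
```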

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 2 kudos

@ck7007 thanks so much for sharing! That's such a saving, by the way. Congrats. Out of curiosity, did you consider using Liquid Clustering, which was meant to replace partitioning and z-order: https://docs.databricks.com/aws/en/delta/clustering I found...
