Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pepco
by New Contributor II
  • 11 Views
  • 1 reply
  • 0 kudos

DAB git - sometimes doesn't see modules

We are using DABs to deploy our jobs. The DABs have their source set to a git branch or git tag, depending on the environment. The repository is structured as a monorepo. We don't use wheels for our modules. Sometimes when the jobs run they "randomly" fail th...

Latest Reply
Sumit_7
Honored Contributor III
  • 0 kudos

@pepco Would you mind sharing your DAB yaml (hiding secrets)?
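For reference, the part of the bundle that usually matters for this kind of issue is the job's git source. A minimal sketch of a `databricks.yml` fragment (bundle, job, repo, and path names are all hypothetical, not taken from the thread):

```yaml
# Hypothetical databricks.yml fragment: a job whose tasks pull source from git.
targets:
  dev:
    resources:
      jobs:
        my_job:
          git_source:
            git_url: https://github.com/org/monorepo
            git_provider: gitHub
            git_branch: main      # or git_tag for tagged environments
          tasks:
            - task_key: run_etl
              notebook_task:
                notebook_path: src/jobs/etl_notebook  # relative to repo root
                source: GIT
```

With a monorepo and no wheels, module imports depend on what ends up on `sys.path` at run time, so the repo-relative paths above are where intermittent "module not found" failures tend to originate.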

Areqio
by New Contributor
  • 306 Views
  • 2 replies
  • 1 kudos

How to stream from Azure Event Hubs to a Databricks Delta table

I am trying to stream my IoT data from Azure Event Hubs to Databricks. I'm running Databricks Runtime 17.3 LTS with Scala 2.13.

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hi @Areqio, +1 to what @balajij8 suggested about using Lakeflow Declarative Pipelines as the simplest, supported way to land Azure Event Hubs IoT data into Delta. Lakeflow Spark declarative pipelines are built on top of Structured Streaming, so you g...
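If you instead go the plain Structured Streaming route, Event Hubs exposes a Kafka-compatible endpoint. A minimal sketch, assuming a namespace, hub name, and a connection string kept in a secret (none of these values come from the thread):

```python
def event_hubs_kafka_options(namespace: str, event_hub: str, connection_string: str) -> dict:
    """Build Kafka source options for reading an Event Hub via its Kafka endpoint."""
    # Event Hubs accepts SASL PLAIN with the literal username "$ConnectionString".
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }

# On Databricks the options feed a streaming read into a Delta table, e.g.:
# (spark.readStream.format("kafka")
#      .options(**event_hubs_kafka_options("my-ns", "iot-hub", conn_str))
#      .load()
#      .selectExpr("CAST(value AS STRING) AS body", "timestamp")
#      .writeStream
#      .option("checkpointLocation", "/Volumes/main/default/chk/iot")
#      .toTable("main.default.iot_events"))
```

The `kafkashaded.` prefix on the JAAS login module reflects how the Kafka client is shaded on Databricks runtimes; verify against your runtime's docs.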

1 More Replies
aonurdemir
by Contributor
  • 159 Views
  • 2 replies
  • 1 kudos

Liquid Clustering file pruning breaks when filtering on a high NULL numeric column in dataSkipping

Environment: Cloud: AWS; Compute: Serverless; Table: a_big_table; Table type: Streaming Table (SDP pipeline); Table size: 641 GB, 6,210 files; Liquid Clustering columns: [event_time, integer_userId]; delta.dataSkippingStatsColumns: event_time, integer_userId, integ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @aonurdemir , I looked into your query and have compiled some helpful tips: I don't have direct access to your workspace internals, so I can't prove this definitively. But what you're seeing is consistent with how Delta's stats-based data skipp...
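The mechanics can be illustrated with a toy model of stats-based skipping (my own sketch, not Delta internals): for a filter `col = v`, a file is prunable when its min/max stats bound `v` out, but a file whose column is entirely NULL has no min/max to compare, so an engine that ignores null counts has to read it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileStats:
    """Per-file column statistics, analogous to Delta's min/max/nullCount."""
    min_val: Optional[int]   # None when every value in the file is NULL
    max_val: Optional[int]
    null_count: int
    num_records: int

def can_skip_equals(stats: FileStats, v: int) -> bool:
    """Return True if the file provably contains no row with col = v."""
    if stats.null_count == stats.num_records:
        # All values NULL: col = v matches nothing, so the file is skippable,
        # but only if the planner reasons about null_count. A planner relying
        # on min/max alone has nothing to compare and must scan the file.
        return True
    return v < stats.min_val or v > stats.max_val
```

In this model a file with real values outside the filter range is pruned via min/max, while an all-NULL file is pruned only through the null-count path, which is consistent with pruning degrading on high-NULL columns.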

1 More Replies
staskh
by Contributor
  • 72 Views
  • 1 reply
  • 1 kudos

Delta update/insert from multiple source tables

[Sorry for a novice question.] I have multiple tables periodically updated from external sources (including inserts, updates, or deletes). I need to update a target table, which is an outer join of multiple source tables, without rewriting it each time....

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @staskh , I did some digging and compiled my thoughts regarding your question. Building a Daily Gold Table from Delta Sources Treat this as a standard Gold table built daily from Delta sources. Start simple. Add incremental tricks only when...
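As a concrete shape for the simple version (all table and column names invented for illustration), a single MERGE from the joined sources can cover inserts, updates, and deletes, with `WHEN NOT MATCHED BY SOURCE` handling rows that disappeared upstream:

```python
# Sketch of a daily gold refresh via MERGE; on Databricks, run with spark.sql().
merge_sql = """
MERGE INTO gold.target AS t
USING (
  SELECT id, a.col_a, b.col_b
  FROM silver.source_a AS a
  FULL OUTER JOIN silver.source_b AS b USING (id)
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
WHEN NOT MATCHED BY SOURCE THEN DELETE
"""
# spark.sql(merge_sql)
```

`WHEN NOT MATCHED BY SOURCE` requires a reasonably recent Databricks Runtime; if it is unavailable, a full overwrite of the joined result is the fallback.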

AlexSantiago
by New Contributor II
  • 17499 Views
  • 25 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; my code is below. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

Latest Reply
ElevateNew
New Contributor
  • 4 kudos

In this context, Elevate New is relevant as a digital content platform that covers technology trends, online platforms, software ecosystems, and   modern internet-based solutions. As developers and tech communities continue discussing APIs, cloud ser...

24 More Replies
FantineM
by New Contributor
  • 264 Views
  • 4 replies
  • 3 kudos

Resolved! Vector index not syncing: DELTA_UNSUPPORTED_TIME_TRAVEL_BEYOND_DELETED_FILE_RETENTION_DURATION

Hi All,Lately I have had issues with my vector search index not syncing.The associated pipeline fails to create with error:failed to resolve flow: '__online_index_view'. com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [DELTA_UNSUPPORTED_...

Latest Reply
FantineM
New Contributor
  • 3 kudos

Thanks again for the kind help!

3 More Replies
Kirankumarbs
by Contributor III
  • 879 Views
  • 4 replies
  • 2 kudos

Resolved! Serverless notebook idle timeout — is it configurable? What exactly am I paying for? Really Ambiguous

Been running notebooks on serverless compute and watching the indicator in the UI. After my last cell finishes, it goes from dark green to this fading green, sits there for maybe 5-10 minutes, then finally goes grey. Pretty sure I'm paying for that e...

Latest Reply
hali
New Contributor
  • 2 kudos

I have the same concern and feedback as the OP. I wish there were a way to set auto-termination after serverless compute has been idle for X minutes, and to not be billed if our users left their notebooks attached to serverless compute and forgot to hit "term...

3 More Replies
MrJava
by New Contributor III
  • 21054 Views
  • 18 replies
  • 13 kudos

How to know, who started a job run?

Hi there! We have different jobs/workflows configured in our Databricks workspace running on AWS and would like to know who actually started a job run. Are they started by a user or a service principal using curl? Currently one can only see who is t...

Latest Reply
saibabu
New Contributor
  • 13 kudos

Any update on this feature?
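One place this information does surface today is the audit system table. A hedged sketch (column and action names are my assumptions from the Databricks system-tables schema; verify in your workspace):

```python
# Sketch: list recent job-run triggers and the identity that issued them.
audit_query = """
SELECT event_time,
       user_identity.email      AS triggered_by,
       request_params['job_id'] AS job_id,
       action_name
FROM system.access.audit
WHERE service_name = 'jobs'
  AND action_name IN ('runNow', 'submitRun')
ORDER BY event_time DESC
LIMIT 100
"""
# display(spark.sql(audit_query))  # on Databricks
```

A run started by a service principal shows the principal's identity in `user_identity`, which distinguishes API-triggered runs from interactive ones.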

17 More Replies
GJ2
by New Contributor II
  • 20754 Views
  • 15 replies
  • 2 kudos

Install the ODBC Driver 17 for SQL Server

Hi, I am not a data engineer; I want to connect to SSAS. It looks like it can be connected to through pyodbc; however, it looks like I need to install "ODBC Driver 17 for SQL Server" using the following command. How do I install the driver on the cluster an...

[attached image: GJ2_1-1739798450883.png]
Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

A SQL Server driver is included for Lakehouse Federation, so it is built into Databricks. Install your own only if you need a different version or the built-in one is not working.

14 More Replies
maze2498
by New Contributor
  • 92 Views
  • 1 reply
  • 0 kudos

Issue Genie Benchmark: Different responses in UI and Benchmark

Hello, I am trying to add a benchmark dataset for my Genie space. When I ask a question in the Genie space UI directly, I get the right output. However, when I add the same question to the Genie benchmark, the result is quite bad and the SQL it use...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, when you say the SQL it generates is quite bad and missing, do you mean when you run the benchmark? The benchmark purposefully doesn't have any conversation history, unlike the Genie space, so the results can sometimes vary. I.e., if you've asked a l...

Subhas1729
by New Contributor
  • 99 Views
  • 1 reply
  • 1 kudos

how to access the catalog and schema from my program

Hi, I am using the SDP editor. I have set the catalog and schema in the settings. How do I access those values in my program? I am doing the following: catalog = spark.conf.get("catalog"), and similarly for schema. When I try to use those vari...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Subhas1729, the "Default location for data assets" section of the pipeline configuration UI sets the default catalog and schema for a pipeline. This default catalog and schema are used for all dataset definitions and table reads, unless overridden with...
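One pattern that works, sketched under the assumption that you add your own keys to the pipeline's configuration settings (the default catalog/schema picked in the UI is not automatically exposed as a `spark.conf` key):

```python
# Hypothetical keys "my.catalog" / "my.schema", set in the pipeline's
# Configuration section. conf_get is spark.conf.get on Databricks; it is
# passed in here so the helper can be exercised anywhere.
def qualified_table(conf_get, table: str) -> str:
    """Build a three-level table name from pipeline configuration values."""
    catalog = conf_get("my.catalog")
    schema = conf_get("my.schema")
    return f"{catalog}.{schema}.{table}"

# In a pipeline notebook: qualified_table(spark.conf.get, "orders")
```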

Dhruv-22
by Contributor III
  • 91 Views
  • 1 reply
  • 0 kudos

ProfilingError: SPARK_ERROR. Spark encountered an error while refreshing metrics.

I have a table with the following profiling settings: { "status": "MONITOR_STATUS_ACTIVE", "profile_metrics_table_name": "edw_prd_aen.silver.fct_retail_permit_profile_metrics", "drift_metrics_table_name": "edw_prd_aen.silver.fct_retail_permit_drift_me...

[attached image: Dhruv22_0-1777435414693.png]
Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Hi @Dhruv-22, this is a known limitation. Data Profiling monitors don't auto-adapt when columns are added to the source table; the fix is to delete and recreate the monitor. When the monitor is created, the profiling job captures the source schema a...

Lewis
by New Contributor
  • 178 Views
  • 2 replies
  • 2 kudos

Resolved! Server Error: Invalid Request URL

Hello,Had this pop up a few times this week when trying to run notebooks from within another notebook. It has been quite inconsistent, as some of the referenced notebooks will work but then for one this will pop up (and the one that doesn't work vari...

[attached image: Lewis_0-1777454480192.png]
Latest Reply
Lewis
New Contributor
  • 2 kudos

Thank you

1 More Replies
MikeGo
by Valued Contributor
  • 159 Views
  • 1 reply
  • 1 kudos

Resolved! Genie space model selection

Hi team, is it possible to specify models for genie space? Thanks.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @MikeGo, short answer: no, you cannot select the LLM for a native Genie space; the model is managed entirely by Databricks. Genie uses a compound AI system to interpret business questions and generate answers. Instead of using a single large lang...
