Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by AntonDBUser (New Contributor III)
  • 1792 Views
  • 1 reply
  • 2 kudos

Lakehouse Federation with OAuth connection to Snowflake

Hi! We have a lot of use cases where we need to load data from Snowflake into Databricks, where users are using both R and Python for further analysis and machine learning. For this we have been using Lakehouse Federation combined with basic auth, but are...

Latest Reply
AntonDBUser
New Contributor III
  • 2 kudos

For anyone interested: we solved this by building an OAuth integration to Snowflake ourselves using Entra ID: https://community.snowflake.com/s/article/External-oAuth-Token-Generation-using-Azure-AD We also created some simple Python and R packages tha...
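For readers following the same route, a minimal sketch of the Entra ID side is below. It assumes the client-credentials flow described in the linked Snowflake article; the tenant/client IDs and the scope are placeholders, not values from this thread.

```python
# Hypothetical sketch: requesting an Entra ID (Azure AD) OAuth token for an
# external-OAuth Snowflake integration via the client-credentials flow.
# All identifiers below are placeholders.
def build_token_request(tenant_id, client_id, client_secret, scope):
    """Return the Entra ID token endpoint URL and the form payload to POST."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # e.g. the Snowflake app registration's scope
    }
    return url, payload

# The actual POST (needs the `requests` package and real credentials):
# import requests
# url, payload = build_token_request("my-tenant-id", "my-client-id", "s3cret",
#                                    "api://my-snowflake-app/.default")
# token = requests.post(url, data=payload).json()["access_token"]
```

The returned access token would then be passed to the Snowflake connection in place of basic-auth credentials.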

by JonHMDavis (New Contributor II)
  • 7366 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is Graphframes for Python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached Python command on 7.3 LTS ML with no issue; however, now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
Databricks Partner
  • 2 kudos

Hi @MuthuLakshmi, as per the documentation, graphframes comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes, I get a "no module found" error. from graphframes i...

4 More Replies
by naveenreddy1 (New Contributor II)
  • 20330 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using a Databricks 3-node cluster with 32 GB memory. It works fine, but sometimes it throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

Latest Reply
RodrigoDe_Freit
Databricks Partner
  • 0 kudos

If your job fails, follow this. According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...
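A common way to stay under that stdout limit is to route verbose output to a log file and keep stdout to a short summary. A minimal sketch (the file path and logger name are illustrative, not from this thread):

```python
# Sketch: send verbose job output to a file handler instead of print(),
# so stdout stays well under the 20MB job-output limit.
import logging

def make_file_logger(path="/tmp/job_verbose.log"):
    """Return a logger whose records go to `path` rather than stdout."""
    logger = logging.getLogger("job")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

# Usage: log per-row details to the file, print only a one-line summary.
# logger = make_file_logger()
# logger.debug("processed batch 17 with 10432 rows")
# print("job finished OK")
```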

3 More Replies
by ArturOA (New Contributor III)
  • 6256 Views
  • 7 replies
  • 0 kudos

Attaching to Serverless from Azure Data Factory via Service Principal

Hi, we have issues trying to run Databricks notebooks orchestrated with Azure Data Factory. We have been doing this for a while now without any issues when we use Job Clusters, Existing General Purpose Clusters, or Cluster Pools. We use an Azure Data ...

(attached screenshot: ArturOA_0-1729677593083.png)
Latest Reply
h_h_ak
Contributor
  • 0 kudos

Does the service principal have access and permission for the notebook?

6 More Replies
by HamidHamid_Mora (New Contributor II)
  • 5224 Views
  • 4 replies
  • 3 kudos

Ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs into our Delta tables; however, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? Since we need to log all metrics for all executed jobs, not on...

Latest Reply
h_h_ak
Contributor
  • 3 kudos

You should have a look here: https://community.databricks.com/t5/data-engineering/azure-databricks-metrics-to-prometheus/td-p/71569
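The linked thread discusses the Prometheus route that replaced Ganglia. As a sketch only, Spark 3.x (and therefore DBR 13+) ships a built-in Prometheus servlet that can be enabled through cluster Spark config; the exact settings below are standard Apache Spark options and may need adjusting for your workspace:

```ini
# Hypothetical cluster Spark config exposing metrics via Spark's built-in
# PrometheusServlet instead of the removed Ganglia endpoint.
spark.ui.prometheus.enabled true
spark.metrics.conf.*.sink.prometheusServlet.class org.apache.spark.metrics.sink.PrometheusServlet
spark.metrics.conf.*.sink.prometheusServlet.path /metrics/prometheus
```

Metrics are then scraped from the driver's `/metrics/prometheus` path rather than port 8652.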

3 More Replies
by amanda3 (New Contributor II)
  • 1507 Views
  • 3 replies
  • 0 kudos

Flattening JSON while also keeping embedded types

I'm attempting to create DLT tables from a source table that includes a "data" column that is a JSON string. I'm doing something like this: sales_schema = StructType([ StructField("customer_id", IntegerType(), True), StructField("order_numbers",...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To ensure that the "value" field retains its integer type, you can explicitly cast it after parsing the JSON: from pyspark.sql.functions import col, from_json, expr from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, LongTy...
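In Spark this is done with `from_json` plus an explicit schema, as the (truncated) reply shows. The underlying idea, parse first, then cast fields explicitly rather than trusting inference, can be sketched outside Spark in plain Python; the field names and the schema map below are illustrative:

```python
# Non-Spark sketch of the same idea: after parsing JSON, explicitly coerce
# each field to the type you want instead of relying on inferred types.
import json

SCHEMA = {"customer_id": int, "value": int}  # illustrative field -> type map

def parse_with_casts(raw, schema=SCHEMA):
    """Parse a JSON string and apply explicit per-field casts."""
    record = json.loads(raw)
    # Unknown fields pass through unchanged; known ones are cast.
    return {k: schema.get(k, lambda v: v)(v) for k, v in record.items()}
```

So a value that JSON delivers as `"7"` or `3.0` still lands as an `int`, which is the behavior the explicit `cast` gives you in the Spark version.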

2 More Replies
by xhudik (New Contributor III)
  • 1727 Views
  • 1 reply
  • 1 kudos

Resolved! Does stream.stop() generate "ERROR: Query termination received for []" automatically?

Whenever code contains stream.stop(), in STDERR (in the cluster logs) I get an error like: ERROR: Query termination received for [id=b7e14d07-f8ad-4ae6-99de-8a7cbba89d86, runId=5c01fd71-2d93-48ca-a53c-5f46fab726ff] No other message, even if I try to try-cat...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@xhudik Does stream.stop() generate "ERROR: Query termination received for []" automatically? Yes, this is generated in stderr when there is a stream.stop(). Is "ERROR: Query termination received for []" dangerous, or is it just info that the stream was closed?...
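Since the reply indicates the message is informational rather than a real failure, one option (not from this thread, just a sketch) is to drop it with a standard `logging` filter so it stops cluttering log files:

```python
# Sketch: a logging filter that suppresses the harmless
# "Query termination received" message emitted after stream.stop().
import logging

class DropQueryTermination(logging.Filter):
    def filter(self, record):
        # Returning False drops the record; True keeps it.
        return "Query termination received" not in record.getMessage()

# Usage (handler name is illustrative):
# handler.addFilter(DropQueryTermination())
```

Note this only tidies your own log handlers; it does not change what Spark itself writes to the cluster's stderr.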

by roberta_cereda (New Contributor)
  • 1318 Views
  • 1 reply
  • 0 kudos

Describe history operationMetrics['materializeSourceTimeMs']

Hi, during some checks on MERGE execution, I was running the DESCRIBE HISTORY command and in the operationMetrics column I noticed this information: operationMetrics['materializeSourceTimeMs']. I haven't found that metric in the documentation, so I...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@roberta_cereda If it's specific to "materializeSourceTimeMs", then it is the "time taken to materialize the source (or determine it's not needed)".
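For anyone pulling this metric out programmatically: `operationMetrics` comes back as a string-keyed map, so the value needs an explicit conversion. A minimal helper, assuming a DESCRIBE HISTORY row already collected into a plain dict (the row shape below is illustrative):

```python
# Sketch: extract the MERGE source-materialization time from a
# DESCRIBE HISTORY row. operationMetrics values arrive as strings.
def materialize_source_seconds(history_row):
    """Return materializeSourceTimeMs in seconds, or None if absent."""
    ms = history_row.get("operationMetrics", {}).get("materializeSourceTimeMs")
    return None if ms is None else int(ms) / 1000.0
```

In a notebook you would build `history_row` from something like `spark.sql("DESCRIBE HISTORY my_table").first().asDict()`.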

by pranav_k1 (New Contributor III)
  • 2905 Views
  • 3 replies
  • 1 kudos

Resolved! Error while loading mosaic in notebook - TimeoutException: Futures timed out after [80 seconds]

I am working on reading spatial data with Mosaic and GDAL. Previously I used databricks-mosaic==0.3.9 with a Databricks cluster on 12.2 LTS, with the following command: %pip install databricks-mosaic==0.3.9 --quiet Now it's giving a timeout er...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @pranav_k1, thanks for confirming it worked for you now! I see that the usual %pip install databricks-mosaic cannot install due to the fact that it has thus far allowed geopandas to essentially install the latest... As of geopandas==0.14.4, the vers...

2 More Replies
by DmitriyLamzin (New Contributor II)
  • 5474 Views
  • 2 replies
  • 1 kudos

applyInPandas function hangs in runtime 13.3 LTS ML and above

Hello, recently I tried to upgrade my runtime env to 13.3 LTS ML and found that it breaks my workload during applyInPandas. My job started to hang during the applyInPandas execution. A thread dump shows that it hangs on direct memory allocation: ...

Labels: pandas udf
Latest Reply
Marcin_Milewski
New Contributor II
  • 1 kudos

Hi @Debayan, the link just redirects to the same thread? Is there any update on this issue? We see a similar issue with a job hanging when using mapInPandas.

1 More Replies
by Sanjeev (New Contributor II)
  • 2221 Views
  • 3 replies
  • 1 kudos

Unverified Commits via Databricks Repos: Seeking Solution for GitHub Verification

The team is integrating Databricks Repos with Personal Access Tokens (PAT) to commit code directly to GitHub. Our organization requires signed commits for verification purposes. Issue: when committing via Databricks Repos, the commits appear as unveri...

Labels: data engineering
Latest Reply
Sanjeev
New Contributor II
  • 1 kudos

Can you please share the link to this doc, DB-I-3082? I couldn't find it.

2 More Replies
by Danny_Lee (Databricks Partner)
  • 1393 Views
  • 1 reply
  • 0 kudos

UI improvement - open multiple workspace notebooks

Hi all, I have an idea for a feature to open multiple notebooks. Currently, right-clicking a notebook in the Workspace will allow you to "Open in new tab". If I multi-select notebooks, I only have the option to Move or Move to trash. Why not allow a us...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Many thanks for your feedback and great idea. We have created idea DBE-I-1544; this will be analyzed by our team and, if approved, it can be implemented in the near future.

by MOlliver (Databricks Partner)
  • 8622 Views
  • 1 reply
  • 0 kudos

DBT or Delta Live Tables

Quick question: when would people use dbt over Delta Live Tables? Or better yet, can you use dbt to create Delta Live Tables?

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Delta Live Tables (DLT): DLT is an ETL (Extract, Transform, Load) framework designed to simplify the creation and management of data pipelines. It uses a declarative approach to build reliable data pipelines and automatically manages infrastructure a...
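To make the contrast concrete, here is a hedged sketch of what the dbt side looks like. The model name, source, and columns are invented for illustration; the point is that dbt declares transformations as templated SQL files, whereas DLT declares them with decorated functions in a notebook:

```sql
-- Hypothetical dbt model (models/daily_sales.sql) targeting Databricks.
-- dbt handles templating, dependencies, and incremental logic; DLT would
-- instead declare this as an @dlt.table function in a pipeline notebook.
{{ config(materialized='incremental', unique_key='order_id') }}

select order_id, customer_id, order_date, amount
from {{ source('raw', 'orders') }}
{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
```

dbt models run against Databricks produce ordinary Delta tables, but they are not Delta Live Tables pipelines; the two are separate tooling choices.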

by Vishwanath_Rao (New Contributor II)
  • 2557 Views
  • 2 replies
  • 0 kudos

Photon plan invariant violated Error

We've run into a niche error where we get the below message only in our non-prod environment, with the same data and the same code as our prod environment: org.apache.spark.sql.execution.adaptive.InvalidAQEPlanException: Photon plan invariant violat...

Latest Reply
JAC2703
New Contributor II
  • 0 kudos

Hey, did you raise a ticket and get a resolution to this?

1 More Replies
by felix_immanuel (New Contributor III)
  • 6384 Views
  • 4 replies
  • 2 kudos

Resolved! Error while Deploying Asset Bundle using Azure Devops

Hi, I'm trying to deploy an Asset Bundle using Azure DevOps, and it is giving me this error: Step: databricks bundle validate -t dev ========================== Starting Command Output =========================== 2024-09-02T05:41:19.9113254Z Error: failed du...

Latest Reply
sampo
New Contributor II
  • 2 kudos

I had a similar error message, but using the correct environment variables in the pipeline solved the problem, especially setting DATABRICKS_HOST to point to the account. A more detailed description is here: Databricks Asset Bundle OAuth Authentication in Az...
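As a sketch of what "correct environment variables in the pipeline" can look like, here is a hypothetical Azure DevOps step for service-principal auth with the Databricks CLI. The variable names on the right are pipeline placeholders, not values from this thread, and the exact host URL depends on your setup:

```yaml
# Hypothetical Azure DevOps pipeline step for `databricks bundle` commands
# authenticating as an Azure service principal. Values are placeholders.
- script: |
    databricks bundle validate -t dev
    databricks bundle deploy -t dev
  displayName: Deploy asset bundle
  env:
    DATABRICKS_HOST: https://accounts.azuredatabricks.net  # account endpoint, per the reply
    ARM_TENANT_ID: $(tenantId)
    ARM_CLIENT_ID: $(servicePrincipalId)
    ARM_CLIENT_SECRET: $(servicePrincipalSecret)
```

The key point from the thread is that a misconfigured or missing DATABRICKS_HOST produces the "failed during request" validation error shown in the original post.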

3 More Replies