Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

der
by Contributor
  • 37 Views
  • 8 replies
  • 0 kudos

Rasterio on shared/standard cluster has no access to proj.db

We are trying to use rasterio on a Databricks shared/standard cluster with DBR 17.1. Rasterio is installed directly on the cluster as a library. Code: `import rasterio; rasterio.show_versions()` Output: rasterio info: rasterio: 1.4.3, GDAL: 3.9.3, PROJ: 9.4.1, GEOS: 3.11...

Latest Reply
Chiran-Gajula
New Contributor
  • 0 kudos

Hi @der, can you try adding this to your test script: `import os; os.environ["PROJ_LIB"] = "/databricks/native/proj-data"`. Hopefully users have access to the path /databricks/native/proj-data.
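The suggestion above can be sketched as a small helper. This is a minimal sketch, not a confirmed fix: the path /databricks/native/proj-data is taken from the reply and may not exist or be readable on a shared/standard cluster, and the helper name `point_proj_at` is hypothetical. It sets both `PROJ_LIB` and `PROJ_DATA` (the variable name PROJ 9.1 and later prefer) only when a readable proj.db is actually present.

```python
import os


def point_proj_at(data_dir: str) -> bool:
    """Point PROJ at data_dir if it holds a readable proj.db.

    Returns True when the environment was updated, False otherwise.
    """
    proj_db = os.path.join(data_dir, "proj.db")
    if not (os.path.isfile(proj_db) and os.access(proj_db, os.R_OK)):
        return False
    os.environ["PROJ_LIB"] = data_dir   # legacy variable name
    os.environ["PROJ_DATA"] = data_dir  # name preferred by PROJ 9.1+
    return True


# Path taken from the reply above; whether it exists on your cluster is an assumption.
if point_proj_at("/databricks/native/proj-data"):
    print("PROJ data directory set")
else:
    print("proj.db not found or not readable at that path")
```

Run this before `import rasterio` so the variables are in place when the library initializes. If the helper returns False, the path is missing or the current user cannot read it, which would match the access problem described in the question.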

7 More Replies
maninegi05
by New Contributor
  • 225 Views
  • 3 replies
  • 0 kudos

DLT Pipeline Stopped working

Hello, our DLT pipelines have suddenly started failing with: LookupError: Traceback (most recent call last): result_df = result_df.withColumn("input_file_path", col("_metadata.file_path")).withColumn( ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @maninegi05, I did some digging internally and I believe some recent changes to the DLT image may be to blame. We are aware of the regression and are actively working to address it. TL;DR: Why you might see “LookupError: ContextVar 'par...

2 More Replies
devpavan
by New Contributor
  • 97 Views
  • 7 replies
  • 0 kudos

Encountering an error while setting up a single-node cluster on AWS

Hi Team, I'm trying to create a single-node cluster in Databricks on AWS, but I'm encountering an error. Could you please assist me with this? { "reason": { "code": "INVALID_ARGUMENT", "type": "CLIENT_ERROR", "parameters": { "databr...

Latest Reply
nayan_wylde
Honored Contributor III
  • 0 kudos

@devpavan Are you using the API or Terraform to create the cluster? Can you please share the JSON config that you are passing?

6 More Replies
aravind-ey
by New Contributor II
  • 19373 Views
  • 22 replies
  • 4 kudos

Vocareum lab access

Hi, I am taking a data engineering course in Databricks (Partner labs) and would like access to the Vocareum workspace to practice using the demo sessions. Can you please help me get access to this workspace? Regards, Aravind

Latest Reply
Eicke
Visitor
  • 4 kudos

You can log in to Databricks, search for "Canada Sales" in the Marketplace and find "Simulated Canada Sales and Opportunities Data". Get free instant access, wait a few seconds for the warehouse to be built for you, et voilà: the tables for building th...

21 More Replies
lezwon
by Contributor
  • 2316 Views
  • 1 reply
  • 1 kudos

Can't view DAB deployed pipelines in Databricks UI

I am using Databricks Asset Bundles to version control the jobs and pipelines in my workspace. I recently pulled these pipelines from the workspace using the `databricks bundle generate pipeline` command and deployed them back using `databricks ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @lezwon, thanks for the details and screenshots. This looks like a permissions/ownership issue with your newly deployed Delta Live Tables pipelines. What’s going on: Pipelines run under the pipeline owner’s identity (Databricks recommends a ser...

mahfooz_iiitian
by New Contributor III
  • 23 Views
  • 3 replies
  • 0 kudos

Databricks serverless cluster and Poetry private repository

We are currently evaluating Databricks serverless. It supports public repositories as Poetry dependency sources, but we have not been able to use a private repository because we are not sure where to put the credentials for it.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @mahfooz_iiitian, Databricks supports private repositories only for notebook-scoped libraries. In serverless you can do it using pip install (of course, store your token in a safe place): Notebook-scoped Python libraries - Azure Databricks | Micro...
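One way the pip install approach could look, as a hedged sketch: the helper below builds an index URL with URL-encoded basic-auth credentials embedded, a form pip accepts for `--index-url`. The repository host, user name, token value, and the secret scope and key are all hypothetical placeholders, and reading the token with `dbutils.secrets.get` is an assumption about where the token is stored.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_index_url(index_url: str, user: str, token: str) -> str:
    """Return index_url with URL-encoded basic-auth credentials embedded."""
    parts = urlsplit(index_url)
    netloc = f"{quote(user, safe='')}:{quote(token, safe='')}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical values; in a notebook the token would come from a secret, e.g.
#   token = dbutils.secrets.get(scope="my-scope", key="repo-token")
url = authenticated_index_url("https://pypi.example.com/simple", "ci-user", "s3cr3t/token")
print(url.replace("s3cr3t%2Ftoken", "***"))  # mask the token before printing
```

The resulting URL can then be passed to `%pip install --index-url` in the notebook. Encoding the credentials matters because tokens often contain characters such as `/` or `@` that would otherwise break the URL.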

2 More Replies
VaDim
by New Contributor III
  • 1763 Views
  • 2 replies
  • 1 kudos

ModuleNotFound error when using transformWithStateInPandas via a class defined outside the notebook

As per the Databricks documentation, when I define the class that extends `StatefulProcessor` in a notebook, everything works. However, execution fails with a ModuleNotFound error as soon as the class definition is moved to a file (module) of its own in a...

Data Engineering
transformWithState
Latest Reply
VaDim
New Contributor III
  • 1 kudos

This is no longer an issue; some patch version of Databricks Runtime 16.4 must have fixed it, and it now works without any changes to the original code. Thanks.

1 More Replies
smoortema
by New Contributor III
  • 42 Views
  • 5 replies
  • 2 kudos

Resolved! When automatic liquid clustering is enabled, how do I know which columns are used for clustering?

Let's say a table is configured to have automatic liquid clustering: `ALTER TABLE table1 CLUSTER BY AUTO;` How can I find out which columns were chosen by Databricks?

Latest Reply
smoortema
New Contributor III
  • 2 kudos

From the documentation, it seems that in Python there is such an option, but only when creating or replacing a table. # To set clustering columns and auto, which serves as a way to give a hint # for the initial selection. df.writeTo(...).using("delta") ...

4 More Replies
vamsi_simbus
by New Contributor III
  • 60 Views
  • 1 reply
  • 0 kudos

Migrating Talend ETL Jobs to Databricks – Best Practices & Challenges

Hi All, I’m currently working on a proof of concept (POC) to migrate existing Talend ETL jobs to Databricks. The goal is to leverage Databricks for data processing and orchestration while moving away from Talend. I’d appreciate insights on the followin...

Data Engineering
migration
Talend
Latest Reply
AbhaySingh
New Contributor
  • 0 kudos

Migrating Talend jobs to Databricks requires rebuilding the ETL logic in Spark. Databricks provides no native one-click converter, so the typical approach is to audit and refactor each Talend job into code (PySpark or Spark SQL) in ...

hf-databricks
by Visitor
  • 20 Views
  • 0 replies
  • 0 kudos

Unable to create workspace

Hi Team, we are having trouble creating a workspace in a Databricks account created on AWS. Below are the details: Databricks account name: saichaitanya.vaddadhi@healthfirsttech.com's Lakehouse; AWS account ID: 720016114009; Databricks ID: 1ee8765f-b472-4...

tarunnagpal
by New Contributor III
  • 1033 Views
  • 7 replies
  • 3 kudos

Lakebridge questions

We have a few questions before we propose Lakebridge as the migration tooling for one of our customers, where the requirement is to migrate from Redshift to Databricks. We would appreciate a quick response so we can proceed with the next steps: Our u...

Latest Reply
sky_bricks
Visitor
  • 3 kudos

Hi community, we’re currently planning a migration from an on-premise SQL Server data warehouse (with associated SSIS packages) to Databricks Unity Catalog. As part of this effort, we’re evaluating the use of Lakebridge for assessment, conversion, and...

6 More Replies
Jonathan_
by New Contributor III
  • 355 Views
  • 7 replies
  • 6 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that after many transformations/aggregations/joins, at some point the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow, even with small data (ex...

Latest Reply
Jonathan_
New Contributor III
  • 6 kudos

It's a cluster with 128 GB of memory; the Spark UI shows 54 GB of storage memory. Honestly, I don't think it's a memory issue: as I said, the data is small, and if we checkpoint at the same point and then continue, we don't have the problem afte...
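The workaround mentioned here, checkpointing partway through a long chain of transformations, can be sketched as follows. This is an illustration under assumptions, not the poster's code: `apply_with_checkpoints` and its `every` parameter are hypothetical names. The helper relies only on PySpark's `DataFrame.localCheckpoint()`, which materializes the data and truncates the logical plan so that later actions no longer have to re-plan the whole DAG.

```python
from typing import Any, Callable, Iterable


def apply_with_checkpoints(df: Any, steps: Iterable[Callable[[Any], Any]], every: int = 5) -> Any:
    """Apply each transformation in order, truncating lineage every `every` steps.

    `df` is expected to be a pyspark.sql.DataFrame (anything exposing
    localCheckpoint() works); each step takes a DataFrame and returns one.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    for i, step in enumerate(steps, start=1):
        df = step(df)
        if i % every == 0:
            # Eagerly materializes the data and cuts the logical plan here.
            df = df.localCheckpoint()
    return df
```

Each step would be one of your own functions that takes and returns a DataFrame. Note that `localCheckpoint()` stores data on the executors and is not fault tolerant; `DataFrame.checkpoint()` writes to a reliable location instead, but requires a checkpoint directory to be set on the SparkContext first.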

6 More Replies
rajg
by New Contributor
  • 128 Views
  • 1 reply
  • 1 kudos

Cannot export embedded dashboard widget as CSV or other formats except PNG

I’ve integrated a Databricks dashboard into my web application for all my users, following the guidelines in this article: Embedding Databricks Dashboards. This integration worked perfectly initially. However, I’m now encountering an issue with exporti...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Hi @rajg, based on the link you shared, it looks like you are using external embedding. If so, this feature is not currently available, but it is commonly requested. External dashboard embedding is currently in Publ...

