Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

leenack
by Visitor
  • 26 Views
  • 5 replies
  • 1 kudos

No rows returned when calling Databricks procedure via .NET API and Simba ODBC driver

I created a simple Databricks procedure that should return a single value: "SELECT 1 AS result;" When I call this procedure from my .NET API using ExecuteReader, ExecuteAdapter, or ExecuteScalar, the call completes without any errors, but no rows are r...
Latest Reply
Coffee77
Contributor
  • 1 kudos

Tested in my environments, and indeed it is not possible to get results, even though the procedure executes correctly in SQL Warehouses. I tried multiple options but no luck. However, as a workaround, I would recommend replacing or partially replacing part of ...
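The workaround is truncated above; one plausible reading is to run the procedure's underlying SQL directly instead of going through the ODBC CALL. A minimal sketch, assuming the databricks-sql-connector package; hostname, HTTP path, and token are placeholders, not values from the thread:

```python
# Hedged sketch: execute the statement the procedure wraps directly over
# the Databricks SQL connector, bypassing the Simba ODBC path.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                        # placeholder
    access_token="dapi-...",                                       # placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT 1 AS result")  # the statement the procedure wraps
        print(cursor.fetchall())              # expected: [Row(result=1)]
```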
4 More Replies
intelliconnectq
by New Contributor II
  • 13 Views
  • 1 reply
  • 0 kudos

Loading CSV from private S3 bucket

Trying to load a CSV file from a private S3 bucket. Please clarify the requirements to do this: Can I do it in Community Edition (if yes, then how)? How do I do it in the premium version? I have an IAM role, and I also have an access key & secret.
Latest Reply
Coffee77
Contributor
  • 0 kudos

Assuming you have these prerequisites:
  • A private S3 bucket (e.g., s3://my-private-bucket/data/file.csv)
  • An IAM user or role with access (list/get) to that bucket
  • The AWS Access Key ID and Secret Access Key (client and secret)
The most straightforward w...
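The reply is truncated; a minimal sketch of the most straightforward route it describes, assuming a hypothetical secret scope and key names, and using the example bucket path from the reply:

```python
# Hedged sketch: read a CSV from a private S3 bucket with access keys.
# Secret scope/key names ("aws", "access_key_id", "secret_access_key") are
# hypothetical; store keys in secrets rather than hardcoding them.
access_key = dbutils.secrets.get(scope="aws", key="access_key_id")
secret_key = dbutils.secrets.get(scope="aws", key="secret_access_key")

spark.conf.set("fs.s3a.access.key", access_key)
spark.conf.set("fs.s3a.secret.key", secret_key)

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3a://my-private-bucket/data/file.csv"))  # path from the reply
df.display()
```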
RajaPalukuri
by New Contributor II
  • 3653 Views
  • 4 replies
  • 0 kudos

Databricks - Terraform (condition_task)

Hi Team, I am planning to create an IF/ELSE condition task in Databricks using Terraform code. My requirement is Task A (extract records from DB and count recs) --> Task B (validate the counts using condition_task) --> Task C (load data if Task B va...
Latest Reply
jackqasim
Visitor
  • 0 kudos

The Truecaller app mod improves call management, while Databricks with Terraform (condition_task) helps automate cloud infrastructure efficiently.
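A hedged sketch of the Task A --> Task B (condition) --> Task C shape from the question, written with the Databricks Python SDK; the Terraform databricks_job resource exposes the same condition_task (left/op/right) and depends_on (task_key/outcome) fields. Job name, notebook paths, and the `count` task value are hypothetical:

```python
# Hedged sketch: an IF/ELSE job where Task C only runs when Task B's
# condition evaluates to true. Task A must publish "count" via
# dbutils.jobs.taskValues.set("count", <value>) for the reference to resolve.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.create(
    name="extract-validate-load",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="task_a_extract",
            notebook_task=jobs.NotebookTask(notebook_path="/Jobs/extract"),  # hypothetical
        ),
        jobs.Task(
            task_key="task_b_validate",
            depends_on=[jobs.TaskDependency(task_key="task_a_extract")],
            condition_task=jobs.ConditionTask(
                left="{{tasks.task_a_extract.values.count}}",
                op=jobs.ConditionTaskOp.GREATER_THAN,
                right="0",
            ),
        ),
        jobs.Task(
            task_key="task_c_load",
            depends_on=[jobs.TaskDependency(task_key="task_b_validate", outcome="true")],
            notebook_task=jobs.NotebookTask(notebook_path="/Jobs/load"),  # hypothetical
        ),
    ],
)
```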

  • 0 kudos
3 More Replies
kahrees
by Visitor
  • 26 Views
  • 1 reply
  • 0 kudos

DATA_SOURCE_NOT_FOUND Error with MongoDB (Suggestions in other similar posts have not worked)

I am trying to load data from MongoDB into Spark. I am using the Community/Free version of Databricks, so my Jupyter notebook is in a Chrome browser. Here is my code: from pyspark.sql import SparkSession spark = SparkSession.builder \ .config("spar...
Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hey @kahrees, good day! I tested this internally and was able to reproduce the issue. You're getting [DATA_SOURCE_NOT_FOUND] ... mongodb because the MongoDB Spark connector jar isn't actually on your cluster's classpath. On D...
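The reply is truncated; a hedged sketch of the fix it points to, attaching the connector and reading a collection. The Maven coordinate version, URI, database, and collection names are assumptions; on classic clusters the jar is usually installed as a cluster library rather than via spark.jars.packages in an already-running session:

```python
# Hedged sketch: put the MongoDB Spark connector on the classpath and read.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.mongodb.spark:mongo-spark-connector_2.12:10.4.1")  # assumed version
         .getOrCreate())

df = (spark.read.format("mongodb")
      .option("connection.uri", "mongodb+srv://user:pass@cluster0.example.net")  # placeholder
      .option("database", "mydb")       # placeholder
      .option("collection", "mycoll")   # placeholder
      .load())
df.show()
```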
turagittech
by Contributor
  • 1629 Views
  • 3 replies
  • 1 kudos

Resolved! Schema updating with CI/CD development in SQL

Hi all, I am working to resolve how to build tables in a development workspace catalog and then easily migrate the code to a production catalog without manually altering the schema name. For those unaware, you can't have the same catalog names in deve...
Latest Reply
evanc
Visitor
  • 1 kudos

I would like to see an example from Databricks of how Alembic and Databricks work together, especially with schema evolution. I think once the schema is changed by schema evolution, the Alembic version is no longer valid. Wondering how to handle it properly. ...
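One common pattern for the original question (deploying the same DDL to differently named catalogs) is to parameterize the catalog name; a minimal sketch, where the environment variable, catalog names, and table are hypothetical:

```python
# Hedged sketch: same code runs against dev or prod by switching the
# catalog through configuration instead of editing schema names by hand.
import os

catalog = os.getenv("TARGET_CATALOG", "dev_catalog")  # hypothetical env var
spark.sql(f"USE CATALOG {catalog}")
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders (
        order_id BIGINT,
        amount   DECIMAL(18, 2)
    )
""")
```

Databricks Asset Bundles offer the same idea declaratively: a bundle variable per target supplies the catalog name at deploy time.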
2 More Replies
petitregny
by New Contributor II
  • 2791 Views
  • 5 replies
  • 2 kudos

Reading from an S3 bucket using boto3 on serverless cluster

Hello All, I am trying to read a CSV file from my S3 bucket in a notebook running on serverless. I am using the two standard functions below, but I get a credentials error (Error reading CSV from S3: Unable to locate credentials). I don't have this issu...
Latest Reply
Ramana
Valued Contributor
  • 2 kudos

Boto3 with Access/Secret Key worked. I will try the Service Credentials. If the Databricks documentation is right, Instance Profiles with Serverless should work to establish a Boto3 connection, but, unfortunately, setting up instance profiles on Serverles...
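For the service-credential route mentioned above, a hedged sketch following the Databricks service-credentials pattern for boto3; the credential name, region, bucket, and key are placeholders:

```python
# Hedged sketch: build a boto3 session from a Unity Catalog service
# credential instead of access keys, which works on serverless compute.
import boto3

session = boto3.Session(
    botocore_session=dbutils.credentials.getServiceCredentialsProvider(
        "my-s3-credential"  # placeholder: UC service credential name
    ),
    region_name="us-east-1",  # placeholder
)
s3 = session.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="data/file.csv")  # placeholders
```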
4 More Replies
shubham007
by New Contributor III
  • 55 Views
  • 2 replies
  • 0 kudos

Urgent: Lakebridge installation failing in our organization environment (laptop)

Dear community expert, I'm reaching out for assistance with installing Databricks Lakebridge on my organization laptop. I have confirmed the stated prerequisites are installed: Java 22+, Python 3.11+, and the latest Databricks CLI, but the installer f...
Latest Reply
WiliamRosa
Contributor
  • 0 kudos

Additionally, you can check the following documentation and links that I found, which may help you:
1) To view the available Python versions and guidance on using a virtual environment (virtualenv or conda): https://www.piwheels.org/project/databricks-...
1 More Reply
DatabricksEngi1
by New Contributor III
  • 71 Views
  • 2 replies
  • 0 kudos

MERGE operation not performing data skipping with liquid clustering on key columns

Hi, I need some help understanding a performance issue. I have a table that reads approximately 800K records every 30 minutes in an incremental manner. Let's say its primary key is: timestamp, x, y. This table is overwritten every 30 minutes and serves ...
Latest Reply
bianca_unifeye
New Contributor II
  • 0 kudos

MERGE is not a pure read-plus-filter operation. Even though Liquid Clustering organizes your data by key ranges and writes min/max stats, the MERGE engine has to identify both matches and non-matches. That means the query planner must: scan all candidate...
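One mitigation consistent with this explanation is to add a literal range predicate on the clustering key to the MERGE condition, so the planner can prune files before matching; a hedged sketch, where the table names and the 35-minute window are hypothetical:

```python
# Hedged sketch: bound the ON clause by the clustering key (timestamp) so
# file-level min/max stats can skip untouched data during the MERGE.
spark.sql("""
    MERGE INTO prod.gold.target AS t
    USING updates AS s
      ON  t.`timestamp` = s.`timestamp` AND t.x = s.x AND t.y = s.y
      AND t.`timestamp` >= current_timestamp() - INTERVAL 35 MINUTES
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```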
1 More Reply
lecarusin
by New Contributor
  • 68 Views
  • 4 replies
  • 1 kudos

Help regarding a Python notebook and S3 file structure

Hello all, I am new to this forum, so please forgive me if I am posting in the wrong location (I'd appreciate it if the post is moved by mods or I am told where to post). I am looking for help with optimizing some Python code I have. This Python notebook...
Latest Reply
arunpalanoor
New Contributor II
  • 1 kudos

I am not sure if I fully understand how your data pipeline is set up, but have you considered incremental data loading, say using something similar to the COPY INTO method, which would only read your incremental load, and then apply a 90-day filter on top...
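A hedged sketch of that COPY INTO suggestion, with a hypothetical target table, S3 path, and options; COPY INTO skips files it has already loaded, so each run reads only the new objects:

```python
# Hedged sketch: idempotent incremental ingestion from S3 with COPY INTO.
spark.sql("""
    COPY INTO main.bronze.events
    FROM 's3://my-bucket/events/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```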
3 More Replies
shubham007
by New Contributor III
  • 44 Views
  • 1 reply
  • 0 kudos

Urgent: How to do a data migration task using the Databricks Lakebridge tool?

Dear community expert, I have completed two phases of Databricks Lakebridge (Analyzer & Converter) but am stuck at migrating data from source to target using Lakebridge. I have watched the BrickBites series on Lakebridge but did not find how to migrate data...
Latest Reply
bianca_unifeye
New Contributor II
  • 0 kudos

Lakebridge doesn't copy data. It covers Assessment → Conversion (Analyzer/Converter) → Reconciliation. The fastest way is to use Lakehouse Federation. Create a Snowflake connection in Unity Catalog and run federated queries from Databricks. For perman...
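A hedged sketch of that federation route (a Snowflake connection plus a foreign catalog in Unity Catalog); the connection name, host, warehouse, credentials, and database are placeholders, and the syntax follows the Lakehouse Federation docs:

```python
# Hedged sketch: federate Snowflake into Unity Catalog, then query it
# from Databricks as snowflake_cat.<schema>.<table>.
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS snowflake_conn TYPE snowflake
    OPTIONS (
        host 'myaccount.snowflakecomputing.com',   -- placeholder
        port '443',
        sfWarehouse 'MY_WH',                       -- placeholder
        user 'my_user',                            -- placeholder
        password secret('snowflake', 'password')   -- placeholder scope/key
    )
""")
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS snowflake_cat
    USING CONNECTION snowflake_conn
    OPTIONS (database 'MY_DB')                     -- placeholder
""")
```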
Akshay_Petkar
by Valued Contributor
  • 57 Views
  • 1 reply
  • 0 kudos

Advanced Data Engineering Event and Free Certification Voucher

Hi everyone, In the past couple of years, Databricks has organized an Advanced Data Engineering event where attendees received a 100% free certification voucher under their organization account after attending the session. I wanted to check if this eve...
Latest Reply
bianca_unifeye
New Contributor II
  • 0 kudos

I'm only aware of the Databricks Learning Festival, which typically offers a 50% discount voucher for certification rather than a full voucher. I couldn't find any official confirmation of a 100% free voucher for an "Advanced Data Engineering" event ...
cdn_yyz_yul
by New Contributor II
  • 26 Views
  • 1 reply
  • 0 kudos

Delta as a streaming source: can the reader read only newly appended rows?

Hello everyone, In our implementation of the Medallion Architecture, we want to stream changes with Spark Structured Streaming. I would like some advice on how to use a Delta table as a source correctly, and whether there is a performance (memory usage) concern in t...
Latest Reply
bianca_unifeye
New Contributor II
  • 0 kudos

First of all, you are using append-only reads, which means that every time your stream triggers, Spark will process the entire Delta snapshot rather than just the changes. That's why you're observing the memory usage increase after each run; it's not ...
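For an append-only source, a minimal sketch of a Delta stream that skips rewrite commits and bounds each micro-batch; the table name and rate limits are hypothetical:

```python
# Hedged sketch: stream only newly appended rows from a Delta table.
# skipChangeCommits ignores commits that rewrite existing data (e.g. an
# overwrite) instead of failing or reprocessing; maxBytesPerTrigger caps
# how much each micro-batch reads.
df = (spark.readStream
      .format("delta")
      .option("skipChangeCommits", "true")
      .option("maxBytesPerTrigger", "1g")
      .table("bronze.events"))  # placeholder table
```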
vikram_p
by Visitor
  • 22 Views
  • 1 reply
  • 0 kudos

Generate embeddings for 50 million rows in a dataframe

Hello All, I have a dataframe with 5 million rows, and before we can set up a vector search endpoint against an index, we want to generate an embeddings column for each of those rows. Please suggest what's an optimal way to do this? We are in the development phase, so w...
Latest Reply
bianca_unifeye
New Contributor II
  • 0 kudos

The easiest and most reliable way to generate embeddings for millions of rows is to let Databricks Vector Search compute them automatically during synchronization from a Delta table. Vector Search can generate embeddings for you, keep them updated whe...
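A hedged sketch of that managed-embeddings route with the databricks-vectorsearch client; the endpoint, index, source table, key, column, and model endpoint names are placeholders:

```python
# Hedged sketch: a Delta Sync index where Databricks computes and refreshes
# the embeddings from a text column during synchronization.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()
index = client.create_delta_sync_index(
    endpoint_name="vs_endpoint",                        # placeholder
    index_name="main.default.docs_index",               # placeholder
    source_table_name="main.default.docs",              # placeholder
    pipeline_type="TRIGGERED",
    primary_key="id",                                   # placeholder
    embedding_source_column="text",                     # placeholder
    embedding_model_endpoint_name="databricks-gte-large-en",
)
```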
Divya_Bhadauria
by New Contributor III
  • 27 Views
  • 1 reply
  • 0 kudos

Does Databricks Runtime 7.3+ include built-in Hadoop S3 connector configurations?

I came across the KB article "S3 connection reset error", which mentions not using the following Spark settings for the Hadoop S3 connector for DBR 7.3 and above:
spark.hadoop.fs.s3.impl com.databricks.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl com.data...
Latest Reply
hasnat_unifeye
  • 0 kudos

No, you don't need to set those on DBR 7.3 and above. From 7.3+, Databricks already uses the newer Hadoop S3A connector by default, so those com.databricks.s3a.S3AFileSystem settings are not part of the default config and shouldn't be added. If they are...
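A quick hedged check on a running cluster to confirm which filesystem implementations are in effect before removing any legacy overrides from cluster config:

```python
# Hedged sketch: print the active fs.s3* implementations from the
# underlying Hadoop configuration; None means no override is set.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
for key in ("fs.s3.impl", "fs.s3n.impl", "fs.s3a.impl"):
    print(key, "->", hconf.get(key))
```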
