Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sachin_kanchan
by New Contributor III
  • 2531 Views
  • 6 replies
  • 0 kudos

Unable to log in to Community Edition

So I just registered for the Databricks Community Edition and received a verification email. When I click the link, I'm redirected to this website (image attached), where I am asked to enter my email. When I do that, it sends me a verification c...

db_fail.png
Latest Reply
sachin_kanchan
New Contributor III
  • 0 kudos

What a disappointment this has been

prasidataengine
by New Contributor II
  • 2001 Views
  • 2 replies
  • 0 kudos

Issue when connecting with Databricks cluster 15.4 without unity catalog using databricks connect

Hi, I have a shared cluster created on Databricks which uses the 15.4 runtime. I don't want to enable Unity Catalog for this cluster. Previously I used Python 3.9.13 to connect to an 11.3 cluster using Databricks Connect 11.3. Now my company has restr...

Data Engineering
Databricks
databricks-connect
Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @prasidataengine, for Databricks Runtime 13.3 LTS and above, Unity Catalog must be enabled to use databricks-connect. You need a Databricks account and workspace that have Unity Catalog enabled. See Set up and manage Unity Catalog and Enable a wo...
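The rule in this reply can be written down as a small, hypothetical helper (not part of any Databricks package). It assumes runtime strings like "15.4" or "15.4 LTS", and it also assumes the common convention of pinning the databricks-connect client to the cluster's runtime major.minor version; check the Databricks Connect docs for your exact version.

```python
from typing import Tuple

# Threshold from the reply: DBR 13.3 LTS and above need Unity Catalog
# for Databricks Connect.
UC_REQUIRED_FROM: Tuple[int, int] = (13, 3)


def parse_runtime(version: str) -> Tuple[int, int]:
    """Parse '15.4' or '15.4 LTS' into (major, minor)."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def requires_unity_catalog(runtime: str) -> bool:
    """True when the runtime is at or above the 13.3 threshold."""
    return parse_runtime(runtime) >= UC_REQUIRED_FROM


def connect_pin(runtime: str) -> str:
    """pip requirement matching the runtime's major.minor (assumed convention)."""
    major, minor = parse_runtime(runtime)
    return f"databricks-connect=={major}.{minor}.*"
```

For the cluster in the question, `requires_unity_catalog("15.4")` is True, which is consistent with the reply: a 15.4 cluster without Unity Catalog cannot be used this way, while the old 11.3 setup fell below the threshold.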

vidya_kothavale
by Contributor
  • 1248 Views
  • 2 replies
  • 0 kudos

MongoDB Streaming Not Receiving Records in Databricks

Batch Read (spark.read.format("mongodb")) works fine. Streaming Read (spark.readStream.format("mongodb")) runs but receives no records.
Batch Read (works):
df = spark.read.format("mongodb")\
.option("database", database)\
.option("spark.mongodb.read.conne...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @vidya_kothavale, MongoDB requires the use of change streams to enable streaming. Change streams allow applications to access real-time data changes without polling the database. Ensure that your MongoDB instance is configured to support change...
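A minimal sketch of a change-stream-based streaming read, assuming MongoDB Spark Connector 10.x option names (`change.stream.publish.full.document.only` in particular is an assumption worth checking against your connector version). Change streams only work on a replica set or sharded cluster, and a stream typically only sees changes made after it starts, so existing documents will not show up as new records. The sketch passes an explicit schema because streaming sources generally cannot infer one; `mongo_stream_options` and `read_mongo_stream` are hypothetical helper names.

```python
from typing import Dict


def mongo_stream_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Reader options for a MongoDB streaming read (keys assumed from connector 10.x)."""
    return {
        "spark.mongodb.read.connection.uri": uri,
        "database": database,
        "collection": collection,
        # Emit just the changed document rather than the full change event.
        "change.stream.publish.full.document.only": "true",
    }


def read_mongo_stream(spark, schema, options: Dict[str, str]):
    """Build a streaming DataFrame; needs a SparkSession with the connector installed."""
    reader = spark.readStream.format("mongodb").schema(schema)
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

After starting the query (for example with `writeStream` to a Delta table), insert or update a document in the collection to confirm that change events arrive.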

Dianagarces8
by New Contributor
  • 835 Views
  • 1 replies
  • 0 kudos

The lifetime of files in DBFS is NOT tied to the lifetime of our cluster

Why is the lifetime of files in DBFS NOT tied to the lifetime of our cluster?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Files in DBFS are typically not linked to a cluster or its lifetime. There are tmp directories in DBFS, so perhaps you are looking at those, but FileStore, for example, can definitely be used. However, I suggest not using DBFS but a data lake (S3/ADLS) instead.

AbishekP
by New Contributor
  • 1214 Views
  • 1 replies
  • 0 kudos

Unable to run selected lines in Databricks

I'm using SQL in Databricks. I'm a tester, and I'm trying to test the data load on tables by writing various queries. I'm unable to select a particular query and run it: the Ctrl+Shift+Enter shortcut is not working. Currently I need to open...

Latest Reply
Edthehead
Contributor III
  • 0 kudos

You cannot do this from the notebooks. But you can do it via the SQL editor as shown below. 

Radix95
by New Contributor II
  • 3288 Views
  • 3 replies
  • 2 kudos

Resolved! Error updating tables in DLT

I'm working on a Delta Live Tables (DLT) pipeline in Databricks Serverless mode. I receive a stream of data from Event Hubs, where each incoming record contains a unique identifier (uuid) along with some attributes (code1, code2). My goal is to update ...

Latest Reply
Edthehead
Contributor III
  • 2 kudos

All the tables that DLT writes to or updates need to be managed by DLT. The reason is that these tables are streaming tables, so DLT needs to manage the checkpointing. It also handles optimization for such tables. So in your scenario, you can...
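In DLT, updating rows by key is usually expressed with `dlt.apply_changes` into a streaming table that DLT itself manages, which fits the point made in this reply (whose concrete suggestion is cut off). The following is only a plain-Python sketch of the SCD type 1 semantics such a flow applies, assuming each event carries `uuid`, a hypothetical sequencing field `seq`, and the attributes `code1`/`code2`; the DLT call it mirrors is shown in the docstring and is not executed here.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_scd1(target: Dict[str, Row], events: Iterable[Row],
               key: str = "uuid", seq: str = "seq") -> Dict[str, Row]:
    """Upsert events into `target` keyed on `key`; a higher `seq` wins.

    Late events with an older `seq` are ignored, so out-of-order arrival
    does not overwrite newer values. Roughly mirrors, on a real pipeline:
        dlt.create_streaming_table("target")
        dlt.apply_changes(target="target", source="events",
                          keys=["uuid"], sequence_by="seq",
                          stored_as_scd_type=1)
    """
    for event in events:
        current = target.get(event[key])
        if current is None or event[seq] > current[seq]:
            target[event[key]] = dict(event)
    return target
```

For example, events for the same `uuid` with `seq` 1, 3, 2 leave the row holding the values from `seq` 3.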

Hariharan49
by New Contributor
  • 2844 Views
  • 4 replies
  • 1 kudos

How can I use multiple schema in DLT?

Hi, I would like to use multiple schemas as destinations in DLT, but currently I can only specify a single Unity Catalog schema. My multi-hop tables are in different schemas.

Latest Reply
kuldeep-in
Databricks Employee
  • 1 kudos

@Hariharan49 'Direct Publishing Mode' is now live in Public Preview in all production regions. This feature allows you to write to multiple schemas and catalogs from the same pipeline.
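With Direct Publishing Mode, each dataset can be given a fully qualified `catalog.schema.table` name instead of inheriting one pipeline-level target schema. A minimal sketch, assuming a hypothetical mapping of medallion layers to schemas (the catalog and schema names below are placeholders):

```python
from typing import Dict

# Hypothetical layer-to-schema mapping; replace with your own catalog/schemas.
LAYER_SCHEMAS: Dict[str, str] = {
    "bronze": "main.bronze",
    "silver": "main.silver",
    "gold": "main.gold",
}


def table_name(layer: str, table: str) -> str:
    """Return a three-part name such as 'main.silver.orders'."""
    if layer not in LAYER_SCHEMAS:
        raise ValueError(f"unknown layer: {layer!r}")
    if not table or "." in table:
        raise ValueError(f"expected a bare table name, got {table!r}")
    return f"{LAYER_SCHEMAS[layer]}.{table}"

# In a pipeline with Direct Publishing Mode enabled, this would be used as e.g.
#   @dlt.table(name=table_name("silver", "orders"))
```

Keeping the mapping in one place means moving a layer to another schema is a one-line change.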

lauraxyz
by Contributor
  • 2751 Views
  • 2 replies
  • 0 kudos

Online Table: create only if it does not exist

I'm following this doc to create an online table using the Databricks SDK. How can I set it to create the table ONLY when it doesn't exist, to avoid a "table already exists" failure? Or is there another programmatic way to check the existence of an Onli...

Latest Reply
lauraxyz
Contributor
  • 0 kudos

Thank you @Alberto_Umana, that's a good way to go when there's no built-in create-if-not-exists feature. I also tried a different way using information_schema; I think it should work too:
def table_exists(table_name):
    return spark.sql(f""" ...
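The snippet in this reply is cut off, so what follows is a separate sketch rather than a completion of it. It assumes the table is visible in `information_schema.tables`, as the reply suggests, and uses named parameters instead of an f-string for the schema and table values (named parameters in `spark.sql` need Spark 3.4 or later). All function names are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple


def split_name(full_name: str) -> Tuple[str, str, str]:
    """Split 'catalog.schema.table' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return parts[0], parts[1], parts[2]


def exists_query(full_name: str) -> Tuple[str, Dict[str, str]]:
    """SQL text plus named parameters for an information_schema lookup."""
    catalog, schema, table = split_name(full_name)
    # The catalog is an identifier, so it is backtick-quoted, not bound.
    quoted_catalog = "`" + catalog.replace("`", "``") + "`"
    sql = (
        f"SELECT 1 FROM {quoted_catalog}.information_schema.tables "
        "WHERE table_schema = :schema AND table_name = :table LIMIT 1"
    )
    # Unity Catalog stores names in lowercase (assumed here).
    return sql, {"schema": schema.lower(), "table": table.lower()}


def create_if_not_exists(full_name: str,
                         exists: Callable[[str], bool],
                         create: Callable[[str], Any]) -> bool:
    """Call `create` only when `exists` reports the table is missing."""
    if exists(full_name):
        return False
    create(full_name)
    return True
```

On Databricks, `exists` could be `lambda n: spark.sql(*exists_query(n)).count() > 0`, and `create` would wrap the SDK call from the doc. Check-then-create is not atomic, so if several jobs may race, still catch the "already exists" error.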

AP52
by New Contributor III
  • 2332 Views
  • 4 replies
  • 3 kudos

Resolved! Package in Python Wheel not Importing When Running on Serverless Compute

Hi All, I am using a Python wheel to execute ingestions with Databricks workflows, based on entry points in the wheel for each workflow. Included in the .whl file is a separate script named functions.py, which includes several functions that get impor...

Latest Reply
AP52
New Contributor III
  • 3 kudos

To close out this thread: we found that the issue we were having with serverless had nothing to do with our import, but with the isinstance check we use in the if statements of different functions. In short, serverless uses a different DataFrame typ...
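Serverless compute runs PySpark through Spark Connect, where DataFrames are instances of `pyspark.sql.connect.dataframe.DataFrame`. On PySpark 3.4/3.5 that class does not inherit from the classic `pyspark.sql.DataFrame`, so an `isinstance` check against the classic class alone returns False. Assuming that is the type difference this post refers to (the sentence is cut off), one dependency-free workaround is to inspect the class hierarchy by module and name. `is_spark_dataframe` is a hypothetical helper.

```python
def is_spark_dataframe(obj: object) -> bool:
    """True for classic and Spark Connect DataFrames, without importing pyspark.

    Looks for a class named 'DataFrame' defined in pyspark.sql or one of its
    submodules (e.g. pyspark.sql.dataframe, pyspark.sql.connect.dataframe).
    pandas and pandas-on-Spark DataFrames live elsewhere and are rejected.
    """
    for klass in type(obj).__mro__:
        module = klass.__module__
        in_pyspark_sql = module == "pyspark.sql" or module.startswith("pyspark.sql.")
        if klass.__name__ == "DataFrame" and in_pyspark_sql:
            return True
    return False
```

An alternative is to `isinstance`-check against a tuple of both classes, but importing the Connect class pulls in extra dependencies (such as grpc) that classic clusters may not have.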

dmadh
by New Contributor
  • 1296 Views
  • 1 replies
  • 0 kudos

Optimizing Task Execution Time on Databricks Serverless Compute

Question: To reduce cluster start-up times, I'm trying out the serverless compute option when triggering workflows, as a proof of concept. I've noticed that a simple PySpark DataFrame creation task completes in 40-50 seconds. However, when multiple reques...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @dmadh, at the moment there isn't a direct way to improve this. Our engineering team is working on a "speed optimized" feature and a "warm pool", but they aren't available yet.

Fabich
by New Contributor II
  • 4963 Views
  • 4 replies
  • 1 kudos

What's the ETA for supporting Java 21 in the JDBC Driver?

Hello, I have seen this other post about the Java JDBC driver not working in Java 21. The post is now 3 months old, and Java 21 has been available for even longer. Is there any update on the topic? Can you communicate any ETA of when we can expect the d...

Data Engineering
driver
java
java21
JDBC
Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Hello, unfortunately there is still no ETA for Java 21 support with the Arrow functionality. The team is working on this, but no release information has been provided yet.
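Since the reply ties the gap to the Arrow functionality, one workaround that is often suggested is to turn Arrow result serialization off in the connection URL. A minimal sketch that assembles such a URL string (usable from any JDBC client, Java included), assuming the Databricks JDBC driver's `EnableArrow=0` property and personal-access-token auth (`AuthMech=3`, `UID=token`). The host, HTTP path, and token are placeholders, and disabling Arrow may slow down large result sets.

```python
def databricks_jdbc_url(host: str, http_path: str, token: str,
                        enable_arrow: bool = False) -> str:
    """Assemble a Databricks JDBC URL (token auth) with Arrow optionally off.

    Property names are assumed from the Databricks JDBC driver docs;
    verify them against the driver version you use.
    """
    props = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",
        "UID": "token",
        "PWD": token,
        "EnableArrow": "1" if enable_arrow else "0",
    }
    return f"jdbc:databricks://{host}:443;" + ";".join(
        f"{key}={value}" for key, value in props.items()
    )
```

Avoid logging the resulting URL, since it embeds the token.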

rrajan
by New Contributor II
  • 1640 Views
  • 4 replies
  • 0 kudos

Urgent Help Needed - Databricks Notebook Failure Handle for Incremental Processing

I have created a notebook that creates three different gold layer objects from one single silver table. All these tables are processed incrementally. I want to develop failure handling for the case where the pipeline fails after loading...
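One common pattern for this is a separate high-water mark per gold target. Each target records the last silver version it loaded successfully, so a rerun after a failure skips the targets that already finished. A minimal in-memory sketch; in practice the watermarks would live in a small control table, and the version ranges could come from Delta table versions (for example, Change Data Feed reads with `startingVersion`/`endingVersion`). Those choices, and all names here, are assumptions for illustration.

```python
from typing import Callable, Dict, List

# loader(start_version, end_version) processes silver changes in that inclusive range.
Loader = Callable[[int, int], None]


def run_targets(silver_version: int, watermarks: Dict[str, int],
                loaders: Dict[str, Loader]) -> List[str]:
    """Load each gold target up to silver_version, skipping caught-up targets.

    A target's watermark advances only after its loader returns. If a loader
    raises, the exception propagates and later targets are not attempted; a
    rerun then resumes from the failed target.
    """
    processed: List[str] = []
    for name, load in loaders.items():
        last_done = watermarks.get(name, -1)
        if last_done >= silver_version:
            continue  # already committed these versions
        load(last_done + 1, silver_version)
        watermarks[name] = silver_version
        processed.append(name)
    return processed
```

If a loader writes its data but the job dies before the watermark is saved, that range is reprocessed on the next run. The gold writes should therefore be idempotent (for example, MERGE on a key rather than a blind append).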

Latest Reply
CamdenJacobs
New Contributor II
  • 0 kudos

Thank you so much for the suggestion.

ImranA
by Contributor
  • 1541 Views
  • 1 replies
  • 0 kudos

How to do a Full Load using DLT pipeline

If I use "spark.readStream" it does incremental loads, and if I use "spark.read" it creates a materialised view. What I want is to do a full load each time (no need for SCD types), and it should be a streaming table, not a materialised view. Any help woul...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

In Databricks Delta Live Tables (DLT), you can't directly truncate a streaming table, as streaming tables are append-only by design. However, in your scenario, you could possibly use a job workflow, where the first task runs a SQL statement (using ser...
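The workflow suggestion in this reply is cut off. A related option, sketched here as an alternative rather than as the reply's method, is to start each run as a full-refresh pipeline update. A full refresh clears a streaming table's data and checkpoints and reprocesses the source from the beginning, so the table stays a streaming table but is rebuilt every time. The sketch only builds (does not send) a request to the Pipelines REST API endpoint `POST /api/2.0/pipelines/{pipeline_id}/updates`. The host, token, and pipeline ID are placeholders, and `full_refresh_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Iterable, Optional


def full_refresh_request(host: str, token: str, pipeline_id: str,
                         tables: Optional[Iterable[str]] = None) -> urllib.request.Request:
    """Build a POST that starts a full-refresh update of a pipeline.

    With no `tables`, the whole pipeline is fully refreshed; otherwise only
    the listed tables are (via `full_refresh_selection`).
    """
    selection = list(tables) if tables else []
    body = {"full_refresh_selection": selection} if selection else {"full_refresh": True}
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/pipelines/{pipeline_id}/updates",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# To actually trigger the update: urllib.request.urlopen(full_refresh_request(...))
```

Sources with limited retention, such as Kafka or Event Hubs, may not be able to replay everything on a full refresh, so check that the source still holds the full data set.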
