Data Engineering

Forum Posts

by milind2000 • New Contributor

01-16-2025 8:05:14 PM

190 Views
1 reply
0 kudos

Question about Data Management for Supply-Demand Allocation

I have a scenario where I am trying to parallelize supply-demand allotment between sellers and buyers with many-to-many links. I am unsure whether I can parallelize the calculation using PySpark operations. I have two columns to keep track of in...

Latest Reply

Walter_C
Databricks Employee

01-17-2025 7:16:55 AM

0 kudos

Parallelizing supply-demand allotment in PySpark can be challenging due to the need for sequential updates to supply and demand values across rows. However, it is possible to achieve this using PySpark operations, though it may require a different ap...
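To make the "sequential updates" point concrete, here is a minimal plain-Python sketch of a greedy allotment over many-to-many links. It is not the approach from the truncated reply; the function name `allocate`, the sample sellers and buyers, and the link ordering are all assumptions for illustration. Each step reads the supply and demand left over by the previous step, which is the dependency that makes a naive row-parallel version hard.

```python
from typing import Dict, List, Tuple


def allocate(
    supply: Dict[str, int],
    demand: Dict[str, int],
    links: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str, int]], Dict[str, int], Dict[str, int]]:
    """Greedy allotment: visit links in order and move as much as possible.

    Hypothetical helper for illustration only. Every iteration depends on
    the remaining supply/demand produced by the earlier iterations.
    """
    remaining_supply = dict(supply)  # copy so the inputs are not mutated
    remaining_demand = dict(demand)
    allocations = []
    for seller, buyer in links:
        qty = min(remaining_supply.get(seller, 0), remaining_demand.get(buyer, 0))
        if qty > 0:
            remaining_supply[seller] -= qty
            remaining_demand[buyer] -= qty
            allocations.append((seller, buyer, qty))
    return allocations, remaining_supply, remaining_demand


# Made-up sample data: two sellers, two buyers, three links.
supply = {"s1": 10, "s2": 5}
demand = {"b1": 8, "b2": 9}
links = [("s1", "b1"), ("s1", "b2"), ("s2", "b2")]

allocations, left_supply, left_demand = allocate(supply, demand, links)
print(allocations)   # [('s1', 'b1', 8), ('s1', 'b2', 2), ('s2', 'b2', 5)]
print(left_demand)   # {'b1': 0, 'b2': 2}
```

Groups of sellers and buyers that share no links do not affect each other, so one possible way to parallelize is to split the links into such independent groups and run a sequential routine like this inside each group.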

by glevine • New Contributor II

01-17-2025 2:51:23 AM

284 Views
1 reply
0 kudos

Resolved! DNSResolve Error while establishing JDBC connection to Azure Databricks

I am using the Databricks JDBC driver (https://databricks.com/spark/jdbc-drivers-download) to connect to Azure Databricks through a VPN. I am connecting through a SaaS low-code platform, Appian, so I don't have access to any more logs. We have set up ...

Latest Reply

Walter_C
Databricks Employee

01-17-2025 7:11:55 AM

0 kudos

It seems that DNS is unable to resolve the domain name of your workspace. Are you able to access the workspace from a browser while connected to the VPN?
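Where a Python interpreter is available on a machine that uses the same VPN, the resolution check can also be scripted. This is a minimal sketch using only the standard library; the function name `can_resolve` is made up for illustration, and the hostname in the usage comment is a placeholder, not a real workspace.

```python
import socket


def can_resolve(hostname: str, port: int = 443) -> bool:
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # The resolver could not find the name (or no DNS server is reachable).
        return False
    except UnicodeError:
        # The string is not a valid hostname (for example, an empty label).
        return False


# Usage (placeholder hostname, replace with your own workspace host):
# print(can_resolve("adb-0000000000000000.0.azuredatabricks.net"))
```

A `False` result with the VPN connected and `True` without it (or the reverse) would suggest the VPN's DNS configuration is the place to look.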

by eballinger • Contributor

01-09-2025 12:09:37 PM

628 Views
6 replies
2 kudos

Resolved! DLT Pipeline Event Logs

There seems to be an issue now with our DLT pipeline event logs. I am not sure if this is a recent bug or not (but they were OK in Dec). But the issue is in dev, qc and prod, and we only have a couple of days of history logs now visible in the UI. From wha...

Latest Reply

Walter_C
Databricks Employee

01-17-2025 7:03:13 AM

2 kudos

Great to hear your issue got resolved.

5 More Replies

by Costas96 • New Contributor III

01-17-2025 1:28:59 AM

276 Views
1 reply
1 kudos

Resolved! Delta Live Tables: Add sequential column

Hello everyone, I have a DLT table (examp_table) and I want to add a sequential column whose values are incremented every time a record gets ingested. I tried to do that with monotonically_increasing_id and Window.orderBy("a column") functions...

Latest Reply

Alberto_Umana
Databricks Employee

01-17-2025 5:06:29 AM

1 kudos

Hi @Costas96, thanks for your question. You can use the identity column feature: https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html

by BenceCzako • New Contributor II

12-11-2024 5:03:02 AM

518 Views
5 replies
0 kudos

Databricks mount bug

Hello, I have a weird problem in Databricks for which I hope you can suggest some solutions. I have an azureml blob storage mounted to Databricks with some folder structure that can be accessed from a notebook as /dbfs/mnt/azuremount/foo/bar/something.t...

Latest Reply

BenceCzako
New Contributor II

01-17-2025 4:53:53 AM

0 kudos

Hello, can you figure out the issue?

4 More Replies

by Costas96 • New Contributor III

01-16-2025 4:46:04 AM

878 Views
7 replies
0 kudos

Resolved! Delta Live Tables: Creating table with spark.sql and everything gets ingested at the first column

Hello everyone. I am new to DLT and I am trying to practice with it by doing some basic ingestions. I have a query like the following where I am getting data from two tables using UNION. I have noticed that everything gets ingested at the first colum...

Latest Reply

Costas96
New Contributor III

01-17-2025 1:21:49 AM

0 kudos

Actually, I found the solution: I used spark.readStream to read the external tables a and b into two DataFrames, and then I just did combined_df = df_a.union(df_b) to create my DLT table. Thank you!

6 More Replies

by udara_zure • New Contributor II

01-16-2025 10:34:08 PM

444 Views
3 replies
0 kudos

Resolved! What is the best way to deploy workflows with different notebooks to execute in different workspaces?

I have a workflow in the QA workspace with one notebook attached. I need to deploy the same workflow to the PRD workspace, with all the notebooks in the Azure DevOps repo, and attach and run a different notebook in the PRD workflow.

Latest Reply

ashraf1395
Honored Contributor

01-16-2025 10:48:10 PM

0 kudos

Databricks asset bundles can be a great solution for this. Clear and straightforward. https://docs.databricks.com/en/dev-tools/bundles/index.html

2 More Replies

by ashraf1395 • Honored Contributor

01-16-2025 11:01:33 PM

238 Views
1 reply
2 kudos

Migrating data from Hive metastore to Unity Catalog. Data workflow is handled in Fivetran

So in a UC migration project, we have a Fivetran connection which handles most of the ETL processes and writes data into the Hive metastore. We have migrated the schemas related to Fivetran to UC. The workspace where Fivetran was running had default catal...

Latest Reply

saurabh18cs
Valued Contributor III

01-17-2025 12:18:36 AM

2 kudos

Hi @ashraf1395, I can think of the following: Fivetran needs to be aware of the new catalog structure. This typically involves updating the destination settings in Fivetran to point to the Unity Catalog. Navigate to the destination settings for your Datab...

by jb1z • Contributor

01-15-2025 9:59:36 PM

455 Views
5 replies
0 kudos

Resolved! Query separate data loads from python spark.readStream

I am using Python spark.readStream in a Delta Live Tables pipeline to read JSON data files from an S3 folder path. Each load is a daily snapshot of a very similar set of products showing changes in price and inventory. How do I distinguish and query e...

Latest Reply

jb1z
Contributor

01-16-2025 11:45:47 PM

0 kudos

The problem was fixed by this import: from pyspark.sql import functions as F, then using F.lit() instead of F.col(): .withColumn('ingestion_date', F.lit(folder_date)). Sorry, code formatting is not working at the moment.

4 More Replies

by abhinandan084 • New Contributor III

08-19-2021 11:15:28 AM

24249 Views
19 replies
13 kudos

Community Edition signup issues

I am trying to sign up for the community edition (https://databricks.com/try-databricks) for use with a Databricks Academy course. However, I am unable to sign up and I receive the following error (image attached). On going to login page (link in ora...

Latest Reply

brokeTechBro
New Contributor II

09-20-2024 11:53:19 AM

13 kudos

Hello, I get "An error occurred, try again". I am exhausted from trying... also from solving the puzzle to prove I'm not a robot

18 More Replies

by minhhung0507 • Contributor

01-15-2025 3:20:39 AM

828 Views
6 replies
5 kudos

Resolved! Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log

Dear Databricks experts, I encountered the following error in Databricks: `com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_EMPTY_DIRECTORY] No file found in the directory: gs://cimb-prod-lakehouse/bronze-layer/losdb/pl_message/_...

Latest Reply

hari-prasad
Valued Contributor II

01-16-2025 1:36:04 AM

5 kudos

Hi @minhhung0507, The VACUUM command on a Delta table does not delete the _delta_log folder, as this folder contains all the metadata related to the Delta table. The _delta_log folder acts as a pointer where all changes are tracked. In the event that ...

5 More Replies

by ErikJ • New Contributor III

10-11-2024 3:10:45 AM

2162 Views
7 replies
2 kudos

Errors calling Databricks REST API /api/2.1/jobs/run-now with job_parameters

Hello! I have been using the Databricks REST API for running workflows using this endpoint: /api/2.1/jobs/run-now. But now I want to also include job_parameters in my API call. I have put job parameters inside my workflow: param1, param2, and in my...

Latest Reply

slkdfuba
New Contributor II

01-16-2025 4:28:08 PM

2 kudos

I encountered a null job_id in my post, when a notebook parameter was set in the job GUI. But it runs just fine (I get a valid job_id with active run) if I delete the notebook parameter in the job GUI. Is this a documented behavior, or a bug? If it's ...

6 More Replies
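For readers following this thread, here is a minimal sketch of how a run-now request carrying job_parameters might be assembled with the Python standard library. It assumes the request body takes a `job_id` plus a `job_parameters` object mapping names to string values; the helper name `build_run_now_request`, the host, the token, the job ID and the parameter values are all placeholders, not values from the thread. The request is only built here, not sent.

```python
import json
import urllib.request


def build_run_now_request(
    host: str, token: str, job_id: int, job_parameters: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/2.1/jobs/run-now."""
    body = {
        "job_id": job_id,
        # Assumption: job-level parameters are passed as a flat name -> value map.
        "job_parameters": job_parameters,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_run_now_request(
    host="https://example-workspace.cloud.databricks.com/",
    token="dapi-placeholder",
    job_id=123,
    job_parameters={"param1": "a", "param2": "b"},
)
print(req.full_url)
print(req.data.decode("utf-8"))

# To actually trigger the run, one would send it, for example:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Printing the body before sending makes it easy to confirm that the parameter names match the ones defined on the job.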

by diegohMoodys • New Contributor

01-16-2025 7:31:03 AM

289 Views
1 reply
0 kudos

JDBC RDBMS Table Overwrite Transaction Incomplete

Spark version: spark-3.4.1-bin-hadoop3
JDBC driver: mysql-connector-j-8.4.0.jar
Assumptions:
have all the proper read/write permissions
dataset isn't large: ~2 million records
reading flat files, writing to a database
Does not read from the database at al...

Latest Reply

Alberto_Umana
Databricks Employee

01-16-2025 12:24:10 PM

0 kudos

Hi @diegohMoodys, Can you try in debug mode? spark.sparkContext.setLogLevel("DEBUG")

by stevomcnevo007 • New Contributor III

12-27-2024 5:15:47 AM

2429 Views
16 replies
2 kudos

agents.deploy NOT_FOUND: The directory being accessed is not found. error

I keep getting the following error, although the model definitely does exist and the version names and model name are correct: RestException: NOT_FOUND: The directory being accessed is not found. This happens when calling # Deploy the model to the review app and a model...

Latest Reply

ezermoysis
New Contributor III

01-16-2025 10:18:17 AM

2 kudos

Does the model need to be served before deployment?

15 More Replies

by Aatma • New Contributor

08-20-2024 5:50:35 PM

1046 Views
1 reply
0 kudos

Resolved! DABs require library dependencies from GitHub private repository.

I am developing a Python wheel file using DABs which requires library dependencies from a private GitHub repository. Please help me understand how to set up the git user and token in the resource.yml file and how to authenticate the GitHub package. pip install...

Latest Reply

Satyadeepak
Databricks Employee

01-16-2025 10:05:37 AM

0 kudos

To install dependencies from a private GitHub repository in a Databricks Asset Bundle, you need to set up the GitHub user and token in the resource.yml file and authenticate the GitHub package. Here are the steps: Generate a GitHub Personal Access T...
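The rest of the reply is truncated, so the following is only a sketch of one common pattern, not the steps from the reply: pip can install from a Git URL of the form `git+https://<token>@github.com/<org>/<repo>.git@<ref>`, so a requirement string can be assembled from a token held in an environment variable. The helper name `private_repo_requirement`, the environment variable name `GITHUB_TOKEN`, and the org, repo and tag below are all hypothetical.

```python
import os


def private_repo_requirement(
    org: str, repo: str, ref: str = "main", token_env: str = "GITHUB_TOKEN"
) -> str:
    """Build a pip requirement string for a private GitHub repository.

    Hypothetical helper: the token is read from the environment so that it
    never has to be written into a bundle or requirements file.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"Environment variable {token_env} is not set")
    return f"git+https://{token}@github.com/{org}/{repo}.git@{ref}"


# Illustration with a dummy token; in practice the token would come from a
# secret store rather than being set in code.
os.environ["GITHUB_TOKEN"] = "tok123"
requirement = private_repo_requirement("my-org", "my-lib", ref="v1.0.0")
print(requirement)  # git+https://tok123@github.com/my-org/my-lib.git@v1.0.0
```

The resulting string can be passed to `pip install`. Avoid printing or logging it with a real token, since the token is embedded in the URL.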
