Data Engineering

Forum Posts

by User16826994223 • Databricks Employee

06-25-2021 10:25:47 AM

7629 Views
5 replies
7 kudos

How to access final delta table for web application or interface.

I have a final layer, the gold Delta table, that holds the final aggregated data from the silver data. I want to access this final layer of data through a web interface. I think I need to write a web script that would run Spark SQL behind the scenes to get the d...

Latest Reply

h_h_ak
Contributor

11-02-2024 3:43:27 PM

7 kudos

You can also use the Statement Execution API directly from Databricks: https://docs.databricks.com/api/workspace/statementexecution
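
For readers who want to see what a call to that API might look like from a web backend, here is a minimal sketch using only the Python standard library. It assumes a SQL warehouse and a personal access token are available; the helper names (`build_statement_request`, `rows_as_dicts`, `run_statement`) are made up for illustration, and the response handling assumes the default inline JSON result format.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement):
    """Build (but do not send) a POST request for the Statement Execution API."""
    payload = {
        "warehouse_id": warehouse_id,  # ID of an existing SQL warehouse
        "statement": statement,        # SQL text to run
        "wait_timeout": "30s",         # wait up to 30 seconds for the result
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rows_as_dicts(response):
    """Turn an inline result into a list of {column_name: value} dicts."""
    columns = [c["name"] for c in response["manifest"]["schema"]["columns"]]
    data = response.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in data]


def run_statement(host, token, warehouse_id, statement):
    """Send the statement and return its rows; raise if it did not succeed in time."""
    request = build_statement_request(host, token, warehouse_id, statement)
    with urllib.request.urlopen(request) as http_response:
        body = json.load(http_response)
    state = body["status"]["state"]
    if state != "SUCCEEDED":
        # Long-running statements may still be PENDING or RUNNING here;
        # a fuller client would poll using body["statement_id"].
        raise RuntimeError(f"statement finished in state {state}")
    return rows_as_dicts(body)
```

A web handler could then call something like `run_statement(host, token, warehouse_id, "SELECT * FROM main.gold.daily_sales LIMIT 100")`, where the table name is a placeholder for the gold table.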

by olivier-soucy • Contributor

10-22-2024 9:36:27 AM

2242 Views
5 replies
2 kudos

Resolved! Spark Structured Streaming foreachBatch with databricks-connect

Hello! I'm trying to use the foreachBatch method of a Spark Streaming DataFrame with databricks-connect. Given that Spark Connect support was added to `foreachBatch` in 3.5.0, I was expecting this to work. Configuration: - DBR 15.4 (Spark 3.5.0) - dat...

Latest Reply

VZLA
Databricks Employee

11-01-2024 1:18:34 AM

2 kudos

Thanks for sharing the solution! Just curious, was the original error message reported in this post in the Driver log as well?

by alvaro_databric • New Contributor III

02-13-2023 6:39:29 AM

3365 Views
2 replies
2 kudos

How to access hard disk attached to cluster?

Hi, I am using the Lasv3 VM family, which incorporates an NVMe SSD. I would like to take advantage of this huge amount of space, but I cannot find where this disk is mounted. Does someone know where this disk is mounted and if it can be used as local dri...

Latest Reply

JosiahJohnston
New Contributor III

11-01-2024 4:00:19 PM

2 kudos

Great question; I've been trying to hunt that down also. `/local_disk0` looks like a good candidate, but it has restricted access and I can't confirm or use it. Would love to learn a solution someday. This is a big need for hybrid workflows & libraries c...
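
One way to check a candidate mount point from a notebook is to ask the operating system about it. The sketch below is only a probe, not a confirmed answer: `describe_path` is a made-up helper, `/local_disk0` is the unverified candidate mentioned above, and the code only inspects the node it runs on (normally the driver).

```python
import os
import shutil


def describe_path(path):
    """Report whether a directory exists, is accessible, and how large its filesystem is."""
    info = {"path": path, "exists": os.path.isdir(path)}
    if not info["exists"]:
        return info
    # os.access checks permissions for the current process user
    info["readable"] = os.access(path, os.R_OK)
    info["writable"] = os.access(path, os.W_OK)
    try:
        usage = shutil.disk_usage(path)
        info["total_gib"] = round(usage.total / 2**30, 1)
        info["free_gib"] = round(usage.free / 2**30, 1)
    except OSError as error:
        # Restricted directories may refuse even a size query
        info["error"] = str(error)
    return info


if __name__ == "__main__":
    # "/local_disk0" is the candidate from this thread; "/tmp" is included for comparison
    for candidate in ["/local_disk0", "/tmp"]:
        print(describe_path(candidate))
```

A reported size close to the advertised NVMe capacity would suggest, but not prove, that the path is backed by the local SSD.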

by Anand4 • New Contributor II

11-01-2024 1:44:00 PM

2431 Views
1 reply
2 kudos

Resolved! Delta Table - Partitioning

I created a streaming job with a Delta table as the target. The table was created without a partition; however, I would like to add an existing column as a partition column. I am getting the following error: com.databricks.sql.transaction.tahoe...

Latest Reply

Alberto_Umana
Databricks Employee

11-01-2024 3:26:12 PM

2 kudos

Hi @Anand4, Delta Lake does not support altering the partitioning of an existing table directly. Therefore, the way forward is to rewrite the entire table with the new partition column.
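
As a rough illustration of such a rewrite, the sketch below overwrites a table with its own contents while declaring a partition column. It assumes a PySpark DataFrame is passed in; `rewrite_with_partition`, the table name, and the column name are placeholders, and the streaming job writing to the table would presumably need to be stopped while the rewrite runs.

```python
def rewrite_with_partition(df, table_name, partition_col):
    """Overwrite `table_name` with the rows of `df`, partitioned by `partition_col`.

    `df` is expected to be a PySpark DataFrame (anything exposing the same
    `.write` builder works). The overwriteSchema option lets the overwrite
    change the table's partitioning.
    """
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .partitionBy(partition_col)
        .saveAsTable(table_name)
    )


# Hypothetical usage in a notebook where `spark` is already defined:
#   df = spark.read.table("my_catalog.my_schema.events")
#   rewrite_with_partition(df, "my_catalog.my_schema.events", "event_date")
```

Trying this on a copy of the table first is a cheap way to confirm the new layout before touching the streaming target.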

by mvmiller • New Contributor III

02-22-2024 11:38:31 AM

9861 Views
4 replies
2 kudos

Troubleshooting _handle_rpc_error GRPC Error

I am trying to run the following chunk of code in the cell of a Databricks notebook (using Databricks Runtime 14.3 LTS, Apache Spark 3.5.0, Scala 2.12): spark.sql("CREATE OR REPLACE table sample_catalog.sample_schema.sample_table_tmp AS SELECT * FROM...

Latest Reply

kunalmishra9
Contributor

11-01-2024 11:57:00 AM

2 kudos

Following. Also having this issue, but within the context of pivoting a DF, then aggregating by *

by ChristianRRL • Valued Contributor III

10-29-2024 6:48:50 PM

2272 Views
7 replies
3 kudos

DLT Potential Bug: File Reprocessing Issue with "cloudFiles.allowOverwrites": "true"

Hi there, I ran into a peculiar case and I'm wondering if anyone else has run into this and can offer an explanation. We have a DLT process to pull CSV files from a landing location and insert (append) them into target tables. We have the setting "cl...

Latest Reply

NandiniN
Databricks Employee

11-01-2024 10:25:59 AM

3 kudos

Apologies, that could be an internet or networking issue. So, in DLT you will be able to change the DBR, but you will have to use a custom image; it may be tricky if you have not done it before. By default, Photon will be used in serverless. It may be a ...

by FabianGutierrez • Contributor

10-24-2024 12:12:02 AM

3512 Views
3 replies
1 kudos

Issue with DAB (Databricks Asset Bundle) requesting Terraform files

Hi community, for the past 2 days we have been receiving the following error when validating and deploying our DAB (Databricks Asset Bundle): "Error: error downloading Terraform: Get "https://releases.hashicorp.com/terraform/1.5.5/index.json": ...

Latest Reply

FabianGutierrez
Contributor

11-01-2024 7:16:45 AM

1 kudos

An update: we cannot get the firewall cleared in time, so we need to go for the offline option, that is, download everything from Terraform and the DB templates, but it is not as clear or intuitive as described. Using their container is unfortunately not an option ...

by pjv • New Contributor III

10-09-2024 3:05:17 AM

1858 Views
1 reply
0 kudos

How to ensure pyspark udf execution is distributed across worker nodes

Hi, I have the following Databricks notebook code defined:
pyspark_dataframe = create_pyspark_dataframe(some input data)
MyUDF = udf(myfunc, StringType())
pyspark_dataframe = pyspark_dataframe.withColumn('UDFOutput', DownloadUDF(input data columns))
outp...

Latest Reply

VZLA
Databricks Employee

11-01-2024 6:59:28 AM

0 kudos

@pjv Can you please try the following? You'll basically want to have more than a single partition:
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
# Initialize Spark session (if not...
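
The idea of "more than a single partition" can be sketched independently of the truncated code above. The helper below is an assumption-laden illustration, not the reply's original snippet: `with_udf_column` is a made-up name, `df` is expected to be a PySpark DataFrame, and `udf_expr` is a column expression such as `my_udf(col("url"))`.

```python
def with_udf_column(df, udf_expr, num_partitions, output_col="UDFOutput"):
    """Repartition `df` before adding a UDF column so the work is split into several tasks.

    With a single partition, Spark schedules a single task for the UDF;
    spreading the rows over `num_partitions` partitions gives the scheduler
    several tasks that it can place on different workers.
    """
    if num_partitions < 2:
        raise ValueError("num_partitions should be at least 2 to spread the work")
    return df.repartition(num_partitions).withColumn(output_col, udf_expr)


# Hypothetical usage in a notebook:
#   result = with_udf_column(pyspark_dataframe, MyUDF(col("url")), num_partitions=16)
#   print(result.rdd.getNumPartitions())  # check how many partitions were created
```

A partition count around the total number of worker cores is a common starting point, though the right value depends on how long each UDF call takes.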

by Vasu_Kumar_T • New Contributor II

10-09-2024 4:35:30 AM

613 Views
1 reply
0 kudos

Larger than Max error :

Hi, we are trying to pass the keys to decrypt a file and are receiving the above error, as shown in the attachment. Please help if we need to change any configuration or set any options to avoid this error. Thanks, Vasu

Latest Reply

VZLA
Databricks Employee

11-01-2024 6:55:49 AM

0 kudos

@Vasu_Kumar_T can you provide some more details or context? Feel free to replace sensitive data. Where are you getting this? How are you passing the keys to decrypt a file? Is there a more comprehensive stack trace apart from this message in the image...

by sangram11 • New Contributor

10-29-2024 7:49:33 PM

1611 Views
4 replies
0 kudos

Myths about vacuum command

I identified some myths while working with the vacuum command on Spark 3.5.x. 1. The vacuum command does not work with days. Instead, its retain clause explicitly asks for values in hours. I tried many times, and it throws a parse syntax error (wh...

Latest Reply

VZLA
Databricks Employee

11-01-2024 1:25:11 AM

0 kudos

Thanks for reporting this, Sangram. Are these YouTube videos and educational content from the Databricks channel? > set delta.databricks.delta.retentionDurationCheck.enabled = false. It works if I want to delete obsolete files whose lifespan is less than defa...
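
Since the retain clause takes hours, a small helper can convert a retention period given in days and build the statement. This is a sketch only: `vacuum_statement` and `below_default_retention` are made-up names, and the 7-day figure is Delta Lake's default retention threshold.

```python
DEFAULT_RETENTION_HOURS = 7 * 24  # Delta Lake's default retention threshold (7 days)


def vacuum_statement(table_name, retain_days, dry_run=False):
    """Build a VACUUM statement, converting a retention period in days to hours."""
    if retain_days < 0:
        raise ValueError("retain_days must not be negative")
    hours = int(round(retain_days * 24))
    sql = f"VACUUM {table_name} RETAIN {hours} HOURS"
    if dry_run:
        # DRY RUN lists the files that would be deleted without deleting them
        sql += " DRY RUN"
    return sql


def below_default_retention(retain_days):
    """True if the requested retention is shorter than the 7-day default."""
    return retain_days * 24 < DEFAULT_RETENTION_HOURS


# Hypothetical usage in a notebook:
#   spark.sql(vacuum_statement("my_schema.my_table", retain_days=10, dry_run=True))
```

A retention shorter than the default is where the retention duration check mentioned above comes into play, so `below_default_retention` can serve as a guard before running the statement.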

by kidexp • New Contributor II

04-14-2015 2:58:01 PM

28317 Views
7 replies
2 kudos

Resolved! How to install python package on spark cluster

Hi, how can I install Python packages on a Spark cluster? Locally, I can use pip install. I want to use some external packages which are not installed on the Spark cluster. Thanks for any suggestions.

Latest Reply

Mikejerere
New Contributor II

10-31-2024 11:53:54 PM

2 kudos

If --py-files doesn’t work, try this shorter method:
Create a Conda Environment: Install your packages.
conda create -n myenv python=3.x
conda activate myenv
pip install your-package
Package and Submit: Use conda-pack and spark-submit with --archives.
cond...

by Akshay_127877 • New Contributor II

03-21-2023 11:45:03 AM

53237 Views
8 replies
1 kudos

How to open Streamlit URL that is hosted by Databricks in local web browser?

I have run this web app code in a Databricks notebook. It works properly without any errors. With Databricks acting as the server, I am unable to open the link to this web app in my browser. But when I run the code in my local IDE, I am able to just open the U...

Latest Reply

navallyemul
New Contributor III

10-31-2024 11:23:22 PM

1 kudos

@Akshay_127877 : Were you able to resolve this issue?

by IoannaV • New Contributor

10-14-2024 7:14:56 AM

1400 Views
1 reply
0 kudos

Issue with Uploading Oracle Driver in Azure Databricks Cluster

Hi, could you please help me with the following? I am facing the below issue when I try to upload a JAR file in the Azure Databricks Libraries: Only Wheel and requirements file from /Workspace are allowed on Assigned UC cluster. Denied library is Jar...

Latest Reply

NandiniN
Databricks Employee

10-31-2024 11:14:56 PM

0 kudos

Hey, this is by design. I understand the jobs are failing when run on a UC single-user cluster since it is unable to install a JAR package located in the /Workspace path. This is, however, a known behaviour and is already documented below: https://docs...

by lprevost • Contributor II

10-14-2024 2:07:57 PM

1043 Views
1 reply
0 kudos

Using Autoloader in DLT: ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP]

I've been using Autoloader in a DLT pipeline loading data from an S3 location to my hive_metastore shared with AWS Glue. I'm now trying to migrate this over to Unity Catalog to take advantage of liquid clustering and data quality. However, I'm getting...

Latest Reply

NandiniN
Databricks Employee

10-31-2024 11:10:34 PM

0 kudos

https://kb.databricks.com/unity-catalog/invalid_parameter_valuelocation_overlap-overlaps-with-managed-storage-error

by nagendrapruthvi • New Contributor

10-15-2024 9:24:11 AM

1081 Views
2 replies
0 kudos

Cannot login to databricks using SSO

Hi, I created accounts with Databricks for both production and staging environments at my company, but I made a mistake with the case of the email addresses. For production, I used Xyz@company.com, and for staging, I used xyz@company.com. Now that my...

Latest Reply

NandiniN
Databricks Employee

10-31-2024 10:59:22 PM

0 kudos

Okay, so I checked some documents - The email addresses will also be case-insensitive, the same behavior as in AWS, Azure and GCP. This means that email addresses will be stored in lowercase in Databricks. So, the issue is not with case sensitivity b...
