Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

abhijitnag
by New Contributor II
  • 864 Views
  • 2 replies
  • 0 kudos

Materialized view creation not supported from DLT pipeline

Hi Team, I have a very basic scenario where I am using my custom catalog and want a materialized view to be created from a DLT table at the end of the pipeline. The SQL used for this is as below, where "loom_data_transform" is a streaming table. But the pipeline...

Data Engineering
Delta Live Table
dlt
Unity Catalog
Latest Reply
warsamebashir
New Contributor II
  • 0 kudos

Hey @abhijitnag, are you sure your loom_data_transform was created as a STREAMING table? Docs: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html

  • 0 kudos
1 More Replies
naveenprasanth
by New Contributor
  • 1259 Views
  • 1 reply
  • 1 kudos

Issue with Reading MongoDB Data in Unity Catalog Cluster

I am encountering an issue while trying to read data from MongoDB in a Unity Catalog Cluster using PySpark. I have shared my code below: from pyspark.sql import SparkSession database = "cloud" collection = "data" Scope = "XXXXXXXX" Key = "XXXXXX-YYY...

Data Engineering
mongodb
spark config
Spark Connector package
Unity Catalog
Latest Reply
Wojciech_BUK
Valued Contributor III
  • 1 kudos

A few points: 1. Check whether you installed exactly the same driver version that you reference in code (2.12:3.2.0); it has to match 100%: org.mongodb.spark:mongo-spark-connector_2.12:3.2.0. 2. I have seen people configuring the connection to Atlas in two way...
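The version-match point above can be illustrated with a small, purely illustrative helper that splits a Maven coordinate into its parts and compares the installed and referenced connector, Scala suffix included:

```python
# Sketch: verify that the connector coordinate installed on the cluster
# matches the one referenced in code, including the Scala version suffix.
# This is plain string handling, not a Databricks API.

def parse_connector(coord):
    """Split 'group:artifact_scalaVer:version' into its parts."""
    group, artifact, version = coord.split(":")
    name, scala_ver = artifact.rsplit("_", 1)
    return {"group": group, "artifact": name,
            "scala": scala_ver, "version": version}

installed = parse_connector("org.mongodb.spark:mongo-spark-connector_2.12:3.2.0")
referenced = parse_connector("org.mongodb.spark:mongo-spark-connector_2.12:3.2.0")
matches = installed == referenced
```

A mismatch in either the Scala suffix (`2.12` vs `2.13`) or the version (`3.2.0` vs anything else) would make `matches` false, which is the failure mode the reply warns about.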

  • 1 kudos
dzmitry_tt
by New Contributor
  • 1160 Views
  • 1 reply
  • 0 kudos

DeltaRuntimeException: Keeping the source of the MERGE statement materialized has failed repeatedly.

I'm using Autoloader (in Azure Databricks) to read parquet files and write their data into the Delta table. schemaEvolutionMode is set to 'rescue'. In foreach_batch I do: 1) transform the read dataframe; 2) create a temp view based on the read dataframe and merg...

Data Engineering
autoloader
MERGE
streaming
Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Hmm, you can't have duplicated data in the source dataframe/batch, but that should error out with a different error, like "Cannot perform Merge as multiple source rows matched and attempted to modify the same target row...". Also, this behaviour after a rerun is str...
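When duplicate source rows do break a MERGE, the usual fix is to deduplicate each micro-batch on the merge key first, keeping the newest row per key. A minimal plain-Python sketch of that logic (in PySpark this would typically be a Window plus row_number(); the field names here are hypothetical):

```python
# Sketch: deduplicate a micro-batch on the merge key, keeping the newest
# row per key, before handing it to a MERGE. Shown in plain Python for
# clarity; "id" and "event_ts" are illustrative column names.

def dedupe_latest(rows, key="id", ts="event_ts"):
    """Keep only the most recent row per merge key."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return list(latest.values())

batch = [
    {"id": 1, "event_ts": 10, "value": "a"},
    {"id": 1, "event_ts": 20, "value": "b"},  # newer duplicate of id=1
    {"id": 2, "event_ts": 5,  "value": "c"},
]
deduped = sorted(dedupe_latest(batch), key=lambda r: r["id"])
```

In PySpark the same idea is usually expressed as `row_number().over(Window.partitionBy("id").orderBy(col("event_ts").desc())) == 1` inside the foreachBatch function, before the MERGE.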

  • 0 kudos
EDDatabricks
by Contributor
  • 1203 Views
  • 1 reply
  • 0 kudos

Slow stream static join in Spark Structured Streaming

Situation: records are streamed from an input Delta table via a Spark Structured Streaming job. The streaming job performs the following: read from the input Delta table (readStream); static join on a small JSON; static join on a big Delta table; write to three Delta...

Data Engineering
Azure Databricks
optimization
Spark Structured Streaming
Stream static join
Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

The machines you are using are quite small; please take into consideration that a lot of a machine's memory is occupied by other processes: https://kb.databricks.com/clusters/spark-shows-less-memory. It is not a good idea to broadcast a huge data fra...

  • 0 kudos
Erik
by Valued Contributor II
  • 8378 Views
  • 6 replies
  • 3 kudos

Resolved! How to run code-formatting on the notebooks

Has anyone found a nice way to run code-formatting (like black) on the notebooks **in the workspace**? My current workflow is to commit the file, pull it locally, format, re-push and pull. It would be nice if there were some relatively easy way to run blac...

Latest Reply
MartinPlay01
New Contributor II
  • 3 kudos

Hi Erik, I don't know if you are aware of this feature: currently there is an option to format the code in your Databricks notebooks using the black code style formatter. You just need to have a DBR version equal to or greater than 11.2 ...

  • 3 kudos
5 More Replies
XClar_40456
by New Contributor
  • 1343 Views
  • 2 replies
  • 1 kudos

Resolved! Are there system tables that are customer accessible for setting up job run health monitoring in GCP Databricks?

Is Overwatch still an active project? Is there anything equivalent for GCP Databricks, or any plans for Overwatch to be available in GCP?

Latest Reply
SriramMohanty
New Contributor III
  • 1 kudos

Yes, Overwatch supports GCP.

  • 1 kudos
1 More Replies
rt-slowth
by Contributor
  • 371 Views
  • 0 replies
  • 0 kudos

Help design my streaming pipeline

### Data source: AWS RDS; database migration tasks have been created using AWS DMS; the relevant CDC information is stored in a specific bucket in S3. ### Data frequency: once a day (but not sure when; sometime after 6 PM). ### Development environment: d...

RabahO
by New Contributor III
  • 1292 Views
  • 1 reply
  • 0 kudos

Handling data close to SCD2 with Delta tables

Hello, the stack used is PySpark and Delta tables. I'm working with some data that looks a bit like SCD2 data. Basically, the data has columns that represent an id, a rank column and other information; here's an example: login, email, business_timestamp => the...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Your problem is exactly SCD2. You just add one more column with a valid-to date (optionally, you can add an is-current flag to tag current records). You can use the DLT APPLY CHANGES syntax, or alternatively a MERGE statement. On top of that table you can bu...
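The close-out-and-append step described above can be sketched in plain Python; the column names `valid_from`/`valid_to`/`is_current` are illustrative, not a fixed schema:

```python
from datetime import date

# Sketch of SCD2 maintenance on a list of dict "rows": close out the
# current version of a key (set valid_to, clear the current flag) and
# append the new version as the open record.

def scd2_apply(history, incoming, key="login", ts="business_timestamp"):
    for row in history:
        if row[key] == incoming[key] and row["is_current"]:
            row["valid_to"] = incoming[ts]   # close the old version
            row["is_current"] = False
    history.append({**incoming,
                    "valid_from": incoming[ts],
                    "valid_to": None,        # open-ended current record
                    "is_current": True})
    return history

hist = [{"login": "jdoe", "email": "old@x.com",
         "business_timestamp": date(2023, 1, 1),
         "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True}]
hist = scd2_apply(hist, {"login": "jdoe", "email": "new@x.com",
                         "business_timestamp": date(2023, 6, 1)})
```

On Databricks the same effect comes from DLT `APPLY CHANGES ... STORED AS SCD TYPE 2` or a hand-written MERGE, as the reply notes.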

  • 0 kudos
lorenz
by New Contributor III
  • 6140 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks approaches to CDC

I'm interested in learning more about Change Data Capture (CDC) approaches with Databricks. Can anyone provide insights on the best practices and recommendations for utilizing CDC effectively in Databricks? Are there any specific connectors or tools ...

Latest Reply
jcozar
Contributor
  • 1 kudos

Hi, first of all, thank you all in advance! I am very interested in this topic! My question goes beyond what is described here. Like @Pektas, I am using Debezium to send data from Postgres to a Kafka topic (in fact, Azure EventHub). My question...
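For context on consuming such a feed: a Debezium change event wraps each row change in an envelope with `op` ("c"=create, "u"=update, "d"=delete), `before`, and `after`. A small sketch of routing one event; the envelope shape is standard Debezium, but the field names inside `after` are made up for illustration:

```python
import json

# A minimal Debezium-style change event and a router that turns it into
# an upsert/delete action, the typical first step before a MERGE.

event = json.loads("""
{"payload": {"op": "u",
             "before": {"id": 7, "email": "old@x.com"},
             "after":  {"id": 7, "email": "new@x.com"},
             "ts_ms": 1703760391974}}
""")

def route(evt):
    p = evt["payload"]
    if p["op"] == "d":
        return ("delete", p["before"])   # deletes carry the old image
    return ("upsert", p["after"])        # creates/updates carry the new one

action, row = route(event)
```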

  • 1 kudos
2 More Replies
Aidin
by New Contributor II
  • 3829 Views
  • 4 replies
  • 0 kudos

BINARY data type

Hello everyone. I'm trying to understand how the BINARY data type works in Spark SQL. According to the examples in the documentation, using cast or the literal 'X' should return the HEX representation of the binary data type, but when I try the same code, I see base6...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

If you are confused, please look at this thread; it explains that Databricks uses base64 as the default binary representation. This is not documented but can be traced at the source-code level: https://stackoverflow.com/questions/75753311/not-getting-binary-value-in-datab...
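The difference is easy to reproduce outside Spark: the same three bytes rendered as HEX (what the Spark SQL docs show) versus base64 (what Databricks displays, per the thread above):

```python
import base64

# One binary value, two textual renderings.
raw = b"ABC"

hex_repr = raw.hex().upper()                      # HEX rendering
b64_repr = base64.b64encode(raw).decode("ascii")  # base64 rendering
```

So seeing `QUJD` where you expected `414243` does not mean the stored bytes differ, only the display encoding.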

  • 0 kudos
3 More Replies
sahesh1320
by New Contributor
  • 467 Views
  • 1 reply
  • 0 kudos

Shutdown Cluster in script if there is any failure

I am working on an incremental load from SQL Server to Delta Lake tables stored in ADLS Gen2. During the script I need to write logic to shut down the DB cluster on failure (there needs to be logging added to ensure that the shutdown happens promptly to pr...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

If you run your notebook via a workflow and an error happens and there are no retries on the job, then the job cluster will be terminated immediately after the failure. You can add a Python try/except block; if an error occurs, you catch the error and log it somewhere bef...
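The try/except pattern described above can be sketched like this; `log_error` and `shutdown_cluster` are hypothetical stand-ins (on Databricks the shutdown would go through the Clusters API, or you simply rely on a job cluster terminating on failure):

```python
# Sketch: run the load, and on failure persist the error first, then
# request shutdown, then re-raise so the job still reports as failed.

def run_with_shutdown(task, log_error, shutdown_cluster):
    try:
        return task()
    except Exception as exc:
        log_error(str(exc))    # log before terminating anything
        shutdown_cluster()     # then shut down promptly
        raise

# Demonstration with stub callbacks recording what happened, in order.
events = []

def failing_task():
    raise RuntimeError("incremental load failed")

try:
    run_with_shutdown(failing_task,
                      log_error=lambda m: events.append(("log", m)),
                      shutdown_cluster=lambda: events.append(("shutdown",)))
except RuntimeError:
    pass  # the job framework would see this failure
```

Ordering matters: logging before shutdown guarantees the failure reason survives even if termination is immediate.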

  • 0 kudos
dbx-user7354
by New Contributor III
  • 666 Views
  • 1 reply
  • 0 kudos

Remove description from job

How do I remove a description from a job completely? When I try to just remove the text in the edit window, the same text shows up afterwards, even though it says "Successfully updated job". Also I had to write this twice, because on the first try I ...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Hi, this is not possible from the UI; you have to replace the content with e.g. whitespace. I think this is a bug. But you can do it using the Jobs API! Below is an example in PowerShell; just replace job_id, token, and workspaceURL: $body = @' { "job_id": 123456789, "new_setti...
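A Python equivalent of the same approach, sketched against the Jobs API 2.1 request shape; the job id, token, and workspace URL are placeholders, and whether a blank description "sticks" is exactly the bug under discussion:

```python
import json

# Sketch: build the body for POST /api/2.1/jobs/update, setting the
# description to a single space (the whitespace workaround above).
# job_id is a placeholder; the HTTP call itself is left commented out.

job_id = 123456789
body = {"job_id": job_id, "new_settings": {"description": " "}}
payload = json.dumps(body)

# requests.post(f"{workspace_url}/api/2.1/jobs/update",
#               headers={"Authorization": f"Bearer {token}"},
#               data=payload)
```

Using `jobs/update` (partial update) rather than `jobs/reset` avoids having to re-send the whole job specification just to touch one field.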

  • 0 kudos
ksenija
by Contributor
  • 2937 Views
  • 5 replies
  • 5 kudos

How to change cluster size using a script

I want to change the instance type or the number of max workers via a Python script. Does anyone know how to do it, or whether it is possible? I have a lot of background jobs when I want to scale down my workers, so autoscaling is not an option. I was getting an error t...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 5 kudos

Hi ksenija, this is just my guess, but maybe you are using a cluster policy on your cluster that only allows specific cluster sizes? E.g. a cluster policy like the one below that limits you to certain cluster sizes only.
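Separately, resizing from a script normally goes through the Clusters edit API. A sketch of building the request body (the cluster id, node type, and spark_version values are placeholders; note that `clusters/edit` expects the full cluster spec, restarts the cluster, and will be rejected if a cluster policy forbids the requested size):

```python
import json

# Sketch: construct the body for POST /api/2.0/clusters/edit to change
# node type and autoscale bounds. All identifiers below are placeholders.

def edit_cluster_body(cluster_id, node_type, min_workers, max_workers):
    return {
        "cluster_id": cluster_id,
        "spark_version": "13.3.x-scala2.12",   # edit requires the full spec
        "node_type_id": node_type,
        "autoscale": {"min_workers": min_workers,
                      "max_workers": max_workers},
    }

body = edit_cluster_body("0101-123456-abcd123", "Standard_DS4_v2", 2, 4)
payload = json.dumps(body)

# requests.post(f"{host}/api/2.0/clusters/edit", headers=..., data=payload)
```

For a fixed-size cluster you would send `"num_workers": n` instead of the `autoscale` block.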

  • 5 kudos
4 More Replies
SamGreene
by Contributor
  • 3161 Views
  • 6 replies
  • 3 kudos

Change DLT table type from streaming to 'normal'

I have a DLT streaming live table, and after watching a QA session, I saw that it is advised to only use streaming tables for your raw landing.  I attempted to modify my pipeline to have my silver table be a regular LIVE TABLE, but an error was throw...

Latest Reply
quakenbush
Contributor
  • 3 kudos

Just curious, could you point me to said QA session if it's a video or something? I'm not aware of such a limitation. You can use DLT's live streaming tables anywhere in the Medallion architecture, just make sure not to break stream composability by ...

  • 3 kudos
5 More Replies