Data Engineering

Forum Posts

Sorted by:

Start a conversation

by MoJaMa • Valued Contributor II

06-25-2021 1:32:16 PM

478 Views
1 replies
0 kudos

Can my job have more than 1 owner? What if the original owner leaves the company?

Data Engineering

478 Views
1 replies
0 kudos

06-25-2021 1:32:16 PM

View Replies

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 1:33:32 PM

0 kudos

We still require a single user to be an owner. But you can set a group to have CAN_MANAGE which unblocks most of the necessary updates. It is released in all Premium workspaces that have Jobs ACLs. The official OWNER is whose identity is used to crea...

0 kudos

06-25-2021 1:33:32 PM

by Srikanth_Gupta_ • Valued Contributor

06-25-2021 1:29:00 PM

520 Views
0 replies
0 kudos

How do we maintain count reconciliation for streaming application (kafka -> bronze -> silver) capturing success and errored record counts

Data Engineering

520 Views
0 replies
0 kudos

06-25-2021 1:29:00 PM

by MoJaMa • Valued Contributor II

06-25-2021 1:19:32 PM

631 Views
1 replies
1 kudos

Does Databricks support SSH at all for Repos? We only see HTTPS in the UI.

Data Engineering

631 Views
1 replies
1 kudos

06-25-2021 1:19:32 PM

View Replies

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 1:21:20 PM

1 kudos

Only HTTPS is supported right now.If SSH is required for your use case, please let your Databricks Rep know and reference the Idea DB-I-3697 so that it can be prioritized.

1 kudos

06-25-2021 1:21:20 PM

by User16868770416 • Contributor

06-25-2021 1:20:34 PM

431 Views
0 replies
0 kudos

Is it possible to query an RDS Logical Replication from Databricks?

Data Engineering

431 Views
0 replies
0 kudos

06-25-2021 1:20:34 PM

by MoJaMa • Valued Contributor II

06-25-2021 1:17:55 PM

551 Views
1 replies
0 kudos

In the Repos GIT URL Allow List, it specifically says ‘Allow list’ restricts the Git repository users to be able to commit and push. But does not mention about adding the Repo. Why?

Data Engineering

551 Views
1 replies
0 kudos

06-25-2021 1:17:55 PM

View Replies

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 1:18:25 PM

0 kudos

You can clone any repo, the security concern is usually around proprietary code exfiltration, whether intentional or accidental.

0 kudos

06-25-2021 1:18:25 PM

by MoJaMa • Valued Contributor II

06-25-2021 1:08:20 PM

663 Views
1 replies
0 kudos

Any plans to support Scala in the feature store API?

Data Engineering

663 Views
1 replies
0 kudos

06-25-2021 1:08:20 PM

View Replies

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 1:09:57 PM

0 kudos

It's not currently on the roadmap but please create an Idea and Product team could consider it, based on demand.https://ideas.databricks.com/

0 kudos

06-25-2021 1:09:57 PM

by MoJaMa • Valued Contributor II

06-25-2021 1:04:33 PM

744 Views
1 replies
0 kudos

What is the timeline for ability to delete feature tables and/or features?

Data Engineering

744 Views
1 replies
0 kudos

06-25-2021 1:04:33 PM

View Replies

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 1:05:55 PM

0 kudos

Feature table deletion is a potentially dangerous operation, since downstream consumers of feature tables (models, online stores, jobs, etc) may break due to the deletion. We might support a safe way to do this in future. In the meanwhile, we may be ...

0 kudos

06-25-2021 1:05:55 PM

by User15787040559 • New Contributor III

06-09-2021 6:07:05 PM

592 Views
1 replies
1 kudos

How do you find out if the REST API calls are logged anywhere when you update an IP Access List?

In the example response at https://docs.databricks.com/security/network/ip-access-list.html{ "ip_access_list": { "list_id": "<list-id>", "label": "office", "ip_addresses": [ "1.1.1.1", "2.2.2.2/21" ], "address_co...

Data Engineering

592 Views
1 replies
1 kudos

06-09-2021 6:07:05 PM

View Replies

Latest Reply

User16752239289
Valued Contributor

06-25-2021 1:05:12 PM

1 kudos

The workspace audit logs should provide all workspace conf change logs. You can check service accountsManager and action name createWorkspaceConfiguration or updateWorkspaceConfiguration.

1 kudos

06-25-2021 1:05:12 PM

by User16826987838 • Contributor

06-25-2021 1:04:47 PM

409 Views
0 replies
0 kudos

is there a way to force a struct column into string column when reading xml?

Data Engineering

409 Views
0 replies
0 kudos

06-25-2021 1:04:47 PM

by User16826987838 • Contributor

06-25-2021 12:57:52 PM

407 Views
0 replies
0 kudos

is there a way to remove a structfield from schema by name instead of index?

Data Engineering

407 Views
0 replies
0 kudos

06-25-2021 12:57:52 PM

by User16869510359 • Esteemed Contributor

06-25-2021 12:54:34 PM

742 Views
1 replies
0 kudos

Resolved! DB Connect giving different results

The below code gives different result when executed using DB Connect and a Notebooksc = spark.sparkContext a = sc.accumulator(0) rdd = sc.parallelize([1, 2, 3]) def f(x): global a a.add(x) rdd.foreach(f) rdd.count() print(a.value)

Data Engineering

742 Views
1 replies
0 kudos

06-25-2021 12:54:34 PM

View Replies

Latest Reply

User16869510359
Esteemed Contributor

06-25-2021 12:55:08 PM

0 kudos

This is a known limitation that accumulators do not work with DB Connect.

0 kudos

06-25-2021 12:55:08 PM

by User16869510359 • Esteemed Contributor

06-25-2021 12:50:51 PM

924 Views
1 replies
0 kudos

Resolved! Auto-scaling not getting kicked in

I have a spark-submit job, I do not see auto-scaling happening on the cluster at the time of executions.

Data Engineering

924 Views
1 replies
0 kudos

06-25-2021 12:50:51 PM

View Replies

Latest Reply

User16869510359
Esteemed Contributor

06-25-2021 12:51:18 PM

0 kudos

This is working as expected. Autoscaling is not available for spark-submit jobsRun the job as jar job instead of spark-submit

0 kudos

06-25-2021 12:51:18 PM

by User16826987838 • Contributor

06-25-2021 12:51:00 PM

400 Views
0 replies
0 kudos

Is there a cap/ceiling on string length by default for Spark/Delta?

I am ingesting change data from S3 using autoloader jobs. We have some very long string fields in the data. Does spark/delta cap the string length by default?

Data Engineering

400 Views
0 replies
0 kudos

06-25-2021 12:51:00 PM

by User16869510359 • Esteemed Contributor

06-25-2021 12:29:14 PM

1018 Views
1 replies
0 kudos

Resolved! Delta metadata caching

I understand the Delta caching for the data files. Do we have anything similar for the metadata files. Will the delta metadata get cached in the Delta caching

Data Engineering

1018 Views
1 replies
0 kudos

06-25-2021 12:29:14 PM

View Replies

Latest Reply

User16869510359
Esteemed Contributor

06-25-2021 12:31:46 PM

0 kudos

The Delta logs - JSON files will be cached on the Driver (in memory) for Delta if they are small enough (<10 MB). They are not stored in the Delta cache. Before every query Delta checks if the snapshot is stale or has to be re-built.

0 kudos

06-25-2021 12:31:46 PM

by User16826992666 • Valued Contributor

06-25-2021 11:08:26 AM

680 Views
1 replies
0 kudos

Resolved! If I create a Feature Store, how is the underlying data actually saved?

And do I have any control over where and how it's saved?

Data Engineering

680 Views
1 replies
0 kudos

06-25-2021 11:08:26 AM

View Replies

Latest Reply

sajith_appukutt
Honored Contributor II

06-25-2021 12:28:42 PM

0 kudos

The offline store is backed by Delta tables . In AWS we support Amazon Aurora (MySQL-compatible) & Amazon RDS MySQL and in Azure we support Azure Database for MySQL and Azure SQL Database as as online stores https://docs.microsoft.com/en-us/azure/d...

0 kudos

06-25-2021 12:28:42 PM

User

Count

1601

736

343

284

247

Databricks

Forum Posts

Can my job have more than 1 owner? What if the original owner leaves the company?

How do we maintain count reconciliation for streaming application (kafka -> bronze -> silver) capturing success and errored record counts

Does Databricks support SSH at all for Repos? We only see HTTPS in the UI.

Is it possible to query an RDS Logical Replication from Databricks?

In the Repos GIT URL Allow List, it specifically says ‘Allow list’ restricts the Git repository users to be able to commit and push. But does not mention about adding the Repo. Why?

Any plans to support Scala in the feature store API?

What is the timeline for ability to delete feature tables and/or features?

How do you find out if the REST API calls are logged anywhere when you update an IP Access List?

is there a way to force a struct column into string column when reading xml?

is there a way to remove a structfield from schema by name instead of index?

Resolved! DB Connect giving different results

Resolved! Auto-scaling not getting kicked in

Is there a cap/ceiling on string length by default for Spark/Delta?

Resolved! Delta metadata caching

Resolved! If I create a Feature Store, how is the underlying data actually saved?

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...