Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MoJaMa
by Databricks Employee
  • 2032 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee

You can clone any repo; the security concern is usually around proprietary code exfiltration, whether intentional or accidental.

MoJaMa
by Databricks Employee
  • 1548 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee

Feature table deletion is a potentially dangerous operation, since downstream consumers of feature tables (models, online stores, jobs, etc.) may break due to the deletion. We might support a safe way to do this in the future. In the meantime, we may be ...

User15787040559
by Databricks Employee
  • 1801 Views
  • 1 reply
  • 1 kudos

How do you find out if the REST API calls are logged anywhere when you update an IP Access List?

In the example response at https://docs.databricks.com/security/network/ip-access-list.html:

{ "ip_access_list": { "list_id": "<list-id>", "label": "office", "ip_addresses": [ "1.1.1.1", "2.2.2.2/21" ], "address_co...

Latest Reply
User16752239289
Databricks Employee

The workspace audit logs should record all workspace configuration changes. Check for the service name accountsManager and the action names createWorkspaceConfiguration or updateWorkspaceConfiguration.
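As a minimal sketch of how you might filter for these events, assuming the audit logs have been exported as JSON records with serviceName and actionName fields (verify the exact schema against your own delivery configuration):

```python
import json

# Sample records shaped like audit log entries; the field names follow
# the answer above, but the exact schema is an assumption to verify.
raw = """
[{"serviceName": "accountsManager", "actionName": "updateWorkspaceConfiguration",
  "userIdentity": {"email": "admin@example.com"}},
 {"serviceName": "clusters", "actionName": "create",
  "userIdentity": {"email": "user@example.com"}}]
"""
records = json.loads(raw)

def ip_acl_changes(logs):
    """Keep only IP-access-list configuration change events."""
    wanted = {"createWorkspaceConfiguration", "updateWorkspaceConfiguration"}
    return [r for r in logs
            if r.get("serviceName") == "accountsManager"
            and r.get("actionName") in wanted]

changes = ip_acl_changes(records)
```

This leaves one matching record, including who made the change via userIdentity.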

brickster_2018
by Databricks Employee
  • 1866 Views
  • 1 reply
  • 0 kudos

Resolved! DB Connect giving different results

The below code gives different results when executed using DB Connect and in a Notebook:

sc = spark.sparkContext
a = sc.accumulator(0)
rdd = sc.parallelize([1, 2, 3])

def f(x):
    global a
    a.add(x)

rdd.foreach(f)
rdd.count()
print(a.value)

Latest Reply
brickster_2018
Databricks Employee

This is a known limitation: accumulators do not work with DB Connect.
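A plain-Python sketch of the usual workaround pattern: instead of mutating shared state from an executor-side function (which DB Connect will not propagate back to the client), return values and aggregate them; in PySpark the same idea is rdd.map(...).reduce(...) or rdd.sum(). The function names here are illustrative, not a Databricks API.

```python
from functools import reduce

data = [1, 2, 3]

# Accumulator style: executor-side mutation of shared state.
# Under DB Connect this mutation never reaches the client, so the
# printed value stays 0.
total = 0
def add_to_total(x):
    global total
    total += x

# Side-effect-free style: return values and aggregate them.
# Maps directly onto rdd.map(lambda x: x).reduce(lambda a, b: a + b).
def aggregate(values):
    return reduce(lambda a, b: a + b, values, 0)

result = aggregate(data)
```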

brickster_2018
by Databricks Employee
  • 2180 Views
  • 1 reply
  • 0 kudos

Resolved! Auto-scaling not getting kicked in

I have a spark-submit job, but I do not see autoscaling happening on the cluster during execution.

Latest Reply
brickster_2018
Databricks Employee

This is working as expected. Autoscaling is not available for spark-submit jobs. Run the job as a JAR job instead of spark-submit.
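A hedged sketch of what a JAR job spec with an autoscaling cluster might look like. The field names follow the public Databricks Jobs API (new_cluster, autoscale, spark_jar_task, libraries), but the jar path, class name, node type, and runtime version are placeholders to replace with your own values:

```python
import json

# Hypothetical Jobs API payload: a spark_jar_task on a new cluster
# with autoscale bounds, instead of a spark-submit task.
job_spec = {
    "name": "example-jar-job",
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",   # placeholder runtime
        "node_type_id": "i3.xlarge",            # placeholder node type
        "autoscale": {"min_workers": 2, "max_workers": 8},
    },
    "libraries": [{"jar": "dbfs:/FileStore/jars/app.jar"}],  # placeholder
    "spark_jar_task": {"main_class_name": "com.example.Main"},  # placeholder
}

payload = json.dumps(job_spec)
```

Submitting this spec to the Jobs API (rather than using spark-submit) lets the cluster scale between the configured worker bounds.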

brickster_2018
by Databricks Employee
  • 2782 Views
  • 1 reply
  • 0 kudos

Resolved! Delta metadata caching

I understand the Delta caching for the data files. Do we have anything similar for the metadata files? Will the Delta metadata get cached in the Delta cache?

Latest Reply
brickster_2018
Databricks Employee

The Delta logs (JSON files) will be cached on the driver (in memory) if they are small enough (<10 MB). They are not stored in the Delta cache. Before every query, Delta checks whether the snapshot is stale and has to be rebuilt.

User16826992666
by Databricks Employee
  • 2246 Views
  • 1 reply
  • 0 kudos

Resolved! If I create a Feature Store, how is the underlying data actually saved?

And do I have any control over where and how it's saved?

Latest Reply
sajith_appukutt
Databricks Employee

The offline store is backed by Delta tables. In AWS we support Amazon Aurora (MySQL-compatible) and Amazon RDS MySQL, and in Azure we support Azure Database for MySQL and Azure SQL Database as online stores: https://docs.microsoft.com/en-us/azure/d...

brickster_2018
by Databricks Employee
  • 2400 Views
  • 1 reply
  • 0 kudos

Resolved! Delta Streaming and Optimize

I have a master delta table that is continuously getting written by a streaming job. I have optimize writes enabled and in addition, I run the OPTIMIZE command every 3 hours. However, I think the downstream streaming jobs which are streaming the data...

Latest Reply
brickster_2018
Databricks Employee

This is working as expected. For Delta streaming, the data files created in the first place will be used for streaming. The optimized files are not considered by the downstream streaming job. This is the reason it's not recommended to run VACUUM with f...

User16826990884
by Databricks Employee
  • 4088 Views
  • 1 reply
  • 1 kudos

Impact on Databricks objects after a user is deleted

What happens to resources (notebooks, jobs, clusters etc.) owned by a user when a user is deleted? The underlying problem we are trying to solve is that we want to automatically delete users through SCIM when the user leaves the company so that the u...

Latest Reply
sajith_appukutt
Databricks Employee

When you remove a user from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user's content. With respect to clusters and jobs, an admin can grant permissions to other users.

brickster_2018
by Databricks Employee
  • 3259 Views
  • 1 reply
  • 1 kudos

Resolved! How to run commands on the executor

Using %sh, I am able to run commands from the notebook and get output. How can I run a command on the executors and get the output? I want to avoid using the Spark APIs.

Latest Reply
brickster_2018
Databricks Employee

It's not possible to use %sh to run commands on the executors. The below code can be used to run commands on the executors and get the output:

var res = sc.runOnEachExecutor[String]({ () =>
  import sys.process._
  var cmd_Result = Seq("bash", "-c", "h...
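The Scala snippet above relies on a Databricks-internal helper; the per-executor piece is ultimately just a shell invocation. A minimal Python sketch of that piece, as a hedged illustration (in PySpark you would call a function like this from inside rdd.mapPartitions so it runs on the executors rather than the driver):

```python
import subprocess

def run_command(cmd):
    """Run a shell command and return its stdout. On Databricks this
    would be invoked inside a function mapped over RDD partitions so
    that it executes on each executor, not on the driver."""
    completed = subprocess.run(
        ["bash", "-c", cmd], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()

output = run_command("echo hello-from-executor")
```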

User16826990884
by Databricks Employee
  • 19799 Views
  • 1 reply
  • 1 kudos

Resolved! Views vs Materialized Delta Tables

Is there general guidance around using views vs creating Delta tables? For example, I need to do some filtering and make small tweaks to a few columns for use in another application. Is there a downside of using a view here?

Latest Reply
User16826990884
Databricks Employee

Views won't duplicate the data so if you are just filtering columns or rows or making small tweaks then views might be a good option. Unless, of course, the filtering is really expensive or you are doing a lot of calculations, then materialize the vi...
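The tradeoff can be seen in miniature with sqlite3, standing in here for Spark SQL purely as an illustration: a view re-runs its defining query on every read, while CREATE TABLE AS materializes the rows once and then goes stale as the source changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "click"), (2, "view"), (3, "click")])

# A view: no data is copied; the filter runs on every read.
conn.execute(
    "CREATE VIEW clicks_v AS SELECT id FROM events WHERE kind = 'click'")

# A materialized copy: the filter runs once; the rows are duplicated.
conn.execute(
    "CREATE TABLE clicks_t AS SELECT id FROM events WHERE kind = 'click'")

# New source rows show up in the view but not in the materialized table.
conn.execute("INSERT INTO events VALUES (4, 'click')")
view_count = conn.execute("SELECT COUNT(*) FROM clicks_v").fetchone()[0]
table_count = conn.execute("SELECT COUNT(*) FROM clicks_t").fetchone()[0]
```

The view stays current (3 clicks) while the materialized table still holds the 2 clicks that existed when it was created, which is the staleness cost you trade for avoiding repeated computation.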

