Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Honored Contributor III
  • 1209 Views
  • 1 reply
  • 0 kudos

How to provide access to users based on their login credentials in Databricks

Hi Team, I am trying to do a security audit, and it has become tough to manage the many credentials and IAM roles we have across different Databricks clusters. Is it possible to simplify this, so that a user who has a certain type of access to an S3 bucket gets the same type of...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

This is a great question. Databricks is continuously working on security management to make the user experience better and simpler. The use case you are trying to solve can be easily addressed using a high concurrency cluster and checkin...
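A minimal sketch of what that could look like, assuming the reply is pointing at IAM credential passthrough on a shared cluster (the workspace URL, token, and cluster settings below are placeholders, not a definitive setup):

```python
import requests

HOST = "https://<workspace-url>"   # hypothetical
TOKEN = "<personal-access-token>"  # hypothetical

# Create a cluster with credential passthrough enabled, so each
# user's S3 access follows their own identity instead of a shared
# instance profile.
resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_name": "shared-passthrough",
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
        "spark_conf": {"spark.databricks.passthrough.enabled": "true"},
    },
)
print(resp.json())  # contains the new cluster_id
```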

User16826994223
by Honored Contributor III
  • 1184 Views
  • 1 reply
  • 1 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Access control: a rich suite of access controls all the way down to the storage layer. Databricks can take advantage of its cloud backbone by utilizing state-of-the-art AWS security services right in the platform. Federate your existing AWS data access ...

User16826994223
by Honored Contributor III
  • 1466 Views
  • 1 reply
  • 0 kudos

Spark is reading data from the source even though I am persisting the data

Hi all, I am reading data and caching it, and then performing a count action to get the data in memory. But in the DAG I still see that it reads the data from the source every time.

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

It looks like the Spark memory is not sufficient to cache all the data, so it always reads from the source.
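A minimal sketch of how to check this, assuming a hypothetical Parquet source; persisting with MEMORY_AND_DISK spills partitions that do not fit in executor memory to local disk instead of recomputing them from the source:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical source path for illustration.
df = spark.read.parquet("/mnt/raw/events")

# Partitions that don't fit in memory spill to local disk, so they
# are not re-read from the source on the next action.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # action that materializes the cache

print(df.storageLevel)  # confirm which storage level is in effect
```

The Storage tab of the Spark UI also shows what fraction of the dataset was actually cached.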

User16826994223
by Honored Contributor III
  • 3580 Views
  • 1 reply
  • 0 kudos

Resolved! Changes in Spark code if I migrate from Spark 2.4 to 3.0

I am thinking of migrating from Spark 2.4 to 3.0. What changes should I know about and take care of while migrating?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

There are many changes to take care of if you have code written for Spark 2.4. The changes are in the Dataset API, statements, built-in UDFs, and functions. You can find more in the Spark documentation: https://spark.apache.org/docs/latest/sql-migrat...
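One concrete example of the kind of behavior change that guide covers: Spark 3.0 switched to a stricter Proleptic Gregorian datetime parser. A short sketch of the legacy escape hatch while you migrate:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Under 3.0's default EXCEPTION policy, invalid dates like Feb 30
# fail instead of parsing silently as they did in 2.4. LEGACY
# restores the 2.4 parsing behavior during migration.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

spark.sql("SELECT to_date('2021-02-30', 'yyyy-MM-dd') AS d").show()
```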

User16826987838
by Contributor
  • 1628 Views
  • 1 reply
  • 0 kudos

What type of AWS instance, and how many, are used for an L-sized Databricks SQL (SQLA) cluster?

What type of AWS instance, and how many, are used for an L-sized Databricks SQL (SQLA) cluster with Photon enabled?

Latest Reply
Taha
Databricks Employee
  • 0 kudos

An L-sized cluster is 16 i3.8xlarge workers.

User16826987838
by Contributor
  • 1991 Views
  • 1 reply
  • 0 kudos

Refreshing external tables

After I vacuum the tables, do I need to update the manifest and Parquet files to refresh my external tables for integrations to work?

Latest Reply
Taha
Databricks Employee
  • 0 kudos

Manifest files need to be re-created when partitions are added or altered. Since a VACUUM only deletes historical versions, you shouldn't need to create an updated manifest file unless you are also running an OPTIMIZE.
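For reference, a short sketch of regenerating the manifest after an OPTIMIZE, assuming a hypothetical table path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical Delta table path. OPTIMIZE rewrites data files, so
# external readers (Presto, Athena, etc.) need a fresh manifest.
spark.sql("OPTIMIZE delta.`/mnt/delta/events`")
spark.sql(
    "GENERATE symlink_format_manifest FOR TABLE delta.`/mnt/delta/events`"
)
```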

brickster_2018
by Databricks Employee
  • 3912 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

G1GC can solve problems in some cases where garbage collection is a bottleneck. Check out https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
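A minimal sketch of enabling G1GC via Spark configuration; on Databricks this belongs in the cluster's Spark config rather than in code, since JVM options must be set before the JVM starts:

```python
from pyspark.sql import SparkSession

# JVM options take effect only at JVM startup, so set them when the
# session/cluster is created, not at runtime.
spark = (
    SparkSession.builder
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
    .config("spark.driver.extraJavaOptions", "-XX:+UseG1GC")
    .getOrCreate()
)
```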

User16790091296
by Contributor II
  • 2330 Views
  • 1 reply
  • 0 kudos
Latest Reply
Taha
Databricks Employee
  • 0 kudos

As of this comment, SQL Analytics still requires a few additional enablement steps. You will need to ask your Databricks account team to help turn this on in your workspace.

brickster_2018
by Databricks Employee
  • 2161 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You could potentially do this through a Global Init Script - https://docs.databricks.com/clusters/init-scripts.html

User16790091296
by Contributor II
  • 4576 Views
  • 3 replies
  • 0 kudos
Latest Reply
Mooune_DBU
Valued Contributor
  • 0 kudos

By doing a `GET` call using the cluster ID:

curl --netrc -X GET \
  https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/get \
  --data '{ "cluster_id": "1234-567890-myclustID" }' \
  | jq .

The response JSON will have a `state` tag which will look...
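The same call as a Python sketch, reusing the placeholder workspace URL and cluster ID from the curl example (the token is hypothetical):

```python
import requests

HOST = "https://dbc-a1b2345c-d6e7.cloud.databricks.com"
TOKEN = "<personal-access-token>"  # hypothetical

resp = requests.get(
    f"{HOST}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"cluster_id": "1234-567890-myclustID"},
)
print(resp.json()["state"])  # e.g. PENDING, RUNNING, TERMINATED
```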

2 More Replies
User16826992666
by Valued Contributor
  • 2834 Views
  • 1 reply
  • 0 kudos

Can I move some partitions of a Delta table to a different location?

I am partitioning my Delta table by date. Older data is rarely accessed, so I am wondering if I can move some of the files off to colder storage options. What would happen if I did this? Is this a supported pattern or would it break the table?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could look at S3 Intelligent-Tiering - https://aws.amazon.com/about-aws/whats-new/2018/11/s3-intelligent-tiering/

brickster_2018
by Databricks Employee
  • 2682 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Deleting the Delta log directory would cause you to lose the underlying transaction history on the Delta table, along with other Delta-related optimizations. In effect, the table would be converted to a Parquet table at that point.
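If that happened and you wanted a Delta table again, the remaining Parquet files could be re-registered; a sketch with a hypothetical path (the old history is not recoverable):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Re-create a _delta_log from the existing Parquet files, starting a
# fresh transaction history. For a partitioned table you would add a
# PARTITIONED BY clause matching its layout.
spark.sql("CONVERT TO DELTA parquet.`/mnt/tables/events`")
```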

User16790091296
by Contributor II
  • 1906 Views
  • 2 replies
  • 0 kudos
Latest Reply
Taha
Databricks Employee
  • 0 kudos

Also, a lot of examples here: https://docs.databricks.com/administration-guide/clusters/policies.html#cluster-policy-examples
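In the spirit of those examples, a minimal sketch of creating a policy through the API (the workspace URL, token, and policy values are placeholders):

```python
import json
import requests

HOST = "https://<workspace-url>"   # hypothetical
TOKEN = "<personal-access-token>"  # hypothetical

# Pin the Spark version and cap autoscaling, similar to the linked
# policy examples.
definition = {
    "spark_version": {"type": "fixed", "value": "10.4.x-scala2.12"},
    "autoscale.max_workers": {"type": "range", "maxValue": 10},
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "small-clusters-only", "definition": json.dumps(definition)},
)
print(resp.json())  # contains the new policy_id
```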

1 More Reply
brickster_2018
by Databricks Employee
  • 4736 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Global: runs on every cluster in the workspace. Global init scripts can help you enforce consistent cluster configurations across your workspace. Use them carefully because they can cause unanticipated impacts, like library conflicts. Only admin users can create ...
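A minimal sketch of adding one through the Global Init Scripts API (the workspace URL, token, and script body are placeholders):

```python
import base64
import requests

HOST = "https://<workspace-url>"   # hypothetical
TOKEN = "<personal-access-token>"  # hypothetical

# A tiny script that runs on every cluster at startup.
script = "#!/bin/bash\necho 'cluster starting' >> /tmp/startup.log\n"

resp = requests.post(
    f"{HOST}/api/2.0/global-init-scripts",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "log-startup",
        "script": base64.b64encode(script.encode()).decode(),
        "enabled": True,
    },
)
print(resp.json())  # contains the new script_id
```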

