Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 1655 Views
  • 1 reply
  • 1 kudos

Resolved! How long does the automatic notebook Revision History store the changes?

I am wondering how far back I can restore old versions of my notebook.

Latest Reply
User16137833804
New Contributor III

I believe it stores revisions from the time the notebook was created, assuming the revision history doesn't get cleared.

  • 1 kudos
User16826989884
by New Contributor
  • 1165 Views
  • 1 reply
  • 0 kudos

Chargeback in Azure Databricks

What is the best way to monitor consumption and cost in Azure Databricks? The ultimate goal is to allocate consumption by team/workspace.

Latest Reply
Ryan_Chynoweth
Honored Contributor III

If your goal is to charge back other teams or business units based on consumption, then you should enforce tags on all clusters/compute. These tags will show up on your Azure bill, which lets you identify which groups used which resources.

  • 0 kudos
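To make the tagging advice above concrete, here is a minimal sketch of creating a cluster with custom tags through the Clusters API. The workspace URL, token, node type, and the tag keys (team, cost_center) are illustrative placeholders, not values from this thread:

```python
import requests

# Placeholders -- substitute your own workspace URL and access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "etl-data-engineering",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    # custom_tags propagate to the underlying Azure VMs and disks, so they
    # appear on the Azure bill and can be used to group cost by team.
    "custom_tags": {
        "team": "data-engineering",   # illustrative tag keys/values
        "cost_center": "CC-1234",
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```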
sajith_appukutt
by Honored Contributor II
  • 1786 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III

If you are using pools, then you should consider keeping a minimum idle count of machines greater than 2. This will allow you to have machines available and ready to use. If you have 0 idle machines, then the first job executed against the pool will hav...

  • 0 kudos
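As a rough illustration of the advice above, this is what a pool definition with a minimum idle count above 2 could look like via the Instance Pools API; the pool name, node type, and autotermination value are assumptions for the sake of the example:

```python
import requests

# Placeholders -- substitute your own workspace URL and access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Pool spec that keeps a few warm instances so the first jobs of the night
# don't have to wait for VMs to spin up.
pool_spec = {
    "instance_pool_name": "warm-etl-pool",      # illustrative name
    "node_type_id": "Standard_DS3_v2",          # pick one for your cloud
    "min_idle_instances": 3,                    # > 2, per the advice above
    "idle_instance_autotermination_minutes": 30,
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pool_spec,
)
resp.raise_for_status()
print(resp.json()["instance_pool_id"])
```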
User16826992783
by New Contributor II
  • 1189 Views
  • 1 reply
  • 1 kudos

Receiving a "Databricks Delta is not enabled on your account" error

The team is using Databricks Light for some pipeline development and would like to leverage Delta, but they are running into this error: "Databricks Delta is not enabled on your account". How can we enable Delta for our account?

Latest Reply
craig_ng
New Contributor III

Databricks Light is the open source Apache Spark runtime and does not come with any type of client for Delta Lake pre-installed. You'll need to manually install open source Delta Lake in order to do any reads or writes. See our docs and release notes ...

  • 1 kudos
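As a hedged sketch of what manually installing open source Delta Lake can look like from PySpark: the package coordinates below assume Spark 3.x with Scala 2.12, and the version must be matched to whatever Spark version your cluster actually runs (older Spark 2.4 clusters need the 0.6.x line and slightly different configs):

```python
from pyspark.sql import SparkSession

# The delta-core artifact version must match your Spark/Scala version;
# the coordinates below are illustrative for Spark 3.x / Scala 2.12.
spark = (
    SparkSession.builder.appName("oss-delta-example")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# With the package on the classpath, Delta reads and writes work as usual.
spark.range(10).write.format("delta").mode("overwrite").save("/tmp/demo_delta")
print(spark.read.format("delta").load("/tmp/demo_delta").count())
```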
Anonymous
by Not applicable
  • 1233 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III

Delta Lake uses optimistic concurrency control to provide transactional guarantees between writes. Under this mechanism, writes operate in three stages: Read: Reads (if needed) the latest available version of the table to identify which files need to ...

  • 0 kudos
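To illustrate how a conflict from that validate/commit stage surfaces to a writer, here is a small retry sketch using the open-source delta-spark Python package; the table path, the back-off policy, and the choice of catching only ConcurrentAppendException are assumptions for the example:

```python
import time

from delta.exceptions import ConcurrentAppendException  # open-source delta-spark


def append_with_retry(df, path, max_attempts=3):
    """Append to a Delta table, retrying when a concurrent writer wins the
    commit and the validation stage raises a conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            df.write.format("delta").mode("append").save(path)
            return
        except ConcurrentAppendException:
            if attempt == max_attempts:
                raise
            # Back off, then retry; the next write re-reads the latest
            # table version before attempting to commit again.
            time.sleep(2 ** attempt)
```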
Anonymous
by Not applicable
  • 846 Views
  • 1 reply
  • 0 kudos

Resolved! Is it possible to have time travel capability but also be able to selectively vacuum?

I would like to have time travel functionality for several months, but that ends up adding to storage costs. Is there some way to have a mix of vacuum and time travel?

Latest Reply
Ryan_Chynoweth
Honored Contributor III

There is not a way to time travel past the vacuum retention period. If you would like to time travel back, let's say 3 months, then you are not able to vacuum a shorter time frame.

  • 0 kudos
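One hedged way to balance the two is to set the table's retention properties to the time-travel window you actually need and let VACUUM clean up anything older. The table name and the 90-day window below are illustrative, and the snippet assumes a notebook or script where `spark` is already available:

```python
# Keep removed data files and commit history for roughly 90 days so time
# travel works across that whole window.
spark.sql("""
  ALTER TABLE my_db.my_table SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 90 days',
    'delta.logRetentionDuration'         = 'interval 90 days'
  )
""")

# VACUUM then only removes unreferenced files older than the retention
# window (90 days * 24 hours = 2160 hours).
spark.sql("VACUUM my_db.my_table RETAIN 2160 HOURS")
```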
User16752241457
by New Contributor II
  • 813 Views
  • 1 reply
  • 0 kudos

Overwriting Delta Table Using SQL

I have a delta table that is updated nightly and that I drop and recreate at the start of each day. However, this isn't ideal because every time I drop the table I lose all the info in the transaction log. Is there a way that I can do the equivalent of:...

Latest Reply
Ryan_Chynoweth
Honored Contributor III

I think you are looking for the INSERT OVERWRITE command in Spark SQL. Check out the documentation here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-dml-insert-overwrite-table.html

  • 0 kudos
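For example, a minimal sketch of the pattern with hypothetical table names; both forms replace the table's contents while keeping the table itself (and its Delta transaction log/history) in place. The snippet assumes `spark` is already available:

```python
# SQL form: replace the table's rows for the new day without dropping it.
spark.sql("""
  INSERT OVERWRITE nightly_db.sales_snapshot
  SELECT * FROM nightly_db.sales_staging
""")

# Roughly equivalent DataFrame form:
(
    spark.table("nightly_db.sales_staging")
    .write.mode("overwrite")
    .saveAsTable("nightly_db.sales_snapshot")
)
```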
User16826992666
by Valued Contributor
  • 1002 Views
  • 2 replies
  • 0 kudos

Can I query tables I have created in my Databricks workspace using Tableau?

I have created Delta tables in my Databricks workspace and would like to access them from Tableau. Is this possible?

Latest Reply
sajith_appukutt
Honored Contributor II

Yeah - here is the link with details on how to integrate with Tableau's different products: https://docs.databricks.com/integrations/bi/tableau.html

  • 0 kudos
1 More Reply
User16826987838
by Contributor
  • 1097 Views
  • 1 reply
  • 0 kudos

Is there a way to change the default cluster setting after a notebook has been created?

When you create a notebook, you are prompted to specify a default cluster that it will connect to. Is there a way to change that setting after the notebook is created?

Latest Reply
Mooune_DBU
Valued Contributor

Yes, of course. Notebooks are not exclusively tied to a specific cluster, so you can pick any available/visible cluster to attach the notebook to when you want to run it. Also please keep in mind that by doing this half-way through executing a notebook,...

  • 0 kudos
User16826992783
by New Contributor II
  • 1709 Views
  • 1 reply
  • 0 kudos

Find Databricks SQL endpoints runtime

Is there a way to find out which runtime the SQL endpoints are running on?

Latest Reply
Ryan_Chynoweth
Honored Contributor III

In the UI, Databricks will list the running endpoints on top. Programmatically, you can get information about the endpoints using the REST APIs. You will likely need to use a combo of the list endpoint to get all the endpoints. Then for each endpoint u...

  • 0 kudos
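A rough sketch of that list-then-describe pattern with the REST API; the workspace URL and token are placeholders, and note that newer workspaces expose the same information under /api/2.0/sql/warehouses instead of /api/2.0/sql/endpoints:

```python
import requests

# Placeholders -- substitute your own workspace URL and access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# List every SQL endpoint in the workspace...
endpoints = requests.get(
    f"{WORKSPACE_URL}/api/2.0/sql/endpoints", headers=headers
).json().get("endpoints", [])

# ...then fetch each one individually for its full details
# (state, size, channel, and so on).
for ep in endpoints:
    detail = requests.get(
        f"{WORKSPACE_URL}/api/2.0/sql/endpoints/{ep['id']}", headers=headers
    ).json()
    print(detail.get("name"), detail.get("state"), detail.get("channel"))
```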
Kaniz_Fatma
by Community Manager
  • 5386 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III

Repartition triggers a full shuffle of data and distributes the data evenly over the number of partitions, and it can be used to increase or decrease the partition count. Coalesce is typically used for reducing the number of partitions and does not requ...

  • 0 kudos
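A quick illustration of the difference in PySpark (assumes `spark` is already available; the partition counts are arbitrary):

```python
df = spark.range(1_000_000)

# repartition(n) triggers a full shuffle and spreads rows evenly;
# n can be higher or lower than the current partition count.
evenly_spread = df.repartition(16)
print(evenly_spread.rdd.getNumPartitions())   # 16

# coalesce(n) merges existing partitions without a full shuffle,
# so it is only useful for reducing the partition count.
fewer = evenly_spread.coalesce(4)
print(fewer.rdd.getNumPartitions())           # 4
```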
craig_ng
by New Contributor III
  • 939 Views
  • 1 reply
  • 1 kudos
Latest Reply
craig_ng
New Contributor III

Yes, you can use the SCIM API integration to provision both users and groups. We have examples for Okta, Azure AD and OneLogin, but any SCIM-enabled IdP should suffice.

  • 1 kudos
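As a hedged sketch of what those SCIM calls look like when made directly (in practice the IdP issues them on your behalf once the integration is configured); the workspace URL, token, user name, and group name are placeholders:

```python
import requests

# Placeholders -- substitute your own workspace URL and admin token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<admin-personal-access-token>"
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/scim+json",
}

# Provision a user...
user = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "new.user@example.com",
}
requests.post(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Users", headers=headers, json=user
).raise_for_status()

# ...and a group. Okta, Azure AD, OneLogin, or any SCIM-enabled IdP issues
# the same calls automatically.
group = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "displayName": "data-engineers",
}
requests.post(
    f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Groups", headers=headers, json=group
).raise_for_status()
```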
sajith_appukutt
by Honored Contributor II
  • 1193 Views
  • 1 reply
  • 0 kudos

Resolved! Can I schedule Databricks pools to have different minimum idle instance counts at different times of the day

I have a few jobs configured to run against a pool at 10 PM every night. After running some tests, I found that increasing minimum idle instance counts improves the job latencies. However, it wouldn't be needed to have so many VMs idle at other times...

Latest Reply
Ryan_Chynoweth
Honored Contributor III

Yes, you can do so programmatically using the REST APIs. You can edit the settings of a Databricks pool by using the Instance Pool Edit endpoint and providing the min idle count that you desire. This cannot be done via the web UI.

  • 0 kudos
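A minimal sketch of driving that from a scheduled script; the workspace URL, token, and pool id are placeholders, and the Instance Pool Edit endpoint also expects the pool's name and node type, which are read back from its current definition:

```python
import requests

# Placeholders -- substitute your own workspace URL and access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def set_pool_min_idle(pool_id: str, min_idle: int) -> None:
    """Change a pool's minimum idle instance count via the Instance Pools API."""
    # The edit endpoint also requires the pool name and node type, so read
    # the pool's current definition first.
    current = requests.get(
        f"{WORKSPACE_URL}/api/2.0/instance-pools/get",
        headers=HEADERS,
        params={"instance_pool_id": pool_id},
    ).json()
    requests.post(
        f"{WORKSPACE_URL}/api/2.0/instance-pools/edit",
        headers=HEADERS,
        json={
            "instance_pool_id": pool_id,
            "instance_pool_name": current["instance_pool_name"],
            "node_type_id": current["node_type_id"],
            "min_idle_instances": min_idle,
        },
    ).raise_for_status()


# From one scheduled run (e.g. shortly before the 10 PM batch window):
set_pool_min_idle("0101-120000-pool-abcdef", 10)  # hypothetical pool id
# A second scheduled run after the window would set it back down, e.g. to 1.
```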
User16826992666
by Valued Contributor
  • 1277 Views
  • 1 reply
  • 0 kudos

Resolved! When should I turn on multi-cluster load balancing on SQL Endpoints?

I see the option to enable multi-cluster load balancing when creating a SQL Endpoint, but I don't know if I should be using it or not. How do I know when I should enable it?

Latest Reply
Ryan_Chynoweth
Honored Contributor III

It is best to enable multi-cluster load balancing on SQL endpoints when a lot of users will be running queries concurrently. Load balancing will help isolate the queries and ensure the best performance for all users. If you only have a few users runnin...

  • 0 kudos
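For reference, multi-cluster load balancing corresponds to the endpoint's minimum and maximum cluster counts, which can also be set through the (legacy) SQL Endpoints API; the workspace URL, token, endpoint id, and the 1-to-4 range below are illustrative assumptions, and newer workspaces expose the same setting via /api/2.0/sql/warehouses:

```python
import requests

# Placeholders -- substitute your own workspace URL, token, and endpoint id.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
ENDPOINT_ID = "<sql-endpoint-id>"

# Under high concurrency the endpoint adds clusters (up to the max) and
# spreads queries across them; with few users a 1/1 range avoids paying
# for idle clusters.
requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/endpoints/{ENDPOINT_ID}/edit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "min_num_clusters": 1,
        "max_num_clusters": 4,   # illustrative upper bound
    },
).raise_for_status()
```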