Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

brickster_2018
by Esteemed Contributor
  • 3157 Views
  • 1 replies
  • 0 kudos

Resolved! Best ways to copy the parquet files in the staging directory to the Delta table

I have some Parquet data in a temporary directory. Can I copy it into the Delta table directly? What are the best options?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The easiest solution is to use the COPY INTO command. COPY INTO is idempotent, so even if the operation fails there are no data inconsistencies. COPY INTO uses the resources of the Spark cluster and hence completes faster. h...
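As a minimal sketch of the COPY INTO approach described in the reply, the helper below only builds the SQL text; the function name and both paths are hypothetical and not taken from the thread.

```python
def build_copy_into(target_path: str, source_path: str, file_format: str = "PARQUET") -> str:
    """Build a COPY INTO statement that loads files under source_path
    into the Delta table stored at target_path."""
    return (
        f"COPY INTO delta.`{target_path}`\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {file_format}"
    )


# Hypothetical paths, for illustration only.
statement = build_copy_into("/mnt/delta/events", "/mnt/staging/events")
print(statement)

# In a Databricks notebook, where `spark` is predefined, the statement
# could then be submitted with: spark.sql(statement)
```

Because COPY INTO is idempotent, submitting the same statement again after a failure should not load the same staged files twice.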

brickster_2018
by Esteemed Contributor
  • 1379 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to download the files from the notebook UI

I used to download SQL query output from the notebook UI, but now I am unable to download files.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

This is a workspace-level configuration. Your workspace admin has probably disabled it. If you have admin privileges on your workspace, you can enable it from Admin Console -> Workspace Settings.

User16826994223
by Honored Contributor III
  • 1004 Views
  • 1 replies
  • 0 kudos

MSCK REPAIR TABLE doesn't work in Delta

I have a Delta table in ADLS and, for the same table, I have defined an external table in Hive. After creating the Hive table and generating manifests, I am loading the partitions using MSCK REPAIR TABLE. All the partition columns are in same But s...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Can you please check the partition column order? Is it in the same sequence as before, or has it changed?

brickster_2018
by Esteemed Contributor
  • 2135 Views
  • 1 replies
  • 0 kudos

Resolved! I can't find my cluster

I had a cluster that I used in the past. I do not see the cluster any longer. I checked with the admin and my team, and everyone confirmed that no user deleted it.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

If the cluster is unused for 30 days, Databricks removes the cluster. This is a general clean-up policy. It's possible to exempt a cluster from this clean-up by pinning it. https://docs.databricks.com/clusters/clusters-manage.html#pin-a-c...
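Pinning can also be scripted. The sketch below assumes the Clusters REST API 2.0 `clusters/pin` endpoint and a personal access token; the workspace URL, token, and cluster ID are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_pin_request(host: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks the workspace to pin a cluster."""
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/pin",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_pin_request(
    "https://example.cloud.databricks.com",
    "<personal-access-token>",
    "1234-567890-abcde123",
)
print(request.get_method(), request.full_url)

# To actually send it (pinning typically requires workspace admin rights):
# urllib.request.urlopen(request)
```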

User16826994223
by Honored Contributor III
  • 1310 Views
  • 1 replies
  • 0 kudos

Resolved! Delta adds a new partition making the old partition unreadable

In a notebook, my code reads and writes data to Delta. My Delta table is partitioned by calendar_date. After the initial load I am able to read the Delta table and view the data just fine. But after the second load, for 6 months of data, the previous part...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

I think you are writing the data in overwrite mode. For versioning, Delta does not delete the old data for a certain number of days even when it is written in overwrite mode, and you will be able to query only the most recent data. But in format parque...

brickster_2018
by Esteemed Contributor
  • 618 Views
  • 1 replies
  • 0 kudos

Resolved! Additional permission for Delta compared to Parquet

I am trying to create a Delta table, and it seems the Delta table requires additional permissions on the parent folder of the table. The command failed with permission errors. I tried to create a Parquet table and it works fine.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

A Delta table uses a non-Hive-compatible format, so the client must also have permission to access the path to the database’s location so that it can create a new temporary “directory” there. This comes from Spark SQL’s handling of external tabl...

brickster_2018
by Esteemed Contributor
  • 1195 Views
  • 1 replies
  • 1 kudos

Resolved! How many active connections are made to Hive metastore

We are using an internal metastore implementation, i.e., the metastore is hosted on the Databricks side. However, we believe the metastore instance made available for my workspace is not adequate to handle the load. How can I monitor the number of...

Latest Reply
brickster_2018
Esteemed Contributor
  • 1 kudos

Use the below code snippet from a notebook:
%scala
import java.sql.Connection
import java.sql.DriverManager
import java.sql.ResultSet
import java.sql.SQLException

/**
 * For details on what this query means, checkout https://dev.mysql.com/doc/refma...

User16826992666
by Valued Contributor
  • 1002 Views
  • 1 replies
  • 1 kudos

Resolved! Does Databricks integrate with Immuta?

My company uses Immuta for data governance. Will Databricks be able to fit into our existing security patterns?

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 1 kudos

Yes, check out the Immuta web page on the Databricks integration: https://www.immuta.com/integrations/databricks

brickster_2018
by Esteemed Contributor
  • 2271 Views
  • 1 replies
  • 0 kudos

Resolved! Compute cost for the SQL Analytics query

Is there a way to get the compute cost associated with each SQL Analytics query?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Right now, we do not have an option to measure the compute cost at a query level.

User16765131552
by Contributor III
  • 304 Views
  • 0 replies
  • 0 kudos

docs.databricks.com

Best practices: Cluster configuration | Databricks on AWS
Learn best practices when creating and configuring Databricks clusters.
https://docs.databricks.com/clusters/cluster-config-best-practices.html

User16765131552
by Contributor III
  • 311 Views
  • 0 replies
  • 0 kudos

docs.gcp.databricks.com

Best practices | Databricks on Google Cloud
Learn best practices when using or administering Databricks.
https://docs.gcp.databricks.com/best-practices-index.html

User16765131552
by Contributor III
  • 290 Views
  • 0 replies
  • 0 kudos

docs.microsoft.com

Best practices - Azure Databricks
Learn best practices when using or administering Azure Databricks.
https://docs.microsoft.com/en-us/azure/databricks/best-practices-index

User16765131552
by Contributor III
  • 354 Views
  • 0 replies
  • 0 kudos

docs.databricks.com

Best practices | Databricks on AWS
Learn best practices when using or administering Databricks.
https://docs.databricks.com/best-practices-index.html

User16826994223
by Honored Contributor III
  • 2212 Views
  • 1 replies
  • 0 kudos

Resolved! How to prevent Delta Lake checkpoints from being removed in Databricks?

I am seeing that, with new commits, the old checkpoints are getting removed and I can time travel only to the last 10 versions. Is there any way I can prevent this so that Delta checkpoints are not removed? I'm using Azure Databricks 7.3 LTS ML.

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

If you want to keep your checkpoints for X days, you can set delta.checkpointRetentionDuration to X days this way:
spark.sql(f""" ALTER TABLE delta.`path` SET TBLPROPERTIES ( delta.checkpointRetentionDuration = 'X days'...
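The truncated snippet above is left as posted. As a separate, minimal sketch of the same idea, the helper below builds an ALTER TABLE statement that sets delta.checkpointRetentionDuration; the function name, the table path, and the 30-day value are hypothetical.

```python
def build_checkpoint_retention_sql(table_path: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets the checkpoint retention
    period, in days, on the Delta table stored at table_path."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        f"ALTER TABLE delta.`{table_path}` "
        f"SET TBLPROPERTIES (delta.checkpointRetentionDuration = '{days} days')"
    )


# Hypothetical path and retention period, for illustration only.
retention_sql = build_checkpoint_retention_sql("/mnt/delta/events", 30)
print(retention_sql)

# In a Databricks notebook, where `spark` is predefined:
# spark.sql(retention_sql)
```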

brickster_2018
by Esteemed Contributor
  • 950 Views
  • 1 replies
  • 0 kudos

Resolved! How to track the progress of a VACUUM command.

My VACUUM command is stuck. I am not sure if it's deleting any files.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

There is no direct way to track the progress of the VACUUM command. One easy workaround is to run a DRY RUN from another notebook, which gives an estimate of the files to be deleted at that point in time. This will give a rough estimate of files to b...
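A minimal sketch of the DRY RUN workaround, assuming a path-based Delta table: the helper builds the statement to run from a second notebook. The function name, the table path, and the optional retention value are hypothetical.

```python
from typing import Optional


def build_vacuum_dry_run(table_path: str, retain_hours: Optional[int] = None) -> str:
    """Build a VACUUM ... DRY RUN statement, which lists files that would be
    deleted without deleting anything."""
    statement = f"VACUUM delta.`{table_path}`"
    if retain_hours is not None:
        statement += f" RETAIN {retain_hours} HOURS"
    return statement + " DRY RUN"


# Hypothetical path, for illustration only.
dry_run_sql = build_vacuum_dry_run("/mnt/delta/events")
print(dry_run_sql)

# In a Databricks notebook, where `spark` is predefined:
# spark.sql(dry_run_sql).show(truncate=False)
```

Running the dry run periodically and comparing the results gives a rough sense of whether the stuck VACUUM is still removing files.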
