Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

RiyazAli
by Valued Contributor
  • 6566 Views
  • 3 replies
  • 3 kudos

Is there a way to concatenate two DataFrames along either axis (row/column) and transpose the DataFrame in PySpark?

I'm reshaping my dataframe as per a requirement, and I came across this situation where I'm concatenating 2 dataframes and then transposing them. I've done this previously using pandas; the pandas syntax goes as below: import pandas as pd df1 = ...

Latest Reply
RiyazAli
Valued Contributor
  • 3 kudos

Hi @Kaniz Fatma​, I no longer see the answer you've posted, but I see you were suggesting `union`. As per my understanding, `union` is used to stack DataFrames one upon another when they share a similar schema / column names. In my situation, I have 2 different...

2 More Replies
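The thread above contrasts the pandas idiom with PySpark. A minimal sketch of the pandas pattern the poster describes, with comments noting commonly used PySpark analogues (the frame contents and names here are hypothetical stand-ins for the poster's df1/df2):

```python
import pandas as pd

# Hypothetical small frames standing in for the poster's df1 / df2.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"c": [5, 6], "d": [7, 8]})

# Row-wise concat (pandas axis=0). PySpark analogue (Spark 3.1+):
#   df1.unionByName(df2, allowMissingColumns=True)
rows = pd.concat([df1, df2], axis=0)

# Column-wise concat (pandas axis=1). PySpark has no direct equivalent;
# a common workaround is to add a row index to each DataFrame
# (monotonically_increasing_id() or row_number()) and join on it.
cols = pd.concat([df1, df2], axis=1)

# Transpose. In PySpark this is usually done by collecting a *small*
# DataFrame to the driver first: spark_df.toPandas().T
transposed = cols.T
```

The column-wise join-on-index workaround only preserves ordering reliably when the index is generated deterministically (e.g. `row_number()` over an explicit ordering), which is worth keeping in mind at scale.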
PawanShukla
by New Contributor III
  • 1199 Views
  • 1 reply
  • 0 kudos

Workflow pipeline in Azure Databricks is throwing the error "EventHubsSourceProvider could not be instantiated"

I am using the sample code available in the getting started tutorial. It simply reads a JSON file and moves the data into another table, but it is throwing an error related to EventHubsSourceProvider.

Maverick1
by Valued Contributor II
  • 4835 Views
  • 3 replies
  • 6 kudos

Is there any way to overwrite a partition in a Delta table without specifying each and every partition in replaceWhere? For non-dated partitions, this is really a mess with Delta tables.

Is there any way to overwrite a partition in a Delta table without specifying each and every partition in replaceWhere? For non-dated partitions, this is really a mess with Delta tables. Most of my DE teams don't want to adopt Delta because of these gl...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Saurabh Verma​, following up: did you get a chance to check @Hubert Dudek​'s previous comments?

2 More Replies
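One commonly used answer to the question above is dynamic partition overwrite mode, which replaces only the partitions that receive new data, with no replaceWhere predicate spelled out. A sketch, assuming a SparkSession `spark`, a partitioned Delta table, and a Delta Lake version that supports the option (roughly Delta 2.0 / DBR 11.1+); the table name is hypothetical:

```python
# Session-wide setting: overwrites touch only partitions present in `df`.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

(df.write
   .format("delta")
   .mode("overwrite")
   .option("partitionOverwriteMode", "dynamic")  # per-write override
   .saveAsTable("my_table"))  # hypothetical table name
```

This is a config fragment that needs a live Spark + Delta environment to run; it is not a substitute for replaceWhere when you need to replace partitions that receive *no* new rows.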
Anonymous
by Not applicable
  • 1651 Views
  • 1 reply
  • 1 kudos

Query silently failed

Hello all, I'm using the older 6.4 runtime and noticed that a query returned no result, whereas the same query on 10.4 provided the expected result. This is bad, because I got no error, simply no result at all. Are there some Spark settings on the clus...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Alessio Palma​, following up: did you get a chance to check @Kaniz Fatma​'s previous comments?

Jack
by New Contributor II
  • 4341 Views
  • 1 reply
  • 1 kudos

Append an empty dataframe to a list of dataframes using for loop in python

I have the following 3 dataframes. I want to append df_forecast to each of df2_CA and df2_USA using a for-loop. However, when I run my code, df_forecast is not appended: df2_CA and df2_USA appear exactly as shown above. Here’s the code: df_list=[df2_CA,...

Latest Reply
User16764241763
Honored Contributor
  • 1 kudos

@Jack Homareau​ Can you try the union functionality with DataFrames? https://sparkbyexamples.com/pyspark/pyspark-union-and-unionall/ And then try to fill the NaNs with the desired values?

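The symptom in this thread (the loop runs but the list elements are unchanged) is often caused by rebinding the loop variable rather than assigning back into the list. A minimal pure-Python illustration of the pitfall, using integers as hypothetical stand-ins for the DataFrames:

```python
df_list = [10, 20]  # stand-ins for df2_CA and df2_USA

# Buggy pattern: `item` is rebound inside the loop; df_list is untouched.
for item in df_list:
    item = item + 1
assert df_list == [10, 20]  # nothing changed

# Working pattern: assign back by index. With real DataFrames this would be
#   df_list[i] = df_list[i].unionByName(df_forecast, allowMissingColumns=True)
for i in range(len(df_list)):
    df_list[i] = df_list[i] + 1
assert df_list == [11, 21]
```

Note that even a correct union leaves `df2_CA` and `df2_USA` (the original names) unchanged; only the list entries are rebound.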
VM
by Contributor
  • 3866 Views
  • 4 replies
  • 2 kudos

Error using Synapse ML: JavaPackage object is not callable

I am using DBR version 10.1 and want to use the Synapse ML package. I am able to install and import it by following the instructions at this link: https://github.com/microsoft/SynapseML. However, when I try to run the code, it gives me the error shown in the att...

Latest Reply
User16764241763
Honored Contributor
  • 2 kudos

Hello @Vikram Mahawal​, clusters need to be in the running state to install/uninstall libraries. Could you please start the cluster and try installing it? If you are still stuck, please file a support case with us so we can take a look. Thanks

3 More Replies
Vadim1
by New Contributor III
  • 2438 Views
  • 3 replies
  • 1 kudos

Resolved! Connect from Databricks to Hbase HDinsight cluster.

Hi, I have a Databricks installation in Azure. I want to run a job that connects to HBase in a separate HDInsight cluster. What I tried: created a peering between the HBase cluster and Databricks vNets. I can ping the IPs of the HBase ZooKeeper nodes but I cannot acce...

Latest Reply
User16764241763
Honored Contributor
  • 1 kudos

Vadim, Thank you for the response. Appreciate it.

2 More Replies
lizou
by Contributor II
  • 1390 Views
  • 2 replies
  • 2 kudos

Merge into and data loss

I have a Delta table with 20M rows. The table is updated dozens of times per day using MERGE INTO, and the merge worked fine for a year. But recently I began to notice that some data is deleted by the MERGE INTO without a DELETE clause specified. Mer...

Latest Reply
lizou
Contributor II
  • 2 kudos

I can't reproduce the issue anymore. For now, I am going to limit the number of MERGE INTO commands, since intermediate data transformations do not need versioning history. I am going to try to use combined views for each step and do a one-time merge i...

1 More Reply
shan_chandra
by Databricks Employee
  • 4843 Views
  • 1 reply
  • 1 kudos

Resolved! Insert query fails with error "The query is not executed because it tries to launch ***** tasks in a single stage, while the maximum allowed tasks one query can launch is 100000"

Py4JJavaError: An error occurred while calling o236.sql. : org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:201) at org.apache.spark.sql.execution.datasources.I...

Latest Reply
shan_chandra
Databricks Employee
  • 1 kudos

Could you please increase the below config (at the cluster level) to a higher value, or set it to zero: spark.databricks.queryWatchdog.maxQueryTasks 0. Setting this Spark config alleviates the issue.

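The config from the reply above, as it would look set at runtime in a notebook (it can equally go in the cluster's Spark config; this is a config fragment that needs a live Spark session to run):

```python
# Raise or disable the Query Watchdog task-count limit.
# "0" disables the check entirely; a large positive value raises the ceiling.
spark.conf.set("spark.databricks.queryWatchdog.maxQueryTasks", "0")
```

Disabling the watchdog removes a guardrail, so an alternative worth considering is reducing the task count itself, e.g. by coalescing or repartitioning the input before the insert.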
PradeepRavi
by New Contributor III
  • 33297 Views
  • 6 replies
  • 10 kudos

How do I prevent _success and _committed files in my write output?

Is there a way to prevent the _SUCCESS and _committed files in my output? It's a tedious task to navigate to all the partitions and delete the files. Note: the final output is stored in Azure ADLS.

Latest Reply
shan_chandra
Databricks Employee
  • 10 kudos

Please find the below steps to remove the _SUCCESS, _committed, and _started files: spark.conf.set("spark.databricks.io.directoryCommit.createSuccessFile","false") to remove the success file; run the VACUUM command multiple times until the _committed and _started files...

5 More Replies
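The settings from the reply above, collected as one runtime sketch (a config fragment for a live Databricks Spark session; the commit-protocol switch is the commonly cited way to stop new _committed/_started markers from being written):

```python
# Stop writing the _SUCCESS marker file.
spark.conf.set("spark.databricks.io.directoryCommit.createSuccessFile", "false")

# Use the vanilla Spark commit protocol so _committed/_started markers
# are not produced for new writes (existing markers still need cleanup,
# e.g. via VACUUM as described above).
spark.conf.set(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
)

# Suppress Parquet _metadata / _common_metadata summary files.
spark.conf.set("parquet.enable.summary-metadata", "false")
```

Note that switching the commit protocol trades away the transactional-write guarantees the default Databricks protocol provides, so it is best scoped to the jobs that need clean output directories.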
auser85
by New Contributor III
  • 2076 Views
  • 3 replies
  • 1 kudos

dbutils.notebook.run() fails with job aborted but running the notebook individually works

I have a notebook that runs many notebooks in order, along the lines of: ```%python notebook_list = ['Notebook1', 'Notebook2'] for notebook in notebook_list: print(f"Now on Notebook: {notebook}") try: dbutils.notebook.run(f'{notebook}', 3600) e...

Latest Reply
auser85
New Contributor III
  • 1 kudos

I found the problem. Even if a notebook creates and specifies a widget fully, the notebook run process (e.g., dbutils.notebook.run('notebook')) will not know how to use it. If I replace my widget with a non-widget-provided value, the process works fine...

2 More Replies
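The finding above matches how dbutils.notebook.run works: the child notebook runs in a fresh context and does not inherit the parent's widget state, but values can be passed explicitly as arguments. A sketch (Databricks-only API, so it will not run outside a workspace; "run_date" is a hypothetical parameter name):

```python
# Pass values explicitly instead of relying on widgets set in the parent.
# Inside each child notebook, read the value with:
#   dbutils.widgets.get("run_date")
notebook_list = ["Notebook1", "Notebook2"]
for notebook in notebook_list:
    print(f"Now on Notebook: {notebook}")
    dbutils.notebook.run(notebook, 3600, {"run_date": "2022-01-01"})
```

The third argument is a dict of string arguments that populate the child's widgets of the same name, which sidesteps the widget-inheritance issue entirely.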
jwilliam
by Contributor
  • 4317 Views
  • 4 replies
  • 2 kudos

Resolved! How to view the SQL Query History of traditional Databricks cluster (not Databricks SQL)?

I tried using the Spark cluster UI, but the queries are truncated.

Latest Reply
walkermaster12
New Contributor II
  • 2 kudos

In Apache Spark prior to 2.1, once a SQL query was run, there was no way to re-run it; all history was lost. Spark SQL introduced the "replay" functionality in Spark 2.1.0, enabling users to re-run any query they have already run. You can run a query...

3 More Replies
Phani1
by Valued Contributor II
  • 3171 Views
  • 2 replies
  • 3 kudos

Resolved! Terminated with exception: Could not initialize class org.rocksdb.Options

Problem statement: when running Delta Live Tables, it gives the error. Error message: Could not initialize class org.rocksdb.Options. org.apache.spark.sql.streaming.StreamingQueryException: Query cpicpg_us_tgt_amz_bronze [id = a42eec82-0ee8-41b4-9...

Latest Reply
Phani1
Valued Contributor II
  • 3 kudos

Hi team, thanks for your response. I faced this issue while executing the Delta Live Tables pipeline. Initially I chose the Core product edition and attached 4 notebooks to the pipeline, each notebook creating Bronze and Silver tables. Duri...

1 More Replies
Phani1
by Valued Contributor II
  • 5207 Views
  • 1 reply
  • 0 kudos

Execute tasks parallel to process multiple files parallel

Hi all, if we have multiple tasks under a job, how do we invoke a specific task under the job? Do we have any API to invoke a job's specific tasks instead of the whole job? Use case: when we receive multiple messages from the Event Hub, each underlying task in ...

Latest Reply
Phani1
Valued Contributor II
  • 0 kudos

Thanks for your response. My question is: if we have multiple tasks in a job, how can we invoke a specific task? I can see an API to invoke the job but not a particular task in it. Kindly find the attachment for your reference.

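As noted in the thread, run-now triggers the whole job. One related capability worth knowing: the Jobs 2.1 repair endpoint can re-run only selected tasks of an existing run via rerun_tasks. A hedged sketch using requests (it needs a live workspace, so the host, token, run_id, and task key below are all placeholders):

```python
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                 # placeholder

# Re-run only the named task(s) of an existing job run; other tasks
# in the run are left untouched.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"run_id": 12345, "rerun_tasks": ["process_eventhub_msgs"]},  # hypothetical
)
resp.raise_for_status()
```

This repairs an existing run rather than starting a fresh one; for triggering independent units of work per Event Hub message, splitting the tasks into separate jobs is the more direct design.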

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group