Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16765131552
by Contributor III
  • 2721 Views
  • 1 replies
  • 0 kudos

Resolved! Cluster Log Partitioning

Customer wants to understand our strategy for breaking cluster logs into different partitions and files. They want to be able to ingest these logs into a tool that needs to understand this. They have indicated that the logs used to all be in one file...

Latest Reply
User16765131552
Contributor III
  • 0 kudos

Log files are rolled over by time/size criteria.

  • 0 kudos
User16765131552
by Contributor III
  • 881 Views
  • 0 replies
  • 0 kudos

docs.databricks.com

Best practices for Databricks pools (Databricks Documentation). Learn best practices for configuring and using Databricks pools. https://docs.databricks.com/clusters/instance-pools/pool-best-practices.html
Best practices for Azure Databricks pools - Azur...

brickster_2018
by Databricks Employee
  • 2091 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Pre-emption is turned on by default on Databricks clusters. Turning pre-emption on or off makes the most sense on a high concurrency cluster. Pre-emption ensures that a job starving for resources gets a fair share of the resources available on ...

  • 0 kudos
brickster_2018
by Databricks Employee
  • 2594 Views
  • 1 replies
  • 0 kudos

Resolved! How to uninstall libraries that are set to auto-install on all clusters, using the REST API

I have a bunch of libraries that I want to uninstall. All of them are marked as auto-install.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

1) Find the corresponding library definition from an existing cluster using "libraries/cluster-status?cluster_id=<cluster_id>".
$ curl -X GET 'https://cust-success.cloud.databricks.com/api/2.0/libraries/cluster-status?cluster_id=1226-232931-cuffs129' ...
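
As a rough sketch of step 1, the snippet below builds the `libraries/cluster-status` URL and picks out the library definitions flagged as installed on all clusters. The host, cluster ID, and sample response are hypothetical placeholders, and the `is_library_for_all_clusters` field name is an assumption about the response shape; check it against the actual output from your workspace before relying on it.

```python
from urllib.parse import urlencode


def cluster_status_url(host: str, cluster_id: str) -> str:
    """Build the Libraries API 2.0 cluster-status URL for one cluster."""
    query = urlencode({"cluster_id": cluster_id})
    return f"https://{host}/api/2.0/libraries/cluster-status?{query}"


def auto_installed_libraries(status: dict) -> list:
    """Return the library definitions flagged as installed on all clusters."""
    return [
        entry["library"]
        for entry in status.get("library_statuses", [])
        if entry.get("is_library_for_all_clusters", False)
    ]


# Hypothetical response, shaped like the cluster-status output (assumed field names).
sample_status = {
    "cluster_id": "0000-000000-example",
    "library_statuses": [
        {
            "library": {"pypi": {"package": "example-a"}},
            "status": "INSTALLED",
            "is_library_for_all_clusters": True,
        },
        {
            "library": {"jar": "dbfs:/example/b.jar"},
            "status": "INSTALLED",
            "is_library_for_all_clusters": False,
        },
    ],
}

print(cluster_status_url("example.cloud.databricks.com", "0000-000000-example"))
print(auto_installed_libraries(sample_status))
```

The filtered definitions are what a later uninstall request would need to reference; sending the requests themselves is left out here on purpose.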

  • 0 kudos
User16765131552
by Contributor III
  • 2460 Views
  • 1 replies
  • 1 kudos

Resolved! Saving Files Location

If someone saves a flat file from a cell without specifying any location, where does it save?

Latest Reply
User16765131552
Contributor III
  • 1 kudos

In this case they are writing to a directory on the driver.
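
A minimal sketch in plain Python of why that happens, assuming the file was written with a bare filename from a Python cell: a relative path resolves against the current working directory of the process, which in a notebook is a local directory on the driver. The function name and file name below are made up for illustration.

```python
import os
import tempfile


def write_without_location(filename: str, text: str) -> str:
    """Write a file using a bare filename and return the absolute path it resolved to."""
    with open(filename, "w") as handle:
        handle.write(text)
    return os.path.abspath(filename)


# Run inside a scratch directory so the demo does not litter the real working directory.
original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)
    try:
        saved_path = write_without_location("flat_file.csv", "id,value\n1,a\n")
        saved_dir = os.path.dirname(saved_path)
        # The file lands in whatever the process's working directory is.
        landed_in_cwd = os.path.samefile(saved_dir, os.getcwd())
    finally:
        os.chdir(original_cwd)

print(saved_path)
print(landed_in_cwd)
```

Printing `os.getcwd()` in a notebook cell is a quick way to see which driver directory such a file would end up in.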

  • 1 kudos
brickster_2018
by Databricks Employee
  • 3350 Views
  • 1 replies
  • 0 kudos

Resolved! Super slow SQL queries on an HC cluster

I have a high concurrency cluster that multiple users are running queries on. However, the queries are running very slowly. I debugged the logs and see that most of the time is spent on the Spark driver. In the Spark UI, I do not see any slowness.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

It's possible that connectivity to the Hive metastore is causing the delay here, especially when there is a high degree of concurrency and contention for metastore access. Interactive clusters in DBR are configured to use up to 5 (spark.databricks.hive.metastore.cli...

  • 0 kudos
User16826994223
by Honored Contributor III
  • 1190 Views
  • 1 replies
  • 0 kudos

Resolved! Versioning of a Delta table while writing from a Structured Streaming job

Does writing to a Delta table create a new version for every micro-batch of the stream?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Yes, that is correct. Every commit to a Delta table creates a version, so each micro-batch creates a version. More info: https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html

  • 0 kudos
User16826994223
by Honored Contributor III
  • 2328 Views
  • 1 replies
  • 1 kudos

Spark DataFrame Parquet vs Delta: row counts don't match

I have data written in Delta on ADLS. As I understand it, Delta also stores its files internally in Parquet format, but when I read the data in the two formats I get different record counts: spark.read.parquet() or spark.read.format('delta').load()
df = spark.read.for...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

I think you have written to the Delta table twice using overwrite mode. Delta is a versioned data format: when you use overwrite, it doesn't delete the previous data, it just writes new files. The old files are not deleted immediately; they are just marked as delete...
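
To make the mismatch concrete, here is a toy model in plain Python. It is not Delta Lake's implementation, just a sketch under the assumption that a table is a set of data files plus a log of add/remove actions, one log entry per commit. The class name, file names, and row counts are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy model: data files on disk plus a log of add/remove actions per commit."""

    files_on_disk: dict = field(default_factory=dict)  # file name -> row count
    log: list = field(default_factory=list)  # one entry per commit (version)

    def _live_files(self) -> set:
        """Replay the log to find the files that make up the current version."""
        live = set()
        for commit in self.log:
            live -= commit["remove"]
            live |= commit["add"]
        return live

    def overwrite(self, file_name: str, rows: int) -> int:
        """Write a new file, mark current files as removed, and return the new version."""
        self.files_on_disk[file_name] = rows  # old files stay on disk
        self.log.append({"add": {file_name}, "remove": self._live_files()})
        return len(self.log) - 1

    def count_as_delta(self) -> int:
        """Count rows only in files the log still considers live."""
        return sum(self.files_on_disk[name] for name in self._live_files())

    def count_as_parquet(self) -> int:
        """Count rows in every file in the directory, ignoring the log."""
        return sum(self.files_on_disk.values())


table = ToyDeltaTable()
first_version = table.overwrite("part-0001.parquet", 100)
second_version = table.overwrite("part-0002.parquet", 100)
print(first_version, second_version)
print(table.count_as_delta(), table.count_as_parquet())
```

In this model the second overwrite leaves both files on disk, so a reader that ignores the log counts the rows of both, while a reader that follows the log counts only the latest file.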

  • 1 kudos
MoJaMa
by Databricks Employee
  • 8394 Views
  • 3 replies
  • 1 kudos
Latest Reply
User16783853906
Contributor III
  • 1 kudos

Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply. Please refer here for more information: https://docs.databricks.com/clusters/instance-pools/index.html

  • 1 kudos
2 More Replies
brickster_2018
by Databricks Employee
  • 4836 Views
  • 1 replies
  • 0 kudos

Resolved! Best ways to copy the parquet files in the staging directory to the Delta table

I have some Parquet data in a temporary directory. Can I copy it into the Delta table directly? What are the best options?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

The easiest solution is to use the COPY INTO command. COPY INTO is idempotent, so even if the operation fails there are no data inconsistencies. It also uses the resources of the Spark cluster and hence completes faster. h...
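
A small sketch of what such a statement might look like, built as a string in Python. The target and staging paths are hypothetical, the helper name is made up, and no quoting or escaping of the paths is attempted.

```python
def copy_into_statement(target_path: str, source_path: str) -> str:
    """Build a COPY INTO statement that loads Parquet files into a Delta table by path."""
    return (
        f"COPY INTO delta.`{target_path}`\n"
        f"FROM '{source_path}'\n"
        "FILEFORMAT = PARQUET"
    )


# Hypothetical locations for the Delta table and the staging directory.
statement = copy_into_statement(
    "/mnt/example/delta/events",
    "/mnt/example/staging/events",
)
print(statement)
```

In a Databricks notebook the resulting string could be passed to `spark.sql(statement)`. Because the command is idempotent, re-running it after a failure should not load the same files twice.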

  • 0 kudos
brickster_2018
by Databricks Employee
  • 2706 Views
  • 1 replies
  • 0 kudos

Resolved! Unable to download the files from the notebook UI

I used to download SQL query output from the notebook UI, but now I am unable to download files.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This is a workspace-level configuration. Your workspace admin has probably disabled it. If you have admin privileges on your workspace, you can enable it from the Admin Console -> Workspace Settings.

  • 0 kudos
User16826994223
by Honored Contributor III
  • 1721 Views
  • 1 replies
  • 0 kudos

MSCK REPAIR TABLE doesn't work in Delta

I have a Delta table in ADLS and, for the same table, I have defined an external table in Hive. After creating the Hive table and generating manifests, I am loading the partitions using MSCK REPAIR TABLE. All the partition columns are in same But s...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Can you please check the partition column order? Is it in the same sequence as before, or has it changed?

  • 0 kudos
brickster_2018
by Databricks Employee
  • 4135 Views
  • 1 replies
  • 0 kudos

Resolved! I can't find my cluster

I had a cluster that I used in the past, but I no longer see it. I checked with the admin and my team, and everyone confirmed that no user deleted it.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

If a cluster is unused for 30 days, Databricks removes it. This is a general clean-up policy. It's possible to exempt a cluster from this clean-up by pinning the cluster. https://docs.databricks.com/clusters/clusters-manage.html#pin-a-c...
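
Pinning can also be done through the Clusters API. The sketch below only builds the request for the `clusters/pin` endpoint and does not send it. The host, token placeholder, and cluster ID are hypothetical, and the helper name is made up.

```python
import json
from urllib.request import Request


def build_pin_request(host: str, token: str, cluster_id: str) -> Request:
    """Build (but do not send) a POST request that pins one cluster."""
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return Request(
        url=f"https://{host}/api/2.0/clusters/pin",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace host, token placeholder, and cluster ID.
request = build_pin_request(
    "example.cloud.databricks.com",
    "<personal-access-token>",
    "0000-000000-example",
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, using a token that belongs to a workspace admin.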

  • 0 kudos
User16826994223
by Honored Contributor III
  • 2497 Views
  • 1 replies
  • 0 kudos

Resolved! Delta adds a new partition making the old partition unreadable

In the notebook, my code reads and writes data to Delta, and the Delta table is partitioned by calendar_date. After the initial load I am able to read the Delta table and view the data just fine. But after the second load, with data for 6 months, the previous part...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

I think you are writing the data in overwrite mode. For versioning, Delta doesn't delete the existing data for a certain number of days even when it is written in overwrite mode, and you will be able to query only the most recent data. But in format parque...

  • 0 kudos
