- 706 Views
- 2 replies
- 0 kudos
We dump our logs in S3 currently. Can you give us best practices to make these logs easier to query?
Latest Reply
If these are generic logs that get landed in S3, it's worth taking a look at Auto Loader. Here is a blog post on processing CrowdStrike logs in a similar way
- 2537 Views
- 1 reply
- 0 kudos
What is the recommended way to backfill a delta table using a series of smaller date partitioned jobs?
Latest Reply
Another approach you might consider is creating a template notebook that queries a known date range using widgets: for example, two date widgets, start time and end time. From there you could use Databricks Jobs to update these parameters for each ru...
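The approach in that reply can be sketched as plain Python: split the full backfill window into smaller date ranges, one per job run. This is a minimal sketch; the widget names, the `backfill_ranges` helper, and the daily granularity are illustrative assumptions, not from the original thread.

```python
from datetime import date, timedelta

def backfill_ranges(start, end, days_per_run=1):
    """Split the half-open window [start, end) into smaller
    (range_start, range_end) chunks, one per backfill job run."""
    ranges = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + timedelta(days=days_per_run), end)
        ranges.append((cursor, chunk_end))
        cursor = chunk_end
    return ranges

# In a Databricks notebook the window boundaries would typically come
# from widgets (hypothetical widget names), e.g.:
#   dbutils.widgets.text("start_time", "2023-01-01")
#   dbutils.widgets.text("end_time", "2023-01-04")
# Here we use plain dates for illustration:
for lo, hi in backfill_ranges(date(2023, 1, 1), date(2023, 1, 4)):
    print(lo, "->", hi)
```

A Jobs workflow (or a driver notebook using `dbutils.notebook.run`) could then pass each `(range_start, range_end)` pair into the template notebook as its widget parameters.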
- 522 Views
- 0 replies
- 1 kudos
After I enable one of my Delta tables to use Change Data Feed, does it record all previous changes to my table?
- 425 Views
- 0 replies
- 5 kudos
Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia):
[Note: This list is not exhaustive]
Leverage the DataFrame or SparkSQL APIs first. They use the same execution process, resulting in parity in performance, but they also com...
- 1793 Views
- 1 reply
- 0 kudos
When does it make sense to use Delta over parquet? Are there any instances when you would rather use parquet?
Latest Reply
Users should almost always choose Delta over parquet. Keep in mind that Delta is a storage format that sits on top of parquet, so the performance of writing to both formats is similar. However, reading data and transforming data with Delta is almost a...
- 605 Views
- 1 reply
- 0 kudos
When and why should I convert between a pandas and a Koalas dataframe? What are the implications?
Latest Reply
Koalas DataFrames are distributed across a Databricks cluster, similar to how Spark DataFrames are distributed. pandas DataFrames live only in memory on the Spark driver. If you are a pandas user and are using a multi-node cluster, then you should use Koalas to p...
- 560 Views
- 0 replies
- 0 kudos
I’m using the databricks-snowflake connector to load data into a Snowflake table. Can someone point me to any example of how we can append only a subset of columns to a target Snowflake table (for example some columns in the target snowflake table ar...
- 523 Views
- 0 replies
- 0 kudos
We have a user notebook in R that reliably crashes the driver. Are detailed logs from the R process stored somewhere on drivers/workers?
- 403 Views
- 0 replies
- 0 kudos
How are index columns handled in Koalas? What about multi-level indices?
- 1286 Views
- 0 replies
- 0 kudos
I know that I can do a DESCRIBE DETAIL on a table to get current delta table version details. If I want to get these same details on a previous version, how can I do that?
- 1618 Views
- 1 reply
- 0 kudos
I have a function within a module in my Git repo. I want to import it into my Databricks notebook - how can I do that?
Latest Reply
Databricks Repos allows you to sync your work in Databricks with a remote Git repository. This makes it easier to implement development best practices. Databricks supports integrations with GitHub, Bitbucket, and GitLab. Using Repos you can bring you...
- 437 Views
- 0 replies
- 0 kudos
I know I can disable Databricks PAT tokens from being used, but what about AAD tokens?