by aladda • Honored Contributor II
- 938 Views
- 1 replies
- 0 kudos
I'm running 3 separate dbt processes in parallel. All of them read data from different Databricks databases and create different staging tables using a dbt alias, but at the end they all update/insert into the same target table. The 3 processes r...
Latest Reply
You're likely running into the issue described here, and a solution to it as well. While Delta does support concurrent writers to separate partitions of a table, depending on your query structure (join/filter/where clauses in particular) there may still be a n...
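One common mitigation when two writers occasionally collide is to retry the losing writer with backoff. The sketch below is illustrative, not a specific Databricks API: `write_fn` stands in for whatever performs your dbt/Delta write, and matching on "Concurrent" in the error message is a simplification of catching Delta's concurrent-modification exceptions (e.g. ConcurrentAppendException).

```python
import time

def retry_on_conflict(write_fn, max_attempts=3, base_delay=1.0):
    """Retry a zero-argument write callable when a concurrent-modification
    error is raised, with exponential backoff between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except Exception as e:
            # Re-raise anything that isn't a concurrency conflict,
            # or if we've exhausted our attempts.
            if "Concurrent" not in str(e) or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Usage sketch: simulate a writer that conflicts once, then succeeds.
calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("ConcurrentAppendException: files were added")
    return "committed"

result = retry_on_conflict(flaky_write, base_delay=0)  # → "committed"
```

Partitioning the target table so each process touches disjoint partitions (and filtering explicitly on the partition column) reduces how often this retry path is taken at all.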
- 4068 Views
- 1 replies
- 1 kudos
Can you please explain the difference between Jobs and Delta Live tables?
Latest Reply
Jobs are designed for automated execution (scheduled or manual) of Databricks notebooks, JARs, spark-submit jobs, etc. It's essentially a generic framework for running any kind of Data Engineering, Data Analysis, or Data Science workload. Delta Live Tables, on the...
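To make the contrast concrete, a Delta Live Tables pipeline is declarative: you define tables as functions and the framework manages orchestration, whereas a Job just runs whatever code you point it at. A minimal sketch (this only runs inside a DLT pipeline, where the `dlt` module is provided by the runtime; the table names are placeholders):

```python
# Declarative DLT sketch -- 'raw_orders' and the filter are illustrative.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Orders with invalid amounts removed")
def clean_orders():
    return spark.read.table("raw_orders").where(col("amount") > 0)
```

With a Job, you would instead schedule a notebook or JAR that performs these steps imperatively and manage dependencies between tasks yourself.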
- 9673 Views
- 1 replies
- 0 kudos
Is the best practice for tuning shuffle partitions to turn on the "autoOptimizeShuffle.enabled" config? I see it is not switched on by default. Why is that?
Latest Reply
AQE (enabled by default from 7.3 LTS onwards) adjusts the shuffle partition number automatically at each stage of the query, based on the size of the map-side shuffle output. So as data size grows or shrinks over different stages, the task size wi...
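For reference, the settings involved look like this in a cluster's Spark config (these are standard Spark/Databricks conf keys; which are on by default varies by runtime version):

```
spark.sql.adaptive.enabled true
spark.sql.adaptive.coalescePartitions.enabled true
spark.databricks.adaptive.autoOptimizeShuffle.enabled true
```

The first two enable AQE and its automatic coalescing of small shuffle partitions; the third is the Databricks auto-optimized shuffle the question asks about.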
- 1098 Views
- 1 replies
- 0 kudos
I see the default in the UI is to always create clusters in a single AZ (e.g. us-west-2a), but I want to distribute workloads across all available AZs.
Latest Reply
Found the answer: it's not available in the UI, but via the API you can submit the cluster definition with

    "aws_attributes": {
        "zone_id": "auto"
    }

This is documented in the Clusters API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#aw...
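As a hedged sketch of where that fragment sits in a full request body (the cluster name, runtime version, node type, and worker count below are placeholders; only `aws_attributes.zone_id` is the point here):

```python
import json

# Illustrative Clusters API create-cluster request body.
cluster_spec = {
    "cluster_name": "auto-az-cluster",        # placeholder
    "spark_version": "10.4.x-scala2.12",      # placeholder runtime
    "node_type_id": "i3.xlarge",              # placeholder node type
    "num_workers": 2,                          # placeholder sizing
    "aws_attributes": {"zone_id": "auto"},     # spread across available AZs
}
payload = json.dumps(cluster_spec)
```

The resulting JSON would be POSTed to the cluster-create endpoint with your workspace URL and token.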
- 626 Views
- 1 replies
- 0 kudos
I would like to use Databricks to teach large-scale analytics in my classroom; does Databricks have any resources or community assets that can help me out?
Latest Reply
For folks looking to leverage Databricks as a teaching asset, please contact us through the Databricks University Alliance: https://databricks.com/p/teach
- 645 Views
- 1 replies
- 0 kudos
I want to learn how to use Databricks for my courses at university, and maybe to get a Databricks Certification. Can you help me out?
Latest Reply
We have a ton of great resources available for people who want to learn Databricks, specifically university students. Check out our university page to learn more about Databricks Community Edition, free workshops, and self-paced course...
- 564 Views
- 1 replies
- 0 kudos
I am trying to design a streaming data analytics project using functions --> event hub --> storage --> Azure Data Factory --> databricks --> SQL Server. What I am struggling with at the moment is how to optimize "data retrieval" to feed m...
Latest Reply
Check out our Auto Loader capability, which can automatically track and process files that need to be processed. There are two options: directory listing, which essentially completes the same steps that you have listed above but in a sl...
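A minimal sketch of how the two modes differ at the option level (`cloudFiles.format` and `cloudFiles.useNotifications` are standard Auto Loader options; the format and the commented path are placeholders for your own pipeline):

```python
# Auto Loader option sets for the two file-discovery modes.
directory_listing_opts = {
    "cloudFiles.format": "json",               # placeholder source format
    "cloudFiles.useNotifications": "false",    # directory listing mode (default)
}
file_notification_opts = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",     # event-driven file notification mode
}

# On a Databricks cluster the stream itself would be started like:
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**file_notification_opts)
#         .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))
```

Directory listing is simpler to set up; file notification scales better for high file volumes because it reacts to storage events instead of repeatedly listing the directory.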
- 1229 Views
- 1 replies
- 0 kudos
I would like to provide row and column level security on my tables I have created in my workspace. Is there any way to do this?
Latest Reply
Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition. current_user(): returns the current user name. is_member(): determines if the current user is a member of a s...
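A minimal sketch of such a view using both functions (the table, column, and group names are placeholders, not from the original thread):

```sql
-- Column-level: reveal email only to members of the 'auditors' group.
-- Row-level: non-admins see only rows they own.
CREATE OR REPLACE VIEW sales_secure AS
SELECT
  order_id,
  CASE WHEN is_member('auditors') THEN email ELSE 'REDACTED' END AS email,
  region,
  amount
FROM sales
WHERE is_member('admins') OR region_owner = current_user();
```

Users are then granted access to the view rather than the underlying table, so the permission logic is evaluated per query, per user.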
- 4674 Views
- 1 replies
- 0 kudos
I am storing Excel files in Azure Data Lake (Gen1). The filenames follow the same pattern "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc., depending on the date and time. I want to read all the files in th...
Latest Reply
If you are attempting to read all the files in a directory, you should be able to use a wildcard and filter on the extension. For example (the load path below is a placeholder for your own Data Lake path; the reader comes from the com.crealytics spark-excel library):

    df = (spark
        .read
        .format("com.crealytics.spark.excel")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("/mnt/<your-mount>/*_Usage_Dataset*.xlsx"))