Getting into the data space can feel overwhelming, with so many tools, terms, and technologies. But after years in
Expect failure. Design for it.
Jobs will fail. The data will be late. Build systems that can recover gracefully, and continually monitor your pipelines.
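One way to picture "recover gracefully" is a retry with exponential backoff. This is only a minimal sketch: run_with_retries, its parameters, and the delay values are hypothetical names chosen for illustration, not part of any particular orchestration tool.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is exhausted.

    Hypothetical helper for illustration; real schedulers usually
    provide their own retry settings.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                # Out of attempts: surface the failure so monitoring can see it.
                raise
            # Wait longer after each failure: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

Wrapping a flaky step as run_with_retries(load_daily_file), where load_daily_file stands in for your own function, retries it a few times and still raises on the final failure, so the problem shows up in your monitoring instead of disappearing silently.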
Think like an engineer.
Use Git. Automate where possible. Learn the basics of DevOps (CI/CD, testing, infrastructure as code). You'll stand out because many skip this.
Reproducibility builds trust.
If someone can't trace how you got a result, it's not reliable. Always aim for results that are transparent and repeatable.
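A small habit that helps with traceability is storing a fingerprint of the inputs and parameters next to every result. The sketch below assumes the input fits in memory as bytes and the parameters are JSON-serializable; run_fingerprint is a hypothetical name, not a standard function.

```python
import hashlib
import json


def run_fingerprint(input_bytes: bytes, params: dict) -> str:
    """Return a stable SHA-256 hex digest of the input data and parameters."""
    digest = hashlib.sha256()
    digest.update(input_bytes)
    # Separator so the data and the parameters cannot blur into each other.
    digest.update(b"\x00")
    # sort_keys makes the JSON text independent of dict insertion order.
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

If two runs produce the same fingerprint, they used the same data and settings; if a number changes and the fingerprint did too, you know where to start looking.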
Understand the problem, not just the data.
Tools change, but the need to solve real-world problems doesn't. Stay close to the "why" behind the work; it's what separates good from great.
Whether you're just starting or mentoring others, what do you think belongs on this list?
Thanks,
Boitumelo