Greetings @Maylin,

A single many-to-many multilingual model, fine-tuned jointly across your 17 languages, is usually the best trade-off between quality, scalability, and operational simplicity; combine it with lightweight adapters to preserve quality and add languages over time. Use built-in translation functions for any supported pairs as strong baselines, and run a focused evaluation against services like DeepL on your domain data before committing.
Model strategy
- Multilingual NMT can match or exceed bilingual systems for many directions, especially when low- and mid-resource languages benefit from transfer learning.
- One model covers all language pairs, reducing deployment/monitoring complexity versus maintaining many separate models.
Adding languages over time
- Naively fine-tuning on a new language risks catastrophic forgetting; mitigate with replay (mixing in prior-language data), regularization (e.g., EWC or L2), and adapter/LoRA layers (see the sketch after this list).
- Use conservative learning rates, early stopping, and per-language validation to detect regressions when you introduce new languages.
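To make that concrete, here is a minimal sketch of adapters plus replay using Hugging Face peft and datasets. The checkpoint, the JSONL paths, the target modules, and the 80/20 mixing ratio are all assumptions to adapt to your setup:

```python
# pip install transformers peft datasets
from datasets import interleave_datasets, load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import M2M100ForConditionalGeneration

# Illustrative checkpoint; substitute your own multilingual baseline.
base = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

# LoRA adapters: only the small adapter matrices train, so the shared
# multilingual backbone is largely shielded from catastrophic forgetting.
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in M2M100
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

# Replay: mix ~20% prior-language data into the new-language stream.
# The JSONL paths are hypothetical placeholders for your prepared corpora.
new_lang_ds = load_dataset("json", data_files="new_language.jsonl", split="train")
replay_ds = load_dataset("json", data_files="prior_languages.jsonl", split="train")
train_ds = interleave_datasets(
    [new_lang_ds, replay_ds], probabilities=[0.8, 0.2], seed=42
)
# train_ds then feeds your usual Seq2SeqTrainer / Accelerate loop.
```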
Multiple targets per source
- Many-to-many models naturally translate one source into multiple targets using language tags/prompts, with no dedicated model per pair (see the sketch after this list).
- Shared representations encourage positive transfer between related targets and improve consistency.
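As an illustration of the tagging mechanism, a single M2M100 checkpoint (an assumption here; mBART-style models work similarly with their own tag conventions) can serve several targets from one source just by switching the forced target-language token:

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

name = "facebook/m2m100_418M"  # illustrative many-to-many checkpoint
tokenizer = M2M100Tokenizer.from_pretrained(name)
model = M2M100ForConditionalGeneration.from_pretrained(name)

tokenizer.src_lang = "en"
encoded = tokenizer("The shipment arrives on Friday.", return_tensors="pt")

# Same weights for every direction: the target language is selected by
# forcing its language tag as the first decoder token.
for tgt in ["de", "fr", "ja"]:
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt)
    )
    print(tgt, tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```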
Training on Databricks with HF scripts
- The Hugging Face translation script is a solid foundation; on Databricks, pair it with distributed training (e.g., Accelerate/DeepSpeed), MLflow tracking, and robust data pipelines.
- Key practices: balanced/temperature sampling across languages (sketched below), a shared tokenizer (e.g., SentencePiece), and per-direction metrics (BLEU/chrF/COMET) with stratified validation sets.
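For the temperature-sampling piece, the usual formula weights each language by its corpus size raised to 1/T; a small sketch with made-up corpus sizes:

```python
import numpy as np

def temperature_probs(example_counts, T=5.0):
    """Sampling probabilities p_i = n_i**(1/T) / sum_j n_j**(1/T).

    T=1 reproduces size-proportional sampling; larger T flattens the
    distribution so low-resource languages are seen more often.
    """
    counts = np.array(list(example_counts.values()), dtype=float)
    weights = counts ** (1.0 / T)
    probs = weights / weights.sum()
    return dict(zip(example_counts, probs))

# Illustrative corpus sizes per language pair
sizes = {"en-de": 4_000_000, "en-lv": 300_000, "en-is": 50_000}
print(temperature_probs(sizes, T=5.0))
```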
Handling long documents
- Translate at sentence level with smart segmentation, but include a small preceding/following context window to improve cohesion, pronouns, and terminology (one chunking approach is sketched after this list).
- Add a document-level QA pass for terminology consistency, named entities, and formatting; consider glossary/term constraints and post-editing for high-stakes content.
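A pragmatic approximation of that context window is to translate small multi-sentence chunks rather than isolated sentences, so the model always sees local context. A rough sketch; the checkpoint is illustrative, and the naive regex splitter should be replaced by a proper segmenter in production:

```python
import re
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

name = "facebook/m2m100_418M"  # illustrative checkpoint
tokenizer = M2M100Tokenizer.from_pretrained(name)
model = M2M100ForConditionalGeneration.from_pretrained(name)

def split_sentences(text):
    # Naive splitter for illustration only.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def translate_document(text, src="en", tgt="de", window=3):
    # Translate in small multi-sentence chunks so pronouns and terminology
    # benefit from the surrounding sentences within each chunk.
    tokenizer.src_lang = src
    sentences = split_sentences(text)
    out = []
    for i in range(0, len(sentences), window):
        chunk = " ".join(sentences[i:i + window])
        enc = tokenizer(chunk, return_tensors="pt", truncation=True)
        gen = model.generate(**enc, forced_bos_token_id=tokenizer.get_lang_id(tgt))
        out.append(tokenizer.batch_decode(gen, skip_special_tokens=True)[0])
    return " ".join(out)
```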
Using Databricks ai_translate
- If your language pairs are supported, use it for rapid, production-grade baselining and batch workflows within Databricks (see the notebook sketch after this list).
- Even then, validate on your own samples with automatic scores and human review to check adequacy, style, and consistency.
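In a Databricks notebook (where `spark` and `display` are predefined), a batch baseline can be a single SQL expression. The catalog, table, and column names below are hypothetical, and availability of `ai_translate` depends on your workspace and region:

```python
# Assumption: ai_translate is enabled in this workspace and the target
# language is supported; table and column names are placeholders.
translated = spark.sql("""
    SELECT
      doc_id,
      body,
      ai_translate(body, 'es') AS body_es
    FROM main.docs.source_documents
""")
display(translated)
```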
DeepL vs bespoke models
- DeepL is strong on many European pairs, but performance varies by language and domain; your client's content and style requirements are decisive.
- Run a bake-off on representative documents comparing ai_translate (where supported), DeepL, and your fine-tuned multilingual model; score with COMET (sketched after this list) and targeted human review.
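For the automatic side of the bake-off, the unbabel-comet package handles reference-based scoring; the record below is a made-up example, and in practice you would build one record per segment per system:

```python
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

ckpt = download_model("Unbabel/wmt22-comet-da")  # reference-based COMET model
scorer = load_from_checkpoint(ckpt)

# One record per segment per system; these strings are illustrative.
data = [
    {
        "src": "Der Vertrag endet am 31. März.",
        "mt": "The contract ends on March 31.",
        "ref": "The contract expires on 31 March.",
    },
]
result = scorer.predict(data, batch_size=8, gpus=0)
print(result.system_score)  # corpus-level score
print(result.scores)        # per-segment scores
```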
Recommended plan
- Start from a high-quality multilingual baseline and jointly fine-tune on all 17 languages using temperature sampling and balanced batches.
- Use per-language or language-cluster adapters/LoRA; when adding new languages, adopt replay + regularization to prevent forgetting.
- Build a doc pipeline: sentence segmentation, small context windows, terminology glossaries, and a document-level QA pass.
- Execute an evaluation bake-off (automatic + human) on your domain data; choose based on quality, throughput, cost, and governance.
- Operationalize on Databricks with distributed training, MLflow dashboards per language/direction (a minimal logging sketch follows), and continuous quality monitoring.
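For those per-direction dashboards, one lightweight pattern is a metric key per language pair; `direction_metrics` below is a stand-in for whatever your evaluation job produces:

```python
import mlflow

# Placeholder scores; your eval job would compute these per direction.
direction_metrics = {
    ("en", "de"): {"bleu": 31.2, "chrf": 58.4},
    ("en", "lv"): {"bleu": 22.7, "chrf": 49.1},
}

with mlflow.start_run(run_name="mt-eval"):
    for (src, tgt), metrics in direction_metrics.items():
        for name, value in metrics.items():
            # One metric key per language direction, e.g. "bleu_en-de"
            mlflow.log_metric(f"{name}_{src}-{tgt}", value)
```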
Hope this helps, Louis.