Managing the Software Translation Process Using Alchemy Catalyst


Introduction

Translation is a crucial part of the software localization process. Every organization that wants to provide global software or internet services must deal with this aspect of an international rollout. In the complex environment of a software project, this relatively simple task (simple, that is, if the application is properly designed and built for localization) creates additional dependencies and productivity issues for the team:

  1. Translations should be done in context: a language specialist who fully understands how the text appears on the screen, and in response to which user action, will invariably produce a better translation.
  2. Applications (especially those originally written for the English market) are very often not ready for translation: sentences are broken into separate strings that are put together by code, strings are reused in different contexts (for example, fax as a noun and fax as a verb), and functional content is not separated from the code (for example, error messages are often stored as string constants or variables in the code).
  3. Translation must integrate with the standard engineering build, test and deployment processes. Functional content such as button names, menu options and error messages obviously requires translation, but it is tied to the code (and very often is the code). In Microsoft .NET, localized program strings should be kept in resource files (a.k.a. resx files). Those files are compiled with the code and deployed as satellite assemblies by the engineers during the build process.
  4. Most organizations start with inefficient, “manual” processes that usually involve exporting functional strings to Excel spreadsheets or Word files and then importing the translated files manually or through scripts.
  5. Mature organizations implement translation and localization management software from one of the leading vendors, such as Trados, Idiom Technologies or Alchemy.
  6. Translation software is built around the Translation Memory (TM) pattern, which is defined as “collection of units of associated text strings in language pairs from previous translations which can be suggested to translators translating similar content and language pair document.”
  7. Management of the localization process revolves around storing, indexing and reusing Translation Memories. In the typical case, a translation engineer creates a project for the base (neutral) language that contains base-to-base TMs. The base project is then used to create base-to-target projects. As a result, one typically ends up with a collection of parent and child TM projects.
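
To make the Translation Memory idea from items 6 and 7 concrete, here is a minimal sketch in Python. It is not how Catalyst stores or matches TMs; the class names, the similarity measure (Python's difflib) and the 0.75 threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class TranslationUnit:
    """One source/target string pair for a single language pair."""
    source: str
    target: str


class TranslationMemory:
    """Toy in-memory TM (hypothetical; unrelated to Catalyst's file format)."""

    def __init__(self, source_locale: str, target_locale: str) -> None:
        self.source_locale = source_locale
        self.target_locale = target_locale
        self._units: list[TranslationUnit] = []

    def add(self, source: str, target: str) -> None:
        """Store a translation produced earlier."""
        self._units.append(TranslationUnit(source, target))

    def suggest(self, text: str, threshold: float = 0.75) -> list[tuple[float, TranslationUnit]]:
        """Return stored units whose source resembles `text`, best match first."""
        scored = []
        for unit in self._units:
            # Case-insensitive similarity ratio between 0.0 and 1.0.
            score = SequenceMatcher(None, text.lower(), unit.source.lower()).ratio()
            if score >= threshold:
                scored.append((score, unit))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored
```

A translator working on "Send a fax" would be offered the earlier translation of "Send fax" as a suggestion, while unrelated entries such as "Cancel" fall below the threshold and are not shown.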

This paper presents typical translation processes for a .NET application, implemented using the Alchemy Catalyst product from Alchemy Software (www.alchemy.ie).

High-Level Features of Alchemy Catalyst

| Feature | Explanation |
|---|---|
| TM storage | In Catalyst, Translation Memories are stored in the project file (extension ttk). This has benefits and drawbacks: TM information can be shared by simply emailing the project, but the lack of a centralized repository (such as the database offered by other vendors) creates file management overhead. |
| Comparison Expert | The Comparison Expert is used to compare two application files. It detects missing, added and modified resources and records these changes in the Results Toolbar and an optional XML report file. This Expert is useful in determining the scope of change between revisions of software. |
| Leveraging | Alchemy CATALYST provides Translation Memory technology called ezMatch™. ezMatch allows translators to re-use previous translations. The Leverage Expert guides users through using ezMatch technology and is designed to maximize the amount of translation that can be leveraged from multiple file formats. |
| Pseudo Translation | Pseudo Translation simulates the effects of translation on the application files. It does this by substituting vowel characters in the source files with diacritical or accented characters. Pseudo Translation can be used early in the product development cycle to determine whether an application can be translated easily. For example, one may use it to determine whether a product crashes when a series of strings is translated, or whether a series of strings will still fit if they expand by 15% during translation. |
| Validation | Validation automates the detection of common localization errors normally introduced during the translation process. The Validate Expert also has a companion technology, the Runtime Validation Expert, which allows users to validate applications as they run on the Windows desktop. |
| Power Translation | Power Translation is used to automate the lookup and translation of translation units in the active project TTK file. It operates on translation units and helps the translator locate and translate matching terms in the active translation memories. |
| Spell checking | Custom dictionaries. |
| Database Connector | Ability to create TMs by reading content directly from a database through a provided connector. |
| License Management | Either through the additional License Manager or through the built-in ability to borrow a license for up to two weeks. |
| ezParse | Ability to define custom XML schemas for non-standard resource files. |
| Command-line interface | Most of the features of Catalyst can be called from the command line. This can be used in a script-driven build process in NAnt or MSBuild. |
| Free translation version (Alchemy Lite) | Catalyst has a free, stripped-down version of its software. This version can be used by translators who do not require advanced features such as leveraging or power translating. The free version reduces the overall TCO for the product. |
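
The pseudo-translation idea described above can be sketched in a few lines of Python. This is an illustration only: the vowel mapping, the "~" padding character and the surrounding brackets are assumptions, and Catalyst's own substitution rules may well differ.

```python
import math

# Hypothetical vowel mapping; the real tool's substitution table may differ.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_translate(text: str, expansion: float = 0.15, pad_char: str = "~") -> str:
    """Swap vowels for accented forms and pad to simulate text growth.

    The brackets make truncated strings easy to spot on screen: if the
    closing bracket is missing, the control is too narrow.
    """
    accented = text.translate(ACCENTED)
    # Round up so that even short strings grow by at least one character.
    extra = math.ceil(len(text) * expansion)
    return f"[{accented}{pad_char * extra}]"
```

Running every resource string through such a function before a test build exposes hard-coded strings (they stay unaccented), encoding problems and layouts that cannot absorb longer text. Note that this naive version would also alter vowels inside named placeholders, so a real implementation would need to skip those.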

Localizing a Large, Live Website

The process of localizing a large, live website consists of several sub-processes, each with its own level of complexity and each requiring specialized resources:
  1. Creation of the master translation file
  2. Localization of the master file to a specific locale
  3. Parallel vs. Serial translation process
  4. Site maintenance: adding a new section or page to the master
  5. Adding a new string to an existing resource or page

Creation of the master translation file

In most cases an existing website is localized because of a business decision to open the site in another country. Most websites are built for a specific market, country or locale and are therefore not designed with globalization in mind. Before your team embarks on the localization process, the Software Architect must ensure that the software is compatible with different languages, date/time formats, currencies, systems of measure etc. There are well-known i18n and l10n patterns available on the Microsoft (for .NET) and Sun Java sites to help you refactor your app for globalization.

Assuming that your application is globalization ready, your first task in the localization process is the creation of the base language application. This will be your source, the master for all translations. It may or may not be a “real” (that is, deployed or even deployable) application, especially if your website offers different features to different user profiles. You should create your base application using a neutral culture (in Microsoft .NET localization terminology); a decent compromise is an English locale.

In this process (see Figure 1) resource files (resx and custom XML) are imported into Catalyst. For custom localization files (custom XML schemas etc.) the import rules must be entered into Catalyst via the ezParse interface before the import. There are two options for importing files: file import and file/folder import. The latter is preferable because it preserves the structure of the source directories, so Catalyst is able to replicate the source file structure during export.



Figure 2 shows the initial view of the project after import. Catalyst parses each file and extracts its strings. Even though the system treats each string pair as an independent unit of work (Translation Memory), it remembers the relationship between strings, parent files and folders. This is shown in the navigational area on the left.
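
To give a feel for what "extracting strings" from a resx file involves, the sketch below reads the data entries of a resx document with Python's standard XML parser. It is not Catalyst's parser. The sample content and key names are made up, and skipping entries that carry a type or mimetype attribute is a simplifying assumption for leaving out non-string resources.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Made-up sample in the shape of a .resx file (schema header omitted).
SAMPLE_RESX = """<root>
  <data name="SendButton.Text" xml:space="preserve">
    <value>Send fax</value>
  </data>
  <data name="Error.NoLine" xml:space="preserve">
    <value>No fax line available</value>
  </data>
  <data name="Logo" type="System.Drawing.Bitmap, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
    <value>AAEC</value>
  </data>
</root>"""


def extract_resx_strings(resx_xml: str) -> dict[str, str]:
    """Map each string entry's name to the text of its <value> child."""
    root = ET.fromstring(resx_xml)
    strings: dict[str, str] = {}
    for data in root.findall("data"):
        # Assumption: typed entries hold images or other non-text resources.
        if data.get("type") or data.get("mimetype"):
            continue
        value = data.find("value")
        text = value.text if value is not None and value.text else ""
        strings[data.get("name", "")] = text.strip()
    return strings
```

Each key/value pair returned here corresponds to one translatable unit, and the key plus the file path is what lets a tool write the translated text back to the right place on export.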



The initial project settings specify English as both the source and the target language. This project needs to be treated as the master for all other projects. In the case of a new country rollout we will use it as the basis for translation into a specific language.

New locale rollout

The first step in the new country rollout process is the creation of a Catalyst project file for the specific TM pairing. The source language should always be left as your base locale (for instance en-GB); the target language needs to be set in the locale navigational area. The project file name should reflect the specific language pair, so our naming convention is: [project actual name].[Source locale].[Target locale].ttk. For instance:

  • Website.en-GB.en-GB.ttk is a master file for the UK code base
  • Website.en-GB.de-DE.ttk contains translation into German
Figure 3 shows this setup.
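
The naming convention above is simple enough to enforce in build scripts. The following Python sketch builds and parses such file names; the function and class names are hypothetical and not part of any Alchemy tooling.

```python
from typing import NamedTuple


class TtkName(NamedTuple):
    """Parts of a project file name such as Website.en-GB.de-DE.ttk."""
    project: str
    source_locale: str
    target_locale: str

    @property
    def is_master(self) -> bool:
        # By the convention above, the master maps the base locale to itself.
        return self.source_locale == self.target_locale


def build_ttk_name(project: str, source_locale: str, target_locale: str) -> str:
    """Compose [project].[source locale].[target locale].ttk."""
    return f"{project}.{source_locale}.{target_locale}.ttk"


def parse_ttk_name(filename: str) -> TtkName:
    """Split a conforming file name into its parts, or raise ValueError."""
    stem, _, extension = filename.rpartition(".")
    if extension.lower() != "ttk":
        raise ValueError(f"not a .ttk project file: {filename!r}")
    # Split from the right so project names containing dots still parse.
    parts = stem.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"expected project.source.target.ttk, got {filename!r}")
    return TtkName(*parts)
```

A build step could call parse_ttk_name on every project file in the repository and fail early when a file does not follow the convention, which keeps master and localized projects easy to tell apart.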



The locale-specific project file is shipped to translators (via the QuickShip option, which essentially creates a self-extracting executable of the Catalyst project file), who work on the file using the translator/Lite version of the system. Before the file is shipped, the translation process owner should “lock” certain strings, i.e. prevent them from being translated. Generally this should be rare; however, it applies in particular where resource files are used to store business logic.

After receiving the translated file from the translator, a senior language resource should approve all translations within the tool (which changes the visual status of every string from an eye to a checkmark) and hand the file back to the engineers, who extract the TMs into application resource files.

In the last part of the process, locale-specific resx files are extracted and included in the next build. Alternatively, the approval of the translated strings can happen after the development-stage or development-integration build (depending on your process) has been approved. After the build, language QA takes place in the translation testing environment. If issues are found, defects are entered and the translation process repeats. After launch you must maintain the locale-specific TMs, most appropriately in the SCM system, since they represent your reference data. This adds the overhead of keeping the TTK files in sync between the master and the localized versions. You should automate this process using Alchemy’s command-line interface.
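
One small piece of that automation can be sketched without touching Catalyst at all: finding localized projects that have not been updated since the master last changed. The Python sketch below uses file modification times as a crude proxy, which is an assumption; the function name is hypothetical, and the actual Catalyst command-line options to run on the stale files should be taken from Alchemy's documentation.

```python
from __future__ import annotations

from pathlib import Path


def find_stale_projects(folder: Path, master_name: str) -> list[Path]:
    """Return localized .ttk files last modified before the master project."""
    master_mtime = (folder / master_name).stat().st_mtime
    return sorted(
        path
        for path in folder.glob("*.ttk")
        # Skip the master itself; flag anything older than it.
        if path.name != master_name and path.stat().st_mtime < master_mtime
    )
```

A nightly build job could run this against the checked-out SCM folder and report, or block on, any locale whose project file lags behind the master.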

Figure 4 below illustrates this process in detail:



Parallel vs. Serial Translation Process

Small websites can usually be handled by one translator using one project file. The translation process then occurs serially, with one project file (ttk) being sent back and forth between the PM and the translator. For very large websites or applications the translation cannot be handled as a single unit; it requires chunks of work that can be handled by several translators at the same time. Catalyst supports this scenario with the “section” concept. The idea is to export parts of the site into “subprojects” to be sent to translators. After translation the sections must be imported into the main project, and the section projects should be discarded. This is illustrated in Figure 5 below:



Please note that the original names (and extensions) of the resource files that were loaded into the master project will be preserved in the localized projects. This is not a big problem because files with the correct names and extensions can always be exported from the project. Alternatively we could maintain a neutral version of each resx file (with no language extension) and use it as the master. Figure 5 shows the parallel translation process in detail. Depending on the project timeline, we should export the main project into as many section projects as is appropriate. Those section projects are handled by translators and translation managers. Translation QA of the parallel translation process should follow the main workflow illustrated in Figure 4. Your translation build schedule should be aligned with the delivery of translated chunks.
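
Deciding how to cut the main project into sections is a planning question rather than a Catalyst feature. As a rough illustration, the Python sketch below spreads site folders across a given number of translators so that each gets a similar number of strings. Treating a folder as the unit of a section, the greedy strategy and the sample numbers are all assumptions.

```python
from __future__ import annotations


def split_into_sections(string_counts: dict[str, int], translators: int) -> list[list[str]]:
    """Greedy split: hand the largest folders first to the least-loaded section."""
    if translators < 1:
        raise ValueError("at least one translator is required")
    sections: list[list[str]] = [[] for _ in range(translators)]
    loads = [0] * translators
    # Largest folders first; ties broken by name for a stable result.
    for folder, count in sorted(string_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = loads.index(min(loads))
        sections[lightest].append(folder)
        loads[lightest] += count
    return sections
```

For example, with 400 strings in checkout, 300 in account, 250 in help and 50 in home, two translators would receive checkout plus home (450 strings) and account plus help (550 strings). Each resulting list would then be exported as one section project.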

Figure 6 and Figure 7 show screens for importing and exporting sections from Catalyst.




Maintenance Process

So we have successfully localized our site and now have thriving web businesses in several locales. Now we need to add a new page, site section or piece of functionality to the website. In this scenario we assume that there is an existing master translation project as well as one or more projects containing TMs for live non-English sites. As mentioned before, we need to keep the master and localized versions in sync. Therefore we would add the new folder or new page.en-GB.resx file to all ttk projects at the same time and start parallel translation efforts (by exporting sections) when appropriate and in line with the release schedule.

Another maintenance scenario is adding new strings to existing pages and applications. This usually happens when we change the functionality of an existing page by adding new buttons or calls to action.

The complication here is that Catalyst manages the translation of the application TMs. It does not manage the application strings themselves; therefore one cannot add or remove strings inside Catalyst. If we need to add a new application string to a resource file, we need to follow the slightly different process illustrated in Figure 8. A developer needs to remove the old resx file from the master project and re-import the new version. Subsequently all localized projects need to be recreated (by changing the target language and saving the master under localized names according to our naming convention). Engineers then need to leverage (see Figure 9) the translated TMs from the old localized projects into the new versions. Untranslated strings then need to be exported to translators and, once the translated TMs are received, re-imported into the new projects. Finally the local resx files must be exported and the application rebuilt and QA'd. Yes, it is a costly and time-consuming process, but a necessary one if you want to maintain control over your assets.
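
The leverage step in this process can be pictured with a small Python sketch. It only carries over translations whose key and source text are both unchanged, which is far simpler than what Catalyst's Leverage Expert does with ezMatch; the function name and the dictionary-based data model are assumptions made for illustration.

```python
from __future__ import annotations


def leverage(
    new_master: dict[str, str],
    old_master: dict[str, str],
    old_translated: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Reuse old translations where the key and source text are unchanged.

    Returns the carried-over translations and the keys still to translate.
    """
    leveraged: dict[str, str] = {}
    pending: list[str] = []
    for key, source in new_master.items():
        if old_master.get(key) == source and key in old_translated:
            leveraged[key] = old_translated[key]
        else:
            # New string, changed source text, or never translated before.
            pending.append(key)
    return leveraged, pending
```

Only the keys in the pending list need to go out to translators; everything in the leveraged dictionary is reused from the old localized project. Strings that were removed from the master simply drop out because they are never looked up.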





Summary

Alchemy is a decent and cost-effective tool to help you with your translation. The main problem in translation projects is really not the technology but people management. The TPM needs to keep track of deliverables from the translation team while making sure that the engineering team incorporates translation into the build process. No software tool is perfect, however, and we have to deal with the limitations and compromises made by its designers and architects. Alchemy is no exception, so when working with it you should be mindful of a few pitfalls:

  1. Leverage once: The biggest efficiency gain from applying Alchemy on projects is its ability to reuse translation memories in the process called “leveraging”. Once certain key phrases or dictionaries are translated, the resulting TMs can be used to translate the same phrases occurring in other documents. In this way we can achieve both significant efficiency increases and consistency in the translation of terms. Alchemy supports this process well; unfortunately, if rework of the “base” is required the model breaks, because in Alchemy leveraging can be done only once. Hence changes in the base require that you discard already translated sections and leverage everything from the beginning.
  2. Workflow history is often lost: The translation process is workflow driven, so preserving the history of who translated and approved a particular TM is an important feature. It appears that the workflow history is maintained by the Alchemy database, so when sections are merged back into the bigger database the individual history is often lost.
  3. Alchemy databases can grow big: Translated TMs should be kept in the SCM system as a “trusted” source. This works relatively well for applications with a limited number of strings. For applications or websites with large numbers of strings, the Alchemy databases grow to many MBs, which presents logistical challenges. A centralized, database-driven system might do a better job.