DECEMBER 2005 RESEARCH.loc LOCALISATION FOCUS 9
Research
1 Introduction
Localisation of software in open source projects
is usually handled by Gettext, a set of tools
from the GNU project. This toolkit contains tools for extracting
and merging messages from source code for localisation, as well
as libraries for loading the translated messages from resource files at
runtime. Gettext uses its own file format, the Portable Object (PO)
format, for storing resources in the localisation process. With open
source desktop environments — such as GNOME and KDE —
today having translation teams for over 80 different languages, it is
evident that Linux and open source software (OSS) are reaching out
to a global market, and are not limited to English-speaking cultures
and communities. As industry and governments around the world
continue to embrace open source, localisation has become a critical
factor in OSS adoption. Thus, it is increasingly important that OSS
localisation processes become aligned with industry standards —
allowing seamless integration of commercial and open source
processes when developing for open source platforms.
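To make the PO format mentioned above concrete, the following is a minimal sketch of reading PO entries. It handles only a simplified subset of the format (source references, msgid, msgstr and continuation strings); the sample entries, the function name parse_po and the use of Python's literal parser to approximate C-style string escapes are assumptions made for illustration and are not part of Gettext itself.

```python
import ast

# Hypothetical sample data: two entries, the second one untranslated.
SAMPLE_PO = '''\
#: src/main.c:42
msgid "Open file"
msgstr "Ouvrir le fichier"

#: src/main.c:57
msgid ""
"Save changes "
"before closing?"
msgstr ""
'''


def parse_po(text):
    """Parse a simplified subset of PO into a list of dictionaries."""
    entries, entry, field = [], None, None
    # The extra empty string makes sure the last entry is flushed.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if not line:
            # A blank line terminates the current entry.
            if entry is not None:
                entries.append(entry)
            entry, field = None, None
            continue
        if entry is None:
            entry = {"references": [], "msgid": "", "msgstr": ""}
        if line.startswith("#:"):
            # Source references, e.g. "src/main.c:42".
            entry["references"].extend(line[2:].split())
        elif line.startswith("#"):
            continue  # other comment types are ignored in this sketch
        elif line.startswith("msgid "):
            field = "msgid"
            entry[field] = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr "):
            field = "msgstr"
            entry[field] = ast.literal_eval(line[len("msgstr "):])
        elif line.startswith('"') and field:
            # Continuation line: append to the field being read.
            entry[field] += ast.literal_eval(line)
    return entries


entries = parse_po(SAMPLE_PO)
```

An empty msgstr, as in the second entry, marks a message that still awaits translation. Plural forms, message contexts and header handling are deliberately left out of this sketch.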
In the main contribution of this paper, we present the results of
working with open source contributors in defining an XLIFF
Representation Guide for Gettext PO – a work now submitted to
the XLIFF Technical Committee in line with their programme to
develop canonical XLIFF profiles for common file formats. We propose
a bridge between current localisation practices in open source
and established localisation industry standards. As open source
localisation is based around the common PO resource format, this
paper will focus on XLIFF as a possible replacement for PO as the
common format in the OSS localisation process; further, we provide
standards alignment through a canonical representation of the PO
file format within the XLIFF standard. XLIFF was designed principally
to address needs in commercial localisation processes, especially
the localisation of Microsoft Windows-based resource formats.
This research aims to identify aspects of Gettext and other PO-based
localisation processes for which XLIFF lacks support, and to address
these deficiencies.
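As a rough illustration of what representing a PO entry in XLIFF involves, the sketch below wraps one message in an XLIFF 1.1 trans-unit. It is not the mapping defined in the Representation Guide: the function names, the datatype value "po", the context-group name "po-reference" and the assumption that a source reference has the form file:line are all choices made for this example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"


def po_entry_to_trans_unit(unit_id, msgid, msgstr, reference=None):
    """Build a trans-unit element from one PO entry (illustrative only)."""
    unit = ET.Element("trans-unit", {"id": str(unit_id)})
    ET.SubElement(unit, "source").text = msgid
    if msgstr:
        # Untranslated entries get no target element in this sketch.
        ET.SubElement(unit, "target").text = msgstr
    if reference:
        # Assumes the reference looks like "path/to/file.c:42".
        path, _, lineno = reference.rpartition(":")
        group = ET.SubElement(
            unit, "context-group",
            {"name": "po-reference", "purpose": "location"},
        )
        ET.SubElement(group, "context",
                      {"context-type": "sourcefile"}).text = path
        ET.SubElement(group, "context",
                      {"context-type": "linenumber"}).text = lineno
    return unit


def build_xliff(entries, original, source_lang="en", target_lang="fr"):
    """Wrap (msgid, msgstr, reference) tuples in a minimal XLIFF document."""
    root = ET.Element("xliff", {"version": "1.1", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(root, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "po",  # assumed datatype value for PO content
    })
    body = ET.SubElement(file_el, "body")
    for i, (msgid, msgstr, ref) in enumerate(entries, start=1):
        body.append(po_entry_to_trans_unit(i, msgid, msgstr, ref))
    return root


xliff = build_xliff(
    [("Open file", "Ouvrir le fichier", "src/main.c:42")],
    original="messages.po",
)
xml_text = ET.tostring(xliff, encoding="unicode")
```

Even this small example shows where the two formats differ in shape: PO identifies a message by its msgid, whereas XLIFF requires an explicit id on each trans-unit, so a converter has to decide how such identifiers are generated.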
Building on this foundation, we then examine whether the adoption
of other localisation standards may further improve OSS localisation
practices. This evaluation considers the handling of terminology,
translation memories and localisation workflows — areas that
have up until now only been addressed on an ad hoc basis
in open source localisation. We will investigate how successfully
standards such as TMX and TBX might be incorporated into OSS
localisation processes, and discuss the need for a service-based architecture,
evaluating emerging standards such as TWS in open source
localisation.
We have chosen to provide detailed coverage of the case for
XLIFF adoption, but only a high-level overview of the case for other
standards such as TMX, TBX and TWS. This is a natural focus at
this time, as XLIFF is currently the only format for which there is an
existing ‘equivalent’ in OSS localisation processes. Usage of
Translation Memory technologies, structured Terminology
Databases and service-based architectures is very limited in the open
source community, as open source tools for these technologies do
not exist at present. Thus, when we present models incorporating
these standards, we build on best practices from the wider
industry and not current practices from OSS localisation alone.
This research is a first look at how open source can benefit from
standards-based localisation formats; hence, much of the focus of
the paper is to identify areas needing further research. Nevertheless,
this paper builds a solid foundation for further research, providing a
solution for representing the PO format in XLIFF, and arguing the
merits of XLIFF as a common resource format in all open source
localisation processes.
This paper is organised as follows: in Section 2 we provide a thorough
survey of the relevant localisation an