Frequently Asked Questions - Globalese Neural Machine Translation System

What are the supported language combinations?

The supported language combinations are listed on the Features page.

Can I use terminology lists in my engines?

Yes, absolutely. You can add a terminology list to your engine. The terms and phrases listed in the terminology will be given priority during translation.

Do I need to do any preparation on my training data?

If you are using standard formats such as TMX or XLIFF, no technical cleanup of the training data is required, and there is no need to remove tags. However, the training data should be relevant, clean, and consistent from a linguistic perspective, so a linguistic cleanup is always recommended.

Does the size of the training data matter?

Yes and no. More training data usually improves quality, but quantity is not everything. It is just as important to use relevant and consistent content. Once you meet the minimum volume requirements, the quality of the data matters more than adding volume. There is no point in enlarging the training data set with unknown or unvetted content. In some cases, removing obsolete content from the training data can even improve quality.

Is my training data safe?

Yes, your all-important training data is safe. Each Globalese system is a dedicated, single-tenant system. The training data is used only to train your engines, run your translations, and improve the quality of your engines. We do not provide or sell your data to third parties, nor do we use it to improve the engines of any other customer.

What are the advantages of Globalese compared to other MT solutions?

The advantages include custom engines, AI-boosted engines, terminology support, custom prompts, tag handling, and, last but not least, the price-to-quality ratio. However, this is only our opinion. We highly recommend that you sign up for a free trial so that you can judge for yourself.

What are the supported formats?

We natively support the most important standard bilingual formats: XLIFF, TMX, TBX, and bilingual CAT tool files. Other formats can be supported through CAT tool or API integration.

How long does it take to train an engine?

This depends on the size of the training data, the engine, and the type of training. Stock+ engines can be trained quickly, usually in half an hour to a few hours. The initial training of a domain-adapted engine takes longer, typically between 24 and 32 hours, whereas a quick training takes only a few hours.

What is the difference between a full and a quick training?

A full training always starts from scratch, so it takes longer. In exchange, the engine learns the content of the new or updated master corpora more thoroughly. A quick training runs faster, but it is better thought of as tuning the existing engine on the new or updated master corpora than as a full training.

When should I run a full training and when a quick training?

A full retraining of an engine is recommended when the master corpus has changed significantly. For example, run a full training if a large volume of new master corpus data is available (more than 10% of the existing master corpus size) or if the terminology in the master training data has been changed. If only a small amount of new training data has been added to the master corpus, a quick training is enough.
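
As a rough illustration, the rule of thumb above can be written as a small Python helper. The function name, its parameters, and the use of segment counts as the size unit are assumptions made for this sketch; they are not part of Globalese or its API.

```python
def recommend_training(existing_size: int,
                       new_size: int,
                       terminology_changed: bool = False,
                       threshold: float = 0.10) -> str:
    """Return "full" or "quick" based on the rule of thumb in this FAQ.

    existing_size and new_size are hypothetical size measures of the
    master corpus (for example, segment counts). The 10% threshold comes
    from the FAQ text; everything else here is illustrative.
    """
    # With no existing master corpus there is nothing to tune, so train fully.
    if existing_size <= 0:
        return "full"
    # A terminology change in the master training data calls for a full training.
    if terminology_changed:
        return "full"
    # More than 10% new data relative to the existing master corpus: full training.
    if new_size > threshold * existing_size:
        return "full"
    # Otherwise a quick training is enough.
    return "quick"
```

For example, adding 15,000 segments to a 100,000-segment master corpus would yield "full", while adding 5,000 segments without terminology changes would yield "quick".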

How often should I retrain an engine?

This depends largely on how often new training data becomes available for the engine's master corpus. For a frequently used engine whose training data grows quickly, a typical cycle is one full retraining per month, with weekly quick trainings in between. For less frequently used engines, a typical cycle is one full retraining every three months, with one quick training per month.

How can I measure the quality of an engine?

There are several options for measuring the quality of an engine. The most common method is to use automated metrics such as BLEU or chrF. These compare the MT output with a human reference translation and produce a score. Most CAT tools offer such measurements. However, these metrics do not measure translation quality directly. They only score how close the MT output is to the reference translation. For example, if the MT output uses a linguistically valid synonym or a slightly different word order, the automated metrics will penalize it, even though the translation might be as good as, or even better than, the reference. Therefore, we recommend at least spot checks by human translators or post-editors.
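
To show why a valid synonym lowers the score, here is a minimal sketch of a simplified character n-gram F-score in the spirit of chrF. It is not the official chrF implementation and is not what Globalese or any CAT tool runs: the function names, the n-gram order of 3, and the whitespace handling are simplifying assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of the text with spaces removed."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-like score between 0.0 and 1.0 (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta score: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


reference = "The car is fast"
same_wording = "The car is fast"
synonyms = "The automobile is quick"

score_same = simple_chrf(same_wording, reference)
score_synonyms = simple_chrf(synonyms, reference)
```

The sentence with synonyms scores lower than the identical sentence even though both convey the same meaning, which is why a human spot check is still needed.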

What is the difference between the Document Translation and the Cloud Text Translation service?

The Document Translation service is an asynchronous translation service which can be used for batch pre-translation of files via the browser-based UI, CAT tool plugins or the API.

The Cloud Text Translation service is a synchronous translation service with auto-scaling. It can be used to deliver MT matches on the fly while you translate in a CAT tool, to run pre-translation from within your CAT tool, or via the API for any synchronous use case.

The two services are priced differently and use different plugins and API integrations.

What is the meaning of life?