tJapaneseTransliterate
Converts textual data in Japanese to kana and Latin scripts.
Transliteration is a phonetic operation: the tJapaneseTransliterate
component attempts to create, in kana characters or Roman characters
(rōmaji), an equivalent of the original textual data based on the sounds
the string represents.
The modern Japanese writing system uses a combination of kanji (Chinese characters) and
syllabic kana (hiragana and katakana). For the benefit of non-Japanese speakers who
cannot read kanji or kana, romanization systems have been developed to write the
Japanese language in Latin script.
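To make the mix of scripts concrete, here is a minimal Python sketch that classifies each character of a string by Unicode block. It is only an illustration and is not part of the component: the function names `script_of` and `scripts_in` are hypothetical, and the kanji range covers only the main CJK Unified Ideographs block.

```python
def script_of(ch):
    """Classify one character by Unicode block (a rough heuristic)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs (main block only)
        return "kanji"
    return "other"


def scripts_in(text):
    """Return the distinct scripts found in text, in order of first appearance."""
    seen = []
    for ch in text:
        script = script_of(ch)
        if script not in seen:
            seen.append(script)
    return seen


# A single short phrase can combine all three scripts.
print(scripts_in("東京タワーに行く"))
```

A check like this only tells you which scripts are present. Deciding how a kanji is read requires dictionary-based analysis, which this sketch does not attempt.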
The component can transliterate Japanese text to kana characters or rōmaji (Roman characters):
- Kana characters
  - Hiragana
  - Katakana reading
  - Katakana pronunciation
- Rōmaji
  - Revised Hepburn: This is the most widely used romanization system.
  - Kunrei-shiki: This romanization system has been standardized by the Japanese Government and the International Organization for Standardization as ISO 3602. It is a modified version of the Nihon-shiki system for modern standard Japanese.
  - Nihon-shiki: This is the most regular romanization system because it maintains a one-to-one correspondence between kana and rōmaji.
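The differences between the three romanization systems can be sketched with a small lookup table. The Python example below is an illustration only and does not reflect how tJapaneseTransliterate is implemented: the names `DIFFERENCES`, `COMMON` and `romanize` are hypothetical, the table covers just a handful of kana, and it ignores long vowels, small kana, doubled consonants and kanji.

```python
# Kana whose romanization differs between systems.
# Tuple order: (Revised Hepburn, Kunrei-shiki, Nihon-shiki)
DIFFERENCES = {
    "し": ("shi", "si", "si"),
    "ち": ("chi", "ti", "ti"),
    "つ": ("tsu", "tu", "tu"),
    "ふ": ("fu", "hu", "hu"),
    "じ": ("ji", "zi", "zi"),
    "ぢ": ("ji", "zi", "di"),   # only Nihon-shiki keeps ぢ distinct from じ
    "づ": ("zu", "zu", "du"),   # only Nihon-shiki keeps づ distinct from ず
}

# A few kana that all three systems romanize the same way.
COMMON = {"く": "ku", "み": "mi", "す": "su"}

SYSTEMS = ("hepburn", "kunrei", "nihon")


def romanize(kana, system):
    """Romanize a hiragana string one character at a time (illustrative subset)."""
    column = SYSTEMS.index(system)  # raises ValueError for an unknown system
    parts = []
    for ch in kana:
        if ch in DIFFERENCES:
            parts.append(DIFFERENCES[ch][column])
        else:
            # Unknown characters are passed through unchanged.
            parts.append(COMMON.get(ch, ch))
    return "".join(parts)


for system in SYSTEMS:
    print(system, romanize("つづく", system))
```

With this table, つづく becomes tsuzuku in Revised Hepburn, tuzuku in Kunrei-shiki and tuduku in Nihon-shiki. Only the Nihon-shiki output lets a reader recover the original kana unambiguously, which is what the one-to-one correspondence refers to.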
In local mode, Apache Spark 1.6.0, 2.3.0 and 2.4.0 are supported.
Depending on the Talend product you are using, this component can be used in one, some or all of the following
Job frameworks:
- Standard: see tJapaneseTransliterate Standard properties.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
- Spark Batch: see tJapaneseTransliterate properties for Apache Spark Batch.
The component in this framework is available in all Talend Platform products with Big Data and in Talend Data Fabric.
- Spark Streaming: see tJapaneseTransliterate properties for Apache Spark Streaming.
The component in this framework is available in Talend Real Time Big Data Platform and in Talend Data Fabric.
- tJapaneseTransliterate Standard properties
These properties are used to configure tJapaneseTransliterate running in the Standard Job framework.
- Transliterating Japanese text
- tJapaneseTransliterate properties for Apache Spark Batch
These properties are used to configure tJapaneseTransliterate running in the Spark Batch Job framework.
- tJapaneseTransliterate properties for Apache Spark Streaming
These properties are used to configure tJapaneseTransliterate running in the Spark Streaming Job framework.