tJapaneseTokenize properties for Apache Spark Batch
These properties are used to configure tJapaneseTokenize running in the Spark Batch Job framework.
The Spark Batch tJapaneseTokenize component belongs to the Data Quality family.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
Schema and Edit Schema
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component.
Built-In: You create and store the schema locally for this component only.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
The columns from the output schema are added to this table automatically.
For each of the schema columns containing Japanese text to be tokenized, select the corresponding check box.
You can select the check box in the header row to select all schema columns.
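To make the effect of tokenization concrete, the sketch below shows the kind of transformation applied to a selected column: continuous Japanese text is split into words. This is an illustrative toy tokenizer with a hypothetical mini-dictionary, not Talend's actual tokenization engine, which relies on full morphological analysis.

```python
# Illustrative only: a toy greedy longest-match tokenizer showing the kind
# of transformation tJapaneseTokenize performs on a column value.
# DICTIONARY is a hypothetical mini-lexicon; real Japanese tokenizers use
# large morphological dictionaries.
DICTIONARY = {"私", "は", "東京", "に", "住んで", "います"}

def tokenize(text):
    """Segment text by greedily matching the longest dictionary entry."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest dictionary entry starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in DICTIONARY:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(" ".join(tokenize("私は東京に住んでいます")))
# → 私 は 東京 に 住んで います
```

In a Job, the tokenized column carries this space-delimited form downstream, which is what makes the text usable by components that expect word boundaries.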
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
This component is usually used as an intermediate component, and it requires an input component and an output component.
In the Spark Configuration tab in the Run view, define the connection to a given Spark cluster for the whole Job. In addition, since the Job expects its dependent jar files for execution, you must specify the directory in the file system to which these jar files are transferred so that Spark can access these files.
This connection is effective on a per-Job basis.
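For comparison, the sketch below shows how the same jar-transfer requirement looks with plain spark-submit, outside the Talend Studio UI. All paths, file names, and the staging directory are placeholder assumptions, not values taken from this document; Talend generates and submits the Job for you, so this is only an analogy.

```shell
# Hedged sketch: submitting a Spark job whose dependent jars must be
# made accessible to the cluster. Paths and jar names are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /tmp/job_deps/dep1.jar,/tmp/job_deps/dep2.jar \
  --conf spark.yarn.stagingDir=hdfs:///user/talend/staging \
  my-job.jar
```

The directory you configure in the Spark Configuration tab plays the same role as the staging location here: it is where the Job's dependencies are placed so that the Spark cluster can read them at execution time.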