tSparkStore

Warning

This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend solutions with Big Data.

tSparkStore properties

Component family	Big Data / Spark
Function	tSparkStore uses the Spark connection created by a given tSparkConnection component and writes the datasets it receives from its preceding Spark component into a specific target, such as an HDFS system.
Purpose	tSparkStore ends the Spark process you are designing and writes the processed datasets into a specific file system.
Basic settings	Spark connection	Select the Spark connection component to be used from the drop-down list in order to reuse the connection created by that component.
	Schema and Edit Schema	A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: View schema: choose this option to view the schema only. Change to built-in property: choose this option to change the schema to Built-in for local changes. Update repository connection: choose this option to change the schema stored in the Repository and decide, upon completion, whether to propagate the changes to all the Jobs. If you want to propagate the changes to the current Job only, select No upon completion and choose this schema metadata again in the [Repository Content] window.
	Storage target	Select the type of the target system to which the processed data is written. Local: this option is available only when you have selected the Local mode in the tSparkConnection component. It allows the Job to write data to the local machine on which the Job is executed. Note that the Local mode of tSparkStore works only on Linux. HDFS: the data is written to an HDFS system. You need to provide the URI of the NameNode service of this HDFS system in the NameNode field that is displayed. Custom: the data is written to a system that is not yet officially supported by the Spark components. In this situation, you need to use the protocol recognized by that system. Note that the connection to this custom distribution should be configured in the tSparkConnection component.
	Result folder URI	Enter the directory in the target system to which the data is written. Along with this location, you need to set the following parameter for the target data: Field separator: enter the field separator to be used in the data to be written. Note that the target file system cannot be a Windows file system.
	Remove result directory if exists	When the Storage target to be used is Local, you can select this check box to remove the result folder if it already exists.
Advanced settings	tStatCatcher Statistics	Select this check box to collect log data at the component level.
Global Variables	ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable is available only when the Die on error check box is cleared, for components that have this check box. A Flow variable is available during the execution of a component, while an After variable is available after the execution of the component. To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it. For further information about variables, see the Talend Studio User Guide.
Usage	This component is the end component of a Spark process.
Limitations	It is strongly recommended to use this component in a Spark-only Job, that is to say, to design and run a Spark Job separately from non-Spark components or Jobs. For example, it is not recommended to use the tRunJob component to coordinate a Spark Job and a non-Spark Job, or to use the tHDFSPut component along with the Spark components in the same Job.
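
The Result folder URI, Field separator and Remove result directory if exists settings described above can be pictured with a minimal Java sketch for the Local storage target. This is an illustration only, not the code that tSparkStore generates: the class name ResultFolderSketch, its methods, and the part-00000 file name (borrowed from the usual Spark output naming) are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: mimics the settings of the Local storage target.
final class ResultFolderSketch {

    // Joins the fields of one row with the configured field separator.
    static String toLine(List<String> fields, String fieldSeparator) {
        return String.join(fieldSeparator, fields);
    }

    // Writes all rows into <resultFolder>/part-00000 (file name is an assumption).
    // If the folder exists, it is deleted first when removeIfExists is true;
    // otherwise the write is refused.
    static Path writeRows(Path resultFolder, List<List<String>> rows,
                          String fieldSeparator, boolean removeIfExists) throws IOException {
        if (Files.exists(resultFolder)) {
            if (!removeIfExists) {
                throw new FileAlreadyExistsException(resultFolder.toString());
            }
            // Delete children before parents by walking in reverse path order.
            try (Stream<Path> walk = Files.walk(resultFolder)) {
                List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                       .collect(Collectors.toList());
                for (Path p : paths) {
                    Files.delete(p);
                }
            }
        }
        Files.createDirectories(resultFolder);
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(toLine(row, fieldSeparator));
        }
        Path part = resultFolder.resolve("part-00000");
        Files.write(part, lines, StandardCharsets.UTF_8);
        return part;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical result folder under a temporary directory.
        Path folder = Files.createTempDirectory("tsparkstore_demo").resolve("result");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Paris", "FR"),
                Arrays.asList("2", "Berlin", "DE"));
        Path part = writeRows(folder, rows, ";", true);
        System.out.println("Wrote " + Files.readAllLines(part).size() + " rows to " + part);
    }
}
```

In this sketch, a second run against the same folder fails unless removeIfExists is true, which is the situation the Remove result directory if exists check box is meant to handle.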

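As a hedged illustration of the Global Variables entry, the sketch below reads ERROR_MESSAGE from a map, assuming the common Talend convention that a component's variables are stored in the Job's globalMap under the key componentLabel_VARIABLE. The component label tSparkStore_1, the sample message and the class GlobalVariableSketch are hypothetical, and a plain HashMap stands in for the globalMap that Talend Studio provides inside a Job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: shows how an After variable could be looked up by key.
final class GlobalVariableSketch {

    // Returns the ERROR_MESSAGE of the given component, or null if none was set.
    // Assumes the key convention "<componentLabel>_ERROR_MESSAGE".
    static String errorMessageOf(Map<String, Object> globalMap, String componentLabel) {
        Object value = globalMap.get(componentLabel + "_ERROR_MESSAGE");
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap available in a Talend Job.
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical value, as if set after the component finished with an error.
        globalMap.put("tSparkStore_1_ERROR_MESSAGE", "Result folder is not writable");

        String message = errorMessageOf(globalMap, "tSparkStore_1");
        System.out.println(message != null ? message : "no error recorded");
    }
}
```

Because ERROR_MESSAGE is an After variable, a lookup like this only makes sense in a component that runs after tSparkStore has finished.
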
Related scenario

No scenario is available for this component yet.

Source: Talend documentation, https://help.talend.com