tTikaExtractor – Talend Custom Components – Talend Skill

tTikaExtractor
Name:	tTikaExtractor
Icon:
Author:	Fxp
Resources:
Download:	tTikaExtractor version 0.1 – Direct download
Install Instructions:
Example:	Coming soon…
Features:

Overview:
tTikaExtractor uses the Apache Tika parser to extract information from many different formats (HTML, PDF, DOC, ODT, image, audio, video, …). See http://tika.apache.org/1.0/formats.html for more information about the available parsers. ![screenshot](https://talendforge.org/exchange/tos/upload_tos/extension-475/screenshot.jpg)
Release Notes:
Release version: 0.1 – 2012-01-25 17:03:59
This first release parses any document supported by the Tika parser and provides properties for further processing:
* Tika Metadata object (METADATA_OBJ property)
* Tika metadata as text (METADATA property)
* Resource content as text (CONTENT property)
* Resource content as XHTML (CONTENT_XHTML property), which can be used in tExtractXMLField for further extraction

If you have trouble parsing some formats, download the complete tika-app jar file from http://tika.apache.org/download.html and use it to replace the one included in this package. The bundled jar was modified so that the component could be uploaded to Talend Exchange, which probably has a size limit of around 18 MB.
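To illustrate the kind of follow-up extraction the CONTENT_XHTML property allows, here is a minimal plain-Java sketch that pulls fields out of an XHTML string with XPath. It is not the implementation of tExtractXMLField or of this component. The class name `XhtmlFieldSketch`, the `extract` helper, and the sample document are all hypothetical, and the sketch assumes the XHTML declares the standard `http://www.w3.org/1999/xhtml` namespace, which is why it matches elements by local name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlFieldSketch {

    // Hypothetical stand-in for a CONTENT_XHTML value; real output depends on the parsed document.
    static final String SAMPLE_XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Sample report</title></head>"
        + "<body><p>First paragraph.</p><p>Second paragraph.</p></body>"
        + "</html>";

    // Parse the XHTML string into a namespace-aware DOM document.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    // Return the text of every element with the given local name, in document order.
    // Matching on local-name() avoids having to register the XHTML namespace prefix.
    static List<String> extract(String xhtml, String elementName) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//*[local-name()='" + elementName + "']";
        NodeList nodes = (NodeList) xpath.evaluate(expression, parse(xhtml), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(SAMPLE_XHTML, "title")); // prints [Sample report]
        System.out.println(extract(SAMPLE_XHTML, "p"));     // prints [First paragraph., Second paragraph.]
    }
}
```

In a Talend job the equivalent step would be configured in tExtractXMLField with a loop XPath query and field mappings rather than written by hand; the sketch only shows why an XHTML rendering is convenient for structured extraction.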
Compatible:
5.0 (obsolete), 5.2 (obsolete), 5.3 (obsolete), 5.4 (obsolete), 6.0 (obsolete), 6.1 (obsolete), 6.2 (obsolete)

Document retrieved from Talend Exchange



July 30, 2023
