August 15, 2023

Extracting the hashtag field from the raw Tweet data – Docs for ESB 6.x

  1. Double-click tExtractJSONFields to open its Component view.

    The raw Tweet data is in JSON format, as described at https://dev.twitter.com/overview/api/entities-in-twitter-objects#hashtags.
  2. Click Sync columns to retrieve the schema from the preceding component. This is the read-only schema of tKafkaInput, because tWindow does not change the schema.
  3. Click the […] button next to Edit schema to open the schema editor.

  4. Rename the single column of the output schema to hashtag. This column is used to carry the hashtag field extracted from the Tweet JSON data.
  5. Click OK to validate these changes.
  6. From the Read by list, select JsonPath.
  7. From the JSON field list, select the column of the input schema from which you need to extract fields. In this scenario, it is payload.
  8. In the Loop Jsonpath query field, enter the JSON path pointing to the element over which the extraction loops. Based on the JSON structure of a Tweet described in the Twitter documentation, enter $.entities.hashtags to loop over the hashtags entity.
  9. In the Mapping table, the hashtag column of the output schema has been filled in automatically. In the Json query column, enter the element to extract. In this example, this is the text attribute of each hashtags entity, so enter text within double quotation marks.
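
To make the effect of the loop query and the mapping more concrete, the following is a minimal Python sketch of the same extraction performed outside Talend. It is not how tExtractJSONFields is implemented. The sample payload is a hypothetical, heavily trimmed Tweet, and the function name extract_hashtags is an assumption made for illustration. The sketch walks to entities.hashtags (the loop element) and reads the text attribute of each entry into a single hashtag column.

```python
import json

# Hypothetical, heavily trimmed Tweet payload; a real Tweet carries many more fields.
SAMPLE_PAYLOAD = json.dumps({
    "text": "Streaming with #Kafka and #Talend",
    "entities": {
        "hashtags": [
            {"text": "Kafka", "indices": [15, 21]},
            {"text": "Talend", "indices": [26, 33]},
        ]
    },
})


def extract_hashtags(payload):
    """Emulate the loop query $.entities.hashtags with a "text" mapping."""
    tweet = json.loads(payload)
    # Loop element: each item under entities.hashtags (empty list if absent).
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    # Mapping: the "text" attribute feeds the single "hashtag" output column.
    return [{"hashtag": tag.get("text")} for tag in hashtags]


rows = extract_hashtags(SAMPLE_PAYLOAD)
for row in rows:
    print(row["hashtag"])
```

Each hashtag in the payload becomes one output row, which mirrors how looping over the hashtags entity yields one record per hashtag rather than one record per Tweet.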

Source: Talend documentation, https://help.talend.com