Warning

This component will be available in the Palette of the studio on the condition that you have subscribed to one of the Talend solutions with Big Data.
Component family
Big Data / Google BigQuery

Function
This component writes the data it receives in a user-specified file and transfers the file to Google BigQuery.

Purpose
This component transfers the data provided by its preceding component to Google BigQuery.
Basic settings

Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions. Click Edit schema to make changes to the schema.
Property type

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.
Local filename
Browse to, or enter the path to the file you want to write the received data in.
Append
Select this check box to add rows to the existing data in the file.
Connection

Client ID and Client secret
Paste the client ID and the client secret, both created and viewable on the API Access tab view of the project hosting the BigQuery service you need to use. To enter the client secret, click the [...] button next to the Client secret field and enter it in the dialog box that opens.
Project ID
Paste the ID of the project hosting the BigQuery service you need to use. The default ID of this project can be found in the URL of the Google API Console, or by hovering your mouse pointer over the name of the project in the BigQuery Browser Tool.
Authorization code
Paste the authorization code provided by Google for the access you are building. To obtain the authorization code, execute the Job using this component and, when it pauses to print a URL address in the console, navigate to that address in your web browser and copy the code displayed.
Dataset
Enter the name of the dataset you need to transfer data to.
Table
Enter the name of the table you need to transfer data to. If this table does not exist, select the Create the table if it doesn't exist check box.
Action on data
Select the action to be performed from the drop-down list when transferring data to the target table, for example, Truncate to empty the contents of the table and repopulate it with the transferred data.
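As a point of reference, the BigQuery load API expresses this choice as a write disposition. The sketch below, using the google-cloud-bigquery Java client, shows one plausible mapping; it illustrates the underlying API rather than the component's actual code, and the action names other than Truncate are assumptions.

```java
import com.google.cloud.bigquery.JobInfo.WriteDisposition;

public class ActionOnData {
    // Hypothetical mapping from the component's "Action on data" choice to a
    // BigQuery load-job write disposition; only Truncate is named on this page.
    static WriteDisposition toDisposition(String action) {
        switch (action) {
            case "Truncate": return WriteDisposition.WRITE_TRUNCATE; // empty the table, then load
            case "Append":   return WriteDisposition.WRITE_APPEND;   // add to the existing rows
            default:         return WriteDisposition.WRITE_EMPTY;    // load only if the table is empty
        }
    }
}
```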
Google storage configuration

Access key and Secret key
Paste the authentication information obtained from Google for making requests to Google Cloud Storage. These keys can be consulted on the Interoperable Access tab view under the Google Cloud Storage tab of the project. To enter the secret key, click the [...] button next to the Secret key field and enter it in the dialog box that opens.
Bucket
Enter the name of the bucket, the Google Cloud Storage container, that holds the data to be transferred.
File
Enter the directory of the data stored on Google Cloud Storage and to be transferred to BigQuery. If the data is not on Google Cloud Storage, this directory is used as the intermediate destination of the data before it is transferred to BigQuery.
Header
Set values to ignore the header of the transferred data. For example, enter 0 to ignore no rows.
Die on error
This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings

token properties File Name
Enter the path to, or browse to the refresh token file you need to use. At the first Job execution using the Authorization code, this file is created and the refresh token is stored in it; subsequent executions read the file, so a new authorization code is not required. With only the token file name entered, Talend Studio considers the directory of that token file to be the root of the Studio folder.

For further information about the refresh token, see the manual of Google BigQuery.
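For context, the refresh token stored in this file is what lets later executions obtain a new short-lived access token without a fresh authorization code. The sketch below shows that exchange at the plain OAuth 2.0 protocol level; the endpoint and parameters are standard Google OAuth 2.0, CLIENT_ID, CLIENT_SECRET and REFRESH_TOKEN are placeholders you must supply, and this is not the component's internal code.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class RefreshTokenExchange {
    public static void main(String[] args) throws Exception {
        // Standard OAuth 2.0 refresh-token grant against Google's token endpoint.
        String body = "client_id=" + URLEncoder.encode("CLIENT_ID", "UTF-8")
                + "&client_secret=" + URLEncoder.encode("CLIENT_SECRET", "UTF-8")
                + "&refresh_token=" + URLEncoder.encode("REFRESH_TOKEN", "UTF-8")
                + "&grant_type=refresh_token";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://accounts.google.com/o/oauth2/token").openConnection();
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // The JSON response contains the short-lived access_token used for API calls.
        try (Scanner s = new Scanner(conn.getInputStream(), "UTF-8")) {
            System.out.println(s.useDelimiter("\\A").next());
        }
    }
}
```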
Field Separator
Enter a character, string, or regular expression to separate fields for the transferred data.
Create directory if not exists
Select this check box to create the directory you defined in the File field for Google Cloud Storage, if it does not exist.
Custom the flush buffer size
Enter the number of rows to be processed before the memory is freed.
Check disk space
Select this check box to throw an exception during execution if the disk is full.
Encoding
Select the encoding from the list or select Custom and define it manually.
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string.

A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component. To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see the Talend Studio User Guide.
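For example, assuming the component's unique name in the Job is tBigQueryOutput_1 (a hypothetical name; use the one shown in your own Job), a tJava component placed after it could read this After variable from the Job's globalMap:

```java
// Code for a tJava component; globalMap is provided by Talend's generated Job code.
String errorMessage = (String) globalMap.get("tBigQueryOutput_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("tBigQueryOutput reported: " + errorMessage);
}
```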
Usage
This is an output component used at the end of a Job. It receives data from its preceding component, writes it in a local file, and transfers it to Google BigQuery.
Limitation |
N/A |
Scenario: Writing data in Google BigQuery

This scenario uses two components to write data in Google BigQuery.
- In the Integration perspective of Talend Studio, create an empty Job, named WriteBigQuery for example, from the Job Designs node in the Repository tree view. For further information about how to create a Job, see the Talend Studio User Guide.
- Drop tRowGenerator and tBigQueryOutput onto the workspace. The tRowGenerator component generates the data to be transferred to Google BigQuery in this scenario. In a real-world case, you can use other components such as tMysqlInput or tMap in the place of tRowGenerator to design a sophisticated process to prepare your data to be transferred.

- Connect them using the Row > Main link.
- Double-click tRowGenerator to open its Component view.

- Click RowGenerator Editor to open the editor.
- Click the [+] button three times to add three rows to the Schema table.

- In the Column column, enter a name of your choice for each of the new rows, for example, fname, lname and States.

- In the Functions column, select TalendDataGenerator.getFirstName for the fname row, TalendDataGenerator.getLastName for the lname row, and TalendDataGenerator.getUsState for the States row.

- In the Number of Rows for RowGenerator field, enter, for example, 100 to define the number of rows to be generated.

- Click OK to validate these changes.
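Outside the Studio, this tRowGenerator setup amounts to 100 random fname;lname;States records. The standalone sketch below imitates it with small, made-up name and state pools (the actual TalendDataGenerator data sets are larger) and writes a semicolon-separated CSV; the file name mirrors this scenario but is otherwise arbitrary.

```java
import java.io.PrintWriter;
import java.util.Random;

public class GenerateSampleRows {
    public static void main(String[] args) throws Exception {
        // Illustrative stand-ins for TalendDataGenerator.getFirstName/getLastName/getUsState.
        String[] first = {"Andrew", "Bill", "Teresa", "Mark"};
        String[] last  = {"Adams", "Franklin", "Garcia", "Smith"};
        String[] state = {"Alabama", "Ohio", "Texas", "Utah"};
        Random rnd = new Random();
        try (PrintWriter out = new PrintWriter("biquery_UScustomer.csv", "UTF-8")) {
            for (int i = 0; i < 100; i++) {
                out.println(first[rnd.nextInt(first.length)] + ";"
                        + last[rnd.nextInt(last.length)] + ";"
                        + state[rnd.nextInt(state.length)]);
            }
        }
    }
}
```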
Building access to BigQuery
- Double-click tBigQueryOutput to open its Component view.

- Click Sync columns to retrieve the schema from its preceding component.

- In the Local filename field, enter the directory where you need to create the file to be transferred to BigQuery.

- Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.
- Click the API Access tab to open its view.

- In the Component view of the Studio, paste the client ID, the client secret and the project ID from the API Access tab view into the corresponding fields.

- In the Dataset field, enter the dataset you need to transfer data to. In this scenario, it is documentation. This dataset must exist in BigQuery.
- In the Table field, enter the name of the table you need to write data in, for example, UScustomer. If this table does not exist in the BigQuery dataset you are using, select Create the table if it doesn't exist.

- In the Action on data field, select the action. In this example, select Truncate to empty the contents, if there are any, of the target table and to repopulate it with the transferred data.
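To make the Truncate action concrete, here is what an equivalent load request looks like with the current google-cloud-bigquery Java client. The component's own implementation is not shown in this documentation, so treat this purely as an illustrative sketch; the dataset, table and Cloud Storage URI values are taken from this scenario, and the CsvOptions mirror the component's Field Separator and Header settings.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class LoadIntoBigQuery {
    public static void main(String[] args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        CsvOptions csv = CsvOptions.newBuilder()
                .setFieldDelimiter(";")   // the component's Field Separator
                .setSkipLeadingRows(0)    // the component's Header value
                .build();
        LoadJobConfiguration load = LoadJobConfiguration
                .newBuilder(TableId.of("documentation", "UScustomer"),
                        "gs://talend/documentation/biquery_UScustomer.csv")
                .setFormatOptions(csv)
                .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE) // Truncate action
                .build();
        Job job = bigquery.create(JobInfo.of(load)).waitFor();
        if (job == null || job.getStatus().getError() != null) {
            throw new RuntimeException("Load job failed: "
                    + (job == null ? "job no longer exists" : job.getStatus().getError()));
        }
    }
}
```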
Building access to Cloud Storage
- Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.

- Click Google Cloud Storage > Interoperable Access to open its view.

- In the Component view of the Studio, paste the access key and the secret key from the Interoperable Access tab view into the corresponding fields.
- In the Bucket field, enter the path to the bucket you want to store the transferred data in. In this example, it is talend/documentation. This bucket must exist in Cloud Storage.
- In the File field, enter the directory in Google Cloud Storage where you receive and create the file to be transferred to BigQuery. In this example, it is gs://talend/documentation/biquery_UScustomer.csv. The file name must be the same as the one you defined in the Local filename field.

  Troubleshooting note: if you encounter issues such as Unable to read source URI for the file stored in Google Cloud Storage, check whether you have put the same file name in these two fields.

- Enter 0 in the Header field to ignore no rows in the transferred data.
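The component performs this staging transfer itself; the sketch below simply illustrates what such an upload looks like with the google-cloud-storage Java client (an assumed, modern library choice, not part of the component). The bucket and object names reuse this scenario's gs://talend/documentation/biquery_UScustomer.csv URI.

```java
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Files;
import java.nio.file.Paths;

public class UploadToCloudStorage {
    public static void main(String[] args) throws Exception {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        byte[] content = Files.readAllBytes(Paths.get("biquery_UScustomer.csv"));
        // Maps to gs://talend/documentation/biquery_UScustomer.csv
        BlobInfo blob = BlobInfo
                .newBuilder("talend", "documentation/biquery_UScustomer.csv")
                .build();
        storage.create(blob, content);
    }
}
```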
Getting Authorization code
- In the Run view of Talend Studio, click Run to execute this Job. The execution pauses at a given moment to print out in the console the URL address used to get the authorization code.

- Navigate to this address in your web browser and copy the authorization code displayed.

- In the Component view of tBigQueryOutput, paste the authorization code in the Authorization code field.