# TaDataxWriter Plug-in
# I. Introduction
Ta-Datax-Writer is a DataX plug-in for writing data: it transfers data to TA clusters within the DataX ecosystem. You can deploy DataX on a data transmission server and pair any reader plug-in supported by DataX with this writer plug-in, thus achieving data synchronization between multiple data sources and TA clusters.
To learn more about DataX, you can visit DataX's GitHub homepage.
The data is sent to the TA receiver, which handles the data transfer.
# II. Functions and Limitations
TaDataWriter converts data from the DataX protocol into the internal data format of TA clusters. TaDataWriter has the following functions:
- Supports writing to TA clusters only.
- Supports data compression. Available compression formats: gzip, lzo, lz4, snappy.
- Supports multi-threaded transmission.
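The compression step is handled inside the plug-in, but its effect can be illustrated with a gzip round-trip in Python (an illustrative sketch only, not the plug-in's actual Java implementation):

```python
import gzip
import json

# A batch of events, serialized the way an event payload looks before upload.
events = [{"#distinct_id": "123123", "#event_name": "testbuy"}]
payload = json.dumps(events).encode("utf-8")

# Compress before transmission; the receiver decompresses transparently.
compressed = gzip.compress(payload)

# The round-trip recovers the original payload exactly.
assert gzip.decompress(compressed) == payload
```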
# III. Instructions for Use
# 3.1 Download DataX
- Visit the DataX official website
- Download the DataX toolkit:

```shell
wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
```

# 3.2 Decompress DataX

```shell
tar -zxvf datax.tar.gz
```
# 3.3 Install the ta-datax-writer plugin
- Download the ta-datax-writer plugin:

```shell
wget https://download.thinkingdata.cn/tools/release/ta-datax-writer.tar.gz
```

- Copy ta-datax-writer.tar.gz to datax/plugin/writer:

```shell
cp ta-datax-writer.tar.gz datax/plugin/writer
```

- Decompress the plugin package inside datax/plugin/writer:

```shell
cd datax/plugin/writer
tar -zxvf ta-datax-writer.tar.gz
```

- Delete the package:

```shell
rm -rf ta-datax-writer.tar.gz
```
# IV. Function Description
# 4.1 Sample configuration
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "value": "123123",
                "type": "string"
              },
              {
                "value": "testbuy",
                "type": "string"
              },
              {
                "value": "2019-08-16 08:08:08",
                "type": "date"
              },
              {
                "value": "2222",
                "type": "string"
              },
              {
                "value": "2019-08-16 08:08:08",
                "type": "date"
              },
              {
                "value": "test",
                "type": "bytes"
              },
              {
                "value": true,
                "type": "bool"
              }
            ],
            "sliceRecordCount": 10
          }
        },
        "writer": {
          "name": "ta-datax-writer",
          "parameter": {
            "thread": 3,
            "type": "track",
            "pushUrl": "http://{data receiving address}",
            "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
            "column": [
              {
                "index": "0",
                "colTargetName": "#distinct_id"
              },
              {
                "index": "1",
                "colTargetName": "#event_name"
              },
              {
                "index": "2",
                "colTargetName": "#time",
                "type": "date",
                "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
              },
              {
                "index": "3",
                "colTargetName": "#account_id",
                "type": "string"
              },
              {
                "index": "4",
                "colTargetName": "testDate",
                "type": "date",
                "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
              },
              {
                "index": "5",
                "colTargetName": "os_1",
                "type": "string"
              },
              {
                "index": "6",
                "colTargetName": "testBoolean",
                "type": "boolean"
              },
              {
                "colTargetName": "add_clo",
                "value": "123123",
                "type": "string"
              }
            ]
          }
        }
      }
    ]
  }
}
```
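Before submitting a job, the configuration can be sanity-checked with a few lines of Python (a minimal sketch; it only verifies JSON validity and the writer's required parameters from the description below):

```python
import json

# A minimal excerpt of a job file (writer section only), used to
# illustrate a quick structural check before running the job.
job_text = """
{
  "job": {
    "content": [
      {
        "writer": {
          "name": "ta-datax-writer",
          "parameter": {
            "type": "track",
            "pushUrl": "http://example-receiver",
            "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
            "column": []
          }
        }
      }
    ]
  }
}
"""

job = json.loads(job_text)
writer = job["job"]["content"][0]["writer"]

# The plug-in is selected by its registered name; type, pushUrl and appid are required.
assert writer["name"] == "ta-datax-writer"
for key in ("type", "pushUrl", "appid"):
    assert key in writer["parameter"], f"missing required parameter: {key}"
```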
# 4.2 Parameter description
- thread
  - Description: number of concurrent threads used within each channel; unrelated to the number of DataX channels.
  - Required: No
  - Default value: 3
- pushUrl
  - Description: data receiving (access point) address.
  - Required: Yes
  - Default value: none
- uuid
  - Description: adds "#uuid": "uuid value" to the transferred data; use together with the cluster's unique data ID function.
  - Required: No
  - Default value: false
- type
  - Description: type of data to write: track or user_set.
  - Required: Yes
  - Default value: none
- compress
  - Description: text compression type. If left unset, data is not compressed. Supported types: zip, lzo, lzop, tgz, bzip2.
  - Required: No
  - Default value: no compression
- appid
  - Description: the project's APP ID.
  - Required: Yes
  - Default value: none
- column
  - Description: list of fields to read. type specifies the data type; index specifies the corresponding reader column (starting from 0); value specifies a constant for the current column — the column does not read data from the reader but is generated automatically from the given value.
Users can specify the column field information, configured as follows:
```json
[
  {
    "type": "Number",
    "colTargetName": "test_col",  // name of the generated column
    "index": 0                    // read the first column from the DataX reader as a Number field
  },
  {
    "type": "string",
    "value": "testvalue",
    "colTargetName": "test_col"   // generate a constant string field with value "testvalue"
  },
  {
    "index": 0,
    "type": "date",
    "colTargetName": "testdate",
    "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
  }
]
```
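The column mapping described above can be sketched as follows (illustrative Python only; the plug-in itself is implemented in Java, and `map_columns` is a hypothetical name):

```python
# Sketch of how the column configuration maps a reader row to a TA record:
# "index" picks a column from the reader row, "value" emits a constant,
# and "colTargetName" names the resulting field.
def map_columns(row, columns):
    record = {}
    for col in columns:
        if "value" in col:
            # constant column: no data is read from the reader
            record[col["colTargetName"]] = col["value"]
        else:
            # positional column: index into the reader row (0-based)
            record[col["colTargetName"]] = row[int(col["index"])]
    return record

columns = [
    {"index": "0", "colTargetName": "#distinct_id"},
    {"colTargetName": "add_clo", "value": "123123", "type": "string"},
]
row = ["user-1", "testbuy"]
print(map_columns(row, columns))  # {'#distinct_id': 'user-1', 'add_clo': '123123'}
```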
# 4.3 Array type description
- When using the array type, the data on the read end must be of string type, separated by \t.
- Sample data on the read end: "aaa\tbbb\tccc\tddd"
- Converted result: ["aaa","bbb","ccc","ddd"]
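The conversion above can be reproduced with a simple tab split (illustrative Python):

```python
# A string column is split on tab characters to form the array value.
raw = "aaa\tbbb\tccc\tddd"   # sample data on the read end
converted = raw.split("\t")
print(converted)  # ['aaa', 'bbb', 'ccc', 'ddd']
```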
# 4.4 Type conversion
TaDataWriter converts types as follows:

| DataX internal type | TaDataWriter data type |
| --- | --- |
| Int | Number |
| Long | Number |
| Double | Number |
| String | String |
| Boolean | Boolean |
| Date | Date |
| Array | Array |
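The dateFormat values in the job file use Java SimpleDateFormat patterns. For reference, the pattern "yyyy-MM-dd HH:mm:ss.SSS" corresponds to the following Python equivalent (an illustrative sketch, not part of the plug-in):

```python
from datetime import datetime

# Java pattern "yyyy-MM-dd HH:mm:ss.SSS" maps to the strptime format below
# (%f consumes the fractional-second digits).
value = "2019-08-16 08:08:08.000"
parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.year, parsed.month, parsed.day)  # 2019 8 16
```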