# External User Property Association Import Function
# I. Introduction
In some cases, you need to import external user data into the TA cluster, but the user ID in that data is neither #account_id nor #distinct_id in the TA system; for example, the data uses a mobile phone number, ID card number, or another identifier as its primary key. To import such data into the TA system as user properties, use the update_user_by_foreignkey command to define the association and update the external user properties in the TA system. All data sources supported by DataX are currently supported.
# II. Instructions for Use
# 2.1 Command description
The command for data import is as follows:
ta-tool update_user_by_foreignkey -conf <config files> [--date xxx]
# 2.2 Command parameter description
# 2.2.1 -conf
The parameter is the path to the configuration file of the import task. Each task has its own configuration file, and multiple tasks can be imported at the same time. Wildcards are supported, for example /data/config/* or ./config/*.json
# 2.2.2 --date
Optional. This parameter specifies the data date, which is the reference time used when replacing time macros. The format is YYYY-MM-DD. If the parameter is omitted, the current date is used. For details on time macros, see section 2.3, Time macro usage.
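As an illustration of how the two parameters fit together, the following minimal Python sketch assembles the command line described above. The helper name `build_import_command` and the path `./config/import_users.json` are hypothetical and used only for illustration; only the `-conf` and `--date` parameters come from this document.

```python
import shlex
from datetime import date
from typing import List, Optional


def build_import_command(conf: str, data_date: Optional[date] = None) -> List[str]:
    """Assemble the argument list for `ta-tool update_user_by_foreignkey`.

    `conf` is the configuration file path; `data_date` is the optional data date.
    """
    args = ["ta-tool", "update_user_by_foreignkey", "-conf", conf]
    if data_date is not None:
        # --date expects YYYY-MM-DD; leave it out to fall back to the current date.
        args += ["--date", data_date.isoformat()]
    return args


# Hypothetical configuration path, shown only to illustrate the call.
cmd = build_import_command("./config/import_users.json", date(2018, 7, 1))
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run` on a machine where ta-tool is installed; the sketch itself only builds and prints the command.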
# 2.3 Time macro usage
You can use time macros in place of time parameters in the configuration file. ta-tool takes the import start time as the reference, calculates the time offset from the parameters of the time macro, and replaces the macro in the configuration file. The available time macro formats are @[{yyyyMMdd}], @[{yyyyMMdd}-{nday}], @[{yyyyMMdd}+{nday}], and so on.
- yyyyMMdd can be replaced with any date format that Java DateFormat can parse, for example yyyy-MM-dd HH:mm:ss.SSS or yyyyMMddHH000000
- n can be any integer and represents the size of the time offset
- day represents the unit of the time offset and can be one of: day, hour, minute, week, month
- Example: suppose the current time is 2018-07-01 15:13:23.234
  - @[{yyyyMMdd}] is replaced with 20180701
  - @[{yyyy-MM-dd}-{1day}] is replaced with 2018-06-30
  - @[{yyyyMMddHH}+{2hour}] is replaced with 2018070117
  - @[{yyyyMMddHHmm00}-{10minute}] is replaced with 20180701150300
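
The substitution rules above can be sketched in Python. This is not ta-tool's implementation, only an approximation under stated assumptions: it supports a small subset of Java pattern letters (yyyy, MM, dd, HH, mm, ss, SSS), treats everything else in the pattern as literal text, accepts only non-negative n, and clamps the day of month when shifting by months. The function names are hypothetical.

```python
import calendar
import re
from datetime import datetime, timedelta

# Subset of Java date pattern letters mapped to strftime directives (assumption).
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN_RE = re.compile(r"yyyy|MM|dd|HH|mm|ss|SSS")
_MACRO_RE = re.compile(
    r"@\[\{([^}]*)\}(?:([+-])\{(\d+)(day|hour|minute|week|month)\})?\]"
)


def _format(t: datetime, pattern: str) -> str:
    """Render t with a Java-style pattern; unknown characters stay literal."""
    def repl(m: re.Match) -> str:
        tok = m.group(0)
        if tok == "SSS":
            return f"{t.microsecond // 1000:03d}"  # milliseconds
        return t.strftime(_TOKENS[tok])
    return _TOKEN_RE.sub(repl, pattern)


def _shift_months(t: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day to the target month's length."""
    year, month0 = divmod(t.year * 12 + (t.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return t.replace(year=year, month=month0 + 1, day=min(t.day, last_day))


def expand_macros(text: str, base: datetime) -> str:
    """Replace every @[{format}] or @[{format}+/-{n unit}] macro in text."""
    def repl(m: re.Match) -> str:
        pattern, sign, n, unit = m.groups()
        t = base
        if sign:
            amount = int(n) if sign == "+" else -int(n)
            if unit == "month":
                t = _shift_months(t, amount)
            else:
                # timedelta accepts days, hours, minutes and weeks directly.
                t = t + timedelta(**{unit + "s": amount})
        return _format(t, pattern)
    return _MACRO_RE.sub(repl, text)


base = datetime(2018, 7, 1, 15, 13, 23, 234000)
for macro in ("@[{yyyyMMdd}]", "@[{yyyy-MM-dd}-{1day}]",
              "@[{yyyyMMddHH}+{2hour}]", "@[{yyyyMMddHHmm00}-{10minute}]"):
    print(macro, "->", expand_macros(macro, base))
```

Running the sketch against the reference time used in the example reproduces the four replacements listed above.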
# III. Function Description
# 3.1 Sample configuration
    {
      "job": {
        "content": [{
          "reader": {
            "name": "mysqlreader",
            "parameter": {
              "username": "username",
              "password": "password",
              "connection": [
                {
                  "querySql": [
                    "SELECT card_id, property1, property2, property3 FROM table1;"
                  ],
                  "jdbcUrl": [
                    "jdbc:mysql://ip:port/database"
                  ]
                }
              ]
            }
          },
          "writer": {
            "parameter": {
              "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
              "column": [{
                  "type": "string",
                  "name": "card_id"
                },
                {
                  "type": "string",
                  "name": "property1"
                },
                {
                  "type": "string",
                  "name": "property2"
                },
                {
                  "type": "double",
                  "name": "property3"
                }
              ],
              "joinkey": {
                "importDataKey": ["card_id"],
                "taUserTableKey": ["card_id"]
              }
            }
          }
        }]
      }
    }
# 3.2 Parameter description
# 3.2.1 reader part
- The reader is configured in the same way as the corresponding reader supported by DataX
# 3.2.2 writer part
- appid
- Description: project appid
- Required: Yes
- Default value: none
- column
- Description: The list of fields to read. type specifies the data type; name specifies the column at the corresponding position in the reader and is also the property name used when importing into the TA system. The user can specify the column field information, configured as follows:

      [
        {
          "type": "double",
          "name": "property1"
        },
        {
          "type": "string",
          "name": "property2"
        },
        {
          "type": "bigint",
          "name": "property3"
        }
      ]
- joinkey.importDataKey
- Description: The column from the writer's column configuration that is used as the join key on the imported data side.
- Required: Yes
- Default value: none
- joinkey.taUserTableKey
- Description: The column in the TA system's user table that is used as the join key.
- Required: Yes
- Default value: none
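
Before running an import, it can help to check the writer section against the rules above. The following Python sketch is a hypothetical pre-flight check, not part of ta-tool: it assumes the configuration has already been loaded into a dictionary (for example with `json.load`), and it reads the importDataKey description as meaning that every listed key must appear among the writer's column names.

```python
from typing import Any, Dict, List


def check_writer_config(job_conf: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in the writer sections."""
    problems: List[str] = []
    for i, item in enumerate(job_conf["job"]["content"]):
        params = item["writer"]["parameter"]
        column_names = {col["name"] for col in params.get("column", [])}

        # appid and both joinkey fields are documented as required.
        if not params.get("appid"):
            problems.append(f"content[{i}]: appid is required")
        joinkey = params.get("joinkey", {})
        for field in ("importDataKey", "taUserTableKey"):
            if not joinkey.get(field):
                problems.append(f"content[{i}]: joinkey.{field} is required")

        # Assumption: import-side join keys must be declared writer columns.
        for key in joinkey.get("importDataKey", []):
            if key not in column_names:
                problems.append(
                    f"content[{i}]: importDataKey {key!r} is not in column"
                )
    return problems


sample = {
    "job": {"content": [{
        "writer": {"parameter": {
            "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
            "column": [
                {"type": "string", "name": "card_id"},
                {"type": "double", "name": "property3"},
            ],
            "joinkey": {
                "importDataKey": ["card_id"],
                "taUserTableKey": ["card_id"],
            },
        }},
    }]},
}
print(check_writer_config(sample))
```

An empty list means the check found nothing to report; it does not verify that the taUserTableKey column actually exists in the TA user table.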
# 3.3 Type conversion
DataX internal type | HIVE data type |
---|---|
Long | TINYINT,SMALLINT,INT,BIGINT |
Double | FLOAT,DOUBLE |
String | STRING,VARCHAR,CHAR |
Boolean | BOOLEAN |
Date | DATE,TIMESTAMP |
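
The table can also be expressed as a lookup, which is convenient when choosing a type for a writer column. This is a minimal Python sketch; the function name `datax_type_for` is hypothetical, and stripping a length suffix such as `(64)` is an assumption about how source types might be written.

```python
# The conversion table above, keyed by DataX internal type.
DATAX_TO_HIVE = {
    "Long": ["TINYINT", "SMALLINT", "INT", "BIGINT"],
    "Double": ["FLOAT", "DOUBLE"],
    "String": ["STRING", "VARCHAR", "CHAR"],
    "Boolean": ["BOOLEAN"],
    "Date": ["DATE", "TIMESTAMP"],
}

# Inverted view: Hive data type -> DataX internal type.
HIVE_TO_DATAX = {
    hive: datax for datax, hives in DATAX_TO_HIVE.items() for hive in hives
}


def datax_type_for(hive_type: str) -> str:
    """Look up the DataX internal type for a Hive type such as 'varchar(64)'."""
    base = hive_type.strip().upper().split("(", 1)[0]
    try:
        return HIVE_TO_DATAX[base]
    except KeyError:
        raise ValueError(
            f"no DataX mapping listed for Hive type {hive_type!r}"
        ) from None


print(datax_type_for("bigint"))       # Long
print(datax_type_for("varchar(64)"))  # String
```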