# External User Property Association Import Function

# I. Introduction

In some cases you need to import external user data into the TA cluster, but the user ID in that data is neither #account_id nor #distinct_id in the TA system; for example, the data may use a mobile phone number, ID card number, or another identifier as its primary key. To import such data into the TA system as user properties, use the update_user_by_foreignkey command to define the association and update the external user properties in the TA system. All data sources supported by DataX are currently supported.

# II. Instructions for Use

# 2.1 Command description

The command for data import is as follows:

ta-tool update_user_by_foreignkey -conf <config files> [--date xxx]

# 2.2 Command parameter description

# 2.2.1 -conf

The value is the path to the configuration file of the import task. Each task has its own configuration file, and multiple tasks can be imported in one run. Wildcards are supported, for example /data/config/* or ./config/*.json

# 2.2.2 --date

Optional. This parameter specifies the data date, which is the reference time used to replace time macros. If it is omitted, the current date is used. The format is YYYY-MM-DD. For details on time macros, see section 2.3, Time macro usage.
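As a minimal sketch of how the two parameters fit together, the following Python helper assembles the argument list for one run. The helper name `build_import_command` and the path `./config/task.json` are hypothetical; only the command name and the `-conf` and `--date` options come from the description above.

```python
from datetime import datetime


def build_import_command(conf, data_date=None):
    """Build the argument list for one ta-tool import run (illustrative helper)."""
    command = ["ta-tool", "update_user_by_foreignkey", "-conf", conf]
    if data_date is not None:
        # strptime raises ValueError if the value is not a valid calendar date;
        # strftime then normalizes it to zero-padded YYYY-MM-DD.
        parsed = datetime.strptime(data_date, "%Y-%m-%d")
        command += ["--date", parsed.strftime("%Y-%m-%d")]
    return command


# Hypothetical configuration path, with and without the optional data date.
print(build_import_command("./config/task.json"))
print(build_import_command("./config/task.json", "2018-07-01"))
```

The resulting list can be passed to `subprocess.run` on a machine where ta-tool is installed.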

# 2.3 Time macro usage

You can use time macros in place of time parameters inside the configuration file. ta-tool takes the import start time as the reference, calculates the time offset from the macro's parameters, and replaces the macro in the configuration file. The available time macro formats are @[{yyyyMMdd}], @[{yyyyMMdd}-{nday}], @[{yyyyMMdd}+{nday}], etc.

  • yyyyMMdd can be replaced with any date format that can be parsed by Java dateFormat, for example: yyyy-MM-dd HH:mm:ss.SSS, yyyyMMddHH000000
  • n can be any integer, representing the offset value of time
  • day represents the offset unit of time, which can be selected as follows: day, hour, minute, week, month
  • Example: Suppose the current time is 2018-07-01 15:13:23.234
    • @[{yyyyMMdd}] is replaced with 20180701
    • @[{yyyy-MM-dd}-{1day}] is replaced with 2018-06-30
    • @[{yyyyMMddHH}+{2hour}] is replaced with 2018070117
    • @[{yyyyMMddHHmm00}-{10minute}] is replaced with 20180701150300
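The replacement rules can be illustrated with a short Python sketch. This is not ta-tool's implementation: the function name `expand_time_macros` is hypothetical, only the pattern letters yyyy, MM, dd, HH, mm and ss are translated (milliseconds are not handled), and clamping the day for month offsets is an assumption about behavior the text does not specify.

```python
import calendar
import re
from datetime import datetime, timedelta

# Java-style pattern letters handled by this sketch, mapped to strftime directives.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN_RE = re.compile("|".join(_TOKENS))

# Matches @[{format}] with an optional +{n unit} or -{n unit} offset.
_MACRO_RE = re.compile(
    r"@\[\{([^{}]+)\}(?:([+-])\{(\d+)(day|hour|minute|week|month)\})?\]"
)


def _shift_months(moment, months):
    # Assumption: the day is clamped to the last day of the target month.
    index = moment.year * 12 + (moment.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return moment.replace(year=year, month=month0 + 1, day=min(moment.day, last_day))


def expand_time_macros(text, reference):
    """Replace every time macro in text, using reference as the base time."""

    def _replace(match):
        pattern, sign, amount, unit = match.groups()
        moment = reference
        if sign:
            n = int(amount) if sign == "+" else -int(amount)
            if unit == "month":
                moment = _shift_months(moment, n)
            else:
                moment = moment + timedelta(**{unit + "s": n})
        # Translate the pattern in a single pass so replaced text is not rescanned.
        fmt = _TOKEN_RE.sub(lambda m: _TOKENS[m.group()], pattern)
        return moment.strftime(fmt)

    return _MACRO_RE.sub(_replace, text)


reference = datetime(2018, 7, 1, 15, 13, 23, 234000)
print(expand_time_macros("@[{yyyyMMdd}]", reference))                   # 20180701
print(expand_time_macros("@[{yyyy-MM-dd}-{1day}]", reference))          # 2018-06-30
print(expand_time_macros("@[{yyyyMMddHH}+{2hour}]", reference))         # 2018070117
print(expand_time_macros("@[{yyyyMMddHHmm00}-{10minute}]", reference))  # 20180701150300
```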

# III. Function Description

# 3.1 Sample configuration

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "username",
                        "password": "password",
                        "connection": [
                            {
                                "querySql": [
                                    "SELECT card_id, property1, property2, property3 FROM table1;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://ip:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "parameter": {
                        "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
                        "column": [
                            {
                                "type": "string",
                                "name": "card_id"
                            },
                            {
                                "type": "string",
                                "name": "property1"
                            },
                            {
                                "type": "string",
                                "name": "property2"
                            },
                            {
                                "type": "double",
                                "name": "property3"
                            }
                        ],
                        "joinkey": {
                            "importDataKey": ["card_id"],
                            "taUserTableKey": ["card_id"]
                        }
                    }
                }
            }
        ]
    }
}
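Before running an import, it can help to check that a configuration is internally consistent. The following Python sketch is not part of ta-tool, and the function name `check_join_config` is hypothetical. It only verifies what the sample implies: the writer has an appid, both join key lists are present, and every importDataKey entry names a column declared in the writer's column list.

```python
def check_join_config(config):
    """Return a list of problems found in one import-task configuration."""
    problems = []
    for index, item in enumerate(config["job"]["content"]):
        params = item["writer"]["parameter"]
        column_names = [col["name"] for col in params.get("column", [])]
        joinkey = params.get("joinkey", {})
        import_keys = joinkey.get("importDataKey", [])
        user_keys = joinkey.get("taUserTableKey", [])

        if not params.get("appid"):
            problems.append(f"content[{index}]: appid is missing")
        if not import_keys or not user_keys:
            problems.append(f"content[{index}]: joinkey needs importDataKey and taUserTableKey")
        for key in import_keys:
            if key not in column_names:
                problems.append(f"content[{index}]: importDataKey '{key}' is not a writer column")
    return problems


# Writer part of the sample configuration, reduced to the fields that are checked.
sample = {
    "job": {
        "content": [
            {
                "writer": {
                    "parameter": {
                        "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
                        "column": [
                            {"type": "string", "name": "card_id"},
                            {"type": "double", "name": "property3"},
                        ],
                        "joinkey": {
                            "importDataKey": ["card_id"],
                            "taUserTableKey": ["card_id"],
                        },
                    }
                }
            }
        ]
    }
}

print(check_join_config(sample))  # an empty list means no problems were found
```

To check a file on disk, load it with `json.load` and pass the resulting dictionary to the function.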

# 3.2 Parameter description

# 3.2.1 reader part

  • The reader configuration is the same as for any reader supported by DataX; refer to the DataX documentation.

# 3.2.2 writer Part

  • appid
    • Description: project appid
    • Required: Yes
    • Default value: none
  • column
    • Description: The list of fields to read. type specifies the data type; name identifies the reader column at the corresponding position and is also the property name used when importing into the TA system.

Column field information can be configured as follows:

[
  {
    "type": "double",
    "name": "property1"
  },
  {
    "type": "string",
    "name": "property2"
  },
  {
    "type": "bigint",
    "name": "property3"
  }
]
  • joinkey.importDataKey
    • Description: The column or columns from the writer's column list that are used as the association (join) key on the imported-data side.
    • Required: Yes
    • Default value: none
  • joinkey.taUserTableKey
    • Description: The column or columns of the user table in the TA system that are used as the association (join) key.
    • Required: Yes
    • Default value: none
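The role of the two join keys can be pictured with a small in-memory Python sketch. It is a conceptual illustration only, not how ta-tool works internally. The function name `apply_foreign_key_update` and the sample rows are hypothetical, and two behaviors are assumptions: imported rows with no matching user are skipped, and the join key itself is not rewritten as a property.

```python
def apply_foreign_key_update(users, imported_rows, import_key, user_key):
    """Copy properties from imported rows onto users whose key matches.

    Returns the imported rows that matched no user.
    """
    # Index users by the join column of the user table.
    users_by_key = {}
    for user in users:
        if user_key in user:
            users_by_key.setdefault(user[user_key], []).append(user)

    unmatched = []
    for row in imported_rows:
        matches = users_by_key.get(row[import_key], [])
        if not matches:
            unmatched.append(row)  # assumption: unmatched rows are skipped
            continue
        properties = {name: value for name, value in row.items() if name != import_key}
        for user in matches:
            user.update(properties)
    return unmatched


# Hypothetical user table rows and imported external rows.
users = [
    {"#account_id": "u1", "card_id": "A100"},
    {"#account_id": "u2", "card_id": "B200"},
]
imported = [
    {"card_id": "A100", "property1": "gold", "property3": 12.5},
    {"card_id": "Z999", "property1": "silver", "property3": 3.0},
]

leftover = apply_foreign_key_update(users, imported, "card_id", "card_id")
print(users[0])
print(leftover)
```

Here importDataKey and taUserTableKey are both card_id, as in the sample configuration, but the two names do not have to be identical.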

# 3.3 Type conversion

| DataX internal type | Hive data type |
| --- | --- |
| Long | TINYINT, SMALLINT, INT, BIGINT |
| Double | FLOAT, DOUBLE |
| String | STRING, VARCHAR, CHAR |
| Boolean | BOOLEAN |
| Date | DATE, TIMESTAMP |
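For scripts that generate configurations, the mapping above can be kept as a simple lookup. The Python sketch below only restates the table; the function name `datax_type_for` is hypothetical, and stripping a length suffix such as `(64)` is an added convenience rather than documented behavior.

```python
# Hive data type -> DataX internal type, taken from the table above.
HIVE_TO_DATAX = {
    "TINYINT": "Long",
    "SMALLINT": "Long",
    "INT": "Long",
    "BIGINT": "Long",
    "FLOAT": "Double",
    "DOUBLE": "Double",
    "STRING": "String",
    "VARCHAR": "String",
    "CHAR": "String",
    "BOOLEAN": "Boolean",
    "DATE": "Date",
    "TIMESTAMP": "Date",
}


def datax_type_for(hive_type):
    """Return the DataX internal type for a Hive type, or None if it is not listed."""
    # Drop any length or precision suffix, e.g. "varchar(64)" -> "VARCHAR".
    base = hive_type.strip().upper().split("(")[0]
    return HIVE_TO_DATAX.get(base)


print(datax_type_for("bigint"))       # Long
print(datax_type_for("varchar(64)"))  # String
print(datax_type_for("timestamp"))    # Date
```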