# Data Backtrack Function
# I. Overview
The data backtrack function is a function of querying the data in the TA database through SQL statements, and then entering the returned results into the TA database to generate new events or new user properties.
# II. Instructions for Use
# 2.1 Command description
Log in to any TA server, execute the su - ta
command, and switch to the TA user.
Execute ta-tool user_event_import -conf
to read the configuration file, the command is as follows
ta-tool user_event_import -conf <config file> [--date data date]
# 2.2 Command parameter description
# 2.2.1 -conf
As a mandatory parameter, the parameter is the path corresponding to the data backtrack task configuration file, which supports wild-card methods, such as/data/config/
or ./config/.json
# 2.2.1 --date
As an optional parameter, the parameter represents the data date, and the time macro will be replaced based on this reference time. The parameter may not pass, it is deemed to take the current date by default, in the format of YYYY-MM-DD
. For the specific use of the time macro, please refer to time macro usage
# 2.2.3 Example
ta-tool user_event_import -conf /data/home/ta/import_configs/*.json
The parameter is the full path of the configuration file, which supports reading multiple configuration files using wild-card
# 2.3 Description of configuration file
# 2.3.1 The sample configuration file is as follows:
The core of the data backtrack function is a configuration file containing query statements and configuration parameters. A configuration file corresponds to a data backtrack task, and the configuration file for a backtrack event is as follows:
{
"event_desc": {
"ltv_event": "user life cycle"
},
"appid": "APPID",
"type": "event",
"property_desc": {
"register_date": "registration date",
"date_prop": "LTV days"
},
"sql": "SELECT 'thinkinggame' \"#account_id\",'ltv_event' \"#event_name\",register_date \"#time\",register_date,ltv,date_prop FROM (SELECT recharge_money ltv,register_date,CASE date_trunc('day', cast(register_date AS TIMESTAMP)) WHEN CURRENT_DATE - interval '1' DAY THEN 'next day' WHEN CURRENT_DATE - interval '2' DAY THEN '3 days' WHEN CURRENT_DATE - interval '6' DAY THEN '7 days' WHEN CURRENT_DATE - interval '13' DAY THEN '14 days' WHEN CURRENT_DATE - interval '29' DAY THEN '30 days' ELSE NULL END date_prop FROM (SELECT sum(recharge_money) recharge_money ,register_date FROM (SELECT \"#user_id\" ,sum(recharge_value) recharge_money ,\"$part_date\" recharge_date FROM v_event_0 WHERE \"$part_event\" = 'recharge' AND \"$part_date\" > '2018-06-30' AND \"$part_date\" < '2018-07-30' GROUP BY \"#user_id\" , \"$part_date\") a LEFT JOIN (SELECT \"#user_id\" , \"$part_date\" register_date FROM v_event_0 WHERE \"$part_event\" = 'player_register' AND \"$part_date\" > '2018-06-30' AND \"$part_date\" < '2018-07-30') b ON a.\"#user_id\" = b.\"#user_id\" WHERE b.\"#user_id\" IS NOT NULL AND recharge_date >= register_date GROUP BY register_date) c) d WHERE date_prop IS NOT NULL"
}
TIP
When you need to backtrack the List type, because the underlying storage list type is string separated by\ t, you need to do the following:
split("arrayColumn", chr(0009)) as arrayColumn
# 2.3.2 Description of configuration parameters
Each configuration file is represented as JSON, and the following is the meaning of each element:
event_desc
:- Description: optional configuration, JSON object, to set the display name of the new event
- key: event name
- value: display name
appid
:- Description: mandatory configuration, to write the APPID of the target project of the query result
type
:- Description: mandatory configuration, to write the
event
table or theuser
table of the target item - Available values: event, user
- Description: mandatory configuration, to write the
property_desc
:- Description: optional configuration, JSON object, to set the display name of the property name
- key: property name
- value: display name
sql
:- Description: mandatory configuration, string, query statement. Please note that the column name of the returned result will determine the specific meaning of the column data, which must have the following:
- Necessary column name 1:
#account_id
or#distinct_id
, at least one of which should correspond to the account ID and anonymous ID of the triggering user. If the generated event is not triggered by an individual, such as LTV in the above example, it is recommended to use a fixed value outside the ID rule, such as 'system', 'admin'. - Necessary column name 2:
#event_name
, event name, recommended setting value - Necessary column name 3:
#time
, event time, formatyyyy-MM-dd HH:mm:ss
oryyyy-MM-dd HH:mm:ss.SSS
- Necessary column name 1:
- Description: mandatory configuration, string, query statement. Please note that the column name of the returned result will determine the specific meaning of the column data, which must have the following:
In addition to the above column names, the data of the remaining columns will be used as the properties of the event, and the column name is the property name.
In addition to backtrack events, it can also generate new user properties or overwrite existing user properties through query results. The configuration file is as follows:
{
"appid": "8d1820678a064397bbfcc9732f352e75",
"type": "user",
"property_desc": {
"user_level": "user level",
"coin_num": "coin number"
},
"sql": "select \"#account_id\",localtimestamp \"#time\",user_level,coin_num from v_user_0"
}
Similar to the backtrack event, its configuration file is also represented by JSON, which is different from the configuration file of the backtrack event as follows:
- No need for
event_desc
type
is "user"- The necessary column names in sql do not require
#event_name
, only the following two:- Necessary column 1:
#account_id
or#distinct_id
, at least one of them - Necessary column 2:
#time
, indicating time
- Necessary column 1:
# 2.4 Time macro usage
You can replace time parameters with time macros inside the data backtrack task configuration file. When executing the data backtrack command, the ta-tool tool will use --date
as a benchmark to calculate the offset of time based on the parameters of the time macro, and replace the time macro in the configuration file. The available time macro formats are@[{yyyyMMdd}]
, @[{yyyyMMdd}-{nday}]
,@[{yyyyMMdd}+{nday}]
, etc.
yyyyMMdd
can be replaced with any date format that can be parsed by JavadateFormat
, for example:yyyy-MM-dd HH:mm:ss.SSS
,yyyyMMddHH000000
- n can be any integer representing the offset of time
- day represents the offset unit of time, which can be taken as follows:
day
,hour
,minute
,week
,month
- Example: Suppose the current time is
2018-07-01 15:13:23.234
@[{yyyyMMdd}]
is replaced with20180701
@[{yyyy-MM-dd}-{1day}]
is replaced with2018-06-31
@[{yyyyMMddHH}+{2hour}]
is replaced with2018070117
@[{yyyyMMddHHmm00}-{10minute}]
is replaced with20180701150300