# Data Archiving and Reloading
# I. Data archiving function (ta_data_archive)
The data archiving function migrates historical data, or data that is not needed for the time being, to inexpensive storage, freeing disk resources in the TA cluster and reducing storage costs.
# 1.1 Archiving commands
```bash
# start
ta-tool data_archive start
# stop
ta-tool data_archive stop
# retry
ta-tool data_archive retry -jobid *******
```
# 1.2 Archiving method
# 1.2.1 S3 method
# 1.2.1.1 Environment preparation
- Apply for the Amazon S3 service
- Create a bucket for archiving; the bucket's region should match that of the TA cluster servers
- Create an access key for the bucket (a sketch of these steps follows)
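A minimal sketch of this preparation with the AWS CLI; the bucket name `ta-archive-bucket`, region `cn-north-1`, and IAM user `ta-archiver` are placeholders, not values from this document:

```bash
# Create the archive bucket in the same region as the TA cluster (placeholder names).
aws s3api create-bucket \
  --bucket ta-archive-bucket \
  --region cn-north-1 \
  --create-bucket-configuration LocationConstraint=cn-north-1

# Create an access key for the IAM user that ta-tool will use; the output
# contains the AccessKeyId and SecretAccessKey that the prompts below ask for.
aws iam create-access-key --user-name ta-archiver
```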
# 1.2.1.2 Sample commands
```
[ta@ta1 ~]$ ta-tool data_archive start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487f6b**********f9c379aa9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > s3
------------------------------------------------------------
Please enter S3 AccesskeyID> AK************YO6G3
------------------------------------------------------------
Please enter S3 secretAccessKey> J23************rZb
------------------------------------------------------------
Please enter S3 region code> cn-****-1
------------------------------------------------------------
Please enter S3 bucket name> ta************ive
------------------------------------------------------------
Please enter the S3 file storage class (default: STANDARD)> S*****D
------------------------------------------------------------
Please enter the target directory for project archiving> data*****_test
------------------------------------------------------------
```
# 1.2.1.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to archive only that event type
- Select s3 as the type of archive storage
- Enter the AccessKeyId for S3
- Enter the SecretAccessKey (managed in the AWS IAM service)
- Enter the bucket's region code
- Enter the bucket name
- Select the storage class (STANDARD by default). The GLACIER and DEEP_ARCHIVE storage classes are designed for low-cost data archiving, but they require a thawing (restore) operation before the data can be recovered, which is more cumbersome
- Enter the target directory for archiving (the directory will be created under the target bucket, and the archived data will be placed in it); the result can be checked as shown below
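After the job completes, the archived objects can be inspected with the AWS CLI; the bucket and directory names below are placeholders:

```bash
# List the archived objects and report their total count and size (placeholder names).
aws s3 ls s3://ta-archive-bucket/data_archive_test/ --recursive --summarize
```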
# 1.2.2 HDFS method
# 1.2.2.1 Environment preparation
- Prepare an HDFS environment whose network can communicate with the TA cluster (a connectivity check is sketched below)
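A quick way to confirm from a TA node that the target HDFS is reachable; the NameNode address and port 8020 are placeholders for your environment:

```bash
# List the HDFS root through the full URL to verify network connectivity (placeholder address).
hadoop fs -ls hdfs://hdfs-nm-url:8020/
```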
# 1.2.2.2 Sample commands
```
[ta@ta1 ~]$ ta-tool data_archive start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487************a9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > hdfs
------------------------------------------------------------
Please enter the HDFS URL address for project archiving> hdfs-nm-url
------------------------------------------------------------
Please enter the HDFS user name for project archiving> hdfsUserName
------------------------------------------------------------
Please enter the target directory for project archiving> hdfs******test
------------------------------------------------------------
```
# 1.2.2.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to archive only that event type
- Select hdfs as the type of archive storage
- Enter the HDFS address of the writing end; if the port is the default one, the hostname alone is enough
- Enter the HDFS user name on the writing end
- Enter the target directory for archiving. An absolute path is recommended; otherwise the data is stored under /user/<hdfs user>/<target directory>/, as illustrated below
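For example, a relative target directory lands under the HDFS user's home directory; the names below are placeholders:

```bash
# A relative target directory such as "hdfs_test" ends up under /user/<hdfs user>/.
hadoop fs -ls hdfs://hdfs-nm-url:8020/user/hdfsUserName/hdfs_test
```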
# 1.2.3 rsync method
# 1.2.3.1 Environment preparation
- Set up the server in rsync daemon mode, and copy the password file to the node in the TA cluster where the command will run (a minimal setup is sketched below)
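A minimal sketch of the daemon-side setup; the module name `ta_archive`, user, password, and paths are placeholders. The module name is what ta-tool later prompts for as the "model name":

```bash
# /etc/rsyncd.conf on the rsync server (placeholder module name and paths)
cat > /etc/rsyncd.conf <<'EOF'
port = 873

[ta_archive]
    path = /data/ta_archive
    read only = false
    auth users = rsyncUser
    secrets file = /etc/rsyncd.secrets
EOF

# Server-side secrets file in "user:password" form, readable only by its owner.
echo 'rsyncUser:rsyncPassword' > /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets

# Start the rsync daemon.
rsync --daemon

# On the TA command-running node, the password file contains only the password;
# its path is what the "key file location" prompt asks for, and it must be chmod 600.
echo 'rsyncPassword' > /home/ta/rsync.pass
chmod 600 /home/ta/rsync.pass
```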
# 1.2.3.2 Sample commands
```
[ta@ta1 ~]$ ta-tool data_archive start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 548*****************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > rsync
------------------------------------------------------------
Please enter the target RSYNC server IP address> rsyncIp
------------------------------------------------------------
Please enter the target RSYNC server port> rsyncPort
------------------------------------------------------------
Please enter the target RSYNC server username> rsyncUser
------------------------------------------------------------
Please enter the target RSYNC server key file location> passwordFilePath
------------------------------------------------------------
Please enter the target RSYNC server model name> modelName
------------------------------------------------------------
sending incremental file list
/tmp/
/tmp/d41d8c*****ecf8427e.data

sent 99 bytes  received 15 bytes  228.00 bytes/sec
total size is 11  speedup is 0.10 (DRY RUN)
Please enter the target directory for project archiving> rsync******test_dir
```
# 1.2.3.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to archive only that event type
- Select rsync as the type of archive storage
- Enter the rsync server IP address
- Enter the rsync server port
- Enter the rsync user name
- Enter the location of the rsync password file; it can live in any directory, but its permissions must be restricted with chmod 600
- Enter the rsync model (module) name; this step uses the information entered so far to verify that rsync is reachable, which can also be checked by hand as shown below
- Enter the target directory for archiving
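The same reachability check can be run manually before starting the job; the connection details and module name are placeholders:

```bash
# List the module's contents to confirm the daemon, credentials, and module name all work.
rsync --list-only --password-file=/home/ta/rsync.pass rsync://rsyncUser@rsyncIp:873/ta_archive/
```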
# II. Data reloading function (ta_data_reload)
The data reloading function imports previously archived data back into the TA cluster so that it can be used again. It is typically used to review historical trends.
Make sure there is enough disk space before importing; a quick pre-check is sketched below.
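For instance, assuming the TA data disk is mounted at a placeholder /data and the archive sits in the placeholder S3 location used earlier:

```bash
# Free space on the TA data disk (placeholder mount point).
df -h /data

# Total count and size of the objects about to be reloaded (placeholder names).
aws s3 ls s3://ta-archive-bucket/data_archive_test/ --recursive --summarize | tail -2
```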
# 2.1 Reload commands
```bash
# start
ta-tool data_reload start
# stop
ta-tool data_reload stop
# retry
ta-tool data_reload retry -jobid *******
```
# 2.2 Reload method
# 2.2.1 S3 method
# 2.2.1.1 Environmental preparation
- Apply for the Amazon S3 service
- Create a bucket for archiving; the bucket's region should match that of the TA cluster servers
- Create an access key for the bucket
# 2.2.1.2 Sample commands
```
[ta@ta1 log]$ ta-tool data_reload start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487f6************a9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > s3
------------------------------------------------------------
Please enter S3 AccesskeyID> AK***********3
------------------------------------------------------------
Please enter S3 secretAccessKey> J23w************b
------------------------------------------------------------
Please enter S3 region code> cn*****-1
------------------------------------------------------------
Please enter S3 bucket name> ta*****ve
------------------------------------------------------------
Please enter the target directory for project archiving> data*******t_1
------------------------------------------------------------
```
# 2.2.1.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to reload a single event type
- Select s3 as the type of archive storage
- Enter the AccessKeyId for S3
- Enter the SecretAccessKey (managed in the AWS IAM service)
- Enter the bucket's region code
- Enter the bucket name
- Select the storage class (STANDARD by default). If the data was archived with the GLACIER or DEEP_ARCHIVE storage class, perform the thawing (restore) operation in S3 in advance, otherwise the data cannot be pulled; see the sketch below
- Enter the archived target directory (the directory that was created under the target bucket and that holds the archived data)

Note: when entering parameters, make sure the bucket name and directory path match those used for archiving.
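A minimal thawing sketch with the AWS CLI; the bucket, key, restore window, and retrieval tier are placeholders:

```bash
# Request temporary restoration of one archived object (placeholder names).
aws s3api restore-object \
  --bucket ta-archive-bucket \
  --key data_archive_test/part-00000.data \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'

# Poll until the Restore field reports ongoing-request="false", then start the reload.
aws s3api head-object --bucket ta-archive-bucket --key data_archive_test/part-00000.data
```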
# 2.2.2 HDFS method
# 2.2.2.1 Environment preparation
- Prepare an HDFS environment whose network can communicate with the TA cluster
# 2.2.2.2 Sample commands
```
[ta@ta1 log]$ ta-tool data_reload start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487*******************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > hdfs
------------------------------------------------------------
Please enter the HDFS URL address for project archiving> hdfs-nm-url
------------------------------------------------------------
Please enter the target directory for project archiving> hdfs******test
------------------------------------------------------------
```
# 2.2.2.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to reload a single event type
- Select hdfs as the type of archive storage
- Enter the HDFS address of the writing end; if the port is the default one, the hostname alone is enough
- Enter the HDFS user name on the writing end
- Enter the archived target directory

Note: when entering parameters, make sure the directory path matches the one used for archiving.
# 2.2.3 rsync method
# 2.2.3.1 Environment preparation
- Set up the server in rsync daemon mode, and copy the password file to the node in the TA cluster where the command will run
# 2.2.3.2 Sample commands
```
[ta@ta1 log]$ ta-tool data_reload start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 54****************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > rsync
------------------------------------------------------------
Please enter the target RSYNC server IP address> rsyncIp
------------------------------------------------------------
Please enter the target RSYNC server port> rsyncPort
------------------------------------------------------------
Please enter the target RSYNC server user name> rsyncUser
------------------------------------------------------------
Please enter the target RSYNC server key file location> passwordFilePath
------------------------------------------------------------
Please enter the target RSYNC server model name> modelName
------------------------------------------------------------
sending incremental file list
/tmp/
/tmp/d41d8cd98f00b204e9800998ecf8427e.data

sent 99 bytes  received 15 bytes  20.73 bytes/sec
total size is 11  speedup is 0.10 (DRY RUN)
Please enter the target directory for project archiving> rsync******test_dir
```
# 2.2.3.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to reload a single event type
- Select rsync as the type of archive storage
- Enter the rsync server IP address
- Enter the rsync server port
- Enter the rsync user name
- Enter the location of the rsync password file; it can live in any directory, but its permissions must be restricted with chmod 600
- Enter the rsync model (module) name; this step uses the information entered so far to verify that rsync is reachable
- Enter the archived target directory

Note: when entering parameters, make sure the directory path matches the one used for archiving.