# Data Archiving and Reloading
# I. Data archiving function (ta_data_archive)
The data archiving function migrates historical data, or data that is not needed for the time being, to inexpensive storage, freeing disk resources in the TA cluster and reducing storage costs.
# 1.1 Archiving commands
```bash
# start
ta-tool data_archive start
# stop
ta-tool data_archive stop
# retry
ta-tool data_archive retry -jobid *******
```
# 1.2 Archiving method
# 1.2.1 S3 method
# 1.2.1.1 Environment preparation
- Apply for the Amazon S3 service
- Create a bucket for archiving; the bucket's region should match that of the TA cluster servers
- Create an access key for the bucket (a sketch of these steps follows)
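A minimal sketch of this preparation with the AWS CLI; the bucket name `ta-archive-bucket`, region `cn-north-1`, and IAM user `ta-archiver` are placeholders, not values from this document:

```bash
# Create the archive bucket in the same region as the TA cluster (placeholder names).
aws s3api create-bucket \
  --bucket ta-archive-bucket \
  --region cn-north-1 \
  --create-bucket-configuration LocationConstraint=cn-north-1

# Create an access key for the IAM user that ta-tool will use; the output
# contains the AccessKeyId and SecretAccessKey that the prompts below ask for.
aws iam create-access-key --user-name ta-archiver
```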
# 1.2.1.2 Sample commands
```
[ta@ta1 ~]$ ta-tool data_archive start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487f6b**********f9c379aa9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > s3
------------------------------------------------------------
Please enter S3 AccesskeyID> AK************YO6G3
------------------------------------------------------------
Please enter S3 secretAccessKey> J23************rZb
------------------------------------------------------------
Please enter S3 region code> cn-****-1
------------------------------------------------------------
Please enter S3 bucket name> ta************ive
------------------------------------------------------------
Please enter the S3 file storage class (default: STANDARD)> S*****D
------------------------------------------------------------
Please enter the target directory for project archiving> data*****_test
------------------------------------------------------------
```
# 1.2.1.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to archive only that event type
- Select s3 as the type of archive storage
- Enter the AccessKeyId for S3
- Enter the SecretAccessKey (managed in the AWS IAM service)
- Enter the bucket's region code
- Enter the bucket name
- Select the storage class (STANDARD by default). The GLACIER and DEEP_ARCHIVE storage classes are designed for low-cost data archiving, but they require a thawing (restore) operation before the data can be recovered, which is more cumbersome
- Enter the target directory for archiving (the directory will be created under the target bucket, and the archived data will be placed in it); the result can be checked as shown below
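After the job completes, the archived objects can be inspected with the AWS CLI; the bucket and directory names below are placeholders:

```bash
# List the archived objects and report their total count and size (placeholder names).
aws s3 ls s3://ta-archive-bucket/data_archive_test/ --recursive --summarize
```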
# 1.2.2 HDFS method
# 1.2.2.1 Environment preparation
- Prepare an HDFS environment whose network can communicate with the TA cluster (a connectivity check is sketched below)
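A quick way to confirm from a TA node that the target HDFS is reachable; the NameNode address and port 8020 are placeholders for your environment:

```bash
# List the HDFS root through the full URL to verify network connectivity (placeholder address).
hadoop fs -ls hdfs://hdfs-nm-url:8020/
```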
# 1.2.2.2 Sample commands
```
[ta@ta1 ~]$ ta-tool data_archive start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487************a9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > hdfs
------------------------------------------------------------
Please enter the HDFS URL address for project archiving> hdfs-nm-url
------------------------------------------------------------
Please enter the HDFS user name for project archiving> hdfsUserName
------------------------------------------------------------
Please enter the target directory for project archiving> hdfs******test
------------------------------------------------------------
```
# 1.2.2.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to archive only that event type
- Select hdfs as the type of archive storage
- Enter the HDFS address of the writing end; if the port is the default one, the hostname alone is enough
- Enter the HDFS user name on the writing end
- Enter the target directory for archiving. An absolute path is recommended; otherwise the data is stored under /user/<hdfs user>/<target directory>/, as illustrated below
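For example, a relative target directory lands under the HDFS user's home directory; the names below are placeholders:

```bash
# A relative target directory such as "hdfs_test" ends up under /user/<hdfs user>/.
hadoop fs -ls hdfs://hdfs-nm-url:8020/user/hdfsUserName/hdfs_test
```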
# 1.2.3 rsync method
# 1.2.3.1 Environment preparation
- Set up the server in rsync daemon mode, and copy the password file to the node in the TA cluster where the command will run (a minimal setup is sketched below)
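A minimal sketch of the daemon-side setup; the module name `ta_archive`, user, password, and paths are placeholders. The module name is what ta-tool later prompts for as the "model name":

```bash
# /etc/rsyncd.conf on the rsync server (placeholder module name and paths)
cat > /etc/rsyncd.conf <<'EOF'
port = 873

[ta_archive]
    path = /data/ta_archive
    read only = false
    auth users = rsyncUser
    secrets file = /etc/rsyncd.secrets
EOF

# Server-side secrets file in "user:password" form, readable only by its owner.
echo 'rsyncUser:rsyncPassword' > /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets

# Start the rsync daemon.
rsync --daemon

# On the TA command-running node, the password file contains only the password;
# its path is what the "key file location" prompt asks for, and it must be chmod 600.
echo 'rsyncPassword' > /home/ta/rsync.pass
chmod 600 /home/ta/rsync.pass
```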
# 1.2.3.2 Sample commands
```
[ta@ta1 ~]$ ta-tool data_archive start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 548*****************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > rsync
------------------------------------------------------------
Please enter the target RSYNC server IP address> rsyncIp
------------------------------------------------------------
Please enter the target RSYNC server port> rsyncPort
------------------------------------------------------------
Please enter the target RSYNC server username> rsyncUser
------------------------------------------------------------
Please enter the target RSYNC server key file location> passwordFilePath
------------------------------------------------------------
Please enter the target RSYNC server model name> modelName
------------------------------------------------------------
sending incremental file list
/tmp/
/tmp/d41d8c*****ecf8427e.data

sent 99 bytes  received 15 bytes  228.00 bytes/sec
total size is 11  speedup is 0.10 (DRY RUN)
Please enter the target directory for project archiving> rsync******test_dir
```
# 1.2.3.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to archive only that event type
- Select rsync as the type of archive storage
- Enter the rsync server IP address
- Enter the rsync server port
- Enter the rsync user name
- Enter the location of the rsync password file; it can live in any directory, but its permissions must be restricted with chmod 600
- Enter the rsync model (module) name; this step uses the information entered so far to verify that rsync is reachable, which can also be checked by hand as shown below
- Enter the target directory for archiving
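The same reachability check can be run manually before starting the job; the connection details and module name are placeholders:

```bash
# List the module's contents to confirm the daemon, credentials, and module name all work.
rsync --list-only --password-file=/home/ta/rsync.pass rsync://rsyncUser@rsyncIp:873/ta_archive/
```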
# II. Data reloading function (ta_data_reload)
The data reloading function imports previously archived data back into the TA cluster so that it can be used again. It is typically used to review historical trends.
Make sure there is enough disk space before importing; a quick pre-check is sketched below.
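For instance, assuming the TA data disk is mounted at a placeholder /data and the archive sits in the placeholder S3 location used earlier:

```bash
# Free space on the TA data disk (placeholder mount point).
df -h /data

# Total count and size of the objects about to be reloaded (placeholder names).
aws s3 ls s3://ta-archive-bucket/data_archive_test/ --recursive --summarize | tail -2
```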
# 2.1 Reload commands
```bash
# start
ta-tool data_reload start
# stop
ta-tool data_reload stop
# retry
ta-tool data_reload retry -jobid *******
```
# 2.2 Reload method
# 2.2.1 S3 method
# 2.2.1.1 Environmental preparation
- Apply for the Amazon S3 service
- Create a bucket for archiving; the bucket's region should match that of the TA cluster servers
- Create an access key for the bucket
# 2.2.1.2 Sample commands
```
[ta@ta1 log]$ ta-tool data_reload start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487f6************a9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > s3
------------------------------------------------------------
Please enter S3 AccesskeyID> AK***********3
------------------------------------------------------------
Please enter S3 secretAccessKey> J23w************b
------------------------------------------------------------
Please enter S3 region code> cn*****-1
------------------------------------------------------------
Please enter S3 bucket name> ta*****ve
------------------------------------------------------------
Please enter the target directory for project archiving> data*******t_1
------------------------------------------------------------
```
# 2.2.1.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to reload a single event type
- Select s3 as the type of archive storage
- Enter the AccessKeyId for S3
- Enter the SecretAccessKey (managed in the AWS IAM service)
- Enter the bucket's region code
- Enter the bucket name
- Select the storage class (STANDARD by default). If the data was archived with the GLACIER or DEEP_ARCHIVE storage class, perform the thawing (restore) operation in S3 in advance, otherwise the data cannot be pulled; see the sketch below
- Enter the archived target directory (the directory that was created under the target bucket and that holds the archived data)

Note: when entering parameters, make sure the bucket name and directory path match those used for archiving.
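A minimal thawing sketch with the AWS CLI; the bucket, key, restore window, and retrieval tier are placeholders:

```bash
# Request temporary restoration of one archived object (placeholder names).
aws s3api restore-object \
  --bucket ta-archive-bucket \
  --key data_archive_test/part-00000.data \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'

# Poll until the Restore field reports ongoing-request="false", then start the reload.
aws s3api head-object --bucket ta-archive-bucket --key data_archive_test/part-00000.data
```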
# 2.2.2 HDFS method
# 2.2.2.1 Environment preparation
- Prepare an HDFS environment whose network can communicate with the TA cluster
# 2.2.2.2 Sample commands
```
[ta@ta1 log]$ ta-tool data_reload start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487*******************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > hdfs
------------------------------------------------------------
Please enter the HDFS URL address for project archiving> hdfs-nm-url
------------------------------------------------------------
Please enter the target directory for project archiving> hdfs******test
------------------------------------------------------------
```
# 2.2.2.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to reload a single event type
- Select hdfs as the type of archive storage
- Enter the HDFS address of the writing end; if the port is the default one, the hostname alone is enough
- Enter the HDFS user name on the writing end
- Enter the archived target directory

Note: when entering parameters, make sure the directory path matches the one used for archiving.
# 2.2.3 rsync method
# 2.2.3.1 Environment preparation
- Set up the server in rsync daemon mode, and copy the password file to the node in the TA cluster where the command will run
# 2.2.3.2 Sample commands
```
[ta@ta1 log]$ ta-tool data_reload start
Please enter the JobId for this job (if left blank, one is generated randomly in the background)>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 54****************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > rsync
------------------------------------------------------------
Please enter the target RSYNC server IP address> rsyncIp
------------------------------------------------------------
Please enter the target RSYNC server port> rsyncPort
------------------------------------------------------------
Please enter the target RSYNC server user name> rsyncUser
------------------------------------------------------------
Please enter the target RSYNC server key file location> passwordFilePath
------------------------------------------------------------
Please enter the target RSYNC server model name> modelName
------------------------------------------------------------
sending incremental file list
/tmp/
/tmp/d41d8cd98f00b204e9800998ecf8427e.data

sent 99 bytes  received 15 bytes  20.73 bytes/sec
total size is 11  speedup is 0.10 (DRY RUN)
Please enter the target directory for project archiving> rsync******test_dir
```
# 2.2.3.3 Step description
- Enter a jobid, which can be customized or left blank to be generated in the background; specify this jobid when retrying a failed task
- Enter the project appid
- Enter the start date (must fall outside the most recent month)
- Enter the end date (must fall outside the most recent month)
- Enter a specific event type (optional) to reload a single event type
- Select rsync as the type of archive storage
- Enter the rsync server IP address
- Enter the rsync server port
- Enter the rsync user name
- Enter the location of the rsync password file; it can live in any directory, but its permissions must be restricted with chmod 600
- Enter the rsync model (module) name; this step uses the information entered so far to verify that rsync is reachable
- Enter the archived target directory

Note: when entering parameters, make sure the directory path matches the one used for archiving.