# Data Archive and Reload

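# 1. Data Archive (ta_data_archive)

The data archive feature migrates historical data, or data that will not be needed for some time, to low-cost storage and archives it there. This frees disk resources on the TA cluster and reduces cost.

# 1.1 Archive Commands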

#start
ta-tool data_archive start

#stop
ta-tool data_archive stop

#retry
ta-tool data_archive retry -jobid *******

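# 1.2 Archive Methods

# 1.2.1 S3 Method

# 1.2.1.1 Environment Preparation

- Sign up for the Amazon S3 service.
- Create a bucket for archiving. The bucket region should preferably match the region of the TA cluster servers.
- Create the keys used to access the bucket.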

# 1.2.1.2 Sample Command

[ta@ta1 ~]$ ta-tool data_archive start
Please enter JobId for this job without background random generation>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487f6b**********f9c379aa9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time for project archiving: YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > s3
------------------------------------------------------------
Please enter S3 AccesskeyID> AK************YO6G3
------------------------------------------------------------
Please enter S3 secretAccessKey> J23************rZb
------------------------------------------------------------
Please enter S3 region code> cn-****-1
------------------------------------------------------------
Please enter S3 bucket name>ta************ive
------------------------------------------------------------
Please enter the S3 file storage class (default: STANDARD)> S*****D
------------------------------------------------------------
Please enter the target directory for project archiving>  data*****_test
------------------------------------------------------------

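# 1.2.1.3 Step Description

1. Enter the jobid. You can specify one yourself or leave it blank to have one generated in the background. The jobid is what you pass when retrying a failed job.
2. Enter the project appid.
3. Enter the start date (must fall outside the most recent month).
4. Enter the end date (must fall outside the most recent month).
5. Optionally enter an event type to archive only that single event type.
6. Select s3 as the archive storage type.
7. Enter the S3 AccessKeyID.
8. Enter the secretAccessKey (managed through the IAM service).
9. Specify the region code of the bucket.
10. Enter the bucket name.
11. Select the storage class (STANDARD by default). The GLACIER and DEEP_ARCHIVE storage classes are designed for low-cost data archiving, but data stored in them must be restored ("thawed") before it can be read back, which makes recovery more cumbersome.
12. Specify the archive target directory. The directory is created under the target bucket and the archived data is stored in it.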
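
The date constraint in steps 3 and 4 can be checked before starting a job. The sketch below is only an illustration: it assumes "the most recent month" means the last 30 days, which the source does not define, and `check_archive_range` is a hypothetical helper, not part of `ta-tool`.

```python
from datetime import date, timedelta

# Assumption: "the most recent month" is treated as the last 30 days.
RECENT_WINDOW = timedelta(days=30)


def check_archive_range(start: str, end: str, today: date) -> list:
    """Return a list of problems with a YYYY-MM-DD range; empty means it looks usable."""
    start_day = date.fromisoformat(start)
    end_day = date.fromisoformat(end)
    cutoff = today - RECENT_WINDOW  # latest day that is outside the recent window

    problems = []
    if start_day > end_day:
        problems.append("start date is after end date")
    if start_day > cutoff:
        problems.append(f"start date must be on or before {cutoff.isoformat()}")
    if end_day > cutoff:
        problems.append(f"end date must be on or before {cutoff.isoformat()}")
    return problems


if __name__ == "__main__":
    # The range used in the sample command, checked against a fixed "today".
    print(check_archive_range("2018-01-01", "2018-12-31", date(2019, 6, 1)))
```

Passing `today` explicitly keeps the check deterministic; in practice `date.today()` would be supplied.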

# 1.2.2 HDFS Method

# 1.2.2.1 Environment Preparation

Prepare an HDFS environment that the TA cluster can reach over the network.

# 1.2.2.2 Sample Command

[ta@ta1 ~]$ ta-tool data_archive start
Please enter JobId for this job without background random generation>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487************a9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time of project archiving：YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > hdfs
------------------------------------------------------------
Please enter HFDS URL address for project archiving> hdfs-nm-url
------------------------------------------------------------
Please enter HFDS user name for project archiving> hdfsUserName
------------------------------------------------------------
Please enter the target directory for project archiving>  hdfs******test
------------------------------------------------------------

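# 1.2.2.3 Step Description

1. Enter the jobid. You can specify one yourself or leave it blank to have one generated in the background. The jobid is what you pass when retrying a failed job.
2. Enter the project appid.
3. Enter the start date (must fall outside the most recent month).
4. Enter the end date (must fall outside the most recent month).
5. Optionally enter an event type to archive only that single event type.
6. Select hdfs as the archive storage type.
7. Enter the HDFS address of the write target. If the default port is used, entering the hostname is enough.
8. Enter the HDFS user name of the write target.
9. Enter the archive target directory. An absolute path is recommended; otherwise the data is stored under /user/<HDFS user directory>/<target directory>.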
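
Step 9 is easier to reason about with a small example of how a relative target directory ends up under the user's HDFS home. This is a sketch of the rule as described above, with a hypothetical helper name; it does not call HDFS or `ta-tool`.

```python
import posixpath


def resolve_hdfs_target(target_dir: str, hdfs_user: str) -> str:
    """Show where an archive lands: absolute paths are kept, relative ones
    are placed under /user/<hdfs_user>/ as described in step 9."""
    if posixpath.isabs(target_dir):
        return posixpath.normpath(target_dir)
    return posixpath.normpath(posixpath.join("/user", hdfs_user, target_dir))


if __name__ == "__main__":
    print(resolve_hdfs_target("archive/2018", "hdfsUserName"))
    print(resolve_hdfs_target("/backup/ta/2018", "hdfsUserName"))
```

Using an absolute path removes the dependency on which user name was entered, which is why it is the recommended form.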

# 1.2.3 rsync Method

# 1.2.3.1 Environment Preparation

Set up a server in rsync daemon mode and copy the secret key file to the TA cluster node on which the command will be run.

# 1.2.3.2 Sample Command

[ta@ta1 ~]$ ta-tool data_archive start
Please enter JobId for this job without background random generation>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 548*****************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time of project archiving：YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > rsync
------------------------------------------------------------
Please enter the target RSYNC server IP address> rsyncIp
------------------------------------------------------------
Please enter the target RSYNC server port> rsyncPort
------------------------------------------------------------
Please enter the target RSYNC server username> rsyncUser
------------------------------------------------------------
Please enter the target RSYNC server key file location> passwordFilePath
------------------------------------------------------------
Please enter the target RSYNC server model name> modelName
------------------------------------------------------------
sending incremental file list
/tmp/
/tmp/d41d8c*****ecf8427e.data

sent 99 bytes  received 15 bytes  228.00 bytes/sec
total size is 11  speedup is 0.10 (DRY RUN)
Please enter the target directory for project archiving>  rsync******test_dir

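# 1.2.3.3 Step Description

1. Enter the jobid. You can specify one yourself or leave it blank to have one generated in the background. The jobid is what you pass when retrying a failed job.
2. Enter the project appid.
3. Enter the start date (must fall outside the most recent month).
4. Enter the end date (must fall outside the most recent month).
5. Optionally enter an event type to archive only that single event type.
6. Select rsync as the archive storage type.
7. Enter the IP address of the rsync server.
8. Enter the rsync server port.
9. Enter the rsync user name.
10. Enter the location of the rsync key file. Store the file in a specific directory and set its permissions with chmod 600.
11. Enter the rsync module name (the prompt calls it "model name"). At this step the tool uses the information entered so far to check that rsync is usable.
12. Enter the archive target directory.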
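
The permission requirement in step 10 can be verified before running the archive command. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of `ta-tool`.

```python
import os
import stat


def key_file_mode_ok(path: str) -> bool:
    """True when the key file's permission bits are exactly 0o600 (owner read/write only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


def fix_key_file_mode(path: str) -> None:
    """Equivalent of `chmod 600 <path>`."""
    os.chmod(path, 0o600)


if __name__ == "__main__":
    import sys

    key_path = sys.argv[1]  # path to the rsync key file
    if not key_file_mode_ok(key_path):
        fix_key_file_mode(key_path)
    print(key_path, "mode ok:", key_file_mode_ok(key_path))
```

rsync refuses a password file that other users can read, so a looser mode typically shows up as a failure at the availability check in step 11.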

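# 2. Data Reload (ta_data_reload)

The data reload feature loads previously archived data back into the TA cluster so that it can be used again. It is mainly used for viewing historical trends. Make sure enough disk space is available before importing.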
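
One way to act on the disk-space advice is a quick check on the node before starting a reload. This is a rough sketch: the 20% headroom is an arbitrary assumption, the size of reloaded data on disk may differ from the size of the archive, and `has_room_for_reload` is a hypothetical helper.

```python
import shutil


def has_room_for_reload(data_dir: str, archive_bytes: int, headroom: float = 0.2) -> bool:
    """Compare free space on the filesystem holding data_dir with the archive
    size plus a safety margin (headroom is a fraction of the archive size)."""
    free = shutil.disk_usage(data_dir).free
    return free >= archive_bytes * (1 + headroom)


if __name__ == "__main__":
    # Example: is there room for a 50 GiB archive on the current filesystem?
    print(has_room_for_reload(".", 50 * 1024**3))
```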

# 2.1 Reload Commands

#start
ta-tool data_reload start

#stop
ta-tool data_reload stop

#retry
ta-tool data_reload retry -jobid *******

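# 2.2 Reload Methods

# 2.2.1 S3 Method

# 2.2.1.1 Environment Preparation

- Sign up for the Amazon S3 service.
- Create a bucket for archiving. The bucket region should preferably match the region of the TA cluster servers.
- Create the keys used to access the bucket.

# 2.2.1.2 Sample Command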

[ta@ta1 log]$ ta-tool data_reload start
Please enter JobId for this job without background random generation>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487f6************a9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time of project archiving：YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > s3
------------------------------------------------------------
Please enter S3 AccesskeyID> AK***********3
------------------------------------------------------------
Please enter S3 secretAccessKey> J23w************b
------------------------------------------------------------
Please enter S3 region code> cn*****-1
------------------------------------------------------------
Please enter S3 bucket name>ta*****ve
------------------------------------------------------------
Please enter the target directory for project archiving>  data*******t_1
------------------------------------------------------------

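# 2.2.1.3 Step Description

1. Enter the jobid. You can specify one yourself or leave it blank to have one generated in the background. The jobid is what you pass when retrying a failed job.
2. Enter the project appid.
3. Enter the start date (must fall outside the most recent month).
4. Enter the end date (must fall outside the most recent month).
5. Optionally enter an event type to restrict the job to that single event type.
6. Select s3 as the archive storage type.
7. Enter the S3 AccessKeyID.
8. Enter the secretAccessKey (managed through the IAM service).
9. Specify the region code of the bucket.
10. Enter the bucket name.
11. Select the storage class (the default is used unless set). If the data was archived with the GLACIER or DEEP_ARCHIVE storage class, it must be restored ("thawed") in S3 beforehand; otherwise the data cannot be loaded.
12. Enter the archive target directory, that is, the directory under the bucket in which the archived data was stored.

Note: the bucket name and directory path entered here must match the ones used when archiving.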
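
To make the restore requirement in step 11 concrete, the sketch below decides whether a storage class needs a restore and builds the request body in the shape accepted by the S3 RestoreObject operation. The helper names, the 7-day default and the choice of tier are assumptions for illustration; nothing here is executed against S3 or taken from `ta-tool`.

```python
# Storage classes named in step 11 whose objects must be restored before reading.
RESTORE_REQUIRED = {"GLACIER", "DEEP_ARCHIVE"}
RETRIEVAL_TIERS = {"Standard", "Bulk", "Expedited"}


def needs_restore(storage_class: str) -> bool:
    """True when archived objects in this storage class must be restored first."""
    return storage_class.strip().upper() in RESTORE_REQUIRED


def build_restore_request(days: int = 7, tier: str = "Standard") -> dict:
    """Build a RestoreRequest body: how long the restored copy stays readable
    and which retrieval tier to use."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


if __name__ == "__main__":
    if needs_restore("DEEP_ARCHIVE"):
        # With boto3 this body would be passed per object, for example:
        #   s3.restore_object(Bucket=bucket, Key=key, RestoreRequest=request)
        request = build_restore_request(days=7, tier="Bulk")
        print(request)
```

Restores are asynchronous and can take hours, so the reload job should only be started after the restored copies are available.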

# 2.2.2 HDFS Method

# 2.2.2.1 Environment Preparation

Prepare an HDFS environment that the TA cluster can reach over the network.

# 2.2.2.2 Sample Command

[ta@ta1 log]$ ta-tool data_reload start
Please enter JobId for this job without background random generation>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 5487*******************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time of project archiving：YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > hdfs
------------------------------------------------------------
Please enter HFDS URL address for project archiving> hdfs-nm-url
------------------------------------------------------------
Please enter the target directory for project archiving>  hdfs******test
------------------------------------------------------------

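# 2.2.2.3 Step Description

1. Enter the jobid. You can specify one yourself or leave it blank to have one generated in the background. The jobid is what you pass when retrying a failed job.
2. Enter the project appid.
3. Enter the start date (must fall outside the most recent month).
4. Enter the end date (must fall outside the most recent month).
5. Optionally enter an event type to restrict the job to that single event type.
6. Select hdfs as the archive storage type.
7. Enter the HDFS address of the write target. If the default port is used, entering the hostname is enough.
8. Enter the HDFS user name of the write target.
9. Enter the archive target directory.

Note: the parameters entered here must match the path used when archiving.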
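
Because a reload only works when the location matches the one used at archive time, it can help to record that location when archiving and compare against it later. The sketch below is one possible way to do this; the `ArchiveLocation` record and its field names are hypothetical and are not something `ta-tool` writes or reads.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArchiveLocation:
    storage_type: str  # "s3", "hdfs" or "rsync"
    address: str       # bucket name, HDFS address or rsync module
    target_dir: str    # directory entered at the last prompt


def save_location(location: ArchiveLocation, path: str) -> None:
    """Write the location next to your own job notes at archive time."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(asdict(location), fh, indent=2)


def load_location(path: str) -> ArchiveLocation:
    with open(path, encoding="utf-8") as fh:
        return ArchiveLocation(**json.load(fh))


def mismatched_fields(archived: ArchiveLocation, reload_input: ArchiveLocation) -> list:
    """Names of the fields that differ between archive time and reload time."""
    return [name for name, value in asdict(archived).items()
            if getattr(reload_input, name) != value]


if __name__ == "__main__":
    archived = ArchiveLocation("hdfs", "hdfs-nm-url", "/archive/2018")
    typed_now = ArchiveLocation("hdfs", "hdfs-nm-url", "archive/2018")
    print(mismatched_fields(archived, typed_now))
```

The example highlights a common slip: a relative path at reload time does not equal the absolute path used when archiving.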

# 2.2.3 rsync Method

# 2.2.3.1 Environment Preparation

Set up a server in rsync daemon mode and copy the secret key file to the TA cluster node on which the command will be run.

# 2.2.3.2 Sample Command

[ta@ta1 log]$ ta-tool data_reload start
Please enter JobId for this job without background random generation>
------------------------------------------------------------
Please enter the project appid that needs to be archived> 54****************9bb
------------------------------------------------------------
Please enter the start time for project archiving: YYYY-MM-DD > 2018-01-01
------------------------------------------------------------
Please enter the end time of project archiving：YYYY-MM-DD > 2018-12-31
------------------------------------------------------------
Please enter the event type for project archiving (not required)>
------------------------------------------------------------
Please enter the type of archive storage: hdfs or rsync or s3 > rsync
------------------------------------------------------------
Please enter the target RSYNC server IP address> rsyncIp
------------------------------------------------------------
Please enter the target RSYNC server port> rsyncPort
------------------------------------------------------------
Please enter the target RSYNC server user name> rsyncUser
------------------------------------------------------------
Please enter the target RSYNC server key file location> passwordFilePath
------------------------------------------------------------
Please enter the target RSYNC server model name> modelName
------------------------------------------------------------
sending incremental file list
/tmp/
/tmp/d41d8cd98f00b204e9800998ecf8427e.data
sent 99 bytes  received 15 bytes  20.73 bytes/sec
total size is 11  speedup is 0.10 (DRY RUN)
Please enter the target directory for project archiving>  rsync******test_dir

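# 2.2.3.3 Step Description

1. Enter the jobid. You can specify one yourself or leave it blank to have one generated in the background. The jobid is what you pass when retrying a failed job.
2. Enter the project appid.
3. Enter the start date (must fall outside the most recent month).
4. Enter the end date (must fall outside the most recent month).
5. Optionally enter an event type to restrict the job to that single event type.
6. Select rsync as the archive storage type.
7. Enter the IP address of the rsync server.
8. Enter the rsync server port.
9. Enter the rsync user name.
10. Enter the location of the rsync key file. Store the file in a specific directory and set its permissions with chmod 600.
11. Enter the rsync module name (the prompt calls it "model name"). At this step the tool uses the information entered so far to check that rsync is usable.
12. Enter the archive target directory.

Note: the parameters entered here must match the path used when archiving.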
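
The availability check in step 11 appears in the sample output as an rsync dry run. If that check fails, a similar dry run can be reproduced by hand. The sketch below only assembles such a command from the values entered at the prompts; it is an assumption about a useful manual test, not a description of what `ta-tool` runs internally, and the probe file path is a placeholder.

```python
def rsync_daemon_url(user: str, host: str, port: int, module: str, subdir: str = "") -> str:
    """Build an rsync daemon URL of the form rsync://user@host:port/module[/subdir]."""
    path = module
    cleaned = subdir.strip("/")
    if cleaned:
        path = f"{module}/{cleaned}"
    return f"rsync://{user}@{host}:{port}/{path}"


def dry_run_command(user: str, host: str, port: int, module: str,
                    password_file: str, probe_file: str) -> list:
    """Argument list for a dry-run upload of probe_file to the module root."""
    return [
        "rsync", "-av", "--dry-run",
        f"--password-file={password_file}",
        probe_file,
        rsync_daemon_url(user, host, port, module) + "/",
    ]


if __name__ == "__main__":
    # 873 is the default rsync daemon port; the other values are placeholders.
    cmd = dry_run_command("rsyncUser", "10.0.0.5", 873, "modelName",
                          "/home/ta/rsync.key", "/tmp/probe.data")
    print(" ".join(cmd))
```

Running the printed command on the node where `ta-tool` is executed exercises the same server, port, user, key file and module without transferring any data.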
