ACC/Mean transformation
=======================

Input: Sequences transformed with 5PC-scores or any other values. The data for
  each entry must be listed on a single line (no line breaks). Each line starts
  with the entry ID, followed by the given number of values for each amino acid.
  For 5 values per amino acid, for example:
  ID aa1var1 aa1var2 aa1var3 aa1var4 aa1var5 aa2var1 aa2var2 aa2var3 ...
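
  The input layout above can be read with a few lines of Python. This is only
  an illustrative sketch, not part of the tool: the function name parse_entry
  and the whitespace-separated fields are assumptions.

```python
def parse_entry(line, m):
    """Split one input line into an entry ID and one m-tuple per amino acid."""
    fields = line.split()  # assumes whitespace-separated fields
    entry_id = fields[0]
    values = [float(v) for v in fields[1:]]
    if len(values) % m != 0:
        raise ValueError(
            f"{entry_id}: {len(values)} values is not a multiple of {m}"
        )
    # Group the flat list of values into consecutive blocks of m.
    residues = [tuple(values[p:p + m]) for p in range(0, len(values), m)]
    return entry_id, residues


# Hypothetical entry with 2 amino acids and m = 3 values per amino acid.
entry_id, residues = parse_entry("seq1 0.1 0.2 0.3 0.4 0.5 0.6", m=3)
print(entry_id, residues)
```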
  
Output types: 
Auto/crosscovariance - calculated as (1/n)Sigma[X(i,t) - X^(i)][X(j,t+k) - X^(j)], 
  summed over t, where n is the sequence length in amino acids, i and j index
  the variables (i, j = 1, ..., m), t is the amino acid position
  (t = 1, ..., n-k), k is the lag size (k = 0, ..., k_max, where k_max is the
  maximum lag size), and X^(i) or X^(j) is the mean of the variable X(i) or
  X(j), calculated as (1/n)Sigma[X(i,t)] over t = 1, ..., n. This follows the
  implementation in R/S-plus. Note that k starts from 0.
Standardized auto/crosscovariance (auto/crosscorrelation) - calculated as
  C(k)/C(0), where C(k) is the auto/crosscovariance at lag size k as 
  shown above, and C(0) is calculated as sqrt[Sigma[Var(i)*Var(j)]].
Wold's ACC - calculated following the original method by Wold et al. (1993, 
  Anal. Chim. Acta 277:239), which is {1/(n - k)}Sigma[X(i,t)X(j,t+k)], summed
  over t, where n is the sequence length in amino acids, i and j index the
  variables (i, j = 1, ..., m), t is the amino acid position (t = 1, ..., n-k),
  and k is the lag size (k = 1, ..., k_max, where k_max is the maximum lag
  size). Note that k starts from 1.
Mean - calculated as (1/n)Sigma(PCi), where n is the sequence length in amino acids and
  i = 1, ..., m (for m variables, 5 for PC5).
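
  The four output types can be sketched in Python as follows. This is an
  illustration of the formulas as written above, not the tool's own code. The
  function names are hypothetical, indices are 0-based (unlike the 1-based
  formulas), and X is assumed to be a list of n rows with m values each. The
  normalizer of the standardized form is read here as sqrt(Var(i)*Var(j)),
  with Var taken as the lag-0 autocovariance; that reading is an assumption.

```python
import math


def means(X):
    """Mean of each variable over all n positions: (1/n) * sum_t X(i,t)."""
    n, m = len(X), len(X[0])
    return [sum(row[i] for row in X) / n for i in range(m)]


def cross_cov(X, i, j, k):
    """(1/n) * sum over t = 1..n-k of [X(i,t) - mean_i][X(j,t+k) - mean_j]."""
    n = len(X)
    mu = means(X)
    total = sum(
        (X[t][i] - mu[i]) * (X[t + k][j] - mu[j]) for t in range(n - k)
    )
    return total / n  # divisor is n, not n - k


def cross_corr(X, i, j, k):
    """Standardized form. ASSUMPTION: normalizer is sqrt(Var(i) * Var(j)),
    where Var is the lag-0 autocovariance of each variable."""
    norm = math.sqrt(cross_cov(X, i, i, 0) * cross_cov(X, j, j, 0))
    return cross_cov(X, i, j, k) / norm


def wold_acc(X, i, j, k):
    """{1/(n - k)} * sum over t = 1..n-k of X(i,t) * X(j,t+k), for k >= 1."""
    n = len(X)
    return sum(X[t][i] * X[t + k][j] for t in range(n - k)) / (n - k)


# Hypothetical sequence of n = 4 positions with m = 2 variables.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
print(means(X))
print(cross_cov(X, 0, 0, 1), cross_corr(X, 0, 1, 0), wold_acc(X, 0, 1, 1))
```

  Note that cross_cov divides by n at every lag, whereas wold_acc divides by
  n - k and does not subtract the mean, exactly as in the two formulas.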

Output file format:
[PLS format] this format can be used as the input file for the PLS analysis 
  using R and the provided model file, PLS-ACC.Rdata. The first column contains
  the entry ID, followed by the ACC values acc(i, j, k), where i, j = 1, ..., m
  for m scores and k is the lag size. The order of acc values with m = 2 and a
  maximum lag of 3 is as follows:
  acc(1,1,1), acc(1,1,2), acc(1,1,3), acc(1,2,1), acc(1,2,2), ..., acc(2,2,1), acc(2,2,2), acc(2,2,3)

[SVM_light format] this can be used as the input file for SVM-light. The 
  format is shown below. Note that the leading '0' is not the entry name; 
  it identifies the type of each sequence (1 for a positive sample, 
  -1 for a negative sample, or 0 for an unknown sample). In this output, 
  all sequences are assigned '0'. The following example is for acc values with 
  m = 2 and a maximum lag of 3:

0 1:acc(1,1,1) 2:acc(1,2,1) 3:acc(2,1,1) 4:acc(2,2,1) 5:acc(1,1,2) 6:acc(1,2,2) 7:acc(2,1,2) 8:acc(2,2,2) 9:acc(1,1,3) 10:acc(1,2,3) 11:acc(2,1,3) 12:acc(2,2,3)

[TAB-delimited flat table] this is a simple table format with the sequence ID
  in the first column. The order of acc values is the same as for the SVM-light format.
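
  The two column orders differ only in which index varies fastest: in the PLS
  format the lag k varies fastest, while in the SVM-light and flat-table
  formats the lag varies slowest. A small Python sketch of both orders and of
  an SVM-light line follows. The function names are hypothetical, and lags are
  taken as 1, ..., k_max to match the examples above.

```python
from itertools import product


def pls_order(m, k_max):
    """(i, j, k) triples with the lag k varying fastest (PLS format)."""
    idx = range(1, m + 1)
    return [(i, j, k) for i, j, k in product(idx, idx, range(1, k_max + 1))]


def svm_order(m, k_max):
    """(i, j, k) triples with the lag k varying slowest (SVM-light, flat table)."""
    idx = range(1, m + 1)
    return [(i, j, k) for k, i, j in product(range(1, k_max + 1), idx, idx)]


def svm_light_line(values, label=0):
    """Format one sample as '<label> 1:<v1> 2:<v2> ...' with 1-based indices."""
    features = [f"{n}:{v}" for n, v in enumerate(values, start=1)]
    return " ".join([str(label)] + features)


print(pls_order(2, 3)[:4])   # starts (1,1,1), (1,1,2), (1,1,3), (1,2,1)
print(svm_order(2, 3)[:4])   # starts (1,1,1), (1,2,1), (2,1,1), (2,2,1)
print(svm_light_line([0.5, -1.25]))
```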
  
Standardized variables: this option outputs standardized variables without 
  performing the ACC transformation. The standardization is performed as
  [X(i,j) - X^(i)]/rms(i), where i is the i-th variable, j is an amino acid
  position (j = 1, ..., n, with n the sequence length in amino acids),
  X^(i) is the mean of the variable X(i), and rms(i) is calculated, summing
  over j, as
  rms(i) = sqrt{Sigma[X(i,j)-X^(i)]^2/(n-1)}
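
  As a rough Python sketch of this standardization (the function name
  standardize is hypothetical, and X is assumed to be a list of n rows with
  one value per variable; the sequence needs at least two positions and each
  variable must not be constant, otherwise rms(i) is undefined or zero):

```python
import math


def standardize(X):
    """Return [X(i,j) - mean_i] / rms(i) for every position j and variable i."""
    n, m = len(X), len(X[0])
    mu = [sum(row[i] for row in X) / n for i in range(m)]
    # rms(i) uses the n - 1 divisor, as in the formula above.
    rms = [
        math.sqrt(sum((row[i] - mu[i]) ** 2 for row in X) / (n - 1))
        for i in range(m)
    ]
    return [[(row[i] - mu[i]) / rms[i] for i in range(m)] for row in X]


# Hypothetical sequence of 3 positions with 2 variables.
Z = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(Z)
```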
  
[TAB-delimited 3-way table] each amino acid is converted to its five
  corresponding scores. The format is shown below. The first column is the
  sequence ID, the second column is the amino acid position, and the remaining
  columns are the five standardized scores.

  At1g11000.1	1	0.8672771	0.22258	-1.562412	-0.4718294	-0.1297437
  At1g11000.1	2	-0.2388994	1.011137	-0.9719389	-2.042766	1.008086
  At1g11000.1	3	0.05577879	1.442711	-1.010034	-0.1761236	-2.597505

  The output uses N lines for an entry with N amino acids. This format is 
  convenient for use with the R/S 'sapply' function (e.g., to obtain the
  column means).
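
  For readers working outside R, the same per-entry column means can be
  obtained with a short Python sketch. The function name column_means is
  hypothetical, and the input is assumed to be TAB-delimited lines in the
  3-way layout described above.

```python
def column_means(lines):
    """Return {entry ID: [mean of each score column]} for a 3-way table."""
    sums, counts = {}, {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("\t")
        # fields[0] is the ID, fields[1] the position, the rest are scores.
        entry_id = fields[0]
        scores = [float(v) for v in fields[2:]]
        if entry_id not in sums:
            sums[entry_id] = [0.0] * len(scores)
            counts[entry_id] = 0
        sums[entry_id] = [s + v for s, v in zip(sums[entry_id], scores)]
        counts[entry_id] += 1
    return {e: [s / counts[e] for s in sums[e]] for e in sums}


# Hypothetical table: entry "a" has 2 positions, entry "b" has 1.
rows = ["a\t1\t1.0\t2.0", "a\t2\t3.0\t4.0", "b\t1\t5.0\t6.0"]
result = column_means(rows)
print(result)
```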