Tuesday, December 4, 2012

Archaeal COG (arCOG) Sequence Statistics

Sequence length histograms of unique sequences compared with well characterized (wc) COG data.

Note. for wcCOG,

  • Maximum sequence length: 5627
  • Minimum sequence length: 18
  • Mean sequence length: 888
  • Median sequence length: 802
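
Summary statistics of this kind can be recomputed from a list of sequence lengths. The following is a minimal sketch, assuming the lengths have already been read into a Python list. The function name and the toy lengths are illustrative and are not the wcCOG data.

```python
import statistics

def length_summary(lengths):
    """Return max, min, mean and median of a list of sequence lengths."""
    return {
        "max": max(lengths),
        "min": min(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }

# Toy lengths for illustration only (not the wcCOG values).
print(length_summary([120, 250, 802, 1200, 3000]))
```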






Sequence length histograms of unique sequences compared with well characterized (wc) COG data for sequences with length <= 1000.


Wednesday, September 12, 2012


COG 183362+4872 NW PID Log(1-d^6) with Sammon


Description

DataSet: COG Size: 183362+4872 Unique: Yes* (5 of the 4872 overlap with the 183362)
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -9 GapExt: -1
DistanceType: (1 - PercentIdentity) Transformation: TM12,TP6
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All

Links

Images


Full Sample with Selected Clusters


Full Sample with Selected Clusters Zoomed-in


Full Sample with Selected Clusters and Consensus

Tuesday, May 29, 2012

COG 95672 NW PID Log(1-d^4) with Sammon


Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM12,TP4
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters




Full Sample with Selected Clusters Zoomed-in


COG 95672 NW PID Log(1-d^2) with Sammon


Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM12,TP2
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters




Full Sample with Selected Clusters Zoomed-in


Friday, May 25, 2012

COG 95672 NW PID Log(1-d^6) with Sammon


Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM12,TP6
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters


Full Sample with Selected Clusters Zoomed-in


Full Sample with Selected Clusters Zoomed-in Further




Wednesday, May 23, 2012

COG 95672 NW PID Log(1-d) with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM12
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images

 

Full Sample with Selected Clusters



Full Sample with Selected Clusters Zoomed-in

Friday, March 30, 2012

The Role of Seven Clusters

The 7 clusters were chosen early on as an interesting way of looking at the value of a transformation. They are
COG0444 137 members
COG4608 130 members
COG1131 240 members
COG1126 114 members
COG1136 195 members
COG3842 110 members
COG3849 135 members

We show analysis in terms of
Original distance versus Euclidean 3D map
and
Original Distance for two different methods

The intracluster set is the collection of all pairs of points inside the same cluster; it measures how well individual clusters are mapped.
The intercluster set is the collection of all pairs with one point in one of the seven clusters and the other point in another. The quality of these plots measures the relative placement of clusters.
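
The two pair collections can be built from per-point cluster labels. This is a minimal sketch, assuming each point has a cluster label available by point number; `split_pairs` and the toy labels are illustrative names, not part of Manxcat.

```python
from itertools import combinations

def split_pairs(labels, selected):
    """Split point pairs into same-cluster and cross-cluster lists.

    labels   -- cluster label of each point, indexed by point number
    selected -- the clusters of interest (e.g. the seven chosen COGs)
    """
    chosen = [i for i, c in enumerate(labels) if c in selected]
    same, cross = [], []
    for i, j in combinations(chosen, 2):
        (same if labels[i] == labels[j] else cross).append((i, j))
    return same, cross

labels = ["COG0444", "COG0444", "COG4608", "COG1131", "other"]
same, cross = split_pairs(labels, {"COG0444", "COG4608", "COG1131"})
print(same)   # pairs inside one selected cluster
print(cross)  # pairs spanning two different selected clusters
```

Points outside the selected clusters (the "other" label here) contribute to neither list.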

Wednesday, March 28, 2012

Distance Types

The distance between a given pair of sequences is calculated from the alignment produced by an algorithm such as Smith-Waterman, Needleman-Wunsch, or Blast. In general the alignment of two sequences may appear as shown below.

Each square represents a character and dashes indicate gaps. Characters and gaps outside the aligned region are possible only with a global alignment algorithm like Needleman-Wunsch. For local alignments from Smith-Waterman and Blast, the aligned region is identical to the entire alignment. Also note that in a local alignment neither the starting pair nor the ending pair of characters from the two sequences includes a gap character.

If the two aligned sequences are S1 and S2, then the aligned region runs from StartIndex to EndIndex inclusive, defined as below.

StartIndex = Max (S1.FirstNonGapIndex, S2.FirstNonGapIndex)
EndIndex = Min(S1.LastNonGapIndex, S2.LastNonGapIndex)

The length of the aligned region is AlignedLength = EndIndex - StartIndex + 1
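
The index definitions above can be sketched as follows, assuming the two aligned sequences are equal-length strings that use "-" as the gap character. The function names are illustrative.

```python
def first_non_gap(seq, gap="-"):
    """Index of the first non-gap character in an aligned sequence."""
    return next(i for i, ch in enumerate(seq) if ch != gap)

def last_non_gap(seq, gap="-"):
    """Index of the last non-gap character in an aligned sequence."""
    return len(seq) - 1 - first_non_gap(seq[::-1], gap)

def aligned_region(s1, s2, gap="-"):
    """Return (StartIndex, EndIndex, AlignedLength) for two aligned rows."""
    start = max(first_non_gap(s1, gap), first_non_gap(s2, gap))
    end = min(last_non_gap(s1, gap), last_non_gap(s2, gap))
    return start, end, end - start + 1

# Toy global alignment with leading and trailing overhang.
print(aligned_region("--ACGTA-", "TTAC-TAG"))
```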

  • Percent Identity (PID) Distance
    • Let NumOfIdenticalPairs be the number of identical pairs within the aligned region. For example in the above picture there are five such pairs (2 greens, 1 purple, 1 blue, 1 red)
    • PID = NumOfIdenticalPairs / AlignedLength
    • Convert percent identity to a distance by taking 1 - PID
  • Score
    • Each pair of aligned characters is assigned a score using the substitution matrix and gap penalties. 
    • The summation of all such scores is called the score of the alignment. The alignment algorithm always tries to maximize this value.
    • See [1] for more details.
  • Normalized Score
    • Compute the score for aligning S1 with S2, S1 with S1, and S2 with S2. Let these values be named as S1S2, S1S1, and S2S2 respectively.
    • NormalizedScore = 2 * S1S2 / (S1S1 + S2S2) 
  • BitScore
    • Blast alignment has a value called BitScore, which is a log scaled version of the Score.
    • See [1,2] for more details. 
  • Normalized BitScore
    • Similar to NormalizedScore, compute BitScore for aligning S1 with S2, S1 with S1, and S2 with S2.
    • Use the same formula as in NormalizedScore to compute NormalizedBitScore.
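
Two of the quantities in this list can be sketched in a few lines. This is a minimal illustration, assuming aligned sequences are equal-length strings with "-" as the gap character and that the three alignment scores have already been computed elsewhere. The function names are illustrative. The text does not say how the normalized score is converted into a distance, so the sketch stops at the score itself.

```python
def pid_distance(s1, s2, gap="-"):
    """Return 1 - PID over the aligned region of two aligned rows."""
    def bounds(seq):
        idx = [i for i, ch in enumerate(seq) if ch != gap]
        return idx[0], idx[-1]

    (a1, b1), (a2, b2) = bounds(s1), bounds(s2)
    start, end = max(a1, a2), min(b1, b2)
    aligned_length = end - start + 1
    # An identical pair is a column where both rows hold the same character.
    identical = sum(
        1 for x, y in zip(s1[start:end + 1], s2[start:end + 1])
        if x == y and x != gap
    )
    return 1.0 - identical / aligned_length

def normalized_score(s1s2, s1s1, s2s2):
    """NormalizedScore = 2 * S1S2 / (S1S1 + S2S2)."""
    return 2.0 * s1s2 / (s1s1 + s2s2)

print(pid_distance("--ACGTA-", "TTAC-TAG"))
print(normalized_score(30, 50, 70))
```

The same `normalized_score` function applies to bit scores, since NormalizedBitScore uses the same formula.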
References:

FAQ

  1. What does PID stand for? 
    • PID stands for Percent Identity. It implies that the particular Manxcat run used the (1 - PID) value of each aligned pair of sequences as the distance between the two original sequences of that pair.
    • See more on different distance types at DistanceTypes
  2. What is the Simple Points file?
    • Given the input sequence file used in the particular Manxcat run, the Simple Points file presents 3D coordinates for each sequence in order. These coordinates are computed by the Manxcat program with its best effort to preserve the original distance between each pair. The term original distance refers to the distance (transformed distance if specified - see Distances and Transformations) calculated by aligning the corresponding two sequences.
    • Note. In cases where Blast is used to do the alignment, it is possible not to get alignments for certain sequence pairs. In such cases Manxcat may not produce coordinates for all the sequences. Therefore you may find that some point numbers are missing in the Simple Points file, although the points are ordered by point number. The Distance Cut also causes pairs of sequences with a distance greater than its value to be ignored, resulting in similar missing points in the output.
  3. How do the coordinates in Simple Points correspond to COG clusters?
    • Predefined cluster assignment is available for each sequence in the used set of COG sequences. These are available in the Introduction page.
  4. What is the difference between COG95672 and COG50000? 
  5. Can you please give more details on distance transformations, i.e. Transformation: TM10,TP4 ?
  6. What does DistanceCut: 0.96 mean?
  7. Is PlotViz available for Linux?
    • Currently, PlotViz is available for Windows and Mac environments only.
  8. What are the selected clusters?

Distances and Transformations

Distances are rather arbitrary. We have different "biology" approaches including
a) Needleman Wunsch
b) Smith Waterman Gotoh (SWG) Percentage Identity (PID)
c) SWG Score
d) Blast PID
e) Blast Bitscore

Further given any distance d(i,j), I can replace it by m(d(i,j)) for any monotonic mapping function y-->m(y) satisfying
m(y1) > m(y2) for y1>y2

Typically the mapping is chosen to make the distribution of distances "more reasonable". Note that for a high dimensional random distribution, one can show that

D = Formal Dimension = 2 Mean^2 / Sigma^2
so a high dimension corresponds to a standard deviation of distances, sigma, that is small compared to the mean.
High dimension translates into mapped points concentrated at the edge of a surface (sphere) when mapped to 3D. So cases like Blast and SWG with a huge peak at distance = 1 have a high dimension, as sigma is small compared to the mean.
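
The formal dimension can be evaluated directly from a sample of pairwise distances. This is a minimal sketch; the use of the population variance for Sigma^2 is an assumption, and the two toy samples are made up to contrast a broad distribution with one peaked near 1.

```python
import statistics

def formal_dimension(distances):
    """D = 2 * Mean^2 / Sigma^2 for a collection of pairwise distances."""
    mean = statistics.fmean(distances)
    variance = statistics.pvariance(distances)  # population variance (assumption)
    return 2.0 * mean ** 2 / variance

broad = [0.2, 0.4, 0.6, 0.8, 1.0]        # spread-out distances
peaked = [0.96, 0.97, 0.98, 0.99, 1.0]   # distances piled up near 1
print(formal_dimension(broad), formal_dimension(peaked))
```

The peaked sample has a small sigma relative to its mean and therefore a much larger formal dimension than the broad one.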

So we choose mapping m to reduce dimension while retaining ordering of distances. We have looked at several choices of m.

a) Transformation Method 10

m(y) = y^TP where TP is the transformation parameter; TP = 2, 4 and 6 were investigated

b) Transformation Method 8
    m(y) maps to 4 dimensions. If you assume your data is randomly distributed in a sphere in dimension D, you can analytically derive a formula for the mapping so that m(d) corresponds to points randomly distributed in 2 or 4 dimensions. Transformation Method 8 implements this mapping for final dimension 4. Note that the original data is not random, so one doesn't get an exact final dimension, but it is typically around 4.

c) Transformation Method 9 or SQRT(4D)
  Here we start with the mapping m8(d) to 4 dimensions.
Then we INCREASE the formal mapped dimension by
m9(d) = m8(d)^0.5

Note that in Transformation Method 10, TP > 1 lowers the formal dimension but TP < 1 increases it.
Thus m9(d) has a larger formal dimension than m8, which is ~4. m9(d) for COG has a formal dimension around 14.
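
The effect of the power transformation on the formal dimension can be checked numerically. This is a minimal sketch of Transformation Method 10 only, applied to toy distances concentrated near 1. Method 8 is not reproduced because its analytic form is not given here, and the population variance is an assumption.

```python
import statistics

def tm10(y, tp):
    """Transformation Method 10: m(y) = y ** TP."""
    return y ** tp

def formal_dimension(distances):
    """D = 2 * Mean^2 / Sigma^2 (population variance assumed)."""
    mean = statistics.fmean(distances)
    return 2.0 * mean ** 2 / statistics.pvariance(distances)

d = [0.80, 0.85, 0.90, 0.95, 1.00]   # toy distances peaked near 1
for tp in (0.5, 1, 2, 4, 6):
    mapped = [tm10(y, tp) for y in d]
    assert mapped == sorted(mapped)   # monotonic: ordering is preserved
    print(tp, round(formal_dimension(mapped), 1))
```

For this sample the printed formal dimension falls as TP grows and rises for TP = 0.5, matching the direction described for TP > 1 and TP < 1.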



Distance Cut

There is some evidence that distance computations are unreliable when they are near 1. This is illustrated by the lack of correlation between Blast and Needleman Wunsch for Blast distances above 0.9.

So in some dimension reduction runs we remove terms where distances are larger than the "Distance Cut", whose value is specified on input. Remember we are minimizing a sum of N(N-1)/2 terms
(distance between i and j - Euclidean distance between mapped i and j)^2
So we drop terms where the distance between i and j > Distance Cut. Typically any points i and j are still left with some distances less than the cut, and so one can still determine the mapping.
For Blast, distances are sometimes not calculated, and those terms are also left out of the sum.
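
The sum with dropped terms can be sketched as below. This is an illustration rather than Manxcat's code: it assumes distances are held in a dictionary keyed by point pairs, uses -1 to mark an undefined distance (as in the UndefindDistanceValue setting of the configurations), and applies unit weights.

```python
import math
from itertools import combinations

def stress(dist, coords, distance_cut=None, undefined=-1.0):
    """Sum of (d_ij - |x_i - x_j|)^2 over pairs that survive the cut.

    dist         -- dict mapping (i, j) with i < j to the original distance
    coords       -- list of mapped coordinates, one tuple per point
    distance_cut -- drop pairs with d_ij above this value; None keeps all
    undefined    -- marker for distances that were never computed
    """
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = dist.get((i, j), undefined)
        if d == undefined:
            continue                      # missing distance, e.g. no Blast hit
        if distance_cut is not None and d > distance_cut:
            continue                      # unreliable distance near 1
        total += (d - math.dist(coords[i], coords[j])) ** 2
    return total

coords = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
dist = {(0, 1): 0.5, (0, 2): 0.99, (1, 2): 0.4}
print(stress(dist, coords))
print(stress(dist, coords, distance_cut=0.96))
```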

Manxcat is the only dimension reduction program that allows missing distances. It removes any point that has LinkCut or fewer valid distances. LinkCut defaults to 5.
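
The point-removal rule can be sketched as a simple count of valid distances per point. This is a single-pass illustration under stated assumptions: the dictionary holds only valid (defined, below-cut) distances, and whether Manxcat repeats the check after removing points is not stated, so no iteration is attempted. The function name is illustrative.

```python
def apply_link_cut(dist, n_points, link_cut=5):
    """Return the points that keep more than link_cut valid distances.

    dist -- dict mapping (i, j) to a valid (defined, below-cut) distance
    """
    links = [0] * n_points
    for i, j in dist:
        links[i] += 1
        links[j] += 1
    return [p for p in range(n_points) if links[p] > link_cut]

# Points 0..6 are fully linked to each other; point 7 has only two links.
dist = {(i, j): 0.5 for i in range(7) for j in range(i + 1, 7)}
dist[(0, 7)] = 0.5
dist[(1, 7)] = 0.5
print(apply_link_cut(dist, 8))
```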

COG Project Comments

As well as the output of the program, we link to plot files to be plotted by Plotviz  http://salsahpc.indiana.edu/pviz3/ (latest version http://salsahpc.indiana.edu/pviz3/data/PVIZ3-0.8.11-win64.exe). Note there are Mac and Windows versions of Plotviz; there is currently no Linux version.
We also give typical screen dumps of 2D projections of the 3D mappings. There are 3 heat maps in each case, which plot original versus mapped distance for
a) Full data sample
b) Distances between points inside 7 clusters (intra)
c) Distances between points in all pairs of 7 clusters (inter) excluding those cases where points are inside the same cluster
Note that on the heat maps 2 versions of the Euclidean histograms are given -- rotated by 90 degrees
There are too many entries to use normal scatterplots, so we use heat maps, which can use a sophisticated representation of density.
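
The binning behind such a heat map can be sketched as a 2D histogram of (original distance, mapped distance) pairs. The parameter names echo the Xres, Yres, XmaxBound and YmaxBound entries of the Density configuration, but their exact meaning in the real program is an assumption here, and the density colouring itself is not reproduced.

```python
def heat_map(pairs, xmax=1.5, ymax=1.5, xres=50, yres=50):
    """Count (original, mapped) distance pairs on an xres-by-yres grid."""
    grid = [[0] * xres for _ in range(yres)]
    for x, y in pairs:
        if 0 <= x <= xmax and 0 <= y <= ymax:
            # Clamp the upper edge into the last bin.
            col = min(int(x / xmax * xres), xres - 1)
            row = min(int(y / ymax * yres), yres - 1)
            grid[row][col] += 1
    return grid

pairs = [(0.1, 0.1), (0.1, 0.1), (1.5, 1.5), (2.0, 0.1)]
grid = heat_map(pairs)
print(sum(map(sum, grid)))   # pairs that fell inside the bounds
```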

Manxcat is one of our two major dimension reduction programs. It can cope with methods like Sammon that have a non-unit weight in the sum of distance discrepancies. It can also cope with undefined distances.
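
The difference between unit and non-unit weights can be illustrated with a weighted sum of squared discrepancies. The 1/d weight below is the textbook Sammon weighting, shown without its usual overall normalization constant; whether Manxcat uses exactly this form is an assumption, and all names are illustrative.

```python
import math
from itertools import combinations

def weighted_stress(dist, coords, weight):
    """Sum of weight(d_ij) * (d_ij - |x_i - x_j|)^2 over all pairs."""
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = dist[(i, j)]
        total += weight(d) * (d - math.dist(coords[i], coords[j])) ** 2
    return total

def unit_weight(d):
    """Every pair counts equally."""
    return 1.0

def sammon_weight(d):
    """Textbook Sammon weight; assumes d > 0."""
    return 1.0 / d

coords = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
dist = {(0, 1): 0.5, (0, 2): 0.8, (1, 2): 0.4}
print(weighted_stress(dist, coords, unit_weight))
print(weighted_stress(dist, coords, sammon_weight))
```

With the 1/d weight, discrepancies on small original distances count for more than the same discrepancy on large ones.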

We will be adding more explanation.

Note Needleman Wunsch, Blast Bitscore and SWG Score give reasonable answers; SWG PID and Blast PID are no good

Monday, March 26, 2012

COG 95672 SWG Normalized Score No Cut TM10 TP6 with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: SmithWaterman ScoringMatrix: BLOSUM62 GapOpen: -9 GapExt: -1
DistanceType: Normalized SWG Score (2*AB/(AA+BB)) Transformation: TM10,TP6
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters



Full Sample with Selected Clusters Zoomed-in


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_swg_distance_normscore_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    95672
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   -1
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           2
 InitializationOption:          0
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         120
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              0
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               10
 TransformParameter:            6
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1.5
 Xres:                          50
 YmaxBound:                     1.5
 Yres:                          50

Saturday, March 24, 2012

COG 95672 SWG Normalized Score TM10 TP6 with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: SmithWaterman ScoringMatrix: BLOSUM62 GapOpen: -9 GapExt: -1
DistanceType: Normalized SWG Score (2*AB/(AA+BB)) Transformation: TM10,TP6
Mapping: Sammon DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters Zoomed-in

Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_swg_distance_normscore_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    95672
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   0.96
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           2
 InitializationOption:          0
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         120
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              0
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               10
 TransformParameter:            6
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1
 Xres:                          50
 YmaxBound:                     1
 Yres:                          50

Friday, March 23, 2012

COG 95672 Blast TM10 TP6 with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: Normalized BitScore (2*AB/(AA+BB)) Transformation: TM10,TP6
Mapping: Sammon DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters



Full Sample with Selected Clusters Zoomed-in


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    95672
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   0.96
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               10
 TransformParameter:            6
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1.8
 Xres:                          50
 YmaxBound:                     1.8
 Yres:                          50

COG 50000 Blast PID TM10 TP4 with Chisq1

Description

DataSet: COG Size: 50000 Unique: Yes
Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM10,TP4
Mapping: Chisq1 DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_first50k_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     1
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    50000
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   0.96
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               10
 TransformParameter:            4
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1.8
 Xres:                          50
 YmaxBound:                     1.8
 Yres:                          50

COG 50000 Blast TM10 TP2 with Sammon

Description

DataSet: COG Size: 50000 Unique: Yes
Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: Normalized BitScore (2*AB/(AA+BB)) Transformation: TM10,TP2
Mapping: Sammon DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_first50k_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    50000
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   0.96
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               10
 TransformParameter:            2
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1.8
 Xres:                          50
 YmaxBound:                     1.8
 Yres:                          50

COG 50000 Blast Untransformed with Chisq1

Description

DataSet: COG Size: 50000 Unique: Yes
Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: Normalized BitScore (2*AB/(AA+BB)) Transformation: None
Mapping: Chisq1 DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_first50k_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     1
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    50000
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   0.96
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               0
 TransformParameter:            0.125
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1.8
 Xres:                          50
 YmaxBound:                     1.8
 Yres:                          50

COG 95672 SWG PID TM10 TP4 with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: SmithWaterman ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM10,TP4
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_swg_blosum62_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    95672
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   -1
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               10
 TransformParameter:            4
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1
 Xres:                          50
 YmaxBound:                     1
 Yres:                          50

Wednesday, March 21, 2012

COG 95672 NW PID SQRT4D with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM9
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters



Full Sample with Selected Clusters Zoomed-in



Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_nw_4d_sqrt_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    95672
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   -1
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               0
 TransformParameter:            0.125
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1
 Xres:                          50
 YmaxBound:                     1
 Yres:                          50

Tuesday, March 20, 2012

COG 95672 NW PID Untransformed with Sammon

Description

DataSet: COG Size: 95672 Unique: Yes
Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: None
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85

Links

Images


Full Sample with Selected Clusters


Configuration

I/O
 CoordinateWriteFrequency:      0
 DistanceMatrixFile:            F:\Salsa\saliya\cog\100k\input\cog_95672_nw_c#.bin


ManxcatCore
 AddonforQcomputation:          2
 CalcFixedCrossFixed:           True
 CGResidualLimit:               1E-05
 ChisqChangePerPoint:           0.001
 Chisqnorm:                     2
 ChisqPrintConstant:            1
 ConversionInformation:         
 ConversionOption:              
 DataPoints:                    95672
 Derivtest:                     False
 DiskDistanceOption:            2
 DistanceCut:                   -1
 DistanceFormula:               1
 DistanceProcessingOption:      0
 DistanceWeigthsCuts:           
 Eigenvaluechange:              0.001
 Eigenvectorchange:             0.001
 ExtraOption1:                  0
 Extraprecision:                0.05
 FixedPointCriterion:           none
 FletcherRho:                   0.25
 FletcherSigma:                 0.75
 FullSecondDerivativeOption:    0
 FunctionErrorCalcMultiplier:   10
 HistogramBinCount              100
 InitializationLoops:           1
 InitializationOption:          1
 InitialSteepestDescents:       0
 LinkCut:                       5
 LocalVectorDimension:          3
 Maxit:                         80
 MinimumDistance:               -0.001
 MPIIOStrategy:                 0
 Nbadgo:                        6
 Omega:                         1.25
 OmegaOption:                   0
 PowerIterationLimit:           200
 ProcessingOption:              100
 QgoodReductionFactor:          0.5
 QHighInitialFactor:            0.01
 QLimiscalecalculationInterval: 1
 RotationOption:                0
 Selectedfixedpoints:           
 Selectedvariedpoints:          
 StoredDistanceOption:          2
 TimeCutmillisec:               -1
 TransformMethod:               0
 TransformParameter:            0.125
 UndefindDistanceValue:         -1
 VariedPointCriterion:          all
 WeightingOption:               0
 Write2Das3D:                   True


Density
 Alpha:                         2
 Pcutf:                         0.85
 SelectedClusters:              63,82,145,265,362,684,708
 XmaxBound:                     1.2
 Xres:                          50
 YmaxBound:                     1.2
 Yres:                          50