The 7 clusters were chosen early on as an interesting way of looking at the value of transformation. They are
COG0444 137 members
COG4608 130 members
COG1131 240 members
COG1126 114 members
COG1136 195 members
COG3842 110 members
COG3849 135 members
We show analysis in terms of
Original distance versus Euclidean 3D map
and
Original Distance for two different methods
The intracluster collection is all pairs of points inside the same cluster, and this can measure how well individual clusters are mapped
The intercluster collection is all pairs with one point in one of the seven clusters and the other point in another. The quality of these plots measures the relative placement of clusters
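As a rough illustration of how the two pair collections can be built, here is a minimal Python sketch. The `labels` dictionary layout and the function name are assumptions made for this example; they are not taken from the Manxcat or PlotViz code.

```python
from itertools import combinations

def split_cluster_pairs(labels, selected):
    """Split point pairs into same-cluster and cross-cluster collections.

    labels:   dict mapping point index -> cluster name (hypothetical layout).
    selected: set of cluster names of interest (e.g. the seven clusters).
    """
    # Keep only the points that belong to one of the selected clusters.
    points = sorted(p for p, c in labels.items() if c in selected)
    same_cluster_pairs, cross_cluster_pairs = [], []
    for i, j in combinations(points, 2):
        if labels[i] == labels[j]:
            same_cluster_pairs.append((i, j))    # both points in one cluster
        else:
            cross_cluster_pairs.append((i, j))   # points in two different clusters
    return same_cluster_pairs, cross_cluster_pairs

labels = {0: "COG0444", 1: "COG0444", 2: "COG4608", 3: "other"}
same, cross = split_cluster_pairs(labels, {"COG0444", "COG4608"})
print(same)   # [(0, 1)]
print(cross)  # [(0, 2), (1, 2)]
```

Point 3 is ignored because its cluster is not in the selected set.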
Friday, March 30, 2012
Wednesday, March 28, 2012
Distance Types
The distance between a given pair of sequences is calculated from the alignment that results from running an algorithm like Smith-Waterman, Needleman-Wunsch, or Blast. In general the alignment of two sequences may appear as shown below.
Each square represents a character and dashes indicate gaps. Characters and gaps outside the aligned region are possible only with a global alignment algorithm like Needleman-Wunsch. For local alignments resulting from Smith-Waterman and Blast, the aligned region is identical to the entire alignment. Also note that with a local alignment, neither the starting pair nor the ending pair of characters from the two sequences includes a gap character.
If the two aligned sequences are S1 and S2, then the aligned region runs from StartIndex to EndIndex inclusive, defined as below.
StartIndex = Max (S1.FirstNonGapIndex, S2.FirstNonGapIndex)
EndIndex = Min(S1.LastNonGapIndex, S2.LastNonGapIndex)
The length of the aligned region is AlignedLength = EndIndex - StartIndex + 1
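The index definitions above can be sketched in Python as follows. This assumes each aligned sequence is an equal-length string with "-" as the gap character and at least one non-gap character; the function name is made up for the example.

```python
GAP = "-"

def aligned_region(s1, s2):
    """Return (start_index, end_index, aligned_length) for two aligned rows."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")

    def first_non_gap(s):
        return next(i for i, ch in enumerate(s) if ch != GAP)

    def last_non_gap(s):
        # Position of the first non-gap character counted from the right end.
        return len(s) - 1 - first_non_gap(s[::-1])

    start = max(first_non_gap(s1), first_non_gap(s2))
    end = min(last_non_gap(s1), last_non_gap(s2))
    return start, end, end - start + 1

print(aligned_region("--ACGTA-", "TTACG---"))  # (2, 4, 3)
```

Here the first row starts at index 2 and the second ends at index 4, so the aligned region covers three columns.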
- Percent Identity (PID) Distance
- Let NumOfIdenticalPairs be the number of identical pairs within the aligned region. For example in the above picture there are five such pairs (2 greens, 1 purple, 1 blue, 1 red)
- PID = NumOfIdenticalPairs / AlignedLength
- Convert percent identity to a distance by taking 1 - PID
- Score
- Each pair of aligned characters is assigned a score using the substitution matrix and gap penalties.
- The summation of all such scores is called the score of the alignment. The alignment algorithm always tries to maximize this value.
- See [1] for more details.
- Normalized Score
- Compute the score for aligning S1 with S2, S1 with S1, and S2 with S2. Let these values be named as S1S2, S1S1, and S2S2 respectively.
- NormalizedScore = 2 * S1S2 / (S1S1 + S2S2)
- BitScore
- Blast alignment has a value called BitScore, which is a log scaled version of the Score.
- See [1,2] for more details.
- Normalized BitScore
- Similar to NormalizedScore, compute BitScore for aligning S1 with S2, S1 with S1, and S2 with S2.
- Use the same formula as in NormalizedScore to compute NormalizedBitScore
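A minimal Python sketch of the PID distance and the normalized score follows. It assumes aligned sequences are equal-length strings with "-" as the gap character, and it assumes that a column counts as identical only when neither character is a gap. The alignment scores passed to `normalized_score` are made-up numbers, since computing real scores needs an aligner and a substitution matrix.

```python
GAP = "-"

def pid_distance(s1, s2):
    """1 - PID over the aligned region of two equal-length aligned rows."""
    non_gap_1 = [i for i, ch in enumerate(s1) if ch != GAP]
    non_gap_2 = [i for i, ch in enumerate(s2) if ch != GAP]
    start = max(non_gap_1[0], non_gap_2[0])
    end = min(non_gap_1[-1], non_gap_2[-1])
    aligned_length = end - start + 1
    # Assumption: gap-versus-gap columns do not count as identical pairs.
    identical = sum(
        1
        for a, b in zip(s1[start:end + 1], s2[start:end + 1])
        if a == b and a != GAP
    )
    return 1.0 - identical / aligned_length

def normalized_score(s1s2, s1s1, s2s2):
    """NormalizedScore = 2 * S1S2 / (S1S1 + S2S2)."""
    return 2.0 * s1s2 / (s1s1 + s2s2)

print(pid_distance("ACGT-A", "AC-TTA"))    # 4 identical pairs out of 6 columns
print(normalized_score(10.0, 20.0, 30.0))  # 0.4
```

The same `normalized_score` function applies unchanged to bit scores, since NormalizedBitScore uses the same formula.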
FAQ
- What does PID stand for?
- PID stands for Percent Identity. It implies that the particular Manxcat run used the (1 - PID) value of each aligned pair of sequences as the distance between the two original sequences corresponding to that pair.
- See more on different distance types at DistanceTypes
- What is the Simple Points file?
- Given the input sequence file used in the particular Manxcat run, the Simple Points file presents 3D coordinates for each sequence in order. These coordinates are computed by the Manxcat program with its best effort to preserve the original distance between each pair. The term original distance refers to the distance (transformed distance if specified - see Distances and Transformations) calculated through aligning the corresponding two sequences.
- Note. In cases where Blast is used to do the alignment, it is possible not to get alignments for certain sequence pairs. In such cases Manxcat may not produce coordinates for all the sequences. Therefore you may find that some point numbers are missing in the Simple Points file, although the points are ordered by point number. The Distance Cut may also cause pairs of sequences with a distance greater than the cut to be ignored, resulting in similar missing points in the output.
- How do the coordinates in Simple Points correspond to COG clusters?
- Predefined cluster assignment is available for each sequence in the used set of COG sequences. These are available in the Introduction page.
- What is the difference between COG95672 and COG50000?
- COG95672 stands for the unique sequences filtered from the original set of sequences. It has 95672 sequences.
- COG50000 stands for the first 50,000 sequences taken out of the unique sequences file.
- Also note both these files preserve the same order as in the original set of sequences. In other words the sequences are not randomized.
- Can you please give more details on distance transformations, i.e. Transformation: TM10,TP4 ?
- Please refer to the separate post on Distances and Transformations
- What does DistanceCut: 0.96 mean?
- Please refer to the separate post on Distance Cut
- Is PlotViz available for Linux?
- Currently, PlotViz is available for Windows and Mac environments only.
- What are the selected clusters?
- Please refer to the separate post on The Role of Seven Clusters
Distances and Transformations
Distances are rather arbitrary. We have different "biology" approaches including:
a) Needleman Wunsch
b) Smith Waterman Gotoh (SWG) Percentage Identity (PID)
c) SWG Score
d) Blast PID
e) Blast Bitscore
Further given any distance d(i,j), I can replace it by m(d(i,j)) for any monotonic mapping function y-->m(y) satisfying
m(y1) > m(y2) for y1>y2
Typically the mapping is chosen to make the distribution of distances "more reasonable". Note that if you have a high dimensional random distribution, one can show that
D = Formal Dimension = 2 Mean^2 / Sigma^2
showing that high dimension corresponds to a standard deviation of distances, sigma, that is small compared to the mean.
High dimension translates into mapped points concentrated at the edge of a surface (sphere) when mapped to 3D. So cases like Blast and SWG, with a huge peak at distance = 1, have a high dimension because sigma is small compared to the mean.
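To make the formal dimension concrete, here is a small Python sketch that evaluates 2 Mean^2 / Sigma^2 for a list of distances. Using the population variance (rather than the sample variance) is an assumption, and the two distance lists are invented for illustration.

```python
from statistics import fmean, pvariance

def formal_dimension(distances):
    """Formal dimension D = 2 * mean^2 / sigma^2 of a set of distances."""
    mean = fmean(distances)
    variance = pvariance(distances, mu=mean)  # population variance (assumption)
    return 2.0 * mean ** 2 / variance

broad = [0.2, 0.4, 0.6, 0.8, 1.0]      # spread-out distances
peaked = [0.90, 0.95, 1.0, 1.0, 1.0]   # distances piled up near 1

print(formal_dimension(broad))
print(formal_dimension(peaked))
```

The peaked list has a much smaller sigma relative to its mean, so its formal dimension comes out far larger than that of the broad list.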
So we choose mapping m to reduce dimension while retaining ordering of distances. We have looked at several choices of m.
a) Transformation Method 10
m(y) = y^TP where TP is the transformation parameter -- TP = 2, 4, 6 investigated
b) Transformation Method 8
m(y) maps to 4 dimensions. If you assume your data is randomly distributed in a sphere in dimension D, you can analytically derive formula for mapping so m(d) corresponds to points randomly distributed in 2 or 4 dimensions. Transformation method 8 implements this mapping for final dimension 4. Note original data is not random so one doesn't get an exact final dimension but it is typically around 4.
c) Transformation Method 9 or SQRT(4D)
Here we start with mapping m8(d) mapping to 4 dimensions.
Then we INCREASE formal mapped dimension by
m9(d) = m8(d)^0.5
Note in Transformation Method 10, TP > 1 lowers formal dimension but TP < 1 increases formal dimension.
Thus m9(d) has larger formal dimension than m8 which is ~4. m9(d) for COG has formal dimension around 14
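The effect of Transformation Method 10 on the formal dimension can be checked with a short Python sketch. The sample distances are invented, the population variance is an assumption, and only the power mapping m(y) = y^TP is shown; the analytic mapping behind Transformation Method 8 is not given here, so it is not reproduced.

```python
from statistics import fmean, pvariance

def formal_dimension(distances):
    """Formal dimension D = 2 * mean^2 / sigma^2 (population variance assumed)."""
    mean = fmean(distances)
    return 2.0 * mean ** 2 / pvariance(distances, mu=mean)

def transform_tm10(distances, tp):
    """Transformation Method 10: m(y) = y^TP applied to every distance."""
    return [d ** tp for d in distances]

sample = [0.5, 0.8, 0.9, 0.95, 1.0]  # made-up distances, sorted ascending

for tp in (0.5, 1, 2, 4, 6):
    mapped = transform_tm10(sample, tp)
    # A monotonic mapping keeps the ordering of the distances.
    assert mapped == sorted(mapped)
    print(tp, round(formal_dimension(mapped), 1))
```

For this sample the printed formal dimension falls as TP grows above 1 and rises for TP = 0.5, matching the note above about TP > 1 and TP < 1.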
Distance Cut
There is some evidence that distance computations are unreliable when they are near 1. This is illustrated by lack of correlation between Blast and Needleman Wunsch for Blast distances above 0.9.
So in some dimension reduction runs we remove terms where distances are larger than the "Distance Cut", whose value is specified on input. Remember we are minimizing a sum of N(N-1)/2 terms
(distance between i and j - Euclidean distance between mapped i and j)^2
So we drop terms where the distance between i and j > Distance Cut. Typically each point is still left with some distances less than the cut, so its mapping can still be determined.
For Blast, distances are sometimes not calculated, and those terms are also left out of the sum.
Manxcat is the only dimension reduction program that allows missing distances. It removes any point that has LinkCut or fewer valid distances. LinkCut defaults to 5.
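The following Python sketch shows how terms could be dropped from the sum and how points with too few valid distances could be flagged. It is not Manxcat's code: it uses unit weights (no Sammon weighting), marks an uncalculated distance with -1, and all function names are invented for the example.

```python
import math

UNDEFINED = -1.0  # marker for a distance that was never calculated (assumption)

def is_valid(d, distance_cut):
    """A term is kept if the distance exists and does not exceed the cut."""
    if d == UNDEFINED:
        return False
    return distance_cut is None or d <= distance_cut

def stress(dist, coords, distance_cut=None):
    """Sum of (original - Euclidean mapped distance)^2 over kept pairs i < j."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if is_valid(dist[i][j], distance_cut):
                total += (dist[i][j] - math.dist(coords[i], coords[j])) ** 2
    return total

def points_to_remove(dist, distance_cut=None, link_cut=5):
    """Indices of points with link_cut or fewer valid distances."""
    n = len(dist)
    removed = []
    for i in range(n):
        links = sum(
            1 for j in range(n) if j != i and is_valid(dist[i][j], distance_cut)
        )
        if links <= link_cut:
            removed.append(i)
    return removed

dist = [[0.0, 1.0, 0.5],
        [1.0, 0.0, -1.0],
        [0.5, -1.0, 0.0]]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(stress(dist, coords, distance_cut=0.96))                # 0.25
print(points_to_remove(dist, distance_cut=0.96, link_cut=0))  # [1]
```

In this toy case the pair (0, 1) is dropped by the cut and the pair (1, 2) is undefined, so only the pair (0, 2) contributes, and point 1 is left with no valid distances.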
COG Project Comments
As well as the output of the program, we link to plot files to be plotted by Plotviz http://salsahpc.indiana.edu/pviz3/ (latest version http://salsahpc.indiana.edu/pviz3/data/PVIZ3-0.8.11-win64.exe). Note there are Mac and Windows versions of Plotviz. There is currently no Linux version.
We also give typical screendumps of 2D projections of 3D mappings. There are 3 heat maps in each case which scatterplot original v mapped distance for
a) Full data sample
b) Distances between points inside 7 clusters (intra)
c) Distances between points in all pairs of 7 clusters (inter), excluding those cases where both points are inside the same cluster
Note that on each heat map, 2 versions of the Euclidean histograms are given, rotated by 90 degrees
There are too many entries to use normal scatterplots; so we use heat maps that can use sophisticated representation of density
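As a rough idea of what sits behind such a heat map, the Python sketch below bins (original distance, mapped distance) pairs into a fixed grid of counts. The parameter names, the clamping of values on the upper bound into the last bin, and the sample pairs are all assumptions for illustration; the actual plots may bin and color the density differently.

```python
def density_grid(pairs, x_max, y_max, x_res, y_res):
    """Count (original, mapped) distance pairs in an x_res by y_res grid."""
    grid = [[0] * y_res for _ in range(x_res)]
    for original, mapped in pairs:
        if not (0 <= original <= x_max and 0 <= mapped <= y_max):
            continue  # outside the plotted range
        # Values exactly on the upper bound are clamped into the last bin.
        ix = min(int(original / x_max * x_res), x_res - 1)
        iy = min(int(mapped / y_max * y_res), y_res - 1)
        grid[ix][iy] += 1
    return grid

pairs = [(0.1, 0.1), (0.1, 0.2), (0.9, 0.9)]
print(density_grid(pairs, 1.0, 1.0, 2, 2))  # [[2, 0], [0, 1]]
```

Each cell count can then be mapped to a color, which stays readable even with billions of pairs where a scatterplot would not.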
Manxcat is one of our two major dimension reduction programs. It can cope with methods like Sammon that have non-unit weight in sum of distance discrepancies. It can also cope with undefined distances
We will be adding more explanation
Note Needleman Wunsch, Blast Bitscore and SWG Score give reasonable answers; SWG PID and Blast PID are no good
Monday, March 26, 2012
COG 95672 SWG Normalized Score No Cut TM10 TP6 with Sammon
Description
DataSet: COG Size: 95672 Unique: Yes Aligner: SmithWaterman ScoringMatrix: BLOSUM62 GapOpen: -9 GapExt: -1
DistanceType: Normalized SWG Score (2*AB/(AA+BB)) Transformation: TM10,TP6
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters Zoomed-in
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_swg_distance_normscore_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 95672 Derivtest: False DiskDistanceOption: 2 DistanceCut: -1 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 2 InitializationOption: 0 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 120 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 0 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 10 TransformParameter: 6 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1.5 Xres: 50 YmaxBound: 1.5 Yres: 50
Saturday, March 24, 2012
COG 95672 SWG Normalized Score TM10 TP6 with Sammon
Description
DataSet: COG Size: 95672 Unique: Yes Aligner: SmithWaterman ScoringMatrix: BLOSUM62 GapOpen: -9 GapExt: -1
DistanceType: Normalized SWG Score (2*AB/(AA+BB)) Transformation: TM10,TP6
Mapping: Sammon DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters Zoomed-in
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_swg_distance_normscore_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 95672 Derivtest: False DiskDistanceOption: 2 DistanceCut: 0.96 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 2 InitializationOption: 0 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 120 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 0 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 10 TransformParameter: 6 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1 Xres: 50 YmaxBound: 1 Yres: 50
Friday, March 23, 2012
COG 95672 Blast TM10 TP6 with Sammon
Description
DataSet: COG Size: 95672 Unique: Yes Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: Normalized BitScore (2*AB/(AA+BB)) Transformation: TM10,TP6
Mapping: Sammon DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Full Sample with Selected Clusters Zoomed-in
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 95672 Derivtest: False DiskDistanceOption: 2 DistanceCut: 0.96 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 10 TransformParameter: 6 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1.8 Xres: 50 YmaxBound: 1.8 Yres: 50
COG 50000 Blast PID TM10 TP4 with Chisq1
Description
DataSet: COG Size: 50000 Unique: Yes Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM10,TP4
Mapping: Chisq1 DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_first50k_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 1 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 50000 Derivtest: False DiskDistanceOption: 2 DistanceCut: 0.96 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 10 TransformParameter: 4 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1.8 Xres: 50 YmaxBound: 1.8 Yres: 50
COG 50000 Blast TM10 TP2 with Sammon
Description
DataSet: COG Size: 50000 Unique: Yes Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: Normalized BitScore (2*AB/(AA+BB)) Transformation: TM10,TP2
Mapping: Sammon DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_first50k_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 50000 Derivtest: False DiskDistanceOption: 2 DistanceCut: 0.96 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 10 TransformParameter: 2 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1.8 Xres: 50 YmaxBound: 1.8 Yres: 50
COG 50000 Blast Untransformed with Chisq1
Description
DataSet: COG Size: 50000 Unique: Yes Aligner: Blast ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: Normalized BitScore (2*AB/(AA+BB)) Transformation: None
Mapping: Chisq1 DistanceCut: 0.96
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_bitscore_refined_first50k_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 1 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 50000 Derivtest: False DiskDistanceOption: 2 DistanceCut: 0.96 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 0 TransformParameter: 0.125 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1.8 Xres: 50 YmaxBound: 1.8 Yres: 50
COG 95672 SWG PID TM10 TP4 with Sammon
Description
DataSet: COG Size: 95672 Unique: Yes Aligner: SmithWaterman ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM10,TP4
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_swg_blosum62_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 95672 Derivtest: False DiskDistanceOption: 2 DistanceCut: -1 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 10 TransformParameter: 4 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1 Xres: 50 YmaxBound: 1 Yres: 50
Wednesday, March 21, 2012
COG 95672 NW PID SQRT4D with Sammon
Description
DataSet: COG Size: 95672 Unique: Yes Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: TM9
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Full Sample with Selected Clusters Zoomed-in
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_nw_4d_sqrt_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 95672 Derivtest: False DiskDistanceOption: 2 DistanceCut: -1 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 0 TransformParameter: 0.125 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1 Xres: 50 YmaxBound: 1 Yres: 50
Tuesday, March 20, 2012
COG 95672 NW PID Untransformed with Sammon
Description
DataSet: COG Size: 95672 Unique: Yes Aligner: NeedlemanWunsch ScoringMatrix: BLOSUM62 GapOpen: -16 GapExt: -4
DistanceType: (1 - PercentIdentity) Transformation: None
Mapping: Sammon DistanceCut: None
Initialization: Random
Fixed: None
Varied: All
DensitySat: 0.85
Links
Images
Full Sample with Selected Clusters
Configuration
I/O CoordinateWriteFrequency: 0 DistanceMatrixFile: F:\Salsa\saliya\cog\100k\input\cog_95672_nw_c#.bin ManxcatCore AddonforQcomputation: 2 CalcFixedCrossFixed: True CGResidualLimit: 1E-05 ChisqChangePerPoint: 0.001 Chisqnorm: 2 ChisqPrintConstant: 1 ConversionInformation: ConversionOption: DataPoints: 95672 Derivtest: False DiskDistanceOption: 2 DistanceCut: -1 DistanceFormula: 1 DistanceProcessingOption: 0 DistanceWeigthsCuts: Eigenvaluechange: 0.001 Eigenvectorchange: 0.001 ExtraOption1: 0 Extraprecision: 0.05 FixedPointCriterion: none FletcherRho: 0.25 FletcherSigma: 0.75 FullSecondDerivativeOption: 0 FunctionErrorCalcMultiplier: 10 HistogramBinCount 100 InitializationLoops: 1 InitializationOption: 1 InitialSteepestDescents: 0 LinkCut: 5 LocalVectorDimension: 3 Maxit: 80 MinimumDistance: -0.001 MPIIOStrategy: 0 Nbadgo: 6 Omega: 1.25 OmegaOption: 0 PowerIterationLimit: 200 ProcessingOption: 100 QgoodReductionFactor: 0.5 QHighInitialFactor: 0.01 QLimiscalecalculationInterval: 1 RotationOption: 0 Selectedfixedpoints: Selectedvariedpoints: StoredDistanceOption: 2 TimeCutmillisec: -1 TransformMethod: 0 TransformParameter: 0.125 UndefindDistanceValue: -1 VariedPointCriterion: all WeightingOption: 0 Write2Das3D: True Density Alpha: 2 Pcutf: 0.85 SelectedClusters: 63,82,145,265,362,684,708 XmaxBound: 1.2 Xres: 50 YmaxBound: 1.2 Yres: 50