C Dataset Properties

Figure C.1: Distribution of alignment diversity (\(=\sqrt{\frac{N}{L}}\)) in the dataset and its ten subsets.

Figure C.2: Distribution of gap percentage of alignments in the dataset and its ten subsets.

Figure C.3: Distribution of alignment size (number of sequences N) in the dataset and its ten subsets.

Figure C.4: Distribution of protein length L in the dataset and its ten subsets.
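
As an illustration of the four quantities plotted in Figures C.1 to C.4, the following minimal sketch computes them for a single toy alignment. It is not the code used to produce the figures. The function name `alignment_stats` and the toy sequences are hypothetical. Two assumptions are made here that the captions do not state: L is taken as the number of alignment columns, and gap percentage is taken as the share of `-` characters among all N × L cells.

```python
import math


def alignment_stats(msa):
    """Compute N, L, diversity sqrt(N/L), and gap percentage for one alignment.

    `msa` is a list of equal-length aligned sequences, with '-' marking gaps.
    Assumption: L is the number of alignment columns.
    Assumption: gap percentage = 100 * (gap cells) / (N * L).
    """
    if not msa:
        raise ValueError("alignment is empty")
    n = len(msa)
    length = len(msa[0])
    if length == 0 or any(len(row) != length for row in msa):
        raise ValueError("all rows must have the same non-zero length")
    gaps = sum(row.count("-") for row in msa)
    return {
        "N": n,
        "L": length,
        "diversity": math.sqrt(n / length),
        "gap_percentage": 100.0 * gaps / (n * length),
    }


# Hypothetical toy alignment: 4 sequences, 6 columns, 4 gap characters.
toy = ["MKV-LA", "MKVQLA", "M-V-LA", "MKVQ-A"]
stats = alignment_stats(toy)
print(stats)
```

For this toy input, N = 4 and L = 6, so the diversity is \(\sqrt{4/6} \approx 0.82\), and 4 of the 24 cells are gaps, giving a gap percentage of about 16.7.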