I am starting a new side project. My wife is a biology graduate student at UNO. She is studying the genetics of plants. The software they use is either (1) WAY too expensive or (2) really lousy. What she needs is a way to look at nucleotide sequences visually. These sequences are strings of the characters A, T, G, C and U. For her purposes, G and C are treated as one group of nucleotides and A, T and U as the other.
She needs to be able to view the sequence in a way that will allow her to see the density of A/T/U in a region. Apparently, A/T/U-rich regions in plants are usually introns (the parts of a gene that get spliced out) and regions with more G/C are usually exons (the parts that are kept and expressed).
I am writing a C# Windows application that will allow her to take a set of sequences and look at them side-by-side. It will assign a color to each nucleotide based on the density of A/T/U in the region around it.
The first interesting part is how to assign colors. I am using an algorithm that assigns descending weights to the neighboring nucleotides as they get farther from the target nucleotide. I then scale the weighted total to 0-255 since I am displaying it in 8-bit color.
For example,
AATATCGGCT[A]TAGCATTCGATCAG
Target = the bracketed A
Weight = 1.0
The nucleotides immediately to the left and right of the target are given a weight of 0.9, the next ones out get 0.8, and so on, dropping by 0.1 per position until the weight reaches zero ten positions away.
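A rough sketch of that weighting in C#, assuming a plain linear falloff that hits zero ten positions from the target. The names `Weights`, `Radius` and `WeightAt` are made up for illustration and are not from the actual application.

```csharp
using System;

static class Weights
{
    // Assumption: the weight reaches zero ten positions from the target,
    // so only offsets -9..+9 contribute anything.
    public const int Radius = 10;

    // Linear falloff: 1.0 at the target, 0.9 one step away, 0.8 two steps away,
    // and 0.0 at Radius positions away or farther.
    public static double WeightAt(int offset)
    {
        int distance = Math.Abs(offset);
        return distance >= Radius ? 0.0 : 1.0 - (double)distance / Radius;
    }
}
```

Because the function only looks at the absolute offset, the left and right neighbors always get the same weight, and widening or narrowing the window is a matter of changing `Radius`.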
I am assigning a value of 1 to each A/T/U and 0 to each G/C, and each nucleotide contributes its value times its weight. So in this example, the total density for the target is:
Target-9 A = 0.1
Target-8 T = 0.2
Target-7 A = 0.3
Target-6 T = 0.4
Target-5 C = 0
Target-4 G = 0
Target-3 G = 0
Target-2 C = 0
Target-1 T = 0.9
Target A = 1.0
Target+1 T = 0.9
Target+2 A = 0.8
Target+3 G = 0
Target+4 C = 0
Target+5 A = 0.5
Target+6 T = 0.4
Target+7 T = 0.3
Target+8 C = 0
Target+9 G = 0
---------------------
TOTAL 5.8 out of a possible 10.0
Scaling this to 0-255 gives us:
Density = 255 * (5.8 / 10.0) = 147.9 -> 148
Therefore, if I am rendering in grayscale, this nucleotide's RGB color is (148, 148, 148).
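The whole calculation for one target can be sketched in C# roughly like this. It assumes the same linear weights as above, rounds to the nearest gray level, and treats positions that fall off either end of the sequence as contributing nothing (the edge handling is my assumption, since it is not decided yet). `AtuDensity` and its methods are hypothetical names for illustration only.

```csharp
using System;

static class AtuDensity
{
    // Assumption: weights reach zero ten positions from the target.
    const int Radius = 10;

    // 1 for A/T/U, 0 for anything else (G, C, or an unexpected character).
    static int Score(char nucleotide)
    {
        char c = char.ToUpperInvariant(nucleotide);
        return (c == 'A' || c == 'T' || c == 'U') ? 1 : 0;
    }

    static double WeightAt(int offset)
    {
        return 1.0 - (double)Math.Abs(offset) / Radius;
    }

    // Weighted A/T/U total around the target index (0-based).
    // Positions outside the sequence are skipped (assumed edge behavior).
    public static double WeightedTotal(string sequence, int target)
    {
        double total = 0.0;
        for (int offset = -(Radius - 1); offset < Radius; offset++)
        {
            int index = target + offset;
            if (index < 0 || index >= sequence.Length) continue;
            total += WeightAt(offset) * Score(sequence[index]);
        }
        return total;
    }

    // Largest possible total: 1.0 + 2 * (0.9 + 0.8 + ... + 0.1) = 10.0 for Radius = 10.
    public static double MaxTotal()
    {
        double max = 0.0;
        for (int offset = -(Radius - 1); offset < Radius; offset++)
            max += WeightAt(offset);
        return max;
    }

    // Scales a weighted total to a 0-255 gray level, rounding to the nearest integer.
    public static byte ToGray(double total)
    {
        return (byte)Math.Round(255.0 * total / MaxTotal());
    }
}
```

For the example sequence the target is at index 10 (counting from zero), so the call would be `AtuDensity.ToGray(AtuDensity.WeightedTotal("AATATCGGCTATAGCATTCGATCAG", 10))`. Dividing by the fixed maximum means nucleotides near the ends of a sequence come out darker than they otherwise would; dividing by the sum of the weights actually used is one alternative.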
Now that I have the density colors for each nucleotide, I am going to render a ribbon graph with these colors. This will allow someone to look at it and say "here is an ATU-rich region".
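One simple way to draw such a ribbon with GDI+ is a vertical stripe per nucleotide, filled with that nucleotide's gray level. This is only a sketch under assumptions: it renders to an off-screen `Bitmap`, takes the gray levels as a ready-made `byte[]`, and the names `RibbonRenderer`, `stripeWidth` and `height` are invented for illustration.

```csharp
using System;
using System.Drawing;

static class RibbonRenderer
{
    // Draws one vertical stripe per nucleotide.
    // grays[i] is the 0-255 density value already computed for position i.
    public static Bitmap Render(byte[] grays, int stripeWidth, int height)
    {
        // A Bitmap must be at least 1 pixel wide, even for an empty sequence.
        int width = Math.Max(1, grays.Length * stripeWidth);
        var bitmap = new Bitmap(width, height);

        using (Graphics g = Graphics.FromImage(bitmap))
        {
            for (int i = 0; i < grays.Length; i++)
            {
                // Same value in all three channels gives a gray.
                Color color = Color.FromArgb(grays[i], grays[i], grays[i]);
                using (var brush = new SolidBrush(color))
                {
                    g.FillRectangle(brush, i * stripeWidth, 0, stripeWidth, height);
                }
            }
        }
        return bitmap;
    }
}
```

Rendering each sequence to its own bitmap would make it easy to stack several ribbons on top of each other for the side-by-side view, with brighter stripes marking the regions with more A/T/U.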
I am going to have a lot of fun figuring out how to let them zoom and pan around this graph for analysis. I have a new book on GDI+ development that should help. If anyone has any helpful ideas, please let me know.