SB302: "Biologists at the computer" by Leon Peshkin

2:00-3:30 pm, Wednesdays and Fridays, Countway 204

This page was used in spring 2006 and is now obsolete. If you are taking this course this year, please go here; in any case, feel free to look around.

Syllabus

Wed, Feb 1: Introduction, computer anatomy, setup (handout: Unix cheat sheet).
In which students find out the course goals and agenda, dissect my old ThinkPad to see how it works, and learn two key concepts linking computer science and biology: "modularity" and "abstraction". We learn that the difference among image, sound, gene expression, and, notably, code files is in the eye of the observer: on disk they are all just bytes, and only their interpretation differs. That brings us to the concept of object-oriented thinking. To conclude, we set up software to move files between filesystems and to log in remotely to the cluster.
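To make the "eye of the observer" point concrete, here is a minimal Python sketch (Python is used here only for illustration; the byte string is an arbitrary example) that reads the same eight bytes as text, as a list of byte values, as two 32-bit integers, and as one 64-bit floating-point number.

```python
import struct

# Eight arbitrary bytes; nothing in the data itself says what they "are".
raw = b"GATTACA!"

as_text = raw.decode("ascii")           # read as ASCII characters
as_bytes = list(raw)                    # read as eight unsigned byte values (0-255)
as_ints = struct.unpack("<2i", raw)     # read as two little-endian 32-bit signed integers
as_float = struct.unpack("<d", raw)[0]  # read as one little-endian 64-bit float

if __name__ == "__main__":
    print(as_text)
    print(as_bytes)
    print(as_ints)
    print(as_float)
```

The bytes never change between the four readings; only the interpretation does.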
Fri, Feb 3: Unix, the shell, secret messages in proteomes (handout: Emacs cheat sheet).
In which we get familiar with the many UNIX controls and commands, and learn about the filesystem hierarchy and the layered software structure of the operating system. The power of the shell becomes apparent as we slice and dice complete proteomes, collecting curious motif statistics. Finally, we embark on the Mad Scientist project: searching for a secret message hidden among these texts written in the 20-letter amino-acid alphabet.
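The lecture collects motif statistics with shell pipelines; as a rough equivalent, here is a minimal Python sketch that parses FASTA text and tallies overlapping k-residue words (dipeptides by default). The two toy sequences and the function names `read_fasta` and `kmer_counts` are hypothetical; a real run would read a whole proteome FASTA file instead.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def kmer_counts(sequences, k=2):
    """Count every overlapping k-residue word across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


DEMO = """>protA hypothetical
MKWVTFISLL
>protB hypothetical
MKKLLPTAA
"""

if __name__ == "__main__":
    proteome = read_fasta(DEMO)
    for word, n in kmer_counts(proteome.values()).most_common(5):
        print(word, n)
```

On the two toy sequences, the dipeptides MK and LL each occur twice and every other dipeptide once.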
Wed, Feb 8: Computing on a cluster, AWK scripts (handouts: AWK cheat sheet, RegEx cheat sheet).
In which we learn about "orchestra", a supercomputing cluster conducted by the orchestra dispatch machine, and about the business of computing in a multi-user, multi-process environment, with its politics, priorities, queues, and resource limits. We are introduced to regular expressions, an elegant way of describing complex sets of strings and phrases, and to AWK, a laconic language for string processing.
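As an illustration of regular expressions applied to protein sequences, here is a minimal Python sketch that uses the standard `re` module instead of AWK. It translates the PROSITE N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, into a regex and reports every match; the test sequences and the name `find_motif` are made up for the example.

```python
import re

# PROSITE PS00001, N-{P}-[ST]-{P}: {P} means "any residue except P".
# The zero-width lookahead (?=...) lets overlapping occurrences all be reported.
NGLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(seq, pattern=NGLYC):
    """Return (1-based start, matched text) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_motif("MNKSAPNPTLNGSNTS"))
    print(find_motif("NNSTS"))
```

Here `find_motif("MNKSAPNPTLNGSNTS")` gives `[(2, 'NKSA'), (11, 'NGSN')]`, and `find_motif("NNSTS")` gives `[(1, 'NNST'), (2, 'NSTS')]`; without the lookahead, the second, overlapping match would be missed.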
Fri, Feb 10: Using the BioPerl toolbox to automate tasks (handout: code).
In which we witness the power of code sharing in a community of rational, self-interested biologists. We pick through a few examples of BioPerl scripting and dive right into tailoring them to our purposes: automated batch BLAST searches and parsing of the results.
Wed, Feb 15: Post-Valentine protein analysis (handout: RasMol cheat sheet).
In which we get a virtual-reality tour of protein-DNA complexes, learn the art of selectively displaying and painting various functional groups and residues, interrogate proximity and bond information, and combine the structural information with BLAST similarity and CLUSTAL alignment queries.
Fri, Feb 17: Hacking phylogeny with Python (handout).
In which we think about phylogenetic trees in relation to phyletic patterns: the presence or absence of orthologous genes across multiple genomes. We obtain groups of orthologous proteins using Python scripts and infer a plausible evolutionary tree, while learning to reuse the wealth of Biopython libraries.
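The lecture's own scripts are not reproduced here. As one simple way to turn phyletic patterns into a tree, here is a hedged Python sketch that scores each pair of genomes by the Jaccard distance between their sets of ortholog groups, then joins them with UPGMA (average linkage). This distance and tree method are assumptions chosen for brevity, the genome names and ortholog-group IDs are hypothetical, and UPGMA assumes roughly uniform rates of gene gain and loss, so treat the output as a first rough tree.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |shared groups| / |all groups|; 0 means identical gene content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(profiles):
    """Average-linkage (UPGMA) tree over genomes, returned as nested tuples."""
    names = list(profiles)
    leaf_d = {(x, y): jaccard_distance(profiles[x], profiles[y])
              for x in names for y in names}

    def cluster_distance(c1, c2):
        # Average of all genome-to-genome distances between two clusters.
        total = sum(leaf_d[(x, y)] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    # Each cluster is (subtree, list of genome names it contains).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical phyletic patterns: which ortholog groups each genome contains.
EXAMPLE_PROFILES = {
    "genomeA": {"og1", "og2", "og3", "og4"},
    "genomeB": {"og1", "og2", "og3", "og5"},
    "genomeC": {"og1", "og6", "og7", "og8"},
    "genomeD": {"og1", "og6", "og7", "og9"},
}

if __name__ == "__main__":
    print(upgma(EXAMPLE_PROFILES))
```

With these made-up profiles, genomes A and B share most groups, as do C and D, so the sketch pairs them first and then joins the two pairs.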
Wed, Feb 22: Biological databases and the World Wide Web (handouts: crontab tutorial, curl manual).
In which we learn more about the almighty Web, revisit client-server design and the protocols involved, learn about the command-line HTTP/FTP clients wget and curl, and create an automated updater for exotic genomes and expression data using cron, the UNIX scheduler for periodic task execution.
Fri, Feb 24: Algorithms and data structures, objects (handout: Matlab).
In which we discuss computational complexity and recursion, using two implementations of the Fibonacci sequence as examples. We learn step-by-step breakpoint debugging and memory inspection to find the source of the exponential growth in running time, and plot the result. Finally, we investigate the internal structure of the plot itself, with emphasis on the parent-child relationship between the plot window and the objects in it, and try altering some attributes both manually and graphically.
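The lecture works in Matlab; the same contrast can be sketched in a few lines of Python. The two functions below are illustrative stand-ins, not the course's code: a textbook recursion that also counts its own calls, and a loop that keeps only the last two values.

```python
def fib_recursive(n):
    """Textbook recursion. Returns (F(n), number of calls it took)."""
    if n < 2:
        return n, 1
    a, calls_a = fib_recursive(n - 1)
    b, calls_b = fib_recursive(n - 2)
    return a + b, calls_a + calls_b + 1


def fib_iterative(n):
    """Keeps only the last two values: n additions, constant memory."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for n in (10, 20, 25):
        value, calls = fib_recursive(n)
        assert value == fib_iterative(n)
        print(f"F({n}) = {value}: {calls} recursive calls vs {n} loop steps")
```

The recursion recomputes the same subproblems over and over: computing F(n) takes 2F(n+1) - 1 calls, so F(20) = 6765 already needs 21,891 calls, while the loop runs 20 times.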
Wed, Mar 1: Debugger, clustering analysis in Matlab.
In which we rigorously define what it means for one group of objects to be similar to another, by reducing group similarity in various ways to the similarity between single objects (for example, via the closest pair, the farthest pair, or the average over all pairs). We interactively explore the outcome of hierarchical and k-means clustering on various hand-drawn problem cases. Finally, we discuss the algorithm and implementation of contour interpolation for tracking a moving cell.
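For readers who want to see what k-means actually does between clicks, here is a minimal plain-Python sketch of Lloyd's algorithm, offered as an illustration rather than the Matlab routine used in class. It alternates an assignment step and a centroid-update step until the centers stop moving; the random initialization from data points and the `seed` parameter are assumptions of this sketch.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on tuples of coordinates. Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p) for p in points]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, 2))
```

Because the result depends on the starting centers, trying several seeds (or hand-drawn cases like those in the session) is a quick way to see where k-means disagrees with hierarchical clustering.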
Fri, Mar 3: Bioinformatics Toolbox, shell exchange with Matlab (handout: Cancerdata.xls).
In which we see how to run MATLAB in batch mode on a cluster and invoke a MATLAB script from Perl and vice versa, then obtain expression profile data and examine clustering and biclustering dendrograms. We revisit the notion of object-oriented design by retrieving a richly structured PDB record. Finally, we learn to profile a MATLAB script to eliminate performance bottlenecks.
Wed, Mar 8: Image and signal processing in Matlab (handouts: centrosome.m, Image).
In which we appreciate the challenge of designing algorithms for non-bipartite matching, and the awesome might of combinatorial explosion, while trying to track markers and particles from one movie frame to the next. We learn about locating and identifying features in an image and explore the concept of image similarity. Finally, we decide against trying to start a new Google that searches the Web for images by description.
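Even the simplest version of the tracking problem, linking each particle in one frame to exactly one particle in the next, shows the combinatorial explosion. The Python sketch below is illustrative only (function names and coordinates are hypothetical, and it assumes both frames contain the same number of particles). It compares a brute-force search over all n! pairings with a greedy "closest pair first" heuristic.

```python
import math
from itertools import permutations


def link_brute_force(frame_a, frame_b):
    """Try every one-to-one pairing; return (links, total displacement) of the best."""
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(frame_b))):
        cost = sum(math.dist(frame_a[i], frame_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


def link_greedy(frame_a, frame_b):
    """Repeatedly link the closest remaining pair: fast, but not always optimal."""
    pairs = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(frame_a)
        for j, q in enumerate(frame_b)
    )
    used_a, used_b, links = set(), set(), [None] * len(frame_a)
    for _, i, j in pairs:
        if i not in used_a and j not in used_b:
            links[i] = j
            used_a.add(i)
            used_b.add(j)
    return links


# Two particles moving right; greedy grabs the single closest pair first.
FRAME_1 = [(0, 0), (3, 0)]
FRAME_2 = [(2, 0), (5, 0)]

if __name__ == "__main__":
    print(link_brute_force(FRAME_1, FRAME_2))
    print(link_greedy(FRAME_1, FRAME_2))
```

On these two frames, brute force links particle 0 to 0 and 1 to 1 (total displacement 4.0), while greedy links 1 to 0 first and is then forced into a total of 6.0. Brute force is hopeless beyond a handful of particles (10 particles already give 3,628,800 pairings); polynomial-time assignment solvers, such as SciPy's `scipy.optimize.linear_sum_assignment`, handle this two-frame case exactly.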
Fri, Mar 10: Cell segmentation and tracking.
Wed, Mar 15: Final projects.

Suggested reading

Useful links