Showing posts with label molecular modeling. Show all posts
Showing posts with label molecular modeling. Show all posts

Tuesday, July 26, 2011

Molecular dynamics and force fields

In the web's early days, I wrote a Java applet to do a little bit of molecular modeling in your web browser. I had picked up a copy of Eric Drexler's book Nanosystems, read the section on MM2, and understood it well enough to implement it in code. That was a lot of fun to work on, and as I've watched web technology progress, I've occasionally thought about taking another stab at it. I am now in the process of doing that, as some parts have gotten easier, others have gotten faster, and some have gotten just plain interesting. JavaScript, despite a few warts, is charming, and widespread deployment of HTML5 is a big help too.

Molecular modeling approaches like MM2 generally work by computing potential energy as a function of the relative positions of atoms within a molecule. This is done by decomposing the potential energy into a series of terms, relating to the lengths of chemical bonds, the angles between bonds that share a single atom, or the dihedral angle between two bonds linked by a third bond. These terms are parameterized based on the elements and hybridizations of the atoms involved. Force contributions of these terms can be computed independently and summed together to find the forces acting on each of the atoms. This is necessarily a simplification, and more accurate approaches exist involving solutions to the Schrodinger wave equation describing probabilistic locations of electrons and nuclei. But the simple mechanical approach is adequate for getting a sense of the molecule's general shape and how it perturbs over familiar temperature ranges.

For a potential energy function E(p) relating to some geometric parameter p, we can use the chain rule to get the force on an atom at position (x,y,z):
(fx, fy, fz) = -E'(p) (∂p/∂x, ∂p/∂y, ∂p/∂z)
where ∂p/∂x is the notation for the partial derivative of p in the x direction. The forms of the potential energy functions are pretty straightforward (1, 2), and taking their derivatives is not difficult. The remaining trick is to determine the partial derivatives of parameters with respect to x, y, and z. Let's consider a simple case where the parameter is the distance between two atoms at positions (ux, uy, uz) and (vx, vy, vz). Then the distance is r where
r2 = (ux - vx)2 + (uy - vy)2 + (uz - vz)2
and taking partial differentials with respect to ux yields
2r ∂r = 2 (ux - vx) ∂ux

∂r/∂ux = (ux - vx) / r
The same operations with uy and uz tell us that the force vector acting on the first atom is f = -E'(r) r / r where the direction of r is from the v atom to the u atom, and the inverse for the second atom. Here, boldface denotes a vector quantity.

It's usually easy to apply some geometric intuition and determine a unit vector in the direction of greatest change for a parameter p (the gradient of p with respect to a particular atom's position). For an angle θ involving three atoms at positions u, v and w, with v being the vertex, the gradient for u lies in the plane and is perpendicular to (u - v), pointing away from the third atom. Then the vector (∂θ/∂x, ∂θ/∂y, ∂θ/∂z) is in the same direction as the gradient unit vector, and it's necessary only to determine a scaling factor. That can be obtained by doing a little trigonometry to determine that a teeny move of distance δ in that direction will produce a parameter change , and then the magnitude of the force is -E'(θ) dθ/δ.

Tuesday, August 25, 2009

I blinked, and web apps advanced ten years

Back in the day, we all watched static HTML pages give way to CGI, followed by MySQL and Perl, followed by Java middleware. Back then I wrote NanoCAD, but client-side Java applets never got traction. Now we have Flash and Silverlight and AJAX. On the server side we have PHP, and on the client side, Javascript, which is totally unrelated to Java. Many modern browsers now support the canvas tag. Kinda makes me want to recast NanoCAD in more modern technology.

Simultaneously we now have netbooks - small cheap wifi-connected laptops that can run a decent web browser and not much more. Google and others are pitching the idea that all our previously shrink-wrapped apps can therefore live in the cloud. It's an intriguing idea.
When you're working in Java or Python, you can use an IDE like Idea or Idle. Is there anything like that for Javascript? I am not aware of it, if there is.

Friday, May 29, 2009

Molecular modeling with Hadoop?

Hadoop is Apache's implementation of the MapReduce distributed computing scheme innovated by Google. Amazon rents out Hadoop services on their cluster. It's fairly straightforward to set up Hadoop on a cluster of Linux boxes. Having myself a long-standing interest in distributed computing approaches to molecular modeling, I have been trying to figure out how Hadoop could be applied to do very large-scale molecular simulations.

MapReduce is great for problems where large chunks of computation can be done in isolation. The difficulty with molecular modeling is that every atom is pushing or pulling on every other atom on every single time step. The problem doesn't nicely partition into large isolated chunks. One could run a MapReduce cycle on each time step, but that would be horribly inefficient - on each time step, every map job needs as input the position and velocity of every atom in the entire simulation.

There are existing solutions like NAMD, which uses DPMTA for the long-range forces between atoms. For a cluster of limited size these are the appropriate tools. For large clusters with hundreds or thousands of machines, the rate of hardware failures becomes a consideration that can't be ignored.

MapReduce provides a few principles for working in the very-large-cluster domain:
  • Let your infrastructure handle hardware failures, just like the way the Internet invisibly routes around dead servers.
  • Individual machines are anonymous. You never write application code that directly addressses an individual machine.
  • Don't waste too much money trying to make the hardware more reliable. It won't pay off in the end.
  • Use a distributed file system that reliably retains the inputs to a task until that task has been successfully completed.

Could the tasks that NAMD assigns to each machine be anonymized with respect to which machine they run on, and the communications routed through a distributed filesystem like Hadoop's HDFS? Certainly it's possible in principle. Whether I'll be able to make any reasonable progress on it in my abundant spare time is another matter.