Sunday, January 17, 2010

How hard is generating scientific hypotheses?

In the 1500s, a Danish astronomer named Tycho Brahe used Galileo's invention of the telescope to collect an enormous amount of numerical data describing the motion of the planets. Brahe's assistant Johannes Kepler studied that data and arrived at some interesting conclusions which we now know as Kepler's laws of planetary motion:
  1. The orbit of every planet is an ellipse with the Sun at a focus.
  2. A line joining a planet and the Sun sweeps out equal areas during equal intervals of time.
  3. The square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its orbit.
Kepler's laws were the starting point from which Isaac Newton formulated his law of gravitation, the inverse-square law that we all know and love.

We have here a three-step process: collect data, find mathematical patterns in the data, and create a theory that explains those patterns. Collecting data is simple in principle, and looking for mathematical patterns is also simple. Kepler's arithmetic was done by hand, but now we have computer programs (like Eureqa) which use genetic programming to find parsimonious mathematical formulas that fit sets of data. You can find Java applets on the web that demonstrate this idea.

So the first two steps aren't too hard. We can arrive rather easily at mathematical formulas that describe various experimentally measurable aspects of reality. That's a good thing. The hard job is the next step: finding theories or "likely stories" that explain why those formulas take whatever form they do. Sometimes the form of the math suggests a mechanism, because you've learned to associate elliptical orbits with conservative force fields which necessarily have an inverse-square law. (Hundreds of years after Newton, that is now a no-brainer.) But generally the problem is non-trivial and so far, as far as I'm aware, requires human insight.

No comments: