I have a confession to make: I am confused about changes of variables.
Now, don’t get me wrong. I know how to change variables in a given problem. But (as is often the trigger for me realising I don’t have as good a grasp on something as I’d like) I’m trying to write up teaching notes on changes of variables, in a precise and principled way, and keep finding myself bogged down. So this post will be an effort to, as it were, think aloud through the issues that come up.
Let’s start with a nice easy example. Suppose that we’re thinking about the equation of the unit circle, in $x$-, $y$-coordinates:
$$x^2 + y^2 = 1$$
And now suppose that we want to convert to two-dimensional polar coordinates $(r, \theta)$:
$$r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan(y/x)$$
Now, if we want to know the equation of the unit circle in these new coordinates, we substitute for $x$ and $y$ in the above equation, using the inverse expressions to those above, $x = r\cos\theta$ and $y = r\sin\theta$:
$$(r\cos\theta)^2 + (r\sin\theta)^2 = 1$$
Which gets us the expected result, $r = 1$ (taking $r \geq 0$). In this case, the use of the inverse expressions maybe doesn’t seem totally necessary, since the original expression already only depended on $x$ and $y$ via $x^2 + y^2$; but it should hopefully be clear that that’s a peculiarity of this simple example.
Now, I want to claim that even this relatively simple case picks up on a few potentially confusing things. For starters, we might be a little perturbed by the fact that the thing we actually use to convert our first equation (in terms of $x$ and $y$) into our second (in terms of $r$ and $\theta$) isn’t the definition of the new coordinates in terms of the old, but the inverse expressions that define the old coordinates in terms of the new. More than that, what – when you get down to it – is an expression like $x = r\cos\theta$ supposed to mean? In particular, is this the defining equation for a real-valued function of $r$ and $\theta$, or a recipe licensing the replacement of the simple expression “$x$” by the complex expression “$r\cos\theta$”? Or, if it’s both at once, then how do these two functions intertwine?
Let’s take a bit of a step back, then. We started with an equation. This equation may be regarded as a certain kind of syntactic object, formulated in a particular mathematical language: one featuring, in particular, the variables $x$ and $y$, in addition to the usual algebraic symbols ($+$, etc.). We’ll think of this as analogous to a formula formed in a first-order or propositional language.
On the semantic side, this equation is to be interpreted over $\mathbb{R}^2$. We’re helping ourselves to the idea that this structure has been parameterised by $x$ and $y$; that is to say, an element of $\mathbb{R}^2$ is really a map from $\{\ulcorner x \urcorner, \ulcorner y \urcorner\}$ to $\mathbb{R}$. I’m using corner-quotes here to stress that it’s the symbols that are being mapped to points of $\mathbb{R}$. (This is supposed to resemble the manner in which an interpretation of a logical formula is a map from the set of propositional or predicate symbols to truth-values or sets of tuples.) Such a point $p$ satisfies the equation if $p(\ulcorner x \urcorner)^2 + p(\ulcorner y \urcorner)^2 = 1$. A solution to this equation is a set of points in $\mathbb{R}^2$, such that each point in the set satisfies this equation.
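
To make the semantic picture concrete, here is a minimal Python sketch, under my own assumptions about representation: a point is modelled as a dictionary from the variable symbols (the strings `"x"` and `"y"`, standing in for the corner-quoted symbols) to real numbers, and `satisfies_circle` is a hypothetical helper name, not anything from a library.

```python
import math

def satisfies_circle(p):
    """Return True if the point p satisfies x^2 + y^2 = 1.

    p is a dict from the symbols "x" and "y" to floats, i.e. an
    interpretation of the variables. Equality is checked up to
    floating-point tolerance (an assumption of this sketch).
    """
    return math.isclose(p["x"] ** 2 + p["y"] ** 2, 1.0)

# Two interpretations of the symbols: one on the circle, one off it.
p_on = {"x": 0.6, "y": 0.8}
p_off = {"x": 1.0, "y": 1.0}

print(satisfies_circle(p_on))   # True
print(satisfies_circle(p_off))  # False
```

The point of writing `p["x"]` rather than a bare `x` is just the one made above: the equation talks about symbols, and it is the interpretation that turns those symbols into numbers.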
All this might seem like a massive palaver to go through, just in order to state the elementary fact that a solution to the equation is a set of points for which the equation is true. And of course, that’s certainly true; but the hope is that by being as careful as possible, and in particular by distinguishing starkly between syntactic and semantic considerations, we can figure out what’s going on when we start working with new variables.
(Incidentally, as a side-note to the above: if we do want to think of the equation as analogous to a first-order formula, there’s an interesting question about what the proper analogue of a first-order model of that formula ought to be. Is it an individual point in $\mathbb{R}^2$ satisfying the equation, or a solution to the equation?)
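On the other side of the divide, we have an equation in a different language – namely, one having the same algebraic and operation symbols, etc., but having instead the variables $r$ and $\theta$. This equation just reads $r = 1$. (Let’s forget, for the moment, that we obtained this equation from the previous one through some substitution work.) On the semantic side, this language is also interpreted over points of $\mathbb{R}^2$, as parameterised by $r$ and $\theta$: that is, it is interpreted by maps from $\{\ulcorner r \urcorner, \ulcorner \theta \urcorner\}$ to $\mathbb{R}$. We also include the specifications (i) that we’ll only consider maps $q$ such that $q(\ulcorner r \urcorner) \geq 0$, and (ii) that $\theta$ is periodic, so that if $q(\ulcorner r \urcorner) = q'(\ulcorner r \urcorner)$ and $q(\ulcorner \theta \urcorner) = q'(\ulcorner \theta \urcorner) + 2\pi$, then $q = q'$. (Which means that we’re not really interpreting it over $\mathbb{R}^2$ after all, but rather over the half-cylinder obtained by gluing the half-plane $r \geq 0$ to itself along $\theta \mapsto \theta + 2\pi$.) So analogously to what we had before, a point $q$ satisfies the equation if $q(\ulcorner r \urcorner) = 1$.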
Now, there is meant to be some sense in which these two equations “have the same content”. What sense is that? Well, here are two relevant observations. First, at the syntactic level: uniformly substituting $r\cos\theta$ for $x$ and $r\sin\theta$ for $y$ in the former equation yields (up to provable equivalence) the latter, and uniformly substituting $\sqrt{x^2 + y^2}$ for $r$ (and $\arctan(y/x)$ for $\theta$) in the latter equation yields the former. Moreover, applying these two substitutions to $x$ in turn yields an expression (namely, $\sqrt{x^2 + y^2}\cos(\arctan(y/x))$) that is provably equivalent to $x$, at least for $x > 0$; the same goes for all the other variables. In other words, we’ve got a pair of mutually inverse “translations” between the two equations.
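
Here is a rough Python sketch of the syntactic side, treating expressions as strings and substitution as a rewrite of the parse tree. Everything in it is my own illustrative scaffolding: `substitute`, `evaluate`, `to_polar` and `to_cartesian` are hypothetical names, the variable `theta` stands for $\theta$, and I have swapped in `atan2(y, x)` for $\arctan(y/x)$ so that the round trip also works when $x$ is negative. The "provable equivalence" is only spot-checked numerically at a sample point, not proved.

```python
import ast
import math

def substitute(expr, mapping):
    """Uniformly replace each variable in `expr` (a string) by the
    expression string that `mapping` assigns to it."""
    class Substituter(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id in mapping:
                # Splice in the parsed replacement; it is not revisited,
                # so the substitution is applied exactly once.
                return ast.parse(mapping[node.id], mode="eval").body
            return node
    tree = Substituter().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(tree.body)  # requires Python 3.9+

def evaluate(expr, **values):
    """Evaluate an expression string at the given variable values."""
    env = {"sqrt": math.sqrt, "cos": math.cos,
           "sin": math.sin, "atan2": math.atan2}
    env.update(values)
    return eval(expr, {"__builtins__": {}}, env)

# The two translations (atan2 is an assumption replacing arctan(y/x)).
to_polar = {"x": "r*cos(theta)", "y": "r*sin(theta)"}
to_cartesian = {"r": "sqrt(x**2 + y**2)", "theta": "atan2(y, x)"}

# Translate the left-hand side of the circle equation into the polar language.
polar_lhs = substitute("x**2 + y**2", to_polar)
print(polar_lhs)

# Translate x there and back again, then compare with x at a sample point.
round_trip = substitute(substitute("x", to_polar), to_cartesian)
print(round_trip)
print(math.isclose(evaluate(round_trip, x=-0.3, y=0.7), -0.3))
```

Note that the substitution itself never looks at any numbers: it is a purely symbolic operation on expressions, and only the final check steps over to the semantic side.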
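At the semantic level, suppose I fix the following map $F$ from $\mathbb{R}^2$, parameterised by $x$ and $y$, to the half-cylinder constructed from the $(r, \theta)$ half-plane and parameterised by $r$ and $\theta$: given any $p \in \mathbb{R}^2$, $F(p) = q$ according to
$$q(\ulcorner r \urcorner) = \sqrt{p(\ulcorner x \urcorner)^2 + p(\ulcorner y \urcorner)^2}, \qquad q(\ulcorner \theta \urcorner) = \arctan\bigl(p(\ulcorner y \urcorner) / p(\ulcorner x \urcorner)\bigr)$$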
This map then has the following feature: given any point $p \in \mathbb{R}^2$, if $p$ satisfies the first equation then $F(p)$ satisfies the second equation. Similarly, we can construct a map $G$ from the half-cylinder to $\mathbb{R}^2$, which will have the feature that for any point $q$ of the former, if $q$ satisfies the second equation then $G(q)$ satisfies the first equation. And moreover, a little work will show that $F$ and $G$ are inverse to one another (well, other than places like the origin).
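
The maps $F$ and $G$ can be sketched in Python along the following lines. This is an illustration under my own assumptions rather than a careful construction: points are dictionaries keyed by variable names, `F` uses `atan2` wrapped into $[0, 2\pi)$ in place of a bare arctangent so that the angle is right in every quadrant, and the helper names `satisfies_first` and `satisfies_second` are made up for the example.

```python
import math

def F(p):
    """Map a point of the (x, y)-structure to the (r, theta)-structure."""
    r = math.hypot(p["x"], p["y"])
    # atan2 picks the correct quadrant; the modulus wraps the angle
    # into [0, 2*pi), one representative per point of the half-cylinder.
    theta = math.atan2(p["y"], p["x"]) % (2 * math.pi)
    return {"r": r, "theta": theta}

def G(q):
    """Map a point of the (r, theta)-structure to the (x, y)-structure."""
    return {"x": q["r"] * math.cos(q["theta"]),
            "y": q["r"] * math.sin(q["theta"])}

def satisfies_first(p):
    """x^2 + y^2 = 1, up to floating-point tolerance."""
    return math.isclose(p["x"] ** 2 + p["y"] ** 2, 1.0)

def satisfies_second(q):
    """r = 1, up to floating-point tolerance."""
    return math.isclose(q["r"], 1.0)

p = {"x": -0.6, "y": 0.8}
q = F(p)
print(satisfies_first(p), satisfies_second(q))  # True True

# Away from the origin, G undoes F (up to rounding).
back = G(q)
print(math.isclose(back["x"], p["x"]), math.isclose(back["y"], p["y"]))
```

At the origin `F` still returns something (with `theta` equal to zero), but every value of `theta` is mapped back to the origin by `G`, which is one way of seeing why the two maps fail to be inverse there.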
The reason this is interesting is that it mirrors what happens with translations in formal languages: there too, one has both syntactic translations between expressions and semantic maps between interpretational structures. Moreover, it draws out one of the things that makes changes of variables so confusing (at least to me!): I started out by saying that an expression like $x = r\cos\theta$ seems to be both a specification of a function, and a substitution manual. We’ve now got a more precise grip on what is meant by this – these two identities correspond to the semantic and syntactic maps just described. But note that (as, again, happens with formal languages) the two maps associated to an expression like $x = r\cos\theta$ “go in opposite directions”, i.e. are dual concepts. The syntactic side of the coin is a map from expressions in the $(x, y)$-language to expressions in the $(r, \theta)$-language; the semantic side of the coin, however, is a map from points of the $(r, \theta)$-structure (the half-cylinder) to points of the $(x, y)$-structure. (That is, the semantic map is half of the instructions needed to construct $G$ above; if I’d thought this through more carefully I’d have explicitly constructed $G$ rather than $F$, but the point stands – in the construction of $F$ we used the expressions $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$.)
In any event, all of this was supposed to just be a warmup to the really tricky case – doing all this for differential equations, not just algebraic equations. Still, writing this has helped me unpick some of the tangle (which, admittedly, might well be a tangle I have entirely created for myself); extending this to differential equations will have to wait for another day.