cozilikethinking



The Lagrangian Method

What exactly is the Lagrangian method? It seems to be a popular method to solve Max/Min problems in Calculus. But generations of Calculus students may have found it troubling to understand why it works. We shall discuss this method today.

This is a method of finding local maxima and minima. Clearly, derivatives don't tell us much about the global properties of a function; they're very much a local tool. We're supposed to maximize f(x,y) subject to the condition that g(x,y)=c. Note that the graph of f(x,y) is a surface in three dimensions, while g(x,y)=c defines a curve (a contour) in the plane.

In order to crack this problem, we rely on the following intuition: at a constrained critical point, moving along the contour g(x,y)=c cannot increase or decrease f to first order. The gradient of f is the direction of fastest increase of f, and the directions in which f does not change to first order are exactly those orthogonal to \nabla f. Hence \nabla f must be orthogonal to the contour. But \nabla g is also orthogonal to the contour g(x,y)=c, since a gradient is always perpendicular to the level set through the point. Hence \nabla f and \nabla g point along the same line.

Hence, \nabla f=\lambda \nabla g.
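To make this concrete, here is a minimal sketch in Python using SymPy. The particular choice f(x,y)=xy with constraint x+y=4 is my own, not from the post; we simply solve \nabla f=\lambda\nabla g together with the constraint.

```python
# A minimal sketch, assuming SymPy is available: maximize f(x, y) = x*y
# subject to g(x, y) = x + y = 4 by solving grad f = lambda * grad g
# together with the constraint (illustrative example, not from the post).
import sympy as sp

x, y, lam = sp.symbols('x y lambda_', real=True)
f = x * y                 # function to maximize
g = x + y - 4             # constraint g(x, y) = c, written as g = 0

# Lagrange conditions: df/dx = lam * dg/dx, df/dy = lam * dg/dy, g = 0
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 0)]
print(sp.solve(eqs, [x, y, lam]))   # [(2, 2, 2)] -> critical point at (2, 2)
```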


Why manifolds

We know what complex manifolds are. They're entities which "locally" look like \Bbb{C}^n. We also know about transition functions. However, today we're going to ask an important question- a question that, if left unanswered, impedes all progress in modern math: **why** manifolds?

We can sort of understand why manifolds have the condition that each point lies inside a neighbourhood which is homeomorphic to an open set in \Bbb{C}^n. This allows us to do a lot of things on the manifold, because we know how to do those things on \Bbb{C}^n. Calculus is just the tip of the iceberg. It allows us to establish a metric (at least locally), and we also gain a lot of intuition as to how the manifold “looks” if we zoom in a lot. Which sometimes is enough.

However, why transition functions? Why can we not just "continuously" map neighbourhoods to \Bbb{C}^n, mapping the intersection of two neighbourhoods to the same points in \Bbb{C}^n? After all, isn't a manifold just a slightly perturbed, slightly wavy copy of \Bbb{C}^n? Why are we mapping two overlapping sets to hugely different open sets, mapping points in the intersection to different points in \Bbb{C}^n, and then just ensuring that \phi_2\circ\phi_1^{-1} is holomorphic? (I realize that I have not specified what \phi_1 and \phi_2 are, but the reader who's read up on complex manifolds will easily be able to infer this.) This is because we want to be able to study objects that locally look like \Bbb{C}^n, but are not all slightly perturbed, wavy versions of \Bbb{C}^n.

Consider \Bbb{P}^1. It is easy to see that it locally looks like \Bbb{C}. However, there's a major difference between \Bbb{P}^1 and \Bbb{C}: \Bbb{P}^1 is compact while \Bbb{C} is not. Hence, there is a major global property that differentiates them, and prevents even a homeomorphism between them. We cannot "continuously" map the open neighbourhoods of \Bbb{P}^1 into \Bbb{C} in a globally consistent way. To picture this, imagine "continuously" mapping open neighbourhoods of \Bbb{P}^1 to \Bbb{C}. On the sphere, you eventually loop around and move back towards where you started. However, on \Bbb{C}, you just keep going further. These two pictures are incompatible.

Hence, we have to weaken what we can ask for. We need to make smaller demands of our mathematical gods. We cannot have a continuous mapping of neighbourhoods. However, we can at least ensure that the transition functions are holomorphic.

Small wins.

Notes on the Zero Forcing Algorithm

In this post I will try and understand the gist of the paper Zero Forcing Sets and the Minimum Rank of Graphs by Brualdi.

Let F be a field. The set of symmetric matrices of order n\times n with entries from F is called S_n(F). The matrices corresponding to a graph on n vertices are defined in the following way: if there is an edge between vertices i and j, then a_{ij}=a_{ji} is non-zero; if there is no edge, then a_{ij}=a_{ji}=0; the diagonal entries are unconstrained. Clearly, multiple symmetric matrices can correspond to a single graph.

Let \mathfrak{S}(G) be the set of all symmetric matrices in S_n(F) corresponding to a graph G. Then the minimum rank of G, or mr(G), is defined to be \min(\text{rank}(A):A\in\mathfrak{S}(G)). Also, the corank of a matrix is the dimension of its null space, and the maximum corank of G is M(G)=\max(\text{corank}(A):A\in\mathfrak{S}(G)). There's a theorem which states that for a graph G, mr(G)+M(G)=|G|. All this is for the field F.
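As a small numerical illustration (my own example, not from the paper), here is one symmetric matrix whose off-diagonal zero/nonzero pattern matches the path P_3. Its rank gives an upper bound on mr(P_3); it is known that mr(P_n)=n-1 over the reals, so the bound is in fact attained here.

```python
# A minimal numerical sketch (my own example): one symmetric real matrix whose
# off-diagonal zero/nonzero pattern matches the path P_3 (edges 1-2 and 2-3).
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])   # a_12, a_23 nonzero, a_13 = 0; diagonal is free
print(np.linalg.matrix_rank(A))   # 2, so mr(P_3) <= 2 and M(P_3) >= |G| - 2 = 1
```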

Here we talk about zero forcing sets, and discuss whether this is similar to the game that Pete and I developed. The colour change rule is the following: let all vertices of a graph G be either white or black. If a black vertex has exactly one white neighbour, then that white neighbour is coloured black. A zero forcing set of G is a set of vertices such that colouring them black, and applying the colour change rule repeatedly, eventually colours the whole graph black; one usually looks for a minimum such set, which need not be unique. Is this related to the game that Pete and I developed?
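Before comparing the two games, here is a minimal Python sketch of the colour change rule (my own code, not from the paper). The 5-cycle example is also mine, chosen because two adjacent black vertices are known to force the whole cycle while a single black vertex is not.

```python
# A minimal sketch (not from the paper): repeatedly apply the colour change
# rule to an initial black set and check whether the whole graph gets forced.
def zero_forcing_closure(adj, black):
    """adj: dict vertex -> set of neighbours; black: initial set of black vertices."""
    black = set(black)
    changed = True
    while changed:
        changed = False
        for v in list(black):
            white_nbrs = adj[v] - black
            if len(white_nbrs) == 1:          # v has exactly one white neighbour...
                black.add(white_nbrs.pop())   # ...so that neighbour is forced black
                changed = True
    return black

# 5-cycle 0-1-2-3-4-0: two adjacent black vertices force the whole cycle.
C5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
print(zero_forcing_closure(C5, {0, 1}) == set(C5))   # True
print(zero_forcing_closure(C5, {0}) == set(C5))      # False: one black vertex has two white neighbours
```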

If we can build a graph which becomes all black by one algorithm but not by another, then we can know that they’re not the same.

1. Assume that a 3-vertex is a white vertex, and a 2-vertex is a black vertex.

The three-squares graph is one that turns all black under the zero forcing algorithm, but not under our game.

[Image: hand-drawn sketch of the three-squares graph]

The diagram given below is an example of a graph which is forced to have zero sections by our game, but not the zero forcing algorithm.

[Image: hand-drawn sketch of the graph for this example]

2. Assume that a 3-vertex is a black vertex, and a 2-vertex is a white vertex.

The three squares graph again is an example of a graph that is forced to have zero sections by the zero forcing algorithm, but not our game.

The graph given above is again an example of a graph which is forced to have zero sections by our game, but not the zero forcing algorithm.

Hence, the zero-forcing algorithm does not have much to do with the game that we’ve developed.

The Hurewicz Theorem

Here we talk about the Hurewicz theorem. Let X be a path-connected CW complex with \pi_n(X)=0 for n\geq 2. Then X is determined up to homotopy by \pi_1(X).

What does “determined up to homotopy” mean? It means that all the spaces satisfying the condition above, with isomorphic \pi_1, are homotopy equivalent to each other. When are two spaces homotopy equivalent? Say A and B are: this means that there exist maps f:A\to B and g:B\to A such that g\circ f\simeq id_A and f\circ g\simeq id_B. Can we think of any examples of spaces that are not homotopy equivalent? Yes: a disconnected space and a connected one, like two non-intersecting discs versus a single disc. A homotopy equivalence preserves the number of path components; two disjoint discs are homotopy equivalent to two discs, and not to three disconnected discs.

Why do we need the Hurewicz theorem? Because it is often difficult to calculate the homotopy groups of a space, but much easier to calculate the homology groups. Hence, knowing that we can map homotopy groups to homology groups (and that in good cases this map is an isomorphism), we can discover properties of the homotopy groups that we couldn't earlier.

We state and prove the Hurewicz theorem for the n=1 case. Let \phi:\pi_1(X,x_0)\to H_1(X) be defined such that a loop \gamma goes to the 1-cycle \gamma. How is \gamma a 1-cycle at all? Because it is a map from [0,1], which is a 1-simplex, to X! Now for the map to be well-defined, if \gamma'\sim\gamma, then we need \phi(\gamma')=\phi(\gamma). Why is this true? We shall find out later.

Anyway, we have a homomorphism \phi:\pi_1(X,x_0)\to H_1(X), and H_1(X) is an abelian group. Hence, we get an induced map \phi':(\pi_1)_{ab}(X,x_0)\to H_1(X), where (\pi_1)_{ab}(X,x_0) is just the group \pi_1(X,x_0) modulo its commutator subgroup. We have abelianized the fundamental group. The Hurewicz theorem says that this map \phi' (which we will also write as h' below) is an isomorphism. How do we see this? Moreover, we have not even proven that it is a homomorphism, or even a well-defined map.

We need to prove three things: that h' is well defined, a homomorphism, and an isomorphism.

1. Well-defined: Consider two homotopic paths \gamma\sim\gamma'. We need to prove that \gamma and \gamma' represent the same homology class. There exists a map H:I\times I\to X such that H(0,t)=\gamma(t) and H(1,t)=\gamma'(t). The solid square I\times I can be thought of as the union of two 2-simplices \sigma_1 and \sigma_2. We'll orient the boundary edges in a compatible fashion, and consider the restriction of H to \partial \sigma_1 and \partial \sigma_2.

On a completely different but related topic, note that the constant map from \Delta^1\to X is just the boundary of the constant map from \Delta^2\to X. Coming back to the above argument, we note that H(\partial\sigma_1-\partial\sigma_2)=\gamma'-\gamma-2f_{x_0}. Here f_{x_0} just denotes the mapping of a 1-simplex to the point x_0.

So what exactly is happening here? How do we know that \gamma'-\gamma is a boundary, i.e. lies in the image of \partial_2? Because \gamma'-\gamma=H(\partial\sigma_1)-H(\partial\sigma_2)+2f_{x_0}, and all three of H(\partial\sigma_1), H(\partial\sigma_2) and f_{x_0} are boundaries. Hence [\gamma]=[\gamma'] in H_1(X), and we've proved that \phi is well-defined.

2. Homomorphism: Let [\gamma] and [\delta] be elements of the fundamental group. Consider \gamma*\delta:I\to X. Moreover, let \Delta^2 be [v_0 v_1 v_2] and let \sigma:\Delta^2\to X. On the [v_0v_2] edge, let the restriction of \sigma be \gamma*\delta. On the edges [v_0v_1] and [v_1v_2], the restrictions of \sigma are \gamma and \delta respectively.

First we need to prove that reparametrization does not affect homotopy. Why’s that? The homotopy lies in continuously deforming one interval to another, and then making the path (say f) act on it. This homotopy is clear. Now the proof says that \gamma-\gamma*\delta+\delta is a reparametrization of \gamma*\delta. How is that?

Anyway, we have h([\gamma])+h([\delta])-h([\gamma][\delta])=\gamma+\delta-\gamma*\delta=\partial\sigma, so h([\gamma][\delta]) is homologous to h([\gamma])+h([\delta]). Hence, we get a homomorphism.

3. Surjectivity: Let us take a 1-cycle \sigma=\sum_i\sigma_i. Here it is possible that \sigma_i=\sigma_j for i\neq j. We can re-write this sum as a sum of loops, by putting together non-loops in the obvious way. Hence, every 1-cycle is homologous to a sum of loops. These loops may be disjoint too; the point is that we've covered all cycles.

Now let us deal with these loops. Let \gamma_i be a path from x_0 to the base point of \sigma_i. Then \gamma_i\sigma_i\overline{\gamma_i} is homologous to \gamma_i+\sigma_i+\overline{\gamma_i}. What does this mean? It means that their difference is a boundary. Why are they homologous? This is because h([\gamma_i][\sigma_i][\overline{\gamma_i}])=h([\gamma_i])+h([\sigma_i])+h([\overline{\gamma_i}]) in homology, as h is a homomorphism. Now \overline{\gamma_i} is homologous to -\gamma_i. Hence, \sigma_i is homologous to h([\gamma_i][\sigma_i][\overline{\gamma_i}]). By doing this, we can base all our loops at x_0. Now any sum of loops based at x_0 can be mapped to from the fundamental group, as that is exactly what the fundamental group is: the collection of (homotopy classes of) loops based at x_0 (remember that X is path-connected). Hence, we've proven that \phi is surjective.

4. Kernel: We need to prove that the kernel is the commutator subgroup. We can see why \gamma\gamma'\overline{\gamma}\overline{\gamma'} belongs to the kernel: h([\gamma][\gamma'][\overline{\gamma}][\overline{\gamma'}]) is homologous to \gamma+\gamma'-\gamma-\gamma'=0. Now we need to prove that the kernel is a subset of the commutator subgroup.

I’m not typing up the inclusion of the kernel in the commutator subgroup, but it can be found here.

A foray into Algebraic Combinatorics

I'm trying to understand this paper by Alexander Postnikov. This post is mainly a summary of some of the concepts that I do not understand, along with some examples.

  • Grassmannian- A Grassmannian G(r,V) of a vector space V is a space that parametrizes all the r-dimensional subspaces of V. For instance, G(1,\Bbb{C}^n) would be \Bbb{P}^{n-1}. Why do we need Grassmannians? Because we need a continuous, and hopefully smooth, way to parametrize the r-dimensional subspaces of V. An example would be the tangent spaces of a real m-manifold M. Let us assume for easy visualization that the manifold and its tangent spaces are embedded in some \Bbb{R}^N with N bigger than m. The map \phi which sends x\in M to the tangent space at x is then a map \phi:M\to G(m,\Bbb{R}^N). Some interesting things to note here. First, the tangent space of a manifold at any point has dimension equal to the dimension of the manifold; this much we know. What we're doing here is mapping each x\in M to the point of the Grassmannian that parametrizes the tangent space at x. In general this map need not be surjective. But because a small change in x produces only a small change in the tangent space at that point, we have a feel for why this map should be continuous.
  • Plucker coordinates- This is a way to assign six homogeneous coordinates to each line in \Bbb{P}^3. How does one go about doing this, and why is it useful? A brilliant explanation is given on the Wikipedia page for Plucker coordinates. Say we take a line in \Bbb{R}^3. It is uniquely determined by 2 points on it (say x and y). However, is it uniquely determined by the vector between those two points? No. This vector can be translated and placed anywhere. Hence, we need both the vector between the two points and some sort of indication of where this vector lies with respect to the origin. One such indication is the cross product x\times y. Its direction is normal to the plane containing the origin, x and y, and its magnitude, divided by the length of x-y, gives the distance of the line from the origin. Hence, we need six coordinates- three for the direction vector x-y, and three for the moment x\times y. Will these six coordinates uniquely describe the line? Yes. The direction of x\times y specifies the plane through the origin in which the line lies, its magnitude pins down how far from the origin the line sits in that plane, and the vector x-y then specifies the direction of the line within that plane.

Now we shall talk a little about the formal definition of Plucker coordinates. In \Bbb{P}^3, let (x_0,x_1,x_2,x_3) and (y_0,y_1,y_2,y_3) be the homogeneous coordinates of two points on the line. Let p_{ij}=\begin{vmatrix} x_i&y_i\\ x_j&y_j\end{vmatrix}.

There are {4\choose 2}=6 ways of selecting two elements from \{0,1,2,3\}. Why do we need i\neq j? Because if i=j, then p_{ij}=0, as the second row would just be the same as the first row. Also, p_{ij}=-p_{ji}, because we'd be exchanging the two rows. Hence, there are only {4\choose 2} independent coordinates here. This confirms the assertion that we need just 6 homogeneous coordinates to specify a line in \Bbb{P}^3.
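As a quick check of these formulas, here is a minimal SymPy sketch (the two sample points are my own choice). It computes the six p_{ij} as 2\times 2 minors and verifies the classical Plucker relation p_{01}p_{23}-p_{02}p_{13}+p_{03}p_{12}=0.

```python
# A minimal sketch, assuming SymPy: the six Plucker coordinates p_ij of the line
# through two points of P^3, computed as the 2x2 minors described above.
import sympy as sp
from itertools import combinations

x = sp.Matrix([1, 0, 2, 3])   # homogeneous coordinates (x_0, x_1, x_2, x_3)
y = sp.Matrix([0, 1, 1, 1])   # homogeneous coordinates (y_0, y_1, y_2, y_3)

p = {(i, j): sp.Matrix([[x[i], y[i]], [x[j], y[j]]]).det()
     for i, j in combinations(range(4), 2)}
print(p)   # {(0,1): 1, (0,2): 1, (0,3): 1, (1,2): -2, (1,3): -3, (2,3): -1}

# The classical Plucker relation p_01*p_23 - p_02*p_13 + p_03*p_12 = 0 holds:
print(p[(0, 1)] * p[(2, 3)] - p[(0, 2)] * p[(1, 3)] + p[(0, 3)] * p[(1, 2)])   # 0
```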

  • Matroid- A matroid is a structure that generalizes linear independence in vector spaces. More formally, it is a pair (E,I), where E is a finite set, and I is a set of subsets of E which we declare to be "independent". The first property is that \emptyset is an independent set. Secondly, if A is independent, and A'\subset A, then A' is independent too. This is called the hereditary property. Third, if A and B are independent sets, and A contains more elements than B, then there is an element of A\setminus B that can be added to B to give a larger independent set. This is called the exchange property.

The first two properties carry over smoothly from our intuition of what linearly independent sets are. The third property seems strange, but on a little thinking becomes clear. Think of the two independent sets \{i\} and \{i+j,i-j\} in \Bbb{R}^2. We can add either of i+j or i-j to \{i\} to create a larger linearly independent set. However, what if the smaller set were contained within the bigger set; i.e. what if the two sets were \{i\} and \{i,j\}? We could still add j to \{i\} to create a bigger linearly independent set. On a little experimentation, you will be able to convince yourself that this is a natural property of linearly independent sets in vector spaces.
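Here is a minimal Python sketch of exactly this example (my own code): the vector matroid in which a set of vectors is independent when its rank equals its size, with the exchange property checked for \{i\} and \{i+j,i-j\}.

```python
# A minimal sketch (my own): the vector matroid of a collection of vectors,
# illustrating the exchange property with the sets {i} and {i+j, i-j} above.
import numpy as np

def independent(vectors):
    """A set of vectors is independent iff its rank equals its size."""
    if not vectors:
        return True
    return np.linalg.matrix_rank(np.column_stack(vectors)) == len(vectors)

i, j = np.array([1, 0]), np.array([0, 1])
A = [i + j, i - j]     # the larger independent set
B = [i]                # the smaller independent set

print(independent(A), independent(B))              # True True
# Exchange property: some element of A can be added to B keeping independence.
print([independent(B + [a]) for a in A])           # [True, True] -- either works here
```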

Now we discuss some more properties of matroids. A subset that is not independent is called, you guessed correctly, a dependent set. A maximal independent set, one that becomes dependent on the addition of any element outside of it, is called a basis. A minimal dependent set, which becomes independent on the removal of any element, is called a circuit. Does a basis, on addition of an element, become a circuit? I don’t know. But I intend to find out.

p-adic Analysis: A primer

Today I’m going to be studying this paper by Theodor Christian Herwig to learn about p-adic analysis.

An absolute value on a field \Bbb{K} is a map |.|: \Bbb{K}\to \Bbb{R}_+ which satisfies the usual absolute value conditions; namely |x|\geq 0 and |x|=0\iff x=0; |xy|=|x||y|, and |x+y|\leq |x|+|y|. An absolute value which is non-archimedean also satisfies the following additional property: |x+y|\leq\max\{|x|,|y|\}.

Why’re we doing all this? Why are we trying to define a function, that serves the purpose of a norm in most settings, on an algebraic object that might not have any such structure? This is because we want to do topology on such algebraic objects. We want to be able to study a particular object from as many angles and perspectives as we want. An analogy would be representation theory: trying to study groups using properties from Linear Algebra.

Why "non-archimedean" though? Where does this term even come from? The archimedean property says that for any \epsilon>0, we can find an n\in\Bbb{N} such that \frac{1}{n}<\epsilon; in other words, the integers can be made arbitrarily large in absolute value. When we have a non-archimedean absolute value, the integers stay bounded: if N_1 and N_2 are positive integers, then with the usual absolute value we'd have |N_1+N_2|>\max\{|N_1|,|N_2|\}, but under a non-archimedean absolute value |N_1+N_2|\leq\max\{|N_1|,|N_2|\}, so adding positive integers never makes them "larger". Numbers seem to behave "funny" here. This is all that non-archimedean implies.

A p-adic valuation v_p:\Bbb{Z}-\{0\}\to \Bbb{R} is defined in the following way: v_p(a) is the largest exponent n such that p^n divides a. This map can be extended to \Bbb{Q}^\times in the natural way: v_p(a/b)=v_p(a)-v_p(b).

We can see that v_p has the following two properties: v_p(xy)=v_p(x)+v_p(y) and v_p(x+y)\geq \min\{v_p(x),v_p(y)\}. These are additive analogues of the multiplicative properties we want in a non-archimedean absolute value, but v_p itself is not yet an absolute value. We shall rectify that now.

Let |x|_p=p^{-v_p(x)}. Hence, if n is the largest exponent such that p^n divides x, then we map x\to \frac{1}{p^n}. Is this the non-archimedean absolute value that we were looking for? Yes. Checking the non-archimedean property: since v_p(x+y)\geq\min\{v_p(x),v_p(y)\}, we get |x+y|_p\leq\max\{|x|_p,|y|_p\}. The inequality can be strict: for instance, if neither x nor y is a multiple of p but their sum x+y is, then |x+y|_p<\max\{|x|_p,|y|_p\}.

Let us now solve an exercise that illustrates how this absolute value works really well. Find a sequence of rational numbers that converges to 32, 7-adically. Answer- Consider the sequence \{a_n\}_{n\in\Bbb{N}}=\{32+7^n\}_{n\in\Bbb{N}}. This is clearly a divergent sequence under the Euclidean metric- the metric that we're most used to. However, here we see that |a_n-32|_7=|7^n|_7=\frac{1}{7^n}. Clearly, |a_n-32|_7\to 0. Hence, we see that this sequence does indeed converge to 32 under the 7-adic norm.
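Here is a minimal Python sketch (my own code, not from the paper) of the p-adic valuation and absolute value, together with a numerical check of the non-archimedean inequality and of the 7-adic convergence in the exercise above.

```python
# A minimal sketch (my own code): the p-adic valuation and absolute value on the
# integers, plus a check that 32 + 7^n converges to 32 seven-adically.
from fractions import Fraction

def vp(x, p):
    """Largest exponent n with p^n dividing x (x a nonzero integer)."""
    n = 0
    while x % p == 0:
        x //= p
        n += 1
    return n

def abs_p(x, p):
    """p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    return Fraction(1, p ** vp(x, p))

# Non-archimedean inequality |x+y|_p <= max(|x|_p, |y|_p) on a few samples:
samples = [(6, 9), (7, 14), (5, 2), (49, 21)]
print(all(abs_p(x + y, 7) <= max(abs_p(x, 7), abs_p(y, 7)) for x, y in samples))  # True

# The sequence a_n = 32 + 7^n converges to 32 in the 7-adic absolute value:
print([abs_p(32 + 7**n - 32, 7) for n in range(1, 5)])   # [1/7, 1/49, 1/343, 1/2401]
```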

Wait. Norm? How do we know that the p-adic absolute value function is a norm? We'll check the triangle inequality. Clearly |x-y|\leq \max\{|x-z|,|z-y|\}. Hence, as both |x-z| and |z-y| are non-negative, it follows that |x-y|\leq |x-z|+|z-y|. Hence, the p-adic absolute value function is indeed a norm.

From this point on, |x| will denote the p-adic norm of x, and not the Euclidean norm. Now we shall prove that if |x|\neq |y|, then |x+y|=\max\{|x|,|y|\}. There are two major steps in this proof. The first is to prove that |-x|=|x|. How do we see that? If p^n|x then p^n|-x too. Hence, v_p(x)=v_p(-x). Therefore, |-x|=p^{-v_p(-x)}=p^{-v_p(x)}=|x|.

The second “trick” in this proof is the following: let |y|>|x|. Now |y|=|y+x-x|\leq \max\{|y+x|,|-x|\}=\max\{|y+x|,|x|\}. As |y|>|x|, we must have |y|\leq |x+y|. Now remember that by the non-archimedian property of this norm, we have |x+y|\leq |y| (note that \max\{|x|,|y|\}=|y|). We’re therefore done.

A corollary of the above theorem says that in such a p-adic space, every triangle is isosceles. What does this mean? Say we have three points x,y and z. Why do at least two of |x-y|,|x-z| and |y-z| have to be the same? Suppose two of the sides, say |x-z| and |z-y|, are unequal. The third side satisfies x-y=(x-z)+(z-y), so by the theorem proved above |x-y|=\max\{|x-z|,|z-y|\}; hence two of the three sides are equal. In fact, we can conclude something stronger: the two equal sides of the isosceles triangle have length greater than or equal to the third side.

Now we will begin a study of some non-trivial properties of the p-adic metric space.

  • If a\in B(x,r), then B(a,r)=B(x,r). This goes completely against any intuition that the Euclidean metric may have given us. We will now prove this non-intuitive fact (a numerical sketch follows this list). Let y\in B(x,r). Then |x-y|<r. Let us now calculate |a-y|. We know that |a-y|\leq\max\{|a-x|,|x-y|\}<r. Hence, we have B(x,r)\subset B(a,r). Similarly, we can prove that B(a,r)\subset B(x,r), which then implies that B(x,r)=B(a,r).
  • Any open ball B(x,r) is also closed. How do we see that? Take any point b in the closure of B(x,r). Then for any s>0, we have B(b,s)\cap B(x,r)\neq\emptyset. Now let y\in B(b,s)\cap B(x,r). Then |x-b|\leq\max\{|x-y|,|y-b|\}<\max\{r,s\}. We can take s\leq r. Hence, we have |x-b|<r, which proves that b\in B(x,r).
  • If a,b\in\Bbb{K}, and for r,s>0 we have B(a,r)\cap B(b,s)\neq\emptyset, then B(a,r)\subset B(b,s) or B(b,s)\subset B(a,r). In fact, even the converse is true. How do we see this? For any c\in B(a,r)\cap B(b,s), we have B(c,r)=B(a,r) and B(c,s)=B(b,s). If s<r, then B(c,s)\subset B(c,r), which implies B(b,s)\subset B(a,r). Similarly, if r<s, we have B(a,r)\subset B(b,s). What if s=r? Then we have B(a,r)=B(b,s). The converse is obvious: if one ball is a subset of another, then their intersection is trivially non-empty, as the balls themselves are non-empty.
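As promised above, here is a minimal numerical sketch (my own code) of the first fact, checking on a finite sample of integers that every point of a 7-adic ball is a centre of that ball.

```python
# A minimal sketch (my own): check, on a finite sample of integers, that every
# point of a 7-adic ball is a centre of that ball.
from fractions import Fraction

def vp(x, p):
    n = 0
    while x % p == 0:
        x //= p
        n += 1
    return n

def dist(x, y, p):
    """p-adic distance |x - y|_p on integers (0 if x == y)."""
    return Fraction(0) if x == y else Fraction(1, p ** vp(x - y, p))

p, r = 7, Fraction(1, 7)                 # the ball of radius 1/7 about 0
universe = range(-p**3, p**3)            # a finite sample of integers

def ball(centre):
    return {z for z in universe if dist(z, centre, p) < r}

B0 = ball(0)                             # here: the multiples of 49 in the sample
print(all(ball(a) == B0 for a in B0))    # True: every point of the ball is a centre
```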

We will discuss the completion of \Bbb{Q} to \Bbb{Q}_p in the next post.

 

Decomposition of Vector Spaces

 

Let E_1, E_2,\dots, E_s be linear transformations on an n-dimensional vector space such that I=E_1+E_2+\dots+E_s and E_iE_j=0 for i\neq j. Then V=E_1V\oplus E_2V\oplus\dots\oplus E_sV.

How does this happen? Take the expression I=E_1+E_2+\dots+E_s and multiply by any v\in V on both sides. We see that v=E_1v+E_2v+\dots+E_sv. Hence any vector v can be expressed as a sum of elements in E_iV for i\in\{1,2,\dots,s\}.

Why do we have a direct sum decomposition? Let v_1+v_2+\dots+v_s=0 for v_i\in E_iV. Then consider E_k(v_1+v_2+\dots+v_s)=E_k(0)=0. For any v_i where i\neq k, v_i=E_i v for some v\in V. Hence E_kv_i=E_kE_iv=0\cdot v=0. Hence, we have E_kv_k=0. Now v_k=E_kv' for some v'\in V. Hence E_kv_k=E_k^2v'. Note that E_k^2=E_k (just multiply the expression I=E_1+E_2+\dots+E_s by E_k on both sides). Hence, we have E_k v'=0. Now E_kv'=v_k. Hence, we have v_k=0. This is true for all k\in\{1,2,\dots,s\}. Hence, all the v_i are 0, which proves that we have a direct sum decomposition V=E_1V\oplus E_2V\oplus\dots\oplus E_sV.
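A minimal numerical sketch (my own example, not from any book): two complementary projections on \Bbb{R}^2 satisfying the two conditions above, and the resulting direct sum decomposition.

```python
# A minimal numerical sketch (my own example): projections E_1, E_2 with
# E_1 + E_2 = I and E_1 E_2 = 0, giving R^2 = E_1 R^2 (+) E_2 R^2.
import numpy as np

E1 = np.array([[1., 0.],
               [0., 0.]])       # projection onto the x-axis
E2 = np.eye(2) - E1             # projection onto the y-axis

print(np.allclose(E1 + E2, np.eye(2)), np.allclose(E1 @ E2, 0))   # True True
print(np.allclose(E1 @ E1, E1))                                   # True: E_1 is idempotent

v = np.array([3., 5.])
print(E1 @ v, E2 @ v)           # [3. 0.] [0. 5.] -- v = E_1 v + E_2 v, uniquely
```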

Why is all this relevant? Because using the minimal polynomial f(x)=p_1(x)^{e_1}\dots p_s(x)^{e_s} of any transformation T\in L(V,V), we can construct such E_i's which satisfy the above two conditions, and can hence decompose the vector space as a direct sum of s subspaces. Moreover, these subspaces have the additional property that they're T-invariant. Each E_i is built from the polynomial f_i(x)=f(x)/p_i(x)^{e_i}: the f_i have no common factor, so there exist polynomials g_i with \sum_i g_i(x)f_i(x)=1, and one sets E_i=g_i(T)f_i(T).

Minimal Polynomials of Linear transformations

I’m prepared to embarrass myself by writing about something that should have been clear to me a long time ago.

This is regarding something about the minimal polynomials of linear transformations that has always confused me. Let T\in L(V,V), where V is an n-dimensional vector space. Let us also assume that T has n linearly independent eigenvectors \{v_1,v_2,\dots,v_n\} (so T is diagonalizable), though the corresponding n eigenvalues may not be distinct. If the distinct eigenvalues are \{a_1,a_2,\dots,a_k\} where k\leq n, it is then known that (x-a_1)(x-a_2)\dots (x-a_k) is the minimal polynomial of T.

We know that as polynomials, (x-a_1)(x-a_2)\dots (x-a_k) and (x-a_2)(x-a_1)\dots (x-a_k) are the same (note that I've exchanged the places of a_1 and a_2). However, when we substitute x=T, are (T-a_1I)(T-a_2I)\dots (T-a_kI) and (T-a_2I)(T-a_1I)\dots (T-a_kI) also the same? Remember that matrices are in general not commutative. In fact, if for matrices A and B we have AB=0, then it is not necessary that BA=0 too.

An earlier exercise in the book “Linear Algebra” by Curtis says that for f,g\in F[x], f(T)g(T)=g(T)f(T). Why is this? Because we’re ultimately going to get the same polynomial in terms of T. My mental block came from the fact that I was imagining T-a_iI to be a matrix which I didn’t know much about. I forgot that T-a_iI is a decomposition of a single matrix into two, and matrix multiplication, like the multiplication of complex numbers, is distributive. Hence everything works out as planned.
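Here is a minimal numerical sketch (my own example) of the point above: the factors (T-a_1I) and (T-a_2I) commute, because both products are the same polynomial evaluated at T.

```python
# A minimal numerical sketch (my own example): factors of a polynomial evaluated
# at a matrix T commute, even though matrices in general do not.
import numpy as np

T = np.array([[2., 1.],
              [0., 3.]])
I = np.eye(2)
a1, a2 = 2.0, 3.0

P = (T - a1 * I) @ (T - a2 * I)
Q = (T - a2 * I) @ (T - a1 * I)
print(np.allclose(P, Q))   # True: both equal the same polynomial in T
print(np.allclose(P, 0))   # True here, since (x-2)(x-3) is the minimal polynomial of this T
```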

Flat modules

This post is going to be about flat modules and flat families. A brief excursion into Commutative Algebra often brings up the following fact: a module M over the ring R is flat if for every inclusion N'\subset N of R-modules the induced map M\otimes_R N'\to M\otimes_R N is again an inclusion. Beyond this, intuition often goes for a toss. Why the name "flat"? What even is happening? This post is intended to remedy that for both the reader and the scribe.

First, we write about the fact that the tensor product is right exact, but not left exact. What does this mean? This means that if we have a short exact sequence 0\to A\xrightarrow{f} B\xrightarrow{g} C\to 0, then for any module M, the sequence A\otimes M\xrightarrow{f\otimes id} B\otimes M\xrightarrow{g\otimes id} C\otimes M\to 0 is exact. Why is this true? I am going to try and reproduce the explanation given in Atiyah-Macdonald's "Commutative Algebra", albeit in a more reader-friendly way. First of all, we notice that bilinear maps A\times B\to C correspond exactly to elements of \text{Hom} (A, \text{Hom} (B,C)). Now as each bilinear map A\times B\to C factors through A\otimes B, bilinear maps also correspond to elements of \text{Hom} (A\otimes B, C). Hence, \text{Hom} (A\otimes B, C)\cong \text{Hom} (A, \text{Hom} (B,C)). So where do we go from here?

The \text{Hom}(-, \text{Hom}(P,Q)) functor maps the sequence

0\to A\xrightarrow{f} B\xrightarrow{g} C\to 0 to

0\to \text{Hom}(C,\text{Hom}(P,Q))\to \text{Hom}(B,\text{Hom}(P,Q))\to \text{Hom}(A,\text{Hom}(P,Q)).

Clearly \text{Hom}(-, \text{Hom}(P,Q)) is a contravariant functor. The surjective map g becomes an injective map under this functor (please take a moment to parse what this means). This is because any change in an element of \text{Hom}(C,\text{Hom}(P,Q)) will imply a change in the corresponding element of \text{Hom}(B,\text{Hom}(P,Q)); this is precisely because of the surjectivity of g: the same element of C, if now mapped differently, changes the composite. Does f become a surjection? No. It does so only under special circumstances. The structure (restrictions) of B might prevent all elements of \text{Hom}(A,\text{Hom}(P,Q)) from being mapped to. Anyway. We now use the identification \text{Hom}(C,\text{Hom}(P,Q))\cong \text{Hom}(C\otimes P, Q) (and similarly for A and B), and then we take away the outer \text{Hom} functor. Does the injectivity on \text{Hom}'s now give back surjectivity of the tensored map? Atiyah-Macdonald asks us to refer to (2.9), and it turns out that yes, it does. Basically, if a map is not surjective, you can construct two different maps out of the target which agree on the image: outside of the image you can make changes such that the compositions still remain the same, because the image is a proper submodule and there are elements outside it not completely determined by it. Hence the induced map on \text{Hom} would not be injective. The only way to have an injective map on \text{Hom} for every target is to start with a surjective map. Please make sense of this explanation. So we've now concluded that the tensor functor is right exact. It preserves surjections, but does not preserve injections.

Consider the exact sequence E\to F\to G. A module M is flat if E\otimes M\to F\otimes M\to G\otimes M is also exact. To emphasize that this is an important property, we have to give an example of a module that is not flat. Consider the exact sequence 0\to\Bbb{Z}\xrightarrow{2x}\Bbb{Z}, where the map is multiplication by 2. This map is clearly injective, as the exactness of the sequence asserts. However, the sequence 0\to \Bbb{Z}\otimes \Bbb{Z}_2\xrightarrow{2x\otimes id} \Bbb{Z}\otimes \Bbb{Z}_2 is not exact, in that the map 2x\otimes id is no longer injective. Why is that? Every element a\otimes b\in\Bbb{Z}\otimes \Bbb{Z}_2 is mapped to 2a\otimes b=a\otimes 2b=a\otimes 0=0, while \Bbb{Z}\otimes\Bbb{Z}_2\cong\Bbb{Z}_2 is not the zero module. This is precisely because of the multiplication by 2. Hence, as there exists an exact sequence which on tensoring with \Bbb{Z}_2 no longer remains exact, \Bbb{Z}_2 is not a flat module.
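To see this failure of injectivity concretely, here is a tiny Python sketch (my own); under the standard identification \Bbb{Z}\otimes\Bbb{Z}_2\cong\Bbb{Z}_2, the tensored map is just multiplication by 2 on \Bbb{Z}_2.

```python
# A minimal sketch (my own): under Z (x) Z/2 ~ Z/2, the map (mult. by 2) (x) id
# becomes multiplication by 2 on Z/2, which kills everything and is not injective.
def induced_map(b):
    """Image of the class b in Z/2 under the tensored map."""
    return (2 * b) % 2

print([induced_map(b) for b in (0, 1)])   # [0, 0] -- the nonzero class maps to 0
```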

Hence, flatness is a special property. We’re now going to investigate some implications of a module M being flat.

a) If M is a flat module, and the functor T_M is a functor on the category \mathbf{Mod} which maps a module A to A\otimes M, then T_M is an exact functor. This is just a re-phrasing of the definition of a flat module.

b) If f:A\to A' is injective, then f\otimes id: A\otimes M\to A'\otimes M is injective too. How? It’s just a re-wording of the fact that the functor T_M is exact! Think about it.

c) Now we come to the real heavyweight implication: M is flat \iff "whenever f:A\to A' is injective and A,A' are finitely generated, the map f\otimes id:A\otimes M\to A'\otimes M is injective too". The forward direction is obvious; it follows from the definition of M being flat, and in fact we do not even need A,A' to be finitely generated. But what about the converse? Essentially we want to take an injective map between modules B,B' that are not necessarily finitely generated, and conclude that the corresponding map B\otimes M\to B'\otimes M is also injective. How does the proof go? If the map B\otimes M\to B'\otimes M is not injective, then there exists a non-zero element \sum_i (x_i\otimes n_i) of B\otimes M such that \sum_i f(x_i)\otimes n_i=0. Clearly the x_i's are finite in number. Now consider the submodule \langle x_i\rangle\subset B that they generate. This is finitely generated. Now consider \langle f(x_i)\rangle\subset B'. This too is finitely generated, and clearly the restriction \langle x_i\rangle\to \langle f(x_i)\rangle of f is an injective map. By assumption, \langle x_i\rangle\otimes M\to \langle f(x_i)\rangle\otimes M too is an injective map. Now \sum_i (x_i\otimes n_i) also defines an element of \langle x_i\rangle\otimes M, and as its image in B\otimes M is non-zero, it is non-zero in \langle x_i\rangle\otimes M too. But it is sent to \sum_i f(x_i)\otimes n_i=0, so an injective map sends a non-zero element to 0, which is a contradiction. (Strictly speaking, one should enlarge \langle f(x_i)\rangle to a finitely generated submodule of B' large enough to contain the finitely many elements witnessing the relation \sum_i f(x_i)\otimes n_i=0, so that this sum is already zero in that finitely generated submodule tensored with M.) Hence, we have proved that the map B\otimes M\to B'\otimes M is also injective.
Now remember that the tensor functor is right-exact, but not left-exact. There is a functor called the Tor functor which measures how “far” the tensor functor is from being left-exact. We shall talk about this functor in a future blog post.

Open basis for Quasi-Projective Varieties

Today we're going to generate a basis for a quasi-projective variety in the Zariski topology. These open sets will be affine (affine charts, for instance, are open sets in the Zariski topology, since the zth affine chart is the complement of the variety z=0). Hence, as every point lies in such an open set, we can say that every quasi-projective variety locally looks like an affine variety, which allows us to do all kinds of affine-type calculations and draw affine-like conclusions.

We first notice that if Q\subseteq \Bbb{A}^n is an affine variety, and f\in \Bbb{C}[Q], then Q\setminus V(f) is also an affine variety. This is because it can be bijectively mapped to an affine variety in \Bbb{A}^{n+1}. So we're not working with objects anymore. We're working with isomorphism classes. What is the variety that Q\setminus V(f) maps to? It is the image of the map x\to \left(x,\frac{1}{f(x)}\right). How do we know that the image is an affine variety? Let I(Q) be the corresponding ideal for Q in \Bbb{A}^n. Then the ideal for the image is the one generated by I(Q)\cup \{zf(x)-1\}. How does this work? When we just consider I(Q) in \Bbb{A}^{n+1}, all values of z satisfy these equations. With the inclusion of the last equation, what happens is that for each n-tuple straight up imported from \Bbb{A}^n (on which f does not vanish), there is only one value of z attached. Hence, each such n-tuple gets a unique z-coordinate.
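Here is a minimal SymPy sketch of the simplest instance of this construction (my own example): \Bbb{A}^1\setminus V(x) maps to the hyperbola V(xz-1)\subset\Bbb{A}^2 via x\mapsto(x,1/x), and the image does satisfy the extra equation.

```python
# A minimal sketch, assuming SymPy (my own example): A^1 \ V(x) maps into the
# hyperbola V(x*z - 1) in A^2 via x |-> (x, 1/x), mirroring the construction above.
import sympy as sp

x, z = sp.symbols('x z')
f = x                                   # remove the zero locus of f = x from A^1
hyperbola = z * f - 1                   # the extra equation z*f(x) - 1 = 0 in A^2

for a in [1, 2, sp.Rational(3, 5)]:     # sample points of A^1 with f(a) != 0
    image = {x: a, z: sp.Rational(1) / f.subs(x, a)}   # the point (a, 1/f(a))
    print(hyperbola.subs(image))        # 0 each time: the image lies on V(z*f - 1)
```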

Now we try and generate the basis. Any quasi-projective variety Q can be embedded in \Bbb{P}^n for some n. For any affine chart U_i, consider Q\cap U_i. Q is the union \bigcup\limits_{i=0}^n (Q\cap U_i). Now each Q\cap U_i is a quasi-projective variety in \Bbb{A}^n. Why is this? Because you can take the ideal corresponding to the variety in projective space, and make the z_i=1 substitution. You'll get an ideal, and hence a corresponding algebraic set. Now remember that this algebraic set is a quasi-projective variety, and not a traditional affine variety. Hence, it is the intersection of a closed set and an open set. Say it is something of the form V(F_1,F_2,\dots,F_s)\setminus V(G_1,G_2,\dots,G_t). This set is covered by open sets of the form V(F_1,F_2,\dots,F_s)\setminus V(G_i), where i\in\{1,2,\dots,t\}. How? First of all, notice that V(F_1,F_2,\dots,F_s)\setminus V(G_i) is smaller than V(F_1,F_2,\dots,F_s)\setminus V(G_1,G_2,\dots,G_t). This is because we're intersecting V(F_1,F_2,\dots,F_s) with the complement of a bigger set, and hence intersecting it with a smaller set. Now we have to prove that we indeed have a cover. Let p\in V(F_1,F_2,\dots,F_s)\setminus V(G_1,G_2,\dots,G_t). Then p\in V(F_1,F_2,\dots,F_s) and p\notin V(G_1,G_2,\dots,G_t). Hence, there has to exist a G_i such that p\notin V(G_i). This implies that p\in V(F_1,F_2,\dots,F_s)\setminus V(G_i). Hence, we have an open cover. Moreover, each V(F_1,\dots,F_s)\setminus V(G_i) is of the form Q'\setminus V(G_i) for an affine algebraic set Q', and hence is itself affine by the observation above. Taking such open covers over all affine charts, we see that the quasi-projective variety has been covered by affine open sets.

Do these affine open sets count, when what we wanted was open sets in the projective Zariski topology? Can open sets in affine space be "projectively completed" to open sets in projective space? Yes they can be. How? The complement of an open set in affine space is a closed set. Take the projective closure of that affine closed set and then take its complement in \Bbb{P}^n. This complement is open in \Bbb{P}^n, and its intersection with the affine chart is exactly the affine open set we started with. Hence we're done.