An attempted generalization of the Third Isomorphism Theorem.

I recently posted this question on math.stackexchange.com. The link is this.

My assertion was “Let G be a group with three normal subgroups K_1,K_2 and H such that K_1,K_2\leq H. Then (G/H)\cong (G/K_1)/(H/K_2). This is a generalization of the Third Isomorphism Theorem, which states that (G/H)\cong (G/K)/(H/K), where K\leq H.”

What was my rationale behind asking this question? Let G be a group and H a normal subgroup of G. Then G/H consists of cosets of the form g+H, where g+H=(g+\alpha h)+H for every h\in H and \alpha\in\Bbb{Z}.

Now let K_1,K_2 be two normal subgroups of G such that K_1,K_2\leq H. Then G/K_1 consists of cosets of the form g+K_1 and H/K_2 consists of cosets of the form h+K_2. Now consider (G/K_1)/(H/K_2). One coset of this would be \{[(g+K_1)+(h_1+K_2)],[(g+K_1)+(h_2+K_2)],\dots,[(g+K_1)+(h_{|H/K_2|}+K_2)]\}, where the h_i are coset representatives of K_2 in H. We are effectively adding every element of H to the coset g+K_1. The most important thing to note here is that every element of K_1 is also present in H.

Every element of the form (g+ any element of K_1) in G will give the same element in G/K_1, and by extension in (G/K_1)/(H/K_2). Now let g and g+h be two elements of G (h\in H) with h\notin K_1. Then they will not give the same element in G/K_1. However, as every element of H is individually added to them in (G/K_1)/(H/K_2), they will give the same element in the latter. If g and g' form different cosets in G/H, then they will also form different cosets in (G/K_1)/(H/K_2). This led me to conclude that (G/H)\cong (G/K_1)/(H/K_2).

This reasoning is, however, flawed, mainly because H/K_2 need not be a subgroup of G/K_1, so the quotient (G/K_1)/(H/K_2) need not even be defined. Hence, in spite of a strong intuition about how the cosets behave, I got stuck on technicalities.
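To see the obstruction concretely, here is a minimal sanity check in Python, written additively in G=\Bbb{Z}/8\Bbb{Z} with H=\{0,2,4,6\}, K_1=\{0,4\} and K_2=\{0\}. These particular subgroups are just an illustrative choice, not taken from the original question.

```python
# G = Z/8Z written additively. H, K1, K2 are normal (G is abelian) and K1, K2 <= H.
G = set(range(8))
H = {0, 2, 4, 6}
K1 = {0, 4}
K2 = {0}

def cosets(subset, subgroup, modulus=8):
    """Return the set of cosets g + subgroup, for g ranging over `subset`."""
    return {frozenset((g + k) % modulus for k in subgroup) for g in subset}

G_mod_K1 = cosets(G, K1)   # the elements of G/K1
H_mod_K2 = cosets(H, K2)   # the elements of H/K2

# For (G/K1)/(H/K2) to make sense, H/K2 would have to be a subgroup of G/K1,
# i.e. every coset in H/K2 would have to be a coset of K1. It is not:
print(H_mod_K2 <= G_mod_K1)   # False: e.g. {2} is a coset of K2 but not of K1
```

The cosets making up H/K_2 are singletons here, while every coset of K_1 has two elements, so H/K_2 is simply not a subset (let alone a subgroup) of G/K_1, and the outer quotient is undefined.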

Generalizing dual spaces- A study on functionals.

A functional is a map from a vector space to its scalar field, such as \Bbb{R} or \Bbb{C}. If X is the vector space under consideration, and the f_i:X\to \Bbb{R} (or f_i:X\to\Bbb{C}) are linear, then the vector space \{f_i\} of such functionals is referred to as the algebraic dual space X^*. Similarly, the vector space of linear functionals f'_i:X^*\to \Bbb{R} (or f'_i:X^*\to\Bbb{C}) is referred to as the second algebraic dual space. It is also referred to as X^{**}.

How should one imagine X^{**}? Imagine the functionals in X^* all being mapped to \Bbb{R}. One way to do it is to make all of them act on one particular x\in X. Hence, g_x:X^*\to \Bbb{R} such that g_x(f)=f(x). Another such mapping is g_y. The map x\mapsto g_x embeds X into X^{**}, and when X is finite-dimensional, X^{**} is isomorphic to X.
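Here is a small finite-dimensional sketch of the map x\mapsto g_x in Python, with X=\Bbb{R}^2 and a functional represented by its coefficient tuple; the dot-product representation and all names are illustrative assumptions, not notation from the book.

```python
# Vectors in R^2 as tuples; a functional f in X* is represented by the
# coefficients (a, b) of the map (x1, x2) -> a*x1 + b*x2.

def apply(f, x):
    """Evaluate the functional f (given by its coefficients) at the vector x."""
    return f[0] * x[0] + f[1] * x[1]

def g(x):
    """The canonical map X -> X**: x is sent to the functional g_x on X*
    defined by g_x(f) = f(x)."""
    return lambda f: apply(f, x)

x = (3, -1)
f = (2, 5)            # the functional f(x1, x2) = 2*x1 + 5*x2
g_x = g(x)            # an element of X**

print(apply(f, x))    # 1
print(g_x(f))         # 1, the same value, since g_x(f) = f(x)
```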

My book only talks about X, X^* and X^{**}. I shall talk about X^{***}, X^{****}, and X^{**\dots *}. Generalization does indeed help the mind figure out the complete picture.

Say we have X^{n*} (n asterisks). Imagine a mapping X^{n*}\to \Bbb{R}. Under what conditions is this mapping well-defined? When we have only one image for each element of X^{n*}. Notice that each mapping f:X^{n*}\to \Bbb{R} is an element of the vector space X^{(n+1)*}. To make f a well-defined mapping, we select any one element a\in X^{(n-1)*}, and determine the value of each element of X^{n*} at a. One must note here that a is itself a mapping (a: X^{(n-2)*}\to\Bbb{R}), so which element of X^{(n-2)*} a is to be evaluated at should be specified in advance. Similarly, every element of X^{(n-2)*} is also a mapping, and which element of X^{(n-3)*} it is to be evaluated at should also be pre-stated.

Hence, for every element in X^{n*}, one element each from X^{(n-2)*}, X^{(n-3)*},X^{(n-4)*},\dots ,X should be pre-stated. For every such element in X^{n*}, this (n-2)-tuple can be different. To define a well-defined mapping f:X^{n*}\to \Bbb{R}, we choose one particular element b\in X^{(n-1)*}, and call the mapping f_b. Hence,

f_b(X^{n*})=X^{n*}(b, rest of the (n-2)-tuple),

f_c(X^{n*})=X^{n*}(c, rest of the (n-2)-tuple), and so on.

 

By

f_b(X^{n*})=X^{n*}(b, rest of the (n-2)-tuple),

we mean the value of every element of X^{n*} at (b, rest of the (n-2)-tuple).

Some facts, better explained, from Atiyah-Macdonald

Today we shall discuss some interesting properties of elements of a ring.

1. If a\in R is not a unit, then it is present in some maximal ideal of the ring R. Self-explanatory: (a) is then a proper ideal, and every proper ideal is contained in some maximal ideal.

2. If a is present in every maximal ideal, then 1+xa is a unit for all x\in R. Proof: Suppose 1+xa is not a unit. Then it is present in some maximal ideal M (from 1). Write 1+xa=m, where m\in M. Then 1=m-xa. As a\in M, we also have xa\in M, so 1 is a member of the maximal ideal, which is absurd.

Let’s break down this theorem into elementary steps, and present a better proof (than given on pg.6 of “Commutative Algebra” by Atiyah-Macdonald). If x\in M_1 for some maximal ideal M_1, then 1\pm xy\notin M_1 for all y\in R. Similarly, If x\in M_2 for some maximal ideal M_2, then 1\pm xy\notin M_2 for all y\in R. This argument can then be extended to the fact that if x\in all maximal ideals, then 1\pm xy\notin any maximal ideal for all y\in R. An element not there in any maximal ideal is a unit. Hence, 1\pm xy is a unit.

3. If 1-xy is a unit for all y\in R, then x is part of every maximal ideal in R. Proof: Let us assume x is not part of some maximal ideal M. Then M+(x)=R, so there exist some m\in M and y\in R such that m+xy=1. This implies that m=1-xy, which is impossible, as 1-xy is a unit and a maximal ideal cannot contain a unit. The same argument can be used for 1+xy.
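Facts 2 and 3 together say that the intersection of all maximal ideals (the Jacobson radical) is exactly the set of x for which 1-xy is a unit for every y. Here is a finite sanity check in plain Python for the ring \Bbb{Z}/12\Bbb{Z}; the ring and the explicit list of its maximal ideals are assumptions made purely for this example.

```python
from math import gcd

n = 12
R = range(n)
units = {a for a in R if gcd(a, n) == 1}

# The maximal ideals of Z/12Z are (2) and (3).
max_ideals = [{a for a in R if a % 2 == 0}, {a for a in R if a % 3 == 0}]

jacobson = set.intersection(*max_ideals)                         # {0, 6}
test = {x for x in R if all((1 - x * y) % n in units for y in R)}

print(jacobson == test)   # True
```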

—————————————————————————————————————-

On pg.6 of Atiyah-Macdonald, it is mentioned that if a,b,c are ideals, then a\cap (b+c)=a\cap b+a\cap c if b\subseteq a and c\subseteq a. This is not elaborated upon in the book, and a hasty reader may be confused. I’d like to elaborate on this concept.

a\cap (b+c) consists of those elements of b+c which are also in a. Every element of b+c is of the form b'+c' with b'\in b and c'\in c, and such sums fall into four classes: [sums with both b' and c' in a] \bigcup [sums with b'\in a but c'\notin a] \bigcup [sums with b'\notin a but c'\in a] \bigcup [sums with neither b' nor c' in a]. a\cap (b+c) consists of whichever of these sums lie in a.

Every sum from the first class lies in a, and in fact in a\cap b+a\cap c. No sum from the second or third class can lie in a: if exactly one of b',c' is in a, then b'+c'\notin a.

The fourth class is NOT necessarily empty, and some of its sums may land in a. However, no such sums exist if b,c\subseteq a, as there is then no element of b or c outside a. The same happens under any other condition which ensures that if neither b' nor c' lies in a, then b'+c'\notin a.

In summary, a\cap (b+c)=a\cap b+a\cap c is definitely true when b,c\subseteq a. However, it is also true under other conditions which ensure that no sum from the fourth class lies in a.
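Here is a quick numeric illustration in \Bbb{Z}, where m\Bbb{Z}+n\Bbb{Z}=\gcd(m,n)\Bbb{Z} and m\Bbb{Z}\cap n\Bbb{Z}=\mathrm{lcm}(m,n)\Bbb{Z}. The particular ideals a=2\Bbb{Z}, b=4\Bbb{Z}, c=6\Bbb{Z} (so b,c\subseteq a) are an arbitrary choice; note that in \Bbb{Z} the identity actually holds for all ideals, since the lattice of ideals of \Bbb{Z} is distributive, so this only illustrates the identity, not the necessity of the hypothesis.

```python
from math import gcd

def lcm(m, n):
    return m * n // gcd(m, n)

N = 1000
def ideal(n):
    """The ideal nZ, truncated to the window [-N, N] for comparison purposes."""
    return {k for k in range(-N, N + 1) if k % n == 0}

a, b, c = ideal(2), ideal(4), ideal(6)

lhs = a & ideal(gcd(4, 6))                 # a ∩ (b+c), since b+c = gcd(4,6)Z
rhs = ideal(gcd(lcm(2, 4), lcm(2, 6)))     # a∩b + a∩c = gcd(lcm(2,4), lcm(2,6))Z

print(lhs == rhs)   # True
```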

———————————————————————————————————-

I have extended a small paragraph in Atiyah-Macdonald to a full-fledged exposition (a,b,c and d are ideals in a commutative ring R; below, a',a'' denote elements of a, b_1 an element of b, c_1 an element of c, and so on):

1. a\cup (b+c)\subseteq(a\cup b)+(a\cup c)– Both sides contain all elements of a and of b+c. Remember that b\cup c\subseteq b+c. However, the right hand side also contains elements of the form a'+b_1 and a'+c_1, which the left hand side need not contain.

2. a\cap (b+c)– This has already been explained above.

3. a+(b\cup c)=(a+b)\cup (a+c)– Both are exactly the same.

4. a+ (b\cap c)\subseteq (a+b)\cap (a+c)– There might be b_1,c_1\notin b\cap c such that a'+b_1=a''+c_1. However, any element in a+(b\cap c) will definitely be present in (a+ b)\cap (a+c).

5. a(b\cup c)\supseteq ab\cup ac– The LHS contains elements of the form a'b_1+a''c_1, which the RHS need not. In fact, the LHS is an ideal while the RHS in general isn’t. You might wonder how the LHS is an ideal: I have just extended the construction used to make AB an ideal when A and B are both ideals (take finite sums of products) to the situation in which A is an ideal and B is any subset of R.

6. a(b\cap c)\subseteq ab\cap ac– The RHS may contain elements of the form a'b_1=a''c_1 for b_1,c_1\notin b\cap c.

7. a(b+c)=ab+ac– Easy enough to see.

8. (a+b)(c\cap d)=(c\cap d)a+(c\cap d)b\subseteq(ac\cap ad)+(bc\cap bd)

From this formula, we have (a+b)(a\cap b)\subseteq (a^2\cap ab)+(ab\cap b^2)\subseteq ab (a numeric check in \Bbb{Z} follows this list).
This fact is mentioned on pg.7 of Atiyah-Macdonald.

9. (a+b)(c\cup d)= (c\cup d)a+(c\cup d)b\supseteq (ca\cup da)+(cb\cup db), the last step being a containment rather than an equality for the same reason as in 5.

From this formula, we have (a+b)(a\cup b)\supseteq(a^2\cup ab)+(ab\cup b^2)\supseteq ab.
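Here is a small numeric check in \Bbb{Z} of the containment (a+b)(a\cap b)\subseteq ab from 8, taking a=m\Bbb{Z} and b=n\Bbb{Z}. In \Bbb{Z} the containment is in fact an equality (since \gcd(m,n)\cdot\mathrm{lcm}(m,n)=mn), so this only illustrates the containment, not that it can be strict.

```python
from math import gcd

# a = mZ, b = nZ. Then a+b = gcd(m,n)Z and a∩b = lcm(m,n)Z, so (a+b)(a∩b) is
# generated by gcd(m,n)*lcm(m,n), while ab is generated by m*n.
for m in range(2, 20):
    for n in range(2, 20):
        lcm = m * n // gcd(m, n)
        assert (gcd(m, n) * lcm) % (m * n) == 0   # generator of LHS lies in ab

print("containment verified for 2 <= m, n < 20")
```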

The existence or non-existence of a maximal element

Have you ever wondered why the real number line does not have a maximal element?
Take \Bbb{R}. Define an element \alpha. Declare that \alpha is greater than any element in \Bbb{R}. Can we do that? Surely! We’re defining it thus. In fact, \alpha does not even have to be a real number! It can just be some mysterious object that we declare to be greater than every real number. Note that \alpha is greater than every real number, but it is not the maximal element of \Bbb{R}, as for that it would have to be a part of \Bbb{R}. Why can’t \alpha be a part of \Bbb{R}? We’ll see in the next paragraph.

However, it is when we assert that \alpha has to be a real number that we begin to face problems. If \alpha is a real number, then so is \alpha+1, and \alpha+1>\alpha. Thereby, we reach a contradiction, showing that no real number can exist which is greater than all other real numbers.

Another approach is to take the sum of all real numbers (speaking informally, as such an infinite sum is not actually defined). Let that sum be \mathfrak{S}, which would be greater than any one real number. However, as the sum of real numbers is a real number by axiom (\Bbb{R} is a field), \mathfrak{S} would also be a real number, which is smaller than the real number \mathfrak{S}+1. If we did not have the axiom that the sum of real numbers is a real number, then we’d be able to create a number greater than all real numbers. The same argument would work if we were to multiply all real numbers.

 

Now I’d like to draw your attention to the proof of the fact that every ring must have a maximal ideal, as given on pg. 4 of the book “Commutative Algebra” by Atiyah-Macdonald. The gist of the proof is: take a chain of proper ideals of the ring and form its union. This union is again a proper ideal (it cannot contain 1), so every chain has an upper bound, and Zorn’s lemma then yields a maximal ideal.

Why this proof works is that we wouldn’t know which element to add to make the ideal bigger. If we could construct a strictly bigger proper ideal for any proper ideal we choose, we could prove that no maximal ideal exists. But whatever element we might choose to add to the union of a chain, we have no reason to believe that that element does not already lie in it.

Let us generalize this argument. Take a set of elements, define a partial order on the set, and then ask whether a maximal element exists in the set. If we cannot exhibit a strictly bigger element for some candidate, then there is no contradiction, and that candidate may indeed be a maximal element of the set. This line of thought works when we prove that every ring has a maximal ideal, and fails when we try to prove that \Bbb{R} has a maximal element.

 

Breaking down Zorn’s lemma

Today I’m going to talk about Zorn’s lemma. No, I’m not going to prove that it is equivalent to the Axiom of Choice. All I’m going to do is talk about what it really is. Hopefully, I shall be able to create a visually rich picture so that you may be able to understand it well.

First, the statement.

“Suppose a partially ordered set P has the property that every chain (i.e. totally ordered subset) has an upper bound in P. Then the set P contains at least one maximal element.”

Imagine chains of elements. Like plants. These may be intersecting or not. Imagine a flat piece of land, and lots of plants growing out of it. These may grow straight, or may grow in a crooked fashion, intersecting one another. These plants are totally ordered chains of elements. Now, as every such chain has an upper bound in P, imagine being able to see the tops of each of these plants. Note three things: 1. A plant may have multiple tops (upper bounds). 2. There may be multiple points of intersection between any two plants. 3. Different plants may have the same top.

Moreover, there may be small bits of such plants lying on the ground. These are elements that are not comparable to any other element, and hence not part of any longer chain. If any such bit exists on the ground, then we have a maximal element. Proof: if it could be compared to some other element, it would lie on a chain with it. As it can’t be compared to any other element, it is not smaller than any element.

Let us suppose no such bits of plants exist. Then a maximal element of any chain will be a maximal element of the whole set! Proof: It is not smaller than any element in its own chain. It can’t be compared with the elements of chains which do not intersect its chain. And as for chains that intersect this chain: if the maximal element is the same, then we’re done; if the maximal elements are not the same, then the two maximal elements can’t be compared either. Hence, every such maximal element is a maximal element of the whole set.

Assuming that the set is non-empty, at least one plant bit or chain has to exist. Hence, every non-empty partially ordered set in which every chain has an upper bound has at least one maximal element. The possible candidates are plant bits (elements comparable to nothing else) and plant tops (maximal elements of chains).
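A tiny Python illustration of what a maximal element is in a partial order; the poset chosen, the divisibility order on \{2,\dots,12\}, is just an arbitrary finite example.

```python
# Divisibility order on {2, ..., 12}. An element is maximal if nothing in the
# set is strictly above it; there can be several maximal elements, and none of
# them need be a maximum (greatest element).
P = set(range(2, 13))

def less_eq(a, b):
    return b % a == 0          # a <= b in the divisibility order

maximal = [a for a in P if not any(less_eq(a, b) and a != b for b in P)]
print(sorted(maximal))         # [7, 8, 9, 10, 11, 12]
```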

The mysterious linear bounded mapping

What exactly is a linear bounded mapping? The definition says that a linear mapping T is called bounded if there is a constant c such that \|Tx\|/\|x\|\leq c for all x\neq 0 (equivalently, \|Tx\|\leq c\|x\| for all x). When you hear the word “bounded”, the first thing that strikes you is that the mapping can’t exceed a particular value; that all image points lie inside some bounded region. That, unfortunately, is not what a linear bounded mapping is.

The image points can lie anywhere in infinite space. It is just that the vectors with norm 1 are mapped to vectors whose norms have a finite upper bound (here c is such a bound; the smallest such bound is denoted \|T\|). Say a is a vector which is mapped to Ta. Then sa will be mapped to sTa, where s is a scalar. This is how the mapping is both bounded (for vectors with norm 1) and linear (T(\alpha x+\beta y)=\alpha Tx+\beta Ty).
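Here is a quick numerical illustration using numpy; the particular 2\times 2 matrix standing in for T is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])

op_norm = np.linalg.norm(T, 2)     # the operator norm ||T||, about 3.26 here

# The ratio ||Tx||/||x|| stays below ||T|| for every x, even though the image
# points themselves can be arbitrarily far from the origin.
xs = rng.normal(size=(10000, 2))
ratios = np.linalg.norm(xs @ T.T, axis=1) / np.linalg.norm(xs, axis=1)

print(op_norm)
print(ratios.max())                                # slightly below op_norm
print(np.linalg.norm(T @ np.array([1e6, 1e6])))    # image points can be huge
```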

Could this concept be generalized? Of course! We could have quadratic bounded mappings: vectors with norm 1 are mapped in a similar way as linear bounded mappings, and T(\alpha x)=\alpha^2 Tx. What about T(\alpha x+\beta y)? Let r be a scalar such that \|r(\alpha x+\beta y)\|=1. Then T(\alpha x+\beta y)=\frac{1}{r^2} T(r(\alpha x+\beta y)). Similarly we could have cubic bounded mappings, etc.

Then why are linear bounded mappings so important? Why haven’t we come across quadratic bounded mappings, or the like? This is because linear bounded mappings are definitely continuous, whilst not much can be said about other bounded mappings. Proof: \|T(x-x_0)\|\leq c\|x-x_0\|\implies \|Tx-Tx_0\|\leq c\|x-x_0\|, which implies that linear bounded mappings are continuous. Does this definitely prove that quadratic bounded mappings are not continuous? No. All that is shown here is that we can’t use this method to prove that quadratic or other bounded mappings are continuous.

The “supremum” norm

Today I shall speak on a topic that I feel is important.

All of us have encountered the “\sup” norm in Functional Analysis. In C[a,b], \|f_1-f_2\|=\sup\limits_{x\in[a,b]}|f_1(x)-f_2(x)|. In the space B(X,Y) of bounded linear operators, \|T_1-T_2\|=\sup\limits_{x\neq 0}\frac{\|(T_1-T_2)x\|}{\|x\|}. What is the utility of this “sup” norm? Why can’t our norm be based on “inf”, the infimum? Or even \frac{\sup+\inf}{2}?

First we’ll talk about C[a,b]. Let us take a straight line in the X-Y plane, say y=0, and a sequence of continuous functions that is supposed to converge to it. How do we know they’re converging? Through the norm. Had the “norm” been based on \inf, convergence would effectively only have to happen near one point. For example, according to this “norm”, the sequence f_n=|x|+\frac{1}{n} converges to y=0. This does not appeal to our aesthetic sense, as we’d want the shapes of the graphs to gradually start resembling y=0.
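A quick numerical check of this example on a grid; the interval [-1,1] and the grid size are arbitrary choices.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 2001)

# Under the sup norm the distance from f_n = |x| + 1/n to the zero function
# stays close to 1, while the pointwise infimum of |f_n - 0| tends to 0.
for n in (1, 10, 100, 1000):
    fn = np.abs(x) + 1.0 / n
    print(n, fn.max(), fn.min())
```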

Now let’s talk about B(X,Y). If we had the \inf “norm”, then we might not have been able to take every point x\in X and show that the sequence (T_n x), n\in\Bbb{N}, is a Cauchy sequence. So what if we cannot take every x\in X and say (T_n x) is a Cauchy sequence? This would crush the proof. How? Because then \lim\limits_{n\to\infty} T_n would not resemble the terms of the Cauchy sequence at all points in X, and hence we wouldn’t be able to comment on whether \lim\limits_{n\to\infty} T_n is linear and bounded. Considering that B(X,Y) contains only bounded linear operators, the limit of the Cauchy sequence (T_n) not being a part of B(X,Y) would mean that B(X,Y) is not complete. Hence, in order for us to be able to prove that \lim\limits_{n\to\infty} T_n is a bounded linear operator and that B(X,Y) is complete, we need to use the \sup norm.

On making a choice between hypotheses

At 15:33, Peter Millican says “How can any criterion of reliable knowledge be chosen, unless we already have some reliable criterion for making that choice?”

What does this actually mean? Say I have two hypotheses, A and B. One of them is true, whilst the other is false. But I don’t know which is which. How should I choose? Maybe I should choose the one which corroborates the existing body of knowledge to a greater extent. Or I could choose the one which corroborates my personal experiences to a greater extent (my personal experiences and the existing body of knowledge would certainly overlap, but the former may or may not be a subset of the latter). Clearly, we have a problem here. On what criterion should I base my selection? This is precisely the question being discussed here.

This, I feel, brings us to an even more fundamental question: how do we make choices? Let us suppose I have to choose between two identical bars, based on sight alone. One is a gold bar, and the other an imitation. As they’re exactly identical to look at, I have no reason to choose one over the other. This of course is true only after removing biases like “I should choose the bar on the left as the left is my lucky side”, or “Today is the 13th, and I had an accident on the 13th while turning to the right, hence I should choose left”, etc. Hence, the criterion for selection, without the addition of any further knowledge, would be arbitrary.

However, when making choices between two non-identical hypotheses, we have a definite bias. I might say my textbook supports A. Or that my personal experiences support B. Now based on whether I trust my textbook more or my personal experiences and reasoning, I shall make a choice accordingly. The criterion for making a choice in such situations is hence based on evaluating biases, of one form or another, and deciding which choice I’m naturally more inclined or biased towards. A lot of this evaluation takes place implicitly in the brain in our daily lives.

This of course assumes the choice-making process is completely rational, which it is not in humans. I could, for example, arbitrarily choose the option which the weighted sum of my biases does not support.

The factoring of polynomials

This article is about the factorization of polynomials:

First, I’d like to discuss the most important trick that is used directly or implicitly in most of the theorems given below. Fix a prime p, and consider the polynomial f(x)=(a_mx^m+a_{m-1}x^{m-1}+\dots+a_0)(b_nx^n+b_{n-1}x^{n-1}+\dots+b_0), with integer coefficients. Let a_i be the first coefficient, starting from a_0, not to be a multiple of p in a_mx^m+a_{m-1}x^{m-1}+\dots+a_0, and let b_j be the first such coefficient in b_nx^n+b_{n-1}x^{n-1}+\dots+b_0. Then the coefficient of x^r in f(x), for r\leq i+j-1, is definitely a multiple of p; the coefficient of x^{i+j} is definitely NOT a multiple of p; and about the coefficients of x^t for t>i+j we can say nothing in general.
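Here is a quick numeric check of this trick in Python; the prime p=3 and the two coefficient lists are arbitrary choices.

```python
p = 3
A = [6, 3, 5, 9, 1]     # a_0, a_1, ...: the first coefficient not divisible by 3 is a_2
B = [3, 7, 2]           # b_0, b_1, ...: the first coefficient not divisible by 3 is b_1

i = next(k for k, a in enumerate(A) if a % p != 0)   # i = 2
j = next(k for k, b in enumerate(B) if b % p != 0)   # j = 1

# Coefficients of the product polynomial A*B.
C = [0] * (len(A) + len(B) - 1)
for k, a in enumerate(A):
    for l, b in enumerate(B):
        C[k + l] += a * b

print(all(C[r] % p == 0 for r in range(i + j)))   # True: coefficients of x^r, r < i+j
print(C[i + j] % p != 0)                          # True: coefficient of x^{i+j}
```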

1. Gauss’s lemma (polynomial)- A primitive polynomial is one with integer coefficients such that the gcd of all the coefficients is 1. The lemma states that if a primitive polynomial is irreducible over \Bbb{Z}, then it is irreducible over \Bbb{Q}. In other words, if a polynomial with integer coefficients factors over \Bbb{Q}, then it factors over \Bbb{Z}. The converse is also obviously true, as \Bbb{Z\subset Q}. A very interesting proof can be found here, although Proof 1 is not completely correct. The characterization of c_{i+j} should actually be \sum\limits_{k\leq m,i+j-k\leq n}{a_k b_{i+j-k}}. This is to include the possibility of i+j>m or i+j>n. Also, note that for the other coefficients c_r of f(x)g(x), we are unlikely to find such a glaring contradiction (a coefficient that p provably does not divide). This proof by explicit demonstration is indeed brilliant. But wait. This just proves that the product of primitive polynomials is primitive. It doesn’t prove that a primitive polynomial can only be factored into primitive polynomials. This proves the aforementioned. The most important thing to concentrate on here is how to convert a polynomial with rational coefficients into a constant times a primitive polynomial. Any factorization over \Bbb{Q} can be converted into a constant times a product of two primitive polynomials. It is only when the original polynomial is also primitive that this constant is 1.

2. Rational root theorem- It states that if a polynomial with integer coefficients has a rational root \frac{p}{q} (in lowest terms), then p divides the constant term and q divides the leading coefficient (if the polynomial is a_n x^n +a_{n-1}x^{n-1}+\dots +a_0, then p|a_0 and q|a_n). A proof is given here. I’d like to add certain things to the proof given in the article for a clearer exposition. First of all, by taking out the integer gcd of the coefficients, make the polynomial primitive. Let that gcd be g. Deal with this primitive polynomial. Using Gauss’s lemma and the proof in the article, we can easily deduce that q|\frac{a_n}{g} and p|\frac{a_0}{g}. As both these are true, and as g is an integer, we get q|a_n and p|a_0.

3. Eisenstein criterion- The statement and the proof are given here. I want to discuss the true essence of the proof. The most important condition is that p^2\not| a_0. What this essentially does is split p between b_0 and c_0; i.e., p can’t divide both. How does this splitting change anything? We’ll come to that. Another important condition is that p\not| a_n. This forces us to conclude that not all b_i, and not all c_i, are divisible by p. Hence, there is a first coefficient b_r and a first coefficient c_t which are not divisible by p. If p\not|b_0, then a_t is not divisible by p, which contradicts the third condition that p|a_i for i<m+n. How? Because a_t=b_0c_t+b_1c_{t-1}+\dots+b_tc_0, and all terms except b_0c_t are divisible by p. This is where the splitting helped. If there were no such splitting (if p^2|a_0, so that p could divide both b_0 and c_0), then a_t could have been divisible by p. Similarly, if p\not| c_0, then p\not|a_r. Remember the point that I elaborated at the very beginning of the article? Try to correlate that with this proof. Here a_{i+j} becomes a_t or a_r, as the first coefficient not divisible by p is b_0 or c_0 respectively.
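Here is a small helper in Python that checks the Eisenstein conditions for a polynomial given by its list of integer coefficients [a_0, a_1, \dots, a_n] and a prime p; the example polynomials are arbitrary choices.

```python
def eisenstein(coeffs, p):
    """True if p | a_i for all i < n, p does not divide a_n, and p^2 does not divide a_0."""
    a0, an = coeffs[0], coeffs[-1]
    return (all(a % p == 0 for a in coeffs[:-1])
            and an % p != 0
            and a0 % (p * p) != 0)

print(eisenstein([2, 2, 1], 2))   # True:  x^2 + 2x + 2 is irreducible over Q
print(eisenstein([1, 2, 1], 2))   # False: x^2 + 2x + 1 = (x+1)^2, criterion says nothing
```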

Euclidean rings and prime factorization

Now we will talk about the factorization of elements in Euclidean rings. On pg.146 of “Topics in Algebra” by Herstein, it says:

“Let R be a Euclidean ring. Then every element in R is either a unit in R or can be written as the product of a finite number of prime elements in R.”

This seems elementary. Take any element a\in R. If it is prime, then we’re done. If it is not, then keep on splitting it into factors. For example, let a=bc. If b is prime, then we leave it as it is. If it is not (if b=fg), we split it as fgc. The same with c, and so on.

The theorem says a can be represented as the product of a finite number of primes. But what if this splitting is a never-ending process? We don’t face this problem with \Bbb{N}, as splitting causes positive integers to decrease in magnitude and there’s a lower limit to how much a positive integer can decrease. But we might face this problem with other Euclidean rings.

Circumventing this problem throws light on a fact that is often forgotten: d(a)\leq d(ab). When we take a and start splitting it, we’re decreasing the d-values of the individual factors as we continue to split them. Note that we will not split an element into a unit times one of its associates. For example, if f and g are associates, and a=ft, then we will not split f as f=u_1 g where u_1 is a unit, as we’re only looking to split elements that are not prime. Hence, if only splittings involving units are possible for f, then we know that f is prime, and we leave it as it is. Let us suppose f=pq where neither p nor q is a unit. Then d(p),d(q)<d(f), as neither is an associate of f. This shows that as we keep splitting non-prime elements into factors that are not units, the d-value of each individual factor keeps strictly decreasing. As d-values are non-negative integers, a strictly decreasing sequence of them must terminate, and we’re bound to arrive at a finite factorization in a finite number of steps.
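Here is a sketch of this process in Python for the Euclidean ring \Bbb{Z}, where d(n)=|n|; the splitting routine (trial division) is just an illustrative choice.

```python
def split(n):
    """Return a factorization (a, b) of n with neither a nor b a unit,
    or None if no such splitting exists (i.e. n is prime up to sign).
    Assumes |n| >= 2."""
    m = abs(n)
    for a in range(2, int(m ** 0.5) + 1):
        if m % a == 0:
            return a, n // a
    return None

def factor(n):
    factors, stack = [], [n]
    while stack:
        f = stack.pop()
        s = split(f)
        if s is None:
            factors.append(f)          # f is prime; leave it as it is
        else:
            a, b = s
            # d(a), d(b) < d(f): both |a| and |b| are strictly smaller than |f|
            stack.extend([a, b])
    return factors

print(factor(360))   # e.g. [5, 3, 3, 2, 2, 2], in some order
```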

What’s the deal with units then? We’ll return to this after a short discussion on d(0) and d(1).

If a\neq 0, then d(1)\leq d(1\cdot a)=d(a) for every a\in R. Hence, d(1)=d(u) for all units u\in R (proof: let 1=u_1 b, where u_1 is a unit. Then b has to be u_{1}^{-1}, which is also a unit. The same can be said of all units. Hence, 1 is an associate of exactly the units), and d(1)<d(a) for all non-zero non-units in R.

Now what about d(0)? If the axiom of a Euclidean ring were a=qb+r\implies d(r)<d(b), rather than d(r)<d(b) provided r\neq 0, then we could conduct some investigation into this. Let us imagine d(0) exists. Then a=1\cdot a+0, hence d(0)<d(a) for all a\in R. But 0=0\cdot 0+0, hence d(0)<d(0), which is impossible. Hence, in order to keep d well-defined and also keep a=qb+r\implies d(r)<d(b) as an axiom used to define a Euclidean ring, we forego defining d(0).

Now returning to our discussion: we’ve already noted that d(0) is not defined, and that 1 has the lowest d-value in R. Moreover, as all associates of 1 are units, d(u)=d(1), where u is a unit. If it were possible to split u into factors ab such that neither a nor b is a unit, then d(a),d(b)<d(u), which is not possible. Hence, every unit u is prime in a Euclidean ring. Note that this is not a natural property of such structures, but a result of the arbitrary axiom that d(ab)\geq d(a),d(b).

Summarising the above arguments:

1. A unit is prime in a Euclidean ring R.

2. Every element in R can be split into a finite number of prime factors.

3. In order to avoid contradictions, d(0) is not defined. Also, 1 has the lowest d-value.

4. Removing the axiom d(ab)\geq d(a),d(b) would nullify a lot of these properties.