Euclidean rings and generators of ideals

This is to address a point that has just been glossed over in “Topics in Algebra” by Herstein.

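A Euclidean ring R is an integral domain with a function d, defined on the non-zero elements of R and taking non-negative integer values, such that (i) d(a)\leq d(ab) for all non-zero a,b\in R, and (ii) for any a,b\in R with b\neq 0, there exist q,r\in R such that a=bq+r, where either r=0 or d(r)<d(b).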

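We also know that, in any ideal A of R, a non-zero element with the lowest d-value in A generates A. In particular, an element with the lowest d-value in R generates the whole ring R. The proof of this is elementary.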

But what if there is more than one element with the same lowest d-value? Do both of these elements generate R?

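Yes. Let x and y be elements of R with the lowest d-value, so that d(x)=d(y). The usual proof uses only the minimality of the d-value, so it applies equally to x and to y, and both generate R. Moreover, x and y must be associates: every c\in R can be written as c=u_1x=u_2y for some u_1,u_2\in R, so both x and y divide c. In particular (taking c=y and then c=x), they divide each other, and in an integral domain this means x and y are associates. In other words, x=uy, where u is a unit in R.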

Let us approach this from the opposite direction now. Suppose x=uy, where u is a unit. By the axiom d(a)\leq d(ab), d(y)\leq d(uy)=d(x). Similarly, since y=u^{-1}x, d(x)\leq d(u^{-1}x)=d(y). This shows that d(x)=d(y). Therefore, whenever two non-zero elements a and b are associates, their d-values are the same.
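A quick way to see this in action is to take a concrete Euclidean ring. The sketch below uses the Gaussian integers \Bbb{Z}[i] with d(a+bi)=a^2+b^2, an example of my choosing rather than one discussed above. It multiplies an element by each of the four units 1,-1,i,-i and compares d-values, and it spot-checks the axiom d(a)\leq d(ab) on a small grid (a check, not a proof).

```python
from itertools import product

# Gaussian integers a + bi are represented as pairs (a, b) of Python ints.
def mul(x, y):
    # (a + bi)(c + ei) = (ac - be) + (ae + bc)i
    a, b = x
    c, e = y
    return (a * c - b * e, a * e + b * c)

def d(x):
    # the usual Euclidean function on Z[i]: the norm a^2 + b^2
    a, b = x
    return a * a + b * b

UNITS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 1, -1, i, -i

def associates(x):
    # all products u * x with u a unit
    return [mul(u, x) for u in UNITS]

x = (3, 2)  # 3 + 2i
print([(y, d(y)) for y in associates(x)])  # every d-value is 13

# spot-check the axiom d(a) <= d(ab) on small non-zero Gaussian integers
small = [z for z in product(range(-3, 4), repeat=2) if z != (0, 0)]
assert all(d(p) <= d(mul(p, q)) for p in small for q in small)
```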

Note that if we did not have the axiom d(a)\leq d(ab), then there would be no reason to believe that associates have the same d-value. Ideals could then potentially be generated by elements whose d-values are not the lowest in the ideal, with the restriction that all such generators would be associates of the lowest d-value element.

A summary of the important points is:

1. Associates have the same d-value.

2. A non-zero element a generates an ideal iff it has the lowest d-value among the non-zero elements of the ideal.

3. All associates of the lowest d-value element in an ideal generate the same ideal.

4. If we did not have the axiom d(a)\leq d(ab), then point 1 need not be true, point 2 need not be true (a generator of an ideal wouldn’t have to have the lowest d-value), but point 3 would still be true.
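For a concrete illustration of points 2 and 3, here is a brute-force sketch in the Euclidean ring \Bbb{Z} with d(n)=|n|, looking at the ideal generated by 12 and 18. It only searches a finite window of combinations 12s+18t (the window size is an arbitrary assumption), finds the non-zero elements of lowest d-value, and checks that each of them divides both generators, so that each one generates the ideal.

```python
from itertools import product

def d(n):
    # Euclidean function on Z: the absolute value
    return abs(n)

a, b = 12, 18
# a finite window onto the ideal (12, 18): all s*a + t*b with -10 <= s, t <= 10
window = {s * a + t * b for s, t in product(range(-10, 11), repeat=2)}
nonzero = [x for x in window if x != 0]
lowest = min(d(x) for x in nonzero)
minimal = sorted(x for x in nonzero if d(x) == lowest)
print(minimal)  # [-6, 6]

# each minimal element lies in the ideal and divides both generators,
# so it generates the whole ideal; the two are associates (6 = -1 * -6)
assert all(a % m == 0 and b % m == 0 for m in minimal)
```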


There are a couple of things I’d like to add here.

Why is it that, in the definition of a prime element, factorizations involving a unit are not counted? Generally, when we think about prime numbers among the positive integers, we imagine a number that cannot be factorized except in the form 1\cdot p (p being the prime number). A sense of unbreakability is felt. Here, the same prime element p can be broken up in as many ways as R has units, since p=u\cdot(u^{-1}p) for every unit u. The sense of absolute unbreakability is lost. I suppose the reason for this is that the concept of ‘unit’ is just an extension of 1 in the natural numbers. Just as factorizations of the form 1\cdot p are not counted when dealing with natural numbers, factorizations of the form u\cdot p_1 shouldn’t count in R, where p_1 is an associate of p.

Also note that the addition or deletion of axioms could greatly change the structure of Euclidean rings. For example, deleting the axiom d(ab)\geq d(a),d(b) would remove the tool used to show that factorization into primes terminates, and adding the axiom d(a+b)<d(ab) would further alter the structure of R. (Curiously, deleting that particular axiom does not change which rings qualify: if d satisfies only the division property, then a\mapsto\min_{b\neq 0}d(ab) satisfies both axioms. But every proof that relied on the axiom would have to be redone.) One should not forget that the properties of the elements of R are a result of these defining axioms, and the addition or deletion of such axioms would cause substantial alterations. It is precisely because Euclidean rings mimic many properties of the integers that we find them worth studying.

Integral domains and characteristics

Today we shall talk about the characteristic of an integral domain, concentrating mainly on misconceptions and important points.

An integral domain is a commutative ring with unit element 1\neq 0 and with the property that if a\neq 0 and b\neq 0, then ab\neq 0. Hence, if ab=0, then a=0 or b=0 (or both).

The characteristic of an integral domain is the lowest positive integer c such that \underbrace{1+1+\dots +1}_{\text{ c times}}=0. If no such positive integer exists, the characteristic is defined to be 0.

Let a\in D. Then \underbrace{a+a+\dots +a}_{\text{ c times}}=a\underbrace{(1+1+\dots +1)}_{\text{ c times}}=a\cdot 0=0, using the distributive law and the fact that a\cdot 0=0.

Suppose that, for some positive integer d<c, \underbrace{a+a+\dots +a}_{\text{ d times}}=0. Then a\underbrace{(1+1+\dots +1)}_{\text{ d times}}=0. This is trivially possible for a=0, so suppose a\neq 0. Since D has no zero divisors, this implies \underbrace{1+1+\dots +1}_{\text{ d times}}=0, which contradicts the fact that c is the lowest positive integer such that 1 added to itself c times is equal to 0. Hence, if c is the characteristic of the integral domain D, then it is the lowest positive integer such that any non-zero member of D, added to itself c times, gives 0. No non-zero member of D can be added to itself a lower number of times to give 0.
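To see this on a small example, the sketch below works in \Bbb{Z}/7\Bbb{Z} (my choice of integral domain; it is in fact a field), computing its characteristic and the number of times each non-zero element must be added to itself to reach 0. For contrast it also looks at \Bbb{Z}/6\Bbb{Z}, which has zero divisors (2\cdot 3=0), so the argument above does not apply there.

```python
def characteristic(n):
    # smallest c > 0 with 1 + 1 + ... + 1 (c times) == 0 in Z/nZ, for n >= 1
    total, c = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        c += 1
    return c

def additive_order(a, n):
    # smallest k > 0 with a + a + ... + a (k times) == 0 in Z/nZ
    total, k = a % n, 1
    while total != 0:
        total = (total + a) % n
        k += 1
    return k

p = 7  # Z/7Z is an integral domain
print(characteristic(p), [additive_order(a, p) for a in range(1, p)])

# Z/6Z is not an integral domain, and there a non-zero element can reach 0 sooner
print(characteristic(6), additive_order(2, 6))
```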

Sometimes \underbrace{a+a+\dots +a}_{\text{ c times}} is written as ca. One should remember that this has nothing to do with the multiplication operation of the ring. In other words, it does not mean that \underbrace{a+a+\dots +a}_{\text{ c times}}=c\cdot a for some member c of the domain. In fact, c does NOT have to be a member of the domain. It is just an arbitrary positive integer.

Now on to an important point: something that is not emphasized, but should be. Any expression of the form

\underbrace{\underbrace{a+a+\dots +a}_{\text{m times}}+\underbrace{a+a+\dots +a}_{\text{m times}}+\dots +\underbrace{a+a+\dots +a}_{\text{m times}}}_{\text{n times}}=\underbrace{(a+a+\dots +a)}_{\text{m times}}(\underbrace{1+1+\dots +1}_{\text{n times}}).

Now use this knowledge to prove that the characteristic of an integral domain is either 0 or a prime.
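Without giving the whole proof away, here is a sketch of the key step for the rings \Bbb{Z}/n\Bbb{Z} (an illustrative family, not the general case): if n=mk with 1<m,k<n, the identity above with a=1 gives (m\cdot 1)(k\cdot 1)=(mk)\cdot 1=0, with both factors non-zero. The helper names `times` and `zero_divisor_witness` are hypothetical.

```python
def times(k, x, n):
    # k * x in the sense used above: x added to itself k times, reduced mod n
    total = 0
    for _ in range(k):
        total = (total + x) % n
    return total

def zero_divisor_witness(n):
    # Look for n = m * k with 1 < m, k < n. Then (m*1)(k*1) = (mk)*1 = n*1 = 0
    # in Z/nZ, although m*1 and k*1 are both non-zero: Z/nZ has zero divisors.
    for m in range(2, n):
        if n % m == 0:
            k = n // m
            left, right = times(m, 1, n), times(k, 1, n)
            assert left != 0 and right != 0 and (left * right) % n == 0
            return left, right
    return None  # no such factorization: n is prime (or n == 1)

print(zero_divisor_witness(6), zero_divisor_witness(7))
```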

Ordinals- just what exactly are they?!

If ordinals have not confused you, you haven’t really made a serious attempt to understand them.

Let me illustrate this. If I have 5 fruits (all different) and 5 plates (all different), then I can bijectively map the fruits to the plates. However I arrange the fruits or plates, matching them in order (first fruit to first plate, and so on) still gives a bijection.

Let’s suppose I have a finite set A, and don’t know its cardinality. But I know that it maps bijectively to B. This directly implies that however I arrange A or B, matching them in order will still give a bijection.

This intuition fails for infinite sets. In fact, weird things start happening for infinite sets. Natural numbers can be bijectively mapped to rational numbers. What?!! Isn’t the set of rational numbers a superset of the natural numbers?! Yes. Then how can there be a bijection between them? Bijection implies both sets contain the same number of elements.

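No, not in the naive sense. That a superset must have more elements is not part of the definition of bijection: a bijection is simply a mapping between two sets that is both injective and surjective. For finite sets, a bijection forces the two sets to have the same number of elements in the everyday sense. For infinite sets, “same cardinality” is *defined* to mean that a bijection exists, and the finite intuition that a proper superset is bigger no longer applies. For the rational numbers, Cantor’s diagonal enumeration (walking through the grid of fractions p/q one diagonal at a time, skipping repeats) gives every rational number a unique pre-image amongst the natural numbers. Hence, we have a bijection.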
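The enumeration of the rationals can be sketched as a generator. The details here are my own presentation choices: it walks each diagonal p+q=s of the grid of fractions p/q in the same direction (rather than zigzagging, which does not matter for the argument), skips fractions not in lowest terms so that nothing is listed twice, and interleaves 0 and the negatives so that all of \Bbb{Q} is covered.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    # walk the grid of fractions p/q diagonal by diagonal (p + q = s),
    # skipping non-reduced fractions so each positive rational appears exactly once
    s = 2
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        s += 1

def rationals():
    # 0 first, then each positive rational followed by its negative
    yield Fraction(0)
    for r in positive_rationals():
        yield r
        yield -r

# the k-th item listed is the image of the natural number k
print([str(r) for r in islice(rationals(), 9)])
```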

Coming back to ordinals, let \omega denote the ordered set of natural numbers. Does \Bbb{N}=\{1,2,3,\dots\} bijectively map to \Bbb{N}\cup \{\pi\}? Not if you use the mapping f(n)=n. However, if you map f(1)=\pi and f(n)=n-1 for n\geq 2, then you’re done. Note that if two infinite sets are in bijection, that does not imply that every one-to-one mapping between them is surjective. It just means that there exists *one* such mapping. This is in direct contrast with the case of finite sets, in which every injective mapping between two sets of the same size is surjective.
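The shifting map can be checked mechanically on a finite prefix, which is of course evidence rather than proof for an infinite set. In this sketch the extra element \pi is represented by a placeholder string, a hypothetical representation chosen purely for illustration.

```python
PI = "pi"  # hypothetical placeholder standing for the extra element π

def f(n):
    # the map N -> N ∪ {π} from the text: 1 -> π and n -> n - 1 for n >= 2
    return PI if n == 1 else n - 1

def f_inverse(m):
    # its inverse: π -> 1 and m -> m + 1
    return 1 if m == PI else m + 1

N = 1000  # size of the finite prefix we check
assert all(f_inverse(f(n)) == n for n in range(1, N + 1))
assert all(f(f_inverse(m)) == m for m in [PI] + list(range(1, N)))
print([f(n) for n in range(1, 6)])
```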

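What if you map \omega to \omega+1 (the natural numbers followed by one extra element placed after all of them)? Since \omega is ordered, we now want the mapping to respect the order (m<n implies f(m)<f(n)). The obvious candidate, f(n)=n, misses the top element, and in fact no order-preserving bijection from \omega onto \omega+1 exists: the top element of \omega+1 has infinitely many predecessors, while every natural number has only finitely many. As plain sets, however, \Bbb{N} and \omega+1 can be bijectively mapped by the same shifting trick as above, sending 1 to the top element.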

Where most mathematical texts fail is in actually explaining these finer points to students. Most of them just regurgitate the material present in “classics” of the subject. The most important thing to note here is that if we have two infinite sets with no order to respect, then there *might* be some bijective mapping between them, and finding one can be tricky. For example, Cantor’s diagonal enumeration is brilliant, and non-trivial. Hence, for arbitrary infinite sets, we can’t always tell at a glance whether they’re in bijection with \Bbb{N}. For ordinals, however, the relevant question is whether there is an order-preserving bijection, and between two well-ordered sets there is at most one such mapping, so deciding whether an ordinal is order-isomorphic to \omega is far more straightforward.


Why substitution works in indefinite integration

Let’s integrate \int{\frac{dx}{\sqrt{1-x^2}}}. We know the trick: substitute x=\sin\theta. We get dx=\cos\theta\, d\theta. Substituting into the original integral, we get \int{\frac{\cos\theta\, d\theta}{\sqrt{1-\sin^2\theta}}}=\int{\frac{\cos\theta\, d\theta}{|\cos\theta|}}. Let us assume \cos\theta remains positive throughout the interval under consideration (for example, \theta\in(-\frac{\pi}{2},\frac{\pi}{2})). Then the integral is \theta+C, or \arcsin x+C.

I have performed similar operations for close to five years of my life now. But I was never, ever, quite convinced by it. How can you, just like that, replace dx by \cos\theta\, d\theta? My teacher once told me this: \frac{dx}{d\theta}=\cos\theta. Multiplying by d\theta on both sides, we get dx=\cos\theta\, d\theta. What?!! It doesn’t work like that!!

It was a year back that I finally derived why this ‘ruse’ works.

Take the function x^2. If you differentiate this with respect to x, you get 2x. If you integrate 2x, you get x^2+c. Simple.

Now take the function \sin^2\theta. Differentiate it with respect to \theta. You get 2\sin\theta.\cos\theta. If you integrate 2\sin\theta.\cos\theta, you get \sin^2\theta+c.

The thing to notice is that when you integrate the two functions, 2x and 2\sin\theta\cos\theta, you get back a function of the same form, y^2 (with y=x in the first case and y=\sin\theta in the second). However and whatever I integrate, I ultimately want a function of the form y^2, so that I can put y=x to get x^2.

In the original situation, let us imagine there’s a function f(x)=\int{\frac{dx}{\sqrt{1-x^2}}}. We’ll discuss the properties of f(x). If we were to make the substitution x=\sin\theta in f(x) and differentiate it with respect to \theta, we’d get a function of the form \frac{1}{\sqrt{1-y^2}}\cos\theta, where y is \sin\theta. There are two things to note here:

1. The derivative of f(x) with respect to \theta has the same form as f'(x), which is \frac{1}{\sqrt{1-y^2}}, multiplied by \cos\theta, the derivative of \sin\theta with respect to \theta.

2. When a function is differentiated with respect to any variable, integrating with respect to the same variable gives us back the same function (up to a constant). Hence, \int{\frac{df}{dx}dx}=f+C=\int{\frac{df}{d\theta}d\theta}.

Coming back to \int{\frac{dx}{\sqrt{1-x^2}}}, let us assume its integral is f(x). Its derivative, on substituting x=\sin\theta and differentiating with respect to \theta, is of the same form as \frac{df}{dx} multiplied by \cos\theta. This is a result of the chain rule of differentiation. Now, following point 2, we know \int{\frac{dx}{\sqrt{1-x^2}}}=\int{\frac{\cos\theta\, d\theta}{\sqrt{1-\sin^2\theta}}}.
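The chain-rule step can be checked numerically. This sketch takes f=\arcsin (an antiderivative of \frac{1}{\sqrt{1-x^2}}), differentiates f(\sin\theta) with a symmetric difference quotient, and compares the result with f'(\sin\theta)\cos\theta at a few values of \theta inside (-\frac{\pi}{2},\frac{\pi}{2}). The step size h is an arbitrary choice.

```python
import math

def f(x):
    # an antiderivative of 1/sqrt(1 - x^2): arcsin
    return math.asin(x)

def f_prime(x):
    return 1.0 / math.sqrt(1.0 - x * x)

def derivative(g, t, h=1e-6):
    # symmetric difference quotient approximating g'(t)
    return (g(t + h) - g(t - h)) / (2 * h)

def compare(theta):
    # numeric: d/dtheta of f(sin(theta)); predicted: f'(sin(theta)) * cos(theta)
    numeric = derivative(lambda t: f(math.sin(t)), theta)
    predicted = f_prime(math.sin(theta)) * math.cos(theta)
    return numeric, predicted

for theta in (-1.2, -0.3, 0.4, 1.1):
    print(theta, compare(theta))
```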

How is making the substitution x=\sin\theta justified? Could we have made any other continuous substitution, like x=\theta^2 +\tan\theta^3? Let us assume we set x=g(\theta). We want g(\theta) to take all the values x can take; this is the condition that must be satisfied by any substitution (and g must be differentiable, so that the chain rule applies). For values that g(\theta) takes but x doesn’t, we restrict the domain of \theta so that g(\theta) stays within the range of x. Note that the shapes of f(x) plotted against x and f(\sin\theta) plotted against \theta will be different. But that is irrelevant, as long as every point (m,n) on the graph of f against x, where m is the x-coordinate and n is the y-coordinate, is reached by some \theta with g(\theta)=m.

Summing up the argument: we predict the form the derivative of f(x) will take when the substitution x=\sin\theta is made, and then integrate this new form with respect to \theta to get the original function back. This is why the ‘trick’ works.
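As a numerical sanity check of the whole argument, the sketch below evaluates \int_0^{1/2}\frac{dx}{\sqrt{1-x^2}} three ways with the midpoint rule: directly in x, after x=\sin\theta, and after x=\theta^2 (a simpler stand-in, chosen by me, for the “any other substitution” question; it is increasing and differentiable on [0,\sqrt{1/2}]). All three should agree with \arcsin\frac{1}{2}=\frac{\pi}{6} up to discretisation error.

```python
import math

def midpoint(func, lo, hi, steps=100_000):
    # composite midpoint rule approximating the integral of func over [lo, hi]
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

b = 0.5  # upper limit in x (an arbitrary choice inside (-1, 1))

# the original integral, in x
in_x = midpoint(lambda x: 1 / math.sqrt(1 - x * x), 0.0, b)
# x = sin(theta): theta runs from 0 to arcsin(b), and cos(theta) > 0 there
in_sin = midpoint(lambda t: math.cos(t) / math.sqrt(1 - math.sin(t) ** 2), 0.0, math.asin(b))
# x = theta**2 (increasing for theta >= 0): theta runs from 0 to sqrt(b), dx = 2 theta dtheta
in_square = midpoint(lambda t: 2 * t / math.sqrt(1 - t ** 4), 0.0, math.sqrt(b))

print(in_x, in_sin, in_square, math.asin(b))
```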

Fermat’s Last Theorem

When in high school, spurred by Mr. Scheelbeek’s end-of-term inspirational lecture on Fermat’s Last Theorem, I tried proving the same for…about one and a half long years!
For documentation purposes, I’m attaching my proof. Feel free to outline the flaws in the comments section.

Let us assume FLT is false, i.e. x^n + y^n =z^n for some positive integers x,y,z and some n\geq 3. We know x^n + y^n<(x+y)^n (n is assumed to be greater than one here). Hence, z<x+y. Moreover, we know y^n=z^n-x^n<(z+x)^n. Hence, y<z+x. Similarly, x<y+z.

So we have the three inequalities: z<x+y, y<x+z, and x<y+z.

x,y,z satisfy the triangle inequalities! Hence, x,y,z form a triangle.

Using the cosine rule, we get z^2=x^2 +y^2 -2xy\cos C, where C is the angle opposite side z.

Raising both sides to the power \frac{n}{2}, we get z^n=(x^2 +y^2 -2xy\cos C)^{\frac{n}{2}}. Now if n=2 and C=\frac{\pi}{2}, we get z^2=x^2+y^2. This is the case of the right-angled triangle.

However, if n\geq 3, then the right hand side, which is (x^2 +y^2 -2xy\cos C)^{\frac{n}{2}}, is unlikely to simplify to x^n + y^n.

There are multiple flaws in this argument. Coming to terms with them was a huge learning experience.

Binomial probability distribution

What exactly is binomial distribution?

Q. A manufacturing process is estimated to produce 5\% nonconforming items. If a random sample of five items is chosen, find the probability of getting two nonconforming items.

Now one could say: let there be 100 items. Then the required probability would be \frac{{5\choose 2}{95\choose 3}}{{100\choose 5}}. In what order the items are chosen is irrelevant. This comes out to roughly 0.018, while the answer is about 0.021. Where did we go wrong?

Why should we assume there are 100 items in total? Let us assume n\to\infty, as we determine \frac{{.05n\choose 2}{.95n\choose 3}}{{n\choose 5}} . What if 0.95 n and 0.05n are not integers? We use the gamma function.

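We get \frac{{.05n\choose 2}{.95n\choose 3}}{{n\choose 5}}=\frac{1}{{n\choose 5}}\cdot\frac{\Gamma(0.05n+1)}{2!\,\Gamma(0.05n-1)}\cdot\frac{\Gamma(0.95n+1)}{3!\,\Gamma(0.95n-2)}, where \Gamma(s)=\int_{0}^{\infty}{t^{s-1}e^{-t}\, dt}, so that \Gamma(m+1)=m! for non-negative integers m.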

My textbook says this tends to {5\choose 2}(0.05)^2 (0.95)^3 as n\to\infty. This is something you could verify for yourself.
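Here is a sketch that does that verification with exact rational arithmetic. It assumes n is a multiple of 20, so that 5\% of n is a whole number and ordinary binomial coefficients suffice (no gamma function needed), and prints the exact without-replacement probability for growing n next to the binomial value.

```python
from fractions import Fraction
from math import comb

def hypergeometric(n):
    # exact probability of exactly 2 nonconforming items in a sample of 5 drawn
    # without replacement from n items, 5% of which are nonconforming
    # (n is assumed to be a multiple of 20 so that 5% of n is a whole number)
    bad = n // 20
    good = n - bad
    return Fraction(comb(bad, 2) * comb(good, 3), comb(n, 5))

# the binomial value C(5,2) (0.05)^2 (0.95)^3, computed exactly
binomial = comb(5, 2) * Fraction(5, 100) ** 2 * Fraction(95, 100) ** 3

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(n, float(hypergeometric(n)))
print("binomial", float(binomial))  # 0.021434375
```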

Another question. Say you roll a die 5 times. Find the probability of getting exactly two 6s. The probability as determined by combinatorics is \frac{{5\choose 2}5^3}{6^5}. You must have applied the binomial formula before in such problems. You know the answer to be {5\choose 2}(\frac{1}{6})^2 (\frac{5}{6})^3. This matches the answer determined before. So why is it that we’re right here in determining the probability exactly, while we were not before?
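The dice computation can be confirmed by brute force: enumerate all 6^5 equally likely sequences of rolls, count those with exactly two 6s, and compare with the binomial formula, all in exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb

# every sequence of 5 rolls is equally likely; there are 6**5 of them
outcomes = list(product(range(1, 7), repeat=5))
favourable = sum(1 for roll in outcomes if roll.count(6) == 2)

by_counting = Fraction(favourable, len(outcomes))
by_binomial = comb(5, 2) * Fraction(1, 6) ** 2 * Fraction(5, 6) ** 3
print(favourable, by_counting, by_counting == by_binomial)
```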

Binomial probability agrees with elementary probability when the trials are independent, each with the same probability of success, as with repeated rolls of a die: separate arrangements of the outcomes are counted as distinct, and the total number of possibilities is known exactly rather than guessed at. When items are drawn without replacement from a batch and only the percentage of nonconforming items (the percentage of success) is known, the binomial probability is an approximation, arrived at by letting the batch size n approach infinity.

Continuous linear operators are bounded: decoding the proof, and how the mathematician chances upon it

Here we try to prove that a linear operator, if continuous, is bounded.

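Continuity at x_0 implies: for any \epsilon>0, there exists \delta>0 such that \|Tx-Tx_0\|\leq\epsilon whenever \|x-x_0\|\leq\delta (we may use non-strict inequalities here, by shrinking \delta if necessary).

We want the following result: \frac{\|Ty\|}{\|y\|}\leq c, where c is a constant, and y is any non-zero vector in X.

What constants can be construed from \epsilon and \delta, knowing that they are prone to change? As T is a linear operator, T(th)=tT(h), so if \delta works for \epsilon, then t\delta works for t\epsilon for any t>0. So \delta can be chosen proportional to \epsilon, making \frac{\epsilon}{\delta} constant. We need to use this knowledge.

We want \frac{\|Ty\|}{\|y\|}\leq \frac{\epsilon}{\delta}, or \delta\frac{\|Ty\|}{\|y\|}\leq {\epsilon}.

We have \|Tx-Tx_0\|=\|T(x-x_0)\|\leq\epsilon.

Hence, we choose x-x_0=\delta\frac{y}{\|y\|}, which has norm exactly \delta.

Then \epsilon\geq\left\|T\left(\delta\frac{y}{\|y\|}\right)\right\|=\frac{\delta}{\|y\|}\|Ty\|, that is, \frac{\|Ty\|}{\|y\|}\leq\frac{\epsilon}{\delta}.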
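The substitution can be watched at work in a finite-dimensional sketch, where boundedness is already known, so this only illustrates the algebra. The matrix T is an arbitrary example, and \delta is chosen as \frac{\epsilon}{\|T\|_F} using the Frobenius norm \|T\|_F, which bounds \|T\|. For random y, the code checks that x-x_0=\delta\frac{y}{\|y\|} respects the continuity bound and that \frac{\|Ty\|}{\|y\|} never exceeds \frac{\epsilon}{\delta}.

```python
import math
import random

# an example operator on R^2 with the Euclidean norm (an arbitrary illustrative choice)
T = [[2.0, -1.0], [0.5, 3.0]]

def apply(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def norm(v):
    return math.hypot(v[0], v[1])

# The Frobenius norm bounds ||T||, so with delta = eps / frob,
# ||x - x0|| <= delta implies ||T(x - x0)|| <= eps: continuity with delta proportional to eps.
frob = math.sqrt(sum(entry * entry for row in T for entry in row))
eps = 0.3
delta = eps / frob

random.seed(0)
worst = 0.0
for _ in range(10_000):
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    ny = norm(y)
    if ny == 0:
        continue
    h = [delta * y[0] / ny, delta * y[1] / ny]             # x - x0 = delta * y / ||y||
    assert norm(apply(T, h)) <= eps * (1 + 1e-12)          # the continuity bound
    ratio = norm(apply(T, y)) / ny
    assert math.isclose(ratio, norm(apply(T, h)) / delta)  # ||Ty||/||y|| = ||T(x - x0)|| / delta
    worst = max(worst, ratio)

print(worst, eps / delta)  # the sampled ratios never exceed eps / delta
```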

We have just deconstructed the proof given on pg. 97 of Kreyszig’s book on Functional Analysis. The substitution x-x_0=\delta\frac{y}{\|y\|} did not just occur to him by magic. It was the result of thorough analysis, and probably of investigation of this kind.

But hey! Let’s investigate this. \frac{\delta}{\epsilon} is also constant! Let us assume \epsilon\frac{\|Ty\|}{\|y\|}\leq \delta. Multiplying both sides by \frac{\epsilon}{\delta}, we get \frac{\epsilon^2}{\delta}\frac{\|Ty\|}{\|y\|}\leq \epsilon. This suggests the substitution x-x_0=\frac{\epsilon^2}{\delta}\frac{y}{\|y\|}. Does this substitution also prove boundedness?

We have to show \|x-x_0\|<\delta. Since \|x-x_0\|=\frac{\epsilon^2}{\delta}, and \frac{\epsilon^2}{\delta}<\delta only if \epsilon<\delta, this substitution works only conditionally.

Similar investigations taking (\frac{\epsilon}{\delta})^n to be constant can also be conducted.

Linear operators mapping finite dimensional vector spaces are bounded

Theorem: Every linear operator T:V\to W between normed spaces, where V is finite dimensional, is bounded.

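Proof: Let \{e_1,e_2,\dots,e_n\} be a basis of V and write a non-zero x\in V as x=a_1e_1+a_2e_2+\dots+a_ne_n. By a standard lemma, there is a constant c>0, depending only on the basis, such that \|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq c(|a_1|+|a_2|+\dots+|a_n|). Then

\frac{\|Tx\|}{\|x\|}=\frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|a_1e_1+a_2e_2+\dots+a_ne_n\|}\leq \frac{|a_1|\|T(e_1)\|+|a_2|\|T(e_2)\|+\dots+|a_n|\|T(e_n)\|}{c(|a_1|+|a_2|+\dots+|a_n|)}\leq \frac{\|T(e_i)\|}{c}

where \|T(e_i)\|=\max\{\|T(e_1)\|,\|T(e_2)\|,\dots,\|T(e_n)\|\}.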

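What we learn from here is the following pair of bounds, where now \|e_i\| denotes the largest of \|e_1\|,\dots,\|e_n\| (the upper bound is just the triangle inequality):

\|e_i\|(|a_1|+|a_2|+\dots+|a_n|)\geq\|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq c(|a_1|+|a_2|+\dots+|a_n|)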
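These bounds can be explored numerically. This sketch takes \Bbb{R}^2 with the Euclidean norm and the standard basis (an illustrative choice), samples coefficient vectors, and estimates the best constant c as the smallest value of \frac{\|a_1e_1+a_2e_2\|}{|a_1|+|a_2|}. For this basis it comes out at about \frac{1}{\sqrt{2}}, which is smaller than \min\|e_r\|=1, while the largest ratio stays at or below \max\|e_r\|=1.

```python
import math

# standard basis of R^2 with the Euclidean norm (an illustrative choice)
e = [(1.0, 0.0), (0.0, 1.0)]

def norm(v):
    return math.hypot(v[0], v[1])

def combo(a):
    # the vector a_1 e_1 + a_2 e_2
    return (a[0] * e[0][0] + a[1] * e[1][0], a[0] * e[0][1] + a[1] * e[1][1])

# ratio ||a_1 e_1 + a_2 e_2|| / (|a_1| + |a_2|) for coefficient vectors
# spread around the unit circle (the ratio is unchanged by scaling a)
samples = []
for k in range(4000):
    angle = 2 * math.pi * k / 4000
    a = (math.cos(angle), math.sin(angle))
    samples.append(norm(combo(a)) / (abs(a[0]) + abs(a[1])))

c_estimate = min(samples)          # estimate of the best constant c in the lemma
upper = max(norm(v) for v in e)    # the largest basis-vector norm
print(c_estimate, max(samples), upper)
```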




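One might be tempted to go further and replace c by \|e_k\|, where \|e_i\| and \|e_k\| are the largest and smallest of \|e_1\|,\dots,\|e_n\|, i.e. to claim

\|e_i\|(|a_1|+|a_2|+\dots+|a_n|)\geq\|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq \|e_k\|(|a_1|+|a_2|+\dots+|a_n|)

The upper bound holds by the triangle inequality, but the lower bound is false in general: in \Bbb{R}^2 with the Euclidean norm, take e_1=(1,0), e_2=(0,1) and a_1=a_2=1. Then \|e_1+e_2\|=\sqrt{2}, while \|e_k\|(|a_1|+|a_2|)=2.

Consequently, the tempting alternative proof

\frac{\|Tx\|}{\|x\|}=\frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|a_1e_1+a_2e_2+\dots+a_ne_n\|}\leq \frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|e_k\|(|a_1|+|a_2|+\dots+|a_n|)}\leq \frac{\max_r\|T(e_r)\|}{\|e_k\|}

is not valid as it stands: its first inequality relies on the false lower bound. The constant c from the lemma cannot simply be replaced by \|e_k\|.

Note: why does this not work in infinite dimensional spaces? Because with infinitely many basis vectors, the maximum of \|Te_r\| becomes a supremum that may be infinite, and a constant c>0 as in the lemma need not exist. For example, on the space of finitely supported sequences with basis vectors e_r of norm 1, the linear map defined by Te_r=re_r has \|Te_r\|=r.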

Riesz’s lemma decoded

This is a rant on Riesz’s lemma.

Riesz’s lemma: Let Z be a normed space and Y a closed proper subspace of Z. Then for every \theta\in (0,1) there exists a z\in Z with \|z\|=1 such that \|z-y\|\geq\theta for all y\in Y.

A proof is commonly available. What we will discuss here is the thought behind the proof.

For any z\in Z\setminus Y, let a=\inf_{y\in Y}\|z-y\|. Since Y is closed and z\notin Y, a>0. Then \|z-y\|\geq a for every y\in Y. Also, since \frac{a}{\theta}>a, there exists a y_0\in Y such that \|z-y_0\|\leq\frac{a}{\theta}. Then, for every y\in Y, \left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\geq\frac{a}{a/\theta}=\theta. Because Y is closed under scalar multiplication, \frac{y}{\|z-y_0\|} runs over all of Y as y does, so we have effectively proved that the vector \frac{z}{\|z-y_0\|} is at distance at least \theta from every y\in Y, for any \theta\in (0,1).

More generally, if v and v_0 are any vectors with 0<\|v-v_0\|\leq\frac{a}{\theta}, then \left\|\frac{z}{\|v-v_0\|}-\frac{y}{\|v-v_0\|}\right\|\geq\theta for every y\in Y.

Hence, one part of Riesz’s lemma, that of staying at distance at least \theta from Y, is satisfied (after suitable scaling) by every vector z\in Z\setminus Y. The thoughts to take away from this are: dividing by \theta, or by any other number less than 1, increases everything; even a small increase over the infimum exceeds some terms of a sequence converging to the infimum; and every term of that sequence is at least the infimum. When we say \theta can be any number in the interval (0,1), we know we’re skirting the boundaries. We could also have thought of a proof in the other direction, starting from b=\sup_{y\in Y} \|z-y\|. However, unless Y=\{0\}, this supremum is infinite, since \|z-ty\|\geq |t|\|y\|-\|z\|\to\infty as t\to\infty for any non-zero y\in Y.

Hence, no upper bound of the form \left\|\frac{z}{\|z-y_0\|}-y\right\|\leq\frac{1}{\theta} can hold for all y\in Y; only the lower bound \theta survives.

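Now what about \|z\|=1? We get it by using z-y_0 in place of z in the expression \left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\geq\theta. The vector w=\frac{z-y_0}{\|z-y_0\|} has norm 1, and for every y\in Y, \|w-y\|=\frac{\|z-(y_0+\|z-y_0\|y)\|}{\|z-y_0\|}\geq\frac{a}{a/\theta}=\theta, because y_0+\|z-y_0\|y\in Y.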
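The construction can be traced in a small example. In this sketch Z=\Bbb{R}^2 with the Euclidean norm and Y=\text{span}\{(1,0)\}, which is my choice (in finite dimensions one could even take \theta=1, but the sketch follows the proof). For this Y the infimum a is known in closed form; the code scans a grid of points of Y for some y_0 with \|z-y_0\|\leq\frac{a}{\theta}, forms w=\frac{z-y_0}{\|z-y_0\|}, and measures the distance from w to sampled points of Y.

```python
import math

def norm(v):
    return math.hypot(v[0], v[1])

def riesz_vector(z, theta, grid=2001, span=10.0):
    # Y = span{(1, 0)} in R^2 with the Euclidean norm. For this Y the infimum
    # a = inf_t ||z - (t, 0)|| equals |z[1]| (attained at t = z[0]).
    a = abs(z[1])
    assert a > 0, "z must lie outside Y"
    ts = [-span + 2 * span * k / (grid - 1) for k in range(grid)]  # sampled points (t, 0) of Y
    # some y0 = (t0, 0) with ||z - y0|| <= a / theta exists because a / theta > a
    t0 = next(t for t in ts if norm((z[0] - t, z[1])) <= a / theta)
    s = norm((z[0] - t0, z[1]))
    w = ((z[0] - t0) / s, z[1] / s)  # w = (z - y0) / ||z - y0||, a unit vector
    dist = min(norm((w[0] - t, w[1])) for t in ts)  # distance from w to the sampled part of Y
    return w, dist

theta = 0.9
w, dist = riesz_vector((3.0, 4.0), theta)
print(w, norm(w), dist)
```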

Hence, all in all, for every vector z\in Z\setminus Y, there are infinitely many vectors which satisfy the distance condition of Riesz’s lemma. Also, for every such z, there is AT LEAST one unit vector which satisfies Riesz’s lemma (there can be more than one). Hence, to think there can be only one unit vector in Z\setminus Y which satisfies Riesz’s lemma would be erroneous.

Completing metric spaces

If you’ve read the proof of the “completion of a metric space”, then you surely must have asked yourself “WHY?”! Say we have an incomplete metric space X. Why can’t we just complete X by including the limits of all its Cauchy sequences?!

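No. We can’t. A limit has to be a point of some space, and the limits of the non-convergent Cauchy sequences in X are not points of X; there is no ambient space yet in which to find them. That space is exactly what we are trying to build.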

The new space \overline{X} that we create: is it just X\cup \{\text{limits of Cauchy sequences in X}\}? No. It is a completely different space.

So what exactly is \overline{X}? \overline{X} is a space with a new kind of point: equivalence classes of Cauchy sequences in X, where \{a_n\}\sim\{b_n\} iff \lim\limits_{n\to\infty}d(a_n,b_n)=0.

If you read the proof, you’ll realise it does a bunch of random crap to prove \overline{X} is complete. WHY?? Couldn’t it have been simpler, without dense sets and the like?

Let’s create a Cauchy sequence of the equivalence classes. How do we know that the limit of this sequence exists? We’re stuck here. One wouldn’t know how to proceed.

On a more important note, we just have a bunch of equivalence classes whose limits we do not know. We have no idea how they behave with respect to each other. If we had equivalence classes whose limits we do know, then we’d have some perspective on the structure of the space and on what the limit of the Cauchy sequence is. We might not even know the terms of some such equivalence classes. How are we supposed to analyze things we have absolutely no idea about?

Some information is better than no information. If we could find out the limits of all such equivalence classes (or terms of the Cauchy sequence, in this case), we could think of doing something productive. But we can’t determine the limits. So what now?

Consider all equivalence classes of Cauchy sequences which converge to points in the space X. This set is dense in \overline{X} (this is easy to prove).

A fundamental concept is this: let us take a Cauchy sequence \{a_1,a_2,a_3,\dots\} and, for a fixed N, another Cauchy sequence \{b_1,b_2,b_3,\dots\} which converges to a_N. Then \lim\limits_{n\to\infty}d(a_n,b_n)=\epsilon_N, where \epsilon_N is a fixed number (it equals \lim\limits_{n\to\infty}d(a_n,a_N), which exists because \{a_n\} is Cauchy). Since \{a_n\} is Cauchy, \epsilon_N\to 0 as N increases, so the Cauchy sequences \{b_i\} (one for each N) converge to \{a_i\}. Hence, we extrapolate from the concept of convergence of points to convergence of converging sequences. Can we think about the convergence of converging sequences in any other way? Something to think about. But this is definitely a useful concept to remember. Note that the limit of \{a_i\} may not even be known.
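Here is a sketch of this in X=\Bbb{Q} with d(p,q)=|p-q|. The choice of a_n as the decimal truncations of \sqrt{2} is mine: a Cauchy sequence of rationals with no limit in \Bbb{Q}. Taking b to be the constant sequence a_N,a_N,\dots (which converges to a_N), the code approximates \epsilon_N=\lim_n d(a_n,a_N) by evaluating at a large index, and the values shrink as N grows.

```python
from fractions import Fraction
from math import isqrt

def a(n):
    # n-th decimal truncation of sqrt(2): a rational number, so a point of X = Q
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def eps(N, far=60):
    # approximates lim_n d(a_n, b_n), where b is the constant sequence a_N, a_N, ...
    # (which converges to a_N); d(a_n, a_N) is simply evaluated at a large n
    return abs(a(far) - a(N))

values = [eps(N) for N in range(1, 11)]
print([float(v) for v in values])
```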

So how is this concept relevant to the proof? With the original Cauchy sequence \{x_i\} (of equivalence classes) we’ve associated another Cauchy sequence \{b_i\}, each b_i being the class of a sequence with a limit in the space X, as mentioned before. The association is such that the distance between x_i and b_i tends to 0, so \lim b_i=\lim x_i whenever either limit exists. Now the masterstroke: we map each b_i to its limit in the original space X, i.e. if the sequences in b_i converge to l_i, we map b_i to the point l_i in X. Isn’t that a lot of potentially useless mapping? No. This is explained below.

What do we have here? We have a Cauchy sequence \{l_1,l_2,l_3,\dots\} in X. This may or may not have a limit in X, which is inconsequential to the proof. Now let us take the Cauchy sequence \{t_i\} converging to l_i. We know from before that \lim\limits_{n\to\infty}d(l_n,t_n)=0. Now let us take equivalence classes of the sequence \{l_1,l_2,\dots\}, and of the sequences \{t_i\}. The Cauchy sequence of equivalence classes of \{t_i\} will then converge to the equivalence class of \{l_i\}. As a result, the original \{x_i\} also converges to the equivalence class of \{l_i\}. We had associated \{b_i\} with \{x_i\} just so that we could get sequences converging to the terms of \{l_1,l_2,\dots\}.

What is the point of creating these equivalence classes? Couldn’t we have formed a complete metric space in some other way? Thinking about Cauchy sequences, something that immediately pops into mind is Cauchy sequences of Cauchy sequences. Cauchy sequences of what else can be formed? Cauchy sequences of squares of points? Will that space really be complete? Maybe there are other possibilities for forming a complete metric space from X, but this one easily pops into mind after one gets comfortable with the concept of the Cauchy sequence \{l_1,l_2,\dots\} and the sequences \{t_i\} converging to l_i. Whether metric spaces can be completed in other ways is something you and I should think about.