4 out of 5 dentists recommend this WordPress.com site

## Month: September, 2013

### IIT JEE, Centre of Mass, and mugging

During the time I was preparing for IIT-JEE, I was confused about very many things. I spent a lot of time trying to unravel concepts rather than solve problems and memorise formulae. Subsequently, I screwed up my entrance exams, and got into a rather well-known institute purely on the basis of the English/General Aptitude section.

One concept which was a major pain in the posterior was “centre of mass”. When you throw a body up in the air, say a rod, then it will rotate about a special point, which will lie at the geometrical centre in the case of a uniform rod. More importantly, we could treat the centre of mass of an object as containing the entire mass of the body.

What?!! So if you cut out the geometrical centre and weigh the rest of the rod, will it not weigh anything??

“No”, the teacher said. “This is just for calculation purposes.” And without any further explanations, he’d stomp off towards his overpriced car.

I tried to interpret the centre of mass as some “special point” that is ‘gifted’ the mass of the body when the body is in motion. Dissatisfied, I finally constructed an explanation last year of why the concept of the centre of mass is so useful. To the best of my knowledge, I haven’t read this explanation in any book or website. A part of this explanation, in a rather inaccessible form, can be found in Resnick-Halliday.

Take a solid body, and let external force $\sum{\overline{F}_{ext}}$ act on it. Take any particle $p_i$. This particle exerts forces on other particles, and other particles exert forces on it. The resultant force on $p_i$, due to Newton’s Second Axiom, is $m_i\overline{a}_i$. Let us determine $\sum\limits_{i=1}^n{m_i\overline{a}_i}$. This is equal to $\sum{\overline{F}_{ext}}+\sum\limits_{i=1}^n{\sum\limits_{j=1,j\neq i}^n{\overline{F}_{ij}}}$, where $\overline{F}_{ij}$ is the force $p_j$ applies on $p_i$. By Newton’s Third Law, $\overline{F}_{ij}=-\overline{F}_{ji}$, so the internal forces cancel in pairs and $\sum\limits_{i=1}^n{\sum\limits_{j=1,j\neq i}^n{\overline{F}_{ij}}}=0$. Hence, we have $\sum{\overline{F}_{ext}}=\sum\limits_{i=1}^n{m_i\overline{a}_i}$.

Now what? We want to find a point such that under the given circumstances, it has the same acceleration as a body of mass $\sum\limits_{i=1}^n{m_i}$ would have when $\sum{\overline{F}_{ext}}$ is applied on it. There are two things to note here:

1. We don’t KNOW the point yet. We want to DETERMINE this point.
2. This seems like an arbitrary condition. We could also have wanted to determine the point whose acceleration is $(\sum{\overline{F}_{ext}}/\sum\limits_{i=1}^n{m_i})^2$. However, it is not that arbitrary. We will always be able to calculate the acceleration of this particle from the external forces, which are easily determined, however complex the situation. And although we’ll also be able to easily determine the acceleration of the point whose acceleration is $(\sum{\overline{F}_{ext}}/\sum\limits_{i=1}^n{m_i})^2$, the former condition is more relevant to Newtonian mechanics.

Let the particle be called $cm$. We were saying $\overline{a}_{cm}=\sum{\overline{F}_{ext}}/\sum\limits_{i=1}^n{m_i}$. This is equivalent to saying $\overline{a}_{cm}=\sum\limits_{i=1}^n{m_i\overline{a}_{i}}/\sum\limits_{i=1}^n{m_i}$. On integrating twice and applying boundary conditions, we get $\overline{r}_{cm}=\sum\limits_{i=1}^n{m_i\overline{r}_{i}}/\sum\limits_{i=1}^n{m_i}$.

So this is the formula for locating the point whose acceleration depends on the external forces and the total mass of the body. So how does the whole mass of the body get concentrated at this point? It doesn’t. The mass of this particle remains the same!! $\Delta m$, if you like. It is just that $\overline{a}_{cm}=\sum\limits_{i=1}^n{m_i\overline{a}_{i}}/\sum\limits_{i=1}^n{m_i}$ is equivalent to saying $(\sum\limits_{i=1}^n{m_i}).\overline{a}_{cm}=\sum\limits_{i=1}^n{m_i\overline{a}_{i}}=\sum{\overline{F}_{ext}}$. Remember this. The mass of the whole body does NOT get magically transferred to this point.
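Here is a minimal numerical sketch of the argument above. The masses, the external forces and the internal forces are all made-up numbers (they are assumptions for illustration, not from any textbook problem); the only structure imposed is Newton's Third Law on the internal forces. The check is that $(\sum m_i)\,\overline{a}_{cm}$ comes out equal to $\sum{\overline{F}_{ext}}$, whatever the internal forces are.

```python
import random

random.seed(0)

n = 4
masses = [random.uniform(1.0, 5.0) for _ in range(n)]

# Hypothetical external force on each particle (2D vectors).
f_ext = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

# Hypothetical internal forces: internal[i][j] is the force on particle i due to j.
# Newton's Third Law is imposed by construction: internal[j][i] = -internal[i][j].
internal = [[(0.0, 0.0)] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        fx, fy = random.uniform(-10, 10), random.uniform(-10, 10)
        internal[i][j] = (fx, fy)
        internal[j][i] = (-fx, -fy)

def net_force(i):
    """Resultant force on particle i: external plus all internal contributions."""
    fx = f_ext[i][0] + sum(internal[i][j][0] for j in range(n) if j != i)
    fy = f_ext[i][1] + sum(internal[i][j][1] for j in range(n) if j != i)
    return fx, fy

# a_i = F_i / m_i for each particle.
accels = [(net_force(i)[0] / masses[i], net_force(i)[1] / masses[i]) for i in range(n)]

total_mass = sum(masses)
# a_cm = sum(m_i * a_i) / sum(m_i)
a_cm = (
    sum(m * a[0] for m, a in zip(masses, accels)) / total_mass,
    sum(m * a[1] for m, a in zip(masses, accels)) / total_mass,
)
total_ext = (sum(f[0] for f in f_ext), sum(f[1] for f in f_ext))

print("M * a_cm     =", (total_mass * a_cm[0], total_mass * a_cm[1]))
print("sum of F_ext =", total_ext)
```

The internal forces here are ten times larger than the external ones, and they still drop out of the total.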

Similarly, one may determine points such that their acceleration is totally dependent on external forces in the following fashion: $(\sum{\overline{F}_{ext}})^3/\sqrt{\sum\limits_{i=1}^n{m_i}}$. Or any other combination of this form.

If you too are preparing for IIT JEE and think it is the stupidest shit in the world, you’re not alone. Stop solving numericals! Go look outta the window. Explore.

### Why substitution works in indefinite integration

Let’s integrate $\int{\frac{dx}{\sqrt{1-x^2}}}$. We know the trick: substitute $\sin\theta$ for $x$. We get $dx=\cos\theta d\theta$. Substituting into the original integral, we get $\int{\frac{\cos\theta d\theta}{\sqrt{1-\sin^2\theta}}}=\int{\frac{\cos\theta d\theta}{|\cos\theta|}}$. Let us assume $\cos\theta$ remains positive throughout the interval under consideration. Then we get the integral as $\theta+c$ or $\arcsin x+c$.

I have performed similar operations for close to five years of my life now. But I was never, ever, quite convinced by it. How can you, just like that, substitute $\cos\theta d\theta$ for $dx$? My teacher once told me this: $\frac{dx}{d\theta}=\cos\theta$. Multiplying by $d\theta$ on both sides, we get $dx=\cos\theta d\theta$. What?!! It doesn’t work like that!!

It was a year back that I finally derived why this ‘ruse’ works.

Take the function $x^2$. If you differentiate this with respect to $x$, you get $2x$. If you integrate $2x$, you get $x^2+c$. Simple.

Now take the function $\sin^2\theta$. Differentiate it with respect to $\theta$. You get $2\sin\theta.\cos\theta$. If you integrate $2\sin\theta.\cos\theta$, you get $\sin^2\theta+c$.

The thing to notice is when you integrate the two functions- $2x$ and $2\sin\theta.\cos\theta$, you want a function of the form $y^2$. However and whatever I integrate, I ultimately want a function of the form $y^2$, so that I can substitute $x$ for $y$ to get $x^2$.

In the original situation, let us imagine there’s a function $f(x)=\int{\frac{dx}{\sqrt{1-x^2}}}$. We’ll discuss the properties of $f(x)$. If we were to make the substitution $x=\sin\theta$ in $f(x)$ and differentiate it with respect to $\theta$, we’d get a function of the form $\frac{1}{\sqrt{1-y^2}}\cos\theta$, where $y$ is $\sin\theta$. There are two things to note here:

1. The form of the derivative of $f(x)$ wrt $\theta$ is the same as that of $f'(x)$, which is $\frac{1}{\sqrt{1-y^2}}$, multiplied by $\cos\theta$, the derivative of $\sin\theta$ wrt $\theta$.

2. When any function is differentiated with respect to any variable, integration wrt the same variable gives us back the same function. Hence, $\int{\frac{\partial f}{\partial x}dx}=\int{\frac{\partial f}{\partial \theta}d\theta}$

Coming back to $\int{\frac{dx}{\sqrt{1-x^2}}}$, let us assume its integral is $f(x)$. Its derivative on substituting $x=\sin\theta$ and differentiating wrt $\theta$ is of the same form as $\frac{\partial f}{\partial x}$ multiplied by $\cos\theta$. This is a result of the chain rule of differentiation. Now following rule 2, we know $\int{\frac{dx}{\sqrt{1-x^2}}}=\int{\frac{\cos\theta d\theta}{\sqrt{1-\sin^2\theta}}}$.

How is making the substitution $x=\sin\theta$ justified? Could we have made any other continuous substitution, like $x=\theta^2 +\tan\theta^3$? Let us assume we substitute $g(\theta)$ for $x$. We want $g(\theta)$ to take all the values $x$ can take. This is the condition that must be satisfied by any substitution. For values that $g(\theta)$ takes but $x$ doesn’t, we restrict the range of $g(\theta)$ to that of $x$. Note that the shapes of $f(x)$ as plotted against $x$ and $f(\sin\theta)$ as plotted against $\theta$ will be different. But that is irrelevant as long as we can write the same cartesian pairs $(m,n)$ for any variable, where $m$ is the x-coordinate and $n$ is the y-coordinate.

Summing up the argument, we predict the form the derivative of $f(x)$ will take when the substitution $x=\sin\theta$ is made, and then integrate this new form wrt $\theta$ to get the original function. This is why the ‘trick’ works.
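If you want to see the two integrals agree without trusting any of the above, here is a rough numerical sketch. The upper limit $0.5$ and the midpoint rule are my own arbitrary choices, not part of the argument; the point is only that integrating $\frac{1}{\sqrt{1-x^2}}$ wrt $x$ and integrating $\frac{\cos\theta}{\sqrt{1-\sin^2\theta}}$ wrt $\theta$ over the matching interval give the same number, which is $\arcsin(0.5)$.

```python
import math

def midpoint_integral(func, lo, hi, steps=100_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

upper_x = 0.5                      # arbitrary upper limit inside (-1, 1)
upper_theta = math.asin(upper_x)   # the matching limit after x = sin(theta)

# Integrate 1/sqrt(1 - x^2) with respect to x.
in_x = midpoint_integral(lambda x: 1.0 / math.sqrt(1.0 - x * x), 0.0, upper_x)

# Same integrand with x = sin(theta), multiplied by dx/dtheta = cos(theta),
# integrated with respect to theta.
in_theta = midpoint_integral(
    lambda t: math.cos(t) / math.sqrt(1.0 - math.sin(t) ** 2), 0.0, upper_theta
)

print(in_x, in_theta, upper_theta)
```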

### Fermat’s Last Theorem

When in high school, spurred by Mr. Scheelbeek’s end-of-term inspirational lecture on Fermat’s Last Theorem, I tried proving the same for…about one and a half long years!
For documentation purposes, I’m attaching my proof. Feel free to outline the flaws in the comments section.

Let us assume FLT is false, i.e. $x^n + y^n =z^n$ for some positive integers $x,y,z$. We know $x^n + y^n<(x+y)^n$ ($n$ is assumed to be greater than one here). Hence, $z<x+y$. Moreover, we know $z^n-x^n<(z+x)^n$. Hence, $y<z+x$. Similarly, $x<y+z$.

So we have the three inequalities: $x+y>z$, $z+x>y$ and $y+z>x$.

$x,y,z$ satisfy the triangle inequalities! Hence, $x,y,z$ form a triangle.

Using the cosine rule, we get $z^2=x^2 +y^2 -2xy\cos C$, where $C$ is the angle opposite side $z$.

Raising both sides to the power $\frac{n}{2}$, we get $z^n=(x^2 +y^2 -2xy\cos C)^{\frac{n}{2}}$. Now if $n=2$ and $C=\frac{\pi}{2}$, we get $z^2=x^2+y^2$. This is the case of the right-angled triangle.

However, if $n\geq 3$, then the right hand side, which is $(x^2 +y^2 -2xy\cos C)^{\frac{n}{2}}$, is unlikely to simplify to $x^n + y^n$.

There are multiple flaws in this argument. Coming to terms with them was a huge learning experience.

### Binomial probability distribution

What exactly is binomial distribution?

Q. A manufacturing process is estimated to produce $5\%$ nonconforming items. If a random sample of five items is chosen, find the probability of getting two nonconforming items.

Now one could say let there be $100$ items. Then the required probability would be $\frac{{5\choose 2}{95\choose 3}}{{100\choose 5}}$. In what order the items are chosen is irrelevant. This roughly comes out to be $0.018$, while the answer is $0.021$. Where did we go wrong?

Why should we assume there are $100$ items in total? Let us assume $n\to\infty$, as we determine $\frac{{.05n\choose 2}{.95n\choose 3}}{{n\choose 5}}$. What if $0.95 n$ and $0.05n$ are not integers? We use the gamma function.

We get $\frac{{.05n\choose 2}{.95n\choose 3}}{{n\choose 5}}=\frac{\frac{\Gamma(0.05n+1)}{2!\,\Gamma(0.05n-1)}.\frac{\Gamma(0.95n+1)}{3!\,\Gamma(0.95n-2)}}{{n\choose 5}}$, where $\Gamma(s+1)=\int_{0}^{\infty}{t^{s}e^{-t} dt}$.

My textbook says this tends to ${5\choose 2}(0.05)^2 (0.95)^3$. This is something you could verify for yourself.
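A quick way to do that verification, at least numerically: the sketch below computes the "pick from $n$ items" probability for a few values of $n$ and compares it with the binomial value. The particular values of $n$ are my own choice, taken so that $0.05n$ is an integer and no gamma function is needed.

```python
from math import comb

def hypergeometric(n, defect_rate_percent=5, sample=5, defects=2):
    """P(exactly `defects` nonconforming items) when drawing `sample` items
    without replacement from n items, of which 5% are nonconforming."""
    bad = n * defect_rate_percent // 100
    good = n - bad
    return comb(bad, defects) * comb(good, sample - defects) / comb(n, sample)

# The binomial answer: C(5,2) * 0.05^2 * 0.95^3
binomial = comb(5, 2) * 0.05**2 * 0.95**3

for n in (100, 1_000, 10_000, 1_000_000):
    print(n, hypergeometric(n))
print("binomial:", binomial)
```

For $n=100$ this gives roughly $0.0184$, and the values creep up towards the binomial value of roughly $0.0214$ as $n$ grows.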

Another question. Say you roll a die 5 times. Find the probability of getting two $6$s. The probability as determined by combinatorics is $\frac{{5\choose 2}5^3}{6^5}$. You must have applied the binomial theorem before in such problems. You know the answer to be ${5\choose 2}(\frac{1}{6})^2 (\frac{5}{6})^3$. This matches with the answer determined before. So why is it that we’re right here in determining the probability accurately, while we were not before?
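The dice case is small enough to brute-force. This sketch simply lists all $6^5$ ordered outcomes and counts those with exactly two $6$s, then compares with the binomial formula.

```python
from itertools import product
from math import comb

outcomes = list(product(range(1, 7), repeat=5))   # all 6^5 ordered rolls
favourable = sum(1 for roll in outcomes if roll.count(6) == 2)

by_counting = favourable / len(outcomes)
by_formula = comb(5, 2) * (1 / 6) ** 2 * (5 / 6) ** 3

print(favourable, len(outcomes), by_counting, by_formula)
```

Here the total number of (equally likely) outcomes is known exactly, which is why the counting answer and the binomial answer coincide.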

Binomial probability agrees with elementary probability where separate arrangements of selected items are counted as distinct arrangements, and where the total number of items is known and not just guessed at. When the total number of items is not known and only the percentage of successes is known, then binomial probability is an approximation arrived at by assuming $n$ approaches infinity.

### Continuous linear operators are bounded: decoding the proof, and how the mathematician chances upon it

Here we try to prove that a linear operator, if continuous, is bounded.

Continuity implies: for any $\epsilon>0$ there is a $\delta>0$ such that $\|Tx-Tx_0\|<\epsilon$ whenever $\|x-x_0\|<\delta$.

We want the following result: $\frac{\|Ty\|}{\|y\|}\leq c$, where $c$ is a constant, and $y$ is any vector in $X$.

What constants can be constructed from $\epsilon$ and $\delta$, knowing that they are prone to change? As $T$ is a linear operator, $\frac{\epsilon}{\delta}$ is constant. We need to use this knowledge.

We want $\frac{\|Ty\|}{\|y\|}\leq \frac{\epsilon}{\delta}$, or $\delta\frac{\|Ty\|}{\|y\|}\leq {\epsilon}$.

We have $\|Tx-Tx_0\|=\|T(x-x_0)\|<\epsilon$.

Hence, $x-x_0=\delta.\frac{y}{\|y\|}$.

$\|T(\delta.\frac{y}{\|y\|})\|=\frac{\delta}{\|y\|}\|Ty\|$.
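The scaling identity in the last line is easy to sanity-check numerically. In the sketch below the operator $T$ (a $2\times 2$ matrix), the vector $y$ and the value of $\delta$ are all arbitrary assumptions of mine, and the norm is the Euclidean one. If the left-hand side is below $\epsilon$, the identity immediately gives $\frac{\|Ty\|}{\|y\|}<\frac{\epsilon}{\delta}$.

```python
import math

T = [[2.0, -1.0], [0.5, 3.0]]   # hypothetical linear operator on R^2

def apply(matrix, v):
    """Matrix-vector product, i.e. T applied to v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in matrix]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))

y = [3.0, -4.0]    # any nonzero vector
delta = 0.01       # any positive radius

scaled = [delta * c / norm(y) for c in y]    # delta * y / ||y||, a vector of norm delta
lhs = norm(apply(T, scaled))                 # ||T(delta * y / ||y||)||
rhs = delta / norm(y) * norm(apply(T, y))    # (delta / ||y||) * ||Ty||

print(norm(scaled), lhs, rhs)
```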

We have just deconstructed the proof given on pg. 97 of Kreyszig’s book on Functional Analysis. The substitution $x-x_0=\delta.\frac{y}{\|y\|}$ did not just occur by magic to him. It was the result of thorough analysis. And probably such investigation.

But hey! Let’s investigate this. $\frac{\delta}{\epsilon}$ is also constant! Let us assume $\epsilon\frac{\|Ty\|}{\|y\|}\leq \delta$. Multiplying on both sides by $\frac{\epsilon}{\delta}$, we get $\frac{\epsilon^2}{\delta}\frac{\|Ty\|}{\|y\|}\leq \epsilon$. This shows $x-x_0=\frac{\epsilon^2}{\delta}\frac{y}{\|y\|}$. Does this substitution also prove boundedness?

We have to show $\|x-x_0\|<\delta$. $\frac{\epsilon^2}{\delta}<\delta$ only if $\epsilon<\delta$. Hence, this is conditionally true.

Similar investigations taking $(\frac{\epsilon}{\delta})^n$ to be constant can also be conducted.

### Linear operators mapping finite dimensional vector spaces are bounded

Theorem: Every linear operator $T:V\to W$, where $V$ is finite dimensional, is bounded.

Proof: $\frac{\|Tx\|}{\|x\|}=\frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|a_1e_1+a_2e_2+\dots+a_ne_n\|}\leq \frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{c(|a_1|+|a_2|+\dots+|a_n|)}\leq \frac{\|T(e_i)\|}{c}$

where $\|T(e_i)\|=\max\{\|T(e_1)\|,\|T(e_2)\|,\dots\}$.
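A small sketch of this bound in the simplest setting I could think of: $V=\Bbb{R}^2$ with the Euclidean norm and the standard basis, where one can take $c=\frac{1}{\sqrt{2}}$ (since $|a_1|+|a_2|\leq\sqrt{2}\,\|a\|$). The matrix $T$ is a made-up example. The code samples random coefficient vectors and checks that $\frac{\|Tx\|}{\|x\|}$ never exceeds $\frac{\|T(e_i)\|}{c}$.

```python
import math
import random

random.seed(1)
T = [[2.0, -1.0], [0.5, 3.0]]   # hypothetical operator; its columns are T(e_1), T(e_2)

def apply(matrix, v):
    """Matrix-vector product, i.e. T applied to v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in matrix]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

# max over i of ||T(e_i)||, read off from the columns of the matrix
max_Te = max(norm([row[k] for row in T]) for k in range(2))

# Assumed constant for this particular norm and basis:
# ||a_1 e_1 + a_2 e_2|| >= c (|a_1| + |a_2|) with c = 1/sqrt(2)
c = 1.0 / math.sqrt(2.0)
bound = max_Te / c

worst_ratio = 0.0
for _ in range(10_000):
    coeffs = [random.uniform(-1, 1), random.uniform(-1, 1)]
    if norm(coeffs) == 0.0:
        continue
    worst_ratio = max(worst_ratio, norm(apply(T, coeffs)) / norm(coeffs))

print("largest sampled ||Tx||/||x|| :", worst_ratio)
print("bound max||T(e_i)|| / c      :", bound)
```

The bound is not tight, which is fine: boundedness only asks for some constant.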

What we learn from here is

$\|e_i\|(|a_1|+|a_2|+\dots+|a_n|)\geq\|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq c(|a_1|+|a_2|+\dots+|a_n|)$

where

$\|e_i\|=\max\{\|e_1\|,\|e_2\|,\dots,\|e_n\|\}$.

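Similarly, one is tempted to write

$\|e_i\|(|a_1|+|a_2|+\dots+|a_n|)\geq\|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq \|e_k\|(|a_1|+|a_2|+\dots+|a_n|)$

where

$\|e_k\|=\min\{\|e_1\|,\|e_2\|,\dots,\|e_n\|\}$

The upper bound is just the triangle inequality. The lower bound, however, does not hold for a general basis: with $e_1=(1,0)$, $e_2=(0,1)$ in the Euclidean plane and $a_1=a_2=1$, the middle term is $\sqrt{2}$ while the right hand side is $2$. Whenever the lower bound does hold, another proof of the assertion is

$\frac{\|Tx\|}{\|x\|}=\frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|a_1e_1+a_2e_2+\dots+a_ne_n\|}\leq \frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|e_k\|(|a_1|+|a_2|+\dots+|a_n|)}\leq \frac{\|T(e_i)\|}{\|e_k\|}$

which is a constant.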

Note: why does this not work in infinite dimensional spaces? Because the max and min of $\|e_r\|$ and $\|Te_r\|$ might not exist.

### Riesz’s lemma decoded

This is a rant on Riesz’s lemma.

Riesz’s lemma: Let $Z$ be a normed space and $Y\subset Z$ a closed proper subspace. Then for every $\theta\in (0,1)$, there exists a $z\in Z$ with $\|z\|=1$ such that $\|z-y\|\geq \theta$ for all $y\in Y$.

A proof is commonly available. What we will discuss here is the thought behind the proof.

For any random $z\in Z\setminus Y$ and $y\in Y$, write $\|z-y\|$. Let $a=\inf\limits_{y\in Y}\|z-y\|$; $a>0$ because $Y$ is closed. Then $\|z-y\|\geq a$. Also, since $\frac{a}{\theta}>a$, there exists a $y_0\in Y$ such that $\|z-y_0\|\leq\frac{a}{\theta}$. Then $\left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\geq\theta$. Because the vector space $Z$ is closed under scalar multiplication, and $\frac{y}{\|z-y_0\|}$ ranges over all of $Y$ as $y$ does, we have effectively proved $\|z-y\|\geq\theta$ (with the rescaled vector in place of $z$) for any $\theta\in (0,1)$ and $y\in Y$.

If there is some other vector $v$ such that $\|v-v_0\|\leq\frac{a}{\theta}$, then $\|\frac{z}{\|v-v_0\|}-\frac{y}{\|v-v_0\|}\|\geq\theta$.

Hence, one part of Riesz’s lemma, that of exceeding $\theta$, is satisfied by every vector $z\in Z\setminus Y$. The thoughts to take away from this are: dividing by $\theta$ or a number less than $1$ increases everything, even a small increase from the infimum exceeds terms of a sequence converging to the infimum, and every arbitrary term in the sequence is greater than the infimum. When we say $\theta$ can be any number in the interval $(0,1)$, we know we’re skirting with boundaries. We could also have thought of a proof in this direction: let $b=\sup_{y\in Y} \|z-y\|$. Then $b\theta\leq\|z-y_0\|\leq b$. However, for an arbitrary $y\in Y$, $\left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\leq\frac{1}{\theta}$.

Hence, for every $\theta\in (0,1)$, $\theta\leq \|z-y\|\leq\frac{1}{\theta}$.

Now what about $\|z\|=1$? This condition is satisfied only when $z=z-y_0$ in the expression $\left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\geq\theta$.

Hence, all in all, for every vector $z\in Z\setminus Y$, there are infinitely many vectors which satisfy the condition of Riesz’s lemma. Also, for every such $z$, there is AT LEAST one unit vector which satisfies Riesz’s lemma (there can be more than one). Hence, to think there can be only one unit vector in $Z\setminus Y$ which satisfies Riesz’s lemma would be erroneous.
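To make the construction concrete, here is a sketch in a toy setting that I am assuming purely for illustration: $Z=\Bbb{R}^2$ with the Euclidean norm and $Y$ the x-axis, so that the distance from $z$ to $Y$ is just $|z_2|$. The starting vector $z$, the value of $\theta$ and the way $y_0$ is picked are arbitrary choices. The unit vector is $\frac{z-y_0}{\|z-y_0\|}$, and its distance from $Y$ is checked on a grid of points of $Y$.

```python
import math

def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])

theta = 0.9
z = (3.0, 2.0)     # arbitrary vector outside Y, where Y = {(t, 0)} is the x-axis
a = abs(z[1])      # a = inf over y in Y of ||z - y||

# Pick some y0 in Y with ||z - y0|| <= a / theta (deliberately not the closest point).
shift = 0.5 * a * math.sqrt(1.0 / theta**2 - 1.0)
y0 = (z[0] + shift, 0.0)

scale = norm((z[0] - y0[0], z[1] - y0[1]))
w = ((z[0] - y0[0]) / scale, (z[1] - y0[1]) / scale)   # unit vector (z - y0)/||z - y0||

# Smallest distance from w to a grid of points (t/100, 0) in Y.
closest = min(norm((w[0] - t / 100.0, w[1])) for t in range(-500, 501))

print("||z - y0|| =", scale, " a/theta =", a / theta)
print("||w|| =", norm(w), " min distance to Y (on grid) =", closest)
```

Changing `shift` to any other value that keeps $\|z-y_0\|\leq\frac{a}{\theta}$ gives a different unit vector that also works, which is the non-uniqueness mentioned above.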

### Completing metric spaces

If you’ve read the proof of the “completion of a metric space”, then you surely must have asked yourself “WHY?”! Say we have an incomplete metric space $X$. Why can’t we just complete $X$ by including the limit points of all its Cauchy sequences?!

No. We can’t. The limit points of Cauchy sequences may not be determinable.

The new space $\overline{X}$ that we create, is it just $X\cup \{\text{limit points of Cauchy sequences in X}\}$? No. It is a completely different space.

So what exactly is $\overline{X}$? $\overline{X}$ is a space with a new bunch of points: equivalence classes of Cauchy sequences in $X$ such that $\{a\}\sim\{b\}$ iff $\lim\limits_{n\to\infty}d(a_n,b_n)=0$.
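For a concrete feel of this equivalence relation, take $X=\Bbb{Q}$ with the usual distance (my choice of example, not part of the proof). The sketch below builds two different Cauchy sequences of rationals, both "heading for" $\sqrt{2}$, which is not in $X$. Neither sequence has a limit in $X$, yet $d(a_n,b_n)\to 0$, so they land in the same equivalence class. Both recipes (Newton's iteration and bisection) are just convenient ways of generating rationals.

```python
from fractions import Fraction

def newton_sqrt2(terms):
    """Rational Newton iterates x -> (x + 2/x) / 2, starting from 1."""
    x = Fraction(1)
    seq = []
    for _ in range(terms):
        seq.append(x)
        x = (x + 2 / x) / 2
    return seq

def bisection_sqrt2(terms):
    """Rational midpoints of shrinking intervals [lo, hi] with lo^2 < 2 < hi^2."""
    lo, hi = Fraction(1), Fraction(2)
    seq = []
    for _ in range(terms):
        mid = (lo + hi) / 2
        seq.append(mid)
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return seq

a = newton_sqrt2(12)
b = bisection_sqrt2(12)

# d(a_n, b_n) for each n, computed exactly in the rationals
gaps = [abs(x - y) for x, y in zip(a, b)]
for n, gap in enumerate(gaps, start=1):
    print(n, float(gap))
```

Everything here is exact rational arithmetic; no term of either sequence squares to $2$.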

If you read the proof, you’ll realise it does a bunch of random crap to prove $\overline{X}$ is complete. WHY?? Couldn’t it have been simpler with less dense sets and the like?

Let’s create a Cauchy sequence of the equivalence classes. How do we know that the limit point of this sequence exists? We’re stuck here. One wouldn’t know how to proceed.

On a more important note, we just have a bunch of equivalence classes whose limits we do not know. We have no idea how they behave with respect to each other. Should we have equivalence classes whose limit points we do know, then we’ll have some perspective on the structure of the space and what the limit point of the Cauchy sequence is. We might not even know the terms of some such equivalence classes. How’re we supposed to analyze things we have absolutely no idea about?

Some information is better than no information. If we could find out the limit points of all such equivalence classes (or terms of the Cauchy sequence, in this case), we could think of doing something productive. But we can’t determine the limit points. So what now?

Consider all equivalence classes of Cauchy sequences which converge to points in the space $X$. This set is dense in $\overline{X}$ (this is easy to prove).

A fundamental concept is this: Let us take a Cauchy sequence $\{a_1,a_2,a_3,\dots\}$, and another Cauchy sequence $\{b_1,b_2,b_3,\dots\}$ which converges to $a_N$. Then $\lim\limits_{n\to\infty}d(a_n,b_n)=\epsilon$, where $\epsilon$ is a fixed number. As $N$ increases, the Cauchy sequence $\{b_i\}$ converges to $\{a_i\}$. Hence, we extrapolate from the concept of convergence of points to convergence of converging sequences. Can we think about the convergence of converging sequences in any other way? Something to think about. But this is definitely a useful concept to remember. Note that the limit point of $\{a_i\}$ may not even be known.

So how is this concept relevant to the proof? We’ve associated with the original Cauchy sequence $\{x_i\}$ another Cauchy sequence $\{b_i\}$ with limit points in the space, as mentioned before. The association is such that $\lim b_i=\lim x_i$. Now the masterstroke- we map each sequence to the limit in the original space $X$: we map $\{y_i\}$ converging to $l$, to the point $l$ in $X$. Isn’t that a lot of potentially useless mapping? No. This is explained below.

What do we have here? We have a Cauchy sequence $\{l_1,l_2,l_3,\dots\}$. This may or may not have a limit, which is inconsequential to the proof. Now let us take the Cauchy sequence $\{t_i\}$ converging to $l_i$. We know from before that $\lim\limits_{n\to\infty}d(l_n,t_n)=0$. Now let us take equivalence classes of the sequence $\{l_1,l_2,\dots\}$, and the sequences $\{t_i\}$. The Cauchy sequence of equivalence classes of $\{t_i\}$ will obviously converge to the equivalence class of $\{l_i\}$. As a result, the original $\{x_i\}$ also converges to the equivalence class of $\{l_i\}$. We had associated $\{x_i\}$ just so that we could get sequences converging to the terms of $\{l_1,l_2,\dots\}$.

What is the point of creating these equivalence classes? Couldn’t we have formed a complete metric space in some other way? Thinking about Cauchy sequences, something that immediately pops into mind is Cauchy sequences of Cauchy sequences. Cauchy sequences of what else can be formed? Cauchy sequences of squares of points? Will that space really be complete? Maybe there are other possibilities to form a complete metric space as derived from $X$, but this one is one that easily pops into mind after one gets comfortable with the concept of the Cauchy sequence $\{l_1,l_2,\dots\}$ and the sequences $\{t_i\}$ converging to $l_i$. Whether metric spaces can be completed in other ways is something you and I should think about.

### |Groups|

Today we will discuss the proof of $o(ST)=\frac{o(S)o(T)}{o(S\cap T)}$.
Here, $S$ and $T$ are finite subgroups of a group $G$. We know $S\cap T\neq\emptyset$, as $e\in S\cap T$.

Let $s_1t_1=s_2t_2$. Then $s_2^{-1}s_1=t_2t_1^{-1}\in S\cap T$. Conversely, take any $a\in S\cap T$. For any $s_1\in S$, $t_1\in T$, find $s_2=s_1a^{-1}$ and $t_2=at_1$. Then $s_2t_2=s_1a^{-1}at_1=s_1t_1$. Hence, exactly $|S\cap T|$ pairs of elements $(s_2,t_2)$ can be found such that $s_2t_2=s_1t_1$, for any $s_1\in S$, $t_1\in T$. Hence, we can form equivalence classes which partition the $o(S)o(T)$ pairs $(s,t)$, all with $|S\cap T|$ elements, one class for each element of $ST$. This shows $o(ST)=\frac{o(S)o(T)}{o(S\cap T)}$.
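The counting can be watched in action in a small example. I am assuming the additive group $\Bbb{Z}_{12}$ here, with $S$ the multiples of $2$ and $T$ the multiples of $3$ (so "$st$" becomes $s+t \bmod 12$); these choices are only for illustration. The sketch counts, for every element of $ST$, how many pairs $(s,t)$ produce it.

```python
from collections import Counter

n = 12
S = {x for x in range(n) if x % 2 == 0}   # subgroup generated by 2, order 6
T = {x for x in range(n) if x % 3 == 0}   # subgroup generated by 3, order 4

# For each element of ST, count the pairs (s, t) that give it.
representations = Counter((s + t) % n for s in S for t in T)
ST = set(representations)

print("o(S), o(T), o(S n T):", len(S), len(T), len(S & T))
print("o(ST):", len(ST))
print("pairs per element of ST:", set(representations.values()))
```

Every element of $ST$ is hit by exactly $o(S\cap T)$ pairs, which is the whole content of the formula.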

We can also digress to more complicated situations like $o(ST+W)$, and find similar formulae.

### A new proof of Cauchy’s theorem

We will discuss a more direct proof of Cauchy’s theorem than the one given in Herstein’s “Topics in Algebra” (pg.61).

Statement: If $G$ is a finite abelian group, $p$ is a prime, and $p|o(G)$, then there is an element $g\in G$ such that $g^{p}=e_G$, and $g\neq e_G$.

We will prove this by induction. Let us assume that for every abelian group $H$ of order $|H|<|G|$, $p|o(H)\implies \exists h\in H: h\neq e_H, h^p=e_H$. Let $N$ be a proper nontrivial (and by default normal) subgroup of $G$. If $p|o(N)$, by the induction hypothesis, $\exists n\in N: n\neq e_N, n^p=e_N=e_G$.

Let us now assume $p|o(G)$ but $p\not| o(N)$. This implies $p|\frac{o(G)}{o(N)}\implies p|o\left(\frac{G}{N}\right)$. As $o\left(\frac{G}{N}\right)<o(G)$, by the induction hypothesis, $\exists (Nb)\in \frac{G}{N}: Nb\neq N, (Nb)^{p}=n_1bn_2b\dots n_pb=n_1n_2\dots n_p b^p=N$. This implies $b^p\in N\implies b^{p.o(N)}=e$ ($e_G$ shall simply be referred to as $e$ from now on). $b^{o(N)}$ is hence that element in $G$ such that when raised to the power $p$, gives $e$.

Now all we have to prove is $b^{o(N)}\neq e$. Given below is my original spin on the proof.

We know $p\not| o(N)$. And as $p$ is prime, $o(N)$ can’t have any common factors with it. Hence $\gcd (p,o(N))=1$. This proves there exist integers $u,v\in\Bbb{Z}$ such that $u.p+v.o(N)=1$. Also, note that if $b^{p}\in N$, then $(b^{p})^{z}\in N$, for any $z\in \Bbb{Z}$.

Let us now assume $b^{o(N)}=e$. Then $(b^{o(N)})^{v}.(b^{p})^{u}=e.(b^{p})^{u}\in N$. Also note that $(b^{o(N)})^{v}.(b^{p})^{u}=b^{u.p+v.o(N)}=b$. The two statements imply $b\in N$. This contradicts the assumption that $b\notin N$. Now you would ask where was the assumption made?! The answer lies in the choice of the coset $Nb$: the induction hypothesis hands us an element of $\frac{G}{N}$ other than the identity coset $N$, so $b\notin N$. Had $b$ been a part of $N$, then $b^{o(N)}=e$.

There’s an extraordinarily powerful trick I’d like to point out and explain here. When you have statements about $b^a$ and $b^c$, where $\gcd (a,c)=1$, then we can make a statement about $b$ by virtue of the fact $\exists z_1,z_2\in\Bbb{Z}$ such that $z_1.a+z_2.c=1$.
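Here is the trick as a small sketch. I am assuming a concrete group for illustration: the nonzero residues modulo the prime $101$ under multiplication, with made-up values for $b$, $a$ and $c$. Knowing only $b^a$ and $b^c$, the extended Euclidean algorithm produces $z_1,z_2$ with $z_1.a+z_2.c=1$, and then $(b^a)^{z_1}.(b^c)^{z_2}=b$. Negative exponents in the three-argument `pow` need Python 3.8 or later.

```python
def extended_gcd(a, c):
    """Return (g, z1, z2) with z1*a + z2*c == g == gcd(a, c)."""
    if c == 0:
        return a, 1, 0
    g, x, y = extended_gcd(c, a % c)
    return g, y, x - (a // c) * y

modulus = 101      # hypothetical prime; nonzero residues form a group under multiplication
b = 37             # the element we pretend not to know
a, c = 9, 14       # coprime exponents

power_a = pow(b, a, modulus)   # what we are "told": b^a
power_c = pow(b, c, modulus)   # and b^c

g, z1, z2 = extended_gcd(a, c)

# (b^a)^z1 * (b^c)^z2 = b^(z1*a + z2*c) = b^1
recovered = pow(power_a, z1, modulus) * pow(power_c, z2, modulus) % modulus

print("gcd:", g, " z1, z2:", z1, z2)
print("recovered b:", recovered)
```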

Now we consider the proof of Sylow’s theorem for abelian groups, which runs along similar lines.

The statement is: if $p$ is prime, $p^\alpha|o(G)$ and $p^{\alpha+1}\not|o(G)$, then there is a subgroup of order $p^{\alpha}$ in $G$.

We will again prove by induction. If $p^{i}|o(N)$, where $N$ is a normal subgroup of $G$ and $i\leq \alpha$, then the statement is true. Hence, let $p^{\alpha}|o\left(\frac{G}{N}\right)$. This again makes $\gcd (p^\alpha,o(N))=1$. The rest of the proof is elementary.

Anti-climax: The extension to Sylow’s theorem is incorrect. Please try to determine the flaw yourself.

Hint: the induction hypothesis is “for groups $H$ of order smaller than $o(G)$, if $p^{\alpha}|o(H)$ and $p^{\alpha+1}\not| o(H)$, then there exists an element $h\in H$ such that $h^{p^\alpha}=e$”. Second hint: if $p^{\alpha}\not|o(N)$, then that does not imply $p^{\alpha}|\frac{o(G)}{o(N)}$. Moreover, it is not necessary that $\gcd(o(N),p^\alpha)=1$.