cozilikethinking


Month: November, 2013

A note on points of intersection

An interesting fact I thought about today. Let us suppose we have to determine the points of intersection of the curves given by the Cartesian equations f(x,y)=c and g(x,y)=d. What we generally do is solve f(x,y)-c=g(x,y)-d.

Why? Because the points of intersection (a,b) will, on substitution, make both sides equal. On substitution, the values obtained on both sides will be 0.

But does the equation f(x,y)-c=g(x,y)-d only determine the points of intersection of the figures f(x,y)=c and g(x,y)=d? No. It also determines the points of intersection of the figures f(x,y)=c+r and g(x,y)=d+r for every r\in\Bbb{R}.

Shocking, isn’t it? Determining only the points of intersection relevant to r=0 can be done by substitution of the points obtained in the two equations separately.
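
Here is a small sketch of the point above. The curves are my own hypothetical choice (f(x,y)=x^2+y^2 with c=1, and g(x,y)=x with d=0), picked only because the arithmetic stays in whole numbers. Every candidate point satisfies f(x,y)-c=g(x,y)-d, but only the ones with offset r=0 are true intersections.

```python
# Hypothetical example curves (not from any textbook):
# f(x, y) = x^2 + y^2 with c = 1, and g(x, y) = x with d = 0.
def f(x, y):
    return x**2 + y**2

def g(x, y):
    return x

c, d = 1, 0

def on_combined_curve(x, y):
    """True when (x, y) satisfies f(x, y) - c == g(x, y) - d."""
    return f(x, y) - c == g(x, y) - d

def offset_r(x, y):
    """The common offset r, so that f = c + r and g = d + r at this point."""
    return f(x, y) - c

# Four points that all satisfy the combined equation.
candidates = [(0, 1), (0, -1), (1, 1), (1, -1)]
assert all(on_combined_curve(x, y) for x, y in candidates)

# Substituting back into the two equations separately keeps only r = 0.
true_intersections = [p for p in candidates if offset_r(*p) == 0]
```

The points (1,1) and (1,-1) lie on f(x,y)=2 and g(x,y)=1, which is the case r=1, so they are filtered out.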

Directional derivative: Better explained than in Serge Lang’s book

This is an attempt to explain directional derivatives better than how it is explained in Serge Lang’s seminal book “A Second Course in Calculus”.

The directional derivative of f(X) is (grad f(X)).A, where A is a unit vector in the direction that we’re interested in.

Let us suppose we need to find the derivative of function f:\Bbb{R}^n\to \Bbb{R} along the direction X+tR, where points X,R\in \Bbb{R}^n and t\in \Bbb{R} is the parameter. The derivative will obviously be \lim\limits_{\|tR\|\to 0}\frac{f(X+tR)-f(X)}{\|tR\|}.

Lang’s book mentions it as \frac{d f(X(t))}{dt}, where X(t) is obviously X+tR. This is theoretically an inaccurate assertion. But we will now work out why the formula of the directional derivative comes out to be the same.

Let us determine why \frac{d f(X(t))}{dt}=\lim\limits_{t\to 0}\frac{f(X+tR)-f(X)}{t}.

We know that f(X+tR)-f(X)=(D_1(s)\times\|R\|\times t)+\|tR\|g(X,R,t) such that \lim\limits_{t\to 0}g(X,R,t)=0. Here, s\in (X,X+tR) (the Mean Value Theorem has been used here). Note that \lim\limits_{t\to 0}D_1(s)=\frac{\partial f}{\partial X}.

Although the above formula is valid, it is not very helpful, as determining \frac{\partial f}{\partial X} would be difficult. What should we do now?

Let X=(x_1,x_2,x_3,\dots,x_n) and R=(r_1,r_2,r_3,\dots,r_n). Then f(X+tR)-f(X)=D_1(X).r_1.t+D_2(X).r_2.t+\dots+D_n(X).r_n.t+\|tR\|g(X,R,t), where \lim\limits_{t\to 0}g(X,R,t)=0. When we divide by t and let t\to 0, the last term vanishes because \frac{\|tR\|}{t}=\pm\|R\| stays bounded. Hence, \lim\limits_{t\to 0}\frac{f(X+tR)-f(X)}{t}=( grad f(X)).R. Divide both sides by \|R\| to get a unit vector in the direction of R. You get the required formula.

Summing up the above argument, if R is the required direction in which you want to determine the slope, \lim\limits_{\|tR\|\to 0}\frac{f(X+tR)-f(X)}{\|tR\|} is indeed what you’re looking for! Your intuition was correct. The world is happy and pink again. For calculational purposes, we use \frac{1}{\|R\|}( grad f(X)).R, as \lim\limits_{\|tR\|\to 0}\frac{f(X+tR)-f(X)}{\|tR\|}=\frac{1}{\|R\|}( grad f(X)).R for t>0.
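
A quick numerical sanity check of the formula \frac{1}{\|R\|}( grad f(X)).R may help. This is only a sketch: the function f(x,y)=x^2+3y, the point X=(1,2) and the direction R=(3,4) are my own hypothetical choices, and the gradient is typed in by hand rather than computed.

```python
import math

def f(x, y):
    # Hypothetical test function, chosen so the gradient is easy to write down.
    return x**2 + 3*y

def grad_f(x, y):
    # Hand-computed gradient of the function above.
    return (2*x, 3.0)

def difference_quotient(X, R, t):
    """(f(X + tR) - f(X)) / ||tR|| for a small t > 0."""
    moved = (X[0] + t*R[0], X[1] + t*R[1])
    return (f(*moved) - f(*X)) / (abs(t) * math.hypot(*R))

X = (1.0, 2.0)
R = (3.0, 4.0)

gx, gy = grad_f(*X)
formula = (gx*R[0] + gy*R[1]) / math.hypot(*R)   # (2*3 + 3*4) / 5
numeric = difference_quotient(X, R, 1e-6)

print(formula, numeric)
```

The two numbers should agree to several decimal places; the small gap comes from the \|tR\|g(X,R,t) term, which only disappears in the limit.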

A note on the gradient of a function.

I want to insert a note on grad f(X) (gradient of f(X)).

1. It is not perpendicular to everything in the surface. Most proofs only go as far as to prove it is perpendicular to differentiable parameterized curves lying in the surface. Nothing more. Stop reading too deeply into it.

2. It is mostly useful for finding perpendiculars to all parameterized curves, rather than the parameterized curves themselves. The tangent is an exception, as the normal vector to a straight line can easily be used to find the straight line. For example, if we know that a\overline{i}+b\overline{j} is perpendicular to a straight line passing through point P, we can easily determine the straight line. Non-straight line curves do not in general lend themselves to such determination with knowledge of just a perpendicular vector and a point through which the curve passes.
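
To make the straight-line case concrete, here is a worked example. The circle and the point P=(3,4) are my own choice for illustration.

```latex
% Level curve: f(x,y) = x^2 + y^2 = 25, point P = (3,4) on it.
\[
  \operatorname{grad} f(x,y) = (2x,\,2y)
  \quad\Longrightarrow\quad
  \operatorname{grad} f(P) = (6,\,8) = 6\overline{i} + 8\overline{j}.
\]
% The tangent at P is the line through P perpendicular to grad f(P):
\[
  6(x-3) + 8(y-4) = 0
  \quad\Longleftrightarrow\quad
  3x + 4y = 25 .
\]
```

The perpendicular vector and the point P pin down the tangent line completely, but they tell us nothing about which of the many curves through P with that tangent we started from.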

The chain rule in multi-variable calculus: Generalized

Now we’ll discuss the chain rule for n-nested functions. For example, an n-nested function would be g=f_1(f_2(\dots(f_n(t))\dots)). What would \frac{\partial g}{\partial t} be?

We know that

g(t+h)-g(t)=\frac{f_1(f_2(\dots(f_n(t+h))\dots))-f_1(f_2(\dots(f_n(t))\dots))}{f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)}.\left[f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)\right], provided the denominator is non-zero.

If every f_i is differentiable (so that f_2(\dots(f_n(t))\dots) is in particular continuous in t), then

g(t+h)-g(t)=\frac{\partial f_1}{\partial f_2}.\left[f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)\right]+g_1, where the error term g_1 is negligible compared with the difference in square brackets as that difference tends to 0. As the difference is itself of the order of h, this gives \lim\limits_{h\to 0}\frac{g_1}{h}=0.

In turn

f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)=\frac{\partial f_2}{\partial f_3}.\left[f_3(\dots(f_n(t+h))\dots)-f_3(\dots(f_n(t))\dots)\right]+g_2

such that \lim\limits_{h\to 0}\frac{g_2}{h}=0.

Hence, we have

g(t+h)-g(t)=\frac{\partial f_1}{\partial f_2}.(\frac{\partial f_2}{\partial f_3}.\left[f_3(\dots(f_n(t+h))\dots)-f_3(\dots(f_n(t))\dots)\right]+g_2)+g_1

Continuing like this, we get the formula

g(t+h)-g(t)=\frac{\partial f_1}{\partial f_2}.(\frac{\partial f_2}{\partial f_3}.(\dots(\frac{\partial f_n}{\partial t}.h+g_n)+g_{n-1}\dots)+g_2)+g_1

such that \lim\limits_{h\to 0}\frac{g_i}{h}=0 for all i\in \{1,2,3,\dots,n\}.

Dividing the above formula by h and letting h\to 0, every term containing a g_i vanishes, and we get

\lim\limits_{h\to 0}\frac{g(t+h)-g(t)}{h}=\frac{\partial f_1}{\partial f_2}.\frac{\partial f_2}{\partial f_3}.\dots\frac{\partial f_n}{\partial t}
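
The product formula can be checked numerically. The sketch below uses a hypothetical 3-nested function of my own choosing, g(t)=\sin(e^{t^2}), so f_1=\sin, f_2=\exp and f_3 is squaring; the derivatives are entered by hand.

```python
import math

# f_1, f_2, f_3 (outermost first) and their hand-written derivatives.
fs  = [math.sin, math.exp, lambda t: t**2]
dfs = [math.cos, math.exp, lambda t: 2*t]

def compose(t):
    """g(t) = f_1(f_2(f_3(t))): apply the innermost function first."""
    for fn in reversed(fs):
        t = fn(t)
    return t

def chain_rule(t):
    """Product of the derivatives, each evaluated at the matching inner value."""
    product = 1.0
    value = t
    for fn, dfn in zip(reversed(fs), reversed(dfs)):
        product *= dfn(value)   # derivative of this layer at its input
        value = fn(value)       # move one layer outward
    return product

t0, h = 0.5, 1e-6
difference_quotient = (compose(t0 + h) - compose(t0)) / h
print(chain_rule(t0), difference_quotient)
```

The difference quotient uses a small but non-zero h, so it matches the product only approximately.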

Multi-variable differentiation.

There are very many bad books on multivariable calculus. “A Second Course in Calculus” by Serge Lang is the rare good book in this area. Succinct, thorough, and rigorous. This is an attempt to re-create some of the more orgasmic portions of the book.

In \Bbb{R}^n space, should differentiation be defined as \lim\limits_{H\to 0}\frac{f(X+H)-f(X)}{H}? No, as division by a vector (H) is not defined. Then \lim\limits_{\|H\|\to 0}\frac{f(X+H)-f(X)}{\|H\|}? We’re not sure. Let us see how it goes.

Something that is easy to define is f(X+H)-f(X), which can be written as

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n) (H is the n-tuple (h_1,h_2,\dots,h_n)).

This expression in turn can be written as

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\left[f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2+h_2,\dots,x_n+h_n)\right]\\+\left[f(x_1,x_2+h_2,x_3+h_3,\dots,x_n+h_n)-f(x_1,x_2,x_3+h_3,\dots,x_n+h_n)\right]+\dots+\left[f(x_1,x_2,\dots,x_{n-1},x_n+h_n)-f(x_1,x_2,\dots,x_{n-1},x_n)\right].

Here, we can use the Mean Value Theorem. Let us suppose s_1\in((x_1+h_1,x_2+h_2,\dots,x_n+h_n),(x_1,x_2+h_2,\dots,x_n+h_n)),

or in general

s_k\in((x_1,x_2,\dots,x_k+h_k,\dots,x_n+h_n),(x_1,x_2,\dots,x_k,\dots,x_n+h_n)). Then

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\\ \displaystyle{\sum\limits_{k=1}^n{D_{x_k}(x_1,x_2,\dots,s_k,\dots,x_n+h_n).h_k}}, where h_k is the only non-zero coordinate of (x_1,x_2,\dots,x_k+h_k,\dots,x_n+h_n)-(x_1,x_2,\dots,x_k,\dots,x_n+h_n).

No correction factor. Just this.
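
A two-variable worked example shows that the intermediate points really exist and that nothing is left over. The function f(x,y)=x^2y is my own choice for illustration.

```latex
% Telescoping the increment of f(x,y) = x^2 y one coordinate at a time:
\begin{align*}
  f(x+h_1,\,y+h_2) - f(x,\,y)
    &= \bigl[(x+h_1)^2 (y+h_2) - x^2 (y+h_2)\bigr]
     + \bigl[x^2 (y+h_2) - x^2 y\bigr] \\
    &= 2\Bigl(x + \tfrac{h_1}{2}\Bigr)(y+h_2)\, h_1 \;+\; x^2\, h_2 .
\end{align*}
% Since D_x f = 2xy and D_y f = x^2, this is exactly
%   D_x f(s_1, y+h_2) h_1 + D_y f(x, s_2) h_2
% with s_1 = x + h_1/2 between x and x+h_1, and any s_2 between y and y+h_2.
```

Both brackets are reproduced exactly by a partial derivative at an intermediate point, which is what the Mean Value Theorem promises.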

What follows is that a function

g_k=D_{x_k}(x_1,x_2,\dots,s_k,\dots,x_n+h_n)-D_{x_k}(x_1,x_2,\dots,x_k,\dots,x_n)

is assigned for every k\in\{1,2,3,\dots,n\}.

Hence, the expression becomes

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\sum\limits_{k=1}^n {\left[D_{x_k}(x_1,x_2,\dots,x_n)+g_k\right].h_k}

It is easy to determine that \lim\limits_{H\to 0}g_k=0, provided the partial derivatives D_{x_k} are continuous at (x_1,x_2,\dots,x_n).

The more interesting question to ask here is: why did we use the Mean Value Theorem? Why could we not have used the formula f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)\\=\sum\limits_{k=1}^n {\left[D_{x_k}(x_1,x_2,\dots,x_k,\dots,x_n+h_n)+g_k(x_1,x_2,\dots,x_k,\dots,x_n+h_n,h_k)\right].h_k},

where \lim\limits_{h_k\to 0}g_k(x_1,x_2,\dots,x_k,\dots,x_n+h_n,h_k)=0?

This is because g_k(x_1,x_2,\dots,x_k,\dots,x_n+h_n,h_k) may not be defined at the point (x_1,x_2,\dots,x_n). If in fact every g_k is continuous at (x_1,x_2,\dots,x_n), then we wouldn’t have to use the Mean Value Theorem.

Watch this space for some more expositions on this topic.


One passing note as I end this article.

A function is differentiable at X if it can be expressed in this manner: f(X+H)-f(X)=( grad f(X)).H+\|H\|g(X,H) such that \lim\limits_{\|H\|\to 0}g(X,H)=0. This is a necessary and sufficient condition; it is the definition of differentiability. It does not have a derivation. I spent a very long time trying to derive it before realising what a fool I had been.
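
To see the definition at work, here is a worked example with a function of my own choosing, f(x,y)=xy, where the remainder g(X,H) can be written down explicitly.

```latex
% f(x,y) = xy, X = (x,y), H = (h_1,h_2), grad f(X) = (y, x).
\begin{align*}
  f(X+H) - f(X) &= (x+h_1)(y+h_2) - xy \\
                &= \underbrace{y\,h_1 + x\,h_2}_{(\operatorname{grad} f(X))\cdot H}
                   \;+\; h_1 h_2 ,
\end{align*}
% so the remainder is ||H|| g(X,H) = h_1 h_2, that is
\[
  g(X,H) = \frac{h_1 h_2}{\|H\|},
  \qquad
  |g(X,H)| \le \frac{h_1^2 + h_2^2}{2\|H\|} = \frac{\|H\|}{2}
  \;\longrightarrow\; 0 \quad\text{as } \|H\|\to 0 .
\]
```

So f(x,y)=xy satisfies the definition at every point, with the gradient playing exactly the role the formula assigns to it.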

Continuity decoded

The definition of continuity was framed after decades of deliberation and mathematical squabbling. The current definition we have is due to a German mathematician by the name of Weierstrass. It states that

“f:\Bbb{R}\to \Bbb{R} is continuous at point a if and only if for every \epsilon>0, \exists\delta>0 such that |x-a|<\delta\implies |f(x)-f(a)|<\epsilon.”

Now let us try and interpret the statement and break it down into simpler statements, in order to give us a strong visual feel.

Can \epsilon be very large? Of course! It can be 1,000,000 for example. Does there exist a \delta such that |x-a|<\delta\implies |f(x)-f(a)|<1,000,000, even if the function is not continuous? Yes. An example would be

f(x)=x for x\in(-\infty,a) and f(x)=x+1 for x\in[a,\infty)

Does this mean that we have proved a discontinuous function to be continuous? NO.

\epsilon should take up the values of all positive real numbers. The f(x) defined above fails for every \epsilon\leq 1: for x<a we have |f(x)-f(a)|=|x-(a+1)|>1, however small |x-a| is.
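
The failure can be watched numerically. This is a rough sketch, taking a=0 for convenience; the helper delta_works is a hypothetical name of mine, and it only samples finitely many points, so a True answer is evidence rather than proof, while a False answer exhibits a genuine violation.

```python
a = 0.0  # assumed location of the jump, chosen for convenience

def f(x):
    """The step example: f(x) = x left of a, f(x) = x + 1 from a onwards."""
    return x if x < a else x + 1

def delta_works(eps, delta, samples=1000):
    """Check |f(x) - f(a)| < eps at sample points with |x - a| < delta."""
    for i in range(1, samples):
        offset = delta * i / samples
        for x in (a - offset, a + offset):
            if abs(f(x) - f(a)) >= eps:
                return False
    return True

# A huge epsilon is satisfied easily, even though f is discontinuous.
large_eps_ok = delta_works(1_000_000, 1.0)

# For epsilon = 0.5, shrinking delta never helps.
small_eps_results = [delta_works(0.5, 10.0**-k) for k in range(1, 9)]
print(large_eps_ok, small_eps_results)
```

Every sample just to the left of a lands more than 1 away from f(a), so no choice of \delta rescues \epsilon=0.5.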

Let us suppose for some \epsilon>0, we have |f(x)-f(a)|<\epsilon if |x-a|<\delta. Let x_1 and x_2 be two points in B(a,\delta) with f(x_1)\neq f(x_2), so that f(x_1) and f(x_2) are two distinct points in B(f(a),\epsilon). Let us now make \epsilon=\frac{|f(x_1)-f(x_2)|}{2}. Will the value of \delta also have to decrease? Can it in fact increase?

The value of \delta cannot increase because the bigger interval will contain x_1 and x_2. If both f(x_1) and f(x_2) were within the new \epsilon of f(a), the triangle inequality would give |f(x_1)-f(x_2)|<2\epsilon=|f(x_1)-f(x_2)|, which is impossible. Can \delta remain the same? No (for the same reason, as the interval will still contain x_1 and x_2). Hence, \delta most definitely has to decrease in this case.

However, does it always have to decrease? No. An example in case is a constant function like y=b.

We have now come to the most important aspect of continuity. The smaller we make \epsilon, the smaller the value of \delta. Does continuity also imply that the smaller we make \delta, the smaller the value of \epsilon? YES! How? When we decrease \delta, \epsilon obviously can’t get bigger. Moreover, we know that there do exist values of \delta which make smaller \epsilon possible. Say, for |f(x)-f(a)|<\epsilon/2, it is necessary that |x-a|<\delta/5. Hence, if we decrease the radius of the interval on the x-axis from \delta to \delta/5, the value of \epsilon (or the bound of the mappings of the points) also decreases to \epsilon/2.

In summation, a continuous function is such that

           decrease in value of \epsilon\Longleftrightarrow decrease in value of \delta

One may ask: how does knowing this help?

It has become very easy to prove that differentiable functions are continuous, and a host of other properties of continuous functions.

A doubt that one may face here is: does this imply that all continuous functions are differentiable? No. “decrease in value of \epsilon\Longleftrightarrow decrease in value of \delta” only describes how f(x) approaches f(a). In order for a function to be differentiable at a, the difference quotient \frac{f(x)-f(a)}{x-a} must converge to one and the same limit along every sequence of x converging to a. This is not implied by the aforementioned condition.
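
The standard example is f(x)=|x| at a=0 (my choice of illustration, not taken from any particular book). It is continuous there, yet two sequences converging to 0 give two different limits for the difference quotient.

```latex
% f(x) = |x|, a = 0. Continuity: |f(x) - f(0)| = |x| < \epsilon whenever |x| < \delta = \epsilon.
% Difference quotient along x_n = 1/n and along y_n = -1/n:
\[
  \frac{f(x_n) - f(0)}{x_n - 0} = \frac{1/n}{1/n} = 1,
  \qquad
  \frac{f(y_n) - f(0)}{y_n - 0} = \frac{1/n}{-1/n} = -1 .
\]
% The two limits differ, so f is not differentiable at 0.
```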

An attempted generalization of the Third Isomorphism Theorem.

I recently posted this question on math.stackexchange.com. The link is this.

My assertion was “Let G be a group with three normal subgroups K_1,K_2 and H such that K_1,K_2\leq H. Then (G/H)\cong (G/K_1)/(H/K_2). This is a generalization of the Third Isomorphism Theorem, which states that (G/H)\cong (G/K)/(H/K), where K\leq H.”

What was my rationale behind asking this question? Let G be a group and H its normal subgroup. Then G/H contains elements of the form g+H, where g+H=(g+\alpha h)+ H for every h\in H and every \alpha\in \Bbb{Z}.

Now let K_1,K_2 be two normal subgroups of G such that K_1,K_2\leq H. Then G/K_1 contains elements of the form g+K_1 and H/K_2 contains elements of the form h+K_2. Now consider (G/K_1)/(H/K_2). One coset of this would be \{[(g+ all elements of K_1)+(h_1+all elements of K_2)],[(g+ all elements of K_1)+(h_2+all elements of K_2)],\dots,[(g+ all elements of K_1)+(h_{|H/K_2|}+all elements of K_2)]\}. We are effectively adding every element of G/K_1 to all elements of H. The most important thing to note here is that every element of K_1 is also present in H.

Every element of the form (g+ any element in K_1) in G will give the same element in G/K_1, and by extension in (G/K_1)/(H/K_2). Let g and g+h be two elements in G (h\in H) such that both are not in K_1. Then they will not give the same element in G/K_1. However, as every element of H is individually added to them in (G/K_1)/(H/K_2), they will give the same element in the latter. If g and g' form different cosets in G/H, then they will also form different cosets in (G/K_1)/(H/K_2). This led me to conclude that (G/H)\cong (G/K_1)/(H/K_2).

This reasoning is however flawed, mainly because H/K_2 need not be a subgroup of G/K_1. Hence, in spite of heavy intuition into the working of cosets, I got stuck on technicalities.
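
A small concrete case makes the technicality visible. The sketch below uses a hypothetical choice of mine: G=\Bbb{Z}_8 under addition, H=\{0,2,4,6\}, K_1=\{0,4\} and K_2=\{0\}. All subgroups are normal because G is abelian, and K_1,K_2\leq H.

```python
n = 8
G  = set(range(n))      # Z_8 under addition mod 8
H  = {0, 2, 4, 6}
K1 = {0, 4}
K2 = {0}

def cosets(group, subgroup):
    """The set of cosets g + subgroup, each stored as a frozenset."""
    return {frozenset((g + k) % n for k in subgroup) for g in group}

G_mod_H  = cosets(G, H)
G_mod_K1 = cosets(G, K1)
H_mod_K2 = cosets(H, K2)

# Elements of H/K2 are cosets of K2, elements of G/K1 are cosets of K1,
# so H/K2 is not even a subset of G/K1 here.
is_subset = H_mod_K2 <= G_mod_K1

print(len(G_mod_H), len(G_mod_K1), len(H_mod_K2), is_subset)
```

Even the orders refuse to cooperate: G/H has 2 elements, while G/K_1 and H/K_2 both have 4, so no quotient of the one by the other could have order 2.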

Generalizing dual spaces- A study on functionals.

A functional is that which maps a vector space to a scalar field like \Bbb{R} or \Bbb{C}. If X is the vector space under consideration, and f_i:X\to \Bbb{R} (or f_i:X\to\Bbb{C}) are the linear functionals on X, then the vector space \{f_i\} of linear functionals is referred to as the algebraic dual space X^*. Similarly, the vector space of linear functionals f'_i:X^*\to \Bbb{R} (or f'_i:X^*\to\Bbb{C}) is referred to as the second algebraic dual space. It is also referred to as X^{**}.

How should one imagine X^{**}? Imagine a bunch of functionals being mapped to \Bbb{R}. One way to do it is to evaluate all of them at one particular x\in X. Hence, g_x:X^*\to \Bbb{R} such that g_x(f)=f(x). Another such mapping is g_y. When X is finite-dimensional, every element of X^{**} arises in this way, and the vector space X^{**} is isomorphic to X.
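
Here is a minimal sketch of such an evaluation map, assuming X=\Bbb{R}^2 and taking g_x to mean "evaluate the functional at x". The helper names functional and evaluation_at are hypothetical, introduced only for this illustration.

```python
def functional(a, b):
    """The linear functional f(x1, x2) = a*x1 + b*x2 on R^2, an element of X*."""
    return lambda x: a * x[0] + b * x[1]

def evaluation_at(x):
    """g_x, an element of X**: it sends a functional f to the number f(x)."""
    return lambda f: f(x)

f1 = functional(1, 0)
f2 = functional(2, -3)
f_sum = functional(3, -3)    # the functional f1 + f2, coefficients added by hand

x = (4, 5)
g_x = evaluation_at(x)

print(g_x(f1), g_x(f2), g_x(f_sum))
```

Note that g_x respects addition of functionals: its value on f1 + f2 is the sum of its values on f1 and f2, which is why it is a legitimate member of the second dual.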

My book only talks about X, X^* and X^{**}. I shall talk about X^{***}, X^{****}, and X^{**\dots *}. Generalization does indeed help the mind figure out the complete picture.

Say we have X^{n*} (n asterisks). Imagine a mapping X^{n*}\to \Bbb{R}. Under what conditions is this mapping well-defined? When we have only one image for each element of X^{n*}. Notice that each mapping f:X^{n*}\to \Bbb{R} is an element of the vector space X^{(n+1)*}. To make f a well-defined mapping, we select any one element a\in X^{(n-1)*}, and determine the value of each element of X^{n*} at a. One must note here that a is a mapping (a: X^{(n-2)*}\to\Bbb{R}). Which element of X^{(n-2)*} a should map to \Bbb{R} must be stated in advance. Similarly, every element in X^{(n-2)*} is also a mapping, and which element of X^{(n-3)*} it should map must also be pre-stated.

Hence, for every element in X^{n*}, one element each from X^{(n-2)*}, X^{(n-3)*},X^{(n-4)*},\dots ,X should be pre-stated. For every such element in X^{n*}, this (n-2)-tuple can be different. To define a well-defined mapping f:X^{n*}\to \Bbb{R}, we choose one particular element b\in X^{(n-1)*}, and call the mapping f_b. Hence,

f_b(X^{n*})=X^{n*}(b, rest of the  (n-2)-tuple ),

f_c(X^{n*})=X^{n*}(c, rest of the (n-2)-tuple), and so on.

 

By

f_b(X^{n*})=X^{n*}(b, rest of the  (n-2)-tuple),

we mean the value of every element of X^{n*} at (b, rest of the (n-2)-tuple).