2012/06/22

Android

Development environment

1. Use JDK 1.6.
2. Remove the default Run/Debug configuration in the Eclipse project settings.
3. Install the USB driver for the target mobile device and enable USB debugging on the device.

Application components

An Android application is built from application components. An Activity corresponds to one screen of a program; Activities are independent of one another, so one program can, for example, start an Activity belonging to another program. The SDK provides an Activity base class. A Service is a process that runs in the background. A Content provider is the interface to a program's data; other modules access their own or another program's data through Content providers. Broadcast receivers handle broadcasts of system events.

Activities

An Activity's UI is defined in an XML layout file stored as a resource in the application; the setContentView method binds the Activity to that layout. Only one Activity is active at any moment; the system maintains a stack of Activities to track the program flow, so that, for example, pressing Back returns to the previous Activity.
Each Activity must be declared in the application's AndroidManifest.xml; one of them is the main Activity, i.e. the Activity that runs when the user launches the application.
<action android:name="android.intent.action.MAIN" />
An Intent object encapsulates the source of the command that started an Activity; when the system starts an Activity, it passes such an object to it.
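The main-activity declaration mentioned above can be sketched as a manifest fragment like the following (the activity name `.MainActivity` is a placeholder, not from these notes):

```xml
<activity android:name=".MainActivity">
    <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
    </intent-filter>
</activity>
```

The MAIN action marks the entry point, and the LAUNCHER category makes the activity appear in the device's launcher.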

2012/06/20

Beautiful Architecture

Common among the notions of architecture is the idea of structures, each defined by components of various sorts and their relations: how they fit together, invoke each other, communicate, synchronize, and otherwise interact.


Table 1-1. Structure summary
Structure          | Components                 | Relations                                                                    | Concerns
Information Hiding | Information Hiding Modules | Is a part of; Is contained in                                                | Changeability; Modularity; Buildability
Uses               | Programs                   | Uses                                                                         | Producibility; Ecosystem
Process            | Processes (tasks, threads) | Gives work to; Gets resources from; Shares resources with; Contained in; ... | Performance; Changeability; Capacity
Data Access        | Programs and Segments      | Has access to                                                                | Security; Ecosystem


The first step in deciding on an architecture is to work with the stakeholders to understand and prioritize quality concerns and constraints, not to derive the architecture from the functional requirements: many architectures satisfy the same functional requirements, the choice among them hinges on the quality concerns, and not every candidate meets them. For example, there are many architectures for serving web pages: CGI, static pages, ASP, JSP, Python.

Fred Brooks said that conceptual integrity is the most important attribute of an architecture: "It is better to have a system...reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas" (1995).

The first class of evaluation methods determines properties of the architecture, often by modeling or simulation of one or more aspects of the system.

A consistent architecture does not offer two (or more) ways to do the same thing, forcing the user to waste time choosing.

We would look for architectures with a well-defined Uses Structure that would support incremental construction, so that at each iteration of construction we would have a useful, testable system. We would also look for architectures that have well-defined module interfaces and that are inherently testable, so that the construction progress is transparent and visible.

2.2. Design Town

Step one is conceptual design: for example, dividing the system into modules, layering the structure, modeling the business flow, and choosing supporting libraries.

3.2 Project Darkstar

Massively Multiplayer Online games (MMOs) need an architecture that scales; the evolution toward multi-core chips demands concurrency. However:

This context gave us our first goal for the architecture. The requirements for scaling dictated that the system be distributed and concurrent, but we needed to present a much simpler programming model to the game developer. The simple statement of the goal is that the game developer should see the system as a single machine running a single thread, and all of the mechanisms that would allow deployment on multiple threads and multiple machines should be taken care of by the Project Darkstar infrastructure.

The general programming model that Project Darkstar requires is a reactive one, in which the server side of the game is written as a listener for events generated by the clients. When an event is detected, the game server should generate a task, which is a short-lived sequence of computations that includes manipulation of information in the virtual world and communication with the client that generated the original event and possibly with other clients. Tasks can also be generated by the game server itself, either as a response to some internal change or on a periodic, timed basis. In this way, the game server can generate characters in the game or world that aren't controlled by some outside player.

On the server side, Darkstar implements a number of independent services, much like a typical operating system: they give the server side of a game the ability to access persistent storage, schedule and run tasks, and perform communication. Each service's implementation is hidden behind its interface, so the implementation is free to change.

A characteristic of games: there are no complex queries over the data, and a task queries on the order of a dozen objects. About half of the queried objects will be modified by the task.

Each task is wrapped in a transaction. If two tasks try to modify the same piece of data (through the interface the data service provides), the data service arbitrates, and the task whose access fails is rescheduled.
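The arbitration just described can be sketched as optimistic concurrency with retry. This is a minimal illustration with invented names (`DataService`, `run_task`); it is not Darkstar's actual API:

```python
class DataService:
    """Toy data service: every object carries a version number, and a task's
    writes commit only if nothing has changed since the task read its data."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def run_task(self, task, retries=10):
        for _ in range(retries):
            snapshot = dict(self.version)
            writes = task(dict(self.data))        # task computes writes from a copy
            if all(self.version[k] == snapshot[k] for k in writes):
                for k, v in writes.items():       # no conflict: commit the writes
                    self.data[k] = v
                    self.version[k] += 1
                return True
            # conflict: another task committed first; retry (i.e. "reschedule")
        return False
```

A task that loses the race simply runs again against the fresh data, which is the "access fails, task is rescheduled" behavior described above.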

The Session Service manages client login, authentication, and session establishment. Once a session is established, the messages a client sends to the server are parsed by the Session Service, which decides what tasks to generate. The Session Service guarantees that messages from the same client are processed in order, so the Task Service need not concern itself with task ordering.
The session hides the underlying communication mechanism, so the actual connection behind a session can be redirected and moved to a different server.

A rather advanced point: all tasks are compiled to Java VM bytecodes, so they can run in any environment that satisfies the requirements.

Tasks are serialized, so tasks lost when a server crashes can be recovered and rescheduled on another server; a crash is thereby turned into latency.

All data that lasts longer than a particular task is stored persistently. The read/write latency this design incurs can be optimized with lower-level caching techniques.

The service structure lets the server side monitor communication, so the sessions of clients that interact with each other frequently can be moved to the same server for better efficiency.

4. Making Memories

As we look at each of these facets, keep in mind that they are different ways of looking at the overall system. For instance, we used a modular architecture to support different deployment scenarios. At the same time, each module is built in a layered architecture. These are orthogonal but intersecting concerns. Each set of modules follows the same layering, and each layer is found across all the modules.

5. Resource-Oriented Architectures

What the business really wants are easier ways to manage the data they have collected, build upon it, and reuse it to support their customers and core functions.





2012/06/16

Moment Generating Function

The moment generating function of an r.v. $X$ is a parameterized expectation: a parameterized function $f_t(X)=e^{tX}$ maps $X$ to another r.v., and we take its expectation: $$\phi(t)=E(f_t(X))=E(e^{tX})$$ From this one can derive: $$\phi^{(n)}(0)=E(X^n)$$ So computing a moment generating function is just computing an expectation; the result is an expectation parameterized by $t$.
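As a numeric sanity check of $\phi^{(n)}(0)=E(X^n)$, here is a small Python sketch for a discrete r.v. (the helper names are mine): it estimates derivatives of the MGF at 0 by central differences and compares them with the raw moments.

```python
import math

def mgf(t, pmf):
    """phi(t) = E[e^{tX}] for a discrete r.v. given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def nth_derivative_at_zero(f, n, h=1e-3):
    """Central-difference estimate of the n-th derivative of f at 0."""
    return sum((-1) ** k * math.comb(n, k) * f((n / 2 - k) * h)
               for k in range(n + 1)) / h ** n

fair_coin = {0: 0.5, 1: 0.5}            # E[X] = 0.5 and E[X^2] = 0.5
phi = lambda t: mgf(t, fair_coin)       # phi(t) = (1 + e^t) / 2
```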

2012/06/15

Book Highlight - I

Quantum

Einstein, Bohr, And The Great Debate About the Nature of Reality



In his quest for a new mechanics for the quantised world of the atom, Heisenberg concentrated on the frequencies and relative intensities of the spectral lines produced when an electron instantaneously jumped from one energy level to another. He had no other choice; it was the only available data about what was happening inside an atom. Despite the imagery conjured up by all the talk of quantum jumps and leaps, an electron did not 'jump' through space as it moved between energy levels like a boy jumping off a wall onto the pavement below. It was simply in one place and an instant later it popped up in another without being anywhere in between. Heisenberg accepted that all observables, or anything connected with them, were associated with the mystery and magic of the quantum jump of an electron between two energy levels. Lost forever was the picturesque miniature solar system in which each electron orbited a nuclear sun.

On the pollen-free haven of Helgoland, Heisenberg devised a method of book-keeping to track all possible electron jumps, or transitions, that could occur between the different energy levels of hydrogen. The only way he could think of recording each observable quantity, associated with a unique pair of energy levels, was to use an array: $$(v_{ij})_{m\times n}$$ This was the array for the entire set of possible frequencies of the spectral lines that could theoretically be emitted by an electron when it jumps between two different energy levels. If an electron quantum jumps from the energy level $E_2$ to the lower energy level $E_1$, a spectral line is emitted with a frequency designated by $v_{21}$ in the array. The spectral line of frequency $v_{12}$ would only be found in the absorption spectrum, since it is associated with an electron in energy level $E_1$ absorbing a quantum of energy sufficient to jump to energy level $E_2$. A spectral line of frequency $v_{mn}$ would be emitted when an electron jumps between any two levels whose energies are $E_m$ and $E_n$, where $m$ is greater than $n$. Not all the frequencies $v_{mn}$ are actually observed. For example, measurement of $v_{11}$ is impossible, since it would be the frequency of the spectral line emitted in a 'transition' from energy level $E_1$ to energy level $E_1$, a physical impossibility. Hence $v_{11}$ is zero, as are all potential frequencies when $m=n$. The collection of all non-zero frequencies, $v_{mn}$, would be the lines actually present in the emission spectrum of a particular element.

Another array could be formed from the calculation of transition rates between the various energy levels. If the probability for a particular transition, $a_{mn}$, from energy level $E_m$ to $E_n$, is high, then the transition is more likely than one with a lower probability. The resulting spectral line with frequency $v_{mn}$ would be more intense than for the less probable transition. Heisenberg realised that the transition probabilities $a_{mn}$ and the frequencies $v_{mn}$ could, after some deft theoretical manipulation, lead to a quantum counterpart for each observable quantity known in Newtonian mechanics such as position and momentum.




Schrödinger finally proposed that the wave function of an electron, for example, was intimately connected to the cloud-like distribution of its electric charge as it traveled through space. In wave mechanics the wave function was not a quantity that could be directly measured because it was what mathematicians call a complex number. 4+3i is one example of such a number, and it consists of two parts: one 'real' and the other 'imaginary'. 4 is an ordinary number and is the 'real' part of the complex number 4+3i. The 'imaginary' part, 3i, has no physical meaning because i is the square root of -1. The square root of a number is just another number that multiplied by itself will give the original number. The square root of 4 is 2 since 2×2 equals 4. There is no number that multiplied by itself equals -1. While 1×1=1, (-1)×(-1) is also equal to 1, since by the laws of algebra, a minus times a minus generates a plus.

The wave function was unobservable; it was something intangible that could not be measured. However, the squared modulus of a complex number is a real number that is associated with something that can actually be measured in the laboratory. The squared modulus of 4+3i is 25. Schrödinger believed that the squared modulus of the wave function of an electron, $|\Psi(x,t)|^2$, was a measure of the smeared-out density of electric charge at location x at time t.

As part of his interpretation of the wave function, Schrödinger introduced the concept of a 'wave packet' to represent the electron as he challenged the very idea that particles existed. He argued that an electron only 'appeared' to be particle-like but was not actually a particle, despite the overwhelming experimental evidence in favour of it being so. Schrödinger believed that a particle-like electron was an illusion. In reality there were only waves. Any manifestation of a particle electron was due to a group of matter waves being superimposed into a wave packet. An electron in motion would then be nothing more than a wave packet that moved like a pulse sent, with a flick of the wrist, travelling down the length of a taut rope tied at one end and held at the other. A wave packet that gave the appearance of a particle required a collection of waves of different wavelengths that interfered with one another in such a way that they cancelled each other out beyond the wave packet.

If giving up particles and reducing everything to waves rid physics of discontinuity and quantum jumps, then for Schrödinger it was a price worth paying. However, his interpretation soon ran into difficulties as it failed to make physical sense. Firstly, the wave packet representation of the electron began to unravel when it was discovered that the constituent waves would spread out across space to such a degree that they would have to travel faster than the speed of light if they were to be connected with the detection of a particle-like electron in an experiment.

Try as he might, there was no way for Schrödinger to prevent this dispersal of the wave packet. Since it was made up of waves that varied in wavelength and frequency, as the wave packet traveled through space it would soon begin to spread out as individual waves moved at different velocities. An almost instantaneous coming together, a localisation at one point in space, would have to take place every time an electron was detected as a particle. Secondly, when attempts were made to apply the wave equation to helium and other atoms, Schrödinger's vision of the reality that lay beneath his mathematics disappeared into an abstract, multi-dimensional space that was impossible to visualize.

Nor could Schrödinger's interpretation account for the photoelectric and Compton effects. There were unanswered questions: how could a wave packet possess electric charge? Could wave mechanics incorporate quantum spin? If Schrödinger's wave function did not represent real waves in everyday three-dimensional space, then what were they? It was Max Born who provided the answer.




The wave function itself has no physical reality; it exists in the mysterious, ghost-like realm of the possible. It deals with abstract possibilities, like all the angles by which an electron could be scattered following a collision with an atom. There is a real world of difference between the possible and the probable. Born argued that the square of the wave function, a real rather than a complex number, inhabits the world of the probable. Squaring the wave function, for example, does not give the actual position of an electron, only the probability, the odds that it will be found here rather than there. For example, if the value of the wave function of an electron at X is double its value at Y, then the probability of it being found at X is four times greater than the probability of finding it at Y. The electron could be found at X, Y or somewhere else.

Niels Bohr would soon argue that until an observation or measurement is made, a microphysical object like an electron does not exist anywhere. Between one measurement and the next it has no existence outside the abstract possibilities of the wave function. It is only when an observation or measurement is made that the 'wave function collapses' as one of the 'possible' states of the electron becomes the 'actual' state and the probability of all the other possibilities becomes zero.

For Born, Schrödinger's equation described a probability wave. There were no real electron waves, only abstract waves of probability. 'From the point of view of our quantum mechanics there exists no quantity which in an individual case causally determines the effect of a collision', wrote Born. And he confessed, 'I myself tend to give up determinism in the atomic world.' Yet while the 'motion of particles follows probability rules', he pointed out, 'probability itself propagates according to the law of causality'.

It took Born the time between his two papers to fully grasp that he had introduced a new kind of probability into physics. 'Quantum probability', for want of a better term, was not the classical probability of ignorance that could in theory be eliminated. It was an inherent feature of atomic reality. For example, the fact that it was impossible to predict when an individual atom would decay in a radioactive sample, amid the certainty that one would do so, was not due to a lack of knowledge but was the result of the probabilistic nature of the quantum rules that dictate radioactive decay.




According to the uncertainty principle, a measurement that yields an exact value for the momentum of a microphysical object or system excludes even the possibility of simultaneously measuring its position. The question that Einstein wanted to answer was: Does the inability to measure its exact position directly mean that the electron does not have a definite position? The Copenhagen interpretation answered that in the absence of a measurement to determine its position, the electron has no position. EPR set out to demonstrate that there are elements of physical reality, such as an electron having a definite position, that quantum mechanics cannot accommodate – and therefore, it is incomplete.

EPR attempted to clinch their argument with a thought experiment. Two particles, A and B, interact briefly and then move off in opposite directions. The uncertainty principle forbids the exact measurement, at any given instant, of both the position and the momentum of either particle. However, it does allow an exact and simultaneous measurement of the total momentum of the two particles, A and B, and the relative distance between them.

The key to the EPR thought experiment is to leave particle B undisturbed by avoiding any direct observation of it. Even if A and B are light years apart, nothing within the mathematical structure of quantum mechanics prohibits a measurement of the momentum of A yielding information about the exact momentum of B without B being disturbed in the process. When the momentum of particle A is measured exactly, it indirectly but simultaneously allows, via the law of conservation of momentum, an exact determination of the momentum of B. Therefore, according to the EPR criterion of reality, the momentum of B must be an element of physical reality. Similarly, by measuring the exact position of A, it is possible, because the physical distance separating A and B is known, to deduce the position of B without directly measuring it. Hence, EPR argue, it too must be an element of physical reality. EPR appeared to have contrived a means to establish with certainty the exact values of either the momentum or the position of B due to measurements performed on particle A, without the slightest possibility of particle B being physically disturbed.

Given their reality criterion, EPR argued that they had thus proved that both the momentum and position of particle B are 'elements of reality', that B can have simultaneously exact values of position and momentum. Since quantum mechanics via the uncertainty principle rules out any possibility of a particle simultaneously possessing both these properties, these 'elements of reality' have no counterparts in the theory. Therefore the quantum mechanical description of physical reality, EPR conclude, is incomplete.

Einstein, Podolsky and Rosen had conjured up an imaginary experiment that involved a pair of correlated particles, A and B, so far apart that it should be impossible for them to physically interact with one another. EPR argued that a measurement carried out on particle A could not physically disturb particle B. Since any measurement is performed on only one of the particles, EPR believed they could cut off Bohr's counter-attack – an act of measurement causes a 'physical disturbance'. Since the properties of the two particles are correlated, they argued that by measuring a property of particle A, such as its position, it is possible to know the corresponding property of B without disturbing it. EPR's aim was to demonstrate that particle B possessed the property independently of being measured, and since this was something that quantum mechanics failed to describe, it was therefore incomplete. Bohr countered, never so succinctly, that the pair of particles were entangled and formed a single system no matter how far apart they were. Therefore, if you measured one, then you also measured the other.

The Copenhagen interpretation requires an observer outside the universe to observe it, but since there is none – leaving God aside – the universe should never come into existence but remain forever in a superposition of many possibilities. This is the long-standing measurement problem writ large. Schrödinger's equation that describes quantum reality as a superposition of possibilities, and attaches a range of probabilities to each possibility, does not include the act of measurement. There are no observers in the mathematics of quantum mechanics. The theory says nothing about the collapse of the wave function, the sudden and discontinuous change of the state of a quantum system upon observation or measurement, when the possible becomes the actual. In Everett's many worlds interpretation there was no need for an observation or measurement to collapse the wave function, since each and every quantum possibility coexists as an actual reality in an array of parallel universes.

2012/06/13

Match series

Teams A and B play until one of them has won $c$ games; find the expected number of games played.

At most $2(c-1)+1=2c-1$ games are played. A and B are completely symmetric, so consider the case where A wins; the case where B wins follows by exchanging the roles of the two teams.

Let $c\leq k\leq 2c-1$ be the number of games played, given that A wins. The number of possible win/loss sequences is $$N_k = {k\choose c}-{k-1\choose c}={k-1\choose c-1}$$ choose the $c$ games A wins out of $k$, then subtract the sequences in which A had already won within the first $k-1$ games.

Each such sequence has probability $$p^c(1-p)^{k-c}$$ The contribution of the outcomes in which A wins to the expected number of games is $$N(p)=\sum_{k=c}^{2c-1}k{k-1\choose c-1}p^c(1-p)^{k-c}$$ By symmetry, the contribution of the outcomes in which B wins is $N(1-p)$, so $$N=N(p)+N(1-p)$$
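The formula can be checked in Python against brute-force enumeration of every possible play-out of the series (function names are mine):

```python
from math import comb

def expected_games(c, p):
    """Expected number of games via the partial-expectation formula N(p)+N(1-p)."""
    def n(q):  # contribution of the outcomes in which the side with win prob. q wins
        return sum(k * comb(k - 1, c - 1) * q ** c * (1 - q) ** (k - c)
                   for k in range(c, 2 * c))
    return n(p) + n(1 - p)

def expected_games_bruteforce(c, p):
    """Enumerate every play-out of the series and sum length * probability."""
    def rec(a_wins, b_wins, prob, length):
        if a_wins == c or b_wins == c:
            return prob * length
        return (rec(a_wins + 1, b_wins, prob * p, length + 1)
                + rec(a_wins, b_wins + 1, prob * (1 - p), length + 1))
    return rec(0, 0, 1.0, 0)
```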

Set and Function

Mappings from the viewpoint of sets



A set can be viewed as the collection of all elements having some property. For two sets $A, B$, the intersection is the set of elements that the two properties cannot distinguish: for example, if $A$ is the set of elements that have some red color and $B$ the set of elements that have some black color, then $A\cap B$ can contain elements that are partly red and partly black. The two color properties fail to distinguish some elements.

Elements indistinguishable before the mapping: $A\cap B$

Elements indistinguishable after the mapping: $f(A)\cap f(B)$

The fate, after mapping, of the elements indistinguishable before it: $f(A\cap B)$

1) No mapping adds information; after the map, $A\cap B$ must still lie inside $f(A)\cap f(B)$: $$f(A\cap B)\subseteq f(A)\cap f(B)$$ For if some element were in $f(A\cap B)$ but not in $f(A)\cap f(B)$, it would lie in exactly one of $f(A), f(B)$, and would thus have become distinguishable.

2) An injection loses no information: elements distinguishable before cannot become indistinguishable, so $f(A\cap B)$ contains every element indistinguishable after the map: $$f(A)\cap f(B) \subseteq f(A\cap B)$$ Hence for an injection $$f(A)\cap f(B) = f(A\cap B)$$ 3) The inverse mapping induces an equivalence relation on the domain: points mapped to the same point are considered equivalent. Taking the quotient by this relation, the original map becomes an injection from the quotient space into the codomain. The quotient space absorbs, inside the domain itself, exactly the information the map destroyed, turning the original map into an information-preserving injection.
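A small Python illustration of 1) and 2); the maps `half` and `double` are arbitrary examples of a non-injection and an injection:

```python
def image(f, s):
    """The image f(S) of a set S under a map f."""
    return {f(x) for x in s}

half = lambda x: x // 2     # not injective: 0 and 1 collide, 2 and 3 collide, ...
A, B = {0, 3}, {1, 4}
# A and B are disjoint (fully distinguishable), yet their images collide at 0:
# f(A∩B) = ∅ is a proper subset of f(A)∩f(B) = {0}.

double = lambda x: 2 * x    # injective, so the inclusion becomes equality
C, D = {1, 2, 3}, {2, 3, 4}
```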

For a surjection, the induced map from the quotient space to the codomain is a bijection, and therefore preserves all set operations.

A defining property of equivalence classes: if two classes intersect, they coincide.

2012/06/12

Useful commands

$ \newcommand{\infint}{{\int_{-\infty}^{\infty}}} \newcommand{\ie}{\textit{i.e.}\ } \newcommand{\eg}{\textit{e.g.}\ } \newcommand{\pfrac}[2]{{\frac{\partial #1}{\partial #2}}} \newcommand{\eqn}[1]{Eq.\ (\ref{#1})} $
Putty command line:
start d:\softwares\putty.exe -ssh -pw xxx -P 1234 user@localhost

Run VirtualBox headless:
VBoxHeadless.exe -s Sage-5.0

Sage interactive:

LaTeX newcommand:

$$\infint f(x)dx$$ $$\pfrac{L}{q_n},\quad \pfrac{}{x_i}$$

2012/06/10

Probability computation techniques

1) Use classical distributions. For example, the binomial: to find the probability that exactly $r$ out of $n$ elements have some property, first find the probability that a single element has the property, then apply the distribution formula.

Let $d$ be the dominant gene and $r$ the recessive one; both parents are $rd$. Find the probability that 3 of their 4 children show the dominant trait.

Solution: If we assume that each child is equally likely to inherit either of two genes from each parent, the probabilities that the child of two hybrid parents will have $dd, rr, rd$ pairs of genes are, respectively, $(1/4, 1/4, 1/2)$. Hence, because an offspring will have the outward appearance of the dominant gene if its gene pair is either $dd$ or $rd$, it follows that the number of such children is binomially distributed with parameters $(4, 3/4)$. (Note that the probability that a single child shows the dominant trait has to be computed; it is not given directly.)
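The binomial computation can be written out in Python (`binom_pmf` is a hand-rolled helper, not a library routine):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# A child of two rd hybrids shows the dominant trait with probability 3/4,
# so the answer is P(3 successes in 4 trials) = C(4,3) (3/4)^3 (1/4) = 27/64.
p_three_of_four = binom_pmf(4, 3, 3 / 4)
```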

The binomial distribution in fact describes the probabilities of $n$ (disjoint) subsets of the probability space; the subset asked about in a typical problem, defined by some property, can be expressed as a union of these $n$ subsets.

2) Uniform distribution on an interval: the probability that the r.v. falls in a subinterval equals the ratio of that subinterval's length to the whole interval's.

3) Normal distribution: the density can absorb a linear transformation of $x$ without changing its form $$f(x)=\frac{1}{\sqrt{2\pi}\sigma} \exp{ \left( -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right) }$$

4) When computing the expectation of a function $g(X)$ of an r.v. $X$, $g$ can only be applied to the $x$ inside the integral $E(X)=\int_{-\infty}^{\infty}xf(x)dx$, not to the value of $E(X)$ itself: $$E(g(X))=\int_{-\infty}^{\infty}g(x)f(x)dx$$ Moments use $g(X)=X^n$.

Var uses $g(X)=(X-m)^2$ with $m=E(X)$.

In two dimensions $$E(g(X, Y))=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x, y)f(x, y)dxdy$$ If $g$ is a linear function, it clearly commutes with the integral, so expectation is linear.

5) The distribution function $F(a)$ takes a single bound $a$ as its argument; the joint distribution function $F(a, b)$ takes two bounds $a, b$. To recover a single marginal, pass $+\infty$ as the other bound, e.g. $$F_X(a)=F(a, +\infty)$$ The joint mass function is $p(x, y)=P(X=x, Y=y)$; to get a single r.v.'s marginal, sum over the other one $$p_X(x)=\sum_{y}p(x,y)$$ and in the continuous case, integrate over the other one $$f_X(x)=\int_{-\infty}^{\infty}f(x,y)dy$$ An example:

$$\begin{aligned} E(X+Y)&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x+y)f(x, y)dxdy\\ &=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xf(x, y)dxdy + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}yf(x, y)dxdy\\ &=\int_{-\infty}^{\infty}x\int_{-\infty}^{\infty}f(x, y)dydx + \int_{-\infty}^{\infty}y\int_{-\infty}^{\infty}f(x, y)dxdy\\ &=\int_{-\infty}^{\infty}xf_X(x)dx + \int_{-\infty}^{\infty}yf_Y(y)dy\\ &=E(X) + E(Y) \end{aligned}$$

6) In practice the expectation is an easy parameter to obtain: run many trials of the r.v. and take the average. Some inequalities estimate the probability of certain sets in the probability space as functions of the expectation.

7) Using indicator functions

A key property of an indicator is that its expectation equals the probability of its taking the value 1: $$E(I)=P(I=1)=1-P(I=0)$$ Moreover, squaring an indicator does not change its expectation: $$E(I^2)=1^2\cdot P(I=1) + 0^2\cdot P(I=0)=E(I)$$

There are 25 coupon types, each equally likely to be picked; find the expected number of distinct types among 10 randomly drawn coupons.

An element $\omega$ of the probability space is a combination of 10 coupons. Consider the sets $A_i=\{\omega \mid \omega \text{ contains a type-}i\text{ coupon}\}$ and let $I_i$ be the indicator of $A_i$; the r.v. in the problem can be written $X(\omega)=\sum_iI_i(\omega)$. By linearity of expectation, $$E(X)=E(\sum_iI_i)=\sum_iE(I_i)=\sum_i[1-\left(\frac{24}{25}\right)^{10}]=25[1-\left(\frac{24}{25}\right)^{10}]$$ The conditions exploited: the elements of the probability space are sets, the r.v. is a count inside such a set, and a count decomposes into indicators.
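The indicator computation can be cross-checked in Python against brute-force enumeration of a small instance (function names are mine):

```python
from fractions import Fraction
from itertools import product

def expected_distinct(types, draws):
    """E[distinct types] via indicators: types * (1 - P(a given type is never drawn))."""
    miss = Fraction(types - 1, types) ** draws
    return types * (1 - miss)

def expected_distinct_bruteforce(types, draws):
    """Average the count of distinct types over all equally likely ordered draws."""
    total = sum(len(set(seq)) for seq in product(range(types), repeat=draws))
    return Fraction(total, types ** draws)
```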

8) The density function and the distribution function both express the probability measure of preimages of an r.v. Two r.v.s are independent, by definition, when any pair of their preimages, as subsets of the probability space, are independent. The independence of subsets is defined through the probability of their intersection, and the joint distribution is precisely what assigns probabilities to intersections of preimages.

Independent r.v.s correspond to a product decomposition inside the probability space: the space can in fact be expressed as the product of two probability spaces.

A random variable corresponds to one dimension of the probability space (measuring one attribute of a sample); when many attributes are measured, the independent ones correspond to a product structure inside the space.

9) Probability and counting:
i) If the problem involves $N$ equally likely combinations, each with probability $p=\frac{1}{N}$, the probability of a set of combinations can be found in two equivalent ways: 1. direct counting: count the $M$ combinations satisfying the condition, giving probability $\frac{M}{N}$; 2. knowing each combination has probability $p=\frac{1}{N}$ and that $M$ combinations qualify, simply add up the $M$ probabilities to get $M\frac{1}{N}$.
ii) Sometimes the qualifying combinations do not all have the same probability; then compute the probability and the count of each kind of combination separately and add at the end. For example, toss a coin until a head appears, and find the probability that the number of tosses is less than 5.
iii) The qualifying elements may also split into several groups, with equal probabilities inside a group but different probabilities across groups. The computation is like ii): count within each group, get the group's per-element probability and hence the probability of the subset the group forms, then add over all the groups, as in the match-series example in these notes.

10) Probability, sums, and integrals

A sum or an integral has two parts: first, the range over which the variable (or the index) runs; second, the function of that constrained variable, summed over all of its distinct admissible values.

The two parts can be varied independently, as long as the effective range and the values of the function are unchanged.

11) Conditioning. With an indicator, the probability $P$ of a set can be tied to the expectation of the set's indicator r.v. (and further, their conditional expectation and conditional probability also agree), so probabilities can be computed by conditioning. Example: let $X,Y$ be two independent r.v.s and compute $P\{X < Y\}$: $$P\{X < Y\}=\int_{-\infty}^{\infty}P\{ X < Y|Y=y \}f_Y(y)dy$$ The integral on the right-hand side comes from taking the expectation.
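A numeric check of this conditioning formula for a concrete case added here for illustration: independent exponential r.v.s with rates $a$ and $b$, where $P\{X<Y|Y=y\}=F_X(y)=1-e^{-ay}$ and the closed form is $a/(a+b)$.

```python
import math

def p_x_less_y(a, b, upper=50.0, steps=200_000):
    """Midpoint-rule integral of P{X<Y | Y=y} f_Y(y) dy over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        f_y = b * math.exp(-b * y)              # density of Y ~ Exp(b)
        total += (1 - math.exp(-a * y)) * f_y * h
    return total
```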

2012/06/09

Patterns in Permutations and Words

Chapter 1. What is a pattern?

Roughly speaking, for our purposes, patterns in permutations are permutations with additional restrictions, and patterns in words are certain restricted words (possibly permutations).

Reduced form of a permutation: rename the elements of an $r$-permutation so that they form a permutation of $\{1,\cdots,r\}$ with the same relative order. For example: $$Red(3174)=2143$$ Reverse of a permutation: the reverse of $\sigma=\sigma_1\sigma_2\cdots\sigma_n$ is $r(\sigma)=\sigma_n\sigma_{n-1}\cdots\sigma_1$.
Complement of a permutation: replace the largest entry of $\sigma=\sigma_1\sigma_2\cdots\sigma_n$ with the smallest element of $\{1,\cdots,n\}$, the second largest with the second smallest, and so on; written $c(\sigma)$.
Inverse of a permutation: the $\sigma_i$-th position of the inverse $i(\sigma)$ is occupied by $i$.

These three transformations (reverse, complement, inverse) are called the trivial bijections of $S_n$.
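The reduced form and the three trivial bijections can be sketched in Python (function names are mine, not the book's notation):

```python
def red(seq):
    """Reduced form: rename entries to 1..n, preserving relative order."""
    rank = {v: i + 1 for i, v in enumerate(sorted(seq))}
    return [rank[v] for v in seq]

def reverse(p):
    """Reverse of a permutation."""
    return p[::-1]

def complement(p):
    """Complement: largest entry <-> smallest, second largest <-> second smallest, ..."""
    n = len(p)
    return [n + 1 - v for v in p]

def inverse(p):
    """Inverse: the sigma_i-th position of the inverse is occupied by i."""
    inv = [0] * len(p)
    for i, v in enumerate(p, start=1):
        inv[v - 1] = i
    return inv
```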

Permutation matrix: a permutation can be described by which value ($\sigma_i$) occupies the $i$-th position; in matrix form, the columns are indexed by value and the rows by position.

An ascent is a letter of a permutation that is followed by a larger letter; a descent is one followed by a smaller letter. In an $n$-permutation, the number of ascents plus the number of descents is $n-1$. A permutation that begins with an ascent, then has a descent, then an ascent, and so on, is called an alternating permutation.

An interval of a permutation is a factor whose values are consecutive (not necessarily appearing in increasing order); for example, 423 is an interval in 1642375. The case of interest is a permutation in which no proper initial factor is an interval.

Dirac: The Hamiltonian Method

I'm very happy to be here at Yeshiva and to have this chance to talk to you about some mathematical methods that I have been working on for a number of years. I would like first to describe in a few words the general object of these methods.

In atomic theory we have to deal with various fields. There are some fields which are very familiar, like the electromagnetic and the gravitational fields; but in recent times we have a number of other fields also to concern ourselves with, because according to the general ideas of De Broglie and Schrödinger every particle is associated with waves and these waves may be considered as a field. So we have in atomic physics the general problem of setting up a theory of various fields in interaction with each other. We need a theory conforming to the principles of quantum mechanics, but it is quite a difficult matter to get such a theory.

One can get a much simpler theory if one goes over to the corresponding classical mechanics, which is the form which quantum mechanics takes when one makes Planck's constant $\hbar$ tend to zero. It is very much easier to visualize what one is doing in terms of classical mechanics. It will be mainly about classical mechanics that I shall be talking in these lectures.

Now you may think that that is really not good enough, because classical mechanics is not good enough to describe Nature. Nature is described by quantum mechanics. Why should one, therefore, bother so much about classical mechanics? Well, the quantum field theories are, as I said, quite difficult and so far, people have been able to build up quantum field theories only for fairly simple kinds of fields with simple interactions between them. It is quite possible that these simple fields with the simple interactions between them are not adequate for a description of Nature. The successes which we get with quantum field theories are rather limited. One is continually running into difficulties and one would like to broaden one's basis and have some possibility of bringing more general fields into account. For example, one would like to take into account the possibility that Maxwell's equations are not accurately valid. When one goes to distances very close to the charges that are producing the fields, one may have to modify Maxwell's field theory so as to make it into a nonlinear electrodynamics. This is only one example of the kind of generalization which it is profitable to consider in our present state of ignorance of the basic ideas, the basic forces and the basic character of the fields of atomic theory.

In order to be able to start on this problem of dealing with more general fields, we must go over the classical theory. Now, if we can put the classical theory into the Hamiltonian form, then we can always apply certain standard rules so as to get a first approximation to a quantum theory. My talks will be mainly concerned with this problem of putting a general classical theory into the Hamiltonian form. When one has done that, one is well launched onto the path of getting an accurate quantum theory. One has, in any case, a first approximation.

Of course, this work is to be considered as a preliminary piece of work. The final conclusion of this piece of work must be to set up an accurate quantum theory, and that involves quite serious difficulties, difficulties of a fundamental character which people have been worrying over for quite a number of years. Some people are so much impressed by the difficulties of passing over from Hamiltonian classical mechanics to quantum mechanics that they think that maybe the whole method of working from Hamiltonian classical theory is a bad method. Particularly in the last few years people have been trying to set up alternative methods for getting quantum field theories. They have made quite considerable progress on these lines. They have obtained a number of conditions which have to be satisfied. Still I feel that these alternative methods, although they go quite a long way towards accounting for experimental results, will not lead to a final solution to the problem. I feel that there will always be something missing from them which we can only get by working from a Hamiltonian, or maybe from some generalization of the concept of a Hamiltonian. So I take the point of view that the Hamiltonian is really very important for quantum theory.

In fact, without using Hamiltonian methods one cannot solve some of the simplest problems in quantum theory, for example the problem of getting the Balmer formula for hydrogen, which was the very beginning of quantum mechanics. A Hamiltonian comes in therefore in very elementary ways and it seems to me that it is really quite crucial to work from a Hamiltonian; so I want to talk to you about how far one can develop Hamiltonian methods.

I would like to begin in an elementary way and I take as my starting point an action principle. That is to say, I assume that there is an action integral which depends on the motion, such that, when one varies the motion, and puts down the conditions for the action integral to be stationary, one gets the equations of motion. The method of starting from an action principle has the one great advantage, that one can easily make the theory conform to the principle of relativity. We need our atomic theory to conform to relativity because in general we are dealing with particles moving with high velocities.

If we want to bring in the gravitational field, then we have to make our theory conform to the general principle of relativity, which means working with a space-time which is not flat. Now the gravitational field is not very important in atomic physics, because gravitational forces are extremely weak compared with the other kinds of forces which are present in atomic processes, and for practical purposes one can neglect the gravitational field. People have in recent years worked to some extent on bringing the gravitational field into the quantum theory, but I think that the main object of this work was the hope that bringing in the gravitational field might help to solve some of the difficulties. As far as one can see at present, that hope is not realized, and bringing in the gravitational field seems to add to the difficulties rather than remove them. So there is not very much point at present in bringing the gravitational field into atomic theory. However, the methods which I am going to describe are powerful mathematical methods which would be available whether one brings in the gravitational field or not. We start off with an action integral which I denote by $$I=\int L dt \tag{1-1}$$ It is expressed as a time integral, the integrand $L$ being the Lagrangian. So with an action principle we have a Lagrangian. We have to consider how to pass from that Lagrangian to a Hamiltonian. When we have got the Hamiltonian, we have made the first step toward getting a quantum theory.

You might wonder whether one could not take the Hamiltonian as the starting point and short-circuit this work of beginning with an action integral, getting a Lagrangian from it and passing from the Lagrangian to the Hamiltonian. The objection to trying to make this short-circuit is that it is not at all easy to formulate the conditions for a theory to be relativistic in terms of the Hamiltonian. In terms of the action integral, it is very easy to formulate the conditions for the theory to be relativistic: one simply has to require that the action integral shall be invariant. One can easily construct innumerable examples of action integrals which are invariant. They will automatically lead to equations of motion agreeing with relativity, and any developments from this action integral will therefore also be in agreement with relativity.

When we have the Hamiltonian, we can apply a standard method which gives us a first approximation to a quantum theory, and if we are lucky we might be able to go on and get an accurate quantum theory. You might again wonder whether one could not short-circuit that work to some extent. Could one not perhaps pass directly from the Lagrangian to the quantum theory, and short-circuit altogether the Hamiltonian? Well, for some simple examples one can do that. For some of the simple fields which are used in physics the Lagrangian is quadratic in the velocities, and is like the Lagrangian which one has in the non-relativistic dynamics of particles. For these examples for which the Lagrangian is quadratic in the velocities, people have devised some methods for passing directly from the Lagrangian to the quantum theory. Still, this limitation of the Lagrangians being quadratic in the velocities is quite a severe one. I want to avoid this limitation and to work with a Lagrangian which can be quite a general function of the velocities. To get a general formalism which will be applicable, for example, to the non-linear electrodynamics which I mentioned previously, I don't think one can in any way short-circuit the route of starting with an action integral, getting a Lagrangian, passing from the Lagrangian to the Hamiltonian, and then passing from the Hamiltonian to the quantum theory. That is the route which I want to discuss in this course of lectures.

In order to express things in a simple way to begin with, I would like to start with a dynamical theory involving only a finite number of degrees of freedom, such as you are familiar with in particle dynamics. It is then merely a formal matter to pass from this finite number of degrees of freedom to the infinite number of degrees of freedom which we need for a field theory. 

Starting with a finite number of degrees of freedom, we have dynamical coordinates which I denote by $q$. The general one is $q_n$, $n = 1, \cdots, N$, with $N$ being the number of degrees of freedom. Then we have the velocities $dq_n/dt = \dot{q}_n$. The Lagrangian is a function $L = L(q, \dot{q})$ of the coordinates and the velocities.

You may be a little disturbed at this stage by the importance that the time variable plays in the formalism. We have a time variable $t$ occurring already as soon as we introduce the Lagrangian. It occurs again in the velocities, and all the work of passing from Lagrangian to Hamiltonian involves one particular time variable. From the relativistic point of view we are thus singling out one particular observer and making our whole formalism refer to the time for this observer. That, of course, is not really very pleasant to a relativist, who would like to treat all observers on the same footing. However, it is a feature of the present formalism which I do not see how one can avoid if one wants to keep to the generality of allowing the Lagrangian to be any function of the coordinates and velocities. We can be sure that the contents of the theory are relativistic, even though the form of the equations is not manifestly relativistic on account of the appearance of one particular time in a dominant place in the theory.

Let us now develop this Lagrangian dynamics and pass over to Hamiltonian dynamics, following as closely as we can the ideas which one learns about as soon as one deals with dynamics from the point of view of working with general coordinates. We have the Lagrangian equations of motion which follow from the variation of the action integral:  $$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_n}=\frac{\partial L}{\partial q_n} \tag{1-2}$$ To go over to the Hamiltonian formalism, we introduce the momentum variables $p_n$, which are defined by  $$p_n=\frac{\partial L}{\partial \dot{q}_n} \tag{1-3}$$ Now in the usual dynamical theory, one makes the assumption that the momenta are independent functions of the velocities, but that assumption is too restrictive for the applications which we are going to make. We want to allow for the possibility of these momenta not being independent functions of the velocities. In that case, there exist certain relations connecting the momentum variables, of the type $\phi(q,p)=0$
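A minimal illustration of such a relation (my own example, not from the lecture): take a Lagrangian that is singular in the velocities,

```latex
L = \tfrac{1}{2}(\dot q_1 - \dot q_2)^2, \qquad
p_1 = \frac{\partial L}{\partial \dot q_1} = \dot q_1 - \dot q_2, \qquad
p_2 = \frac{\partial L}{\partial \dot q_2} = -(\dot q_1 - \dot q_2).
```

The momenta are not independent functions of the velocities: whatever the motion, $\phi(q, p) = p_1 + p_2 = 0$, a relation of exactly the type just described.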

There may be several independent relations of this type, and if there are, we distinguish them one from another by a suffix $m=1, \cdots, M$, so we have  $$\phi_m(q,p) = 0 \tag{1-4}$$ The $q$'s and the $p$'s are the dynamical variables of the Hamiltonian theory. They are connected by the relations (1-4), which are called the primary constraints of the Hamiltonian formalism. This terminology is due to Bergmann, and I think it is a good one.

Let us now consider the quantity $p_n\dot{q}_n - L$. (Whenever there is a repeated suffix I assume a summation over all values of that suffix.) Let us make variations in the variables $q$ and $\dot{q}$, in the coordinates and the velocities. These variations will cause variations to occur in the momentum variables $p$. As a result of these variations, $$ \begin{eqnarray} \delta(p_n\dot{q}_n - L) &=& \delta p_n\dot{q}_n + p_n\delta \dot{q}_n -\left(\frac{\partial L}{\partial q_n}\right)\delta q_n - \left(\frac{\partial L}{\partial \dot{q}_n}\right)\delta \dot{q}_n\\ &=& \delta p_n\dot{q}_n -\left(\frac{\partial L}{\partial q_n}\right)\delta q_n \end{eqnarray} \tag{1-5} $$ by $(1\text{-}3)$.

Now you see that the variation of this quantity $p_n\dot{q}_n - L$ involves only the variation of the $q$'s and that of the $p$'s. It does not involve the variation of the velocities. That means that $p_n\dot{q}_n - L$ can be expressed in terms of the $q$'s and the $p$'s, independent of the velocities. Expressed in this way, it is called the Hamiltonian $H$.

However, the Hamiltonian defined in this way is not uniquely determined, because we may add to it any linear combination of the $\phi$'s, which are zero. Thus, we could go over to another Hamiltonian  $$H^* = H + c_m\phi_m \tag{1-6}$$ where the quantities $c_m$ are coefficients which can be any function of the $q$'s and the $p$'s. $H^*$ is then just as good as $H$; our theory cannot distinguish between $H$ and $H^*$. The Hamiltonian is not uniquely determined.

We have seen in $(1\text{-}5)$ that $$\delta H = \dot{q}_n\delta p_n -\left(\frac{\partial L}{\partial q_n}\right)\delta q_n$$ This equation holds for any variation of the $q$'s and the $p$'s subject to the condition that the constraints $(1\text{-}4)$ are preserved. The $q$'s and the $p$'s cannot be varied independently because they are restricted by $(1\text{-}4)$, but for any variation of the $q$'s and the $p$'s which preserves these conditions, we have this equation holding. From the general method of the calculus of variations applied to a variational equation with constraints of this kind, we deduce $$ \dot{q}_n =\frac{\partial H}{\partial p_n} + u_m\frac{\partial \phi_m}{\partial p_n}\tag{1-7} $$ and $$ -\frac{\partial L}{\partial q_n} = \frac{\partial H}{\partial q_n} + u_m\frac{\partial \phi_m}{\partial q_n} $$ or $$\dot{p}_n =- \frac{\partial H}{\partial q_n} - u_m\frac{\partial \phi_m}{\partial q_n}\tag{1-8}$$ with the help of $(1\text{-}2)$ and $(1\text{-}3)$, where the $u_m$ are unknown coefficients. We have here the Hamiltonian equations of motion, describing how the variables $q$ and $p$ vary in time, but these equations involve unknown coefficients $u_m$.

It is convenient to introduce a certain formalism which enables one to write these equations briefly, namely the Poisson bracket formalism. It consists of the following: If we have two functions of the $q$'s and the $p$'s, say $f(q, p)$ and $g(q, p)$, they have a Poisson bracket $[f, g]$ which is defined by $$[f, g] = \frac{\partial f}{\partial q_n} \frac{\partial g}{\partial p_n} - \frac{\partial f}{\partial p_n} \frac{\partial g}{\partial q_n} \tag{1-9}$$ The Poisson brackets have certain properties which follow from their definition, namely $[f,g]$ is anti-symmetric in $f$ and $g$: $$[f, g] = -[g, f], \tag{1-10}$$ it is linear in either member: $$[f_1 + f_2, g] = [f_1, g] + [f_2, g], \text{etc.;} \tag{1-11}$$ and we have the product law, $$[f_1f_2, g] = f_1[f_2, g] + [f_1, g]f_2. \tag{1-12}$$ Finally, there is the relationship, known as the Jacobi Identity, connecting three quantities: $$[f, [g, h]] + [g, [h, f]] + [h, [f, g]] = 0, \tag{1-13}$$ With the help of the Poisson bracket, one can rewrite the equations of motion. For any function $g$ of the $q$'s and the $p$'s, we have $$\dot{g}=\frac{\partial g}{\partial q_n}\dot{q}_n + \frac{\partial g}{\partial p_n}\dot{p}_n \tag{1-14}$$ If we substitute for $\dot{q}_n$ and $\dot{p}_n$ their values given by $(1\text{-}7)$ and $(1\text{-}8)$, we find that $(1\text{-}14)$ is just $$\dot{g} = [g, H] + u_m[g, \phi_m]\tag{1-15}$$ The equations of motion are thus all written concisely in the Poisson bracket formalism.
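Applying the definition (1-9) to the coordinates and momenta themselves gives the fundamental brackets, which are worth keeping in mind (summation over the repeated suffix $k$, as usual):

```latex
[q_n, q_m] = 0, \qquad [p_n, p_m] = 0, \qquad
[q_n, p_m] = \frac{\partial q_n}{\partial q_k}\frac{\partial p_m}{\partial p_k}
           = \delta_{nk}\,\delta_{mk} = \delta_{nm}.
```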

We can write them in a still more concise formalism if we extend the notion of Poisson bracket somewhat. As I have defined Poisson brackets, they have a meaning only for quantities $f$ and $g$ which can be expressed in terms of the $q$'s and the $p$'s. Something more general, such as a general velocity variable which is not expressible in terms of the $q$'s and the $p$'s, does not have a Poisson bracket with another quantity. Let us extend the meaning of Poisson brackets and suppose that they exist for any two quantities and that they satisfy the laws (1-10), (1-11), (1-12), and (1-13), but are otherwise undetermined when the quantities are not functions of the $q$'s and the $p$'s.

Then we may write (1-15) as $$\dot{g} = [g, H + u_m\phi_m]. \tag{1-16}$$ Here you see the coefficients $u$ occurring in one of the members of a Poisson bracket. The coefficients $u_m$ are not functions of the $q$'s and the $p$'s, so that we cannot use the definition (1-9) for determining the Poisson bracket in (1-16). However, we can proceed to work out this Poisson bracket using the laws (1-10), (1-11), (1-12) and (1-13). Using the summation law (1-11) we have: $$[g, H + u_m\phi_m] = [g, H] + [g, u_m\phi_m]. \tag{1-17}$$ and using the product law (1-12) $$[g, u_m\phi_m] = [g, u_m]\phi_m + u_m[g, \phi_m]. \tag{1-18}$$ The last bracket in (1-18) is well-defined, for $g$ and $\phi_m$ are both functions of the $q$'s and the $p$'s. The Poisson bracket $[g, u_m]$ is not defined, but it is multiplied by something that vanishes, $\phi_m$. So the first term on the right of (1-18) vanishes. The result is that $$[g, H + u_m\phi_m] = [g, H] + u_m[g, \phi_m]. \tag{1-19}$$ making (1-16) agree with (1-15).

There is something that we have to be careful about in working with the Poisson bracket formalism: We have the constraints (1-4), but must not use one of these constraints before working out a Poisson bracket. If we did, we would get a wrong result. So we take it as a rule that Poisson brackets must all be worked out before we make use of the constraint equations. To remind us of this rule in the formalism, I write the constraints (1-4) as equations with a different equality sign $\approx$ from the usual. Thus they are written $$\phi_m \approx 0 \tag{1-20}$$ I call such equations weak equations, to distinguish them from the usual or strong equations.

One can make use of (1-20) only after one has worked out all the Poisson brackets which one is interested in. Subject to this rule, the Poisson bracket (1-19) is quite definite, and we have the possibility of writing our equations of motion (1-16) in a very concise form: $$\dot{g} \approx [g, H_T] \tag{1-21}$$ with a Hamiltonian I call the total Hamiltonian, $$H_T = H + u_m \phi_m \tag{1-22}$$ Now let us examine the consequences of these equations of motion. In the first place, there will be some consistency conditions. We have the quantities $\phi$ which have to be zero throughout all time. We can apply the equation of motion (1-21) or (1-15) taking $g$ to be one of the $\phi$'s. We know that $\dot{g}$ must be zero for consistency, and so we get some consistency conditions. Let us see what they are like. Putting $g = \phi_m$ and $\dot{g} = 0$ in (1-15), we have: $$[\phi_m, H] + u_{m'}[\phi_m, \phi_{m'}] \approx 0 \tag{1-23}$$ We have here a number of consistency conditions, one for each value of $m$. We must examine these conditions to see what they lead to. It is possible for them to lead directly to an inconsistency. They might lead to the inconsistency $1 = 0$. If that happens, it would mean that our original Lagrangian is such that the Lagrangian equations of motion are inconsistent. One can easily construct an example with just one degree of freedom. If we take $L = q$ then the Lagrangian equation of motion (1-2) gives immediately $1 = 0$. So you see, we cannot take the Lagrangian to be completely arbitrary. We must impose on it the condition that the Lagrangian equations of motion do not involve an inconsistency. With this restriction the equations (1-23) can be divided into three kinds.

One kind of equation reduces to $0 = 0$, i.e. it is identically satisfied, with the help of the primary constraints.

Another kind of equation reduces to an equation independent of the $u$'s, thus involving only the $q$'s and the $p$'s. Such an equation must be independent of the primary constraints, otherwise it is of the first kind. Thus it is of the form $$\chi(q, p) = 0 \tag{1-24}$$ Finally, an equation in (1-23) may not reduce in either of these ways; it then imposes a condition on the $u$'s.

The first kind we do not have to bother about any more. Each equation of the second kind means that we have another constraint on the Hamiltonian variables. Constraints which turn up in this way are called secondary constraints. They differ from the primary constraints in that the primary constraints are consequences merely of the equations (1-3) that defined the momentum variables, while for the secondary constraints, one has to make use of the Lagrangian equations of motion as well.

If we have a secondary constraint turning up in our theory, then we get yet another consistency condition, because we can work out $\dot{\chi}$ according to the equation of motion (1-15) and we require that $\dot{\chi} = 0$ . So we get another equation $$[\chi, H] + u_m[\chi,\phi_m] \approx 0 \tag{1-25}$$ This equation has to be treated on the same footing as (1-23). One must again see which of the three kinds it is. If it is of the second kind, then we have to push the process one stage further because we have a further secondary constraint. We carry on like that until we have exhausted all the consistency conditions, and the final result will be that we are left with a number of secondary constraints of the type (1-24) together with a number of conditions on the coefficients $u$ of the type (1-23).

The secondary constraints will for many purposes be treated on the same footing as the primary constraints. It is convenient to use the following notation for them: $$\phi_k \approx 0,\ k = M + 1, \cdots, M+K \tag{1-26}$$ where $K$ is the total number of secondary constraints. They ought to be written as weak equations in the same way as primary constraints, as they are also equations which one must not make use of before one works out Poisson brackets. So all the constraints together may be written as $$\phi_j \approx 0,\ j = 1, \cdots, M+K \equiv \mathcal{J} \tag{1-27}$$ Let us now go over to the remaining equations of the third kind. We have to see what conditions they impose on the coefficients $u$. These equations are $$[\phi_j, H] + u_m[\phi_j, \phi_m] \approx 0 \tag{1-28}$$ where $m$ is summed from 1 to $M$ and $j$ takes on any of the values from 1 to $\mathcal{J}$. We have these equations involving conditions on the coefficients $u$, insofar as they do not reduce merely to the constraint equations.

Let us look at these equations from the following point of view. Let us suppose that the $u$'s are unknowns and that we have in (1-28) a number of non-homogeneous linear equations in these unknowns $u$, with coefficients which are functions of the $q$'s and the $p$'s. Let us look for a solution of these equations, which gives us the $u$'s as functions of the $q$'s and the $p$'s, say $$u_m = U_m(q, p) \tag{1-29}$$ There must exist a solution of this type, because if there were none it would mean that the Lagrangian equations of motion are inconsistent, and we are excluding that case.

The solution is not unique. If we have one solution, we may add to it any solution $V_m(q, p)$ of the homogeneous equations associated with (1-28): $$V_m[\phi_j, \phi_m]=0 \tag{1-30}$$ and that will give us another solution of the inhomogeneous equations (1-28). We want the most general solution of (1-28) and that means that we must consider all the independent solutions of (1-30), which we may denote by $V_{am}(q, p), a = 1,\cdots, A$. The general solution of (1-28) is then $$u_m = U_m + v_aV_{am} \tag{1-31}$$ in terms of coefficients $v_a$ which can be arbitrary.

Let us substitute these expressions for $u$ into the total Hamiltonian of the theory (1-22). That will give us the total Hamiltonian $$H_T = H + U_m\phi_m + v_aV_{am}\phi_m \tag{1-32}$$ We can write this as $$H_T= H' + v_a\phi_a \tag{1-33}$$ where $$H' = H + U_m\phi_m \tag{1-33'}$$ and $$\phi_a = V_{am}\phi_{m} \tag{1-34}$$ In terms of this total Hamiltonian (1-33) we still have the equations of motion (1-21).

As a result of carrying out this analysis, we have satisfied all the consistency requirements of the theory and we still have arbitrary coefficients $v$. The number of the coefficients $v$ will usually be less than the number of coefficients $u$. The $u$'s are not arbitrary but have to satisfy consistency conditions, while the $v$'s are arbitrary coefficients. We may take the $v$'s to be arbitrary functions of the time and we have still satisfied all the requirements of our dynamical theory.

This provides a difference between the generalized Hamiltonian formalism and what one is familiar with in elementary dynamics. We have arbitrary functions of the time occurring in the general solution of the equations of motion with given initial conditions. These arbitrary functions of the time must mean that we are using a mathematical framework containing arbitrary features, for example, a coordinate system which we can choose in some arbitrary way, or the gauge in electrodynamics. As a result of this arbitrariness in the mathematical framework, the dynamical variables at future times are not completely determined by the initial dynamical variables, and this shows itself through arbitrary functions appearing in the general solution.

We require some terminology which will enable one to appreciate the relationships between the quantities which occur in the formalism. I find the following terminology useful. I define any dynamical variable, $R$, a function of the $q$'s and the $p$'s, to be first-class if it has zero Poisson brackets with all the $\phi$'s: $$[R, \phi_j] \approx 0,\ j=1,\cdots, \mathcal{J} \tag{1-35}$$ It is sufficient if these conditions hold weakly. Otherwise $R$ is second-class. If $R$ is first-class, then $[R, \phi_j]$ has to be strongly equal to some linear function of the $\phi$'s, as anything that is weakly zero in the present theory is strongly equal to some linear function of the $\phi$'s. The $\phi$'s are, by definition, the only independent quantities which are weakly zero. So we have the strong equations $$[R, \phi_j] = r_{jj'}\phi_{j'} \tag{1-36}$$ Before going on, I would like to prove a

Theorem: the Poisson bracket of two first-class quantities is also first-class. Proof. Let $R, S$ be first-class: then in addition to (1-36), we have $$[S, \phi_j] = s_{jj'}\phi_{j'} \tag{1-36'}$$ Let us form $[[R, S], \phi_j]$. We can work out this Poisson bracket using Jacobi's identity (1-13) $$[[R, S], \phi_j] = [[R, \phi_j], S] - [[S, \phi_j], R] \approx 0$$ by (1-36), (1-36'), the product law (1-12), and (1-20). The whole thing vanishes weakly. We have proved therefore that $[R, S]$ is first-class.

We have altogether four different kinds of constraints. We can divide constraints into first-class and second-class, which is quite independent of the division into primary and secondary.

I would like you to notice that $H'$ given by (1-33') and the $\phi_a$ given by (1-34) are first-class. Forming the Poisson bracket of $\phi_a$ with $\phi_j$, we get, by (1-34), $V_{am}[\phi_m, \phi_j]$ plus terms that vanish weakly. Since the $V_{am}$ are defined to satisfy (1-30), $\phi_a$ is first-class. Thus (1-28) with $U_m$ for $u_m$ shows that $H'$ is first-class. Thus (1-33) gives the total Hamiltonian in terms of a first-class Hamiltonian $H'$ together with some first-class $\phi$'s.

Any linear combination of the $\phi$'s is of course another constraint, and if we take a linear combination of the primary constraints we get another primary constraint. So each $\phi_a$ is a primary constraint; and it is first-class. So the final situation is that we have the total Hamiltonian expressed as the sum of a first-class Hamiltonian plus a linear combination of the primary, first-class constraints.

The number of independent arbitrary functions of the time occurring in the general solution of the equations of motion is equal to the number of values which the suffix $a$ takes on. That is equal to the number of independent primary first-class constraints, because all the independent primary first-class constraints are included in the sum (1-33).

That gives you then the general situation. We have deduced it by just starting from the Lagrangian equations of motion, passing to the Hamiltonian and working out consistency conditions.

From the practical point of view one can tell from the general transformation properties of the action integral what arbitrary functions of the time will occur in the general solution of the equations of motion. To each of these functions of the time there must correspond some primary first-class constraint. So we can tell which primary first-class constraints we are going to have without going through all the detailed calculation of working out Poisson brackets; in practical applications of this theory we can obviously save a lot of work by using that method.

I would like to go on a bit more and develop one further point of the theory. Let us try to get a physical understanding of the situation where we start with given initial variables and get a solution of the equations of motion containing arbitrary functions. The initial variables which we need are the $q$'s and the $p$'s. We don't need to be given initial values for the coefficients $v$. These initial conditions describe what physicists would call the initial physical state of the system. The physical state is determined only by the $q$'s and the $p$'s and not by the coefficients $v$.

Now the initial state must determine the state at later times. But the $q$'s and the $p$'s at later times are not uniquely determined by the initial state because we have the arbitrary functions $v$ coming in. That means that the state does not uniquely determine a set of $q$'s and $p$'s, even though a set of $q$'s and $p$'s uniquely determines a state. There must be several choices of $q$'s and $p$'s which correspond to the same state. So we have the problem of looking for all the sets of $q$'s and $p$'s that correspond to one particular physical state.

All those values for the $q$'s and $p$'s at a certain time which can evolve from one initial state must correspond to the same physical state at that time. Let us take particular initial values for the $q$'s and the $p$'s at time $t = 0$, and consider what the $q$'s and the $p$'s are after a short time interval $\delta t$. For a general dynamical variable $g$, with initial value $g_0$, its value at time $\delta t$ is $$ \begin{eqnarray} g(\delta t) &=& g_0 + \dot{g}\delta t\\ &=& g_0 + [g, H_T]\delta t \\ &=& g_0 + \delta t \{ [g, H'] + v_a[g, \phi_a] \} \end{eqnarray} \tag{1-37} $$ The coefficients $v$ are completely arbitrary and at our disposal. Suppose we take different values, $v'$, for these coefficients. That would give a different $g(\delta t)$, the difference being $$\Delta g(\delta t) = \delta t(v_a - v'_a)[g, \phi_a] \tag{1-38}$$ We may write this as $$\Delta g(\delta t) = \epsilon_a[g, \phi_a] \tag{1-39}$$ where $$\epsilon_a = \delta t(v_a - v'_a) \tag{1-40}$$

is a small arbitrary number, small because of the coefficient $\delta t$ and arbitrary because the $v$'s and the $v'$'s are arbitrary. We can change all our Hamiltonian variables in accordance with the rule (1-39) and the new Hamiltonian variables will describe the same state. This change in the Hamiltonian variables consists in applying an infinitesimal contact transformation with a generating function $\epsilon_a\phi_a$. We come to the conclusion that the $\phi_a$'s, which appeared in the theory in the first place as the primary first-class constraints, have this meaning: as generating functions of infinitesimal contact transformations, they lead to changes in the $q$'s and the $p$'s that do not affect the physical state.

However, that is not the end of the story. We can go on further in the same direction. Suppose we apply two of these contact transformations in succession. Apply first a contact transformation with generating function $\epsilon_a\phi_a$ and then apply a second contact transformation with generating function $\gamma_{a'}\phi_{a'}$ where the $\gamma$'s are some new small coefficients. We get finally $$g' = g_0 + \epsilon_a[g, \phi_a] + \gamma_{a'}[g + \epsilon_a[g, \phi_a], \phi_{a'}] \tag{1-41}$$ (I retain the second order terms involving products $\epsilon\gamma$, but I neglect the second order terms involving $\epsilon^2$ or involving $\gamma^2$. This is legitimate and sufficient. I do that because I do not want to write down more than I really need for getting the desired result.) If we apply the two transformations in succession in the reverse order, we get finally $$g'' = g_0 + \gamma_{a'}[g, \phi_{a'}] + \epsilon_a[g + \gamma_{a'}[g, \phi_{a'}], \phi_{a}] \tag{1-42}$$ Now let us subtract these two. The difference is $$\Delta g = \epsilon_{a}\gamma_{a'}\{ [[g, \phi_a], \phi_{a'}] - [[g, \phi_{a'}], \phi_{a}] \} \tag{1-43}$$ By Jacobi's identity (1-13) this reduces to $$\Delta g = \epsilon_{a}\gamma_{a'}[g, [\phi_a, \phi_{a'}]] \tag{1-44}$$ This $\Delta g$ must also correspond to a change in the $q$'s and the $p$'s which does not involve any change in the physical state, because it is made up by processes which individually don't involve any change in the physical state. Thus we see that we can use $$[\phi_a, \phi_{a'}] \tag{1-45}$$ as a generating function of an infinitesimal contact transformation and it will still cause no change in the physical state.

Now the $\phi_a$ are first-class: their Poisson brackets are weakly zero, and therefore strongly equal to some linear function of the $\phi$'s. This linear function of the $\phi$'s must be first-class because of the theorem I proved a little while back, that the Poisson bracket of two first-class quantities is first-class. So we see that the transformations which we get this way, corresponding to no change in the physical state, are transformations for which the generating function is a first-class constraint. The only way these transformations are more general than the ones we had before is that the generating functions which we had before are restricted to be first-class primary constraints. Those that we get now could be first-class secondary constraints. The result of this calculation is to show that we might have a first-class secondary constraint as a generating function of an infinitesimal contact transformation which leads to a change in the $q$'s and the $p$'s without changing the state.

For the sake of completeness, there is a little bit of further work one ought to do which shows that a Poisson bracket $[H', \phi_a]$ of the first-class Hamiltonian $H'$ with a first-class $\phi$ is again a linear function of first-class constraints. This can also be shown to be a possible generator for infinitesimal contact transformations which do not change the state.

The final result is that those transformations of the dynamical variables which do not change physical states are infinitesimal contact transformations in which the generating function is a primary first-class constraint or possibly a secondary first-class constraint. A good many of the secondary first-class constraints do turn up by the process (1-45) or as $[H', \phi_a]$ . I think it may be that all the first-class secondary constraints should be included among the transformations which don't change the physical state, but I haven't been able to prove it. Also, I haven't found any example for which there exist first-class secondary constraints which do generate a change in the physical state.

2012/06/06

Information Channels

Forward Probabilities: given that the input is $a_i$, the probability that the output is $b_j$ can be written as a matrix (this probability is independent of the source distribution): $$P_{ij}=P(b=b_j|a=a_i)$$ Feeding $a_i$ into one end of the channel always produces some output at the other end, so $\sum_j P_{ij}=1$; likewise the input source satisfies $\sum_ip_i=1$.

Backward Probabilities: given that the output is $b_j$, the probability that the input was $a_i$ (unlike the forward probabilities, these do depend on the source distribution): $$Q_{ij}=P(a=a_i | b=b_j)$$ Joint Probabilities: the probability that the input is $a_i$ and the output is $b_j$: $$R_{ij}=P(a=a_i, b=b_j)$$ These are related by: $$p_iP_{ij}=P(a_i)P(b_j|a_i)=P(a_i, b_j)=P(b_j)P(a_i|b_j)=q_jQ_{ij}$$
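As a concrete check (my own numbers, not in the note): a binary symmetric channel with crossover probability 0.1, rows indexed by the input $a_i$ and columns by the output $b_j$:

```latex
P = \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix}, \qquad
q_j = \sum_i p_i P_{ij}, \qquad Q_{ij} = \frac{p_i P_{ij}}{q_j}.
```

With the biased source $p = (0.8,\ 0.2)$ we get $q_0 = 0.8\cdot 0.9 + 0.2\cdot 0.1 = 0.74$ and $Q_{00} = 0.72/0.74 \approx 0.973 \ne P_{00}$: the forward matrix stays fixed while the backward probabilities change with the source.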

2012/06/04

Bit Wizardry

1) Little endian and big endian can be distinguished by casting an int32* to char*: the int32* points to a word, and if dereferencing the char* yields the least significant byte the machine is little endian; if it yields the most significant byte, it is big endian.

Reading the least significant byte corresponds to the operation $\%\,256$; reading the most significant byte corresponds to dividing by $2^{24}$.

2) Taking a value modulo a power of two is equivalent to a bit-and: int32 a = b % 32 // == b & (32-1), i.e. only the low bits are kept.

When dividing by a compile-time constant, the compiler will optimize the division into a combination of multiplications and shifts.

3) XOR of two bits is 1 exactly when one and only one of them is 1. If adding two integers $x, y$ would overflow but we want their average, bit operations can help. Adding two binary numbers can be split into two parts: the carry bits and the carry-free sum bits. Since a carry is at most 1, the addition decomposes into the carries plus what remains in each bit once the carry is taken out. The carries are identified by x & y (1 where a carry is generated, 0 where not), and x ^ y is what remains in each bit after removing the carry effect (i.e. addition mod 2).

Because the division by 2 in the average distributes over this sum, the two parts can be shifted right separately. In ordinary addition the carry is itself shifted left by one position; once it is identified, that left shift cancels the right shift of the halving, so x & y need not be moved at all, while x ^ y is shifted right by one; the average is then the sum of the two parts. (Carry propagation between bits depends on the relative positions of the two numbers, so propagated carries are deferred to the addition performed after the shift.)

4) XOR is associative, so
for any t, (a ^ t) ^ t = a ^ ( t ^ t ) = a ^ 0 = a
if we set a ^ t = b, then b ^ t = a
a ^ t = b is an equation we introduced; t can be solved for by XORing both sides with a: t = a ^ b

In words: if we know a and the result b of XORing a with some unknown number, XORing the two recovers that number. & and | lack this property; for example, OR-ing any number with $2^{32} - 1$ yields $2^{32} - 1$, so from $2^{32} - 1$ and the result the number cannot be recovered.

5) For an integer x, the next even number is (x | 1)+1, and the previous even number is ( x - 1 )&~1.

6) Test bit i of x: x & ( 1UL << i ); this returns $x[i]\cdot 2^{i}$, not a 0/1 value.

7) Set bit i: x | ( 1UL << i ); clear bit i: x & ~( 1UL << i ); toggle bit i: x ^ ( 1UL << i ).

8)
x = a & b $\Longleftrightarrow$ (a * b) mod 2
x = a ^ b $\Longleftrightarrow$ (a + b) mod 2
x = a ^ b $\Longleftrightarrow$ if( b == 1 ){ x = ~a } else { x = a } // a 1 flips the bit, a 0 passes it through
x = a ^ b $\Longleftrightarrow$ if( a != b ){ x = 1 } else { x = 0 } // tests inequality


9)
a, b // initially we hold a and b
a, a^b // discard b, keep a^b
b, a^b // discard a, keep b
b, a // discard a^b, keep a

No information is ever lost: two numbers are maintained throughout, only their positions end up exchanged.

10) x &= -x; // isolate lowest bit

11)
N: 4 8 16 32 64 128 256 512 1024
m: 5 11 19 37 67 131 269 523 1061

The table above gives, for each $N$-bit word size, a modulus m with the property that $f(i) = 2^i \bmod m$ is one-to-one for $0 \le i < N$, so a lookup table can be built that maps the residue mod m of the isolated lowest bit to that bit's index.

12) Computing $\lfloor \log_2 x \rfloor$ of an integer $x$ is natural with bit operations:


13) To test whether a number is a power of two: subtract 1 to get m; if the number is $2^i$, the low $i$ bits of m are all 1; XOR m with the original number and shift right by one; the result equals m exactly when the number is a power of two.

14) To obtain the nearest power of two greater than x: the desired value is $2^i$, where $i$ is the index of x's highest set bit plus 1; the method is to use the highest bit of x to construct $2^i-1$ and then add 1.