2012/05/15

Microsoft Word 97 Binary File Format

Internally, Word uses a PLC (PLex of CPs); the form it writes to a file is a PLCF (PLex of CPs stored in File).

CP (Character Position): the address of a character in the logical text stream of a document.

FC (File Character position): the byte offset of a character in the file. In the simple case, FC = (file offset of the beginning of the document's text stream) + CP.

PLCF (PLex of CPs (or FCs) stored in File): a relation between CPs (or FCs) and an arbitrary data structure. It consists of an array of n+1 CPs or FCs followed by an array of n instances of that data structure.
To properly interpret a PLCF stored in a Word file, both the length of the stored PLCF and the length of the data structure stored in it must be known. The length of the stored PLCF is recorded in the FIB. The lengths of the data structures stored in PLCFs within Word files are listed in the format specification.
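Since a PLCF holds n+1 four-byte positions followed by n fixed-size items, n can be recovered from the two lengths: n = (PLCF length - 4) / (4 + item length). A minimal sketch of splitting a PLCF this way follows; the function name `parse_plcf` is made up for illustration, and it assumes the positions are little-endian 32-bit integers.

```python
import struct


def parse_plcf(data: bytes, item_size: int):
    """Split a raw PLCF into its n+1 positions and its n fixed-size items.

    Assumption: positions are little-endian 32-bit integers.
    """
    if len(data) < 4:
        raise ValueError("a PLCF holds at least one position")
    # len(data) == 4 * (n + 1) + item_size * n  =>  solve for n
    n, remainder = divmod(len(data) - 4, 4 + item_size)
    if remainder != 0:
        raise ValueError("length does not match a PLCF with this item size")
    positions = list(struct.unpack_from(f"<{n + 1}I", data, 0))
    items_start = 4 * (n + 1)
    items = [
        data[items_start + i * item_size: items_start + (i + 1) * item_size]
        for i in range(n)
    ]
    return positions, items
```

The items are returned as raw bytes because their layout depends on which PLCF is being read.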

piece table (plcfpcd): a partitioning of the Word document into disjoint pieces. It is a PLCF whose array of CPs gives the CP at which each piece begins, and whose array of PCDs (Piece Descriptors) describes the corresponding pieces.

To find the file position (FC) of a character from its CP, first find the piece that contains the character:
1) (CP => PCD index) Find the index of the largest CP in the array of CPs that is less than or equal to the character's CP.
2) Reference the PCD with that index in the array of PCDs.
3) (PCD => FC of piece beginning) The PCD gives the file position at which the piece begins.
4) FC of the character = FC of the piece beginning + (character's CP - CP of the piece beginning).
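The lookup can be sketched as a binary search over the CP array. This is only an illustration: `cp_to_fc`, `piece_cps`, `piece_fcs`, and `bytes_per_char` are hypothetical names, the FCs are assumed to have been extracted from the PCDs already, and any flag bits packed into the PCD's FC field are not handled.

```python
from bisect import bisect_right


def cp_to_fc(cp, piece_cps, piece_fcs, bytes_per_char=1):
    """Map a character position to a file offset through the piece table.

    piece_cps: the n+1 CPs of the piece table; piece i covers the CPs
               piece_cps[i] up to (not including) piece_cps[i + 1].
    piece_fcs: n file offsets, one per piece (assumed already decoded).
    bytes_per_char: assumption, 1 for 8-bit text and 2 for 16-bit text.
    """
    if not piece_cps[0] <= cp < piece_cps[-1]:
        raise ValueError("CP lies outside the document")
    # Index of the largest piece-start CP that is <= cp.
    i = bisect_right(piece_cps, cp) - 1
    # Offset inside the piece, added to the piece's file position.
    return piece_fcs[i] + (cp - piece_cps[i]) * bytes_per_char
```

A character whose CP equals the start of a piece belongs to that piece, which is why the search uses "less than or equal to" rather than "less than".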

sprm (Single PRoperty Modifier): An instruction to modify one or more properties within one of the property-defining data structures (CHP, PAP, TAP, SEP, or PIC). It consists of an operation code, which identifies the field(s) to be changed, and an operand, which either gives the value that a particular field is changed to or is a parameter to a procedure that changes the field or fields.

prl (PRoperty modifier stored in a List): a sprm plus its operand.

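grpprl (group of prls): a sequence of prls stored one after another. To find a particular sprm, walk the group: read the opcode, derive the length of that sprm from it, skip that many bytes, and repeat.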
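The walk can be sketched as below. It relies on an assumption about the Word 97 layout that these notes do not spell out: each opcode is two bytes, little-endian, and its top three bits encode the operand size. Variable-length operands are assumed to start with a one-byte count; the few sprms with special length rules are not handled. The names `OPERAND_SIZE` and `iter_prls` are made up for illustration.

```python
import struct

# Assumed operand size in bytes for each 3-bit size code in the top bits of
# a 2-byte opcode; None marks a variable-length operand.
OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 6: None, 7: 3}


def iter_prls(grpprl: bytes):
    """Yield (opcode, operand_bytes) for each prl in a grpprl."""
    pos = 0
    while pos + 2 <= len(grpprl):
        (opcode,) = struct.unpack_from("<H", grpprl, pos)
        pos += 2
        size = OPERAND_SIZE[opcode >> 13]
        if size is None:
            # Variable length: one count byte, then that many operand bytes.
            if pos >= len(grpprl):
                raise ValueError("truncated prl")
            size = grpprl[pos]
            pos += 1
        operand = grpprl[pos:pos + size]
        if len(operand) != size:
            raise ValueError("truncated prl")
        yield opcode, operand
        pos += size
```

Searching for a given sprm then amounts to iterating until the wanted opcode appears.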

prm (PRoperty Modifier): a field in each piece table entry. It normally contains an index to a grpprl, but if the user has made only a small formatting change that can be expressed as a single 1- or 2-byte sprm, that sprm is stored within the prm itself.
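A minimal sketch of telling the two forms apart follows. The bit layout is an assumption taken from memory of the Word 97 format rather than from these notes: bit 0 is a flag choosing the form; when clear, the next 7 bits identify the sprm and the high byte is its operand; when set, the remaining 15 bits index a grpprl. The function name and the tuple shapes are made up for illustration.

```python
def decode_prm(prm: int):
    """Decode a 16-bit prm into an inline sprm or a grpprl index.

    Assumed layout: bit 0 = complex flag; if clear, bits 1-7 = sprm code
    and bits 8-15 = one-byte operand; if set, bits 1-15 = grpprl index.
    """
    if prm & 1:
        return ("grpprl_index", prm >> 1)
    sprm_code = (prm >> 1) & 0x7F
    operand = (prm >> 8) & 0xFF
    return ("inline_sprm", sprm_code, operand)
```

For example, under this layout `decode_prm(0x0007)` refers to grpprl number 3, while `decode_prm(0x010A)` carries sprm code 5 with operand 1 inline.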

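FKP (Formatted disK Page): holds a count of runs or paragraphs, an array of FCs (the boundaries between the runs or paragraphs), and an array of offsets within the FKP that runs parallel to the array of FCs: each offset locates the properties of the run or paragraph that begins at the corresponding FC.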
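As an illustration, here is a sketch of reading a character-property (CHPX) FKP. Several details are assumptions not stated in these notes: the page is 512 bytes, the run count sits in its last byte, each offset is a single byte counting 2-byte words from the start of the page (0 meaning no exceptions), and each property record is a length byte followed by a grpprl. The name `parse_chpx_fkp` is made up.

```python
import struct

FKP_SIZE = 512  # assumed size of one FKP page


def parse_chpx_fkp(page: bytes):
    """Return a list of (fc_first, fc_lim, grpprl_bytes), one per run."""
    if len(page) != FKP_SIZE:
        raise ValueError("an FKP is expected to be exactly one page")
    crun = page[FKP_SIZE - 1]                    # run count in the last byte
    fcs = struct.unpack_from(f"<{crun + 1}I", page, 0)
    offsets_start = 4 * (crun + 1)
    offsets = page[offsets_start: offsets_start + crun]
    runs = []
    for i in range(crun):
        start = offsets[i] * 2                   # stored as a word offset
        if start == 0:
            grpprl = b""                         # run has default properties
        else:
            cb = page[start]                     # length byte, then the grpprl
            grpprl = page[start + 1: start + 1 + cb]
        runs.append((fcs[i], fcs[i + 1], grpprl))
    return runs
```

Paragraph-property FKPs use a different offset entry, so this sketch should not be applied to them unchanged.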

A PLC, the bin table, records the association between a particular range of FCs and the PN (Page Number) of the FKP that contains the properties for that FC range in the file.

RUN: A bin table (plcfbte) partitions the total extent of the Word file that contains text characters into a set of contiguous intervals, each marked by an fcFirst and an fcLim. The fcFirst for the nth interval is plcfbte.rgfc[n] and the fcLim for the nth interval is plcfbte.rgfc[n+1]. Associated with each interval is a BTE. A BTE holds a four-byte PN (page number) which identifies the FKP page in the file that contains the formatting information for that interval. A CHPX FKP further partitions an interval into runs of exception text.
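Locating the FKP for a given FC can be sketched as follows. The names `fkp_offset_for_fc`, `rgfc`, and `pns` are illustrative, the arrays are assumed to have been read from the bin table already, and the conversion of a PN to a file offset by multiplying with a 512-byte page size is an assumption.

```python
from bisect import bisect_right


def fkp_offset_for_fc(fc, rgfc, pns, page_size=512):
    """Return the file offset of the FKP page whose interval contains fc.

    rgfc: the n+1 FCs of the bin table; interval n runs from rgfc[n]
          (fcFirst) up to, but not including, rgfc[n + 1] (fcLim).
    pns:  n page numbers, one per interval, taken from the BTEs.
    page_size: assumption, one FKP page per 512 bytes of file.
    """
    if not rgfc[0] <= fc < rgfc[-1]:
        raise ValueError("FC lies outside the range covered by the bin table")
    n = bisect_right(rgfc, fc) - 1
    return pns[n] * page_size
```

Reading the page at that offset then gives the FKP whose own FC array narrows the interval down to a single run.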
