CSAPP Chapter 2 Excerpt
Information Storage
Addressing and Byte Ordering
Suppose the variable x of type int and at address 0x100 has a hexadecimal value of 0x01234567. The ordering of the bytes within the address range 0x100 through 0x103 depends on the type of machine:
Big endian

       0x100  0x101  0x102  0x103
      +------+------+------+------+
  ... |  01  |  23  |  45  |  67  | ...
      +------+------+------+------+

Little endian

       0x100  0x101  0x102  0x103
      +------+------+------+------+
  ... |  67  |  45  |  23  |  01  | ...
      +------+------+------+------+
Most Intel-compatible machines operate exclusively in little-endian mode. On the other hand, most machines from IBM and Oracle (arising from their acquisition of Sun Microsystems in 2010) operate in big-endian mode.
Many recent microprocessor chips are bi-endian, meaning that they can be configured to operate as either little- or big-endian machines.
In practice, byte ordering becomes fixed once a particular operating system is chosen. For example, ARM microprocessors, used in many cell phones, have hardware that can operate in either little- or big-endian mode, but the two most common operating systems for these chips—Android (from Google) and iOS (from Apple)—operate only in little-endian mode.
Boolean Algebra
The simplest Boolean algebra is defined over the two-element set {0, 1}.
Operations of Boolean algebra
 ~          & | 0  1      | | 0  1      ^ | 0  1
 0 | 1      0 | 0  0      0 | 0  1      0 | 0  1
 1 | 0      1 | 0  1      1 | 1  1      1 | 1  0
Logical Operations in C
C also provides a set of logical operators ||, &&, and !, which correspond to the or, and, and not operations of logic. These can easily be confused with the bit-level operations, but their behavior is quite different. The logical operations treat any nonzero argument as representing true and argument 0 as representing false. They return either 1 or 0, indicating a result of either true or false, respectively.
Expression      Result
-----------------------
!0x41           0x00
!0x00           0x01
!!0x41          0x01
0x69 && 0x55    0x01
0x69 || 0x55    0x01
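The contrast between the bit-level and logical operators is easy to demonstrate in a short program. The following is a minimal sketch (not from the text) using the operand values from the table above; the expected output of each call appears in a comment.

#include <stdio.h>

int main(void) {
    printf("0x%.2x\n", 0x69 & 0x55);    /* bit-level AND: 0x41 */
    printf("0x%.2x\n", 0x69 | 0x55);    /* bit-level OR:  0x7d */
    printf("0x%.2x\n", 0x69 && 0x55);   /* logical AND:   0x01 */
    printf("0x%.2x\n", 0x69 || 0x55);   /* logical OR:    0x01 */
    printf("0x%.2x\n", !0x41);          /* logical NOT:   0x00 */
    return 0;
}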
Shift Operations in C
C also provides a set of shift operations for shifting bit patterns to the left and to the right.
Operation             Value 1       Value 2
--------------------------------------------
Argument x            [01100011]    [10010101]
x << 4                [00110000]    [01010000]
x >> 4 (logical)      [00000110]    [00001001]
x >> 4 (arithmetic)   [00000110]    [11111001]
In practice, almost all compiler/machine combinations use arithmetic right shifts for signed data, and many programmers assume this to be the case. For unsigned data, on the other hand, right shifts must be logical.
In contrast to C, Java has a precise definition of how right shifts should be performed. The expression x >> k shifts x arithmetically by k positions, while x >>> k shifts it logically.
Shift amounts should be kept less than the word size.
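These shifts can be reproduced in C using the 8-bit patterns from the table. The sketch below is not from the text; it assumes 8-bit bytes and an implementation that performs arithmetic right shifts on signed data, which, as noted above, is nearly universal.

#include <stdio.h>

int main(void) {
    unsigned char u = 0x95;                      /* [10010101] */
    signed char   s = (signed char) 0x95;        /* same bit pattern, signed */

    printf("0x%.2x\n", (unsigned char)(u << 4)); /* [01010000] = 0x50 */
    printf("0x%.2x\n", u >> 4);                  /* logical:    [00001001] = 0x09 */
    printf("0x%.2x\n", (unsigned char)(s >> 4)); /* arithmetic: [11111001] = 0xf9 */
    return 0;
}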
Integer Representations
Unsigned Encodings
Definition of binary to unsigned function
For vector $\vec{x} = [x_{w-1}, x_{w-2}, \ldots, x_{0}]$:
$$ B2U_{w}(\vec{x}) \doteq \sum_{i=0}^{w-1} x_{i}2^{i} $$
Examples:
$$ B2U_{4}([0001]) = 0 \cdot 2^{3} + 0 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} = 0 + 0 + 0 + 1 = 1 $$
$$ B2U_{4}([0101]) = 0 \cdot 2^{3} + 1 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} = 0 + 4 + 0 + 1 = 5 $$
$$ B2U_{4}([1011]) = 1 \cdot 2^{3} + 0 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} = 8 + 0 + 2 + 1 = 11 $$
$$ B2U_{4}([1111]) = 1 \cdot 2^{3} + 1 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} = 8 + 4 + 2 + 1 = 15 $$
Uniqueness of unsigned encoding: Function $B2U_{w}$ is a bijection.
Two’s-Complement Encodings
For many applications, we wish to represent negative values as well. The most common computer representation of signed numbers is known as two’s-complement form. This is defined by interpreting the most significant bit of the word to have negative weight. We express this interpretation as a function $B2T_{w}$ (for “binary to two’s complement” length $w$):
Definition of binary to two’s complement
For vector $\vec{x} = [x_{w-1}, x_{w-2}, \ldots, x_{0}]$:
$$ B2T_{w}(\vec{x}) \doteq -x_{w-1}2^{w-1} + \sum_{i=0}^{w-2} x_{i}2^{i} $$
Examples:
$$ B2T_{4}([0001]) = 0 \cdot 2^{3} + 0 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} = 0 + 0 + 0 + 1 = 1 $$
$$ B2T_{4}([0101]) = 0 \cdot 2^{3} + 1 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} = 0 + 4 + 0 + 1 = 5 $$
$$ B2T_{4}([1011]) = -1 \cdot 2^{3} + 0 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} = -8 + 0 + 2 + 1 = -5 $$
$$ B2T_{4}([1111]) = -1 \cdot 2^{3} + 1 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} = -8 + 4 + 2 + 1 = -1 $$
Uniqueness of two’s-complement encoding: Function $B2T_{w}$ is a bijection.
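Both encodings can be evaluated directly from a bit vector. The sketch below is not from the text: the function names b2u and b2t and the array representation (bit $i$ stored in x[i], so x[w-1] is the most significant bit) are illustrative.

#include <stdint.h>

int64_t b2u(const int x[], int w) {
    int64_t val = 0;
    for (int i = 0; i < w; i++)
        val += (int64_t) x[i] << i;             /* x_i * 2^i */
    return val;
}

int64_t b2t(const int x[], int w) {
    int64_t val = -((int64_t) x[w-1] << (w-1)); /* sign bit has weight -2^(w-1) */
    for (int i = 0; i < w - 1; i++)
        val += (int64_t) x[i] << i;
    return val;
}

With x = {1, 1, 0, 1} (the vector [1011] stored least significant bit first) and w = 4, b2u returns 11 and b2t returns -5, matching the examples above.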
Alternative representations of signed numbers
There are two other standard representations for signed numbers:
Ones’ complement. This is the same as two’s complement, except that the most significant bit has weight $−(2^{w−1} − 1)$ rather than $−2^{w−1}$:
$$ B2O_{w}(\vec{x}) \doteq -x_{w-1}(2^{w-1}-1) + \sum_{i=0}^{w-2} x_{i}2^{i} $$
Sign magnitude. The most significant bit is a sign bit that determines whether the remaining bits should be given negative or positive weight:
$$ B2S_{w}(\vec{x}) \doteq (-1)^{x_{w-1}} \cdot \sum_{i=0}^{w-2} x_{i}2^{i} $$
Both of these representations have the curious property that there are two different encodings of the number 0. For both representations, $[00 \ldots 0]$ is interpreted as +0. The value -0 can be represented in sign-magnitude form as $[10 \ldots 0]$ and in ones’ complement as $[11 \ldots 1]$. Although machines based on ones’-complement representations were built in the past, almost all modern machines use two’s complement. We will see that sign-magnitude encoding is used with floating-point numbers.
Note the different position of apostrophes: two’s complement versus ones’ complement. The term “two’s complement” arises from the fact that for nonnegative $x$ we compute a $w$-bit representation of $-x$ as $2^{w} - x$ (a single two). The term “ones’ complement” comes from the property that we can compute $-x$ in this notation as $[111 \ldots 1] - x$ (multiple ones).
Conversions between Signed and Unsigned
Definition of Conversion from two’s complement to unsigned
For $x$ such that $\textit{TMin}_{w} \leq x \leq \textit{TMax}_{w}$ :
$$ T2U_{w}(x)= \begin{cases} x+2^{w}, & x< 0 \\ x, & x \geq 0 \end{cases} $$
Definition of Unsigned to two’s-complement conversion
For $u$ such that $0 \leq u \leq \textit{UMax}_{w}$ :
$$ U2T_{w}(u)= \begin{cases} u, & u \leq \textit{TMax}_{w} \\ u-2^{w}, & u > \textit{TMax}_{w} \end{cases} $$
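In C, casts between int and unsigned implement exactly these functions for $w = 32$: the cast keeps the bit pattern and changes only its interpretation. A minimal sketch follows (not from the text; converting an out-of-range unsigned value to int is implementation-defined in standard C, but behaves as shown on two’s-complement machines).

#include <stdio.h>

int main(void) {
    int x = -12345;
    unsigned u = (unsigned) x;   /* T2U_32: -12345 + 2^32 */
    printf("%u\n", u);           /* prints 4294954951 */

    unsigned v = 4294954951u;
    int y = (int) v;             /* U2T_32: v > TMax_32, so v - 2^32 */
    printf("%d\n", y);           /* prints -12345 */
    return 0;
}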
Integer Arithmetic
Unsigned Addition
Definition of Unsigned addition
For $x$ and $y$ such that $0 \leq x, y < 2^{w}$:
$$ x+_{w}^{u}y= \begin{cases} x+y, & x + y < 2^{w} & \textrm{Normal} \\ x+y-2^{w}, & 2^{w} \leq x + y < 2^{w+1} & \textrm{Overflow} \end{cases} $$
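A minimal sketch of the overflow case for $w = 32$ (not from the text; the operand values are illustrative):

#include <stdio.h>

int main(void) {
    unsigned x = 0x80000000u;    /* 2^31 */
    unsigned y = 0x80000001u;    /* 2^31 + 1 */
    printf("0x%x\n", x + y);     /* overflow: x + y - 2^32 = 0x1 */
    return 0;
}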
Unsigned negation
Definition of Unsigned negation
For any number $x$ such that $0 \leq x < 2^{w}$, its $w$-bit unsigned negation $-_{w}^{u} x$ is given by the following:
$$ -_{w}^{u} x= \begin{cases} x, & x = 0 \\ 2^{w}-x, & x > 0 \end{cases} $$
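In C, negating an unsigned value produces exactly this result, as a brief sketch (not from the text) shows:

#include <stdio.h>

int main(void) {
    unsigned x = 1;
    printf("%u\n", -x);      /* 2^32 - 1 = 4294967295 */
    printf("%u\n", x + -x);  /* 0: x and its negation sum to 0 mod 2^32 */
    return 0;
}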
Two’s-Complement Addition
Definition of Two’s-Complement Addition
For integer values $x$ and $y$ in the range $-2^{w-1} \leq x, y \leq 2^{w-1}-1$:
$$ x+_{w}^{t}y= \begin{cases} x+y-2^{w}, & 2^{w-1} \leq x + y & \textrm{Positive overflow} \\ x+y, & -2^{w-1} \leq x + y < 2^{w-1} & \textrm{Normal} \\ x+y+2^{w}, & x + y < -2^{w-1} & \textrm{Negative overflow} \end{cases} $$
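Both overflow cases can be demonstrated for $w = 32$. The sketch below is not from the text; standard C declares signed overflow undefined behavior, but two’s-complement machines produce the wrapped values shown:

#include <stdio.h>
#include <limits.h>

int main(void) {
    int x = INT_MAX;           /* TMax_32 = 2^31 - 1 */
    int y = INT_MIN;           /* TMin_32 = -2^31 */
    printf("%d\n", x + 1);     /* positive overflow: wraps to TMin_32 */
    printf("%d\n", y - 1);     /* negative overflow: wraps to TMax_32 */
    return 0;
}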
Two’s-Complement Negation
Definition of Two’s-Complement Negation
For $x$ in the range $\textit{TMin}_{w} \leq x \leq \textit{TMax}_{w}$, its two’s-complement negation $-_{w}^{t} x$ is given by the formula:
$$ -_{w}^{t} x= \begin{cases} \textit{TMin}_{w}, & x = \textit{TMin}_{w} \\ -x, & x > \textit{TMin}_{w} \end{cases} $$
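The first case has a surprising consequence: negating the most negative value yields that same value. A minimal sketch for $w = 32$ (not from the text; like any signed overflow, this is undefined behavior in standard C, but two’s-complement machines behave as shown):

#include <stdio.h>
#include <limits.h>

int main(void) {
    int x = INT_MIN;           /* TMin_32 */
    printf("%d\n", -x);        /* still -2147483648: -TMin_32 = TMin_32 */
    return 0;
}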
Unsigned Multiplication
Definition of Unsigned multiplication
For $x$ and $y$ such that $0 ≤ x, y ≤ \textit{UMax}_w$ :
$$ x *_{w}^{u} y = (x \cdot y) \textrm{ mod } 2^{w} $$
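A minimal sketch of the modular behavior for $w = 32$ (not from the text; the operand is illustrative):

#include <stdio.h>

int main(void) {
    unsigned x = 65537u;       /* 2^16 + 1 */
    printf("%u\n", x * x);     /* (2^32 + 2^17 + 1) mod 2^32 = 131073 */
    return 0;
}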
Two’s-Complement Multiplication
Definition of Two’s-Complement Multiplication
For $x$ and $y$ such that $ \textit{TMin}_w ≤ x, y ≤ \textit{TMax}_w $ :
$$ x *_{w}^{t} y = U2T_{w}((x \cdot y) \textrm{ mod } 2^{w}) $$
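A consequence of these two definitions is that the low-order $w$ bits of a product are the same whether the operands are interpreted as signed or unsigned; only the interpretation of the result differs. A minimal sketch for $w = 32$ (not from the text):

#include <stdio.h>

int main(void) {
    int a = -3, b = 5;
    printf("0x%x\n", (unsigned)(a * b));            /* 0xfffffff1, i.e. -15 */
    printf("0x%x\n", (unsigned) a * (unsigned) b);  /* 0xfffffff1: same bits */
    return 0;
}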
Floating Point
Fractional Binary Numbers
Decimal notation uses a representation of the form:
$$ d_{m} d_{m−1} . . . d_{1} d_{0} . d_{−1} d_{−2} . . . d_{−n} $$
By analogy, consider a binary notation of the form $b_{m} b_{m-1} \ldots b_{1} b_{0} . b_{-1} b_{-2} \ldots b_{-n}$. Bits to the left of the binary point have weights of the form $2^{i}$, while those to the right have weights of the form $1/2^{i}$.
[Figure: fractional binary notation. The bits $b_{m} b_{m-1} \ldots b_{1} b_{0} . b_{-1} b_{-2} \ldots b_{-n}$ carry weights $2^{m}, \ldots, 4, 2, 1$ to the left of the binary point and $1/2, 1/4, 1/8, \ldots, 1/2^{n-1}, 1/2^{n}$ to the right.]
With each bit $b_{i}$ equal to 0 or 1, this notation represents the number:
$$ b = \sum_{i=-n}^{m} 2^{i} \times b_{i} $$
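For example, $101.11_2$ represents the number $1 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0} + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} = 4 + 0 + 1 + \frac{1}{2} + \frac{1}{4} = 5\frac{3}{4}$.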
IEEE Floating-Point Representation
Positional notation such as that considered in the previous section would not be efficient for representing very large numbers. For example, the representation of $5 \times 2^{100}$ would consist of the bit pattern 101 followed by 100 zeros. Instead, we would like to represent numbers in a form $x \times 2^{y}$ by giving the values of $x$ and $y$.
The IEEE floating-point standard represents a number in a form $V = (-1)^{s} \times M \times 2^{E}$:
- The sign $s$ determines whether the number is negative ($s = 1$) or positive ($s = 0$), where the interpretation of the sign bit for numeric value 0 is handled as a special case.
- The significand $M$ is a fractional binary number that ranges either between 1 and $2 - \epsilon$ or between 0 and $1 - \epsilon$.
- The exponent $E$ weights the value by a (possibly negative) power of 2.
Single precision (32-bit)

 31 30        23 22                     0
+--+-----------+-----------------------+
| s|    exp    |          frac         |
+--+-----------+-----------------------+

Double precision (64-bit)

 63 62        52 51                    32
+--+-----------+-----------------------+
| s|    exp    |      frac (51:32)     |
+--+-----------+-----------------------+

 31                                     0
+---------------------------------------+
|              frac (31:0)              |
+---------------------------------------+
The bit representation of a floating-point number is divided into three fields to encode these values:
- The single sign bit s directly encodes the sign $s$.
- The $k$-bit exponent field exp $= e_{k-1} \ldots e_{1}e_{0}$ encodes the exponent $E$.
- The $n$-bit fraction field frac $= f_{n-1} \ldots f_{1}f_{0}$ encodes the significand $M$, but the value encoded also depends on whether or not the exponent field equals 0.
In the single-precision floating-point format (a float in C), fields s, exp, and frac are 1, $k = 8$, and $n = 23$ bits each, yielding a 32-bit representation. In the double-precision floating-point format (a double in C), fields s, exp, and frac are 1, $k = 11$, and $n = 52$ bits each, yielding a 64-bit representation.
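These field boundaries can be checked by extracting the bits of a C float directly. The following is a minimal sketch (not from the text), assuming float is the 32-bit IEEE format; memcpy is used to reinterpret the bits without violating C’s aliasing rules.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = -12.5f;                    /* -12.5 = -1.5625 x 2^3 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* copy out the raw 32 bits */

    unsigned s    = bits >> 31;          /* 1-bit sign field */
    unsigned e    = (bits >> 23) & 0xff; /* 8-bit exponent field */
    unsigned frac = bits & 0x7fffff;     /* 23-bit fraction field */

    /* Expected: s=1, exp=0x82 (E=3 plus the IEEE bias of 127), frac=0x480000 */
    printf("s=%u exp=0x%x frac=0x%x\n", s, e, frac);
    return 0;
}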
Rounding
Floating-point arithmetic can only approximate real arithmetic, since the representation has limited range and precision. Thus, for a value $x$, we generally want a systematic method of finding the “closest” matching value $x'$ that can be represented in the desired floating-point format. This is the task of the rounding operation. One key problem is to define the direction to round a value that is halfway between two possibilities. For example, if I have $1.50 and want to round it to the nearest dollar, should the result be $1 or $2?

An alternative approach is to maintain a lower and an upper bound on the actual number. For example, we could determine representable values $x^{-}$ and $x^{+}$ such that the value $x$ is guaranteed to lie between them: $x^{-} \leq x \leq x^{+}$. The IEEE floating-point format defines four different rounding modes. The default method finds a closest match, while the other three can be used for computing upper and lower bounds.
Mode                 $1.40   $1.60   $1.50   $2.50   $-1.50
------------------------------------------------------------
Round-to-even        $1      $2      $2      $2      $-2
Round-toward-zero    $1      $1      $1      $2      $-1
Round-down           $1      $1      $1      $2      $-2
Round-up             $2      $2      $2      $3      $-1
Round-to-even (also called round-to-nearest) is the default mode. It attempts to find a closest match. Thus, it rounds $1.40 to $1 and $1.60 to $2, since these are the closest whole dollar values. The only design decision is to determine the effect of rounding values that are halfway between two possible results. Round-to-even mode adopts the convention that it rounds the number either upward or downward such that the least significant digit of the result is even. Thus, it rounds both $1.50 and $2.50 to $2.
The other three modes produce guaranteed bounds on the actual value. These can be useful in some numerical applications. Round-toward-zero mode rounds positive numbers downward and negative numbers upward, giving a value $\hat{x}$ such that $|\hat{x}| \leq |x|$. Round-down mode rounds both positive and negative numbers downward, giving a value $x^{-}$ such that $x^{-} \leq x$. Round-up mode rounds both positive and negative numbers upward, giving a value $x^{+}$ such that $x \leq x^{+}$.
Round-to-even at first seems like it has a rather arbitrary goal—why is there any reason to prefer even numbers? Why not consistently round values halfway between two representable values upward? The problem with such a convention is that one can easily imagine scenarios in which rounding a set of data values would then introduce a statistical bias into the computation of an average of the values. The average of a set of numbers that we rounded by this means would be slightly higher than the average of the numbers themselves. Conversely, if we always rounded numbers halfway between downward, the average of a set of rounded numbers would be slightly lower than the average of the numbers themselves. Rounding toward even numbers avoids this statistical bias in most real-life situations. It will round upward about 50% of the time and round downward about 50% of the time.
Round-to-even rounding can be applied even when we are not rounding to a whole number. We simply consider whether the least significant digit is even or odd. For example, suppose we want to round decimal numbers to the nearest hundredth. We would round 1.2349999 to 1.23 and 1.2350001 to 1.24, since these values are not halfway between 1.23 and 1.24. On the other hand, we would round both 1.2350000 and 1.2450000 to 1.24, since 4 is even.
Similarly, round-to-even rounding can be applied to binary fractional numbers. We consider least significant bit value 0 to be even and 1 to be odd. In general, the rounding mode is only significant when we have a bit pattern of the form $XX \ldots X.YY \ldots Y100 \ldots$, where $X$ and $Y$ denote arbitrary bit values, with the rightmost $Y$ being the position to which we wish to round. Only bit patterns of this form denote values that are halfway between two possible results.
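For example, consider rounding to two fractional bits. The value $10.00011_2$ ($2\frac{3}{32}$) rounds to $10.00_2$ (2), since it is closer to $10.00_2$ than to $10.01_2$. The value $10.11100_2$ ($2\frac{7}{8}$), however, lies exactly halfway between $10.11_2$ ($2\frac{3}{4}$) and $11.00_2$ (3); round-to-even chooses $11.00_2$, since its least significant bit is even.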