Wednesday, 20 August 2008

The Ruby Programming Language - Part 3

Two days ago I started reading The Ruby Programming Language by David Flanagan and Yukihiro Matsumoto.

Today I read the last four chapters (7, 8, 9 and 10) of the book, and I want to share a few important quotations I found in those chapters.

1) Ruby is an object-oriented language in a very pure sense: every value in Ruby is (or at least behaves like) an object. Every object is an instance of a class. A class defines a set of methods that an object responds to. Classes may extend or subclass other classes, and inherit or override the methods of their superclass. Classes can also include—or inherit methods from—modules.

2) In contrast to the strict encapsulation of object state, Ruby's classes are very open. Any Ruby program can add methods to existing classes, and it is even possible to add "singleton methods" to individual objects.
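
As a minimal sketch of what this openness looks like (the method names shout and twice are made up for illustration and are not from the book):

```ruby
# Reopen the built-in String class and add an instance method.
# `shout` is a hypothetical name chosen for this sketch.
class String
  def shout
    upcase + "!"
  end
end

# Add a singleton method to one particular object only.
greeting = String.new("hello")
def greeting.twice
  self + " " + self
end

puts "hi".shout                   # every String now responds to shout
puts greeting.twice               # only this one object responds to twice
puts "other".respond_to?(:twice)  # other strings do not
```

Reopening core classes is convenient, but it affects every user of that class in the program, so it is usually done sparingly.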

3) The new method of the class object creates a new instance object, and then it automatically invokes the initialize method on that instance. Whatever arguments you passed to new are passed on to initialize.

4) Finally, a caution for programmers who are used to Java and related languages. In statically typed languages, you must declare your variables, including instance variables. You know that Ruby variables don't need to be declared, but you might still feel that you have to write something like this:

# Incorrect code!
class Point
  @x = 0 # Create instance variable @x and assign a default. WRONG!
  @y = 0 # Create instance variable @y and assign a default. WRONG!

  def initialize(x,y)
    @x, @y = x, y # Now initialize previously created @x and @y.
  end
end


This code does not do at all what a Java programmer expects. Instance variables are always resolved in the context of self. When the initialize method is invoked, self holds an instance of the Point class. But the code outside of that method is executed as part of the definition of the Point class. When those first two assignments are executed, self refers to the Point class itself, not to an instance of the class. The @x and @y variables inside the initialize method are completely different from those outside it.
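
To see the difference for yourself, here is a small sketch. It assumes an x reader method on Point and uses instance_variable_get to peek at the class object, neither of which is part of the quoted code:

```ruby
class Point
  @x = 0                 # self is the Point class here, so this @x belongs to the class object

  def initialize(x, y)
    @x, @y = x, y        # self is the new instance here, so these belong to the instance
  end

  def x                  # reader added only for this demonstration
    @x
  end
end

pt = Point.new(5, 6)
puts pt.x                               # 5: the instance's own @x
puts Point.instance_variable_get(:@x)   # 0: the class object's @x is untouched
```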

5) Defining a to_s Method

Just about any class you define should have a to_s instance method to return a string representation of the object. This ability proves invaluable when debugging. Here's how we might do this for Point:

class Point
  def initialize(x,y)
    @x, @y = x, y
  end

  def to_s # Return a String that represents this point
    "(#@x,#@y)" # Just interpolate the instance variables into a string
  end
end

p = Point.new(1,2)
puts p

6) Accessors and Attributes

Our Point class uses two instance variables. As we've noted, however, the values of these variables are only accessible to other instance methods. If we want users of the Point class to be able to use the X and Y coordinates of a point, we've got to provide accessor methods that return the value of the variables:

Example:

def x
  @x
end

def y
  @y
end

p=Point.new(1,2)
q=Point.new(p.x*2,p.y*3)
puts q

If we wanted our Point class to be mutable, we would also add setter methods to set the value of the instance variables:
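
The setter code itself did not make it into this post. A minimal sketch of what such setters could look like follows; the class name MutablePoint is my own choice here, used so that it does not clash with the Point examples:

```ruby
class MutablePoint
  def initialize(x, y)
    @x, @y = x, y
  end

  def x; @x; end            # getter for @x
  def y; @y; end            # getter for @y

  def x=(value)             # setter for @x, invoked by "pt.x = ..."
    @x = value
  end

  def y=(value)             # setter for @y, invoked by "pt.y = ..."
    @y = value
  end
end

pt = MutablePoint.new(1, 1)
pt.x = pt.x * 2             # calls x, then x=
pt.y = pt.y * 3             # calls y, then y=
puts pt.x                   # 2
puts pt.y                   # 3
```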

7) Using Setters Inside a Class

Once you've defined a setter method like x= for your class, you might be tempted to use it within other instance methods of your class. That is, instead of writing @x=2, you might write x=2, intending to invoke x=(2) implicitly on self. It doesn't work, of course; x=2 simply creates a new local variable.

This is a not-uncommon mistake for novices who are just learning about setter methods and assignment in Ruby. The rule is that assignment expressions will only invoke a setter method when invoked through an object. If you want to use a setter from within the class that defines it, invoke it explicitly through self. For example: self.x=2.
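
A small sketch of the pitfall, with hypothetical method names reset_wrong and reset_right that I made up for this demonstration:

```ruby
class Point
  def initialize(x)
    @x = x
  end

  def x; @x; end
  def x=(value); @x = value; end

  def reset_wrong
    x = 0          # creates a local variable named x; @x is unchanged
    x
  end

  def reset_right
    self.x = 0     # invokes the setter method x= on self
  end
end

pt = Point.new(7)
pt.reset_wrong
puts pt.x          # 7: the setter was never called
pt.reset_right
puts pt.x          # 0
```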

8) This combination of instance variable with trivial getter and setter methods is so common that Ruby provides a way to automate it. The attr_reader and attr_accessor methods are defined by the Module class. All classes are modules (the Class class is a subclass of Module), so you can invoke these methods inside any class definition. Both methods take any number of symbols naming attributes. attr_reader creates trivial getter methods for the instance variables with the same name. attr_accessor creates getter and setter methods. Thus, if we were defining a mutable Point class, we could write:

class Point
  attr_accessor :x, :y # Define accessor methods for our instance variables
end

And if we were defining an immutable version of the class, we'd write:

class Point
  attr_reader :x, :y # Define reader methods for our instance variables
end

9) The attr, attr_reader, and attr_accessor methods create instance methods for us. This is an example of metaprogramming, and the ability to do it is a powerful feature of Ruby.
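
One way to convince yourself that these calls really do define methods is to ask the class for its instance methods afterwards. This is just a quick sketch; the output shown assumes Ruby 1.9 or later, where method names are reported as symbols:

```ruby
class Point
  attr_accessor :x, :y    # defines x, x=, y and y= for us
end

# List only the methods defined directly in Point (false excludes inherited ones)
p Point.instance_methods(false).sort   # [:x, :x=, :y, :y=]

pt = Point.new
pt.x = 3                  # uses the generated setter
p pt.x                    # 3, via the generated getter
```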

10) Enumerating Coordinates

If a Point object can behave like an array with two elements, then perhaps we ought to be able to iterate through those elements as we can with a true array. Here is a definition of the each iterator for our Point class. Because a Point always has exactly two elements, our iterator doesn't have to loop; it can simply call yield twice:

# This iterator passes the X coordinate to the associated block, and then
# passes the Y coordinate, and then returns. It allows us to enumerate
# a point as if it were an array with two elements. This each method is
# required by the Enumerable module.
def each
  yield @x
  yield @y
end

With this iterator defined, we can write code like this:

p = Point.new(1,2)
p.each {|x| print x } # Prints "12"
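
The comment above notes that each is required by the Enumerable module. As a hedged sketch of what that buys us, assuming we also add include Enumerable to the class (that line is not part of the excerpt above):

```ruby
class Point
  include Enumerable       # assumption: mix in Enumerable on top of each

  def initialize(x, y)
    @x, @y = x, y
  end

  def each
    yield @x
    yield @y
  end
end

pt = Point.new(1, 2)
p pt.to_a                  # [1, 2]
p pt.map {|c| c * 10 }     # [10, 20]
p pt.include?(2)           # true
p pt.max                   # 2
```

All of these methods come from Enumerable and are built on top of the single each method we wrote.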

11) Point Equality

As our class is currently defined, two distinct Point instances are never equal to each other, even if their X and Y coordinates are the same. To remedy this, we must provide an implementation of the == operator.
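
The implementation itself is missing from this post. Going by the description in the next quotation (an is_a? test followed by == on the coordinates), a sketch of it might look like this; treat it as my reconstruction rather than the book's exact code:

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def ==(o)                       # Is self == o?
    if o.is_a? Point              # If o is a Point object
      @x == o.x && @y == o.y      # then compare the fields
    else                          # If o is not a Point
      false                       # then, by definition, self != o
    end
  end
end

p Point.new(1, 1) == Point.new(1, 1)       # true
p Point.new(1, 1) == Point.new(1.0, 1.0)   # true: numeric == allows conversion
p Point.new(1, 1) == [1, 1]                # false: not a Point
```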

12) Duck Typing and Equality

The + operator we defined earlier did no type checking at all: it works with any argument object with x and y methods that return numbers. This == method is implemented differently; instead of allowing duck typing, it requires that the argument is a Point. This is an implementation choice. The implementation of == above chooses to define equality so that an object cannot be equal to a Point unless it is itself a Point.

Implementations may be stricter or more liberal than this. The implementation above uses the is_a? predicate to test the class of the argument. This allows an instance of a subclass of Point to be equal to a Point. A stricter implementation would use instance_of? to disallow subclass instances. Similarly, the implementation above uses == to compare the X and Y coordinates. For numbers, the == operator allows type conversion, which means that the point (1,1) is equal to (1.0,1.0). This is probably as it should be, but a stricter definition of equality could use eql? to compare the coordinates.

A more liberal definition of equality would support duck typing. Some caution is required, however. Our == method should not raise a NoMethodError if the argument object does not have x and y methods. Instead, it should simply return false:

def ==(o) # Is self == o?
  @x == o.x && @y == o.y # Assume o has proper x and y methods
rescue # If that assumption fails
  false # Then self != o
end

13) There are two reasons we might want eql? to be different from ==. First, some classes define eql? to perform a stricter comparison than ==. In Numeric and its subclasses, for example, == allows type conversion and eql? does not. If we believe that the users of our Point class might want to be able to compare instances in two different ways, then we might follow this example. Because points are just two numbers, it would make sense to follow the example set by Numeric here. Our eql? method would look much like the == method, but it would use eql? to compare point coordinates instead of ==:

def eql?(o)
  if o.instance_of? Point
    @x.eql?(o.x) && @y.eql?(o.y)
  else # A bare elsif here would make the method return nil instead of false
    false
  end
end

14) Implementing optimal hash methods can be very tricky. Fortunately, there is a simple way to compute perfectly adequate hashcodes for just about any class: simply combine the hashcodes of all the objects referenced by your class. (More precisely: combine the hashcodes of all the objects compared by your eql? method.) The trick is to combine the hashcodes in the proper way. Simply adding @x.hash and @y.hash is not a good approach, because it gives the point (1,0) the same hashcode as the point (0,1). The following hash method mixes the values instead:

def hash
  code = 17
  code = 37*code + @x.hash
  code = 37*code + @y.hash
  # Add lines like this for each significant instance variable
  code # Return the resulting code
end
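
The reason eql? and hash matter together is that Ruby's Hash uses both when looking up keys. Here is a small sketch that puts the two methods from the quotations above into one class and uses points as hash keys (the string values are just placeholders):

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def eql?(o)                      # strict comparison, no numeric conversion
    o.instance_of?(Point) && @x.eql?(o.x) && @y.eql?(o.y)
  end

  def hash                         # must agree with eql?
    code = 17
    code = 37*code + @x.hash
    code = 37*code + @y.hash
    code
  end
end

lookup = { Point.new(1, 2) => "found" }
p lookup[Point.new(1, 2)]          # "found": a distinct but eql? object works as a key
p lookup[Point.new(1.0, 2.0)]      # nil: 1.eql?(1.0) is false
```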


15) To define this ordering for any object, we need only define the <=> operator and include the Comparable module. Doing this mixes in implementations of the equality and relational operators that are based on our implementation of the general <=> operator. The <=> operator should compare self to the object it is passed. If self is less than that object (closer to the origin, in this case), it should return -1. If the two objects are equal, it should return 0. And if self is greater than the argument object, the method should return 1. (The method should return nil if the argument object and self are of incomparable types.)

include Comparable # Mix in methods from the Comparable module.

# Define an ordering for points based on their distance from the origin.
# This method is required by the Comparable module.
def <=>(other)
  return nil unless other.instance_of? Point
  @x**2 + @y**2 <=> other.x**2 + other.y**2 # Compare squared distances from the origin
end
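
Here is a self-contained sketch of what including Comparable gives us. It compares squared distances from the origin, and the attr_reader and initialize lines are scaffolding I added so the snippet runs on its own:

```ruby
class Point
  include Comparable
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def <=>(other)
    return nil unless other.instance_of? Point
    @x**2 + @y**2 <=> other.x**2 + other.y**2
  end
end

a, b, c = Point.new(1, 1), Point.new(3, 4), Point.new(4, 3)
p a < b                  # true: a is closer to the origin
p b == c                 # true: Comparable's == sees equal distances
p a.between?(a, b)       # true
p [b, a].min.x           # 1
```

Note that Comparable's == only applies here because this sketch does not define its own == method; a class that defines == explicitly keeps that definition.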

16) When defining a mutator method, we normally only add an exclamation mark to the name if there is a nonmutating version of the same method. In this case, the name add! makes sense if we also define an add method that returns a new object, rather than altering its receiver. A nonmutating version of a mutator method is often written simply by creating a copy of self and invoking the mutator on the copied object:

def add(p) # A nonmutating version of add!
  q = self.dup # Make a copy of self
  q.add!(p) # Invoke the mutating method on the copy
end
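
The add! method that add relies on is not shown in this post. As a sketch, assuming add! adds the argument's coordinates to the receiver and returns self:

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def add!(p)          # Mutator (assumed shape): change the receiver in place
    @x += p.x
    @y += p.y
    self               # Return self so that add can return the modified copy
  end

  def add(p)           # A nonmutating version of add!
    q = self.dup       # Make a copy of self
    q.add!(p)          # Invoke the mutating method on the copy
  end
end

a = Point.new(1, 2)
b = a.add(Point.new(10, 20))
p [a.x, a.y]           # [1, 2]: the receiver is unchanged
p [b.x, b.y]           # [11, 22]
a.add!(Point.new(10, 20))
p [a.x, a.y]           # [11, 22]: add! altered the receiver
```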

17) A Class Method

Instead of invoking an instance method of one point and passing another point to that method, let's write a method named sum that takes any number of Point objects, adds them together, and returns a new Point. This method is not an instance method invoked on a Point object. Rather, it is a class method, invoked through the Point class itself. We might define the sum method like this:

class Point
  attr_reader :x, :y # Define accessor methods for our instance variables

  def Point.sum(*points) # Return the sum of an arbitrary number of points
    x = y = 0
    points.each {|p| x += p.x; y += p.y }
    Point.new(x,y)
  end

  # ...the rest of class omitted here...
end

This definition of the class method names the class explicitly, and mirrors the syntax used to invoke the method. Class methods can also be defined using self instead of the class name. Thus, this method could also be written like this:

def self.sum(*points) # Return the sum of an arbitrary number of points
  x = y = 0
  points.each {|p| x += p.x; y += p.y }
  Point.new(x,y)
end

Using self instead of Point makes the code slightly less clear, but it's an application of the DRY (Don't Repeat Yourself) principle. If you use self instead of the class name, you can change the name of a class without having to edit the definition of its class methods.

There is yet another technique for defining class methods. Though it is less clear than the previously shown technique, it can be handy when defining multiple class methods, and you are likely to see it used in existing code:
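
The code for this technique is missing from the post. My understanding is that it refers to the class << Point syntax, which opens the singleton class of the Point object. A sketch under that assumption, followed by an invocation of the resulting class method:

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end
end

# Open up the Point object itself so we can add methods to it
class << Point
  def sum(*points)     # This becomes the class method Point.sum
    x = y = 0
    points.each {|p| x += p.x; y += p.y }
    Point.new(x, y)
  end
  # Other class methods could be defined here
end

total = Point.sum(Point.new(1, 2), Point.new(3, 4), Point.new(5, 6))
puts total.x   # 9
puts total.y   # 12
```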

This technique can also be used inside the class definition, where we can use self instead of repeating the class name:

class Point
  # Instance methods go here

  class << self
    # Class methods go here
  end
end

18) Constants

Many classes can benefit from the definition of some associated constants. Here are some constants that might be useful for our Point class:

class Point
  def initialize(x,y) # Initialize method
    @x,@y = x, y
  end

  ORIGIN = Point.new(0,0)
  UNIT_X = Point.new(1,0)
  UNIT_Y = Point.new(0,1)

  # Rest of class definition goes here
end

Inside the class definition, these constants can be referred to by their unqualified names. Outside the definition, they must be prefixed by the name of the class, of course:

Point::UNIT_X + Point::UNIT_Y # => (1,1)

Note that because our constants in this example refer to instances of the class, we cannot define the constants until after we've defined the initialize method of the class. Also, keep in mind that it is perfectly legal to define constants in the Point class from outside the class:

Point::NEGATIVE_UNIT_X = Point.new(-1,0)

19) Class variables are visible to, and shared by, the class methods and the instance methods of a class, and also by the class definition itself. Like instance variables, class variables are encapsulated; they can be used by the implementation of a class, but they are not visible to the users of a class. Class variables have names that begin with @@.
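
A short sketch of a class variable in use. The count class method is a hypothetical accessor I added so the value can be read from outside:

```ruby
class Point
  @@n = 0                  # Class variable: how many points have been created

  def initialize(x, y)
    @x, @y = x, y
    @@n += 1               # Class variables are visible in instance methods
  end

  def self.count           # ...and in class methods (hypothetical accessor)
    @@n
  end
end

Point.new(1, 2)
Point.new(3, 4)
puts Point.count           # 2
```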

20) Class Instance Variables

Classes are objects and can have instance variables just as other objects can. The instance variables of a class—often called class instance variables—are not the same as class variables. But they are similar enough that they can often be used instead of class variables.

An instance variable used inside a class definition but outside an instance method definition is a class instance variable. Like class variables, class instance variables are associated with the class rather than with any particular instance of the class. A disadvantage of class instance variables is that they cannot be used within instance methods as class variables can. Another disadvantage is the potential for confusing them with ordinary instance variables. Without the distinctive punctuation prefixes, it may be more difficult to remember whether a variable is associated with instances or with the class object.

class Point
  # Initialize our class instance variables in the class definition itself
  @n = 0 # How many points have been created
  @totalX = 0 # The sum of all X coordinates
  @totalY = 0 # The sum of all Y coordinates

  def initialize(x,y) # Initialize method
    @x,@y = x, y # Sets initial values for instance variables
  end

  def self.new(x,y) # Class method to create new Point objects
    # Use the class instance variables in this class method to collect data
    @n += 1 # Keep track of how many Points have been created
    @totalX += x # Add these coordinates to the totals
    @totalY += y

    super # Invoke the real definition of new to create a Point
    # More about super later in the chapter
  end

  # A class method to report the data we collected
  def self.report
    # Here we use the class instance variables in a class method
    puts "Number of points created: #@n"
    puts "Average X coordinate: #{@totalX.to_f/@n}"
    puts "Average Y coordinate: #{@totalY.to_f/@n}"
  end
end

21) Method visibility is declared with three methods named public, private, and protected. These are instance methods of the Module class. All classes are modules, and inside a class definition (but outside method definitions), self refers to the class being defined. Thus, public, private, and protected may be used bare as if they were keywords of the language. In fact, however, they are method invocations on self. There are two ways to invoke these methods. With no arguments, they specify that all subsequent method definitions will have the specified visibility. A class might use them like this:

class Point
  # public methods go here

  # The following methods are protected
  protected

  # protected methods go here

  # The following methods are private
  private

  # private methods go here
end
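
The second way, not shown in the excerpt above, is to pass the names of already defined methods as arguments. A minimal sketch, where fmt is a hypothetical helper method invented for this example:

```ruby
class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  def to_s
    "(#{fmt(@x)},#{fmt(@y)})"   # private methods can be called without a receiver
  end

  def fmt(n)                    # hypothetical helper, public when first defined
    n.to_s
  end

  private :fmt                  # make the named method private after the fact
end

pt = Point.new(1, 2)
puts pt.to_s                    # (1,2)

begin
  pt.fmt(1)                     # calling a private method with an explicit receiver
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```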


22) The syntax for extending a class is simple. Just add a < character and the name of the superclass to your class statement. For example:

class Point3D < Point # Define class Point3D as a subclass of Point
end

23) One of the important things to understand about object-oriented programming and subclassing is that when methods are invoked, they are looked up dynamically so that the appropriate definition or redefinition of the method is found. That is, method invocations are not bound statically at the time they are parsed, but rather, are looked up at the time they are executed. Here is an example to demonstrate this important point:

# Greet the World
class WorldGreeter
  def greet # Display a greeting
    puts "#{greeting} #{who}"
  end

  def greeting # What greeting to use
    "Hello"
  end

  def who # Who to greet
    "World"
  end
end

# Greet the world in Spanish
class SpanishWorldGreeter < WorldGreeter
  def greeting # Override the greeting
    "Hola"
  end
end

# We call a method defined in WorldGreeter, which calls the overridden
# version of greeting in SpanishWorldGreeter, and prints "Hola World"
SpanishWorldGreeter.new.greet

24) Notice that it is also perfectly reasonable to define an abstract class that invokes certain undefined "abstract" methods, which are left for subclasses to define. The opposite of abstract is concrete. A class that extends an abstract class is concrete if it defines all of the abstract methods of its ancestors. For example:

# This class is abstract; it doesn't define greeting or who
# No special syntax is required: any class that invokes methods that are
# intended for a subclass to implement is abstract.
class AbstractGreeter
  def greet
    puts "#{greeting} #{who}"
  end
end

# A concrete subclass
class WorldGreeter < AbstractGreeter
  def greeting; "Hello"; end
  def who; "World"; end
end

WorldGreeter.new.greet # Displays "Hello World"

25) Augmenting Behavior by Chaining
Sometimes when we override a method, we don't want to replace it altogether, we just want to augment its behavior by adding some new code. In order to do this, we need a way to invoke the overridden method from the overriding method. This is known as chaining, and it is accomplished with the keyword super.

super works like a special method invocation: it invokes a method with the same name as the current one, in the superclass of the current class. (Note that the superclass need not define that method itself—it can inherit it from one of its ancestors.) You may specify arguments for super just as you would for a normal method invocation. One common and important place for method chaining is the initialize method of a class. Here is how we might write the initialize method of our Point3D class:

class Point3D < Point
  def initialize(x,y,z)
    # Pass our first two arguments along to the superclass initialize method
    super(x,y)
    # And deal with the third argument ourself
    @z = z
  end
end

26) Inheritance of Constants
Constants are inherited and can be overridden, much like instance methods can. There is, however, a very important difference between the inheritance of methods and the inheritance of constants.

Our Point3D class can use the ORIGIN constant defined by its Point superclass, for example. Although the clearest style is to qualify constants with their defining class, Point3D could also refer to this constant with an unqualified ORIGIN or even as Point3D::ORIGIN.

Where inheritance of constants becomes interesting is when a class like Point3D redefines a constant. A three-dimensional point class probably wants a constant named ORIGIN to refer to a three-dimensional point, so Point3D is likely to include a line like this:

ORIGIN = Point3D.new(0,0,0)

As you know, Ruby issues a warning when a constant is redefined. In this case, however, this is a newly created constant. We now have two constants Point::ORIGIN and Point3D::ORIGIN.

The important difference between constants and methods is that constants are looked up in the lexical scope of the place they are used before they are looked up in the inheritance hierarchy. This means that if Point3D inherits methods that use the constant ORIGIN, the behavior of those inherited methods will not change when Point3D defines its own version of ORIGIN.
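
A stripped-down sketch of this behavior. The symbols stand in for real point objects, and the origin method is a hypothetical instance method I added to show the lookup:

```ruby
class Point
  ORIGIN = :origin_2d          # placeholder value for illustration

  def origin                   # hypothetical method that uses the constant
    ORIGIN                     # resolved lexically: this is Point::ORIGIN
  end
end

class Point3D < Point
  ORIGIN = :origin_3d          # a new constant, Point3D::ORIGIN
end

p Point::ORIGIN                # :origin_2d
p Point3D::ORIGIN              # :origin_3d
p Point3D.new.origin           # :origin_2d: the inherited method still sees Point's constant
```

If the inherited method should pick up the subclass's constant, one option is to look it up dynamically, for example with self.class::ORIGIN.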

27) Inheritance and Class Variables
Class variables are shared by a class and all of its subclasses. If a class A defines a variable @@a, then subclass B can use that variable. Although this may appear, superficially, to be inheritance, it is actually something different.

The difference becomes clear when we think about setting the value of a class variable. If a subclass assigns a value to a class variable already in use by a superclass, it does not create its own private copy of the class variable, but instead alters the value seen by the superclass. It also alters the shared value seen by all other subclasses of the superclass. Ruby 1.8 prints a warning about this if you run it with -w. Ruby 1.9 does not issue this warning.

If a class uses class variables, then any subclass can alter the behavior of the class and all its descendants by changing the value of the shared class variable. This is a strong argument for the use of class instance variables instead of class variables.

The following code demonstrates the sharing of class variables. It outputs 123:

class A
  @@value = 1 # A class variable
  def A.value; @@value; end # An accessor method for it
end
print A.value # Display value of A's class variable
class B < A; @@value = 2; end # Subclass alters shared class variable
print A.value # Superclass sees altered value
class C < A; @@value = 3; end # Another alters shared variable again
print B.value # 1st subclass sees value from 2nd subclass

28) dup, clone, and initialize_copy
Another way that new objects come into existence is as a result of the dup and clone methods. These methods allocate a new instance of the class of the object on which they are invoked. They then copy all the instance variables and the taintedness of the receiver object to the newly allocated object. clone takes this copying a step further than dup—it also copies singleton methods of the receiver object and freezes the copy object if the original is frozen.

If a class defines a method named initialize_copy, then clone and dup will invoke that method on the copied object after copying the instance variables from the original. (clone calls initialize_copy before freezing the copy object, so that initialize_copy is still allowed to modify it.) The initialize_copy method is passed the original object as an argument and has the opportunity to make any changes it desires to the copied object. It cannot create its own copy object, however; the return value of initialize_copy is ignored. Like initialize, Ruby ensures that initialize_copy is always private.

When clone and dup copy instance variables from the original object to the copy, they copy references to the values of those variables; they do not copy the actual values. In other words, these methods perform a shallow copy. And this is one reason that many classes might want to alter the behavior of these methods. Here is code that defines an initialize_copy method to do a deeper copy of internal state:

class Point # A point in n-space
  def initialize(*coords) # Accept an arbitrary # of coordinates
    @coords = coords # Store the coordinates in an array
  end

  def initialize_copy(orig) # If someone copies this Point object
    @coords = @coords.dup # Make a copy of the coordinates array, too
  end
end

The class shown here stores its internal state in an array. Without an initialize_copy method, if an object were copied using clone or dup, the copied object would refer to the same array of state that the original object did. Mutations performed on the copy would affect the state of the original. As this is not the behavior we want, we must define initialize_copy to create a copy of the array as well.
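
To check that the copy really is independent, here is a small sketch. The coords reader is something I added purely so the array can be inspected from outside:

```ruby
class Point                      # A point in n-space
  attr_reader :coords            # reader added only for this demonstration

  def initialize(*coords)
    @coords = coords
  end

  def initialize_copy(orig)      # Called by dup and clone on the new copy
    @coords = @coords.dup        # Give the copy its own array
  end
end

a = Point.new(1, 2)
b = a.dup
b.coords << 3                    # Mutate the copy's array
p a.coords                       # [1, 2]: the original is unaffected
p b.coords                       # [1, 2, 3]
```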

29) The Singleton Pattern

A singleton is a class that has only a single instance. Singletons can be used to store global program state within an object-oriented framework and can be useful alternatives to class methods and class variables.

Properly implementing a singleton requires a number of the tricks shown earlier. The new and allocate methods must be made private, dup and clone must be prevented from making copies, and so on. Fortunately, the Singleton module in the standard library does this work for us; just require 'singleton' and then include Singleton into your class. This defines a class method named instance, which takes no arguments and returns the single instance of the class. Define an initialize method to perform initialization of the single instance of the class. Note, however, that no arguments will be passed to this method.

require 'singleton' # Singleton module is not built-in

class PointStats # Define a class
  include Singleton # Make it a singleton

  def initialize # A normal initialization method
    @n, @totalX, @totalY = 0, 0.0, 0.0
  end

  def record(point) # Record a new point
    @n += 1
    @totalX += point.x
    @totalY += point.y
  end

  def report # Report point statistics
    puts "Number of points created: #@n"
    puts "Average X coordinate: #{@totalX/@n}"
    puts "Average Y coordinate: #{@totalY/@n}"
  end
end



With a class like this in place, we might write the initialize method for our Point class like this:

def initialize(x,y)
  @x,@y = x,y
  PointStats.instance.record(self)
end



The Singleton module automatically creates the instance class method for us, and we invoke the regular instance method record on that singleton instance. Similarly, when we want to query the point statistics, we write:

PointStats.instance.report
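
For a quick feel of what the Singleton module enforces, here is a sketch with a hypothetical Registry class (not from the book):

```ruby
require 'singleton'

class Registry                   # hypothetical class for illustration
  include Singleton

  attr_reader :items

  def initialize                 # called once, with no arguments
    @items = []
  end
end

Registry.instance.items << :a
p Registry.instance.items                           # [:a]: same object every time
p Registry.instance.equal?(Registry.instance)       # true

begin
  Registry.new                                      # new has been made private
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```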

30) Ruby programs may be broken up into multiple files, and the most natural way to partition a program is to place each nontrivial class or module into a separate file. These separate files can then be reassembled into a single program (and, if well-designed, can be reused by other programs) using require or load. These are global functions defined in Kernel, but are used like language keywords. The same require method is also used for loading files from the standard library.

31) Types, Classes, and Modules
The most commonly used reflective methods are those for determining the type of an object—what class it is an instance of and what methods it responds to. We introduced most of these important methods early in this book in Section 3.8.4. To review:


o.class
Returns the class of an object o.

c.superclass
Returns the superclass of a class c.

o.instance_of? c
Determines whether o.class == c.

o.is_a? c
Determines whether o is an instance of c, or of any of its subclasses. If c is a module, this method tests whether o.class (or any of its ancestors) includes the module.

o.kind_of? c
kind_of? is a synonym for is_a?.

c === o
For any class or module c, determines if o.is_a?(c).

o.respond_to? name
Determines whether the object o has a public or protected method with the specified name. Pass true as the second argument to check private methods as well.
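
A few of these methods in action on a plain string, as a quick sketch:

```ruby
o = "a string"

p o.class                           # String
p String.superclass                 # Object
p o.instance_of?(String)            # true
p o.instance_of?(Object)            # false: exact class only
p o.is_a?(Object)                   # true: ancestors count
p o.is_a?(Comparable)               # true: String includes the Comparable module
p o.kind_of?(String)                # true
p String === o                      # true
p o.respond_to?(:upcase)            # true
p o.respond_to?(:initialize)        # false: initialize is private
p o.respond_to?(:initialize, true)  # true once private methods are included
```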

32) Listing and Testing For Methods
Object defines methods for listing the names of methods defined on the object. These methods return arrays of method names. Those names are strings in Ruby 1.8 and symbols in Ruby 1.9:

o = "a string"
o.methods # => [ names of all public methods ]
o.public_methods # => the same thing
o.public_methods(false) # Exclude inherited methods
o.protected_methods # => []: there aren't any
o.private_methods # => array of all private methods
o.private_methods(false) # Exclude inherited private methods
def o.single; 1; end # Define a singleton method
o.singleton_methods # => ["single"] (or [:single] in 1.9)


About the Authors

David Flanagan is a computer programmer who spends most of his time writing about JavaScript and Java. His books with O'Reilly include Java in a Nutshell, Java Examples in a Nutshell, Java Foundation Classes in a Nutshell, JavaScript: The Definitive Guide, and JavaScript Pocket Reference. David has a degree in computer science and engineering from the Massachusetts Institute of Technology. He lives with his wife and children in the U.S. Pacific Northwest between the cities of Seattle, Washington and Vancouver, British Columbia. David has a blog at www.davidflanagan.com.

Yukihiro Matsumoto ("Matz"), the creator of Ruby, is a professional programmer who worked for the Japanese open source company, netlab.jp. Matz is also known as one of the open source evangelists in Japan. He's released several open source products, including cmail, the emacs-based mail user agent, written entirely in emacs lisp. Ruby is his first piece of software that has become known outside of Japan.

2 comments:

Anonymous said...

Prashant,

I don't think that you've made it clear in your posts that you're posting verbatim excerpts from my book rather than summarizing important points in your own words. You're welcome to quote what you think are the important parts of my book, but please make it clear (in this and your previous posts) that they are quotations.

Thanks,

David Flanagan

JP said...

David,

Thanks a lot for reading my blog.

Surely I will follow your suggestion, and in fact I have changed the posts as per your guidance.

BTW, I am enjoying Ruby very much now.

Thanks
Prashant