Saturday, September 24, 2011

Comparing the performance of Go on Windows and Linux.

I measured the performance of a simple Go channel benchmarking program on Windows and on Linux, both running on the same machine: an Intel Core 2 Duo (T9300, 2.5 GHz) with 4 GB of memory.
The Windows OS is 64-bit Windows 7 Professional; the Linux OS is 64-bit Ubuntu 11.04 (Natty).
The Linux Go release was: 6g release.r60 (9481).
The Windows Go releases were: 8g release.r60 (9684) and 6g version weekly.2011-07-07 (9153+).
I used the following program:

package main

import (
        "fmt"
        "testing"
        "runtime"
)

func main() {
        runtime.GOMAXPROCS(1) // change to runtime.GOMAXPROCS(2) for the two-core runs
        fmt.Println(" sync", testing.Benchmark(BenchmarkChannelSync).String())
        fmt.Println("buffered", testing.Benchmark(BenchmarkChannelBuffered).String())
}

// BenchmarkChannelSync sends b.N integers over an unbuffered (synchronous) channel.
func BenchmarkChannelSync(b *testing.B) {
        ch := make(chan int)
        go func() {
                for i := 0; i < b.N; i++ {
                        ch <- i
                }
                close(ch)
        }()
        for _ = range ch {
        }
}

// BenchmarkChannelBuffered sends b.N integers over a channel with a buffer of 128.
func BenchmarkChannelBuffered(b *testing.B) {
        ch := make(chan int, 128)
        go func() {
                for i := 0; i < b.N; i++ {
                        ch <- i
                }
                close(ch)
        }()
        for _ = range ch {
        }
}

These properties were compared:
1) A synchronous channel versus a buffered channel
2) The value of GOMAXPROCS (1 versus 2; with a value of 2, both cores should be used)
3) Windows performance versus Linux
4) On Windows: a 32-bit Go-compiled program (8g) versus a 64-bit program (6g)
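
The first two comparisons could also be automated in a single run instead of editing the GOMAXPROCS call by hand. The following is only a sketch of that idea, not the program used for the measurements: the helper name pump is hypothetical, it counts received values (which the original loops do not), and it uses the `for range ch` form, which needs a much newer Go release than r60.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// pump sends n integers through a channel with the given buffer size
// (0 means synchronous) and returns how many values were received.
// Hypothetical helper, not part of the original program.
func pump(n, buffer int) int {
	ch := make(chan int, buffer)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	received := 0
	for range ch {
		received++
	}
	return received
}

func main() {
	// Run every combination of GOMAXPROCS (1, 2) and buffer size (0, 128).
	for _, procs := range []int{1, 2} {
		runtime.GOMAXPROCS(procs)
		for _, buffer := range []int{0, 128} {
			r := testing.Benchmark(func(b *testing.B) {
				pump(b.N, buffer)
			})
			fmt.Printf("GOMAXPROCS=%d buffer=%3d  %s\n", procs, buffer, r.String())
		}
	}
}
```

Each output line then corresponds to one cell pair of a results table for a single platform.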

The number of measurements for each result was between 4 x 10^6 and 5 x 10^6; each result gives how many nanoseconds one operation took.

Here are the results in ns/op:

Platform       GOMAXPROCS   Synchronous   Buffered
Windows (8g)   1                    428        180
Windows (8g)   2                   3577       3762
Windows (6g)   1                    426        179
Windows (6g)   2                   3603       4000
Linux (6g)     1                  16642        210
Linux (6g)     2                  17625        212


We can see the following:
1) A buffered channel performs better than an unbuffered channel, as is to be expected:
   2.4x better on Windows (8g and 6g) with GOMAXPROCS=1
   about 81x better on Linux (79x with GOMAXPROCS=1, 83x with GOMAXPROCS=2)

2) Influence of GOMAXPROCS 1 versus 2:
On Windows (8g and 6g) this did not behave as expected: the unbuffered channel performed about 8.4x worse with GOMAXPROCS=2 than with GOMAXPROCS=1, and the buffered channel about 21.5x worse. Moreover, with GOMAXPROCS=2 the buffered channel is even slightly slower than the synchronous channel, in contrast to 1)!
On Linux the results for GOMAXPROCS=2 are almost the same as for GOMAXPROCS=1 (in fact very slightly worse). There is no improvement from increasing GOMAXPROCS.

3) Linux versus Windows
With GOMAXPROCS=1, Windows performs slightly better for buffered channels and much better (39x) for synchronous channels; I don’t know the reason for the latter.

4) Windows 8g versus Windows 6g: they have the same performance.
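
The ratios quoted in these observations can be recomputed from the measured ns/op values. The sketch below does only that arithmetic; the type and variable names are hypothetical, and the numbers are copied from the results reported above.

```go
package main

import "fmt"

// result holds one measured configuration, in ns/op.
type result struct {
	platform string
	procs    int
	sync     float64
	buffered float64
}

// Values copied from the measurements reported in this post.
var results = []result{
	{"Windows (8g)", 1, 428, 180},
	{"Windows (8g)", 2, 3577, 3762},
	{"Windows (6g)", 1, 426, 179},
	{"Windows (6g)", 2, 3603, 4000},
	{"Linux (6g)", 1, 16642, 210},
	{"Linux (6g)", 2, 17625, 212},
}

func main() {
	// 1) Synchronous versus buffered, per configuration.
	for _, r := range results {
		fmt.Printf("%-13s GOMAXPROCS=%d  sync/buffered = %.1fx\n",
			r.platform, r.procs, r.sync/r.buffered)
	}

	// 2) Slowdown when going from GOMAXPROCS=1 to GOMAXPROCS=2.
	// Rows are stored in pairs: GOMAXPROCS=1 followed by GOMAXPROCS=2.
	for i := 0; i+1 < len(results); i += 2 {
		one, two := results[i], results[i+1]
		fmt.Printf("%-13s 2 vs 1: sync %.1fx, buffered %.1fx\n",
			one.platform, two.sync/one.sync, two.buffered/one.buffered)
	}

	// 3) Linux versus Windows (8g), synchronous channel, GOMAXPROCS=1.
	fmt.Printf("Linux/Windows sync = %.1fx\n", results[4].sync/results[0].sync)
}
```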


We can conclude for this kind of problem (filling and reading a channel):
- Buffered channels perform better than synchronous channels (much better on Linux).
- Increasing GOMAXPROCS is not useful here: on Windows, dividing the task over the 2 cores creates an immense overhead, and on Linux it brings no improvement.
- With GOMAXPROCS=1, Windows performs on par with Linux for buffered channels, and much better for synchronous channels.
- Windows 8g performs the same as Windows 6g.