Profiling in Haskell (SDL game series part 3)

In this post, I’ll show how I profiled the SDL project. The code shown is a continuation of the efforts in part one and part two. The starting point will be this version of the project: strat-0.0.1.tar.gz.

With version 0.0.1 from part two, I noticed that whenever I did a right mouse button drag to scroll the map, the redraw would always lag behind my mouse movement. Since I’m new to Haskell, I don’t yet have an instinctual feel for what code actually costs me at run time. I figured now is as good of a time as any to take a peek. I’m not interested in getting it as fast as possible, but I feel uncomfortable proceeding without at least checking out some numbers.

First, I would like to reference this blog post by Nathan Sanders. That page, and the profiling chapter in Real World Haskell, got me up and running. For the official version, check out GHC’s user guide on profiling.

The first step in profiling is rebuilding everything with profiling support if it has not been done already. You can execute cabal commands to do this on a package-by-package basis:

cabal install SDL -p --reinstall

The ‘-p’ flag tells cabal to build the library with profiling support.

Doing this for everything on my system quickly got annoying. In the end, I deleted my ~/.ghc and ~/.cabal folder and started from scratch. After the initial ‘cabal update’, edit the ~/.cabal/config file and set library-profiling to True.

Once the dependencies were reinstalled with profiling enabled, I executed a new cabal configure command to enable profiling for the SDL game:

cabal clean
cabal configure --enable-executable-profiling
cabal build

When GHC generates code that can be profiled the binaries come out thicker. Unstripped, my non-profile version of strat is 2.4 MB. The profiled version is 4.0 MB.

To generate a profiling report you must pass a command line flag to the run-time system of the executable. Here’s the command to run from the source directory of the project:

./dist/build/strat/strat +RTS -p

This will start up the SDL game. Move the mouse around a bit and exit. Once the program stops, you’ll notice a file called strat.prof. Here’s what was generated for me with the current settings:

    Tue Apr 13 19:27 2010 Time and Allocation Profiling Report  (Final)

       strat +RTS -p -RTS

    total time  =        2.28 secs   (114 ticks @ 20 ms)
    total alloc = 772,177,168 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

MAIN                           MAIN                 100.0  100.0


                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0 100.0  100.0   100.0  100.0
 CAF                     Main                                                 348          13   0.0    0.0     0.0    0.0
 CAF                     GHC.Float                                            325           7   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.FD                                     286           2   0.0    0.0     0.0    0.0
 CAF                     System.Posix.Internals                               285           1   0.0    0.0     0.0    0.0
 CAF                     GHC.Conc                                             269           1   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.Internals                              260           1   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Encoding.Iconv                                254           5   0.0    0.0     0.0    0.0
 CAF                     UIState                                              251           1   0.0    0.0     0.0    0.0
 CAF                     Graphics.UI.SDL.General                              227           1   0.0    0.0     0.0    0.0
 CAF                     Graphics.UI.SDL.Events                               223           1   0.0    0.0     0.0    0.0
 CAF                     Graphics.UI.SDL.Video                                215           3   0.0    0.0     0.0    0.0
 CAF                     System.Random                                        210           1   0.0    0.0     0.0    0.0
 CAF                     Data.Time.Clock.POSIX                                203           2   0.0    0.0     0.0    0.0
 CAF                     Data.Time.Clock.CTimeval                             202           1   0.0    0.0     0.0    0.0

Not very helpful is it? It basically only profiled main and some CAFs (Constant Applicative Function – fancy name for functions without arguments).

What we need to do is pass some more build flags to tell GHC to profile more than just that.

cabal build --ghc-options="-caf-all -auto-all -fforce-recomp"

The ‘-caf-all’ option tells GHC to profile all top-level CAFs separately instead of as a group (the default behavior, as shown above). The ‘-auto-all’ option tells GHC to profile all top-level functions. Lastly, the ‘-fforce-recomp’ option forces the recompilation of the files, thereby making sure we have the profiling code we want. With the new executable, try this:

./dist/build/strat/strat +RTS -p

Which generates this report:

    Tue Apr 13 19:39 2010 Time and Allocation Profiling Report  (Final)

       strat +RTS -p -RTS

    total time  =        3.42 secs   (171 ticks @ 20 ms)
    total alloc = 1,303,852,488 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

drawTile                       Main                  72.5   35.8
redrawScreen                   Main                  22.2   57.9
getHexmapOffset                Main                   2.3    4.8
tileInViewPort                 Main                   1.8    0.0
makeRandomMap                  Main                   0.6    1.3


                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0   0.0    0.0   100.0  100.0
 main                    Main                                                 432         383   0.6    0.0   100.0  100.0
  vpY                    UIState                                              465         320   0.0    0.0     0.0    0.0
  vpX                    UIState                                              464         320   0.0    0.0     0.0    0.0
  uiViewPort             UIState                                              463         320   0.0    0.0     0.0    0.0
  uiMouseButtonsDown     UIState                                              462           3   0.0    0.0     0.0    0.0
  redrawScreen           Main                                                 448         393  22.2   57.9    98.8   98.5
   drawTile              Main                                                 449     3930000  72.5   35.8    76.6   40.6
    ==_aNo               UIState                                              458       75680   0.0    0.0     0.0    0.0
    tileInViewPort       Main                                                 452     3930000   1.8    0.0     1.8    0.0
    getHexmapOffset      Main                                                 450           0   2.3    4.8     2.3    4.8
  makeRandomMap          Main                                                 445         102   0.6    1.3     0.6    1.5
   getRandomTerrain      Main                                                 446         100   0.0    0.3     0.0    0.3
    toEnum_aOi           UIState                                              461         155   0.0    0.0     0.0    0.0
  loadArt                Main                                                 436           1   0.0    0.0     0.0    0.0
 CAF:artFilePaths        Main                                                 426           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 437           1   0.0    0.0     0.0    0.0
 CAF:main18              Main                                                 425           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 439           0   0.0    0.0     0.0    0.0
 CAF:main19              Main                                                 424           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 441           0   0.0    0.0     0.0    0.0
 CAF:main20              Main                                                 423           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 443           0   0.0    0.0     0.0    0.0
 CAF:tileWidth_r3OL      Main                                                 422           1   0.0    0.0     0.0    0.0
  tileWidth              Main                                                 451           1   0.0    0.0     0.0    0.0
 CAF:tileHeight_r3ON     Main                                                 421           1   0.0    0.0     0.0    0.0
  tileHeight             Main                                                 454           1   0.0    0.0     0.0    0.0
 CAF:windowWidth         Main                                                 420           1   0.0    0.0     0.0    0.0
  windowWidth            Main                                                 453           1   0.0    0.0     0.0    0.0
 CAF:windowHeight        Main                                                 419           1   0.0    0.0     0.0    0.0
  windowHeight           Main                                                 455           1   0.0    0.0     0.0    0.0
 CAF:mapColumns          Main                                                 418           1   0.0    0.0     0.0    0.0
  mapColumns             Main                                                 447           1   0.0    0.0     0.0    0.0
 CAF:main24              Main                                                 417           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 438           0   0.0    0.0     0.0    0.0
 CAF:main23              Main                                                 416           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 440           0   0.0    0.0     0.0    0.0
 CAF:main22              Main                                                 415           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 442           0   0.0    0.0     0.0    0.0
 CAF:main21              Main                                                 414           1   0.0    0.0     0.0    0.0
  artFilePaths           Main                                                 444           0   0.0    0.0     0.0    0.0
 CAF:main27              Main                                                 413           1   0.0    0.0     0.0    0.0
  main                   Main                                                 434           0   0.0    0.0     0.0    0.0
 CAF:main26              Main                                                 412           1   0.0    0.0     0.0    0.0
  main                   Main                                                 435           0   0.0    0.0     0.0    0.0
 CAF:main4               Main                                                 411           1   0.0    0.0     0.0    0.0
  main                   Main                                                 466           0   0.0    0.0     0.0    0.0
 CAF:main29              Main                                                 410           1   0.0    0.0     0.0    0.0
 CAF                     GHC.Float                                            387           7   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.FD                                     348           2   0.0    0.0     0.0    0.0
 CAF                     System.Posix.Internals                               347           1   0.0    0.0     0.0    0.0
 CAF                     GHC.Conc                                             331           1   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.Internals                              322           1   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Encoding.Iconv                                316           5   0.0    0.0     0.0    0.0
 CAF:terrainMaxBound     UIState                                              307           1   0.0    0.0     0.0    0.0
  terrainMaxBound        UIState                                              459           1   0.0    0.0     0.0    0.0
   fromEnum_aOn          UIState                                              460           1   0.0    0.0     0.0    0.0
 CAF:terrainTypes        UIState                                              306           1   0.0    0.0     0.0    0.0
  terrainTypes           UIState                                              456           1   0.0    0.0     0.0    0.0
   enumFrom_aOx          UIState                                              457           1   0.0    0.0     0.0    0.0
 CAF                     Graphics.UI.SDL.General                              266           1   0.0    0.0     0.0    0.0
 CAF                     Graphics.UI.SDL.Events                               262           1   0.0    0.0     0.0    0.0
 CAF                     Graphics.UI.SDL.Video                                254           3   0.0    0.0     0.0    0.0
 CAF                     System.Random                                        249           1   0.0    0.0     0.0    0.0
 CAF                     Data.Time.Clock.POSIX                                242           2   0.0    0.0     0.0    0.0
 CAF                     Data.Time.Clock.CTimeval                             241           1   0.0    0.0     0.0    0.0

Much better! Near the top of the report there is a short list of the most expensive functions. We can see that drawTile takes about 72% of execution time and redrawScreen takes about 22%. The next few functions after that barely cost anything by comparison.

If we wanted to see some memory profiling data on the garbage collector we can execute the binary with these options:

./dist/build/strat/strat +RTS -sstderr

Which outputs this to stderr:

dist/build/strat/strat +RTS -sstderr
done
   2,722,681,960 bytes allocated in the heap
     644,419,088 bytes copied during GC
       4,732,608 bytes maximum residency (78 sample(s))
         149,912 bytes maximum slop
              14 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  5106 collections,     0 parallel,  0.50s,  0.52s elapsed
  Generation 1:    78 collections,     0 parallel,  0.42s,  0.42s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.59s  (  6.27s elapsed)
  GC    time    0.92s  (  0.94s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.51s  (  7.21s elapsed)

  %GC time      16.7%  (13.1% elapsed)

  Alloc rate    593,646,382 bytes per MUT second

  Productivity  83.3% of total user, 63.6% of total elapsed

There are ways to profile the heap as well. For this program, the results from that were not enlightening.

So what to do now? It does not appear that we are having major GC or memory errors. Lets change the two most expensive functions and manually place ‘call centers’ around most expressions. This way we can see the cost of almost everything in the functions.

> drawTile :: UIState -> Point -> IO ()
> drawTile (UIState vp _ mainSurf terrainSurfs tm) (x,y) = do
>      let sr = {-# SCC "dT-sr" #-} Just (SDL.Rect 0 0 tileWidth tileHeight)
>          (tX, tY) = {-# SCC "dT-gP2V" #-} gamePoint2View vp  $ {-# SCC "dT-gHO" #-} getHexmapOffset tileWidth tileHeight x y
>          dr = {-# SCC "dT-dr" #-} Just $ SDL.Rect tX tY 0 0
>          tt = DM.fromJust $ {-# SCC "dT-tmLookup" #-} DMap.lookup (x,y) tm
>          terrainSurf = DM.fromJust $ {-# SCC "dT-tsLookup" #-} lookup tt terrainSurfs
>      if {-# SCC "dT-tIVP" #-} tileInViewPort vp tileWidth tileHeight (tX,tY)
>          then do
>                {-# SCC "dT-blit" #-} SDL.blitSurface terrainSurf sr mainSurf dr
>                return ()
>          else
>               return ()
> redrawScreen ::  UIState -> IO ()
> redrawScreen ui@(UIState vp _ mainSurf terrainSurfs terrainMap) = do
>     {-# SCC "rS-fillRect" #-} SDL.fillRect mainSurf Nothing (SDL.Pixel 0)
>     {-# SCC "rS-mDrawTile" #-} mapM_ (drawTile ui) $ {-# SCC "rS-tmKeys" #-} DMap.keys terrainMap
>     {-# SCC "rS-flip" #-} SDL.flip mainSurf
>     return ()

The “{-# SCC “LABEL” #-} set a cost center for the expression that follows it. With this, rebuild the project and run the binary with profiling enabled.

cabal build --ghc-options="-caf-all -auto-all"
dist/build/strat/strat +RTS -p

This will generate a report that looks like this (only part of the report is shown):

    Tue Apr 13 21:05 2010 Time and Allocation Profiling Report  (Final)

       strat +RTS -p -RTS

    total time  =       12.72 secs   (636 ticks @ 20 ms)
    total alloc = 4,299,436,864 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

dT-blit                        Main                  58.2    0.8
rS-tmKeys                      Main                  10.8   44.9
dT-gHO                         Main                   7.2   24.7
gamePoint2View                 Main                   6.1   24.7
rS-mDrawTile                   Main                   4.1    0.0
dT-gP2V                        Main                   3.9    0.0
drawTile                       Main                   3.0    0.2
getHexmapOffset                Main                   1.7    3.7
rS-fillRect                    Main                   1.6    0.0
tileInViewPort                 Main                   1.1    0.0


                                                       individual    inherited
COST CENTRE              MODULE       no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN           1           0   0.0    0.0   100.0  100.0
 main                    Main         452         987   0.5    0.0   100.0  100.0
  vpY                    UIState      513         443   0.0    0.0     0.0    0.0
  vpX                    UIState      512         443   0.0    0.0     0.0    0.0
  uiViewPort             UIState      511         443   0.0    0.0     0.0    0.0
  uiMouseButtonsDown     UIState      510           6   0.0    0.0     0.0    0.0
  redrawScreen           Main         472        1006   0.0    0.0    99.1   99.5
   rS-flip               Main         509        1006   0.0    0.0     0.0    0.0
   rS-mDrawTile          Main         474    10062012   4.1    0.0    97.5   99.5
    drawTile             Main         476    10060000   3.0    0.2    82.5   54.6
     dT-tmLookup         Main         505      109188   0.2    0.1     0.2    0.1
     dT-tsLookup         Main         501      109188   0.2    0.0     0.3    0.0
      ==_aO1             UIState      504      385235   0.2    0.0     0.2    0.0
     dT-blit             Main         488      109188  58.2    0.8    58.2    0.8
     dT-dr               Main         487      109188   0.0    0.2     0.0    0.2
     dT-tIVP             Main         482    10060000   0.8    0.0     1.9    0.0
      tileInViewPort     Main         483    10060000   1.1    0.0     1.1    0.0
     dT-gP2V             Main         477    10060000   3.9    0.0    19.0   53.2
      gamePoint2View     Main         479    10060000   6.1   24.7     6.1   24.7
      dT-gHO             Main         478    10060000   7.2   24.7     9.0   28.5
       getHexmapOffset   Main         480           0   1.7    3.7     1.7    3.7
    rS-tmKeys            Main         475        1006  10.8   44.9    10.8   44.9
   rS-fillRect           Main         473        1006   1.6    0.0     1.6    0.0
  makeRandomMap          Main         469         102   0.3    0.4     0.5    0.5
   getRandomTerrain      Main         470         100   0.2    0.1     0.2    0.1
    toEnum_aOV           UIState      508         294   0.0    0.0     0.0    0.0
  loadArt                Main         456           1   0.0    0.0     0.0    0.0

Now we have a decent idea what expressions are really costing us in execution.

The call to SDL.blitSurface is by far the most costly. This is natural as this is the function doing all the actual rendering IO work behind the scenes.

What caught me off guard was the hit taken by calling Data.Map.keys. Almost half of my memory allocations are coming from this call.

Lets add a list comprehension to generate the coordinates just once, since that’s all the terrainMap’s keys were anyway.

> mapCoordinates :: [Point]
> mapCoordinates = [(x,y) | x <- [1..mapRows]
>                         , y <- [1..mapColumns]]

> redrawScreen ::  UIState -> IO ()
> redrawScreen ui@(UIState vp _ mainSurf terrainSurfs terrainMap) = do
>     {-# SCC "rS-fillRect" #-} SDL.fillRect mainSurf Nothing (SDL.Pixel 0)
>     {-# SCC "rS-mDrawTile" #-} mapM_ (drawTile ui) $ {-# SCC "rS-mapCoords" #-} mapCoordinates
>     {-# SCC "rS-flip" #-} SDL.flip mainSurf
>     return ()

A recompile and another run with ‘+RTS -p’ yeilds a profile report with these as the top call centers:

        Tue Apr 13 21:55 2010 Time and Allocation Profiling Report

           strat +RTS -p -RTS

    total time  =        8.10 secs   (405 ticks @ 20 ms)
    total alloc = 1,868,126,656 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

dT-blit                        Main                  64.7    1.4
dT-gHO                         Main                   9.1   44.8
drawTile                       Main                   4.9    0.4
gamePoint2View                 Main                   4.2   44.8
dT-gP2V                        Main                   3.7    0.0
rS-mDrawTile                   Main                   3.5    0.0
rS-fillRect                    Main                   3.0    0.0
getHexmapOffset                Main                   2.5    6.8
dT-tIVP                        Main                   2.2    0.0

A run with ‘+RTS -sstderr’ gives this output:

dist/build/strat/strat +RTS -sstderr
   2,082,792,480 bytes allocated in the heap
      33,971,320 bytes copied during GC
       4,853,488 bytes maximum residency (6 sample(s))
          95,640 bytes maximum slop
              13 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  3955 collections,     0 parallel,  0.07s,  0.06s elapsed
  Generation 1:     6 collections,     0 parallel,  0.02s,  0.03s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    5.05s  (  6.54s elapsed)
  GC    time    0.09s  (  0.09s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.14s  (  6.63s elapsed)

  %GC time       1.8%  (1.3% elapsed)

  Alloc rate    412,461,270 bytes per MUT second

  Productivity  98.2% of total user, 76.2% of total elapsed

That change alone eliminated a ton of GC copies and drastically reduced the amount of time spent in the GC. The program felt snappier as well.

However, the big payoff comes from realizing that I need to cut down on my blits. Our main’s inner function eventLoop looks like this:

>      eventLoop ui = do
>          redrawScreen ui
>          e <- SDL.waitEvent
>          checkEvent ui e

This means that for every event sent by SDL the screen is redrawn. Mouse movement generates lots of events! Recall how fast they printed out to console when we were writing tracing output at the beginning of part two?

A better solution is to process all of the events in the queue before redrawing the screen. After the screen is redrawn, we can block on waiting for an event. Replace the code above with this:

>      eventLoop ui = do
>          e <- SDL.pollEvent
>          checkEvent ui e
>      checkEvent ui (SDL.NoEvent) = do
>          redrawScreen ui
>          e <- SDL.waitEvent
>          checkEvent ui e

It adds another pattern match for checkEvent that will run when no events are available. This made an enormous difference in the scroll speed.

With that change, I’m happy with the performance of the game for now.

Here’s a preview of the new programmer art that will be used in the next post:

Comments are closed.