Friday, December 16, 2005

MiniRails

Lots of people are using web technologies in their desktop applications, and it is especially common to find a Windows application that embeds the WebBrowser control to render great GUI elements. There are some bumps, though: for example, you cannot do disk or socket I/O from JavaScript (or even a Java applet), which most applications have to rely on.

To overcome this, we can either use an ActiveX control (I am using one myself and am satisfied with it), or launch a web server in the background. The second option has an additional advantage: the server and client are decoupled, so in the future you can easily migrate to a pure web environment without touching too much code. While this approach looks attractive, we have to face a reality: most web application servers are too big for desktop applications to distribute. Note I am talking about web application servers here, not simple web servers. It is easy to find a small-footprint web server or even write one yourself, but then your development style goes back to the 80s (CGI in C). Even though Tomcat Embedded 5.5.12 can be downloaded as a 3.1 MB zip file, your customers may not have a JVM installed in the first place. Cassini suffers from the same problem; Microsoft has not pushed hard enough to make .NET universal.

How about Ruby on Rails? It turns out to be a great choice. While a complete distribution takes a lot of disk space, it is easy to customize so that we ship only what we need.

Here is how I built my minimal Ruby distribution that Rails can run on (using Windows as the example):

1. Install Ruby to C:\ruby. We will need to copy files from it later.

2. Download the stand-alone Rails package, uncompress it to C:\rails, and empty the doc/ folder to save space.

3. Copy ruby.exe and msvcrt-ruby18.dll from C:\ruby\bin to C:\rails.

4. Copy the following OS-independent libraries (implemented in pure Ruby) from C:\ruby\lib\ruby\1.8 to C:\rails\lib\ruby\1.8:

base64.rb
benchmark.rb
cgi.rb and cgi/
date.rb
date/format.rb
delegate.rb
drb.rb and drb/
e2mmap.rb
erb.rb
English.rb
fileutils.rb
ipaddr.rb
irb.rb and irb/
kconv.rb
logger.rb
mutex_m.rb
net/http.rb
net/protocol.rb
net/smtp.rb
observer.rb
optparse.rb and optparse/
parsedate.rb
pathname.rb
pstore.rb
rational.rb
rexml.rb and rexml/
set.rb
singleton.rb
soap.rb and soap/
tempfile.rb
thread.rb
time.rb
timeout.rb
tmpdir.rb
uri.rb and uri/
webrick.rb and webrick/
xmlrpc/
xsd/
yaml.rb and yaml/
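The list above is simply what my application ended up needing. To find the equivalent list for your own application, one trick is to dump `$"` (the array of files that `require` has loaded) at the end of a representative run:

```ruby
# Print every library file loaded so far. Run this after your app has
# exercised its main code paths; the output tells you which stdlib
# files your trimmed distribution must ship.
$".sort.each { |feature| puts feature }
```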


5. Copy the following OS-dependent libraries from C:\ruby\lib\ruby\1.8\i386-mswin32 to C:\rails\lib\ruby\1.8\i386-mswin32:

digest.so
digest/md5.so
digest/sha1.so
fcntl.so
nkf.so
rbconfig.rb
socket.so
strscan.so
stringio.so
syck.so


6. Now we may still need a database (optional). Here is how I got sqlite3 installed:
a) Download the precompiled binaries. Copy sqlite3.exe and sqlite3.dll into the C:\rails folder.
b) Download SQLite-Ruby. Copy sqlite3_api.so to C:\rails\lib\ruby\site_ruby\1.8\i386-msvcrt\, and copy sqlite3.rb and sqlite3/ to C:\rails\lib\ruby\site_ruby\1.8.
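The app then finds the database through config/database.yml. For the Rails 1.x-era sqlite3 adapter the development entry looked roughly like this (the `dbfile` key and the file path here are from memory; double-check against your Rails version, which may use `database` instead):

```yaml
development:
  adapter: sqlite3
  dbfile: db/development.sqlite3
```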
That's it! You can start WEBrick now:

C:\rails>ruby script\server
=> Booting WEBrick...
=> Rails application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
[2005-12-16 15:23:09] INFO WEBrick 1.3.1
[2005-12-16 15:23:09] INFO ruby 1.8.2 (2004-12-25) [i386-mswin32]
[2005-12-16 15:23:09] INFO WEBrick::HTTPServer#start: pid=440 port=3000


And do whatever Rails development you want.

This distribution takes 6.28 MB on disk. Zipped, it shrinks to a 2.3 MB file.

Tuesday, December 13, 2005

Good experience with boost memory pool

As the amount of source code to compile grows, the compiler I maintain has become unacceptably slow. After profiling the code with AQTime, I found that about 30 percent of CPU time was spent in malloc(), and half of that was for tree node allocation (fixed size).

After testing the performance of the Boost memory pool, I simply replaced malloc() with boost::object_pool::malloc(). The result is great: it almost eliminated the cost of tree node allocation, and the total time spent in malloc is down to 13%. There is still plenty of room for improvement, but the performance is at least acceptable.

There is one thing to remember: boost::object_pool is designed for the "allocate many objects and then destroy all of them" type of scenario, so freeing individual objects back to the pool can be very slow. For example, at the beginning I also replaced free() with boost::object_pool::free(), which made the performance even worse: boost::simple_segregated_storage::find_prev() and boost::simple_segregated_storage::nextof() started to dominate the CPU. So I no longer free individual nodes and simply let ~object_pool() release all the memory at once. This works perfectly for my compiler, but may not be good for other applications.

Monday, December 12, 2005

Scope of a local variable in ruby

In Ruby, only class, module, and method definitions introduce a new scope (see the Ruby FAQ). Although Ruby has "if ... end" and "while ... end" constructs, unlike C/C++/Java/C#, they do not create a new scope.

For example:

if true
  x = "hello"
end

# we can access x outside of "if". This will print "hello".
puts x
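By contrast, a method body does start a fresh scope, so the same lookup fails there. A minimal illustration (the method name `peek` is my own, chosen for the example):

```ruby
x = "hello"

# "if" shares the enclosing scope, so y survives it...
if true
  y = "from inside if"
end
puts y             # => "from inside if"

# ...but "def" starts a fresh scope, so the outer x is invisible.
def peek
  defined?(x)      # nil: no local variable (or method) x exists here
end
puts peek.inspect  # => nil
```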


While there might be other motivations for this design, I think it is most likely driven by a technical issue: as I wrote before, Ruby's lexer needs to distinguish a local variable from a method call (command). It is easy for the lexer to check for the "module", "class", and "def" keywords to recognize a new scope. However, "if", "while", etc. (which may be followed by a complex expression) are too hard for it to handle, since that would require serious parsing capability.

Wednesday, December 07, 2005

Compare performance of malloc and boost::pool

Andrei Gorine and Konstantin Knizhnik wrote a great article in Dr. Dobb's Journal titled Memory Management & Embedded Databases. In the article, they examine several memory-allocation strategies and explain their advantages and disadvantages based on their experience creating the eXtremeDB in-memory database.

As we know, it is normal for malloc/free (new/delete) to take 10-20 percent of an application's CPU time. Sometimes it is nice to have a memory pool handle fixed-size allocations, especially when you get the Boost Pool Library for free.

I wrote a small program that creates a small, simple data structure on the heap a million times to benchmark malloc against boost::object_pool, and the result shows boost::object_pool is a clear winner. I also compared with the new operator, even though the dummy data structure has no constructor; the difference between malloc and new in this case is insignificant.

The result also shows something interesting that was mentioned in Raymond Chen's "5 Things Every Win32 Developer Should Know" PDC 2005 talk: if the boost::object_pool is accessed through a pointer, performance suffers a lot, though it is still far faster than malloc.

Here is the result on my laptop (Windows XP, VC 2003 release build, P4 2.4G, 512M):

alloc_with_new_array: 26 microseconds
alloc_with_malloc_array: 11 microseconds
alloc_with_new: 365120 microseconds
alloc_with_malloc: 351586 microseconds
alloc_with_boost_pool_construct: 7852 microseconds
alloc_with_boost_pool_malloc: 3371 microseconds
alloc_with_boost_pool_malloc_use_pointer: 11751 microseconds
alloc_with_boost_pool_malloc_use_reference: 3383 microseconds


Here is the code:


#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <boost/pool/object_pool.hpp>

struct dummy_data
{
    int a;
    int b;
};

inline void alloc_with_new(size_t size)
{
    dummy_data* p;
    for (size_t i = 0; i < size; ++i)
    {
        p = new dummy_data;
        delete p;
    }
}

inline void alloc_with_malloc(size_t size)
{
    dummy_data* p;
    for (size_t i = 0; i < size; ++i)
    {
        p = (dummy_data*)malloc(sizeof(dummy_data));
        free(p);
    }
}

inline void alloc_with_boost_pool_construct(size_t size)
{
    boost::object_pool<dummy_data> pool;
    dummy_data* p;
    for (size_t i = 0; i < size; ++i)
    {
        p = pool.construct();
        pool.destroy(p);
    }
}

inline void alloc_with_boost_pool_malloc(size_t size)
{
    boost::object_pool<dummy_data> pool;
    dummy_data* p;
    for (size_t i = 0; i < size; ++i)
    {
        p = pool.malloc();
        pool.free(p);
    }
}

inline void alloc_with_boost_pool_malloc_use_pointer(size_t size)
{
    boost::object_pool<dummy_data>* pool = new boost::object_pool<dummy_data>;
    dummy_data* p;
    for (size_t i = 0; i < size; ++i)
    {
        p = pool->malloc();
        pool->free(p);
    }

    delete pool;
}

inline void alloc_with_boost_pool_malloc_use_reference(size_t size)
{
    boost::object_pool<dummy_data> _pool;
    boost::object_pool<dummy_data>& pool = _pool;
    dummy_data* p;
    for (size_t i = 0; i < size; ++i)
    {
        p = pool.malloc();
        pool.free(p);
    }
}

inline void alloc_with_new_array(size_t size)
{
    dummy_data* p = new dummy_data[size];
    delete[] p;
}

inline void alloc_with_malloc_array(size_t size)
{
    dummy_data* p = (dummy_data*)malloc(sizeof(dummy_data) * size);
    free(p);
}

int main(int, char**)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    size_t number = 1000000;

    // Elapsed microseconds fit comfortably in an int here, so cast
    // the 64-bit intermediate down to match the %d format.
    QueryPerformanceCounter(&t0);
    alloc_with_new_array(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_new_array: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_malloc_array(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_malloc_array: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_new(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_new: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_malloc(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_malloc: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_boost_pool_construct(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_boost_pool_construct: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_boost_pool_malloc(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_boost_pool_malloc: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_boost_pool_malloc_use_pointer(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_boost_pool_malloc_use_pointer: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    QueryPerformanceCounter(&t0);
    alloc_with_boost_pool_malloc_use_reference(number);
    QueryPerformanceCounter(&t1);
    printf("alloc_with_boost_pool_malloc_use_reference: %d microseconds\n",
        (int)(((t1.QuadPart - t0.QuadPart) * 1000000) / freq.QuadPart));

    return 0;
}

Thursday, December 01, 2005

Programming and playing chess

There are lots of things in common between learning how to program and learning how to play chess:

First, you need to learn the rules
For programming, the rules are the grammar of your language of choice. You cannot hit the road without knowing them, but this is the easiest part.

Nothing can replace real-world experience
That's why some gurus claim building software is craftsmanship rather than engineering. While there are tons of tools and smart IDEs to assist you, software design masters reach their level through years of practice.

Patterns are a shortcut
Then how do we improve faster? Lots of people know how to play chess and keep playing for years, yet their skills barely improve. Well, if you do not know it yet, patterns can be the shortcut. If you study how chess masters deal with common scenarios, you will improve much faster than your friends who do not.

So far, this is all about individuals. A multiplayer game is much more complicated, and so is professional software development in a corporation.