Saturday, July 16, 2011


Tim Uckun

Linux + PostgreSQL + Ruby + Rails + Gearman
"search internet every 10 minutes and predict the future"
goal achieved
1000 searches per minute
Twitter, Reddit, Digg, Facebook, etc.
URL mentions
trending: how fast?
likes, videos, etc.
predict viral or not

painless parallelization
LCA conf Brisbane talk
"Gearman" is an anagram of "Manager"
massively distributed, fault-tolerant fork mechanism
Joe Stump (SimpleGeo)
protocol with multiple implementations
different languages
[scheme / CL / Clojure / FORTH / Factor?]
C and Perl servers
Gearman::Server on CPAN


Go client

Client API
command-line tools
user-defined SQL database functions (UDFs)

MySQL etc

worker API
most common languages
usually in the same packages as the client API
command-line tool

Why Gearman?
lots of message brokers
FOSS and not
unique in scope
vs. rapid rabbit?

OSS, simple, fast, multi-language
flexible app design

persistent queues or not

default: jobs only stored in memory
various persistence options
MySQL, Drizzle
Tokyo Cabinet

Foreground or background
sync and async

large scale archs will work but can start off simple

How does it work?

clients servers workers
every worker connects to every server
clients don't connect to all servers
(no replication)
can get around it
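To make the client / server / worker split concrete, here is a minimal in-process sketch in Python. It is not the Gearman client or worker API: `ToyJobServer`, `JobHandle`, `register`, `submit` and `work_one` are hypothetical names, there is only one server, and for brevity the worker's function is stored on the server instead of living in a separate worker process. It does show the foreground (client waits for a result) versus background (fire-and-forget) distinction from the notes above.

```python
import queue
import threading


class JobHandle:
    """What a foreground client waits on (hypothetical helper)."""

    def __init__(self):
        self._done = threading.Event()
        self._result = None
        self._error = None

    def finish(self, result=None, error=None):
        self._result, self._error = result, error
        self._done.set()

    def result(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("job did not finish in time")
        if self._error is not None:
            raise self._error
        return self._result


class ToyJobServer:
    """In-process stand-in for one job server; not the real Gearman API."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._functions = {}

    def register(self, name, fn):
        # A worker tells the server which function name it can handle.
        self._functions[name] = fn

    def submit(self, name, payload, background=False):
        # Foreground: the client gets a handle to wait on.
        # Background: fire-and-forget, the client gets nothing back.
        handle = None if background else JobHandle()
        self._jobs.put((name, payload, handle))
        return handle

    def work_one(self):
        # A worker pulls the next queued job and reports the outcome.
        name, payload, handle = self._jobs.get()
        try:
            result = self._functions[name](payload)
        except Exception as exc:
            if handle is not None:
                handle.finish(error=exc)
        else:
            if handle is not None:
                handle.finish(result=result)


server = ToyJobServer()
server.register("reverse", lambda s: s[::-1])
job = server.submit("reverse", "gearman")
server.work_one()
print(job.result())  # namraeg
```

In a real deployment the worker loop runs in its own process and connects to every server, while each client only needs one server to submit to.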

use cases
scatter / gather
async queues
pipeline processing
Erlang says duh - but this is Ruby

scatter/gather

run a number of tasks concurrently
speed up web apps
tasks don't need to be related
allocate dedicated resources for different tasks
push logic down to where the data exists

e.g. DB query x 2, full-text search,
location search
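A rough Python sketch of the scatter/gather idea: fire off several unrelated lookups at once and wait for all of them. A local thread pool stands in for Gearman workers here, and `db_query`, `fulltext_search` and `location_search` are hypothetical stubs for the kinds of searches mentioned in the notes.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for independent back-end lookups.
def db_query(term):
    return f"db rows for {term}"


def fulltext_search(term):
    return f"documents matching {term}"


def location_search(term):
    return f"places near {term}"


def scatter_gather(term, tasks):
    # Scatter: start every task at once; gather: wait for all the results.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, term) for name, fn in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}


results = scatter_gather("brisbane", {
    "db": db_query,
    "fulltext": fulltext_search,
    "location": location_search,
})
print(results["location"])  # places near brisbane
```

The page render time becomes roughly that of the slowest lookup rather than the sum of all of them.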


map/reduce: similar to scatter/gather, but splits one task

push logic to where the data exists (map)
report aggregates or other summary (reduce)
can be multi-tier
can be sync or async
aggregates/summary services

client, n tasks, can delegate to subtasks
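The map/reduce split can be sketched in a few lines of Python: one job (counting words) is cut into chunks, each chunk is "mapped" by a separate worker, and the partial summaries are "reduced" into one aggregate. The thread pool is only a local stand-in for Gearman workers, and `map_count`, `reduce_counts` and `word_count` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(chunk):
    # Map step: each worker counts words in its own slice of the data.
    return Counter(chunk.split())


def reduce_counts(partials):
    # Reduce step: merge the per-chunk summaries into one aggregate.
    total = Counter()
    for part in partials:
        total.update(part)
    return total


def word_count(lines, n_workers=3):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return reduce_counts(pool.map(map_count, lines))


counts = word_count(["gearman gearman ruby", "ruby rails", "gearman"])
print(counts["gearman"])  # 3
```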

Async queues
help to scale
not everything needs immediate processing
log entries
insert and indexing
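A minimal sketch of an async queue in Python, assuming log entries as the deferred work: the request path only enqueues and returns, and a background worker does the slow insert later. `queue.Queue` and a thread stand in for a Gearman background job and worker; `stored` is a hypothetical stand-in for the database table.

```python
import queue
import threading

log_queue = queue.Queue()
stored = []  # hypothetical stand-in for the database table


def log_worker():
    # Background worker: drains the queue and does the slow insert work.
    while True:
        entry = log_queue.get()
        if entry is None:  # sentinel: shut down
            log_queue.task_done()
            break
        stored.append(entry)
        log_queue.task_done()


worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

# The web request just enqueues and returns immediately.
for line in ["GET /", "POST /login", "GET /about"]:
    log_queue.put(line)

log_queue.put(None)
log_queue.join()  # demo only: wait until everything has been processed
print(stored)  # ['GET /', 'POST /login', 'GET /about']
```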

LCA 2011

pipeline processing

some tasks need a series of transformations
chain workers to send data on to the next step

client -> worker -> worker -> worker -> server? DB
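The chained-worker pipeline can be sketched in Python with one thread per stage and a queue between each pair of stages, so every stage's output becomes the next stage's input. This is an illustration, not Gearman code; `stage`, `run_pipeline` and the sample steps (`str.strip`, `str.lower`, `len`) are arbitrary stand-ins for real transformations.

```python
import queue
import threading


def stage(fn, inbox, outbox):
    # One worker in the chain: read, transform, pass to the next step.
    def run():
        while True:
            item = inbox.get()
            if item is None:
                outbox.put(None)  # propagate shutdown downstream
                break
            outbox.put(fn(item))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(items, steps):
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    for fn, inbox, outbox in zip(steps, queues, queues[1:]):
        stage(fn, inbox, outbox)
    for item in items:
        queues[0].put(item)
    queues[0].put(None)
    results = []
    while (item := queues[-1].get()) is not None:
        results.append(item)
    return results


# Hypothetical steps; the final list plays the role of the DB at the end.
steps = [str.strip, str.lower, len]
print(run_pipeline(["  Reddit ", "Twitter"], steps))  # [6, 7]
```

Because each stage is a single FIFO consumer, items come out in the order they went in.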


more complex sync client

event handlers

example - worker raises an exception
Gearman server takes the exception and re-raises it in the client library (?)
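Since client and worker may be written in different languages, an exception cannot cross the wire as a native object; it has to be serialized and rebuilt on the client side. The Python sketch below shows that round trip with JSON. It assumes nothing about Gearman's actual packet format; `run_on_worker`, `client_receive` and `RemoteJobError` are hypothetical names.

```python
import json


class RemoteJobError(Exception):
    """Raised on the client when the worker reported a failure."""


def run_on_worker(fn, payload):
    # Worker side: never let the exception escape; serialize it instead,
    # because the client may be written in a different language.
    try:
        return json.dumps({"ok": True, "result": fn(payload)})
    except Exception as exc:
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})


def client_receive(message):
    # Client side: turn a reported failure back into a local exception.
    reply = json.loads(message)
    if not reply["ok"]:
        raise RemoteJobError(reply["error"])
    return reply["result"]


print(client_receive(run_on_worker(lambda x: x * 2, 21)))  # 42
try:
    client_receive(run_on_worker(lambda x: 1 / x, 0))
except RemoteJobError as err:
    caught = str(err)
print(caught)  # ZeroDivisionError: division by zero
```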

LAMP, Excel spreadsheet via COM
SQL Server without TTS / TDS?
and cross-language

chunked data client

data, completion events
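A small Python sketch of the chunked-data idea: the worker emits partial results as it goes, and the client gets one "data" event per chunk followed by a single "completion" event. The generator and callbacks are a local stand-in, not a Gearman client library; `streaming_worker`, `run_with_callbacks`, `on_data` and `on_complete` are hypothetical names.

```python
def streaming_worker(text, chunk_size=4):
    # Worker side: send partial results as they become available.
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]


def run_with_callbacks(chunks, on_data, on_complete):
    # Client side: fire a data event per chunk, then one completion event.
    received = []
    for chunk in chunks:
        on_data(chunk)
        received.append(chunk)
    on_complete("".join(received))


events = []
run_with_callbacks(
    streaming_worker("predict viral"),
    on_data=lambda c: events.append(("data", c)),
    on_complete=lambda full: events.append(("complete", full)),
)
print(events[-1])  # ('complete', 'predict viral')
```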

get state of the queue (query gearman itself)

serialized hash to see on screen
hash =

Zabbix? monitoring tool

alarm on empty work queue
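Querying the queue state is done over gearmand's plain-text admin interface: sending `status` on the server port (4730 by default) returns one tab-separated line per function (name, total jobs, running jobs, available workers), ended by a line containing a single `.`. The Python sketch below fetches and parses that output and applies an alarm rule that could feed a tool like Zabbix. The alarm rule (flag any function with no available workers) is an assumption about what "empty work queue" means here, and the sample status text is made up.

```python
import socket


def fetch_status(host="127.0.0.1", port=4730, timeout=5.0):
    # Sends the plain-text "status" admin command; the reply ends with a "." line.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode()


def parse_status(text):
    # Each line: function name, total jobs, running jobs, available workers.
    status = {}
    for line in text.splitlines():
        if line == ".":
            break
        name, total, running, workers = line.split("\t")
        status[name] = {"queued": int(total), "running": int(running), "workers": int(workers)}
    return status


def alarms(status):
    # Assumed alarm rule: a function that no worker is currently serving.
    return sorted(name for name, s in status.items() if s["workers"] == 0)


sample = "twitter\t12\t5\t5\nyoutube\t3\t0\t0\n.\n"  # made-up example output
print(alarms(parse_status(sample)))  # ['youtube']
```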

database UDFs

database trigger
start background jobs on DB changes

PostgreSQL, MySQL, Drizzle

SELECT gman_servers....

Optional ingredients

shared/distributed file system
other network protocols
domain specific libs
image manipulation
full-text indexing


by default, operations block forever
clients may want a timeout on foreground jobs
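A minimal Python sketch of a client-side timeout on a foreground job, using `concurrent.futures` as a stand-in for a Gearman client: instead of blocking forever, the caller waits a bounded time and gives up. `slow_job` and `call_with_timeout` are hypothetical names, and the timings are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_job(seconds):
    time.sleep(seconds)
    return "done"


def call_with_timeout(pool, fn, arg, timeout):
    # Wait at most `timeout` seconds instead of blocking forever.
    future = pool.submit(fn, arg)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        future.cancel()  # only succeeds if the job has not started yet
        return None


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = call_with_timeout(pool, slow_job, 0.01, timeout=1.0)
    slow = call_with_timeout(pool, slow_job, 0.5, timeout=0.05)
print(fast, slow)  # done None
```

Note that giving up on the client side does not stop a job that is already running; the work still completes on the worker.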

workers may need to periodically run other code besides the job callback
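One way to let a worker run other code between jobs is to wait for the next job with a timeout rather than forever, and do housekeeping whenever the interval has elapsed. The Python sketch below uses a local `queue.Queue` as the job source; `worker_loop`, `handle`, `housekeeping` and the `run_for` limit (which only exists to make the demo terminate) are hypothetical.

```python
import queue
import time


def worker_loop(jobs, handle, housekeeping, interval=0.05, run_for=0.2):
    # Wake up at least every `interval` seconds so that other code
    # (heartbeats, cache flushes, ...) can run between jobs.
    deadline = time.monotonic() + run_for
    next_tick = time.monotonic() + interval
    while time.monotonic() < deadline:
        try:
            job = jobs.get(timeout=max(0.0, next_tick - time.monotonic()))
            handle(job)
        except queue.Empty:
            pass
        if time.monotonic() >= next_tick:
            housekeeping()
            next_tick = time.monotonic() + interval


jobs = queue.Queue()
for n in (1, 2, 3):
    jobs.put(n)
handled, ticks = [], []
worker_loop(jobs, handled.append, lambda: ticks.append(time.monotonic()))
print(handled)  # [1, 2, 3]
```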

cluster-wide crons

over crashes
(job state is persistent?)

other jobs - may not be good thing - not done in an hour -> alarm (e.g.)

clients must conn to all servers
no replication between servers
can solve with MySQL/PostgreSQL replication
slower than pure messaging servers
logging not all that great

steps must be taken to assure recovery of queued msgs if a server is completely destroyed
small community - development has slowed
(on which fronts?)

rapid/rabbit enqueue

all-or-nothing logging: going from level 2 to level 3 is a big step in volume; pretty much sucks

recently development has picked up pace again
Brian Aker (MySQL and Drizzle guy)
several new versions
bug - in PHP or other?

CLint ?? also working on it
OSS proj
in C
BSD license? GPL?

priority in queues? - yes but not the way you'd think
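As a rough illustration, assuming three fixed priority levels (high, normal, low) with FIFO order inside each level, a queue like the Python sketch below hands out jobs by level first and by arrival second. This is a toy model, not gearmand's internals; `ThreeLevelQueue`, `submit` and `grab` are hypothetical names.

```python
import heapq
import itertools

PRIORITY = {"high": 0, "normal": 1, "low": 2}


class ThreeLevelQueue:
    """Toy queue: high before normal before low, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, job, priority="normal"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._seq), job))

    def grab(self):
        return heapq.heappop(self._heap)[2]


q = ThreeLevelQueue()
q.submit("a", "low")
q.submit("b")
q.submit("c", "high")
q.submit("d")
order = [q.grab() for _ in range(4)]
print(order)  # ['c', 'b', 'd', 'a']
```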

retries in X hours / submit job @ time
new feature
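"Retry in X hours" and "submit a job at a given time" both come down to a queue ordered by due time, where a job only becomes runnable once its time has passed. The Python sketch below models that with `heapq`; it is not the Gearman feature itself, `DelayedQueue`, `submit_at`, `retry_in` and `due_jobs` are hypothetical, and `now` is passed in explicitly so the example is deterministic.

```python
import heapq


class DelayedQueue:
    """Toy scheduler: jobs become runnable only once their due time passes."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def submit_at(self, due, job):
        heapq.heappush(self._heap, (due, self._seq, job))
        self._seq += 1

    def retry_in(self, now, delay_seconds, job):
        # "Retry in X hours" is just "submit at now + X hours".
        self.submit_at(now + delay_seconds, job)

    def due_jobs(self, now):
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


q = DelayedQueue()
q.submit_at(100, "crawl reddit")
q.retry_in(now=0, delay_seconds=2 * 3600, job="refetch video")
first = q.due_jobs(now=50)
second = q.due_jobs(now=150)
third = q.due_jobs(now=7200)
print(first, second, third)  # [] ['crawl reddit'] ['refetch video']
```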

boost libs C++

mailing list, docs, related projects

#gearman on

Xing using it

PHP people extension? libs


website: low traffic
Gearman in the DB server
occasional heavy lifting
when the queue reaches a certain size, fire up VMs until the queue only has 3-4 entries

VM startup time is non-critical
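The scale-up rule above can be sketched as a small polling decision in Python: once the backlog crosses a trigger size, add a VM per polling round while the queue is still above the 3-4 entry target. The thresholds, the `max_extra` cap and the scale-down rule (retire one extra VM per round once the queue is empty) are all assumptions; `scaling_decision` and `simulate` are hypothetical names. Because VM startup time is non-critical, a simple one-step-per-poll loop is enough.

```python
def scaling_decision(queue_len, extra_vms, start_at=20, target=4, max_extra=5):
    # Hypothetical thresholds: start scaling once the backlog reaches
    # `start_at`, keep adding VMs while it is above `target`, and never
    # run more than `max_extra` extra VMs.
    scaling = extra_vms > 0 or queue_len >= start_at
    if scaling and queue_len > target and extra_vms < max_extra:
        return 1
    if queue_len == 0 and extra_vms > 0:
        return -1  # assumption: retire one extra VM once the queue is drained
    return 0


def simulate(backlogs):
    # Apply one decision per polling round and record the extra-VM count.
    extra, history = 0, []
    for queue_len in backlogs:
        extra += scaling_decision(queue_len, extra)
        history.append(extra)
    return history


print(simulate([5, 25, 18, 9, 4, 0, 0]))  # [0, 1, 2, 3, 3, 2, 1]
```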

Tim - Linode instance 256K ? M? instance

global config file
Capistrano cap deploy

5 twitter and 6 youtube workers etc.
