Saturday, July 16, 2011

Gearman

Tim Uckun
http://tim.uckun.com/

Enspiral
Linux + PostgreSQL + Ruby + Rails + Gearman
application
"search internet every 10 minutes and predict the future"
goal achieved
1000 searches per minute
Twitter, Reddit, Digg, Facebook, etc.
URL mentions
trending: how fast?
likes, videos, etc.
predict viral or not

painless parallelization
LCA (linux.conf.au) Brisbane talk
Gearman: anagram of "manager"
workers
massively distributed, fault-tolerant fork mechanism
Joe Stump, SimpleGeo
protocol: multiple implementations
different languages
[Scheme / CL / Clojure / FORTH / Factor?]
C and Perl servers
Gearman::Server on CPAN

C: http://launchpad.net/gearman

Go client

Client API
command-line tools
user-defined SQL database functions

MySQL etc.

worker API
most common languages
usually in the same packages as the client API
command-line tool


Why Gearman?
lots of message brokers
FOSS and non-FOSS
unique in scope
vs. rapid rabbit?

OSS, simple, fast, multi-language
flexible app design
no SPOF (single point of failure)

persistent queues or not

default: jobs only stored in memory
various persistence options
MySQL, Drizzle
PostgreSQL
SQLite
Tokyo Cabinet
...


Foreground or background
sync and async

large-scale architectures will work, but you can start off simple

how does it work?

clients, servers, workers
every worker connects to every server
clients do not connect to all servers
(no replication)
can get around it
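Clients, workers and servers talk over a small binary protocol. The sketch below frames packets as the public Gearman protocol description lays them out: a 4-byte magic ("\0REQ" for requests, "\0RES" for responses), a 4-byte big-endian packet type, a 4-byte big-endian payload size, then NUL-separated arguments. Only three packet types are included; SUBMIT_JOB versus SUBMIT_JOB_BG is the foreground/background split mentioned above. It opens no sockets and is an illustration, not code from any Gearman client library.

```ruby
# Minimal sketch of Gearman binary packet framing (no networking).
# Header: 4-byte magic, 4-byte big-endian type, 4-byte big-endian payload size.
# Arguments inside the payload are separated by NUL bytes.
module GearmanPacket
  REQ = "\0REQ".b
  RES = "\0RES".b

  SUBMIT_JOB    = 7   # foreground: client waits for the job's result
  JOB_CREATED   = 8   # server -> client, carries the job handle
  SUBMIT_JOB_BG = 18  # background: client only receives JOB_CREATED

  # Build a packet from a magic string, numeric type and argument list.
  def self.encode(magic, type, *args)
    payload = args.map { |a| a.to_s.b }.join("\0".b)
    magic + [type, payload.bytesize].pack("NN") + payload
  end

  # Split a complete packet into [magic, type, args]. The last argument
  # (the workload) may itself contain NUL bytes, so callers pass the number
  # of arguments the packet type defines; -1 means "split on every NUL".
  def self.decode(bytes, nargs = -1)
    bytes = bytes.b
    magic = bytes[0, 4]
    type, size = bytes[4, 8].unpack("NN")
    payload = bytes[12, size]
    [magic, type, payload.split("\0".b, nargs)]
  end
end

# SUBMIT_JOB arguments: function name, unique id (empty here), workload.
fg = GearmanPacket.encode(GearmanPacket::REQ, GearmanPacket::SUBMIT_JOB, "reverse", "", "hello")
bg = GearmanPacket.encode(GearmanPacket::REQ, GearmanPacket::SUBMIT_JOB_BG, "send_email", "", "user@example.com")
p GearmanPacket.decode(fg, 3)
p GearmanPacket.decode(bg, 3)
```

The function names "reverse" and "send_email" are hypothetical. Only the packet type differs between the foreground and background submissions; what changes is which replies the server sends back.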

use cases
scatter / gather
map-reduce
async queues
pipeline processing
Erlang says duh - but this is Ruby

scatter/gather

run a number of tasks concurrently
speed up web apps
tasks don't need to be related
allocate dedicated resources for different tasks
push logic down to where the data exists
auto-balancing

e.g. DB query x 2, full-text search
location search
...
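The scatter/gather idea above can be sketched in-process, assuming Ruby threads as a stand-in for Gearman workers (a real setup would submit each task to a Gearman server instead). The task names and the sleep-based latency are hypothetical placeholders.

```ruby
# Scatter/gather sketch: threads stand in for Gearman workers (in-process only).
# Every named task is started at once ("scatter"); the caller then waits for
# all results ("gather"). Total time tracks the slowest task, not the sum.
def scatter_gather(tasks)
  threads = tasks.map { |name, job| Thread.new { [name, job.call] } }
  threads.map(&:value).to_h   # Thread#value blocks until that task finishes
end

# Hypothetical, unrelated lookups for one web request; sleep fakes latency.
tasks = {
  db_query:        -> { sleep 0.1; %w[row1 row2] },
  fulltext_search: -> { sleep 0.1; %w[doc42] },
  location_search: -> { sleep 0.1; ["Brisbane"] }
}

started = Time.now
results = scatter_gather(tasks)
elapsed = Time.now - started
p results
printf("%.2fs\n", elapsed)
```

Three 0.1 s tasks finish in roughly 0.1 s rather than 0.3 s, which is the "speed up web apps" point: the request waits only as long as its slowest lookup.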


map/reduce

similar to scatter/gather, but splits one task

push logic to where the data exists (map)
report aggregates or other summary (reduce)
can be multi-tier
can be sync or async
aggregates/summary services

client, n tasks, can delegate to subtasks
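A minimal sketch of splitting one task into map and reduce steps, assuming threads as stand-ins for workers. Word counting is a hypothetical example job; with Ruby's GVL the threads here only show the structure, while Gearman would spread the chunks across worker processes or machines.

```ruby
# Map/reduce sketch: one job (word counts over some text) is split into
# chunks; each chunk is "mapped" in its own thread, then the partial
# counts are "reduced" into a single total.
def map_chunk(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    line.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  end
end

def map_reduce(lines, chunks: 4)
  slice    = [(lines.size / chunks.to_f).ceil, 1].max
  threads  = lines.each_slice(slice).map { |part| Thread.new { map_chunk(part) } }
  partials = threads.map(&:value)                    # wait for every mapper
  partials.each_with_object(Hash.new(0)) do |part, total|
    part.each { |word, n| total[word] += n }         # reduce: merge the counts
  end
end

lines = ["the quick brown fox", "jumps over the lazy dog", "the end"]
p map_reduce(lines, chunks: 2)
```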


Async queues
help to scale
not everything needs immediate processing
email
tweets
log entries
notifications
insert and indexing
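The fire-and-forget pattern behind async queues can be sketched with the standard library's Queue, assuming a single in-process worker thread in place of a Gearman background job. The job names are hypothetical.

```ruby
# Async (background) queue sketch: the caller enqueues a job and returns at
# once, like a background submission; a worker thread processes jobs later.
class BackgroundQueue
  attr_reader :processed

  def initialize
    @queue = Queue.new
    @processed = []
    @mutex = Mutex.new
    @worker = Thread.new do
      # pop blocks until a job arrives; once the queue is closed and
      # drained it returns nil, which ends the loop.
      while (job = @queue.pop)
        name, payload = job
        @mutex.synchronize { @processed << "#{name}:#{payload}" }
      end
    end
  end

  # Fire and forget: nothing comes back to the caller.
  def submit_background(name, payload)
    @queue << [name, payload]
    nil
  end

  # Stop accepting jobs and wait for the worker to finish the backlog.
  def shutdown
    @queue.close
    @worker.join
  end
end

q = BackgroundQueue.new
q.submit_background("send_email", "welcome@example.com")
q.submit_background("index_tweet", "12345")
q.shutdown
p q.processed
```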

LCA 2011

pipeline processing

some tasks need a series of transformations
chain workers, each sending its data on to the next step


client -> worker -> worker -> worker -> server? DB
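The client -> worker -> worker chain can be sketched with one thread and one queue per stage, assuming in-process threads in place of Gearman workers. The three transformations are hypothetical.

```ruby
# Pipeline sketch: each stage is a thread that reads from its input queue,
# transforms the item and passes it on. Closing a queue tells the next
# stage that no more input is coming. (Transforms must not return nil,
# since nil from pop marks the end of input.)
def stage(input, &transform)
  output = Queue.new
  Thread.new do
    while (item = input.pop)
      output << transform.call(item)
    end
    output.close
  end
  output
end

source   = Queue.new
stripped = stage(source)   { |s| s.strip }
upcased  = stage(stripped) { |s| s.upcase }
tagged   = stage(upcased)  { |s| "[#{s}]" }

["  fetch ", " parse", "store  "].each { |s| source << s }
source.close

results = []
while (item = tagged.pop)
  results << item
end
p results
```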

examples
ruby

more complex sync client
lambdas

event handlers

example - worker raises an exception
Gearman server catches the exception and raises an exception in the client library (?)
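The event-handler style and the exception example above can be sketched with a hypothetical MiniTask class: callbacks are registered as blocks, the work runs in a separate thread standing in for a worker, and an exception raised there is reported back through an on_exception handler. MiniTask is an in-process illustration, not the API of any Gearman Ruby library.

```ruby
# Event-handler sketch with a hypothetical MiniTask class.
class MiniTask
  def initialize(&work)
    @work = work
    @handlers = {}
  end

  def on_complete(&handler)
    @handlers[:complete] = handler
    self
  end

  def on_exception(&handler)
    @handlers[:exception] = handler
    self
  end

  # Run the work in its own thread (the "worker"). An exception raised there
  # is re-raised by Thread#value and handed to the on_exception handler.
  def run
    worker = Thread.new do
      Thread.current.report_on_exception = false  # reported via the handler instead
      @work.call
    end
    begin
      result = worker.value
    rescue StandardError => e
      @handlers[:exception]&.call(e)
      return
    end
    @handlers[:complete]&.call(result)
  end
end

log = []
MiniTask.new { 6 * 7 }
  .on_complete { |r| log << "complete: #{r}" }
  .on_exception { |e| log << "exception: #{e.message}" }
  .run
MiniTask.new { raise ArgumentError, "bad input" }
  .on_complete { |r| log << "complete: #{r}" }
  .on_exception { |e| log << "exception: #{e.message}" }
  .run
p log
```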

LAMP / Excel spreadsheet via COM
SQL Server without TTS / TDS?
cross-platform
and cross-language


chunked data client

data, completion events
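A chunked-data client reacts to several data events and then one completion event. The sketch below models that with a queue of events, assuming an in-process worker thread; in Gearman the partial results would arrive as WORK_DATA packets. The run_chunked_job name and the chunking by character count are hypothetical.

```ruby
# Chunked-data sketch: the worker streams partial results as :data events and
# finishes with a single :complete event; the client handles each event as
# it arrives.
def run_chunked_job(text, chunk_size)
  events = Queue.new
  Thread.new do
    text.scan(/.{1,#{chunk_size}}/m).each { |chunk| events << [:data, chunk] }
    events << [:complete, text.bytesize]
  end

  received = []
  loop do
    kind, value = events.pop
    case kind
    when :data     then received << value          # data event: one chunk
    when :complete then return [received, value]   # completion event
    end
  end
end

chunks, total = run_chunked_job("hello gearman", 5)
p chunks
p total
```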

get state of the queue (query gearman itself)

serialized hash to see on screen
hash = Gearman::Server.new(gearman_servers).status

Zabbix? monitoring tool

alarm on empty work queue
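The Gearman server also answers a plain-text admin command, "status", with one line per function in the form FUNCTION, TOTAL, RUNNING, AVAILABLE_WORKERS (tab-separated), ending with a line containing a single ".". The sketch parses such a reply and applies two alarm rules of the kind a Zabbix check might use. No socket is opened; the sample reply, the function names and both rules are hypothetical, and which conditions deserve an alarm is a policy choice.

```ruby
# Parse a reply to the "status" admin command into a hash keyed by function.
def parse_status(reply)
  reply.each_line.map(&:chomp).take_while { |line| line != "." }.to_h do |line|
    function, total, running, workers = line.split("\t")
    [function, { total: total.to_i, running: running.to_i, workers: workers.to_i }]
  end
end

# Hypothetical alarm rules: a queue that has grown too long, or jobs that
# are waiting while no worker is registered for that function.
def alarms(status, max_total: 100)
  status.flat_map do |function, s|
    problems = []
    problems << "#{function}: #{s[:total]} jobs in queue" if s[:total] > max_total
    problems << "#{function}: jobs waiting but no workers" if s[:total] > 0 && s[:workers].zero?
    problems
  end
end

reply  = "twitter_search\t3\t1\t5\nyoutube_search\t250\t6\t6\nsend_email\t2\t0\t0\n.\n"
status = parse_status(reply)
p status
puts alarms(status)
```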

database UDFs

database trigger
start background jobs on DB changes

PostgreSQL, MySQL, Drizzle

SELECT gman_servers....


Optional ingredients

Databases
shared/distributed file systems
other network protocols
HTTP
email
domain-specific libraries
image manipulation
full-text indexing


timeouts

by default, operations block forever
clients may want a timeout on foreground jobs

workers may need to periodically run other code besides the job callback
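A client-side timeout on a foreground job can be sketched with Thread#join(limit), which returns nil if the thread has not finished in time. The block passed in is a hypothetical placeholder for a real foreground submission; a real client would also stop waiting on its server connection.

```ruby
# Timeout sketch: wait at most `seconds` for a foreground result instead of
# blocking forever. Thread#join(limit) returns nil when the limit expires.
def run_with_timeout(seconds, &job)
  worker = Thread.new(&job)
  if worker.join(seconds)
    [:ok, worker.value]
  else
    worker.kill          # give up on this job
    [:timeout, nil]
  end
end

p run_with_timeout(0.5) { 21 * 2 }
p run_with_timeout(0.05) { sleep 1; :late }
```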

cluster-wide crons

keepalive
over crashes
(job state is persistent?)

other jobs: may not be a good thing; e.g. not done in an hour -> alarm
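The "not done in an hour -> alarm" idea can be sketched as a small watchdog that records start times and reports jobs still running past a limit. The JobWatchdog class and job names are hypothetical; the clock is injected so the example runs instantly.

```ruby
# Watchdog sketch: remember when each job started and list any job still
# unfinished after the limit (one hour by default).
class JobWatchdog
  def initialize(limit_seconds: 3600, clock: -> { Time.now })
    @limit = limit_seconds
    @clock = clock
    @started = {}
  end

  def started(job)
    @started[job] = @clock.call
  end

  def finished(job)
    @started.delete(job)
  end

  # Jobs that have been running longer than the limit.
  def overdue
    now = @clock.call
    @started.select { |_, t| now - t > @limit }.keys
  end
end

now = Time.at(0)
dog = JobWatchdog.new(clock: -> { now })   # fake clock for the demo
dog.started("nightly_rollup")
dog.started("refresh_trends")
now += 1800
dog.finished("refresh_trends")
dog.started("purge_logs")
now += 2400
p dog.overdue
```

After 4200 simulated seconds only the job started at time 0 is past the hour; the one started at 1800 s has run for 2400 s.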



Shortcomings
clients must connect to all servers
no replication between servers
can solve with MySQL/PostgreSQL replication
slower than pure messaging servers
logging not all that great

steps must be taken to assure recovery of queued msgs if a server is completely destroyed
small community - development has slowed
(on which fronts?)

rapid/rabbit enqueue

all-or-nothing logging: going from level 2 to 3 is a big step in volume; pretty much sucks

recently development has picked up pace again
Brian Aker (MySQL and Drizzle guy)
several new versions
bug: in PHP or other?

Clint ?? also working on it
OSS proj
in C
BSD license? GPL?

priority in queues? - yes but not the way you'd think

retries in X hours / submit job @ time
new feature

Boost libraries (C++)

http://gearman.org/

mailing list, docs, related projects

#gearman on irc.freenode.net

XING is using it


PHP people extension? libs

Questions?

Justin??
website: low traffic
Gearman in the DB server
occasional heavy lifting
once the queue reaches a certain size, fire up VMs until the queue has only 3-4 entries

VM startup time is non-critical
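The scale-out rule described in this answer can be sketched as a loop, assuming two hypothetical hooks: one that reads the current queue length (for example from the server's status output) and one that starts a VM. The trigger, target and safety cap values are made up for illustration.

```ruby
# Autoscaling sketch: once the queue passes a trigger length, keep starting
# VMs until only a handful of jobs remain (or a safety cap is reached).
def autoscale(read_queue_length, start_vm, trigger: 20, target: 4, max_vms: 10)
  return 0 if read_queue_length.call < trigger
  started = 0
  while read_queue_length.call > target && started < max_vms
    start_vm.call          # in practice: boot a VM that runs more workers
    started += 1
  end
  started
end

# Fake environment for the demo: each new VM is assumed to drain 6 jobs.
queue = 30
started = autoscale(-> { queue }, -> { queue -= 6 })
puts "started #{started} VMs, #{queue} jobs left"
```

Here the fake queue shrinks the moment a VM is "started". Real VMs take time to boot (acceptable in this setup, per the notes), so a real loop would re-read the queue length only after new workers have come up.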

Tim - Linode instance 256K ? M? instance

global config file
Capistrano cap deploy

5 twitter and 6 youtube workers etc.
