Tim Uckun
http://tim.uckun.com/
Enspiral
Linux + PostgreSQL + Ruby + Rails + Gearman
application
"search internet every 10 minutes and predict the future"
goal achieved
1000 searches per minute
Twitter, Reddit, Digg, Facebook, etc.
URL mentions
trending: how fast?
likes, videos, etc.
predict viral or not
painless parallelization
LCA (linux.conf.au) talk, Brisbane
name: "Gearman" is an anagram of "manager"
it manages workers
a massively distributed, fault-tolerant fork mechanism
Joe Stump (SimpleGeo)
protocol - multiple implementations
different languages
[scheme / CL / Clojure / FORTH / Factor?]
C and Perl servers
Gearman::Server on CPAN
C: http://launchpad.net/gearman
Go client
Client API
command-line tools
user-defined SQL database functions (UDFs)
MySQL etc
worker API
most common languages
usually in the same packages as the client API
command-line tool
Why Gearman?
lots of message brokers
both FOSS and non-free
unique in scope
vs. rapid rabbit?
OSS, simple, fast, multi-language
flexible app design
no single point of failure (SPOF)
persistent queues or not
default: jobs only stored in memory
various persistence options
MySQL, Drizzle
PostgreSQL
SQLite
Tokyo Cabinet
...
Foreground or background
sync and async
large-scale architectures will work, but you can start off simple
How does it work?
clients, servers, workers
every worker connects to every server
clients don't connect to all servers
(no replication)
can get around it
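To make the roles concrete, here is a toy, in-process model of them: a "server" holding one queue per function name, workers that pull jobs for the function they serve, and clients that submit either in the foreground (block for the result) or in the background (return at once). It is a sketch under loud assumptions: the class names `ToyJobServer` and `ToyWorker` are hypothetical, it models a single server, and it is not the gearman-ruby API or the wire protocol.

```ruby
# Toy, in-process model of Gearman's roles (hypothetical class names;
# NOT the gearman-ruby API). One "server" keeps a FIFO per function name,
# workers pull jobs for the function they registered, and clients submit
# either in the foreground (wait for the result) or background (don't wait).
class ToyJobServer
  Job = Struct.new(:function, :payload, :reply)

  def initialize
    @queues = {}
    @lock = Mutex.new
  end

  # Look up (or lazily create) the queue for a function name.
  def queue_for(function)
    @lock.synchronize { @queues[function] ||= Queue.new }
  end

  # Foreground job: block until a worker posts the result.
  def submit(function, payload)
    reply = Queue.new
    queue_for(function) << Job.new(function, payload, reply)
    reply.pop
  end

  # Background job: enqueue and return immediately.
  def submit_background(function, payload)
    queue_for(function) << Job.new(function, payload, nil)
    nil
  end
end

# A worker thread that serves one function name forever.
class ToyWorker
  attr_reader :thread

  def initialize(server, function, &handler)
    queue = server.queue_for(function)
    @thread = Thread.new do
      loop do
        job = queue.pop
        result = handler.call(job.payload)
        job.reply << result if job.reply # background jobs have no reply queue
      end
    end
  end
end

server = ToyJobServer.new
ToyWorker.new(server, "reverse") { |s| s.reverse }
puts server.submit("reverse", "gearman") # prints "namraeg"
```

In real Gearman each worker would register with every server, so a job queued on any server can be picked up; the toy skips that to keep the moving parts visible.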
use cases
scatter / gather
map-reduce
async queues
pipeline processing
Erlang says "duh", but this is Ruby
scatter/gather
run a number of tasks concurrently
speed up web apps
tasks don't need to be related
allocate dedicated resources for different tasks
push logic down to where data exists
auto-balancing
e.g. DB query x 2, full-text search
location search
...
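A minimal scatter/gather sketch, assuming threads as stand-ins for remote workers: several unrelated lookups are fired at once and the results gathered, so total latency is roughly that of the slowest task rather than the sum. The task names and `sleep` durations are hypothetical placeholders for the DB, full-text and location searches above.

```ruby
# Scatter/gather sketch: start every task concurrently, then collect all
# results. Threads stand in for Gearman workers; task bodies are fake.
def scatter_gather(tasks)
  threads = tasks.map do |name, task|
    Thread.new { [name, task.call] } # scatter
  end
  threads.map(&:value).to_h # gather: Thread#value joins and returns the result
end

tasks = {
  db_query_1: -> { sleep 0.2; "rows from query 1" },
  db_query_2: -> { sleep 0.2; "rows from query 2" },
  fulltext:   -> { sleep 0.2; "search hits" },
  location:   -> { sleep 0.2; "nearby places" }
}

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = scatter_gather(tasks)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
p results
printf("%.2fs total\n", elapsed) # close to one task's latency, not four
```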
map/reduce
similar to scatter/gather, but splits one task
push logic to where data exists (map)
report aggregates or other summary (reduce)
can be multi-tier
can be sync or async
aggregates/summary services
client, n tasks, can delegate to subtasks
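A minimal map/reduce sketch, assuming the job is "count URL mentions across a batch of posts" (a hypothetical example in the spirit of the speaker's app): the batch is split into chunks, each chunk is counted concurrently (threads standing in for workers), and the partial counts are merged into one summary.

```ruby
# Map step: count URL mentions in one chunk of posts.
def count_mentions(posts)
  posts.each_with_object(Hash.new(0)) do |post, counts|
    post.scan(%r{https?://\S+}) { |url| counts[url] += 1 }
  end
end

# Split one task into `chunks` pieces, map them concurrently, then reduce.
def map_reduce(posts, chunks: 3)
  slice_size = [(posts.size / chunks.to_f).ceil, 1].max
  partials = posts.each_slice(slice_size).map do |slice|
    Thread.new { count_mentions(slice) } # map (one "worker" per chunk)
  end.map(&:value)
  partials.each_with_object(Hash.new(0)) do |part, total| # reduce
    part.each { |url, n| total[url] += n }
  end
end
```

In a multi-tier setup each map worker could itself split its chunk further; the reduce step stays the same merge.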
Async queues
help to scale
not everything needs immediate processing
email
tweets
log entries
notifications
insert and indexing
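The async-queue pattern can be sketched in-process: the request path only enqueues jobs (emails, notifications) and responds immediately, while a background worker drains the queue. `BackgroundQueue`, the `:stop` sentinel, the job hashes and `handle_signup` are all hypothetical illustrations, not Gearman calls.

```ruby
# Async queue sketch: callers enqueue and return at once; one background
# thread processes jobs in FIFO order until it sees the :stop sentinel.
class BackgroundQueue
  attr_reader :processed

  def initialize(&handler)
    @jobs = Queue.new
    @processed = []
    @worker = Thread.new do
      while (job = @jobs.pop) != :stop
        @processed << handler.call(job)
      end
    end
  end

  # Non-blocking for the caller.
  def enqueue(job)
    @jobs << job
    self
  end

  # Let the worker finish everything already queued, then stop it.
  def shutdown
    @jobs << :stop
    @worker.join
  end
end

# Hypothetical request handler: respond without waiting for either job.
def handle_signup(queue, email)
  queue.enqueue({ type: :welcome_email, to: email })
  queue.enqueue({ type: :notify_admins, about: email })
  "201 Created"
end
```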
LCA 2011
pipeline processing
some tasks need a series of transformations
chain workers, each sending data on for the next step
client -> worker -> worker -> worker -> server? DB
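The chained-worker idea can be sketched with one thread per stage, each reading from its own queue and writing to the next, with an end-of-stream marker passed down the chain to shut it down cleanly. This is an in-process illustration under stated assumptions: `run_pipeline`, `stage` and `END_OF_STREAM` are hypothetical names, and the stage bodies are placeholders.

```ruby
END_OF_STREAM = Object.new # marker that tells each stage to finish

# One stage: pop from input, transform, push to output, until end-of-stream.
def stage(input, output, &transform)
  Thread.new do
    until (item = input.pop).equal?(END_OF_STREAM)
      output << transform.call(item)
    end
    output << END_OF_STREAM # propagate shutdown to the next stage
  end
end

# client -> worker -> worker -> ... -> collected results
def run_pipeline(items, *transforms)
  queues = Array.new(transforms.size + 1) { Queue.new }
  transforms.each_with_index { |t, i| stage(queues[i], queues[i + 1], &t) }
  items.each { |x| queues.first << x }
  queues.first << END_OF_STREAM
  results = []
  until (item = queues.last.pop).equal?(END_OF_STREAM)
    results << item
  end
  results
end
```

Because each stage is a single thread, output order matches input order; running several threads per stage would raise throughput but lose that guarantee.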
examples
Ruby
more complex sync client
lambdas
event handlers
example - worker raises an exception
gearman server takes exceptions and raises exception in client library (?)
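The exception flow described above can be sketched as: the worker rescues the error and reports it in place of a result, and the client side re-raises it locally. This only models the idea; `RemoteJobError`, `submit_and_wait` and the `[:ok, ...]` / `[:exception, ...]` tuples are hypothetical, not how gearman-ruby or the wire protocol actually represent it.

```ruby
# Raised on the client side when the (simulated) worker failed.
class RemoteJobError < StandardError; end

# Run the job on a worker thread; ship back either a result or the error text,
# then unwrap it on the calling side.
def submit_and_wait(payload, &job)
  results = Queue.new
  Thread.new do
    begin
      results << [:ok, job.call(payload)]
    rescue StandardError => e
      results << [:exception, "#{e.class}: #{e.message}"]
    end
  end
  status, value = results.pop
  raise RemoteJobError, value if status == :exception
  value
end
```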
LAMP: Excel spreadsheet via COM
SQL Server without TTS / TDS?
cross-platform
and cross-language
chunked data client
data, completion events
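The event-handler style of client can be sketched as callbacks registered for "data" and "complete" events, fed by a simulated worker that streams chunks. The handler names `on_data` / `on_complete` and the event tuples are hypothetical stand-ins, not the client library's actual method names.

```ruby
# A task that dispatches streamed events to registered lambdas.
class ChunkedTask
  def initialize
    @handlers = { data: ->(_) {}, complete: ->(_) {} }
  end

  def on_data(&blk)
    @handlers[:data] = blk
    self
  end

  def on_complete(&blk)
    @handlers[:complete] = blk
    self
  end

  # Deliver events in the order a streaming worker would emit them.
  def dispatch(events)
    events.each { |type, payload| @handlers.fetch(type).call(payload) }
  end
end

# Simulated worker output: data chunks, then one completion event.
def fake_worker_events(text, chunk_size)
  text.chars.each_slice(chunk_size).map { |c| [:data, c.join] } + [[:complete, text.size]]
end
```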
get state of the queue (query gearman itself)
serialized hash to see on screen
hash = Gearman::Server.new(gearman_servers).status
Zabbix? monitoring tool
alarm on empty work queue
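The same queue state can be read without a client library: gearmand's text admin protocol answers a `status` command with one tab-separated line per function (function, total queued, running, available workers), terminated by a line containing a single ".". The sketch below parses that reply and flags empty queues; `fetch_status` assumes the default Gearman port 4730, and the alarm helper is a hypothetical example for feeding a tool such as Zabbix.

```ruby
require "socket"

GEARMAND_PORT = 4730 # default Gearman port

# Parse a "status" reply: FUNCTION \t TOTAL \t RUNNING \t AVAILABLE_WORKERS,
# one line per function, ended by a line containing only ".".
def parse_status(text)
  lines = text.each_line.map(&:chomp).take_while { |l| l != "." }
  lines.reject(&:empty?).each_with_object({}) do |line, status|
    name, total, running, workers = line.split("\t")
    status[name] = { total: total.to_i, running: running.to_i, workers: workers.to_i }
  end
end

# Fetch the raw reply from one server (needs a live gearmand).
def fetch_status(host, port = GEARMAND_PORT)
  TCPSocket.open(host, port) do |sock|
    sock.write("status\n")
    reply = +""
    while (line = sock.gets)
      reply << line
      break if line.chomp == "."
    end
    reply
  end
end

# Hypothetical alarm check: functions with nothing queued.
def empty_queues(status)
  status.select { |_, s| s[:total].zero? }.keys
end
```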
database UDFs
database trigger
start background jobs on DB changes
PostgreSQL, MySQL, Drizzle
SELECT gman_servers...
Optional ingredients
Databases
shared/distributed file systems
other network protocols
HTTP
email
domain-specific libraries
image manipulation
full-text indexing
timeouts
by default, operations block forever
clients may want a timeout on foreground jobs
workers may need to periodically run other code besides the job callback
cluster-wide crons
keepalive
over crashes
(job state is persistent?)
other jobs: may not be a good thing; e.g. not done in an hour -> alarm
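A watchdog for the "not done in an hour, raise an alarm" case can be sketched as a record of start times checked against a limit. `JobWatchdog`, the job IDs and what you do with the overdue list are hypothetical; the one-hour default mirrors the example in the notes.

```ruby
# Track running jobs and report those exceeding a time limit.
class JobWatchdog
  def initialize(limit_seconds: 3600) # one hour by default
    @limit = limit_seconds
    @started = {}
  end

  def started(job_id, at: Time.now)
    @started[job_id] = at
  end

  def finished(job_id)
    @started.delete(job_id)
  end

  # IDs of jobs still running longer than the limit (candidates for an alarm).
  def overdue(now: Time.now)
    @started.select { |_, t| now - t > @limit }.keys
  end
end
```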
Shortcomings
clients must connect to all servers
no replication between servers
can solve with MySQL or PostgreSQL replication
slower than pure messaging servers
logging not all that great
steps must be taken to assure recovery of queued msgs if a server is completely destroyed
small community - development has slowed
(on which fronts?)
rapid/rabbit enqueue
all-or-nothing logging: level 2 vs. 3 is a big step in volume; pretty much sucks
recently, development has picked up pace again
Brian Aker (MySQL and Drizzle guy)
several new versions
bug - in PHP or other?
Clint ?? also working on it
OSS proj
in C
BSD license? GPL?
priority in queues? - yes but not the way you'd think
retries in X hours / submit job @ time
new feature
Boost libraries (C++)
http://gearman.org/
mailing list, docs, related projects
#gearman on irc.freenode.net
Xing using it
PHP people extension? libs
Questions?
Justin??
low-traffic website
gearman in db server
occasional heavy lifting
when the queue reaches a certain size, fire up VMs until only 3-4 entries remain in the queue
VM startup time is non-critical
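The scale-up rule described here can be sketched as a small controller with hysteresis: once the queue crosses a trigger length it keeps asking for VMs until only a few entries remain, capped at a maximum. The trigger of 20 jobs, the cap, and the class name `QueueAutoscaler` are assumptions for illustration; actually starting a VM would go through whatever provisioning API you use and is not shown.

```ruby
# Decide whether to start another VM based on queue length (assumed thresholds).
class QueueAutoscaler
  def initialize(trigger: 20, target: 4, max_vms: 10)
    @trigger = trigger # queue length that starts a scale-up (assumed)
    @target = target   # stop adding VMs at or below this many queued jobs
    @max_vms = max_vms
    @scaling = false
  end

  # Call periodically with the current queue length and running VM count.
  def decide(queue_length, vms)
    @scaling = true if queue_length >= @trigger
    @scaling = false if queue_length <= @target
    @scaling && vms < @max_vms ? :start_vm : :wait
  end
end
```

The hysteresis matters because VM startup is slow: without it, a queue hovering just below the trigger would stop scale-up while a backlog is still being worked off.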
Tim: Linode instance (256K? M?)
global config file
Capistrano (cap deploy)
5 Twitter and 6 YouTube workers, etc.