ruby - Big task or multiple small tasks with Sidekiq
I'm writing a worker that adds lots of users to a group. I'm wondering whether it's better to run one big task with all the users, to batch them 100 at a time, or to run one task per user.
For the moment, here is my code:
class AddUsersToGroupWorker
  include Sidekiq::Worker
  sidekiq_options :queue => :group_utility

  def perform(store_id, group_id, user_ids_to_add)
    begin
      store = Store.find store_id
      group = Group.find group_id
    rescue ActiveRecord::RecordNotFound => e
      Airbrake.notify e
      return
    end

    users_to_process = store.users.where(id: user_ids_to_add)
                            .where.not(id: group.user_ids)
    group.users += users_to_process

    users_to_process.map(&:id).each do |user_to_process_id|
      UpdateLastUpdatesForUserWorker.perform_async store.id, user_to_process_id
    end
  end
end
Maybe it's better to have something like this in my perform method:
def add_users
  users_to_process = store.users.where(id: user_ids_to_add)
                          .where.not(id: group.user_ids)

  users_to_process.map(&:id).each do |user_to_process_id|
    AddUserToGroupWorker.perform_async group_id, user_to_process_id
    UpdateLastUpdatesForUserWorker.perform_async store.id, user_to_process_id
  end
end
But that would make many find requests. What do you think?
I have a Sidekiq Pro licence if needed (for Sidekiq Batches, for example).
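For instance, with Sidekiq Pro I could group the per-user jobs into a batch with a completion callback. A rough sketch of what I have in mind (AddUsersCallback is a made-up name):

batch = Sidekiq::Batch.new
batch.description = "Add users to group #{group_id}"
batch.on(:success, AddUsersCallback, 'group_id' => group_id)
batch.jobs do
  user_ids_to_add.each do |user_id|
    AddUserToGroupWorker.perform_async group_id, user_id
  end
end

class AddUsersCallback
  def on_success(status, options)
    # Called once, after every job in the batch has succeeded
    Rails.logger.info "All users added to group #{options['group_id']}"
  end
end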
Here are my thoughts.
1. Single SQL query instead of N queries
This line:
group.users += users_to_process
will produce N SQL queries (where N = users_to_process.count). I assume you have a many-to-many association between users and groups (with a user_groups join table/model), so you should use a mass data-insertion technique:
users_to_process_ids = store.users.where(id: user_ids_to_add)
                            .where.not(id: group.user_ids)
                            .pluck(:id)

sql_values = users_to_process_ids.map { |i| "(#{i.to_i}, #{group.id.to_i}, NOW(), NOW())" }

Group.connection.execute("
  INSERT INTO groups_users (user_id, group_id, created_at, updated_at)
  VALUES #{sql_values.join(',')}
")
Yes, it's raw SQL. And it's fast.
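If you'd rather avoid hand-built SQL, Rails 6+ offers insert_all, which produces the same single multi-row INSERT. A sketch, assuming the join table is backed by a GroupsUser model (an assumption; adjust the name to your schema):

now = Time.current
rows = users_to_process_ids.map do |user_id|
  { user_id: user_id, group_id: group.id, created_at: now, updated_at: now }
end
# insert_all issues one multi-row INSERT and skips validations/callbacks,
# which is why the timestamps are set by hand
GroupsUser.insert_all(rows) if rows.any?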
2. Use pluck(:id) instead of map(&:id)
pluck is quicker, because:
- it selects only the id column, so less data is transferred from the DB;
- more importantly, it won't create an ActiveRecord object for each row.

Doing SQL is cheap. Creating Ruby objects is expensive.
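To make the difference concrete, compare (illustrative):

# map(&:id) loads full rows and builds a User object per row...
user_ids = store.users.where(id: user_ids_to_add).map(&:id)

# ...while pluck(:id) runs SELECT "users"."id" FROM ... and returns plain integers
user_ids = store.users.where(id: user_ids_to_add).pluck(:id)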
3. Use horizontal parallelization instead of vertical parallelization
What I mean here is: if you need to run sequential tasks A -> B -> C over dozens of records, there are two major ways to split the work:

- Vertical segmentation: AWorker processes the A(1), A(2), A(3) jobs; BWorker processes B(1), etc.; CWorker processes all the C(i) jobs.
- Horizontal segmentation: a UniversalWorker processes A(1)+B(1)+C(1) as one job.

Use the latter (horizontal) way; see the sketch below.
This is a statement from experience, not a theoretical point of view (in theory both ways are feasible).
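Here's a minimal sketch of both layouts (the worker and helper names like do_a are illustrative, not from your code):

# Vertical: one worker per step, chained with perform_async
class AWorker
  include Sidekiq::Worker
  def perform(record_id)
    do_a(record_id)
    BWorker.perform_async(record_id) # hand-off point where flaky errors creep in
  end
end

# Horizontal: one worker runs the whole A -> B -> C chain for a record
class UniversalWorker
  include Sidekiq::Worker
  def perform(record_id)
    do_a(record_id)
    do_b(record_id)
    do_c(record_id)
  end
end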
Why should you do that?

- When you use vertical segmentation, you get errors when jobs are handed off from one worker down to the next. Errors of that kind will make you pull your hair out, because they aren't persistent and reproducible: sometimes they happen and sometimes they don't. Is it possible to write code that passes work down the chain without errors? Sure it is. But it's better to keep things simple.
- Imagine your server at rest, when suddenly new jobs arrive. Your B and C workers will just waste RAM while the A workers do their job; then the A and C workers will waste RAM while the B workers are at work, and so on. With horizontal segmentation, the resource drain evens out.
Applying this advice to your specific case: for starters, don't call perform_async from inside another async task.
4. Process in batches
Answering your original question: yes, process in batches. Creating and managing an async task takes some resources by itself, so there's no need to create too many of them.
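On the same note, when you do need to enqueue many jobs at once, Sidekiq's Sidekiq::Client.push_bulk pushes them in a single Redis round trip instead of one per job. A sketch:

# One round trip to Redis instead of one call per user
Sidekiq::Client.push_bulk(
  'class' => UpdateLastUpdatesForUserWorker,
  'args'  => user_ids_to_add.map { |user_id| [store_id, user_id] }
)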
TL;DR: in the end, your code could look something like this:
# Model code
BATCH_SIZE = 100

def add_users
  users_to_process_ids = store.users.where(id: user_ids_to_add)
                              .where.not(id: group.user_ids)
                              .pluck(:id)

  # Even with 100,000 users the performance of this query should be
  # acceptable enough to run it in a synchronous fashion
  sql_values = users_to_process_ids.map { |i| "(#{i.to_i}, #{group.id.to_i}, NOW(), NOW())" }
  Group.connection.execute("
    INSERT INTO groups_users (user_id, group_id, created_at, updated_at)
    VALUES #{sql_values.join(',')}
  ")

  users_to_process_ids.each_slice(BATCH_SIZE) do |batch|
    AddUserToGroupWorker.perform_async store.id, group_id, batch
  end
end

# add_user_to_group_worker.rb
def perform(store_id, group_id, user_ids_to_add)
  group = Group.find group_id

  # Do the heavy lifting for the batch as a whole here
  # ...
  # If nothing is left to do here, call UpdateLastUpdatesForUserWorker
  # from the model instead
  user_ids_to_add.each do |id|
    # Run it synchronously - the job was already parallelized
    # by splitting it into slices in the model above
    UpdateLastUpdatesForUserWorker.new.perform store_id, id
  end
end