ruby - Big task or multiple small tasks with Sidekiq


I'm writing a worker that adds lots of users to a group. I'm wondering whether it's better to run one big task with all the users, or batches of 100 users, or one user per task.

For the moment, here is my code:

class AddUsersToGroupWorker
  include Sidekiq::Worker
  sidekiq_options :queue => :group_utility

  def perform(store_id, group_id, user_ids_to_add)
    begin
      store = Store.find store_id
      group = Group.find group_id
    rescue ActiveRecord::RecordNotFound => e
      Airbrake.notify e
      return
    end

    users_to_process = store.users.where(id: user_ids_to_add)
                                  .where.not(id: group.user_ids)
    group.users += users_to_process

    users_to_process.map(&:id).each do |user_to_process_id|
      UpdateLastUpdatesForUserWorker.perform_async store.id, user_to_process_id
    end
  end
end

Or maybe it's better to have something like this in my method:

def add_users
  users_to_process = store.users.where(id: user_ids_to_add)
                                .where.not(id: group.user_ids)

  users_to_process.map(&:id).each do |user_to_process_id|
    AddUserToGroupWorker.perform_async group_id, user_to_process_id
    UpdateLastUpdatesForUserWorker.perform_async store.id, user_to_process_id
  end
end

But that means a lot of find requests. What do you think?

I have a Sidekiq Pro licence if needed (for the Batches feature, for example).

Here are my thoughts.

1. Use a single SQL query instead of N queries

This line: group.users += users_to_process produces N SQL queries (where N is users_to_process.count). Assuming you have a many-to-many association between users and groups (with a join table/model), you should use a mass-insert technique instead:

users_to_process_ids = store.users.where(id: user_ids_to_add)
                                  .where.not(id: group.user_ids)
                                  .pluck(:id)
sql_values = users_to_process_ids.map { |i| "(#{i.to_i}, #{group.id.to_i}, NOW(), NOW())" }
group.connection.execute("
  INSERT INTO groups_users (user_id, group_id, created_at, updated_at)
  VALUES #{sql_values.join(",")}
")

Yes, it's raw SQL. And it's fast.
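As a side note, if you're on Rails 6 or later, the same single-query insert can be written without hand-built SQL by using insert_all. A minimal sketch, assuming a GroupsUser model is mapped to the groups_users join table:

# Sketch only: assumes Rails 6+ and a GroupsUser model for the groups_users table.
# insert_all issues a single INSERT and skips validations/callbacks,
# just like the raw SQL above.
now = Time.current
rows = users_to_process_ids.map do |user_id|
  { user_id: user_id, group_id: group.id, created_at: now, updated_at: now }
end
GroupsUser.insert_all(rows) if rows.any?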

2. Use pluck(:id) instead of map(&:id)

pluck is much quicker, because:

  • it will select only the 'id' column, so less data is transferred from the DB
  • more importantly, it won't create an ActiveRecord object for each row

Doing SQL is cheap. Creating Ruby objects is expensive.
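For illustration, the two calls look like this (hypothetical snippet; same result, very different cost):

# Instantiates a full ActiveRecord object per row, only to read its id
ids = store.users.where(id: user_ids_to_add).map(&:id)

# Runs a SELECT of just the id column and returns plain integers, no AR objects
ids = store.users.where(id: user_ids_to_add).pluck(:id)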

3. Use horizontal parallelization instead of vertical parallelization

What I mean here is: if you need to run sequential tasks A -> B -> C for a dozen records, there are two major ways to split the work:

  • Vertical segmentation: AWorker handles the A(1), A(2), A(3) jobs; BWorker handles B(1), etc.; CWorker handles the C(i) jobs;
  • Horizontal segmentation: UniversalWorker handles A(1)+B(1)+C(1), and so on.

Use the latter (horizontal) way.
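To make the two shapes concrete, here is a rough sketch (the worker names and the do_a/do_b/do_c steps are made up for illustration):

# Vertical segmentation: each stage is its own worker, and each worker
# enqueues the next one, so the work is handed off between jobs
class AWorker
  include Sidekiq::Worker
  def perform(record_id)
    do_a(record_id)
    BWorker.perform_async(record_id) # hand-off point
  end
end

# Horizontal segmentation: one worker runs the whole A -> B -> C chain
# for a given record, with no hand-offs between workers
class UniversalWorker
  include Sidekiq::Worker
  def perform(record_id)
    do_a(record_id)
    do_b(record_id)
    do_c(record_id)
  end
end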

This is a statement from experience, not from a theoretical point of view (where both ways are feasible).

Why should you do that?

  • When you use vertical segmentation, you get errors when passing a job from one worker down to another. I hate that kind of error. You will pull your hair out when you bump into them, because they aren't persistent or reproducible: sometimes they happen and sometimes they don't. Is it possible to write code that passes work down the chain without errors? Sure it is. But it's better to keep things simple.
  • Imagine your server at rest. Then new jobs arrive. Your B and C workers waste RAM while the A workers do their job; then A and C waste RAM while the B's are at work, and so on. With horizontal segmentation you get rid of that resource drain.

Applying this advice to your specific case: for starters, don't call perform_async from inside another async task.

4. Process in batches

Answering your original question: yes, process in batches. Creating and managing an async task takes resources by itself, so there's no need to create a huge number of them.


TL;DR: in the end, your code could look something like this:

# model code

BATCH_SIZE = 100

def add_users
  users_to_process_ids = store.users.where(id: user_ids_to_add)
                                    .where.not(id: group.user_ids)
                                    .pluck(:id)

  # Even for 100,000 users the performance of this query should be acceptable,
  # so we do the mass insert in a synchronous fashion
  sql_values = users_to_process_ids.map { |i| "(#{i.to_i}, #{group.id.to_i}, NOW(), NOW())" }
  group.connection.execute("
    INSERT INTO groups_users (user_id, group_id, created_at, updated_at)
    VALUES #{sql_values.join(",")}
  ")

  users_to_process_ids.each_slice(BATCH_SIZE) do |batch|
    AddUserToGroupWorker.perform_async store.id, group_id, batch
  end
end

# add_user_to_group_worker.rb

def perform(store_id, group_id, user_ids_to_add)
  group = Group.find group_id

  # Do the heavy lifting for the whole batch here
  # ...
  # ...
  # If nothing is left here, call UpdateLastUpdatesForUserWorker from the model instead

  user_ids_to_add.each do |user_to_process_id|
    # Called synchronously – the parallelization already happened when
    # we split the ids into slices in the model above
    UpdateLastUpdatesForUserWorker.new.perform store_id, user_to_process_id
  end
end
