Difference Between Arrays Preserving Duplicate Elements in Ruby

I'm quite new to Ruby, and was hoping to get the difference between two arrays.
I am aware of the usual method:
a = [...]
b = [...]
difference = (a-b)+(b-a)

But the problem is that this computes the set difference, because in Ruby the expression (a-b) is the relative complement of b in a.
This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all occurrences of 5 in the first array, not just one, behaving like a filter on the data.
I want it to remove each matching element only once, so for example, the difference of [1,2,2,3,4,5,5,5,5] and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.
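
To make the contrast concrete, here is a small sketch of what Array#- returns next to the result I am after. The variable names set_style and wanted are only for illustration.

```ruby
a = [1, 2, 2, 3, 4, 5, 5, 5, 5]
b = [5]

# Array#- drops every element of a that also appears in b.
set_style = a - b

# What I want instead: one 5 removed for the single 5 in b.
wanted = [1, 2, 2, 3, 4, 5, 5, 5]

p set_style            # prints [1, 2, 2, 3, 4]
p set_style == wanted  # prints false
```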
I could do this iteratively:
a = [...]
b = [...]

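complement_a_by_b = a.dup
complement_b_by_a = b.dup

# For each element of b, delete one matching element from the copy of a.
# When there is no match, the index falls past the end and delete_at does nothing.
b.each do |b_value|
  complement_a_by_b.delete_at(complement_a_by_b.index(b_value) || complement_a_by_b.length)
end

# Same again in the other direction.
a.each do |a_value|
  complement_b_by_a.delete_at(complement_b_by_a.index(a_value) || complement_b_by_a.length)
end

difference = complement_a_by_b + complement_b_by_a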

But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution than this. So my question is basically: what is the most elegant way of finding the difference of two arrays such that, when one array has more occurrences of an element than the other, only the matched occurrences are removed rather than all of them?


Answer 1:

I recently proposed that such a method, Array#difference, be added to Ruby’s core. For your example, it would be written:

a = [1,2,2,3,4,5,5,5,5]
b = [5]

a.difference b
  #=> [1,2,2,3,4,5,5,5]

The example I’ve often given is:

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

a.difference b
  #=> [1, 3, 2, 2] 

I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.

As shown at the links, the method could be written as follows:

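class Array
  def difference(other)
    # Count how many times each element occurs in other.
    counts = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    # Reject an element while its count is positive, using up one count each time.
    reject { |e| counts[e] > 0 && counts[e] -= 1 }
  end
end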
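
The question asks for the two-sided difference, which can be sketched by applying the same counting idea in both directions. The version below is a standalone helper rather than a reopened Array, because Ruby 2.6 and later already ship an Array#difference with set semantics (the same result as Array#-). The name multiset_difference is an assumption made for this example and is not part of Ruby's core.

```ruby
# Hypothetical helper: removes one occurrence from arr for each
# matching element in other, keeping the order of arr.
def multiset_difference(arr, other)
  counts = Hash.new(0)
  other.each { |e| counts[e] += 1 }
  arr.reject do |e|
    if counts[e] > 0
      counts[e] -= 1
      true   # matched against an element of other, so drop it
    else
      false  # nothing left to match against, so keep it
    end
  end
end

a = [3, 1, 2, 3, 4, 3, 2, 2, 4]
b = [2, 3, 4, 4, 3, 4]

one_way   = multiset_difference(a, b)
other_way = multiset_difference(b, a)
symmetric = one_way + other_way

p one_way    # prints [1, 3, 2, 2]
p other_way  # prints [4]
p symmetric  # prints [1, 3, 2, 2, 4]
```

Each call makes one pass over each array, so the cost grows with the combined length of the two arrays, where the index/delete_at loop in the question rescans the copy for every element.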

Answer 2:


ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}

ha and hb are hashes that map each element of the original array to its number of occurrences. merge then combines them into a hash whose values are the absolute difference between the counts in the two arrays; a key that appears in only one of the hashes keeps its count, since the block runs only for keys present in both. inject converts that hash to an array in which each element is repeated the number of times given by its value.
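
As an illustration, here are the intermediate values for one pair of sample arrays (the sample data and the names merged, acc and result are chosen for this sketch). Note that the result is ordered by hash key insertion, so it comes out grouped by element rather than in the elements' original positions.

```ruby
a = [3, 1, 2, 3, 4, 3, 2, 2, 4]
b = [2, 3, 4, 4, 3, 4]

# Element => number of occurrences, for each array.
ha = a.group_by(&:itself).map { |k, v| [k, v.length] }.to_h
hb = b.group_by(&:itself).map { |k, v| [k, v.length] }.to_h
# ha == {3 => 3, 1 => 1, 2 => 3, 4 => 2}
# hb == {2 => 1, 3 => 2, 4 => 3}

# For keys in both hashes, keep the absolute difference of the counts;
# keys found in only one hash keep their count unchanged.
merged = ha.merge(hb) { |_, va, vb| (va - vb).abs }
# merged == {3 => 1, 1 => 1, 2 => 2, 4 => 1}

# Expand each key into that many copies.
result = merged.inject([]) { |acc, (k, v)| acc + [k] * v }
p result  # prints [3, 1, 2, 2, 4]
```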

Another way:

ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten