Epita:Algo:Course:Sup-Info:Sets

From EPITA_Algo_Lectures
Jump to: navigation, search

Contents

The sets

In addition to checking if an element is in a set, we must be able to:

  • insert an element in a set,
  • delete an element from a set,
  • check if a set is empty,
  • etc.

As previously said, we accept the possibility of multiple occurrences of elements (bag). Some representations (to be mentioned later) representations will only be used by sets and others will also be valid for multi-sets. Let us start by giving the abstract algebraic type of the sets.

The set type

types
set
uses
element, boolean
operations
emptyset : \rightarrow set insert : element x set \rightarrow set delete : element x set \rightarrow set _ \in _ : element x set \rightarrow boolean


For operations insert and Delete, there are two possible behaviors, depending on whether we consider that the element to add is already present or that the element to remove is already absent in the set. This gives the two following possibilities:

  • We manage an error, implying that these two operations are partial and in this case require a precondition,
  • These two operations are then considered as having no effect.


In the following axioms we will consider the second possibility, that is to say, that the set stays the same after a non effect action.

axioms
x \in emptyset = false x = y \Rightarrow (x \in insert(y, e)) = true xy \Rightarrow (x \in insert(y, e)) = (x \in e) x = y \Rightarrow (x \in delete(y, e)) = false xy \Rightarrow (x \in delete(y, e)) = (x \in e)


We can add the operation that returns the number of elements in a set:

operations
size : set \rightarrow integer


The latter is defined anywhere, so we have the following axioms:

axioms
size(emptyset) = 0 x \in e = true \Rightarrow size(insert(x, e)) = size(e) x \in e = false \Rightarrow size(insert(x, e)) = size(e) + 1 x \in e = true \Rightarrow size(delete(x, e)) = size(e) - 1 x \in e = false \Rightarrow size(delete(x, e)) = size(e)


Remark: It would also be possible to add classical set operations such as union, intersection, difference and subset.

Enumeration of a set

In a list to apply the same processing to every element, we use a sequential parsing of the list. In a set, we prefer to choose an arbitrary element, applying the processing to it, deleting it from the set and starting again on the resulting set. So we add a choice operation to the signature of the set type, which gives:

operations
pop : set \rightarrow element


with the precondition and axiom as follows:

preconditions
pop(e) is-defined-iaoi e ≠ emptyset
axioms
pop(e) \in e = true


The algorithm processing all the elements of a given set could be the following:

algorithm procedure completeprocessing
Local parameters
   set e
Variables
   element  x
Begin
   While e <> emptyset do
      x \leftarrow pop(e)                               /* Choice of x in the set */
      process(x)                                 /* Applying of the process to the choosen element */
      e \leftarrow delete(x, e)                         /* deletion of x */
   end while
end algorithm procedure completeprocessing


Remark: We will see, later in this chapter, the different possible set representations.

Set with repeated elements (bags)

In these cases, the previous operations are preserved, but their behavior differs. Indeed, the addition of an existing element increases the number of elements of the set. Similarly, deleting an item does not mean that it disappears from the set (if there are several occurrences). We then add a new operation (nboccurrences) which allows us to know the number of occurrences of an element in the multiset.

That gives the following algebraic abstract type:

types
multiset
uses
element, boolean, integer
operations
emptymultiset : \rightarrow multiset insert : element x multiset \rightarrow multiset delete : element x multiset \rightarrow multiset _ \in _ : element x multiset \rightarrow boolean size : multiset \rightarrow integer nboccurrences : element x multiset \rightarrow integer
axioms
x \in emptymultiset = false x = y \Rightarrow (x \in insert(y, ms)) = true xy \Rightarrow (x \in insert(y, ms)) = (x \in ms) nboccurrences(x, ms) <= 1 \Rightarrow (x \in delete(x, ms)) = false xy \Rightarrow (x \in delete(y, ms)) = (x \in ms)
size(emptymultiset) = 0 size(insert(x, ms)) = size(ms) + 1 x \in ms = true \Rightarrow size(dlete(x, ms)) = size(ms) - 1 x \in ms = false \Rightarrow size(delete(x, ms)) = size(ms)
nboccurrences(x, emptymultiset) = 0 x = y \Rightarrow nboccurrences(x, insert(y, ms)) = nboccurrences(x, ms) + 1 xy \Rightarrow nboccurrences(x, insert(y, ms)) = nboccurrences(x, ms) nboccurrences(x, ms) = 0 \Rightarrow nboccurrences(x, delete(x, ms)) = 0 nboccurrences(x, ms) >= 1 \Rightarrow nboccurrences(x, delete(x, ms)) = nboccurrences(x, ms) - 1 xy \Rightarrow nboccurrences(x, delete(y, ms)) = nboccurrences(x, ms)


Remarks:

  • As for the sets, it would be equally possible to add to the multi-set classical set operations such as union, intersection and difference.
  • The enumeration of the elements of a multi-set is done using the same algorithm as for a set. With the deletion removing just one instance of the element, the number of elements corresponds to the total number of elements, all occurrences combined.


Set representations

There are several manners to represent the sets in memory. We will present two of them:

Boolean array implementation

If the possible number of elements N is finite, we can represent a set with an array of N Booleans. That would give the following declaration:

Constants
MaxNb = ...
Types
t_set = MaxNb boolean /* Definition of the boolean array */
Variables
t_set s

If the element i ∈ s then s[i] = true, false otherwise. the "belonging test" to a set is just done looking at the corresponding index, whether it is true or not. The insertion or deletion of an element i is done respectively fixing s[i] to true or false.

Remark:

  • These operations are easy and fast to implement because there is only one access to the datum and posibly one value assignment.
  • In contrast, the memory cost is important:
    • especially if N is large and if few elements of the set are present. The occupancy ratio is very low in this case.
    • The empty set is represented by an array in which all elements are false. s result that implies that the initialization of an empty set requires N value assignments.
    • Same thing for the complete enumeration of the set.


In this kind of representation, the correspondences between the abstract operations and the static implementation of the type (for a set s) are as follows:

Abstract type Boolean array representation
insert( x, s) s[x]=true
delete(x,s) s[x]=false
x ∈ s s[x]
emptyset for i=1 to N do
 : s[i]=false
endfor
size(s) c=0
for x=1 to N do
 : if s[x] then
 : : c=c+1
 : endif
endfor


Remark: This representation has to be modified for the multi-sets. Indeed, in this case the Boolean will be replaced by integers giving the number of element occurrences.

List implementation

When the number of possible elements is too large or when the set is infinite, but the sets that are handled are quite small, they can be represented as lists.

In this case a set of N elements is represented by an N elements list.

Remark: Insofar as the order of the elements does not matter, there are several possible lists.

  • The "belonging test" of an element x to a set s (the search) is done sequentially through the list comparing x to each element of the set s.


  • The insertion of an element x in a set s can be done in two ways:
    • insert the element x anywhere in the list (it depends on the chosen representation of the list) without redundancy test,
    • insert the element x anywhere in the list (same remark) with an existence test. It will then simplify the deletion and the search of the element.


  • The deletion of an element x from a set s can also be done in two ways depending on the choice done for the insertion:
    • In the first case, you browse the entire list to make sure you delete all occurrences of the element x.
    • In the second case, we can stop as soon as we found the element x.


  • The enumeration of a set is also done sequentially browsing the complete list.


In the case of a multi-set:

  • The "belonging test" of the element x to a set s is unchanged.
  • The insertion of an element x will naturally be done without the existence test of x.
  • The deletion of the element x stops at the first occurrence of x encountered.
  • For the enumeration too, the whole list has to be browsed. If the number of occurrences of the elements is great, it is necessary to use a couple list < x, nboccurrences(x)>. We then adapt the insertion, deletion and research at this new representation.
Static implementation

The table that we will use must be oversized. To know the current exact number of elements in the set, it is necessary to use a position indicator, for the last element of the set, in the table(their number), which gives:

Constants
MaxNb = 20
Types
t_element = ... /* Definition of element type */ t_vectMaxNbelts = MaxNb t_element /* Definition of element array */ t_set = record /* Definition of type t_set */ t_vectMaxNbelts elts integer nb end record t_set
Variables
t_set s


Using this statement, the different algorithms of set operations are:

Belonging of x to the set s:

algorithm function belong: boolean
Local parameters
   element x
   set s
Variables
   element  i             
Begin
   i \leftarrow 1
   While (i <= s.nb) and (x <> s.elts[i]) do
      i \leftarrow i + 1
   endwhile     
   return (i <= s.nb)
end algorithm function belong

Remark: We can avoid the overflow test (i <= s.nb) positionning a sentinel (the searched element x) at the place s.nb+1. In this way the loop test covers only by the equality of element to x. If we leave the loop at index s.nb+1, this means that x does not belong to the set s.

In this case, the algorithm becomes:

algorithm function  belong: boolean
Local parameters
   element x
   set s
Variables
   element  i              
Begin
   i \leftarrow 1
   s.elts[s.nb+1] \leftarrow x
   while (x <> s.elts[i]) do
      i \leftarrow i + 1
   endwhile     
   return (i = s.nb + 1)
end algorithm function belong


Insertion of x in the set s with existence test:

algorithm procedure insert_with_test
Local parameters
   element x
Global parameters
   set s
Begin
   if not(belong(x,s)) then
      s.nb \leftarrow s.nb + 1
      s.elts[s.nb] \leftarrow x
   endif
end algorithm procedure insert_with_test


In this case, the deletion of x from the set s is simple and corresponds to the following algorithm:

algorithm procedure delete_with_test
Local parameters
   element x
Global parameters
   set s
Variables
   element  i             
Begin
   i \leftarrow 1
   while (i <= s.nb) and (x <> s.elts[i]) do
      i \leftarrow i + 1
   endwhile     
   if x = s.elts[i] then
      s.elts[i] \leftarrow s.elts[s.nb]
      s.nb \leftarrow s.nb - 1
   endif
end algorithm procedure delete_with_test

Remark: Again, x could be used as a sentinel.

Insertion of x in a set s without existence test:

algorithm procedure insert_without_test
Local parameters
   element x
Global parameters
   set s
Begin
   s.nb \leftarrow s.nb + 1
   s.elts[s.nb] \leftarrow x
end algorithm procedure insert_without_test


In this case, the deletion of x from the set s must traverse the whole list to delete all occurrences of x, which corresponds to the following algorithm:

algorithm procedure delete_without_test
Local parameters
   element x
Global parameters
   set s
Variables
   element  i             
Begin
   i \leftarrow 1
   while (i <= s.nb) do
      if x <> s.elts[i] then
         i \leftarrow i + 1
      else
         s.elts[i] \leftarrow s.elts[s.nb]
         s.nb \leftarrow s.nb - 1
      endif
   endwhile     
end algorithm procedure delete_without_test



General remarks:

  • The empty set is tested by s.nb = 0. It is sufficient to assign to s.nb the value 0 to represent the abstract operation emptyset.
  • In the case of multi-sets:
    • We can keep this structure, but we have to use insert_without_test and delete_with_test to manage the multiple occurrences.
    • We also can use a couple list < x, nboccurrences(x)> where each element appears several times. In this case, the insertion should either create the non existing couple or increase the occurrence number by 1 if the element already exists. Similarly for the deletion, it suffices to decrease the occurrence number by 1 and to really delete the element if this number reaches 0.


Dynamic implementation

It is sometimes difficult, even impossible, to determine the maximum number of elements of the set. In this case, a static representation (array) is not conceivable and the obvious solution is then the dynamic representation (linked-list). In this case the definition of the type set is as follows:

Types
t_element = ... /* Definition of type element */ t_pset = \uparrow t_set /* Definition of the pointer on record t_set*/ t_set = record /* Definition of type t_set */ t_element elt t_pset next end record t_set
Variables
t_pset ps


Using this statement, the different algorithms of set operations are:

Belonging of x to the set s:

algorithm function belong : boolean
Local parameters
   element x
   t_pset  s
Variables
   t_pset ps             
Begin
   ps \leftarrow s
   while (ps <> NUL) and (x <> ps\uparrow.elt) do
      ps \leftarrow ps\uparrow.next
   endwhile     
   return (ps <> NUL)
end algorithm function belong

Remark: With this representation, using a sentinel requires a direct link to the last element. Without this link it would be necessary to first parse the list of elements completely to place the sentinel. And the cost would obviously be too expensive.

Insertion of x in the set s with existence test:

algorithm procedure insert_with_test
Local parameters
   element x
Global parameters
   t_pset s
Variables
   t_pset ps             
Begin
   if not(belong(x,s)) then
      malloc(ps)
      ps\uparrow.elt \leftarrow x
      if s = NUL then              /* Particular case of an empty set */
         ps\uparrow.next \leftarrow NUL          /* No next */           
      else          
         ps\uparrow.next \leftarrow s            /* Creation of the element on the first place */
      endif
      s \leftarrow ps
   endif
end algorithm procedure insert_with_test

Remark: We could, in case of non existence of the element x, add it at the end of the list because its search leads us to the end of the list.


In this case, the deletion of x from the set s is simple and corresponds to the following algorithm:

algorithm procedure delete_with_test
Local parameters
   element x
Global parameters
   t_pset s
Variables
   t_pset ps, tps             
Begin
   ps \leftarrow s
   while (ps <> NUL) and (x <> ps\uparrow.elt) do
      tps \leftarrow ps                             /* Memorization of the previous one*/
      ps \leftarrow ps\uparrow.next
   endwhile     
   if x = ps\uparrow.elt then
      if ps <> s then                        /* Modification of the next link */ 
         tps\uparrow.next \leftarrow ps\uparrow.next     
      else                                   /* particular case of the first element */ 
         s \leftarrow s\uparrow.next     
      endif           
      free(ps)                               /* deletion of the element */        
   endif
end algorithm procedure delete_with_test


Insertion of x in a set s without existence test:

algorithme procedure insert_without_test
Local parameters
   element x
Global parameters
   t_pset s
Variables
   t_pset ps             
Begin
   malloc(ps)
   ps\uparrow.elt \leftarrow x
   if s = NUL then            /* Particular case of an empty set */
      ps\uparrow.next      \leftarrow NUL     /* No next */           
   else          
      ps\uparrow.next \leftarrow s              /* Creation of the element of the first place */
   endif
   s \leftarrow ps
end algorithm procedure insert_without_test



Again in this case, the deletion of x in the set s has to parse the list of elements completely to remove all occurrences of x, which gives the following algorithm:

algorithm procedure delete_without_test
Local parameters
   element x
Global parameters
   t_pset s
Variables
   t_pset ps, tps             
Begin
   ps \leftarrow s
   while (ps <> NUL) do
      if x = ps\uparrow.elt then
         if ps <> s then                           /* Modification of the link of succession */ 
            tps\uparrow.next \leftarrow ps\uparrow.next     
         else                                      /* particular case of the first element */ 
            s \leftarrow s\uparrow.next     
         endif           
         free(ps)                                  /* deletion of the element */        
      endif
         tps \leftarrow ps                                 /* Memorization of the previous one */
         ps \leftarrow ps\uparrow.next
   end while     
end algorithm procedure delete_without_test


Conclusion

In all cases considered there is always at least one operation whose complexity is at worst and on average O(N) where N is either the number of present elements or the number of possible elements. This means that beyond a hundred elements, these methods are no longer possible.

Algorithmic's Course:
Algorithms :