Appendix A. A Brief Theory of Bags

Set theory has long been useful in mathematics and computer science. Bag theory is a natural extension of set theory. A bag, like a set, is a collection of elements over some domain. However, unlike a set, bags allow multiple occurrences of elements. In set theory, an element is either a member of a set or not a member of a set. In bag theory, an element may be in a bag zero times (not in the bag) or one time, two times, three times, or any specified number of times.

Bag theory has been developed in [Cerf et al. 1971] and [Peterson 1976].

As examples, consider the following bags over the domain { a, b, c, d }:

B₁	=	{ a, b, c }
B₂	=	{ a }
B₃	=	{ a, b, c, c }
B₄	=	{ a, a, a }
B₅	=	{ b, c, b, c }
B₆	=	{ c, c, b, b }
B₇	=	{ a, a, a, a, a, b, b, c, d, d, d, d, d, d, d }

Some bags are sets ( B₁ and B₂, for example). Also, as with sets, the order of the elements is not important, so B₅ and B₆ are the same bag. (Ordered bags are sequences.)

In set theory, the basic concept is the is a member of relationship. This relationship is between elements and sets and defines which elements are members of which sets. The basic concept of bag theory is the number of occurrences function. This function defines the number of occurrences of an element in a bag. For an element x and a bag B, we denote the number of occurrences of x in B by # ( x, B) (pronounced “the number of x in B ”).

With this conceptual basis, we can define the fundamentals of bag theory. Most of the concepts and notation are borrowed from set theory in the obvious way. If we restrict the number of elements in a bag such that 0 ≤ # ( x, B) ≤ 1, then set theory results.

A.1. Membership

The # ( x, B) function defines the number of occurrences of an element x in a bag B. From this it follows that # ( x, B) ≥ 0 for all x and B. We distinguish the zero and the nonzero cases. An element x is a member of a bag B if # ( x, B) > 0 . This is denoted x ∈ B. Similarly, if # ( x, B) = 0, then x ∉ B.

We define the empty bag ∅ with no members: For all x, # ( x, ∅ ) = 0 .

A.2. Cardinality

The cardinality | B | of a bag B is the total number of occurrences of elements in the bag:

\| B \|	=		Σ	#(x, B)
			x

A.3. Bag Inclusion and Equality

A bag A is a subbag of a bag B (denoted A ⊆ B ) if every element of A is also an element of B at least as many times:

A ⊆ B iff #(x, A) ≤ #(x, B) for all x

Two bags are equal ( A = B ) if # ( x, A) = # ( x, B) for all x.

From these definitions, we can immediately show that

A	=	B iff A ⊆ B and B ⊆ A
∅ ⊆ B for all bags B
A	=	B implies \| A \|	=	\| B \|
A ⊆ B implies \| A \| ≤ \| B \|

A bag A is strictly contained in a bag B ( A ⊂ B ) if A ⊆ B and A ≠ B. Note that # ( x, A) < # ( x, B) does not follow from A ⊂ B, although we do have that | A | < | B | .

A.4. Operations

Four operations are defined on bags. For two bags A and B, we define

Bag union	A ∪ B	#(x, A ∪ B)	=	max(#(x, A), #(x, B))
Bag intersection	A ∩ B	#(x, A ∩ B)	=	min(#(x, A), #(x, B))
Bag sum	A + B	#(x, A + B)	=	#(x, A) + #(x, B)
Bag difference	A - B	#(x, A - B)	=	#(x, A) - #(x, A ∩ B)

These operators have most of the properties that would be expected. Union, intersection, and sum are commutative and associative. In addition, the expected inclusions hold:

A ∩ B ⊆	A ⊆ A ∪ B
A − B ⊆ A ⊆ A + B

The distinction between union and sum is clearly stated by

\| A ∪ B \| ≤	\| A \| + \| B \|
\| A + B \|	=	\| A \| + \| B \|

Unfortunately no such simple statement distinguishes A ∩ B from A − B. The latter is complicated by the impossibility of removing elements from a bag which are not there.

A.5. Bag Spaces

We define a domain D as a set of elements from which bags are constructed. The bag space Dⁿ is the set of all bags whose elements are in D such that no element occurs more than n times. That is, for all B ∈ Dⁿ,

x ∈ B implies x ∈ D.
# ( x, B) ≤ n for all x.

The set D^∞ is the set of all bags over a domain D; there is no limit on the number of occurrences of an element in a bag.

A.6. Parikh Mappings

For finite domain D = { d₁, d₂, …, d_n }, there is a natural correspondence between each bag B over D and the n -vector f = ( f₁, f₂, …, f_n ) defined by

f_i

#(d_i, B)

This vector is known as the Parikh mapping [Parikh 1966].

A.7. Examples

Let D = { a, b, c, d } be a domain. Then for the following bags,

A	=	{ a, b }
B	=	{ a, a, b, c }
C	=	{ a, a, a, c, c }

we have

\| A \|	=	2
\| C \|	=	5
A ∪ B	=	{ a, a, b, c }	=	A
A ∪ C	=	{ a, a, a, b, c, c }	=	B ∪ C
A ∩ C	=	{ a }
B ∩ C	=	{ a, a, c }
A + B	=	{ a, a, a, b, b, c }
A − B	=	∅
C − A	=	{ a, a, c, c }
C − B	=	{ a, c }

Home

Comments?

\| A ∪ B \| ≤	\| A \| + \| B \|
\| A + B \|	=	\| A \| + \| B \|