Archaeological metadata is dynamic for a variety of reasons. First and foremost, it is accretional knowledge accumulated over a lengthy period of time by different individuals at different stages of a project. Much of the most important information generated about specimens is not necessarily known or recorded at the time of an object's field recovery. Additional information is added during laboratory processing and during analysis, such as catalog numbers, type designations, measurements, weights, drawings, and images. Individual specimens may also be subjected to additional specialized testing, such as radiocarbon dating, carbon isotope analysis, residue analysis, etc. Some archaeological data relate only to specific objects (one-to-one) and some relate to multiple objects (one-to-many relationships).
Secondly, specimen metadata includes relationships that are apparent at both the time of discovery and in subsequent stages. At the initial recovery, an object's provenience is documented as well as its context and associations, but this information can easily be subject to change as the object's relationships are further recognized or modified. A unique trait of archaeological collections is that relationships exist between objects and other objects, but also between objects and physical non-object spaces (e.g. features, stratigraphic layers, and empty spaces) that are equally important. As more objects or information are added to the collection, these relationships may change in their content, significance, or interpretation.
Finally, the unique nature of archaeological investigation brings about great unpredictability in what information will be recovered and when it will be recovered, and seemingly minor details can later become highly significant in the interpretation of objects, features, sites, and cultures. Archaeology is about discovering the unknown, which includes false starts, unexpected twists and turns, revisiting old collections, and constant reevaluation of what we think we know across diverse scales of inquiry. Some archaeological projects take place intermittently over years and decades with shifting personnel and methodologies, the introduction of new technologies, and ever-changing interpretations.
A landform containing prehistoric archaeological sites is being evaluated in advance of a highway construction project. An archaeological investigation will be used to evaluate the landform for cultural resources that may be adversely affected by construction. A local site grid is set up and a surface survey for artifacts is conducted to ascertain whether or not further investigation is justified. Clusters of artifacts are found in several locations and small excavation units are established for further investigation. Since this is an agricultural field, the first 25 centimeters below the surface, commonly designated as the "plowzone," are expected to have been disturbed by past plowing activities. Artifacts are likely to be found in the plowzone, but plowing normally eliminates any features. Surface or plowzone artifacts are consequently assumed to be indicative of potential subsurface features and additional artifacts, but are not ideal since they lack true context and their associations are unreliable. For example, it is common for artifacts of different time periods to be mixed together in the plowzone. Although some shallow sites have no subsurface features, the field crew hopes to find preserved subsurface cultural materials beneath the overlying plowzone. A young field assistant is directed to hand-excavate (with shovel and trowel) a two by two meter unit (designated as "25N 100E") to a depth of 25 centimeters below the surface.
At 25 cm, the field assistant levels off the unit to a flat floor, produces a map, and photographs the unit at this level. A single ceramic sherd from the rim of an earthenware pot is conveniently found at this level. Its provenience is recorded simply as 25N 100E, 25 cm (its depth below an arbitrary known reference point on the surface). No other artifacts or cultural materials are found with it (i.e. there are no obvious associations with other objects at that depth) and no features are seen, although our field assistant notes that he had a difficult time getting a good photograph because the soil kept drying out quickly in the hot weather. Some charcoal flecks are noticed near the sherd and in a few other places within the unit. The soil properties (color and texture) and charcoal flecks are recorded on the map. The charcoal flecks are too small to be collected, and there is no specific reason to collect them. Our field assistant removes and bags the sherd with its associated provenience data recorded on the paper bag. The field specimen bag is sent to the lab at the end of the day, the map and photograph are filed, and the crew heads home for the weekend. It appears that provenience, context, and associations have been recorded, but have they? Consider the following realistic scenario.
On Monday morning, our field assistant is sick and a different, more experienced field assistant is assigned to continue excavating down to the next level: 30 centimeters below the surface. The dark soil color and warm weather made it hard to see much of anything at 25 centimeters, but the soil matrix is getting lighter as the field assistant removes soil. It is now clear to our experienced worker that there are subtle cultural features visible at the 30 centimeter level (a difference of only 5 cm or approximately 2 inches). The exact spot where the sherd was located is now identifiable as a likely postmold (a stain where an ancient post was placed and subsequently decayed in place). When the photographs are later reviewed in the lab, it becomes clear that the postmold, though subtle, was already faintly visible in the 25 cm photograph even though it was not noticed at the time. The charcoal flecks (remnants of the ancient post) continue in this spot as the field assistant trowels down to the next level and the soil texture there is softer than in the surrounding soil matrix. It is now clear, well after the time of initial discovery, that the sherd was in a postmold. The postmold is given the designation F2016.18. The sherd now has additional significant provenience information ("F2016.18") which does not negate or replace the original provenience information (25N 100E, 25 cm), but needs to be appended to it.
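The idea that later provenience is appended rather than substituted can be sketched as an append-only record. This is a minimal illustration under stated assumptions, not a description of any existing catalog system: the class names, the field vocabulary ("unit", "depth_cm", "feature"), the specimen number FS-001, the dates, and the recorder names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ProvenienceEntry:
    kind: str          # hypothetical vocabulary: "unit", "depth_cm", "feature"
    value: str
    recorded_on: date  # when the information became known
    recorded_by: str   # who recognized it


@dataclass
class Specimen:
    specimen_id: str
    description: str
    provenience: list = field(default_factory=list)

    def append_provenience(self, entry: ProvenienceEntry) -> None:
        """Add a new entry; earlier entries are never edited or removed."""
        self.provenience.append(entry)

    def current(self) -> dict:
        """Latest value for each kind of provenience (later entries win)."""
        return {e.kind: e.value for e in self.provenience}


# Hypothetical specimen number and dates, for illustration only.
sherd = Specimen("FS-001", "earthenware rimsherd")
sherd.append_provenience(ProvenienceEntry("unit", "25N 100E", date(2016, 6, 3), "assistant A"))
sherd.append_provenience(ProvenienceEntry("depth_cm", "25", date(2016, 6, 3), "assistant A"))

# Recognized on Monday, after the bag has already gone to the lab.
sherd.append_provenience(ProvenienceEntry("feature", "F2016.18", date(2016, 6, 6), "assistant B"))

print(sherd.current())
```

Because every entry keeps its own date and recorder, the record shows both what is known about the sherd and when each piece of information was recognized, and the original unit and depth are preserved alongside the feature designation.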
In this common scenario, there are several potential pragmatic limitations that may prevent the postmold data from being effectively appended to the sherd's data (and potentially the 25 cm map as well). First, the experienced field assistant may not necessarily be aware that her colleague recovered a sherd from the previous level. This may be because the experienced worker was not present at the time of recovery, or has simply forgotten about it in the intervening days, or perhaps doesn't think to go back to the previous level's data and update it because there are dozens of other sherds found every day. Second and more importantly, the additional provenience information cannot be immediately added to the field specimen bag containing the sherd because the bag is no longer in the field. It is in the lab where the sherd is being washed, inventoried, and placed into a storage container. The most likely outcome is that the additional contextual information (its location within a feature) never gets added to the sherd's record because the information was not available when the sherd was bagged. In addition, at the time of discovery it was not possible to predict that the additional information would be available at a later date.
If provenience did not matter, there would be little distinction between archaeology and looting. This distinction is exactly why previously discussed antiquarian collections have limited research potential beyond their individual physical attributes. Recording provenience, context, and association is a key prerequisite for archaeological research and these relationships are arguably the most important aspect of most, if not all, artifacts. On its own individual merits, the sherd's functional and stylistic properties tell archaeologists relatively little that is new. Context and association provide the essential information necessary to build larger arguments about human behavior that go beyond the characteristics of individual artifacts. Smoking guns (meaning individual points of data of very high significance) are relatively rare in science. Few artifacts are individually very informative in most archaeological investigations, but the temporal and spatial patterns in which they occur are some of the most meaningful information recovered. Those patterns cannot be recognized or articulated with inconsistent or incomplete metadata.
Not necessarily, because in most archaeological practice, not all artifacts are equally useful for analysis. Some artifacts will (and should) receive more attention than others. For example:
In this example, the sherd in question fits all three of these criteria. Its distinction as a rimsherd alone would immediately be noticeable as likely to yield more information than other artifacts. There are other reasons why an artifact might be designated a special find, only some of which might be apparent at the time of discovery. The problem is that we often don't know and can't predict which artifacts will ultimately turn out to be special finds of high significance because of the gradual accretional process of how information is recovered, encountered, and generated.
[Readers unfamiliar with the unique nature of research potential in archaeological collections can further familiarize themselves here.]
The excavation temporarily shuts down during the winter season. The crew is reassigned to process artifacts and to other higher-priority projects. Since no further field work can be undertaken during the winter, most of the field crew is not retained and most move on to other opportunities.
As spring weather arrives, the supervisor returns to the site with a new crew for the next stage of the project. Given that the original test unit was productive, he directs additional units to be opened around the original unit. Our supervisor knows that a single postmold does not necessarily indicate architecture, but is hoping to find evidence of domestic homes at the site.
More postmolds and more artifacts are found. It is now clear that the postmold is one of many that represent the continuous wall of a house. Other postmolds that do not clearly belong to this house are also found in the vicinity and may be part of this house, other adjacent houses, or no house at all. Additional provenience information (the house number "Structure 2016.3") is now available for our sherd excavated last year, but no one has any reason to remember the humble sherd that we started with or any of those recovered since, all of which have long since been catalogued and placed in temporary storage to await analysis. As new sherds are recovered, those associated with the house are assigned provenience information accordingly. Due to the sequence in which data was recovered, the full and most useful provenience (its association with the postmold and with the house) of the original sherd was not initially clear at the time of discovery and was only evident months later after more excavation was completed.
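One way to picture why the house number never reaches the original sherd is to contrast copying context onto each specimen with storing each relationship once and resolving it when asked. The sketch below takes the second approach. It is an illustration only: the dictionaries, the function name, and the specimen number FS-001 are hypothetical, and only the designations F2016.18 and Structure 2016.3 come from the scenario.

```python
from typing import Optional

# Each relationship is recorded once, at whatever stage it becomes known.
specimen_feature = {"FS-001": "F2016.18"}  # sherd -> postmold (hypothetical specimen number)
feature_structure = {}                     # postmold -> house, not yet known


def resolve_context(specimen_id: str) -> dict:
    """Follow specimen -> feature -> structure using whatever is known today."""
    feature: Optional[str] = specimen_feature.get(specimen_id)
    structure: Optional[str] = feature_structure.get(feature) if feature else None
    return {"feature": feature, "structure": structure}


# First season: the postmold is known, the house is not.
before = resolve_context("FS-001")

# Second season: the postmold is recognized as part of a house wall.
# Only the feature's record changes; no one has to remember the sherd.
feature_structure["F2016.18"] = "Structure 2016.3"
after = resolve_context("FS-001")

print(before)
print(after)
```

In this arrangement the sherd inherits the house number through the postmold, so sherds already in storage pick up the new association without being individually revisited.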
Significant omissions in metadata happen countless times every day to finds of high and low significance. Limiting our scenario to a single sherd makes it easier to provide an example for discussion, but it also misleadingly implies that this is only an occasional problem that affects few objects at one stage of the process. It is a widespread problem of data management, which will become increasingly difficult to handle as the project continues.
Months pass while this site (and many others nearby) is evaluated to determine which sites are significant enough to justify further work. The site is slated for destruction by the new interstate. A new crew returns to the site. They re-establish datum points on the original grid system and are directed to mitigate what remains of the site prior to demolition. This crew has more funding, but less time to work. The decision is made to use mechanical stripping to remove the plowzone, record intact features, and excavate a sample of those features. The new crew does not have time to excavate and map individual units. Although they use the same arbitrary mapping grid, they do not set up physical stakes or pins for mapping. Instead, they use a total station to record the 3D coordinates of the features and individual artifacts they recover from the area of the house. They record more postmolds, more ceramic sherds, and many other kinds of artifacts.
Our original sherd has a unit and level provenience, and the other hand-excavated sherds have a unit, level, feature, and house number. In contrast, the sherds from the mitigation have provenience information only in the form of 3D coordinates, with no other provenience information reflecting their membership in the house or individual units. There is now a problem of metadata that is partially incompatible with the data collected from the earlier stages of the project. This occurred because field methodologies are also dynamic and change as different resources or requirements are introduced throughout the course of a project.
The same arbitrary grid was used to collect both types of provenience information, so it is possible to at least overlay these different spatial datasets to make a composite map. In the lab, maps are made by overlaying the coordinate data with excavation maps, but all of this three-dimensional spatial information is confined within the mapping software. There is no mechanism for its incorporation into other applications, such as those being used to catalog/inventory the specimens, which contributes to metadata fragmentation.
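Because both datasets share one grid, a total-station coordinate can in principle be given a unit designation outside the mapping software. The following is a rough sketch under several assumptions that the scenario does not state: that a unit label such as "25N 100E" names the unit's southwest corner, that every unit is two meters square, and that the extra units and the sample coordinates shown here exist. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Unit:
    north: float       # northing of the (assumed) southwest corner, in meters
    east: float        # easting of the (assumed) southwest corner, in meters
    size: float = 2.0  # assumed two by two meter units

    @property
    def label(self) -> str:
        return f"{self.north:g}N {self.east:g}E"

    def contains(self, northing: float, easting: float) -> bool:
        """True if the point falls inside this unit's footprint."""
        return (self.north <= northing < self.north + self.size
                and self.east <= easting < self.east + self.size)


def unit_for_point(units, northing: float, easting: float) -> Optional[str]:
    """Return the label of the hand-excavated unit containing the point, if any."""
    for unit in units:
        if unit.contains(northing, easting):
            return unit.label
    return None


# The original unit plus two hypothetical neighbors opened the following spring.
units = [Unit(25, 100), Unit(25, 102), Unit(27, 100)]

print(unit_for_point(units, 26.4, 101.1))  # invented total-station coordinates
print(unit_for_point(units, 40.0, 90.0))   # a point outside every hand-excavated unit
```

A lookup of this kind would let a cataloging application carry a unit designation alongside the 3D coordinates, so that the two kinds of provenience can be compared in one place.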
During the mitigation, many postmolds are located. A sample of postmolds is excavated and small amounts of wood charcoal are recovered from their fill. Some samples are kept for botanical analysis (for wood species identification) and others are sent for radiocarbon dating. The fieldwork is completed and the site is destroyed by construction. Due to the speed of the mechanical stripping and the use of the total station, the mitigation went quickly. As the project nears completion, the investigation unravels in light of new findings. After the radiocarbon date results are received months later, it becomes clear that the postmolds represent adjacent houses from completely different time periods whose artifacts will need to be differentiated in the lab. Good provenience information is the only way to tell which artifacts go with which house/time period prior to their analysis.
It is not a problem limited to field recovery, but it begins there. It is ultimately a problem related to the curation of specimens and metadata, which begins in the field, continues to the lab, persists in storage, and is further exacerbated each time the collection is inventoried or re-analyzed. Long after an artifact is recovered and inventoried, additional data becomes incrementally available (e.g. radiocarbon dates) and new relationships (e.g. refits; groupings) are recognized.
More months pass. When it comes time to perform the ceramic analysis (and analysis of other artifact types), the uneven manner in which provenience information was discovered and recorded makes this a cumbersome task. Given the constraints of time and funding, the ceramic analyst tries to choose the lesser of evils:
The analyst arbitrarily selects what she believes to be a representative sample of the most significant specimens, which are rimsherds and other decorated sherds with motifs. She is conscientious, reviewing and compiling the provenience data for the selected sherds into a spreadsheet. In the process, she is able to fill in some of the obvious missing provenience metadata. The remainder of her spreadsheet includes measurements on each sherd such as weight, thickness, dimensions, and other information generated by the analyst. She also creates additional new information about each artifact in other formats, such as photographs and digital illustrations of rim profiles. All of this information is compiled into her own spreadsheets and files, none of which is likely to be used to populate the catalog, especially since the catalog was completed in the lab months ago.
For her sample, she locates rimsherds from different postmolds across the site that are similar to each other. Some of these rimsherds originate from the same pots and can be physically rejoined or "refitted." Refits are a type of relationship that can be highly significant in the interpretation of behavior at a site. Refits can be used to make many different kinds of arguments, which differ based on the type of artifact. For ceramics, refits are commonly used to increase the sample size of a pot, which allows for a more accurate reconstruction of the shape, size, and form of the complete vessel. In the context of this site, refits will also be used to demonstrate that certain features are contemporary with each other, aiding the primary investigator in interpreting this multi-component site. Refits could be denoted in several possible ways. The analyst produces a spreadsheet listing refits between individual ceramic specimens. The primary investigator then uses this list to interpret which features are contemporary with each other, ultimately producing a table and map visually denoting these links.
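The step from a list of refits between specimens to a list of linked features can be sketched as a small graph problem. The example below is illustrative only: the specimen and feature identifiers are invented, and treating refit links as transitive (if A links to B and B links to C, then all three are grouped) is an interpretive assumption that an investigator might or might not accept.

```python
from collections import defaultdict

# Hypothetical data: which feature each specimen came from, and which specimens refit.
specimen_feature = {"S1": "F1", "S2": "F2", "S3": "F2", "S4": "F3", "S5": "F4"}
refits = [("S1", "S2"), ("S3", "S4")]


def linked_feature_groups(refits, specimen_feature):
    """Group features that are connected, directly or indirectly, by refits."""
    graph = defaultdict(set)
    for a, b in refits:
        fa, fb = specimen_feature[a], specimen_feature[b]
        if fa != fb:  # refits within one feature do not link features
            graph[fa].add(fb)
            graph[fb].add(fa)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk over every feature reachable from this one.
        stack, group = [start], set()
        while stack:
            feature = stack.pop()
            if feature in group:
                continue
            group.add(feature)
            stack.extend(graph[feature] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


print(linked_feature_groups(refits, specimen_feature))
```

Features with no cross-feature refits (F4 here) simply do not appear in any group, which is different from evidence that they are not contemporary.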
The analyst reviews the remainder of the ceramic assemblage, which consists of body sherds that fall into several categories based on the type of temper used to construct the pot in prehistory. For these non-special sherds, she seeks only to produce a count and a weight for each temper type. In order to do so, it is necessary to further subdivide most of the existing catalog groupings into subgroupings based on temper type. These subgroupings will need to be differentiated from each other with some form of new inventory number. In addition, their original inventory numbers will have to somehow be negated or annotated to avoid the erroneous appearance of the sherds suddenly doubling in number within the catalog inventory.
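The double-counting risk described here can be made concrete with a small sketch in which a subdivided grouping is annotated rather than deleted. Everything in it is hypothetical: the inventory number 2016.45, the dotted sub-numbering scheme, the temper names, and the counts are invented for illustration and do not reflect any particular cataloging standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lot:
    number: str
    count: int
    temper: Optional[str] = None
    parent: Optional[str] = None  # inventory number this lot was split from
    superseded: bool = False      # True once the lot has been subdivided


def subdivide(catalog: dict, number: str, counts_by_temper: dict) -> None:
    """Split one lot into sub-lots by temper and annotate the original."""
    parent = catalog[number]
    if sum(counts_by_temper.values()) != parent.count:
        raise ValueError("sub-lot counts must add up to the parent count")
    for i, (temper, count) in enumerate(sorted(counts_by_temper.items()), start=1):
        child = Lot(f"{number}.{i}", count, temper=temper, parent=number)
        catalog[child.number] = child
    parent.superseded = True  # kept for the record, excluded from totals


def total_count(catalog: dict) -> int:
    """Count sherds once, skipping lots that have been subdivided."""
    return sum(lot.count for lot in catalog.values() if not lot.superseded)


catalog = {"2016.45": Lot("2016.45", 30)}
subdivide(catalog, "2016.45", {"shell": 18, "grit": 12})

print(sorted(catalog))
print(total_count(catalog))
```

Keeping the original number in the catalog, marked as superseded and referenced by each sub-lot, preserves the link back to the field bag while keeping the total at 30 rather than 60.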
In the process of sorting the body sherds, the analyst finds additional rims and decorated sherds mixed in with the body sherds. These were missed by the non-specialists who processed the collection in the lab many months ago. These sherds should be separated and placed with the other special finds. New inventory numbers will need to be generated, the count will need to be adjusted within the existing inventory, and new storage containers and storage locations will need to be assigned. Lacking time or a processing procedure, the analyst is unsure what to do with these special finds.
Although our hypothetical team of professionals will presumably manage to complete the excavation, conduct the analysis, and produce the necessary reports, the types of messy obstacles described here will not be resolved at the completion of the project. Objects, documents, and files will be placed in storage, but the organization of the collection and its associated data will be very difficult for future researchers to understand or reverse.
This is a hypothetical scenario that may not directly resemble the process of any one organization or agency nor the wide divergence in the types of sites and material culture that American archaeologists investigate. The types of problems identified here (which are by no means comprehensive) should hopefully be familiar enough to most archaeologists, especially those who are charged with the curation of archaeological collections.
For non-archaeologists, it is most important to understand that the data associated with artifacts are not just another kind of metadata that is recorded once or can only be assigned in one acceptable way. The provenience, context, and associations of an artifact are recorded at the time of discovery, but they are dynamic and subject to change or be appended as new objects, information, or relationships are added. It is common for changes and additions to occur throughout the initial recovery as well as long after the completion of an excavation.
A large part of the underlying problem is that we often do not know (and cannot expect to know) all of the potentially relevant information about an object at the time of its initial discovery (e.g. the relationship of the sherd to a feature; the relationship of that feature to other features). Nor can we know in advance what information might later prove to be informative (e.g. the sherd's attributes provide information about the feature/site; the sherd refits help resolve some of the ambiguity in dating; the radiocarbon dates provide an independent set of information about dating that relates to the sherd), or what other information will result from analysis or an improved understanding of context.