Standards
General
The basic organization of the data is defined for experiments that may include multiple sites and years as well as various crop and weed species, managements, and initial conditions at one site in one year. Various subsets of site, species, managements and initial conditions data are referenced in a central subset of data items identified as “Treatments.” Additional subsets of descriptive data and results are linked to “Treatments” through level indicators that are defined in the treatment subset. These linked subsets describe the genotypes, the fields, crop management and other features of the experiment (Fig. 1). Data from experiments are recorded in two separate subsets, one dealing with measurements or observations made at one or a few times during the course of the experiment, the other with those made at intervals throughout the experiment. Field data typically include crop developmental stages, yield and yield components, and growth analysis data such as leaf area index (LAI), stem, leaf, aboveground and grain biomass, but they can include measurements of soil water content, soil nutrient levels, pest damage or any other variables deemed relevant. Weather and general descriptions of soil profiles are managed separately since a single set of data may apply to multiple experiments.
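The linkage between “Treatments” and the other subsets can be pictured with a small sketch. It is only an illustration, not the ICASA file layout: the field names (`cultivar_level`, `irrigation_level`), the function `resolve_treatment`, and all values are hypothetical, apart from the cultivar identifier IB0488, which is borrowed from the examples in this section.

```python
# Hypothetical in-memory picture of a treatment subset and two linked subsets.
# All field names and values are assumptions made for illustration.
treatments = [
    {"treatment": 1, "cultivar_level": 1, "irrigation_level": 1},
    {"treatment": 2, "cultivar_level": 1, "irrigation_level": 2},
]

# Linked subsets, keyed by the level indicator used in the treatment subset.
cultivars = {1: {"cultivar_id": "IB0488"}}
irrigations = {
    1: {"description": "hypothetical regime A"},
    2: {"description": "hypothetical regime B"},
}


def resolve_treatment(treatment, cultivars, irrigations):
    """Follow the level indicators of one treatment to the linked subsets."""
    return {
        "treatment": treatment["treatment"],
        "cultivar": cultivars[treatment["cultivar_level"]],
        "irrigation": irrigations[treatment["irrigation_level"]],
    }


resolved = resolve_treatment(treatments[1], cultivars, irrigations)
print(resolved["irrigation"]["description"])
```

Both treatments share cultivar level 1, so the cultivar description is stored once and referenced twice, which is the purpose of the level indicators.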
Data items
The basic unit is a data item that contains one or more values, which may be numeric, identifiers, codes, or descriptive text, plus a name. The names are character strings with no distinction between upper and lower case. Information in a data item can be either:
- Variables - information pertinent to the experiment/situation documented.
  Examples:
  2070 | Value in kg ha-1 for stem dry weight on a given date.
  1982 | The year an experiment was planted.
  3.2 | The number of seed per pod from a soybean treatment.
  -5.1 | The minimum temperature in °C on a given day.
  BAT 477 | The name of a common bean cultivar.
  UFGA8701 | The name of a weather data set.
  Early planting | The name of a specific planting date treatment.
- Level indicators - character strings or numbers that link with data reported elsewhere.
  Examples:
  7 | Treatment number 7 in an experiment.
  6 | Irrigation regime number 6 in an experiment.
  IB0488 | A cultivar identifier that links to a table of cultivars.
  KSAS0401WH | An experiment identifier.
Variables can be numbers (decimal or integer), character strings, or text. Variables associated with a single name must be of the same data type. Units for numeric variables largely follow the International System of Units (SI), but “cm” and “ha” (hectare) are permitted in order to conform to dominant practices in agricultural research, e.g., by the American Society of Agronomy (Anon., 1998). Times of events such as planting, fertilization, or anthesis are recorded using year (four- or two-digit, depending on context) and day of year. Further examples of variables are presented in Table 1, and the full set of standard variables is listed at the ICASA website.
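As an illustration of the year and day-of-year convention, the following sketch converts between calendar dates and (year, day of year) pairs. It assumes four-digit years only; the function names are hypothetical and are not part of the standards.

```python
from datetime import date, timedelta


def to_year_doy(d):
    """Return (four-digit year, day of year) for a calendar date."""
    return d.year, d.timetuple().tm_yday


def from_year_doy(year, doy):
    """Return the calendar date for a four-digit year and a day of year."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


# 1 March in a non-leap year is day 60 (31 days in January + 28 in February + 1).
print(to_year_doy(date(2006, 3, 1)))
print(from_year_doy(2006, 60))
```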
Codes are provided for non-numeric variables where some degree of standardization is convenient or required, such as for describing fertilizer types, irrigation methods, or planting methods. Examples are presented in Table 2, and the complete list of codes is provided at the ICASA website.
Data Sets and Subsets
Data items are organized into a hierarchy with two levels: sets and subsets. Sets are the highest level of aggregation. They allow connected but not necessarily related data to be kept together. Three types are currently recognized (Table 3):
- Experiments. A description of treatments, initial conditions and field measurements for a single experiment, which could be multi-location or multi-season.
- Weather. Daily weather data from one or more recording stations.
- Soils. Collections of soil information, usually from a single geographic region or data source.
Data subsets consist of closely related data items stemming from one set of measurements or field operations, one soil profile characterization, or one weather station. For experiments, most subsets correspond to specific management activities, e.g., planting or irrigation, or to field measurements (Table 3).
Names and Identifiers - General
Experience from managing data from large numbers of experiments has demonstrated the need for datasets and subsets to be identified in a consistent manner. Furthermore, a compact name is valuable for manipulating data electronically, e.g., in spreadsheets or statistical packages. Thus, a consistent and compact naming system has been defined.
Nonetheless, it is recognized that some flexibility is necessary in naming datasets and subsets, to accommodate both user preferences and established local practice. When a user introduces an identifier that deviates from the standard convention, however, it is important to ensure that such identifiers are used consistently, especially in linking weather or soil data to experiment descriptions.
Names and Identifiers - Datasets
Datasets are identified by one of three names, EXPERIMENT, SOIL or WEATHER (Table 3), associated with a specific identifier constructed to provide information on the contents of the set. These identifiers are constructed differently for experiments, soils, and weather. For experiments, the identifier is constructed by combining:
- An institute or region code (two characters, e.g., “UF” for “University of Florida”, “CA” for “Canada”),
- A code for the site or set of sites (two characters, e.g., “GA” for “Gainesville”),
- A year code (two characters representing the year in which the experiment was initiated, the year in which it was finally harvested, or another year of significance to the principal investigator or coordinating institute),
- An experiment number or code (two characters), and
- A crop, multi-crop (for mixed cropping or crops with weed populations) or sequence (for rotation experiments) code (two characters).
Thus, the third experiment (03) conducted by the University of Florida (UF) at Gainesville (GA) in 2006 (06) with soybeans (SB) would yield a specific dataset identifier of UFGA0603SB.
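The construction of an experiment identifier can be sketched as follows. This is a minimal illustration of the five two-character fields described above; the function names and the dictionary keys are hypothetical, and no check is made that a code is actually registered with ICASA.

```python
FIELD_NAMES = ("institute", "site", "year", "experiment", "crop")


def build_experiment_id(institute, site, year, experiment, crop):
    """Join the five two-character fields into a ten-character identifier."""
    fields = [institute, site, f"{year % 100:02d}", experiment, crop]
    for name, value in zip(FIELD_NAMES, fields):
        if len(value) != 2:
            raise ValueError(f"{name} code must be two characters: {value!r}")
    return "".join(fields)


def parse_experiment_id(identifier):
    """Split a ten-character identifier back into its five fields."""
    if len(identifier) != 10:
        raise ValueError(f"expected ten characters: {identifier!r}")
    return {
        name: identifier[2 * i:2 * i + 2]
        for i, name in enumerate(FIELD_NAMES)
    }


print(build_experiment_id("UF", "GA", 2006, "03", "SB"))
print(parse_experiment_id("KSAS0401WH"))
```

The year is kept as a two-character string when parsing, because the text leaves open which year of the experiment it represents and does not state the century.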
For weather, dataset identifiers can be constructed from Institute and Site codes plus, if desired, four digits to indicate the starting year (e.g., UFGA2006). Optionally, a twelve-character code may be used, where the first two additional characters indicate the number of years of data and the last two characters can be used to identify other characteristics of the set. Thus, “UFGA196825R1” might indicate a 25-year series from Gainesville, Florida that started in 1968 and that used a method “R1” for estimating daily solar radiation.
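A weather dataset identifier can be decomposed along the same lines. The sketch below assumes that only the three lengths implied by the text occur (four, eight, or twelve characters); the function name and the dictionary keys are hypothetical.

```python
def parse_weather_id(identifier):
    """Split a weather dataset identifier of 4, 8, or 12 characters."""
    if len(identifier) not in (4, 8, 12):
        raise ValueError(f"unexpected identifier length: {identifier!r}")
    fields = {"institute": identifier[0:2], "site": identifier[2:4]}
    if len(identifier) >= 8:
        # Four digits giving the starting year of the series.
        fields["start_year"] = int(identifier[4:8])
    if len(identifier) == 12:
        # Two characters for the number of years, two for other characteristics.
        fields["n_years"] = int(identifier[8:10])
        fields["other"] = identifier[10:12]
    return fields


print(parse_weather_id("UFGA2006"))
print(parse_weather_id("UFGA196825R1"))
```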
For soil data, specific set identifiers can be constructed using a two-character code for the institute or region, plus a two-character code for the site or collection of sites. Alternatively, a longer name (but staying within an eight-character limit) can be used to provide significant information on the contents of the set. Thus, “ARIZONA” could be used as a general name for a dataset containing soil profile descriptions from diverse sites in Arizona.
Names and Identifiers - Subsets
For experiments, in which each set is restricted to data from one experiment, general names (Table 3) provide an unambiguous identification for each subset, e.g., TREATMENTS, PLANTING. Weather and soil files, however, may contain information from different weather stations or soil profiles, so the general name alone may not guarantee unique identification. Thus, a specific subset identifier may need to be appended to the general name to provide a unique reference for the data items. For example, for weather subsets that contain only a part of an overall dataset, e.g., a single year or portion of a year, the subset should be identified with the general name plus some specific information, as shown below:
WEATHER_STATION:UFGA2004S1
Here the ending “S1” might indicate “Season 1”.
For soil subsets, the specific identifiers are ten characters long, with the Institute and Site codes in the leading four positions, the year the profile was described in the field in the next four positions, and a specific profile identifier in the remaining two positions, e.g., UFGA198501. A soil subset identifier thus may have the appearance shown below:
SOIL_PROFILE:UFGA200401
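A subset reference of this form can be split into its general name and the parts of the specific identifier. The sketch below assumes the colon separator seen in the examples above and the ten-character soil layout just described; the function name and the dictionary keys are hypothetical.

```python
def parse_soil_subset(reference):
    """Split a reference such as 'SOIL_PROFILE:UFGA200401' into its parts."""
    general_name, separator, specific = reference.partition(":")
    if not separator or len(specific) != 10:
        raise ValueError(f"expected NAME:<ten characters>, got {reference!r}")
    return {
        "general_name": general_name,
        "institute": specific[0:2],
        "site": specific[2:4],
        "year": int(specific[4:8]),   # year the profile was described
        "profile": specific[8:10],    # profile identifier within that year
    }


print(parse_soil_subset("SOIL_PROFILE:UFGA200401"))
```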
Names and Identifiers - Variables
Many variable names in DSSAT were limited to four or five characters to permit displaying a name as a label over a column of data that contained no more than five digits. Recognizing, however, that the expansion of the standards to other electronic formats reduces the need to limit the size of variable names, the ICASA standards now include two name formats, the long variable name and the abbreviated name (Table 1). The long name is generally 12 to 24 characters long and uses complete words as much as possible. Words are separated by the underscore (“_”) character. The abbreviated name usually corresponds to the previous DSSAT name, although some variables have been re-named to correct inconsistencies. Abbreviated names for most data are no longer than five characters to permit their use as compact column headings. Allowing five spaces for numeric data permits displaying at least four significant figures, a level of precision greater than achieved in most agronomic, weather and soil measurements.
To facilitate interpretation of the abbreviated names, a major effort has been made to use a consistent naming strategy. Thus, for observed data relating to plant tissue masses or nutrient content, the first character indicates the tissue type, e.g., “L” for leaf. The second character describes the quantity being measured, e.g., “W” for dry weight or “N” for nitrogen. The third and sometimes the fourth characters give the measurement reference, e.g., “A” for area or “PC” for percentage. The final character indicates the time or frequency of measurement or observation, e.g., “D” for time series data referred to specific sampling or observation dates or “H” for data recorded at harvest.
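The positional scheme can be illustrated with a small decoder. It covers only the example codes quoted in this section, not the full ICASA code lists, and the names passed to it are assembled from those codes for illustration rather than checked against the list of standard variables. The function name and the lookup tables are hypothetical.

```python
# Only the codes quoted in the text; the full lists are not reproduced here.
TISSUE = {"L": "leaf"}
QUANTITY = {"W": "dry weight", "N": "nitrogen"}
REFERENCE = {"A": "area", "PC": "percentage"}
TIMING = {"D": "time series", "H": "harvest"}


def decode_abbreviated_name(name):
    """Decode a four- or five-character abbreviated name by position."""
    name = name.upper()  # names make no distinction between upper and lower case
    if not 4 <= len(name) <= 5:
        raise ValueError(f"expected four or five characters: {name!r}")
    return {
        "tissue": TISSUE.get(name[0], "unknown"),
        "quantity": QUANTITY.get(name[1], "unknown"),
        "reference": REFERENCE.get(name[2:-1], "unknown"),  # one or two characters
        "timing": TIMING.get(name[-1], "unknown"),
    }


print(decode_abbreviated_name("LWAD"))
print(decode_abbreviated_name("lnpch"))
```

Codes outside the small tables are reported as "unknown" rather than guessed.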