ICASA Standards Survey
Tabulated Results
Background
Promoting interchange of data and software tools in the
agricultural research community is a core ICASA activity.
Standard data descriptors and file formats greatly facilitate
such interchanges. Standards were proposed in 1999, drawing on
those used with DSSAT software. ICASA is revising the standards
and seeks to involve the agricultural community and software
engineers in developing standards that will meet the needs of
diverse user groups.
This informal survey was undertaken in 2004 to allow stakeholders
to provide input on the ICASA data standards. The survey was advertised
on three listservers, and a downloadable version was available on
the ICASA web site. Copies of the survey were also distributed at the
Eighth ICASA Open Forum, which was held at the ASA-CSSA-SSSA meetings
in Seattle, Washington in 2004.
Summary
Thirty-two people returned surveys. They represented a broad range of
academic fields, but no plant breeders, growers or farm advisors provided
feedback. Most respondents were from the public sector and used models on
a regular basis. Most of the respondents were also familiar with the existing
standards. Half of the respondents preferred to use "flat" ASCII files for
ICASA standards, while the other half preferred another format, such as Excel,
Access or XML. For archiving data, Excel spreadsheets were the most common choice.
Most people felt that the lack of data exchange was mainly due to the effort
required to properly organize and document experimental data. However, most
people agreed that data should be shared after publication. Surprisingly, most
of the respondents also stated that models and DSS should be platform independent.
Most of the respondents also had to write software to convert data among different
formats. Overall, the respondents seemed to be receptive to data standards,
although opinions on the preferred format of the standards were mixed.
Survey Results
| | Daily | Weekly | Monthly | Within a year | Almost never/never |
| How often do you use simulation models, decision support systems or similar tools? | 11 | 14 | 4 | 3 | 0 |
| How often do you work with data that are in DSSAT or ICASA standard formats? | 3 | 8 | 8 | 6 | 7 |
| How often do you use data that originate from outside your research group or institution? | 6 | 7 | 6 | 7 | 5 |
| How often do you use software that uses DSSAT or ICASA standard formats? | 5 | 6 | 3 | 11 | 7 |
What is your discipline/specialization?
| Agronomy | 10 |
| Breeding | - |
| Pathology | - |
| Entomology | - |
| Irrigation | 1 |
| Soil science | 4 |
| Social science | - |
| Software | 1 |
| Databases | 1 |
| Management | - |
| Grower | - |
| Farm advisor | - |
| Agrometeorology/Agroclimatology | 9 |
| Crop/Plant Physiology | 1 |
| Ag Engineering | 1 |
| Ag Engineering/Hydrology | 1 |
| Integrated systems | 1 |
| Plant sciences | 1 |
Who is your main employer?
| State | Federal | Private sector | Gov’t research inst. | CGIAR | University |
| 12 | 10 | 1 | 1 | 1 | 4 |
Formats for data in ICASA Standards
Which data format would you most likely use on a routine basis for modeling?
| "Flat" ASCII | MS Excel spreadsheet | MS Access database | Other database | XML | Comma sep. (CSV) |
| 15 | 8 | 2 | 0 | 5 | 1 |
Which data format would you most likely use for archiving data?
| "Flat" ASCII | MS Excel spreadsheet | MS Access database | Other database | XML | Comma sep. (CSV) | "Unspecified" | "Undecided" |
| 10 | 11 | 3 | 4 | 3 | 1 | 1 | 1 |
Which types of data most require data standards?
(Ranked from 1 = highest priority to 4 = lowest priority)
| | 1 | 2 | 3 | 4 |
| Crop management (cultivar, irrigation, fertilizer, tillage, etc.) as in DSSAT File-X | 6 | 7 | 6 | 5 |
| Weather data | 7 | 8 | 3 | 5 |
| Soil profile description | 2 | 6 | 11 | 5 |
| Field observations (typically used for model calibration or validation) | 9 | 4 | 3 | 8 |
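One possible way to condense the rank counts above into a single number per data type is a count-weighted mean rank. The sketch below is illustrative only: the counts are copied from the table, but the names `rank_counts` and `mean_rank` are hypothetical, and the survey itself did not report this statistic.

```python
# Number of respondents assigning ranks 1..4, copied from the table above.
rank_counts = {
    "Crop management": [6, 7, 6, 5],
    "Weather data": [7, 8, 3, 5],
    "Soil profile description": [2, 6, 11, 5],
    "Field observations": [9, 4, 3, 8],
}


def mean_rank(counts):
    """Return the count-weighted mean rank (1 = highest priority)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses recorded")
    return sum(rank * n for rank, n in enumerate(counts, start=1)) / total


mean_ranks = {name: mean_rank(counts) for name, counts in rank_counts.items()}

# List data types from lowest mean rank (highest priority) upward.
for name, value in sorted(mean_ranks.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f}")
```

A mean rank hides the shape of the distribution (field observations, for example, drew many votes at both extremes), so it is best read alongside the raw counts.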
What do you perceive as the major barriers to data exchange in the agricultural community?
(Ranked from 1 = highest priority to 5 = lowest priority)
| | 1 | 2 | 3 | 4 | 5 |
| The effort that is required to properly organize and document data. | 12 | 6 | 3 | 4 | 2 |
| Concerns that data may be misinterpreted, leading to scientific or decision support errors. | 3 | 4 | 8 | 6 | 6 |
| Concerns that data recipients will publish prior to the data providers. | 5 | 5 | 4 | 4 | 9 |
| Concerns that the data sources will not be correctly recognized/credited. | 4 | 7 | 4 | 8 | 8 |
| Lack of efficient formats, standards or tools for exchange. | 4 | 5 | 9 | 4 | 3 |
Expectations for Data and Software Exchange
| | Strongly disagree | Disagree | Neutral | Agree | Strongly agree |
| 1. Data standards aren’t very useful because software can easily be modified for different formats. | 6 | 18 | 1 | 4 | 1 |
| 2. After planned publications are completed, most research data should be freely shared. | 1 | 2 | 3 | 15 | 10 |
| 3. Short (4 to 5 character) variable names are valuable for compact representation of data. | 5 | 6 | 9 | 9 | 2 |
| 4. The data standards should permit flexible attachment of comments (e.g., on methods, quality or suspected problems) to most sections of data. | 0 | 2 | 1 | 17 | 11 |
| 5. Data to describe a single experiment should be packaged as a single file (e.g., of management, weather, soils, field observations, etc.) | 1 | 5 | 8 | 11 | 6 |
| 6. In using models or DSS, our research group often works with data using GIS or remote sensing tools. | 0 | 4 | 7 | 15 | 5 |
| 7. Models and DSS should be "platform independent" (run on multiple operating systems). | 0 | 0 | 8 | 10 | 13 |
| 8. Models and DSS should be licensed similar to commercial software to permit full cost recovery. | 2 | 11 | 11 | 5 | 0 |
| 9. In our research group, we often write software to convert data among different formats. | 0 | 3 | 6 | 17 | 5 |
| 10. Our research group has effective long-term data archiving procedures. | 2 | 11 | 4 | 9 | 4 |
Please list any alternate data or software standardization initiatives that you believe should be considered or reviewed as we revise the ICASA standards:
- CDIAC "20-year" standard
- How about examining the WMO CLImat COMputing Project (CLICOM)?
- Might investigate what ARS is doing through the CEAP program to ensure compatibility.
Please provide other suggestions on the revised ICASA standards:
- Use XML as the NATIVE ICASA format for data (convert to "flat file", not from)
- XML-based standardization of data format with metadata should be considered to enhance data exchange.
- Look at utilizing XML-based files to update and retrieve data from a centralized database as well as input and output files for models and DSS
- Consider "open" standard software (e.g., PostgreSQL, MySQL, HDF, NetCDF or XML) for data archiving.
- Standardize measured data file the same as the simulated data.
- In the weather input file, the annual average temperature (TAV) and amplitude (TAMP) are not needed. They can be calculated with a function embedded in the input subroutine.
- Flat files are easy to read but lack structure and may therefore be incomplete.
- Ensure that software is available that works with new standards.
- Communication networks between users will be a critical element for successful information sharing.
- It is important to emphasize that the standards should link with GIS to allow geospatial modeling.
- Evaluate archiving of data in a central database (SQL, DB2, Oracle or MS Access).
- Development of software and standards to handle transfer to and from a centralized database.
- Creation of web applications and web services for data transfer.
- Data standards for both ASCII and relational formats may be useful for real applications, which require large data sets. For example, the study of climate change.
- Items 4 & 5 in expectations seem to be a matter of metadata. I believe metadata catalogue will be a crucial factor for data and information sharing among diverse disciplines.
- On licensing (point 8): Licensing is a difficult issue. As far as I understand, DSSAT was formulated for practical application. I strongly agree with charging DSSAT users with license fees if those users gain economic benefit from using the model. However, I would disagree with charging license fees from researchers who would like to contribute in the further development of DSSAT. Well-documented source code should be made available for the latter group for speeding up development (under strict, but cost-free license terms).
- How about including some checking facilities for data quality control, especially for weather data?
- The formatted text files currently used by DSSAT are sometimes too restrictive. It often takes time to ensure that the spacing is correct, and when it is not, the model often still runs but gives an incorrect answer (for example if the soil moisture limits are spaced incorrectly).
- I think providing metadata on how things were measured (e.g., support of soil samples, methods used to measure soil limits, etc.) and on the context of the experiment files (does it represent typical farmer practices, what was the goal of the experiment, etc.) will [rest of text not legible in print-out].
- It seems that these standards are being written for one software, DSSAT. Perhaps your group should ensure users and developers of other software are involved in this effort. For instance, models frequently used by researchers and government agencies include Century, EPIC (and APEX), SWAT, AGNPS, etc.
- On point 3, (Short variable names are valuable …) - strongly disagree (modeler time is much more precious than disk space)
- On point 5, (Data to describe a single experiment should be packaged as a single file) - neutral (should be able to access the information related to a single experiment in a convenient form, but that could be via a Web service request, or database query, or could be a zipped package of files)
- On point 8, (Models and DSS should be licensed similar to commercial software …) - neutral - (depends on the business model chosen/funders' requirements. I would strongly agree with "should be able to be licensed" - which would exclude an open source license like GNU, but not the GNU LGPL)
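One respondent above suggested that the annual average temperature (TAV) and amplitude (TAMP) could be calculated from the weather data rather than stored in the weather input file. A minimal sketch of such a calculation follows. It assumes that TAV is the mean of the monthly mean temperatures and that TAMP is half the difference between the warmest and coldest monthly means; these definitions, the function name `tav_and_tamp` and the sample records are assumptions for illustration and are not taken from the survey or from DSSAT code.

```python
from collections import defaultdict


def tav_and_tamp(daily_records):
    """Estimate TAV and TAMP from (month, tmax, tmin) daily records.

    Assumed definitions (not from the survey): TAV is the mean of the
    monthly mean temperatures; TAMP is half the range between the
    warmest and coldest monthly means.
    """
    monthly = defaultdict(list)
    for month, tmax, tmin in daily_records:
        monthly[month].append((tmax + tmin) / 2.0)  # daily mean temperature
    if not monthly:
        raise ValueError("no daily records supplied")
    monthly_means = [sum(values) / len(values) for values in monthly.values()]
    tav = sum(monthly_means) / len(monthly_means)
    tamp = (max(monthly_means) - min(monthly_means)) / 2.0
    return tav, tamp


# Hypothetical data: two days each for January (month 1) and July (month 7).
records = [(1, 10.0, 0.0), (1, 12.0, 2.0), (7, 30.0, 20.0), (7, 32.0, 22.0)]
tav, tamp = tav_and_tamp(records)
print(f"TAV = {tav:.1f} C, TAMP = {tamp:.1f} C")
```

Computing the values on input removes two fields that can drift out of step with the daily data, at the cost of requiring a reasonably complete year of records.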
