Setting the Tone: Open Data in Agriculture
Speakers: Mars Benzle and Jane Wyngaard
https://datascience.nih.gov/CaseStudy
Overview
- Advantages/Disadvantages of Open Data
- Lots of good reasons to keep it open
- Still lots of pushback; why?
- #DiscussionNeeded
- Open Source Plant Breeding
- 4 different entities:
- Commercial
- Farmers
- Research
- Government
After a brief overview, GOAT attendees broke into groups according to those 4 entities to discuss gains and barriers.
Open Data discussion in regard to farmers
“Open data” in this context is focused on three types of farm data:
- Environmental: soil, water, air, wind
- Production: yield, planting dates, harvest dates, selected varieties, seed sources
- Market: number of orders, buyers, sellers, intermediaries; price data for inputs and outputs; supply vs. demand
Ways to acquire data
- Manual input (census)
- Automated, via tools a farmer is already using that give a direct benefit
Opportunities
- Simplify regulatory compliance
- Improved production practices: best/worst performing varieties, cultural practices, equipment configurations
- Price transparency: cost of inputs and supplies, how to price products
- Market demand: what does my market want? what is trending in other markets?
- Gamify benchmarks: creating engaging/delighting user experiences to increase farmer participation
- Leverage my farm’s data: have good reports of my farm’s success to get grants/funding
Barriers
- Expectation that I should be paid for my data; reluctance to share
- How to be compensated for sharing my data?
- Mistrust of who will use my data and closed systems that want to consume open data
- Lack of interoperability
- Farms and operators come and go, so more difficult to deal with transiency
- Data accuracy
- Threat of losing proprietary advantage/competition
- Cumbersome to do manual data input, need automated ways to extract and share data from existing tools that farmers use
Prompt: What are the barriers and gains of participation and investment in open source agriculture data?
Government perspective:
Gains:
• Save money on surveying and surveys.
• Taxpayers can better see the benefits of the research being done.
• Tools to see and benefit from data.
• There could be better management of researchers or better communication between researchers. Researchers will have a better idea of what other researchers with similar goals are doing.
• More viable to have more democratic data collection.
• More possibility of community governance and participation. One example is Data.gov.
• People saw the value of data collected in something like the NASA program.
• Like SBIR?
• Jobs?
• Market data can stabilize prices, create tighter and better supply chains, and distribute food systems more evenly; AMIS, for example, or the anonymized avocado price data project.
• Environmental data
• A more distributed system could be a gain but it could also be a problem.
• More Economic data for financing people who are better suited to be funded.
• People love free
• People don’t want to put in the work to collect their own data.
• More redundancy to understand wider trends.
Barriers
• People like farmers don’t trust the government which hinders what programs the government can implement.
• Taxpayers don’t want more money to go to the government.
• No unified chain of communication for how data will be gathered, managed, and distributed.
• There need to be restrictions to protect individual privacy.
• Governance of how data is maintained and shared.
• Discomfort and problems with unchecked government power.
• Misuse of data; who drives what is collected in the first place?
• Which data sets are aggregated and used.
• What outcomes are preferable?
• How do we ensure all data is voluntarily submitted?
• Data can “vanish” if political values change between administrations.
• Does the government own the data it shares?
• Misrepresentation of data through stats.
• Lack of government interest in advocating for open source.
• Are there regulatory barriers that already exist?
• Commercial spinoff?
• National security concerns of foreign governments tracking water or Natural resources.
• Foreign powers looking at our food system or food insecurities and using that against us.
• The commercial agriculture lobby.
• What is the economic value of OS data?
• Management of a huge quantity of data (technical storage)
• Capacity: data management could put a strain on human and economic resources.
• The use of electricity and power in data storage and management.
• Patents could result in positive commercial development or negative monopolies if commercial entities use the OS data to develop non-competitive markets.
• Broadband is not up to snuff in many areas.
• How do you verify if the data is good? Could the system succumb to data trolls or data terrorists or anyone who benefits from manipulating the data points?
• Is such technology accessible to people? How do we get them access?
• Cultural acceptance of OS: some governments want control for their own reasons.
• Perception that free means cheap or not good.
• If it is free, how do you get technical support in how to use it?
• How will we ensure its persistence through time (soft money)?
• Lack of a shared technical language, shared units, shared natural language, vocabulary. How do people interface?
Other notes from the group:
There are two issues with government and open source data: the government’s willingness to contribute and participate, and the people’s willingness to allow the government to collect large amounts of data on individuals and share it.
Research perspective:
Opportunities:
• Collaboration around data
• Share data with farmers
• Creating communication.
• Creative commons but for data sharing
• Education and training on research data
• Not reinventing wheel to test machine learning
• Data citation opportunity
• Many farmers means many data points
• Can better calibrate and validate models
Challenges:
• Data Quality (How good is it? Cal/Val). People want to make sure it is right before publishing
• Disincentives in academia because of wanting to publish papers
• Producer data-commodity
o Geocoding
o Privacy
• Data in form that farmers can use.
• Assumptions about data
o Researchers need to share it in a meaningful way
• How do researchers get farmers to collect data?
• Competition for grants
• Documentation
o Metadata
o Education self documenting
Where:
• GitHub
• Library at university
• USDA
• Repository
Questions:
How are technologies changing the knowledge process?
How do we build trust between stakeholders?
Other notes from the group:
Social vs environmental data: the same challenges and opportunities apply. You could leverage more connection and collaboration with shared interests.
Main points:
- Incentive in academia
a. Papers not data
b. Grants incentivize competition
- Invite more people/disciplines in and collaborate
- Providing input and advancing all stakeholders’ goals (eg stakeholders from different disciplines)
- Collaboration and funding opportunities.
- Data sharing with commercial interest
a. Licenses: open source licenses can create constraints
b. Co-ops with data are not eager to share with and credit researchers, but perhaps some arrangement is possible
- Learn from companies and research that already deal with privacy problems such as health and genetic data.
a. Similar challenges: quality, formats, privacy, metadata
- Mutually beneficial data sharing
a. Better data driven regulations
b. Policy
c. Benefits to all stakeholders
Commercial perspective:
Pros
- Cost savings: data can be reused instead of recaptured, and it should lead to efficiency/method gains
- The supply chain already likes aggregate data
- Distributed data storage decreases the cost of storage
- Data can prove fault/innocence of various parties in relation to liability cases
- Could offer new avenues of revenue eg:
- Remote sensing services
- Carbon trading etc
- Enables integration of sensors/actuators from different companies (eg Nest with John Deere say)
- Enables the creation of new data products and new automated analyses
- Could serve to train AIs
- Will effect - slowly - a broader cultural change towards data but also technologies, sharing practices, ….
- Expanding data is a new domain rather than something that is replacing an old domain
Cons
- Open knowledge about, say, land yield could lead to the devaluation of your property
- Insurance companies may become even more heavy handed in requiring certain practices be adopted in order to be covered
- There’s a question of data quality control and possible consequent liability issues
- There are many walled gardens in which companies are currently operating successfully and see no need to change their model
- Open data has yet to be presented in a manner that actually shows any value add to a farmer
Challenges
- Language barriers (tech speaking to Ag and vice versa)
- Cultural barriers
- Patents could be an issue, eg patents being issued for broad methods (say, a patent for using smart soil water sensors to feed back into an irrigation control system) - hasn’t happened yet but could
- Trademarking (trust) is and will be very important in this sector, even for open source tools/data
- Who is going to bear the cost of storing data?
- Approaches are needed that sell open data as a means of generating value, not as a cost reduction exercise
- EASY to implement data sharing levels are needed eg: high res data is captured and shared with the farmer say but only regional aggregates and metadata are ever made open
- Having “open is default” practices in tools will facilitate increasing the amount of data that is open but again - MUST be EASY to use/select range of sharing levels
- Many drivers in the current model of Ag are unhelpful. Eg Subsidies are driven/triggered/proportional to volume
- OpenData really needs to be rebranded/better communicated/the value add shown more clearly if it’s to see any traction within the Agricultural sector. Possible rebranding could include, say, using terms such as “knowledge base”.
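
The sharing-levels idea in the Challenges list (high res data stays with the farmer, only regional aggregates are made open) could be sketched roughly as below. This is only an illustration, not something the group specified: the record fields, the function names, and the `min_farms` threshold (withholding regions with too few farms) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical high-resolution records; field names are illustrative only.
RECORDS = [
    {"farm_id": "farm-a", "region": "north", "yield_t_ha": 4.0},
    {"farm_id": "farm-b", "region": "north", "yield_t_ha": 6.0},
    {"farm_id": "farm-c", "region": "north", "yield_t_ha": 5.0},
    {"farm_id": "farm-d", "region": "south", "yield_t_ha": 7.0},
]


def farmer_view(records, farm_id):
    """Private level: a farmer sees their own records at full resolution."""
    return [r for r in records if r["farm_id"] == farm_id]


def open_view(records, min_farms=3):
    """Open level: regional aggregates only, with no farm identifiers.

    min_farms is an assumed safeguard: regions with fewer contributing
    farms are withheld so that a single farm cannot be singled out.
    """
    by_region = defaultdict(list)
    for r in records:
        by_region[r["region"]].append(r)

    summary = {}
    for region, rows in by_region.items():
        farms = {r["farm_id"] for r in rows}
        if len(farms) < min_farms:
            continue  # too few farms to publish without exposing one of them
        summary[region] = {
            "farms": len(farms),
            "mean_yield_t_ha": mean(r["yield_t_ha"] for r in rows),
        }
    return summary
```

With the sample records, `open_view(RECORDS)` publishes only the "north" region, while `farmer_view(RECORDS, "farm-d")` still returns that farm's full record. A tool could make the open level the default and let the farmer pick a different level, in line with the “open is default” point above.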