LTER Data Management Workshop - 7/28/95

A. Agenda Revisions
  1. need to add structure for next year's meeting
  2. working groups
    a) may need to have more standing committees
    b) may come out in discussion of internal structure
  3. want reports actually generated and printed out NOW!
    a) early draft important
  4. changing of report format - to something more like minutes?
    a) don't want to lose product!
    b) known for productivity - need to keep that up
    c) many hands make light work!
  5. edit site capabilities list
B. Strategic Plan - Stafford
  1. evolution - meeting in Corvallis 6 weeks ago
    a) rising expectations
      1) internal
      2) external
        a> NSF, ESA, scientific community
      3) fork in road
    b) we need to be in charge of our own destiny!
      1) we are closer to the problem
    c) really need to evolve vision plan
  2. strategic plans
    a) many points hit on in Site Bytes...
    b) DM work well as team - reinforcement and support
    c) rapid advances in technology
    d) science - technology
    e) users
      1) scientists
        a> site
        b> network
        c> outside
      2) natural resource managers
      3) legislators
      4) need better accessibility - not just to technophiles
    f) major challenge - Double foci
      1) can no longer look just at site level
      2) also need focus on network level
    g) major expectations from NSF and community and general
      1) when ecological data comes up, "what is LTER DM doing?" is frequent question
  3. guiding principles
    a) ongoing process
    b) scientific research uses in mind
    c) evaluate in light of specific research uses
    d) ease of access
    e) how to find existing data sets
    f) tangible results and publications
      1) important measure
    g) synthesis volumes
      1) could consider
      2) could track how different technologies were infused into system
      3) don't lack for ideas - lack time!
  4. 
discussion
    a) have tried to define for ourselves what is acceptable DM at site level
    b) curriculum development - outreach
    c) user input
      1) may be blending between DM and users
      2) would be good to have joint meeting with CC
    d) Hayden - LTERs create social compact based on scientific mission
      1) DM has taken over as central glue of social compact
      2) difference between PI and DM is vanishing
      3) system gives back more than any individual puts in
        a> a bargain!
    e) want feedback from PIs on strategic plan
      1) we're not doing it in a vacuum
      2) will need validation from larger PI community
      3) we need to be clear on the theme
    f) LTER DM as research platform for ecological DM
      1) diversity is necessary
      2) often perceived as monolithic but it can't be if it is to serve as a research platform
      3) really extra core area
    g) community requires some technologies that have not been invented yet
      1) early adopters, experimenters
      2) need to actively engage computer science community to actually build what we need
    h) NASA archive center
      1) ground data
      2) have real problems managing old NASA ways
      3) lots of funding
      4) perception LTER has it all figured out....
    i) good feedbacks between individual sites
      1) but diversity makes it harder to do things at network level
      2) need to look further down the line at site level at network goals
    j) some things are short term, but we should never be apologetic that some things will NOT have immediate benefits - archival role is also critical
      1) well documented data is critical to synthesis
    k) NSF may move to including datasets in results of prior support
    l) by end of meeting want it polished
      1) but don't want to put out too soon
      2) assess status relative to release at end of meeting
  5. charge to working groups
    a) read statement
    b) recommended changes
      1) not wordsmithing
      2) overlooked topics
C. Working Groups
  1. 
Group I notes
    a) mission statement needed
      1) to facilitate scientific advancement through the maintenance and accessibility of LTER data at both the site and network level
    b) implementation
      1) helpful to think about in formulating vision
      2) tangible results and publications
        a> normal refereed journals mean articles are out of date
        b> could start own electronic journal
        c> Ecological Bulletin might be a forum
    c) lack of technical supplements
      1) could we do grant as group
      2) how would it be crafted to keep sites cohesive
    d) request for biodiversity information
      1) concern that data structures are evolving in that field - makes it hard to respond to requests
      2) is it a ripe area for network standardization?
    e) group overall liked document
    f) discussion
      1) want to avoid knee-jerk reaction to specific crises - need longer view
        a> how do we anticipate requests
      2) cross site proposals by info. managers
      3) would like some PI input to avoid Us vs Them document
        a> take to Exec. Committee and PIs as whole
  2. Group II notes
    a) general discussion
      1) why is more important than how - looks fantastic
      2) extremely important statement - need to make sure it is Right!
    b) Introduction
      1) JH- need explicit goal statement right up front
      2) JP- research component put in introduction
      3) JH- dynamic between ECOLOGICAL science and INFORMATION technology
      4) MT- don't want to get too specific
        a> social scientists
      5) GC- there is an E in the middle of LTER
    c) Vision
      1) JH- disagree that conversion from data to information is possible
        a> definitions are not right
        b> can't waffle on that
        c> definition of "Information"
          1> data are raw facts, quantified measurement
          2> information is more qualitative
            a: more text
        d> is species list information based on data from plots? 
          1> JH- need to preserve raw DATA
            a: right now I am mining data FROM information
        e> discussion
          1> metadata is information
          2> need to worry about both DATA and INFORMATION
          3> metadata+data allows you to go to INFORMATION
          4> JH- substitute INFORMATION for METADATA
            a: is metadata a buzz word we want to use
          5> JH- Dataset - don't know what that really means
            a: type of data
            b: data file
          6> in document use both Data Management and Information management, often interchangeably
          7> JH- I am using "holdings"
            a: Information is more inclusive category - as placeholder
          8> RL- could sneak in definition of information and metadata in intro.
          9> JH- should choose word and stick with it
      2) change extensible to proactive
        a> how about portable
        b> JP- need to evolve
    d) guiding principles
      1) RL- concern that scientists have individual research agendas
        a> in saying specific research uses are we opening ourselves up to having to customize
      2) GC- balance between network and site
        a> sometimes best site tools are not best network tools
        b> more detailed than this document
      3) JH- are doing ecological research using information technologies but we have never developed information technologies
        a> should not just conceive of ourselves as users
        b> requires at least an order of magnitude
        c> could be separate bullet
      4) JH- finding data is only one component
        a> address long term continuity of data
        b> JP- bullet on archival
          1> JH- goes beyond archival, need to make usable now and future
      5) JP- I prefer improve query and information systems
      6) JH- need to evaluate the way science is done
        a> science drives the system
      7) MT- questions in second paragraph are confusing - don't want questions in vision statement
  3. 
Group 3
    a) disliked term "strategic plan" - preferred "strategic initiatives"
      1) have existing system
      2) call for 3 strategic initiatives
        a> ensure info system development
          1> provide testbed for research
          2> implement/research/improve
        b> data is an integral part of the LTER research platform
          1> site/local
          2> network
          3> international
          4> discussion - gets to scale issue
        c> develop human resources necessary to ensure proper and continued development
          1> develop training
            a: information systems
            b: applications
    b) two things we do:
      1) research on how to display information etc.
      2) also facilitating the science
      3) feedback loop!
    c) need to remove data manager from information access loop
      1) move more from DM to IM to PI
    d) roles of researchers and data managers for legacy data
      1) whose job is it to make sure data is submitted
    e) bullet framework might be better
  4. Group 4
    a) key points that need to be emphasized
      1) long term investment that may not show immediate results
      2) network-wide focus on information management solutions
        a> will require sites to relinquish some autonomy
    b) under ease of access
      1) explicit that information system should be accessible, platform independent
    c) stimulate and initiate development of tools
      1) in collaboration with SDSC perhaps...
    d) stressed importance of idealized look at what information management system should do
      1) needs to be framed as attractive vision
      2) could aid in getting funding
      3) broad terms
        a> integrated system
        b> hierarchical access
          1> central point - what types of data does LTER have
    e) possibility of putting together proposal to get funding to implement goals
      1) valid cross-site proposal
  5. discussion
    a) carrot vs stick models for inducing sites to participate in network activities
      1) every site needs to give up something to have information system
      2) sites need to get something back for that
      3) need to WIN battles, not LEGISLATE them
        a> consensus takes time
  6. 
Group 5
    a) bullets tend to draw attention, but are not necessarily most important thing
    b) critical aspects
      1) site science
      2) archival quality of data
        a> different from immediate availability
      3) network goals
      4) serving as information interface
      5) information technology use and development
    c) vision statement
      1) potential users
        a> it is recognized that data management is interface to or for...
        b> Interface - proactive stance
    d) guiding principles
      1) accepting system development as an ongoing process
      2) questions need to be put as affirmative statements
    e) footnote
      1) stands out as negative thing...
    f) feedback from PIs
      1) did at PAL - got positive response
      2) recommend passing around at site
      3) needs to go to both lead PIs and all other PIs
  7. will be working group assigned to assimilate suggestions from working groups - heads of all working groups
D. Additional Topics
  1. Data Requests Description
    a) requests for data drove need for document
    b) request considerations
      1) context
        a> who
        b> terms defined
        c> scientific question
        d> specific needs
          1> examples
        e> time frame
          1> spatial
      2) end product
      3) feedback
        a> summary table helps
    c) answering questions is iterative process
    d) recent requests
      1) electronic communication survey - Stuart Gagnon
      2) PI site survey - Hayden
        a> took iterative survey with summary table showing only data
      3) climate data - Greenland/Rosentrater
      4) species list - Waide
      5) accession numbers - Bledsoe
      6) FLED metadata - Pake
      7) metadata - Veen
    e) frustration with surveys that don't do their own homework
      1) often request specialized processing
      2) do need to have means for feedback
      3) expectations.... need to wrestle with them
  2. 
Executive Committee Perspective - Hayden
    a) site reviews - what should the site review panel expect in the area of data management
      1) site and proposal reviews
      2) currently look at system using Net
        a> but lead time was extremely small
      3) NSF PO has discretion on charge given to review committee
      4) plan
        a> establish document specifying expectations of site
          1> Level 1 - ......
            a: needs to be defined
          2> perhaps higher levels - burgeoning technologies
            a: parts may eventually be part of level 1
        b> expectations change -- so document needs frequent updating
      5) eventually would like for other areas of LTER endeavour
      6) charge by executive committee for development of review criteria
        a> need by some time next summer
        b> with bi-annual updates
      7) discussion
        a> creation of corporate history
        b> seriously hold people to standards
        c> three levels suggested
          1> level 1 is essential capabilities
          2> level 2 enhanced....
          3> should NOT get specific
        d> problem with generic document is judging sites at different ages
        e> this is our voice at table
          1> work
          2> but good opportunity to recapture voice at table
    b) new sites - LMER may become LTER
      1) how do we bring them up to speed quickly
        a> did not have long-term mission previously
          1> lack infrastructure
        b> what is appropriate time to get up to speed?
        c> will be holding workshop to discuss possible transition
          1> will try to have DM at meeting
            a: would like volunteers and designated representative
          2> no time-frame YET -- could be as soon as the Fall at annual LMER meeting
      2) how would a brand-new site come up to speed
      3) discussion
        a> reminds me of management by objectives faculty review
          1> essentially ranks
        b> but don't want second-class citizens! Need to bring new sites up first
        c> no caste system!!!! 
        d> possibly standing committee - new sites
    c) publication
      1) meeting 4-5pm August 2 to talk to publishers at ESA
        a> publishers will be putting forward proposals
          1> cost
          2> marketing
        b> proposals to support publication series
      2) general PI agreement that there should be a site synthesis series by single publisher
        a> not iron clad
        b> strong encouragement
      3) site synthesis is one possible series
        a> goes beyond individual articles
      4) also, site benchmark system
        a> both inside and outside LTER
        b> pull together all available information
      5) also - Network syntheses - cross site
        a> with help of Synthesis Center
      6) technical publications
        a> DM have a lot to offer here
        b> e.g., Albuquerque meeting
        c> also new instrumentation etc.
        d> methods book
      7) interest in electronic publication
      8) having publishers online will help with getting funding for workshops etc.
    d) discussion
      1) what do you see as needs for information management systems?
        a> need to make more transparent across sites
        b> ways to foster intersite research
        c> should be part of future initiatives for data management
        d> how do you integrate data and information from cross-site proposals?
E. Information System Introduction and Overview - Hastings
  1. Personal (but generic) vision of what might be meant by LTER Network Data and Information Management System
    a) Data & Information Management
      1) why distinction between Data and Information?
      2) took a while to come to terms with them
      3) Data - quantitative, precise, numerical
      4) Information - qualitative, largely textual and graphical envelope in which that data exists
        a> both envelops the data and comes after it
        b> Information can be cooked out of data
        c> some Information is very precise
          1> e.g. 
GPS locations used in calibration are precise
        d> metadata includes both data and information
    b) System
      1) organized, methodical, reproducible structure and process for managing a flow
        a> in this case flow is data and information
      2) example: Cliff Lodge Hotel is system
        a> process and structure go together to meet guest needs
      3) example: Bed and Breakfast
        a> also a system
        b> simpler
        c> homey - less industrial
        d> can't multiply people by 200 and make a Cliff Lodge
    c) Library
      1) captures good deal of what we mean when we say LTER data and information management system
      2) place
      3) also a system for managing data and information
        a> usually focused on printed matter
        b> also maps, seeds, rocks - not necessarily printed
          1> there for information content - not display
      4) exchange facility for meaning of things
      5) organized in standard ways
        a> useful terminology
          1> holding - whatever it is you have in your library - physical materials of library
            a: generally accessible, but not always - rare books etc.
            b: use technology - e.g., movable shelves
          2> catalog - now mostly electronic
            a: reduces cost in installation and maintenance
            b: couldn't keep up with new publications - volume
            c: benefits
              1: only file once
              2: remote access
            d: includes a dozen or so attributes or items - access points
              1: e.g., author, title, subject, keywords
                A. rules for what is an author
              2: entry actually printed in holding
            e: how do you deal with
              1: editions
              2: series
                A. periodicals
                B. aperiodic
              3: multiple copies
              4: access restrictions
                A. holds
                B. late book fines
        b> DON'T have inventory
          1> don't really know precisely where book X is
            a: only generally know where it is
            b: use indexed sequential access method - manually! 
        c> have faculty
          1> dean of libraries is very important position
          2> professionals
          3> student volunteer staff
        d> organized by departments
          1> acquisitions - process new holdings
          2> circulation
          3> research
            a: seeks new sources of materials
          4> reference
            a: helps people find things
        e> chronically underfunded, overcrowded, short-staffed - periodic upgrades
        f> many have distributed themselves - many disciplinary libraries
  2. points
    a) many overwhelming parallels between library and data and information system
      1) too little appreciated
    b) data and information managers are highly technical librarians
      1) like library dean
    c) currently have bed and breakfast boutique library
    d) users want more industrial style
  3. could have cake and eat it too with federated approach
    a) federation of independent specialty libraries
    b) distributed private catalog
  4. can accomplish because
    a) we are technical specialist librarians
    b) have a mechanism and tradition of federation
    c) know at some level that tensions will increase
      1) volume growing rapidly
      2) still painful process
  5. metadata only mentioned twice - dreaded M word! prefer data and information and library model
  6. may need to consider creating federation
    a) need to develop and agree on process for getting there
      1) no specifics on technology
    b) process needs to be fairly formal - document:
      1) find current and expected future states
        a> what will be here 2-3 years from now
      2) questionnaire (not a waste of time) - want good solid answers
    c) executive level vision and description of federated library
      1) need to go beyond general vision
    d) some preliminary prototyping - try out current technologies -- see what might be possible
      1) can write up
    e) implementation plan
      1) set of blueprints
      2) schedule
      3) budget
      4) guts of new proposal
    f) rollout
      1) training for users and self
    g) operational maintenance plan
    h) need to avoid hodgepodge!
  7. 
unusual and uncomfortable planning process
    a) but have gone as far as we can at individual sites
    b) need to leave site concerns at door
  8. funding often referred to as "gate" -- but that is smallest problem - more serious
    a) trained, sensitive, willing personnel
      1) we have people who can guide work, but we need to locate other people who are not concerned with day-to-day site work
    b) collaborative venues
      1) repeated face-to-face meetings
      2) several times a year, minimum
      3) can do a lot long distance, but not everything
      4) need compatible equipment
        a> wrestling with on X-Roots project
    c) need "hard" decision process
      1) makes hard decisions e.g., door will swing out, not in
      2) not majority rule
      3) John Briggs discussion
    d) set of documents
      1) needs and wants
      2) what will final system look like
      3) vision statement is part of this process
  9. funding issue
    a) each one of possible collaborative entities (NCSA, SDSC etc.) will not be going away
    b) can help with getting funding
    c) we don't need to worry early on about size of $$$ flow needed
  10. discussion
    a) how long will it take?
      1) depends on planning process -- what is goal -- needs to be well defined
      2) can get early results as part of big picture
      3) need to get master plan ASAP
      4) think could be within a year
      5) working on proposal
    b) analogy of hard decision making process... even if 80% didn't want doors to swing out hard decision will facilitate.. but may be false analogy -- if door swinging out is needed, you'll be able to get consensus
      1) need to have mechanisms for building consensus - can't ram down throat
      2) can't afford to alienate
      3) A:
        a> sometimes ask wrong questions too early
        b> worry about paint color before putting in larger context
        c> don't Balkanize over little stuff
    c) Cliff Lodge is relatively permanent - long life
      1) information system must evolve rapidly with technology
      2) cells are replaced but body goes on
    d) in federated system
      1) what's left to do
        a> have pointers to site catalogs etc. 
      2) A:
        a> need to have readers - people who have training to make good sense of things
        b> in library translation is in hand of reader -- library does not translate different languages for you
    e) what should role of system be in reformatting data? AI literature has lots of valleys.... may never have reader that can parse information into desired format
    f) what is reader?
      1) active agents - software that works on your behalf to explicate text
      2) goes beyond viewer
    g) is it a substitution for training of patrons?
      1) can't afford to learn all languages
      2) do need to learn how to use catalog
    h) need commitment as ongoing minimum statement
    i) catalogs are very complete in library - LTER catalog is very incomplete and ill defined right now
      1) right now need to abstract from source
      2) alternative is full search
    j) another theme - How far do we reach? Most major scientific journals are in English - patrons may be able to meet us over half-way
      1) expectations of outside world
    k) many datasets extremely complicated - require careful reading
      1) many scientists just want numbers
      2) beyond a certain point we will get diminishing returns
      3) different than molecular
    l) need to be able to deal with whole spectrum of ability to integrate data
    m) don't like distinction between data and information
      1) e.g., monthly averages
      2) favorite story - floating buoys in S. Hemisphere caused problems for general circulation models
        a> but problem was when they crossed equator, they did not change sign
        b> had to go and correct data and reprocess
      3) another story - gates on data threw out outliers
        a> caused it to miss some things! 
    n) derived products may be good term
      1) derived products are information
      2) but also calibration records that are unrelated to raw data
    o) already are federation - why we have catalog and data online
      1) don't have constitution
    p) some things may not be as hard as they seem
      1) BNZ automatically scans web servers for data
      2) clever use may allow doors to swing both ways
    q) online reference librarian possible - rule-based AI
      1) but don't have to invent - can go to specialists
    r) want to get away from DATA
      1) 99 is number (data)
      2) 99-degrees C is information (have attached more to data)
  11. Handouts on why computer systems fail
    a) which problems apply to us!
F. What are next steps between now and this time next year?
  1. need concrete things for working groups
  2. interest in alternative approaches to system design - what would YOU do in developing system?
  3. need to define what we want to do
  4. confusion about role of data management in things like data reduction - processing data to meet specific user needs
  5. players
    a) not really clear what each of these should be supporting and doing
    b) Site
    c) Network
    d) Network Office
    e) Collaborators
      1) outside
    f) expectations run entire gamut, but support is at site level
      1) also at Network Office
      2) so need to focus at site and network office - vision for future
        a> proposal is being written for network office - need input
      3) should also work on proposals at intersite level
        a> need to come to terms with control and DM of intersite
        b> what data from intersite is in DM systems?
          1> right now only know about variability workshop data
            a: small dataset
  6. network
    a) as a collection of sites
    b) network has never taken responsibility (actually do have North Inlet data) to go back to failed sites to get their data
  7. DM have many hats
  8. mission for breakout groups
    a) site
    b) network
    c) global
    d) users, audience, functions
    e) dream system at different levels
  9. 
comments from Gosz
    a) worried that we have B&B people here -- are they equipped to design Cliff Lodge?
    b) there is strong network of B&Bs across the country that share maps, directions etc.
    c) is a sense that network system is needed - issue is common voice
  10. need to look at other Cliff Lodges....
  11. Two basic questions
    a) what is it?
    b) how do we get there?
      1) be as specific as possible
G. Working Groups
  1. Working group 1
    a) focus on function
    b) LTER Catalog system - ENTER - Expert Network Tool for Ecological Research
      1) queryable and browsable
        a> natural language
        b> able to provide help
        c> able to suggest additional information
        d> transparent - centralized look - current and up-to-date
        e> able to get 'product' when listed
          1> process request locally to fetch data
          2> products could be datasets, graphs, pictures
            a: arbitrarily rich - combinations of many different types of data and information
        f> platform independent both on client and server end
        g> able to provide statistics on use
      2) what do we have
        a> B&B Directory - Site information
          1> location
          2> emphasis
          3> personnel
          4> contact
          5> directions
        b> movement from low-information files (ASCII) to high information (graphs etc.)
        c> publication level data vs raw data
          1> use at own risk
          2> vs publication quality
      3) discussion
        a> how do we get there?
          1> need another 45 minutes
          2> more approach from functional standpoint than from structural standpoint
        b> need to archive queries
          1> unsuccessful queries valuable research base...
  2. 
Working group 2
    a) functions
      1) transparency
      2) distributed
      3) queryable
      4) input
        a> not just extraction
        b> managing intersite types of datasets
        c> evolving paradigm - Lattice System
          1> allows growth in information model by adding modules
          2> some loss of efficiency as modules are added
    b) implementation
      1) workshop
        a> vision needs to be further refined
        b> define functionality
      2) mini workshops that pick apart functionality of system
        a> possibly at synthesis center
      3) workshop 2
        a> implementation
    c) role of outside collaborators
      1) SDSC as testbed for large datasets
      2) shared personnel with network office
    d) discussion
      1) who would attend workshop?
        a> woke up before getting to that in group....
        b> need users and some experts
        c> might lead to another workshop on functionality...
  3. Working group 3 - notes
    a) functions
      1) catalog
        a> need to achieve system that allows you to transparently access data from all sites
          1> right now need to browse individual servers which are organized literally
        b> much easier to be too ambitious and fall flat on our face - want clearly do-able steps
        c> content standards for catalog entries
          1> subset of metadata standards
        d> consistent formatting within sites - ideally network-wide format
          1> could handle with filters
          2> create standard view of catalog entry
        e> can run at each site or centrally
      2) tools for evaluation of network functions
        a> user feedback
        b> monitoring logs
        c> analyses of use
      3) query
        a> acts on catalog
        b> or on original dataset
        c> catalog -> metadata -> data
      4) could provide translation of metadata into consistent formats
        a> create standard view of metadata
        b> in long run - actual DBMS system
        c> consistent structure of metadata - at least within site critical
      5) translation of data into consistent formats
      6) analysis or display tools
        a> integration tools for cross-site data merging
        b> fruitful area for collaboration with NCSA, SDSC
      7) complex/cross-dataset queries
        a> integration tools
    b) general process
      1) development of 
distributed cross site catalog
        a> requires consistency within site
        b> standardize keywords
          1> needs to be process acting on initial set of keywords
          2> development of master list
      2) development of query system for catalog
        a> minimum metadata standards
        b> structure/format/access method standard within site
        c> writing filters to produce desired displays
          1> could serve as intermediate interface to other programs
      3) development of analysis and integration tools
    c) specific process
      1) standing committee to develop content standard and recommended structure for catalog entries
        a> minimum standards
        b> expansive standards
          1> graphical data etc.
      2) development of filters - role for network office
        a> hire Rick Ingersoll to develop filters.... :)
        b> or collaborator
    d) discussion
      1) none
  4. Group 4
    a) dream system
      1) while sitting by pool, ask system why there is biodiversity. System queries all site servers, answers questions and automatically submits paper to Nature which is immediately accepted..... then we woke up.....
    b) what is it?
      1) system consisting of an amalgamation of locally maintained datasets-tables transparently visible on the net
      2) with components to tie distributed datasets together in queryable form
    c) how to get there?
      1) workshop on creating workshop on a vision/working model for information system for LTER Network
        a> will use as example cross-site experiment
        b> focus on distributed DB access and info retrieval
        c> will be feasibility study of domain specific experts and members of computer science community
  5. Take information and sleep on it -- will talk about it tomorrow...
H. Internal Structure - Briggs
  1. standing group - membership
    a) propose formalized schemes
    b) have reached size where needed
  2. review of proposed structure
    a) rotation scheme
    b) datatask - serves as interface to outside investigators etc. - representatives of group as whole....
      1) never formal
    c) list on datatask was not right
      1) confusion on who is on datatask
    d) also chair person
  3. 
proposal
    a) 6 members - rotate off 2 per biennium
      1) can re-up
      2) but limit of re-up for six years
    b) chair - should serve at least 4 years
      1) need some stability here
    c) acknowledgement of standing committees
  4. comments
    a) what is term on executive committee?
      1) 3 years
      2) how chosen
        a> election
    b) timing
      1) plan to do by calendar year
      2) but may need overlap
    c) need to get organization committee for next year's meeting
    d) what are current standing committees
      1) meeting organization - but this year was data task
      2) connectivity committee - appointed by CC?
      3) site guidelines for review - committee needed
      4) information management system - committee
    e) want input from everyone in the room
    f) Chair of datatask be chosen by data task?
      1) yes, but should be approved by overall committee
    g) eligibility - often more than one per site
      1) network office is ex-officio member
      2) one member per site maximum? - probably not
      3) or just leave up to voters!
      4) not just rule but merits of individuals
    h) want at least one from each of the cohorts on data task!
    i) need specifically worded motion to present to group
    j) standing committees - decisions/actions need to be approved by all of dman
      1) chair of data task responsible for disseminating
    k) datatask is similar to executive committee for CC
    l) Reports need to go to CC meeting prior to general publication - sending to NSF...
      1) especially for design of system
      2) still are subcommittee of CC
I. Agenda tomorrow
  1. follows up on things discussed today
  2. built in open part to come to closure on things
  3. want to finish some things....
    a) also want to let folks add things to agenda
  4. leaders of working groups on strategic plan need to get together to edit document
J. Final Comments - Gosz
  1. 
open competition for network office
    a) executive committee writing proposal
    b) very much want impact on what to build into proposal
      1) this is the group to write it
      2) need WORDS
      3) need WHAT YOU WANT
      4) need WHAT YOU WANT NETWORK OFFICE TO DO
      5) need BUDGET
    c) starting from scratch on this proposal
      1) relatively short window to do this
  2. have not yet received announcement for competition
  3. CC meeting Oct. 21-22
  4. discussion
    a) should we dream big?
      1) think about at least a decade of effort
        a> probably 6 year effort
      2) need vision and goals