The 2015 Sagan Exoplanet Summer Workshop – Exoplanetary System Demographics: Theory and Observations

Every summer, the NASA Exoplanet Science Institute (NExScI) hosts the Sagan Workshop, which sets out to advance scientific and technical understanding of the objectives of the NASA Exoplanet Exploration Program and is aimed primarily at early-career scientists.

This year, the topic was Exoplanetary System Demographics: Theory and Observations. From the conference web page: “The 2015 Sagan Summer Workshop will explore exoplanetary systems through the combined lens of theory and observations. Several observational techniques have now detected and characterized exoplanets, resulting in a large population of known systems. Theoretical models, meanwhile, can synthesize populations of planetary systems as a function of the input physics. Differences between the predicted and the observed distributions of planets provide strong constraints on the physical processes that determine how planetary systems form and evolve, ruling out some old theories while suggesting new ones. Leaders in the field will summarize the current state of the art in exoplanet observations and planet formation theory. Observations needed to discriminate between competing theories will be discussed and compared against the expected improvements in exoplanet detection limits.”

The standard of the presentations was uniformly high, and you can see them here, along with the full agenda here.

Here are three of my favorite videos:

Gary Blackwood, introducing the NASA Exoplanet Exploration Program:

Transit Surveys, by Courtney Dressing:

and Imaging Surveys, by Beth Biller:


Just How Should We Cite Astronomy Software?

The central importance of software in astronomy is generating debate on issues such as how software should be cited, how credit should be accorded to software practitioners, and how software should be created and preserved.

The American Astronomical Society (AAS) publishing team is participating in these debates. They took part in a meeting organized by the Sloan Foundation, with the goal of establishing “protocols, policies, and platforms for astronomical software citation, sharing, and archiving.” You can read about the meeting below (published at http://astronomy-software-index.github.io/2015-workshop/). The Society is actively seeking feedback on the document, and the community is invited to share the document, comment on it, discuss it on social media using the hashtag #astroware, or send private comments to julie.steffen@aas.org.

 _________________________________

Invitation to comment on a proposal for a cohesive research software citation-enabling platform

by Laura Norén*

* This overview was written by an ethnographer who has not been involved in the astronomy community, in the interest of presenting a relatively unbiased summary.

» Overview: Why (and how) to do research software citations in astrophysics well

A group of roughly forty professionals in the astronomy community, drawn from university faculty, telescope facilities, grant-making foundations, software repositories, journal editors, and libraries, met in May 2015 to formulate goals and establish a plan for building protocols that weave software citation practices into science-as-usual in astronomy.

This effort dovetails with a larger trend in astronomy (and the natural and social sciences) towards more computationally intensive research. Writing research software is becoming an integral part of science-as-usual, though norms and protocols for recognizing, sharing, and rewarding those who contribute well-crafted software to the scientific process have not kept pace. To date, practices for citing, archiving, assessing the quality of, and otherwise integrating software development into astronomical research processes have been driven by forward-thinking voluntarism: well-intentioned, time-intensive, flexible work that has produced a set of experts within astronomy with a clear understanding of the challenges of widening research processes to incorporate research software efficiently and sustainably. Voluntary, project-based efforts inevitably run into financial and temporal limits. The current meetings aim to establish processes that use human curation wisely (and sparingly) in conjunction with greater automation, some crowd-sourcing, and author-performed tasks.

One of the first questions that arose at the meeting was about what it means to have a contemporary career in astronomical research and what software skills mean for career paths in astronomy. Should tenure decisions take software citations into account? If so, what is the best way to develop well-understood norms for the proper citation of software? Should there be full-time non-faculty positions created in academia for software developers? What about astronomers who spend time working at telescope facilities, especially as post-docs (or the equivalent)? How can they maintain coherent curricula vitae so that they remain eligible for academic positions in the future? Should software citations mean the same thing as other types of citations? Will papers that cite “too many” pieces of software be respected in the profession? And how many software citations are too many?

The anxiety around this line of questioning is palpable, and not just in astronomy.

At the workshop in May, the group opted to table the career path discussion in favor of addressing a problem that could be solved: improving the process for citing research software.

Questions about the normative value of software citations are beyond the purview of this effort, though it is clear that changing the technical system around research software will have an impact on the social value of software creation, sharing, citation, and archiving. The goal was to make a plan, assign tasks, avoid intractable questions, engage the community members who could not attend the event, and solve research software citation.

» Invitation to participate

Please discuss the following proposal here and on social media using the hashtag #astroware (astronomy + software = astroware).

The main purpose of this document is to generate discussion and elicit feedback.

» What is the purpose of the proposal for research software citation?

To create protocols, policies, and links between existing platforms for astronomical software citation, sharing, and archiving that are:

  • uniform;
  • public;
  • uninterested in centralized gatekeeper systems;
  • using health checks rather than upvotes to indicate value;
  • resilient to the (mis)fortunes of specific companies, institutions, and individuals;
  • as simple and efficient as WordPress;
  • as useful as git for version control and GitHub for collaboration;
  • as robust as the taxonomy and metadata management systems for print publishing;
  • as meaningful to peer participation and career progression as journal publishing has been;
  • intrinsically incentivized to encourage researchers to upload software in robust ways (e.g. not to their own webpages);
  • thinking more about building a better future than carefully archiving the past.

As one of the publishers at the workshop summarized, our strategy is to “make clear how to [cite and store software] but not require it. Show off success stories and serve the community rather than police the community.” This preceded a similar opt-in sentiment adopted by a different group of forty professionals in scientific publishing, who released the Transparency and Openness Promotion (TOP) Guidelines in July. This astroware proposal would help meet the TOP Guidelines, but it is a proposal to build and connect existing platforms, not a competing set of guidelines.

» To-do list

Please weigh in on the specific tasks below using the numbers associated with each task.

Additional information about working partners is included after the to-do list.

1. Establish a preferred location for the generation of unique identifiers for software

One goal from the workshop is to select and promote a single place for the generation of unique digital object identifiers (DOIs). The criteria for selecting a registry include the ability to work with the astronomy community, to establish an appropriate set of metadata collection practices, to automate certain processes, and to allow for manual data collection and correction. Evidence of effective past interactions and high adoption rates is key.

Zenodo is the proposed candidate to generate unique software-object identifiers.

2. Define a metadata protocol for software

A major task along the way to registering software and meaningfully sharing it is establishing a metadata schema. The easier it is for researchers to request a registered ID for their software, the more likely they are to do so. Thus, some fields can be automatically populated with no human intervention required, assuming the software repository is within the ecosystem (e.g. GitHub, Zenodo).

Some librarians and academic astronomers voiced a desire to ensure the metadata protocol could integrate with reference tools such as EndNote, RefWorks, etc., though no representatives of those companies were present.

The proposed format for metadata is to carry it in a package.json-style file; a hedged sketch of what such a file might look like follows the schema below.

The fields marked with an asterisk are required. All others are optional.

Proposed Metadata schema

  • *Item Type: Software
  • *Author(s): (affiliations)
  • Contributor(s): (affiliations)
  • *Title:
  • *One-line description
  • Description that can be longer than one line (could this automatically include text from a readme.txt file, if one exists?)
  • Tags (auto-populated, with an option to manually finesse)
  • Class (e.g. library, utility, simulation, analysis, executable, GUI...)
  • *Subject Area (librarians in the group wanted this to be a required field), Algorithms, Methods
  • URL to Home Page
  • DOI to Software (or software metadata, but that would be self-referential)
  • Primary Citation (to paper or website describing software, if any)
  • Preferred Citation (could be to a repo, paper, website, or other item cite-able now or in the future)
  • Language(s) (e.g. Python, JavaScript; not English, Spanish, etc.)
  • Platform(s) (e.g. Windows XP, Mac OS X, Linux)
  • Major dependencies
  • *Version (e.g. 10.x, 12-Jan-2015)
  • Date of First Release (ideally pulled in automatically, with an option to manually adjust)
  • Date of Last Revision (Last Commit) (ideally pulled in automatically)
  • Contains / Is a part of / other relationships
  • Format for Data Input (MIME type(s))
  • Format for Data Output (MIME type(s))
  • Free or fee?
  • Source of funding
  • *License
  • Open Source? (via license type in search interface)
  • Size in lines of code, or bytes for compiled code
  • Typical execution time (e.g. n sec on hardware-spec)
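
To make the schema concrete, here is a minimal sketch, in Python, of how such a record might be assembled and written to a package.json-style file. The tool name, JSON keys, and validation step are all illustrative assumptions, not part of the proposal.

```python
import json

# A hypothetical sketch of carrying the draft schema in a package.json-style
# file. The JSON keys below are assumptions loosely mapped to the schema
# fields above; none of this is a settled standard.
metadata = {
    "item_type": "Software",   # *required
    "authors": [{"name": "A. Astronomer", "affiliation": "Example University"}],
    "title": "exampletool",    # *required; hypothetical package name
    "description": "One-line description of what the tool does.",
    "tags": ["photometry", "time series"],
    "languages": ["Python"],
    "platforms": ["Linux", "Mac OS X"],
    "dependencies": ["numpy", "astropy"],
    "version": "1.2.0",        # *required
    "license": "BSD-3-Clause", # *required
    "url": "https://example.org/exampletool",
}

# Enforce the fields marked with an asterisk in the schema above.
required = ["item_type", "authors", "title", "version", "license"]
missing = [field for field in required if not metadata.get(field)]
if missing:
    raise ValueError("Missing required metadata fields: %s" % missing)

# Write the record to a package.json-style file alongside the code.
with open("software-metadata.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```

A registry could auto-populate fields such as the version and last-commit date from the repository itself, leaving only the starred fields for the author to supply.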

3. Consider normative standards for citing software

Figuring out what needs to be cited in order to maintain the professional norms of the community is closely related to the creation of technical platforms and protocols. This process of arriving at normative behaviors is iterative and emergent. We invite comments around norms but are not currently tackling the project of enforcing behavioral change.

4. Quarterly workshops

Coordination is the key to successful integration of the existing astronomical research software citation components.

Quarterly technical workshops between publishers, librarians, software developers, telescope facilities, grant makers, ethnographers, and academic astronomers, to see the process through development, testing, and implementation, are part of the grant currently being proposed.

5. Who will be involved? Existing keepers of astronomical research software

The demand for a more robust, long-term sustainable process for citing software is a response to both the success and limitations of the current sociotechnical systems for citing, sharing, storing, and discovering software for research. Representatives from all of the organizations mentioned below were in attendance at the workshop and have remained actively involved in the efforts to build better sociotechnical systems for sharing astronomical research software. Their insights and willingness to volunteer their time are a phenomenally promising aspect of the current effort.

Astrophysics Source Code Library

The Astrophysics Source Code Library (ASCL) currently serves the astrophysics community as a free source code registry. Hundreds of volunteer hours of human curatorial labor have gone into the creation of the ASCL. Lessons learned by the human curators point to the necessity to keep some human curation, but incorporate automation and author-based tasks to make sure human time is used efficiently.

Astrophysics Data System

Any identifier is only as useful as its adoption rate. In this case, adoption requires that users request a registry ID and that publishers, libraries, and commercial repositories recognize those IDs. Within astronomy and physics, the Astrophysics Data System (ADS), hosted at Harvard and supported jointly by the Smithsonian Astrophysical Observatory and the National Aeronautics and Space Administration (NASA), operates as a digital library and already “provides access and pointers to a wealth of external resources, including electronic articles available from publisher’s websites, astronomical object information, data catalogs and data sets hosted by external archives.” Our goal is to make new protocols that weave neatly together with ADS and ASCL so that past projects are not lost.

GitHub

Citing software is a primary goal that for many researchers is closely related to the secondary goal of effectively sharing software within the research community. GitHub is currently the most frequently used web-based storage, version control, and sharing repository for astronomical software. Because of GitHub’s dual purpose as a repository and an incubator for ongoing software projects, integrating object identifiers with GitHub-based discovery processes is important for traction. Researchers who are better able to make their software projects discoverable to relevant colleagues are probably more likely to register their own projects.

There is some anxiety about incorporating a for-profit company into the scholarly ecosystem. The reliance of scholarly careers on for-profit academic publishers has generated criticism and frustration in the past. However, GitHub relies primarily on the private sector outside of academia for revenue. When researchers opt to make their repositories open to the public on GitHub, they do not incur fees. GitHub only charges for private storage. Because a sizable number of their private-sector clients pay for storage, GitHub’s revenue growth is healthy without requiring financial input from the astronomy community.

Zenodo

Zenodo already provides an easy way to “share the long tail of small research results” within the sharing pipelines familiar to astronomers by archiving a snapshot of their software and assigning the snapshot a DOI when a release is made in GitHub. With their group of developers, Zenodo is taking an active role in the technological development of new protocols and connections between the databases underlying journals, digital ID registries, GitHub, citation software (e.g. Zotero), and other software platforms in the ecosystem.
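
To illustrate the GitHub-to-Zenodo flow from the consumer’s side, here is a minimal sketch of retrieving an archived snapshot’s citation metadata. The record ID is a placeholder, and the response fields shown are assumptions based on Zenodo’s public REST records API; treat this as a sketch rather than a supported recipe.

```python
import requests

# Look up a (hypothetical) Zenodo record for an archived software release.
record_id = "12345"  # placeholder: the ID Zenodo assigns to the snapshot
resp = requests.get("https://zenodo.org/api/records/%s" % record_id, timeout=30)
resp.raise_for_status()

# Pull out the DOI minted for the snapshot plus basic citation metadata.
meta = resp.json().get("metadata", {})
print("DOI:    ", meta.get("doi"))
print("Title:  ", meta.get("title"))
print("Authors:", ", ".join(c.get("name", "") for c in meta.get("creators", [])))
```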

American Astronomical Society

Journal publishers like the American Astronomical Society (AAS), which is also the main professional organization for astronomy, are critical partners in any ecosystem of information sharing. Leaders from the AAS were in attendance at the workshop and continue to take the lead in this effort. Separate citation accounting for papers that are about software (“software papers”) and for citations of software within research papers is a logical need. The AAS journals will not only sort out these accounting matters, they will also play a key role in drawing the connections between the aforementioned platforms and organizations.

Concluding: Thoughts on the future of software citation in astronomy

The meeting concluded with a round table (well, actually, there was no table and people were mostly sitting on couches) summing up the perspectives that should drive work forward. A telescopic instrument manager reminded us that, “it is important to get the sociology of this right. It has to be perceived as by developers, for developers. We want to present this as a goodie rather than a process.” One of the grant makers invited the group to draft a proposal that would, “[get] the community to participate” because “the essential part is not seeing how the world ought to be but seeing how to get there.”

Sharing the proceedings of the workshop and the draft protocols with the broader astronomy community is our good faith effort to be earnestly transparent and to eagerly engage those who were not at the workshop.

Are you in the astronomical community? Please join the conversation using #astroware

If you think what you have read will solve problems for you, please let us know what kind of work you do with software in astronomy.

If you are coming from another field, feel free to chime in as an ally and share how this proposal could solve problems in your field. Or maybe your field already solved software citation and you want to share some best practices. Please do.

If you think we have missed something, tell us how you would expand.

If you think we have gotten something wrong, kindly share your critique.

Please comment here or use hashtag #astroware if you end up having the conversation elsewhere on social media.

You can suggest revisions to this document on GitHub here.


“Software Use in Astronomy: An Informal Survey” by Momcheva and Tollerud

This is the title of a paper by Ivelina Momcheva and Erik Tollerud that was recently posted on astro-ph at http://arxiv.org/abs/1507.03989. Between December 2014 and February 2015, they carried out an informal survey of software use in astronomy, advertised mainly through social media. It is not presented as representative of the worldwide astronomy community, and no attempt is made to correct for selection biases: the results are presented as collected. I recommend this article to everyone writing software in astronomy. Figures 1 and 2 alone make the article worth reading (no spoilers!). You can also interact with some of the visualizations I reproduce below.

Participants were asked the following questions:

  1. Do you use software in your research?
  2. Have you had formal training in software development?
  3. Which is more common in your work: community software or self-written software?
  4. Select which of the community tools you use regularly for your research.

Three questions requested basic demographic information:

  1. What is your field of research?
  2. What is your career stage?
  3. What is the location of your institution?

The survey received 1,142 responses, across all career stages. 100% of respondents used software to do research, yet only 8% reported having received substantial training in software development. Another 49% received “little” training, and the remaining 43% received none. This holds across all career stages:

The same is true for the 90% of participants who write their own code:


Astronomers appear to use quite a narrow set of tools, with only 10 tools used by more than 10% of respondents, again with little demographic variation:

That Python tops the list of tools should surprise no one, given the high cost of licensing its closest rival, IDL, and the outstanding free Python distributions available to astronomers. We often speak of IDL having lost much of its market share to Python (and the comments at the end of the paper do back this up), but it’s not that simple, as many astronomers use both:


Some of the most revealing information was in the comments at the end.

  • Many respondents were learning Python, and its displacement of IDL as the programming language of choice will almost certainly continue.
  • Many decried the lack of formal training in software engineering, and some thought it should be a required part of their graduate programs.
  • There were suggestions that greater credit and career opportunities should be afforded to those developing community software.


The 8th Extremely Large Databases Conference and Workshop

These annual workshops discuss “…the real-world challenges, practical considerations, and nuts-and-bolts solutions in the realm of managing and analyzing extreme scale data sets.” The attendees include Big Data users from industry and science, developers, researchers, and providers. This year’s topics were:

  • Integrating statistical tools with databases and clouds
  • Big Data centers
  • Current practices, and unsolved challenges in Big Data
  • Urban science.

PDFs and videos of all the talks are online at http://www-conf.slac.stanford.edu/xldb2015/ProgramC.asp, and there is a YouTube channel at https://www.youtube.com/playlist?list=PLE1UFlsTj5AGd364NBD2R1_RkESSgn8rD

If you are interested in Big Data and its challenges, these talks are well worth a look.

Two of my favorites are “R in the World: Interfaces between Languages,” by John Chambers, and “Critical Technologies Necessary for Big Data Exploitation,” by Stephen Brobst. You can watch them on the YouTube channel (the embed codes appear to be incorrect, so I can’t post them here).


Toward a Framework for Evaluating Software Success

Many of us in the astronomical software business have been debating the best way to evaluate the quality of software and its success within its user community. Here is one proposal submitted by a group of us to the Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP) Workshop, October 15-16, 2015, Washington, DC, USA.

Briefly, we are proposing the creation of a software “peer-review group,” comprised of grant recipients funded to develop sustainable software, who would meet periodically to evaluate each other’s software, developing and refining success metrics along the way. What do others in the field think of this approach?

Toward a Framework for Evaluating Software Success: A Proposed First Step

Stan Ahalt (ahalt@renci.org), Bruce Berriman, Maxine Brown, Jeffrey Carver, Neil Chue Hong, Allison Fish, Ray Idaszak, Greg Newman, Dhabaleswar Panda, Abani Patra, Elbridge Gerry Puckett, Chris Roland, Douglas Thain, Selcuk Uluagac, Bo Zhang.

Software is a particularly critical technology in many computational science and engineering (CSE) sectors. Consequently, software is increasingly becoming an important component in the evaluation of competitive grants and the execution of research projects. As a result, software can be viewed as a scholarly contribution and has been proposed as a new factor to consider in tenure and promotion processes. However, existing metrics for evaluating the capability, use, reusability, or success of software are sorely lacking. This lack of software metrics permits the development of software based on poor development practices, which in turn allows poorly written software to “fly under the radar” in the scientific community and persist undetected. The absence of evaluation by knowledgeable peers often leads to the establishment and adoption of tools based on aggressive promotion by developers, ease-of-use, and other peripheral factors, hindering the sustainability, usefulness, and uptake of software and even leading to unreliable scientific findings. All of these factors mean that addressing the current lack of software evaluation metrics and methods is not just a question of increasing scientific productivity, but also a matter of preventing poor science.

As a first step toward creating a methodology and framework for developing and evolving software success metrics for the CSE community, we propose the creation of a software “peer-review group.” This group, comprised of grant recipients funded to develop sustainable software, would meet periodically to evaluate their own and each other’s software, developing and refining success metrics along the way. We envision the group as a pilot test for a potential larger-scale effort to establish a more formal framework for software success metrics and evaluation.

Framing Success Metrics

Our perspective on framing software success metrics arose from a breakout session held at a recent NSF-funded workshop attended by more than 75 Software Infrastructure for Sustained Innovation (SI2) principal investigators.  The breakout team identified the need to create a methodology and framework for academic software success metrics, brainstormed factors to consider in developing such a framework, and outlined the actionable steps needed to advance this effort. The idea of a software review group was introduced in these discussions, and possible outcomes—presented briefly here—were discussed. We believe further discussion by Computational Science and Engineering Software Sustainability and Productivity Challenges (CSESSP) workshop attendees will help to further develop these ideas and emphasize the importance of framing software success metrics as an integral part of developing a sustainable software ecosystem.

The Need to Evaluate Software Success

On the whole, the development of research software in academia, government, and national labs trails the rigor of industry-developed software. Incentives and measurements of what constitutes successful software differ among and within these sectors, yet all are ultimately part of the same software ecosystem. Generally speaking, successful software must be reliable, sustainable, have value to the target user community and beyond, and provide outcomes that are meaningful to societal stakeholders. Sound software development and engineering practices lead to sustainable software. Stakeholder adoption, use, and reuse of software create feedback loops that further enhance software success. To improve the productivity and sustainability of research software and the research communities it supports, we should be able to objectively measure what makes software successful—or not.

Factors to Consider

There are multiple dimensions to consider in developing an effective methodology and framework for evaluating software success. One dimension relates to the factors that contribute to software success, such as criticality, usability, performance, functionality, availability, and scientific impact. These terms may have different meanings in different fields; for example, usability may mean something different for networking software than it does for security software. Another dimension relates to the types of outcomes we might want to measure, such as the value of the scientific contributions of a grant or project, the value of the products of a grant or project (i.e., the value of the software), or the nature of the team’s “community conduct” (e.g., its value to the software ecosystem). Another relates to defining needs: for example, what is it that funders, researchers, or the broader community need to know in order to inform better decisions and improve sustainability? Finally, we must develop robust metrics to address these dimensions, inform project goals, and empower software creators, researchers, funders, and others to effectively evaluate software.

Next Steps

To begin to develop and evolve a software evaluation framework, we propose establishing a peer-review group: an organization of representative stakeholders who will self-review software works created by their respective communities. This group would effectively constitute a pilot program to inform the feasibility, scope, and approach of a future, larger effort to establish and refine a framework for sustainable software metrics. At a minimum, this group would give its members an opportunity for regular review and enhance their own self-improvement processes. If successful more broadly, the group would help to characterize key challenges in software evaluation, define and refine evaluation criteria, and lead to a more informed approach to software development and evaluation for the CSE community as a whole.

We believe further discussion of this idea at the CSESSP workshop would refine and inform our approach and help to generate momentum toward achieving better software evaluation approaches. Examples of questions that warrant further exploration include:

  • How should we determine who should be included in the review group?
    • What attributes make someone an expert software reviewer?
  • How should we manage the process for submitting software for evaluation?
  • Should we require all group members to regularly submit their own software?
  • How can others opt in to have their software reviewed?
  • How will the process provide adequate protections against conflicts of interest, address reviewers’ knowledge limitations, and address the possibility that some software creators may be competing with each other or with reviewers?
  • How should this activity be structured to continually advance the ultimate aim of establishing an objective set of review criteria that can be applied to different types of software?
  • What evaluation criteria or mechanisms are needed to ensure the group works effectively toward its goals?
  • What types of documentation or outcomes would be useful toward developing a larger-scale metrics framework?

Submitted to: Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP) October 15-16, 2015, Washington, DC, USA

Report from the National Science Foundation-funded workshop held February 17-18, 2015, at the Westin Arlington Gateway in Arlington, Virginia for Software Infrastructure for Sustained Innovation (SI2) Principal Investigators, http://dl.acm.org/citation.cfm?id=2764957.


Machine Learning with Scikit-Learn (I) – PyCon 2015

An excellent introduction to machine learning, by Jake VanderPlas at PyCon 2015. Long, but full of useful information.
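
For a taste of the estimator API the tutorial works through, here is a minimal sketch; the dataset, train/test split, and classifier are arbitrary illustrative choices, not drawn from the talk itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset and hold out a test set.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# scikit-learn's core pattern: instantiate an estimator, fit it, then
# predict and score on held-out data.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```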


What’s the Difference Between Cluster, Grid and Cloud Computing?

After my last post on introductory videos on cloud computing, I was asked whether there were videos explaining the difference between cluster, grid, and cloud computing. Here is a very good one by Prof. Ajit Pal, Department of Computer Science and Engineering, IIT Kharagpur. He explains the architectural differences between these platforms, as well as the implications for maintenance, deployment, and cost. Although long at 55 minutes, this video is worthwhile for its technical approach, both for computer professionals and for scientists wishing to exploit these approaches to computing.
