Doyen

The {problem, need, market} seems to be to develop a scientific computation platform. The goal of such a platform would be to create an ecosystem where scientists can
Computational science seems to be in a state where people work independently. They develop whole systems from scratch just to support their research work. Many make an effort to distribute their system but fail for lack of interest. Worse yet, the research that gets published often cannot be used by others or verified for correctness because the supporting code is based on a specialized system and gets lost. There is currently no community expectation that software should accompany research results. The end result is a loss of significant scientific wealth.

We need to build a collection of systems in such a way that they will attract attention and use. The collection needs to achieve a critical mass of users. Thus we have to not only build it but promote it widely.

In the ideal case scientists should be able to do research on common platforms that are centrally supported but locally available. The research results and their supporting software could be "published" to a central repository and made available worldwide. Conferences would use this repository as both a publication and presentation center. Scientists would use this repository to dynamically update local systems with the latest research results and software changes.

So there appear to be some design criteria that we can express.
The breakdown of "Mother Doyen" and "Daughter Doyen" seems necessary to achieve certain goals. The Axiom wiki is most useful in a shared environment. Software and research papers are best distributed in a one-to-many, publish-when-ready model, which argues for a server (the mother doyen). However, research usually isn't published until it is ready. Heavy computation and special-purpose rewrites are best done on a local machine. This argues for a client (the daughter doyen). Since neither completely covers the issue it seems best to assume both are needed and architect the solution to have both.

The Mother Doyen

The browser model of a front end, such as the Axiom wiki, supports working on a remote server. The browser model also supports collaborative work with the advantage that there can be hyperlinking to other online work. This would be especially useful if combined with online conference proceedings. Perhaps there needs to be some "privacy" mechanism that would allow only certain groups to view and modify pages that represent active unpublished research. Further, a centralized model allows shared work of indexing and cross-referencing that makes the repository more valuable. It would also be possible to develop and maintain systems such as Axiom directly on the host.

Suppose the user is writing a literate program (using a tool such as noweb). This literate program can be rendered using the wiki software (zope, python), or printed (assuming a local tool chain: noweb->latex->dvips). The program can be versioned under a DARCS or CVS system automatically.

The key advantages of the mother doyen are that software can be centrally maintained, it could be demonstrated and used without being locally installed, it could be run from systems which do not have a port, and it could be updated on a local machine with a single request.

The Daughter Doyen

The browser model as a front end also supports working locally.
Using the live-CD approach (Quantian) is a good way to distribute and advertise the available software. In addition it is useful for building a "standardized" scientific platform with a defined set of available tools like noweb, latex, darcs, a browser setup, as well as certain scientific software and libraries that are difficult for a user to install and configure. An initial implementation of this approach is described in DoyenCD and a copy of the CD is available for download.

Another advantage of the daughter live-CD model is spreading awareness. CDs can be given out at every scientific conference, of which there must be at least one per week. If we think about scientific software beyond the mathematical we can find other packages, such as Molgen, which is used in Molecular Biology.

Yet another advantage of the live-CD approach is that the CDs can be distributed in an educational environment. There both online (such as the MIT online course work) and standalone, class-specific work can be used.

The daughter CD can be easily updated with the latest software using the yum update facility, allowing the user to install new or needed software as well as fetch and use research papers.

The 30 Year Horizon

I think it is important that we focus our attention on building for the longer term. We need to think about the fundamental issue of how the computer will affect scientific research, collaboration, and teaching. We need to think about what is needed for the long term and try to architect it now. Thirty years from now everyone will take this work for granted. We might as well start on it now.

Feedback?

Tim

Re: doyen --Page, Bill, Tue, 24 Aug 2004 03:05:14 -0500

Tim, et al.
First another administrative detail. Earlier you wrote:

> I tried to set up a mailing list on the axiom site but the mailer program is broken. I'll search for another way to host a mailing list.

You might not be aware that the Axiom MathAction wiki is able to operate like a mailing list. Basically, anyone can "subscribe" to the individual web pages or to the whole web site. First they must identify themselves by clicking preferences (or logging in) and specifying their name (or pseudonym) and email address. Then all they have to do is click the "subscribe" link at the top right side of the page. Any comments subsequently attached to a page will be automatically distributed by email to all subscribers.

If you are subscribed to any page on the MathAction web site, then it is also possible for you to use email to reply directly to the emails sent out by MathAction. These replies will in turn be attached to the original MathAction web page and again sent out to subscribers. This way a chronological record of the discussion is kept with the web page and later (if desired) this discussion can be edited and kept for posterity.

So ... I have just set-up a web page on MathAction for doyen. To subscribe to it, all you have to do is click on the following link

http://page.axiom-developer.org/zope/mathaction/doyen

Then click

If you want you can click

Email sent to mathaction@axiom-developer.org with [doyen] in the subject (like this one) will be automatically attached to the doyen web page and also sent out to all subscribers.

Let me know if this is ok and if it works for you.

Second. I have to admit that I am not so fond of the project
name

Third, although I see your point about promotion at conferences etc. I have tried the Knoppix/Quantian distribution. It's got a lot of "neat stuff" but really as a more or less experienced linux user, my point of view is: "I wouldn't install it on one of my machines" ... but if someone wants to try it, well ok... I don't really know what level of experience one should assume for the "average scientific computer user" these days but I do believe we are moving more and more to the state where the issue is no longer whether "linux or not" but really how easy is it for me to install this thing. So in that regard it is pretty hard to beat Debian apt-get (with the RPM format a close 2nd).

Bill Page.

.....[snip].....

> So ... I have just set-up a web page on MathAction for doyen. To subscribe to it, all you have to do is click on the following link

Actually, I was unaware of this feature. Very nice. I subscribed to the doyen page. I see you've arrived at the "MathAction" name for the wiki. I'll start using it (although Bill Page's wiki has a nice ring :-)

> Second. I have to admit that I am not so fond of the project name

Point taken but we need to name things and finding an unused name that has any relation to anything is quite challenging. If I recall, the mother-daughter distinction came up in our phone conversation. You can't blame me for all of it :-) The idea is important though. If the terms are painful please suggest others.

> Third, although I see your point about promotion at conferences etc. I have tried the Knoppix/Quantian distribution. It's got a lot of "neat stuff" but really as a more or less experienced linux user, my point of view is: "I wouldn't install it on one of my machines" ... but if someone wants to try it, well ok... I don't really know what level of experience one should assume for the "average scientific computer user" these days ...

In the beginning there was Rosetta.
I collected and distributed FOSS computer algebra systems and gave them away at every conference I attended. The effort was entirely my own and the CDs were all at my own personal expense (CAISS eventually supported the concept). I'd hoped to get people aware of the range of systems and to start using a standard distribution. There is a Rosetta document (I thought it was on MathAction but I don't see it) that detailed the syntax differences between the systems. In general, this was well received and I received requests for additional copies after every conference. One fellow set up a mirror for Rosetta and built a windows version for distribution.

One issue that arises is that each algebra system has to be built for a particular opsys distribution. I included a prebuilt runnable version for RedHat and the sources if someone wanted to build for some other system. I helped a couple of people get algebra systems running on non-RedHat systems, so somebody actually used the Rosetta CDs. But they only chose one of the systems to use and there was no way to update the software easily.

Quantian is a similar idea (except using quantitative software). It has two key advantages. The first is that all of the systems are pre-built to run under Quantian so you don't have to assume that a user has a Linux system to run the examples. I found that a lot of the feedback was related to assuming the user had RedHat linux. Quantian fixes this issue by including the system. The second feature is that users can "try and buy" since Quantian can be easily installed on a system. This is important because you can set up examples in the user's area of work (e.g. physics) and they can see the results without much personal cost. Windows is still quite pervasive and the choice is either a canned demo or a trivial redirect to a website. Neither one shows the power of these freely available systems. Quantian steps around Windows for the initial try.
But both Rosetta and Quantian fail to provide a comprehensive, attractive platform with wide support. It needs to be more than a collection of "neat stuff". Both Dirk and I tried by individual effort and experiment. The "take away" lessons are useful. One is that I don't think we can change the world without heavily marketing the idea and, your point, a central location to focus the awareness. I rented the axiom-developer virtual machine years after starting Rosetta and it never occurred to me to make it into a Mother Doyen. Even if I had done so I was only personally capable of distributing a few hundred Rosetta CDs at a few conferences.

I also tried to pioneer the "proceedings on CD" for the ACM. ACM still wants to bind up the result so that only subscribers can get at the published papers. I was hoping to introduce the idea that all of the papers (as well as the supporting source code) would be electronically available to all. There is great institutional resistance to this and it will take great effort with the ACM to change (since they make money off the proceedings and subscriptions). Science may be free but you can't get the results (yet) without paying for them. This needs to change and the whole Doyen approach might make a dent in the current thinking.

And the ACM, despite their mission, has not been promoting a standardized computational science platform. If we can enlist their help (as well as other societies) we can get much better leverage. Of course, we have to build the infrastructure at the same time. If eggs didn't morph into chickens the conundrum would be solved :-)

> .... but I do believe we are moving more and more to the state where the issue is no longer whether "linux or not" but really how easy is it for me to install this thing. So in that regard it is pretty hard to beat Debian apt-get (with the RPM format a close 2nd).
Would that were true but in fact I don't use apt-get or yum or update because almost every system I touch is either firewalled against it, lacking the software, or not net connected (like my 366MHz laptop). I do use RPM for installation of most things. A second issue is that I know quite a few people (including computer science professors at my college) who are unaware of these facilities. The school uses a mix of windows, linux, apple, and solaris but it is very rare to find a savvy linux user on my campus.

As a future direction your point is well taken. The Mother Doyen (perhaps even the MathAction wiki) can be the primary collection, distribution and update site. The Lindows distribution (a debian desktop) has institutionalized apt-get in such a way that the naive user doesn't know they are using it. Most mathematicians care only for the result, not the underlying machinery. We need to make the whole process very "user-affectionate" if we are to succeed in promoting a standardized platform.

{apt-get, yum, update} also need to extend their range so they support more than just the software. Computational science is going to need online archives of papers which are trivially accessed and cross-indexed. These tools need to know how to find and fetch bibliographic references.

I've taken to rambling again. 'tis way past bedtime and the light dawns on the morrow already. Thanks for setting up the list.

Tim

Tim wrote:
The RosettaStone document was only loaded on http://test.axiom-developer.org since I was experimenting with conversions from LaTeX to HTML. But I think it looks pretty good, so I have transferred it here for easier reference and updating. Perhaps we should split it into several smaller pages...

(Linux Science Platform) --Bill Page, Thu, 26 Aug 2004 10:42:53 -0500

Perhaps this only works if mathaction@axiom-developer.org is listed
in the To: field? The other thing is that for security reasons, the
sender of the message must first be subscribed to the MathAction
wiki.
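
The mail-in conditions mentioned in this thread (a [pagename] tag such as [doyen] in the subject, the wiki address in the To: field, and a sender who is already subscribed) can be sketched in Python. This is only an illustration of the rule as described here, not the actual MathAction/Zope code: the function name `route_message`, the tag pattern, and the subscriber set are hypothetical, and the To:-only check is just the guess raised above.

```python
import re
from email.message import EmailMessage
from email.utils import getaddresses, parseaddr

# Address taken from the thread; everything else below is illustrative.
WIKI_ADDRESS = "mathaction@axiom-developer.org"
# Assumed tag syntax: a page name in square brackets, e.g. "[doyen]".
PAGE_TAG = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def route_message(msg, subscribers):
    """Return the wiki page a message would be attached to, or None.

    `subscribers` is a set of lowercase email addresses (hypothetical store).
    """
    # The sender must already be subscribed to the wiki.
    sender = parseaddr(str(msg.get("From", "")))[1].lower()
    if sender not in subscribers:
        return None

    # Guess from the thread: the wiki address must appear in To:, not only Cc:.
    to_headers = [str(value) for value in msg.get_all("To", [])]
    recipients = {addr.lower() for _, addr in getaddresses(to_headers)}
    if WIKI_ADDRESS not in recipients:
        return None

    # The page name is read from the first [tag] in the subject line.
    match = PAGE_TAG.search(str(msg.get("Subject", "")))
    return match.group(1) if match else None


if __name__ == "__main__":
    example = EmailMessage()
    example["From"] = "Tim Daly <daly@idsi.net>"
    example["To"] = WIKI_ADDRESS
    example["Subject"] = "Re: [doyen] state of the state"
    print(route_message(example, {"daly@idsi.net"}))
```

If the Cc: field turns out to work as well, the only change needed in this sketch would be to collect addresses from both headers before the membership test.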
-----Original Message-----
From: Tim Daly
Sent: Thursday, August 26, 2004 9:42 AM
To: daly@idsi.net
Cc: Dirk Eddelbuettel; Gilbert Baumslag; Ed Pegg Jr; Bill Page; Michael Tiemann; mathaction@axiom-developer.org; Steve Grubb; Bob McElrath; Tim Daly
Subject: doyen (Linux Science Platform)

*,

(see http://page.axiom-developer.org/zope/mathaction/doyen for background)

I had a phone chat with Dirk Eddelbuettel last night about making Quantian the basis for the daughter doyen. He's also email-introduced me to Ed Pegg Jr who has expressed interest in the past.

STATE OF THE STATE (Aug. 26, 2004)

Bill Page has set up an email mailing list. If you include mathaction@axiom-developer.org in your CC it will get journaled at: http://page.axiom-developer.org/zope/mathaction/doyen

Dirk Eddelbuettel has pointed me at the build instructions for Quantian. Quantian can also be built on DVD and I now have access to a DVD burner (although no experience yet).

I have set up a Fedora Core 2 box.
I have set up a Quantian box.
I have a DVD burner.
I have the Rosetta pile of algebra systems.

Steve Grubb can build Fedora liveCDs.

Dirk Westfal at Linux4all was mentioned but I can't find an email addy. Please copy him if you can find him.

WORK AHEAD

So, the basic vision is to build a Doyen (Linux Science Platform) consisting of two parts, a "Mother Doyen" which is a wiki website and a "Daughter Doyen" which is a liveCD. The platform should support a wide range of scientific software and science activities.

The following steps need to happen to put together a prototype:

1) Build the daughter doyen

   a) build a liveCD example

      Currently Quantian is built on Knoppix/Debian but is agnostic in terms of platform.

      1) explode/rebuild a liveCD (just for the experience)
      2) explode/rebuild a liveDVD (same)
      3) make a local MathAction wiki
         - involves configuring Apache, setting up necessary local packages, setting up zope, tailoring the local html to look professional and be clear.
      4) mod the liveCD to include the MathAction wiki
      5) rework the local MathAction wiki to include links to the mother doyen on the axiom-developer version
         - make some decisions about what to include vs what can be downloaded.
      6) work out an example of yum/apt update of a math package from the host to a local copy.
         - figure out how to host yum/apt.
         - make the packages available
         - test the download/install process per package
      7) work out an example of "publish/upload/CVS" a research paper that includes runnable examples
         - zope code to communicate whole pages
         - CVS/DARCS backup of changed pages

      At this point we have a working version of the daughter doyen. A person could boot it up, start the wiki, get an updated package from the host, write a paper, and publish it back to the host.

   b) build a fedora liveCD example

      Since RedHat is in the game we'd like to use Fedora as the basis for the platform

      1) explode/rebuild a liveCD of fedora
      2) perform steps 2-7 above ...
      8) set up a bugzilla to handle problems
      9) discuss ways to "customize" the platform for particular targets, e.g. a physics platform vs a chem platform vs math. Ideally we'd have a selection tool thru the wiki.

      At this point we have the ability to support a larger number of users, bug feedback, and smaller, more focused target markets.

   c) expand the MathAction site to support:

      1) package collections (Rosetta, Quantian)
      2) yum/apt updates of math packages
      3) DARCS/CVS of pages/pamphlets/code
      4) "publish/upload" of local papers to MathAction

      At this point scientists/students can collaborate online. If done carefully they can have private areas to collaborate and public areas to publish.

   d) approach societies (e.g. ACM, AMS, etc) for support

      1) search capabilities of ACM digital library
      2) support for distribution at conferences
      3) branding/support/advertising

      At this point we can start a campaign to get the daughter doyen distributed at conferences, thru mailings, and thru schools.
      Users have the ability to search (and possibly contribute to) electronic collections.

   e) test market the idea

      We'd like to have a few live users so we could pick a conference and give out copies at the conference. Also useful would be a presentation at the conference.

      1) reproduce in small quantities
      2) give it out at a conference
      3) get feedback.

From Dirk Eddelbuettel I'm going to need support for items like building a liveCD, adding items to menus, etc. Also need support for moving to a Fedora base.

From Bill and Bob I'm going to need support for creating a clone of the MathAction site locally and ideas about publish/upload mechanisms. We may need support for modifications to allow whole page/subtree uploads from a daughter wiki to a host wiki.

From Steve I'm going to need guidance about building a Fedora liveCD.

From Ed I need some discussion about his offer to move the Quantian domain name. Perhaps we can point it at the axiom-developer IP address. I need to know what support has to exist.

From Michael I need some discussion of coordination with societies, hosting mechanisms for multiple "Mother Doyens" (e.g. group theory), advertising/marketing/branding issues, bugzilla support, yum support.

Dirk Eddelbuettel also raised the possibility of a P2P architecture rather than a mother/daughter (hub and spoke) architecture. I haven't thought thru this comment but I throw it out for discussion and thought.

Feel free to copy others who might find this effort of interest.

Comments?

Tim

-----Original Message-----
From: Tim Daly
Sent: Tuesday, September 14, 2004 8:19 AM
To: daly@rio.sci.ccny.cuny.edu
Subject: Re: doyen (Linux Science Platform)

*,

Two developments of interest.

Yesterday RedHat announced information about a Stateless Linux box (basically you boot from CD and only use your hard drive as a local cache for downloaded programs and local computation). All data (your home dir, etc) resides on either the host or locally removable media.
(http://people.redhat.com/~hp/stateless/StatelessLinux.pdf) and (http://people.redhat.com/dmalcolm/stateless/stateless-linux-HOWTO-en)

With the proper set of programs in the bootable CD this could essentially be the Daughter Doyen. There is, however, a fair amount of work to achieve that "simple" result. For now I continue to trudge the path I previously laid out.

Second, I managed to set up a system that can build and burn Live CDs. It takes an amazingly muscular machine so I had several failed attempts before I consed together a large enough horse. The Live CDs partially boot; the cloop argument is wrong but I don't know where this is stored yet. Once that is solved I will have a filesystem and the boot should complete.

Tim

This is a test reply to see if mailin still works. And if it does, then the url should now be correct.

*,

We talked about the Doyen idea previously. (http://page.axiom-developer.org/zope/mathaction/doyen)

Patrizia Gianni is a researcher at the University of Pisa in Italy. They have been working on AJCA, an Active Journal for Computer Algebra, which is an approach to writing papers which include executable content. (http://mega.dm.unipi.it/submissions.html)

I think this is directly in line with the idea of a science platform we discussed and might be of interest to you.

Tim