## Paragraph Embedding from JISCPress

One of the things I was keen to explore within the context of the JISCPress project was the potential for using WordPress as a platform for publishing paragraph-level fragments that could be embedded in third-party web pages.

As Joss announced on the JISCPress blog in We’ve got paragraph data output switches!, paragraph-level content is now exposed through a unique URI in a variety of formats (XML, TXT, HTML, RSS and JSON), along with an object embed code for each paragraph (though I’m not sure whether the object embed is going to be maintained). At the moment, for example, I think we’re trialling literal text blockquote embeds:

(If the object embed does disappear, similar functionality could be achieved using the JSON feed and a Javascript function, though I guess we need JSON-P (i.e. support for something like &callback=foo) to make that really easy.)
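As a rough sketch of that JSON route, assuming JSON output and JSON-P callback support that may not exist yet, a third-party page could build the request URL, register a callback, and render the paragraph as a blockquote that links back to its source. The `output=json` and `callback` parameter names and the `content` field are hypothetical placeholders, not the platform's documented API.

```javascript
// Hypothetical JSON-P consumer for one paragraph's JSON output.
// ASSUMPTIONS: the "output=json" and "callback" query parameters and the
// "content" field (assumed plain text) are placeholders, not a documented API.

// Build the JSON-P request URL from a paragraph's embed URL.
function jsonpUrl(embedUrl, callbackName) {
  const url = new URL(embedUrl);
  url.searchParams.set('output', 'json'); // assumed format switch
  url.searchParams.set('callback', callbackName); // assumed JSON-P switch
  return url.toString();
}

// Escape text before inserting it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render the (assumed) payload as a blockquote that links back to its source.
function renderParagraph(data, sourceUrl) {
  const src = escapeHtml(sourceUrl);
  return `<blockquote cite="${src}">${escapeHtml(data.content)} ` +
    `<a href="${src}">[source]</a></blockquote>`;
}

// Browser only: register a global callback, then inject a <script> whose
// response is expected to be callbackName({...}).
function loadParagraph(embedUrl, targetId) {
  const name = 'wtrParagraph' + Date.now();
  window[name] = function (data) {
    document.getElementById(targetId).innerHTML = renderParagraph(data, embedUrl);
    delete window[name];
  };
  const script = document.createElement('script');
  script.src = jsonpUrl(embedUrl, name);
  document.body.appendChild(script);
}
```

Only `loadParagraph` touches the DOM; the URL and rendering helpers are plain functions, so they can be checked outside a browser.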

See also: A Quick Update for a review of the latest feature releases within the digress.it theme we’re using.

To demonstrate one possible use case for object embedding, see the post Engaging With the Issues Raised By The Google Book Settlement, which includes three embedded paragraphs from JISC’s current consultation on the Google Book Settlement.

Here’s the actual HTML:

Note that there is currently an issue with sizing the embed container: ideally we need to detect the height of the embedded content and then size the container automatically so there are no scrollbars. Can any CSS gurus out there give us a fix? (I’m guessing .scrollHeight might have a role to play in autodetecting the height?)

One thing you might notice is that the URIs for the embedded consultation questions follow a similar pattern – only the paragraph number identifier changes:

What this means is that we should be able to pull in a random paragraph by constructing a URI with a randomly generated paragraph number. So for example:

If you reload the page, you have an 80% chance of seeing a different question…
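The 80% figure follows from how the snippet picks a paragraph. Assuming each reload draws independently, with $u$ uniform on $[0, 1)$ as returned by `Math.random()`:

$$
n = 2 + \lfloor 5u \rfloor, \qquad u \in [0, 1) \;\Longrightarrow\; n \in \{2, 3, 4, 5, 6\}, \qquad P(n = k) = \tfrac{1}{5}
$$

$$
P(n_{\text{new}} \neq n_{\text{old}}) = 1 - \tfrac{1}{5} = \tfrac{4}{5} = 80\%
$$

So a reload repeats the current question one time in five.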

Here’s the Javascript snippet:

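```javascript
// Pick a paragraph number from 2 to 6 (the consultation questions)
var n = 2 + Math.floor(Math.random() * 5);
// Embed URL for paragraph n; POSTNUMBER is a placeholder for the
// consultation page's post ID, which isn't shown here
var p = 'http://docserver.example.com?p=POSTNUMBER&digress-embed=' + n;
// WordPress escapes & as &#038; when the post is saved; strip "#038;" back out
p = p.replace(/#038;/g, '');
var o = document.createElement('object');
o.setAttribute('style', 'width: 100%; height: 70px;');
o.setAttribute('id', '61c197964762012d4819093ebeee4fcf');
o.setAttribute('data', p);
document.getElementById('wtr_embed').appendChild(o);
```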

There’s also a div with an appropriate id attribute (‘wtr_embed’) added to the page… Note that the div needs to be placed before the inline Javascript that refers to it ;-)

I’m not sure yet if we can track the use of embeds (certainly server logs should be able to track calls, but these probably can’t be captured using Google Analytics?), but it’s still early days…

/

## A quick update

A lot of development is happening right now, so I thought I’d write a very quick summary to keep people informed.

Firstly, version 2.2 of the digress.it plugin was released yesterday. Remember that the JISCPress project bootstrapped the re-development of CommentPress (which has been at v1.4.1 for over a year now, I think) and we helped Eddie release digress.it v2 back in mid-August. We’ve had seven releases since then, and v2.2 finally brings IE6 compatibility (IE7 support came in v2.1.7). It feels stable now and provides pretty much the same experience across browsers. Performance is superb on a modern browser like Chrome 3, Firefox 3.5 or Safari 4, and I’ve found that with wp-super-cache installed, pages are rendered in a snap.

I’ve also started to document the features that come with digress.it. Some of the really interesting stuff isn’t immediately obvious, like the incredible range of RSS feeds that are now available and the switches for RSS, JSON, XML, HTML and text. @paulgeraghty asked on Twitter whether this might be ‘micro-content’. I’d be interested to know if there are other CMS platforms that provide a formal method of obtaining document data at the paragraph level.

And remember that this is in addition to the full document or document-section level RSS feeds that are built into WordPress. We’ve also introduced RSS feeds for each comment author and for the discussion around each paragraph, so if you want to follow one particular person or the discussion around one particular paragraph, you can.

We’re still working on ways to provide an easy way to copy and paste some code and embed a paragraph in your own site, while at the same time giving us a paragraph-level trackback. We’ve been trying various different methods but none of them have worked so far. We’re close though. If you’ve got any ideas for how this might be achieved, please leave a comment 🙂

Alex has been working hard on platform-wide features. He recently uploaded his ‘related documents’ code which looks across the entire platform of documents and makes suggestions for related document sections in the page sidebar. What’s especially interesting about this is the way this is achieved as a background service that runs periodically (you choose how often) and uses the OpenCalais API to provide contextual tags and the Yahoo! Term Extraction API to extract terms from the document. The relevancy of the tags received can be adjusted and author entered tags are also taken into account. These three different methods of mining the document ensure that the document sections that are ‘advertised’ to readers are relevant to the document they are currently reading.
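To make the weighting idea concrete, here is a hypothetical sketch (not Alex's code) of how tags from the three sources might be combined and used to rank related sections. The per-source weights, the data shapes and the overlap score are all illustrative assumptions.

```javascript
// Hypothetical ranking of "related document sections" from three tag sources.
// ASSUMPTIONS: the weights, data shapes and overlap score are illustrative;
// this is not the JISCPress plugin's actual algorithm.

// Per-source weights (the post notes tag relevancy can be adjusted).
const DEFAULT_WEIGHTS = { calais: 1.0, yahoo: 0.5, author: 2.0 };

// Merge a section's tags into a Map of tag -> weight, keeping the highest
// weight when several sources produce the same (case-insensitive) tag.
function weightedTags(section, weights) {
  const result = new Map();
  for (const [source, tags] of Object.entries(section.tags)) {
    const w = weights[source] ?? 0;
    for (const tag of tags) {
      const key = tag.toLowerCase();
      result.set(key, Math.max(result.get(key) ?? 0, w));
    }
  }
  return result;
}

// Score = sum, over tags both sections share, of the product of their weights.
function relatedness(a, b, weights) {
  const ta = weightedTags(a, weights);
  const tb = weightedTags(b, weights);
  let score = 0;
  for (const [tag, wa] of ta) {
    if (tb.has(tag)) score += wa * tb.get(tag);
  }
  return score;
}

// Best `limit` other sections, highest score first, ignoring zero scores.
function relatedSections(current, candidates, limit = 3, weights = DEFAULT_WEIGHTS) {
  return candidates
    .filter((s) => s.id !== current.id)
    .map((s) => ({ id: s.id, score: relatedness(current, s, weights) }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Giving author-entered tags the highest weight is just one possible design choice; the point is that a single tunable weight per source is enough to express "adjustable relevancy".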

Alex has also been working on integrating Triplify with JISCPress (and WordPressMU).

Triplify is a small plugin for Web applications, which reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.

In practice, this means that the semantic structures for each JISCPress document are now available as RDF triples; a single document, for example, can be retrieved as an XML/RDF file. Alex has also written a plugin for WPMU which will work with Triplify and allow the document author to include a license of their own choice in the RDF. Finally, he’s been testing this with Talis’ Connected Commons triple store and now has the WPMU plugin pushing RDF triples to Talis, where they can be queried and mashed up using the Talis API. His work on this should go up on our Google Code site in the next few days.

It all needs testing and tweaking a little more, but the substantial part of the work on these three plugins has been done and now it’s a matter of refining them and integrating the platform as a whole and documenting it thoroughly. We’re always interested in what you would like to see the JISCPress project achieve, so please take a look at our UserVoice site and add any suggestions you might have.  We’re also tracking Issues about JISCPress on Google Code and Issues specifically about digress.it on the digress.it Google Code site. You can also get the development code for digress.it there, too.

More soon!

/

## Introducing digress.it

It should be pointed out that while the JISCPress project is brand spanking new, the CommentPress/digress.it project is officially two years old and the product of much research, development and testing of document publishing and annotation in a networked environment. I have blogged/raved about CommentPress before, and I urge you to read about the background of CommentPress/digress.it over on the Institute for the Future of the Book’s original CommentPress site.

You’ll see how digress.it has evolved from the original GAM3R 7H30RY 1.1 (Gamer Theory) book site, to Mitchell Stephens’s paper, The Holy of Holies: On the Constituents of Emptiness, which was inspired by Jack Slocum’s WordPress system built for the drafts of version 3 of the GNU General Public License. The next iteration of digress.it was the Iraq Study Group Report and The President’s Address to the Nation, January 10th, 2007. These were followed by HASTAC’s draft paper on The Future of Learning Institutions in a Digital Age and finally by Kathleen Fitzpatrick’s paper, Scholarly Publishing in the Age of the Internet (no longer available).

digress.it is a significant rewrite and development of CommentPress and I’m really pleased that the JISCPress project is not only using it as a core technology but also contributing quite heavily to its further development. CommentPress is already popular in Higher Education for the critique of texts by students, the open peer-review of manuscripts, the peer-review of published books and to solicit comment on Institutions’ policy documents. It has also been used by the UK Government looking for feedback on their Innovation Nation strategy. So just as JISCPress benefits from more than two years of open source development of CommentPress, we hope that apart from the JISCPress platform itself, Educators and the public sector will benefit from the improvements we make to digress.it. We know that difficulty meeting WCAG accessibility guidelines has meant that CommentPress couldn’t be more widely used in the Public Sector and this is one of the first tasks that we’ll be addressing in the JISCPress project.

If you want to have a say about the development of digress.it for JISCPress (remember, all code is open source and can be used for any other WordPress-based project), then post your thoughts to our UserVoice site. We’re always open to suggestions.

/

## Innovation as a Side Effect… JISCPress and the JISC Strategy Review

The eagle eyed amongst you may have noticed that we recently republished the JISC Strategy Review 2010-2012 on WriteToReply in part as a way of field testing the new digress.it theme that has been under development as part of our JISCPress project.

Some time ago, I remember reading a book by Gary Hamel (with Bill Breen) on “The Future of Management” that included a model referred to as the innovation stack.

The model was pyramidal, and comprised four layers – at the bottom, operational innovation; sitting on top of that was product or service innovation, followed by strategic innovation, and at the top, management innovation.

Now I’ve never done an MBA, so my reading of this book may be out of line with a ‘traditional’ reading of it, but here’s what came to my mind when we originally floated the idea of republishing the JISC Strategy Review on WriteToReply, offered as a straw man…

• operational innovation: Dev8D and the development approaches encouraged in JISCRI projects represent operational innovations; publishing documents on JISCPress is an operational innovation aimed at helping JISC programme managers clarify project calls and JISC project teams shape their bids and disseminate their results;
• product/service innovation: in many cases, the JISC calls for projects seek to encourage product or service innovations, as well as operational innovations; as a hosted service, the JISCPress platform can be seen as a service innovation, running either as a centrally hosted service, or as a document platform in its own right hosted by an institution itself.
• strategy innovation: to a certain extent, programmes like the #ukoer programme represent operational steps that may support a strategic innovation in the way HEIs disseminate the fruits of their scholastic endeavours. The idea of Open Repositories and Open Science also operates at the level of strategic innovation. I think I’d be pushing a little more than I already am to find a strategy innovation role for JISCPress!
• management innovation: JISC Reviews are often disseminated to PVCs and research managers on an “I2I” (institution-to-institution) basis. JISCPress breaks that… badly. JISCPress allows anyone to comment and provide their own response directly to JISC, rather than necessarily representing the traditional response from the top of the strategy/research/IT management hierarchy within the institutions.

So, with that warm up exercise over, it’s time for me to get stuck in to reading the JISC Strategy Review properly… Hmm, now I wonder, does Hamel’s innovation pyramid map in any way onto JISC’s strategy for innovation across the HE and FE sector….?!

## Scholarly publishing with WordPress

Working on the JISCPress project, I’ve been thinking quite a lot about scholarly publishing on the web, and in particular with WordPress. This morning, I read a post over on the ArchivePress blog about some WordPress plugins which are useful additions for creating a scholarly blog and it got me thinking a bit more about what features WordPress would need to support scholarly publishing.

JISCPress does away with the idea that WordPress is a blogging tool, and instead uses WordPress Multi-User as a document publishing platform, where one site or ‘blog’ is a document. The way WPMU is structured means that despite potentially serving millions of document sites, the platform remains relatively ‘lightweight’, as each document site generates just a handful of additional database tables while sharing the same administrative core as a single WordPress install. So, 100 blogs on WPMU is nothing like the equivalent of running 100 separate WordPress blogs, in terms of either resource requirements or administration. In fact, quite soon there will be no such thing as WPMU: the two products are going to be merged, and because they already share 90%+ of the same code, that’s not too difficult to achieve.[1]

Anyway, my point here is to discuss whether WordPress can be extended to accommodate most conventions found in scholarly publishing and, where it is lacking, to identify the development work required to meet the needs of most academics who wish to write on and publish to the web.[2]

Scholarly publishing extends to a wide variety of published outputs. As a Content Management System (CMS) and technology development platform, I believe that WordPress has the potential to support any type of scholarly publishing that the web supports. It is extremely extensible, as can be seen from the 6000+ plugins that are available. However, what I’m interested in is what can be done now by an academic wishing to publish their work using WordPress as a CMS. What can be achieved with a few quid[3] spent on self-hosting WordPress, so that a few plugins can be installed and a typical, well-structured scholarly paper published?

### My Dissertation

For some time, I’ve been meaning to publish my MA dissertation. Back in 2002, I undertook some unique research which has not, to my knowledge, been repeated, and I think there is some value in having it easily accessible on the web. I have an OpenOffice file and a PDF and, in the course of a morning, have published it under my own domain. The reason I did not publish it on the university WPMU platform is that I have been experimenting with different plugins and did not want to install plugins that were untested or that we may not support long-term. In this case, I’ve used a single WordPress installation, but ideally an individual researcher, group of researchers or research institution would run a WPMU installation which allowed multiple documents to be authored individually or collaboratively[4] and published directly to the web as XHTML.

BuddyPress, by the way, can make the experience even more natural, not only because it is based around a community of like-minded people writing together  on the same web publishing platform, but also because, with a few tweaks here and there, we can move away from the language of blogs and towards the language of documents.

Enough of BuddyPress on WPMU for now and back to my dissertation. I set up the site in ten minutes, without using FTP or a command line, because I use a host that provides a one-click install of WordPress, and WordPress allows you to search for and install plugins from its Dashboard. Once the site was installed, I made some basic changes to the settings, turning on XML-RPC and AtomPub so that, if I decided to, I could publish to the site from my word processor.[5] I didn’t use this in the end, but trust me, it works very well with recent versions of MS Word, OpenOffice (free) and blogging clients such as Windows Live Writer (free).
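For the curious, this is roughly what a word processor or blogging client sends once XML-RPC is switched on: a `metaWeblog.newPost` call from the MetaWeblog API (which WordPress's XML-RPC endpoint supports), carrying a blog ID, username, password, a post struct and a publish flag. The sketch only builds the request body; the credentials and endpoint are placeholders, and a real client would POST the body to the site's `xmlrpc.php`, ideally over HTTPS.

```javascript
// Sketch of a MetaWeblog API request:
// metaWeblog.newPost(blogid, username, password, struct, publish).
// ASSUMPTIONS: endpoint and credentials are placeholders for illustration.

// Escape the characters XML requires inside text content.
function xmlEscape(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap a Javascript value as an XML-RPC <value>.
function xmlrpcValue(v) {
  if (typeof v === 'boolean') return `<value><boolean>${v ? 1 : 0}</boolean></value>`;
  if (typeof v === 'number' && Number.isInteger(v)) return `<value><int>${v}</int></value>`;
  if (typeof v === 'object' && v !== null) {
    const members = Object.entries(v)
      .map(([k, val]) => `<member><name>${xmlEscape(k)}</name>${xmlrpcValue(val)}</member>`)
      .join('');
    return `<value><struct>${members}</struct></value>`;
  }
  return `<value><string>${xmlEscape(v)}</string></value>`;
}

// Build the request body for one document section.
function newPostRequest(blogId, user, password, title, html, publish) {
  const params = [blogId, user, password, { title, description: html }, publish]
    .map((p) => `<param>${xmlrpcValue(p)}</param>`)
    .join('');
  return '<?xml version="1.0"?>' +
    `<methodCall><methodName>metaWeblog.newPost</methodName><params>${params}</params></methodCall>`;
}

// Browser or Node 18+ (not executed here): send the body, return the raw reply.
async function sendXmlrpc(endpoint, body) {
  const res = await fetch(endpoint, { method: 'POST', headers: { 'Content-Type': 'text/xml' }, body });
  return res.text();
}
```

For example, `sendXmlrpc('https://example.com/xmlrpc.php', newPostRequest('1', 'author', 'secret', 'Introduction', '<p>…</p>', false))` would create an unpublished draft section.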

So, what are the common characteristics of an academic paper? What does WordPress have to support to provide functionality that meets most scholars’ publishing requirements? I scratched my head (and asked on Twitter) and came up with the following:

• footnotes/endnotes
• citations
• use of LaTeX (sciences)
• tables
• images
• bibliography
• annexes
• appendices
• dedication
• abstract
• index to figures
• introduction
• exposition
• conclusion

Many of these are supported in WordPress by default and don’t require any additional plugins (tables, images, sub-headings, annexes, appendices, dedication, abstract, introduction, exposition, conclusion, are all either basic literary conventions or just part of a simply structured document).

My dissertation published using digress.it

After setting this up, I installed a few more plugins:

Dublin Core for WordPress: Automatically adds ten Dublin Core metadata elements to the document mark up.

wp-footnotes: This allows you to easily add footnotes to your document by enclosing your footnote in double parentheses.[6]

OAI-ORE Resource Map: Automatically marks up the document sections with an OAI-ORE 1.0 resource map.

WP Calais Archive Tagger: Analyses your entire document and automatically keywords each section, using the Open Calais API.

Search API: WordPress comes with search built in, but there is a new search API which will eventually make its way into the WordPress core. I’ve installed the plugin to provide full-text search across the document. It can also add Google Search to your document site.

wp-super-cache: This is simple to install and will significantly speed up your document site, making it a pleasure to navigate through and read.
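The double-parentheses convention used by wp-footnotes is easy to picture as a text transformation. The following is a minimal illustrative sketch in Javascript, not the plugin's own (PHP) implementation; the generated markup and id scheme are assumptions.

```javascript
// Illustrative sketch of the ((footnote)) convention; NOT wp-footnotes' code.
// ASSUMPTIONS: the <sup>/<ol> markup and the ref-N / note-N ids are made up.
function renderFootnotes(html) {
  const notes = [];
  // Replace each ((...)) with a numbered superscript link, collecting the text.
  // The match is non-greedy and single-line, so nested (( )) is not supported.
  const body = html.replace(/\(\((.+?)\)\)/g, (match, note) => {
    notes.push(note.trim());
    const n = notes.length;
    return `<sup id="ref-${n}"><a href="#note-${n}">${n}</a></sup>`;
  });
  if (notes.length === 0) return body;
  // Append the notes as an ordered list, each linking back to its reference.
  const list = notes
    .map((text, i) => `<li id="note-${i + 1}">${text} <a href="#ref-${i + 1}">&#8617;</a></li>`)
    .join('');
  return `${body}<ol class="footnotes">${list}</ol>`;
}
```

Because the notes are numbered in order of appearance, the footnote list stays correct however often a section is edited, which is the main attraction of the inline convention.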

#### Plugins I didn’t use

wp-latex: Although I didn’t need it for my dissertation, it’s worth noting that WordPress supports the use of $\LaTeX$.

Academic Citation: You need to add a line of code to your theme for this to display. It supports the concept of an article being a single blog post, rather than a ‘document site’ and displays a variety of citation formats for readers to use.

Do you know of any other plugins for a scholarly blog?

### The Beauty of Feeds

The other useful thing about managing a document using WordPress and in particular, using digress.it, is that you automatically get RSS/Atom feeds for the document. I’ve already discussed these in detail. It means that I was able to read my document in my feed reader, with footnotes and images displayed correctly.

See how nicely the formatting is preserved. $\LaTeX$ is also rendered correctly in feed readers.

You’ll see that the document sections are listed in order; that is, first section on top. Blogs list posts in reverse chronological order (most recent first), so I used Yahoo Pipes to sort the feed items in ascending order. Yahoo Pipes exports RSS, and it’s that feed I subscribed to in Google Reader. Wouldn’t it be nice if I could import my document feed into an Institutional Repository? Wait a minute, I can!

Click to see the item in the repository

When importing the default feed, the HTML output is accurate but in reverse order, while the RSS output from Yahoo Pipes didn’t import into EPrints very cleanly at all. I’ll work on this. UPDATE: Forget Yahoo Pipes. WordPress feeds can be sorted with a switch added to the URL: http://example.com/feed/?orderby=post_date&order=ASC
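A small sketch of applying that switch programmatically, plus a client-side fallback that sorts already-parsed items oldest first. The `orderby`/`order` parameters are the ones given in the update above; the item shape (an RSS 2.0 style `pubDate` string) is an assumption.

```javascript
// Build the ascending-order feed URL from the UPDATE above.
// example.com is a placeholder; a sub-directory site keeps its path.
function orderedFeedUrl(siteUrl) {
  const base = siteUrl.endsWith('/') ? siteUrl : siteUrl + '/';
  const url = new URL('feed/', base);
  url.searchParams.set('orderby', 'post_date');
  url.searchParams.set('order', 'ASC');
  return url.toString();
}

// Fallback when the feed can't be re-requested: sort parsed items oldest first.
// ASSUMPTION: each item carries an RFC 822 date string in `pubDate`.
function sortItemsAscending(items) {
  return [...items].sort((a, b) => Date.parse(a.pubDate) - Date.parse(b.pubDate));
}
```

For example, `orderedFeedUrl('http://example.com')` gives the same URL as in the update.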

So there it is. An academic paper, published to the web using a modern CMS which supports most authoring and publishing requirements. I would favour an institutional WPMU platform for academics to author directly to, publish their pre-print to the web for open access and detailed comment, and import their RSS feed into the repository. As a proof of concept, I’m quite pleased with this. We are currently developing a widget that can be embedded in a web page or WordPress sidebar and allow a member of staff to upload a document or zipped folder of documents to the Institutional Repository. I wonder if we can also support the import of a feed from the widget, too?

So, what would your requirements be? Tell me and I’ll do my best to test WordPress against them.

1. Has anyone done a diff on the two code bases to measure exactly what percentage of the code is shared between WP and WPMU?
2. Actually, I think I’ll save the discussion of its shortfalls for my next post. This one is already long enough.
3. I pay \$5/year for my domain name and as many sub-domains as I need. I pay \$10/month for my hosting with unlimited storage and bandwidth.
4. Like any decent CMS, WordPress supports role-based authoring and editing and maintains a revision history of edits, auto-saved once per minute. Revisions can be compared side by side.
5. On a scholarly WPMU installation, plugins could be pre-installed and activated, a default theme selected and settings tweaked so very little work is required by the academic author prior to writing her document.
6. I am using the plugin on this blog!

/

## Testing new site features with the Amazon Kindle License Agreement

We’re really pleased to help promote the launch of digress.it, the evolution of CommentPress which WriteToReply uses to allow you to comment on document paragraphs.

We’ve been in touch with Eddie Tejeda, the original developer of CommentPress, since March, and have been working with him to find funding for a complete rewrite and re-release of the original CommentPress project. You can read more about digress.it on the community website, but here’s a run down of the new features, a bit of a roadmap for forthcoming features and a shout out to anyone that wants to get involved in the project.

### New Features

The original features of CommentPress can be summarised as follows:

• Paragraph-level URIs
• Paragraph-level commenting
• A scrolling comment box
• Page filters that allow you to read comments by document section or by commenter

CommentPress was a WordPress theme. digress.it is a WordPress plugin and a complete rewrite of the original CommentPress code. It adds the following features:

Floating comment box. You can now resize and position the comment box anywhere on the page.

Highly configurable and accepts different stylesheets.

RSS feeds for comment authors. Feeds for individual comment authors are a first for WordPress.

Paragraph embedding. You can embed a paragraph on your own site. Paragraph content is available as HTML, JSON or TXT.

There will still be bugs: we’re still working on browser compatibility issues with the comment box, for example. This is a first release using the version 2 codebase, and we’d really appreciate your feedback; you can test it by commenting on Amazon’s Kindle License Agreement.

To achieve the objectives of the JISCPress project, we’ll be continuing to fund the refinement of digress.it until November.  The features we’re currently considering can be seen on our UserVoice page (please add more as you think of them). Here are some highlights, specific to digress.it:

• Compatibility with IntenseDebate. This would provide a number of multimedia and reputational features.
• Compatibility with PollDaddy. The ability to include polls in a document would be useful for consultations.
• WCAG Accessibility. Required for use by the Public Sector.
• Compatibility with XML-RPC clients for remote document authoring. Convenient for document authors to publish from MS Word, etc.

## Content Transclusion: One Step Closer

Following a brief exchange with @lesteph last night, I thought it might be worth making a quick post about the idea of content or document transclusion.

Simply put, transclusion refers to the inclusion, or embedding, of one document or resource in another. To a certain extent, embedding an image or YouTube video in a page is a form of transclusion. (Actually, I’m not sure that’s strictly true? But it gets the point across…)

Whilst doing a little digging around for references to fill out this post, I came across a nicely worked example of transclusion from Wikipedia – Transclusion in Wikipedia

The idea? You can embed the content of any Wikipedia page in any other Wikipedia page. And presumably the same is true within any MediaWiki installation.

That is, in a MediaWiki wiki you can embed the content of any one page in any other page.

(I’m not sure if one MediaWiki installation can transclude content from any other MediaWiki installation? I assume it can???)

It’s also possible to include, (that is, transclude) MediaWiki content in a WordPress environment using the Wiki Inc plugin. A compelling demonstration of this is provided by Jim Groom, who has shown how to republish documentation authored in a Wiki via a WordPress page, an approach we adopted in our WriteToReply Digital Britain tinkerings.

One of the things we’ve started exploring in the JISCPress project is the ability to publish each separate paragraph in a document (each with its own URI) in a variety of formats: txt, JSON, HTML, XML. That is, we have (or soon will have) an engine in place that supports the “publishing” side of paragraph-level transclusion of content from reports published via the JISCPress/WTR platform. Now all we need is the transclusion part (re-presentation of the transcluded content) to be able to transclude content from one document in another. (See Taking the Conversation Elsewhere – Embedded Quotes; see also Image Based Quotes from WriteToReply Using Kwout for a related mashup.)

(Hmm, although Joss won’t like this, I do think we need a [WTR-include=REF] shortcode handler installed by default in WTR/JISCPress that will pull paragraph-level content into one document from a document elsewhere on the local platform?)
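To illustrate what such a handler might do, here is a hypothetical sketch of the substitution step. It assumes REF is written as POSTNUMBER:PARANUMBER and that a `lookup` function returns a paragraph's HTML (for instance from the per-paragraph HTML output); a real WordPress shortcode handler would run server-side in PHP.

```javascript
// Hypothetical expansion of the proposed [WTR-include=REF] shortcode.
// ASSUMPTIONS: REF is "POSTNUMBER:PARANUMBER", and lookup(post, para) returns
// the paragraph's HTML or undefined. The source URL reuses the test embed
// pattern (?p=...&digress-embed=...) from the embedded-quotes post.
const INCLUDE_PATTERN = /\[WTR-include=(\d+):(\d+)\]/g;

function expandIncludes(text, lookup, baseUrl) {
  return text.replace(INCLUDE_PATTERN, (match, post, para) => {
    const html = lookup(Number(post), Number(para));
    if (html === undefined) return match; // leave unresolved references visible
    // Wrap the transcluded paragraph so it still points at its source.
    const src = `${baseUrl}?p=${post}&amp;digress-embed=${para}`;
    return `<blockquote class="wtr-include" cite="${src}">${html}</blockquote>`;
  });
}
```

Leaving unresolved shortcodes in place, rather than deleting them, makes broken references easy for an editor to spot.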

Now this is really what hypertext is about – URIs (that is, links) that can act as portals, pulling content into one location from another. It may be, of course, that the idea of textual transclusion is just too confusing for people. But it’s something we’re going to explore with WriteToReply.

One of the things we’re looking at for both WriteToReply and JISCPress is the use of semantic tagging to automatically annotate parts of the document (at the paragraph level, if possible?) so that content on a particular topic (i.e. tagged in a particular way) in one document can be automatically transcluded in – or alongside – a related paragraph in a separate document. (Hmm – maybe we need a ‘related paragraphs’ panel, cf. the comments panel, that can display transcluded, related paragraphs from elsewhere in the document or from other documents?)

PS If you have an hour, here’s the venerable Ted Nelson giving a Google Tech Talk on the topic of transclusion:

Enjoy…

PPS here’s an old library that provides a more general case framework for content transclusion: Purple Include. I’m not sure if it still works though?

PPPS Here’s the scary W3C take on linking and transclusion 😉 This is also interesting: auto/embed is not node transclusion

PPPPS for another take on including content by reference, see Email By Reference, Not By Value, or “how I came up with the idea for Google Wave first”;-)

PPPPPS Seems like eprints may also support transclusion… E-prints – VLit Transclusion Support.

## Image Based Quotes from WriteToReply Using Kwout

One of the things we discussed with respect to embedding WriteToReply/JISCPress quotes in third party applications was whether or not we should support an “imagified” embedding – that is, convert a paragraph to a JPG or PNG image format that can then be easily embedded in the third party site.

The advantage? Even if the third party site disallows script, object or embed tags, it will probably allow img tags…

So for example, extending the range of output formats suggested in Taking the Conversation Elsewhere – Embedded Quotes, we might consider something like an &output=png switch that allows us to construct an image embedding code along the lines of:

`<img src="http://docserver.example.com?p=POSTNUMBER&digress-embed=PARANUMBER&output=png" longdesc="http://docserver.example.com?p=POSTNUMBER&digress-embed=PARANUMBER">`

Once again, there’s a trackback issue, although it’s easy enough to wrap the image tag in an appropriate anchor tag:

`<a href="http://docserver.example.com?p=POSTNUMBER&para=PARANUMBER"><img src="http://docserver.example.com?p=POSTNUMBER&digress-embed=PARANUMBER&output=png" longdesc="http://docserver.example.com?p=POSTNUMBER&digress-embed=PARANUMBER"></a>`
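A sketch of generating that proposed embed code, assuming the hypothetical `&output=png` switch above and the post's own placeholder host. It escapes `&` as `&amp;` inside the attributes, which keeps the markup valid, and adds an `alt` attribute (an addition of mine, not part of the proposal) so the image isn't meaningless to screen readers.

```javascript
// Sketch: build the linked image embed code for one paragraph.
// ASSUMPTIONS: &output=png is only a proposal; the alt text is illustrative.

// Escape a value for use inside a double-quoted HTML attribute.
function attr(s) {
  return String(s).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

function imageEmbedCode(host, post, para) {
  const embed = `http://${host}?p=${post}&digress-embed=${para}`;
  const link = `http://${host}?p=${post}&para=${para}`;
  const img = `<img src="${attr(embed + '&output=png')}" longdesc="${attr(embed)}" alt="Paragraph ${para}">`;
  // Wrapping the image in a link gives readers (and linkback tools) a route back.
  return `<a href="${attr(link)}">${img}</a>`;
}
```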

However, this facility was seen as non-essential, so I looked on the web for a solution – and found it in the form of the kwout API, which can be used to generate an image-based representation of the text found in a specified div tag (by ID) on a given web page, which can then in turn be embedded in an arbitrary web page. Although the image may be hard to read, this can work to our advantage: it might drive traffic back to the site that originated the quote.

The following Javascript snippet uses the Kwout API to generate an image-based representation of a single paragraph from a WriteToReply republished document:

In the API call, “contentblock_10” is the id of the block element to be quoted. Here’s what the kwouted image looks like:

And here’s the original paragraph on WriteToReply:

Note that the link that the kwout script generates is back to the page in the above case, so to link back to the actual paragraph we’d need to specify this in the link:

As a step on the road to full integration (a use of the Kwout API which may or may not be in line with the stated terms and conditions? I don’t know, I haven’t read them…!) is this bookmarklet that should let you highlight a paragraph number on a WriteToReply document, and then take you straight to the Kwout embed page for that paragraph:

Actually, that looks a little cluttered, and the usability is a little off. So a better solution may be to suggest that the user clicks on the paragraph link to get the “paragraph in focus” page, and then clicks on the following bookmarklet:

(What this does is pull the paragraph identifier out of the URI and then construct the Kwout API call out of it as a result.)

Or if you want the link to go to the “paragraph in focus” page, rather than the top of the page:

(Note that neither of these bookmarklets is ideal – a production stable bookmarklet should be able to cope (or fail gracefully) with the lack of hash separated paragraph identifier in the URI.)
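Here is a sketch of the identifier-extraction step with that graceful failure built in. It assumes the paragraph identifier is the number at the end of the URI fragment, and that it maps onto a block id of the form `contentblock_N` following the `contentblock_10` example; the Kwout call itself is deliberately left out.

```javascript
// Sketch of the bookmarklet's core step, failing gracefully.
// ASSUMPTIONS: the paragraph number ends the URI fragment (e.g. ...#10) and
// maps to the block element id "contentblock_N".
function paragraphBlockId(uri) {
  let url;
  try {
    url = new URL(uri);
  } catch (e) {
    return null; // not a valid absolute URI
  }
  const match = /(\d+)$/.exec(url.hash.slice(1));
  return match ? `contentblock_${match[1]}` : null;
}

// Bookmarklet body (browser only): explain instead of building a bad call.
function runBookmarklet() {
  const id = paragraphBlockId(window.location.href);
  if (id === null) {
    alert('Please open a single paragraph (a URI ending in #<number>) first.');
    return;
  }
  // ...construct the Kwout API call for `id` here (API details not shown in this post).
}
```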

Hmm, maybe we need a “labs” area on WriteToReply where we can collect these micro-utilities?

## Taking the Conversation Elsewhere – Embedded Quotes

As part of the JISCPress effort, one of the things we’ve been considering is the granularity of appropriate “consultation elements” or “discussion elements”, those pieces of content that people might actually want to reference, question or chat around as compared to a whole 200 page document, for example.

The page and paragraph levels fall out of the CommentPress theme (and its descendants) quite naturally – WordPress gives us the page level (along with a single item RSS feed at the page level), and the theme gives us URIs at the paragraph level.

(Hmmm… I wonder – would it also be useful to provide a multi-item RSS feed, at the page level, with a separate item for each paragraph on that page? Or do we do that already?!)

In many cases, the paragraph level seems to be the most natural chunk for discussion, particularly in an ongoing conversation about a particular document. So a major question for us is how to put those paragraphs to work?

One of the features that Eddie’s been working on as part of the JISCPress project is the ability to embed paragraphs from a document in a third-party web page. This feature will allow us to increase the surface area of the document by allowing third parties to re-present that content elsewhere, whilst also (hopefully) providing a means to link that external conversation directly back to the original document.

So what benefits does embedding have to offer to:

a) the person grabbing and using the embed code;
b) the publisher/whoever’s running the consultation from which the embed code was grabbed

In a discussion on the JISCPress group, Joss suggested the following:

For the user:

1. More portable transformation of document content into raw data.
2. Personalisation, presentation and ‘ownership’ of documents within their own publishing environment (which is one of the benefits of slideshare/scribd).
3. Direct joined up quoting rather than copying. More aligned with the ideals of the web and linking data. This could also be a benefit to publishers concerned about unattributed copying.

For the publisher:

1. Greater possibilities of content dissemination
2. Greater potential of attracting engagement via trackbacks
3. Further possibility of using JISCPress as an underlying ‘document store’ where authoring, dissemination and engagement occurs mostly remotely via XML-RPC, syndication, embeds and trackbacks.
4. Possibility of site analytics being hooked into embeds so that their reach is measurable? (Analytics can track document types, but I’m not sure whether they can be used to track embeds…)

So where are we at? Embedding is currently in testing and has the following mechanic. Hovering your mouse cursor over one of the paragraph numbers in a document raises a floating panel that contains a link to the current paragraph, and an embed code. (The panel remains open whilst the cursor is over it, so you can easily grab a copy of the code.)

Using the embed code in a third party page embeds the corresponding paragraph in that page.

For testing purposes, the pattern we are using for the embed URL is of the form:

http://docserver.example.com?p=POSTNUMBER&digress-embed=PARANUMBER

The POSTNUMBER identifies the actual page (i.e. http://docserver.example.com?p=POSTNUMBER is a valid page URI) and the PARANUMBER identifies the paragraph to be embedded. Note that this is subject to change.
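The embed URL pattern above can be sketched with the standard `URL` API: one helper builds an embed URL from a post number and a paragraph number, and the other recovers the page URI and paragraph number from it. The helper names are hypothetical, and since the pattern is subject to change these are illustrative only.

```javascript
// Sketch of the test embed URL pattern:
//   http://docserver.example.com?p=POSTNUMBER&digress-embed=PARANUMBER
function buildEmbedUrl(base, postNumber, paraNumber) {
  const url = new URL(base);
  url.searchParams.set("p", String(postNumber));
  url.searchParams.set("digress-embed", String(paraNumber));
  return url.toString();
}

// Recover the page URI (?p=POSTNUMBER) and the paragraph number.
// Returns null if either parameter is missing.
function parseEmbedUrl(embedUrl) {
  const url = new URL(embedUrl);
  const post = url.searchParams.get("p");
  const para = url.searchParams.get("digress-embed");
  if (post === null || para === null) return null;
  const page = new URL(url.origin + url.pathname);
  page.searchParams.set("p", post);
  return { pageUrl: page.toString(), post: Number(post), para: Number(para) };
}
```

`URL` normalises the bare host to `http://docserver.example.com/`, so the generated URI has a slash before the `?`; for HTTP the two forms are equivalent.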

Unfortunately, the simple embed strategy does not trivially generate a linkback (such as a trackback or pingback) to the original document. For these reverse links to be generated automatically, an actual anchor tag linking back to the original page must be present in the page creating the linkback. One commonly used strategy for achieving this is to provide an embed code of the form:

<div>
<object />
<a>Quoted from etc…</a>
</div>

That is, a link is explicitly included in the embed code, although it is easy enough for the person embedding the quote to strip that anchor tag out.
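To make the idea concrete, here is a sketch of an embed-code generator that wraps the embedded object in a `div` with an explicit anchor back to the source page, following the template above. The `data`/`type` attributes on the `object` element and the function names are assumptions, not the actual JISCPress embed code. Values are HTML-escaped because a URI containing `&` would otherwise produce invalid markup.

```javascript
// Escape the characters that matter inside HTML text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical generator: embedded object plus an explicit anchor back to
// the original page, so the embedding page contains a real link.
function buildEmbedCode(embedUrl, pageUrl, title) {
  return [
    "<div>",
    `<object data="${escapeHtml(embedUrl)}" type="text/html"></object>`,
    `<a href="${escapeHtml(pageUrl)}">Quoted from ${escapeHtml(title)}</a>`,
    "</div>",
  ].join("\n");
}
```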

(Although it complicates matters, because the embedded object is pulled from the document server, I guess we could, in principle, generate a linkback by observing the referrer page URIs on requests the server receives for particular embeddable objects, and checking those against the current list of trackbacks? Or maybe the embedded object could make an XML-RPC call back to the trackback server itself whenever the page it is embedded in is loaded? [Note to self: can we easily get analytics on third-party embeds?] I think Eddie is working on this, so I won’t embarrass myself further by wittering on about things I don’t know anything about! ;-)
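The referrer idea could work roughly like this, assuming the server logs each request for an embeddable object together with its referring page: group the referrers, drop any that already appear in the trackback list, and what remains are candidate linkbacks. The log-entry shape (`referrer`, `embedUrl`) and the function name are hypothetical; this is a sketch of the idea only, not what Eddie is building.

```javascript
// requests: hypothetical log entries { referrer, embedUrl }, one per request
// the document server received for an embeddable object.
// knownTrackbacks: URIs already recorded as trackbacks.
// Returns referring pages not yet recorded, with the embeds each one used.
function candidateLinkbacks(requests, knownTrackbacks) {
  const known = new Set(knownTrackbacks);
  const found = new Map(); // referrer -> Set of embed URLs
  for (const { referrer, embedUrl } of requests) {
    if (!referrer || known.has(referrer)) continue; // no header, or already known
    if (!found.has(referrer)) found.set(referrer, new Set());
    found.get(referrer).add(embedUrl);
  }
  return [...found].map(([referrer, embeds]) => ({
    referrer,
    embeds: [...embeds],
  }));
}
```

Referrer headers are not always sent, so this would only ever catch some embeds; entries without one are skipped.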

Note that a similar problem arises when using a Javascript (<script> tag) based embed code: there is no explicit anchor link present. Script tags also have the additional problem that they are often sanitised (i.e. stripped out of web pages) by institutional web publishing systems. (In some circumstances, a workaround for the institutional case may be possible. For example, if a variant of WTR/JISCPress was running as a white-label solution in an institution, a shortcode plugin could be provided that allowed authors to embed paragraphs from documents in that environment within other documents in the same environment. See the WordPress shortcode API for more details.)

As well as the straightforward embed code, we’ve also been considering other ways in which paragraph level content can be published so that third parties have convenient access to it in a format that is appropriate for their needs.

And this is what we came up with – an output switch that can be appended to the end of a paragraph URI that allows the paragraph level content to be published in a variety of formats:

• &output=html
• &output=txt
• &output=js
• &output=json
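A sketch of appending the output switch to a paragraph URI, using the formats listed above. Going through `URL.searchParams` rather than string concatenation keeps the query string well-formed and places the switch before any `#` fragment. The function name and the error on unknown formats are assumptions.

```javascript
// Output formats from the list above; extend if more formats come on stream.
const OUTPUT_FORMATS = ["html", "txt", "js", "json"];

// Append (or replace) the &output= switch on a paragraph URI.
function withOutput(paragraphUri, format) {
  if (!OUTPUT_FORMATS.includes(format)) {
    throw new RangeError(`Unsupported output format: ${format}`);
  }
  const url = new URL(paragraphUri);
  url.searchParams.set("output", format);
  return url.toString();
}
```

In a browser, the JSON variant could then be fetched with `fetch(withOutput(uri, "json"))`, although a page on another domain would need the document server to permit cross-origin requests.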

As and when these come on stream, we’ll publish use-case examples for each of them.

## WordPress Multi-User Case Study

Here’s an interesting and relevant case study of the use of WordPress Multi-User as a CMS for a non-profit company. They list benefits such as lower development costs, consistency across sites, flexibility, speed of design, ease of use and increased participation, both by content creators and by the public through the use of social features.

## Agile methodologies and open source development

In the course of writing the JISCPress Project Plan, I’ve been thinking again about our project methodology. The original funding call asked for projects to adopt an agile methodology like SCRUM or XP, which I am familiar with. We attempted to use XP while I was working at Amnesty International (not long after half the IT department were trained in Prince2!) and like any methodology, it was used in part rather than in whole. We collected user stories, held five-minute stand-up meetings each day and released often and iteratively so that users could give feedback on the product. ((It may look like an empty repository but over 20,000 assets are available to logged-in Amnesty staff.))

The JISCPress project has four team members, including myself. None of us work on JISCPress full-time, having other work and study commitments. Equally, none of us work together in the same department and only Alex and I work in the same university. Alex is a student of computing at Lincoln, Tony lives on the Isle of Wight and Eddie lives in San Francisco. In addition to this, we’re working wholly with existing open source software (WordPress) that is openly developed, and it has never been an option, in my mind, to enjoy the benefits of that community without attempting to contribute back using the same transparency of process. It was also proposed in our funding bid that “the project will seek to promote openness and collaboration from the point of bid announcements onwards.” By this, I was thinking in terms of the open source development process I have seen with WordPress and other projects, where asynchronous discussion and contributions take place through mailing lists, IRC, a code repository, an issue tracker and a wiki.

Reading the excellent OSS Watch website, I came across a page about the sustainability of projects and open development, and was particularly interested to read a quote from Gianugo Rabellino, CEO of SourceSense:

“If you think that one of the key ideas of agile is the unity of time and location – you need to be in the same place at the same time and doing a lot of discussion face-to-face – and then you have open development which is based on asynchronous, distributed working etc., then it looks like oil and water – they don’t mix”.

This is what I’ve been thinking recently, too. It’s not that they are wholly incompatible methods of developing software, but from what I know about agile methods, there is an assumption that the developers are working together in the same physical location, focused intensively on the same client driven product.

“Scrum enables the creation of self-organizing teams by encouraging colocation of all team members, and verbal communication across all team members and disciplines that are involved in the project.” ((Wikipedia: SCRUM))

Frankly, this way of working is impossible for us. On the other hand, projects that are openly developed often don’t have clients but instead have ‘communities’ of users. They rarely have short code sprints; instead, they have open version-controlled repositories that allow anyone to test the code at any time. It’s worth noting that WordPress recently held a code sprint, but given the size of the community, there were relatively few contributions. Many contributors work asynchronously and have other commitments over the course of their day, volunteering their time and effort when they can.

Likewise, JISCPress is intended to serve a community rather than a single client. We hope that it is the JISC community who lead the direction of the project through testing and feedback and who eventually benefit the most from the project. Beyond the JISC community, there is the wider community of users of WordPress and CommentPress who will likewise benefit from the project.

Ross Gardler, OSS Watch manager, describes the Open Development Methodology (ODM) as “a way for distributed team members to collaboratively develop a shared resource in a managed and sustainable way.” The ODM is characterised by:

1. User engagement
2. Transparency
3. Collaboration
4. Agility
5. Sustainability

Agility and user engagement are also found in SCRUM and XP, but there is no requirement in these methodologies to be transparent, to be sustainable beyond the client’s specific use for the product, or to cater for a diverse group of asynchronous contributors.

With this in mind, I will continue to learn about and pursue an open development methodology for JISCPress because it is appropriate for our project. The project is already part of an existing (WordPress) open development community and we have, from the start of WriteToReply and then the #jiscri call, placed a great deal of emphasis on openness and transparency of process.

It is too early in the project to measure the effectiveness of this approach. Eddie and Alex only joined us in the last few days and we’re still setting up the basic platform to work with. I have noticed that the use of IRC has not taken off despite my fondness for it. This is partly because all of us use GMail and tend to use Google Chat for quick conversations when we see we’re online at the same time, rather than keeping an IRC client open. Tony and I have an established way of communicating with each other over Twitter, which is public but a poor way of establishing context for the project, as Twitter doesn’t archive tweets long-term and searching for anything seems to be hit and miss. I would like to establish weekly IRC meetings soon, though. There is also the issue of working in a significantly different time zone from Eddie. IRC is for synchronous chat, and when Eddie is at work, the rest of us are thinking of sleep. Eddie is talking about visiting the UK for a few days (paid for out of his own pocket), and I hope that the four of us, and anyone else who is interested, will meet up for a day’s discussion and development.

There are clearly still things to be worked out and a routine to establish that works best for us, but I am keen that if a methodology is to be identified for the project, it is one of ‘open development’ rather than ‘agile’. I intend to devote a lot of my time on the project to ensuring that the wider WordPress community are aware of what we are doing and that they are welcome to contribute in any way they can. I shall write more about how we are addressing Ross’ five characteristics of an Open Development Methodology and am keen to hear from anyone who has an opinion on any of this, including members of the JISCPress team, who I haven’t consulted before writing any of this.


## CommentPress is not Marginalia, after all…

We blogged too soon. People have pointed out that the name ‘Marginalia’ is already in use for a similar project. We’ve been in touch with Eddie and he’s obviously going to switch names for his project (again). Sorry Geof!

## An introduction to JISCPress

Here’s a 50 second introduction to JISCPress, made for the JISC Information Environment programme meeting on July 7-8th.

## Project SWOTing

One of the JISCRI project reporting requirements is a SWOT analysis of each project. It makes sense to attempt our first SWOT analysis sooner rather than later and update it using the comment form as we work through the project. Your comments are very welcome. Surely a bad SWOT analysis is one undertaken by a single individual (like this one!).

I think one of the main strengths of the JISCPress project is that we’ve effectively been developing it since February, when Tony and I set up WriteToReply. JISCPress is basically a re-thinking, re-working and further development of WriteToReply for a specific community, and we can apply the lessons learned through WriteToReply to the planning and development of JISCPress. In this sense, we’ve really got a decent head start, and both Tony and I know where our own strengths, weaknesses and particular interests in the project lie.

Another strength worth highlighting is the range of skills we bring to the project. I’ve been running different WordPress MU installations here at the University of Lincoln for the last year and have several years’ experience tinkering with Linux servers. I’m finding working on AWS to be an enjoyable and welcome learning experience. Tony prefers to stay away from the bash command line, instead focusing on the way the data published on JISCPress can be repurposed, cross-referenced, syndicated and mashed up with other web services. He’ll also be looking at what value can be gained from the Google Analytics and Piwik APIs.

Anyone who knows a bit about CommentPress (now called ‘Marginalia’) will understand that Eddie brings an excellent understanding of WordPress code to the project as well as some pretty advanced Javascript skills. Eddie has always led the development of CommentPress/Marginalia, and as it provides core JISCPress functionality such as paragraph-level URIs and commenting, it’s a strength of the project that we have him on board. Were he not on board and we were reliant on another developer to work on CommentPress code, I would consider this a risk to the project. To quote from his site:

I believe in rapid prototyping, open-source, collaborations instead of competition, quick releases, small teams, debate, creative thinking, and transparency.

That’s exactly the type of person we need to work on this project.

Finally, Alex is a keen student of computing at the University of Lincoln with good PHP skills. As a student, he’s flexible with his time and not wholly reliant on the project for his income and it’s reassuring to have him working locally (and in the same time zone!). All in all, I think we’ve got a good spread of skills and interests on the team.

While I’m thinking of strengths, I’m confident that using Amazon’s infrastructure to work on the project will prove to be a strength. It allows us to work on the project in an environment that is independent of the University of Lincoln’s IT infrastructure. I’m very lucky here at Lincoln to have root access to my own Linux server to work on, despite not being a member of the ICT department. I’ve no complaint at all about our ICT department and enjoy working with them, but on a rapid project like JISCPress, with four team members working independently and the potential for contributions from the open source community, I’m pleased that we have our own space to work and I don’t need to bother my IT colleagues to restart the virtual machine or make changes to DNS records.

On the other hand, the membership of the team could be seen as a weakness. We’re not a tight team of developers working in the same institution but rather relative strangers working, for the most part, remotely and, in Eddie’s case, in a considerably different time zone. This could result in poor communication and lack of motivation if we let it, and I hope that the pillars of communication in open source projects that we’ve set up (IRC, mailing list, code repository, wiki, blog) will help us stay in touch and motivated. However, one of the main benefits of this project for both me and my employer is being able to test this way of working on development projects. We don’t have an in-house team of web developers who could be pulled into this project and, as much as I’d like my department to hire a researcher/developer or two, it’s not going to happen. So in order to work on JISCRI and similar funded projects, I need to show that this is an effective way of working. I hope it succeeds, because I like working on these types of projects, and in order to innovate in our use of technology to support research, teaching and learning, we need to have the experience and capacity to undertake proper R&D and not just theorise about the potential of technology in the HE sector.

A threat to the project is that JISCPress is principally a tool for JISC document authors to publish funding calls and JISC project managers to publish their final reports. We need their buy-in to the project, not only to make it feel worthwhile but also to steer the direction of feature development. JISCPress might be seen as complicating JISC employees’ work, pushing something on them that they never asked for. It might also be seen as yet another requirement from JISC to Project Managers. I take this threat seriously, but I don’t let it worry me too much. JISC has made the decision to fund JISCPress as a ‘demonstrator prototype’ and there’s no obligation for them to put it into production use. They also recognise that we’re building a platform that could equally be of value to other organisations. WriteToReply and JISCPress are just two examples of what we’re developing. WordPress is a popular CMS, and the work on Marginalia and the additional features that we’ll be developing can be cherry-picked or taken wholesale and put to good use. All code is developed under a GPL or compatible license. (Note that this has to be the case, because we’re developing for WordPress, which is licensed under the GPL, and calling functions in WordPress core code – not all WP plugin and theme developers understand this!)

Finally, for now, the project provides opportunities for anyone to get involved and, in turn, by working in public on an open source project, I hope we’ll attract others who like what they see and want to contribute in any way at all. Comment, test, review and contribute code, if you can. Join the mailing list and introduce yourself. Working this way, with an emphasis on openness and transparency, I hope that opportunities arise that we don’t yet know about. One that we do know about is Google Wave, due out in September, and if we keep that in the back of our mind, there might be an opportunity to exploit this new and exciting platform and protocol. Maybe we’ll develop a JISCPress gadget for Wave that allows realtime comment and discussion on a document from Wave? Maybe JISCPress will largely become a ‘hidden’ CMS that is used exclusively via publish and subscribe protocols such as RSS, AtomPub, XML-RPC and Wave/XMPP?


## Setting up JISCPress on Amazon Web Services

I’ve spent the last couple of days – about 13 hrs altogether – setting up JISCPress on Amazon Web Services (AWS). Prior to yesterday, I’d not really used AWS except for setting up the command line tools and starting and stopping a server, known on EC2 as an instance (each instance is launched from an Amazon Machine Image, or AMI).

It’s gone pretty well and I’ve documented the outline of the process I went through on the JISCPress wiki. I used a combination of the Amazon Management Console and the command line tools to work on the Elastic Compute Cloud (EC2). To create ‘buckets’ on the Simple Storage Service (S3), I used the S3 Firefox plugin. There are a lot of third-party tools to interface with both EC2 and S3.

I work on virtual servers both at the university and on Slicehost, where WriteToReply is hosted. A server on AWS is a different kind of virtual server, which takes a little while to understand, but the documentation is good and I pretty much followed the suggested workflow.

In addition to using EC2 and S3, I am also using an Elastic IP address and the Elastic Block Store (EBS). The Elastic IP address is convenient in that it allows you to ‘own’ a static IP address that you can bind your DNS A record to – in our case jiscpress.org. Without the Elastic IP address, the instance’s IP address is lost when you terminate it (turn it off), so you have to change the DNS record and wait for DNS to re-propagate. It’s ‘elastic’ because while it is a persistent, external IP address, you can hot-swap the machines that use it. That was my first lesson.

My second lesson was that there are a number of alternative, trusted public AMIs to choose from. I don’t really mind which flavour of Linux I use, and when looking at the AMIs that Amazon provide, I saw that only Fedora 8 was available, so I chose that. As it happens, there are also Ubuntu AMIs from Canonical, but I didn’t know that at the time. Instead, I upgraded Fedora 8 to Fedora 11, which wasn’t too much trouble, but had I known, I’d have just chosen an Ubuntu image from Canonical as they are more up to date. I also learned that despite upgrading the machine, the kernel remains the same. Amazon build kernels for their service and you can build kernel modules from the sources which they provide if you need to.

My third lesson was that although you can reboot an instance just as you can reboot any virtual server, if you ‘terminate’ or turn off the instance, you appear to lose all data that has been created since you created the AMI. In my case, that wasn’t much, because I had wondered about the persistence of data and had created my AMI after I’d got a basic web server with WordPress MU set up. But it’s really worth noting this: not only might you want to turn the machine off to save on running costs, but there’s also the chance that it might unexpectedly go down and you’d lose your work. This underlines how AWS is meant to be used. Machines are cheap and replicable; use S3 and EBS for data you care about. Using a single machine for both production and development is never the right way to work long-term, but with a decent back-up strategy, it should work fine for us.

This led me to sort out backups, and I set up rsync to back up /var, /home, /root and /etc to an Elastic Block Store volume. An EBS volume is a virtual block device (i.e. a hard disk) which you can format and then attach and mount on your instance. So I’ve got rsync backing up to /mnt/data.
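The backup step can be sketched as a small Node script that shells out to rsync with the directories listed above. The function names are hypothetical, and the sketch assumes rsync is installed and the EBS volume is already mounted at /mnt/data; in practice it would be run as root from cron.

```javascript
const { spawnSync } = require("child_process");

// Directories backed up and the EBS mount point, as described above.
const SOURCES = ["/var", "/home", "/root", "/etc"];
const DEST = "/mnt/data/";

// -a (archive) recurses and preserves permissions, timestamps and (as root)
// ownership. With no trailing slash on a source, rsync copies the directory
// itself, so /var ends up as /mnt/data/var, and so on.
function rsyncArgs(sources, dest) {
  return ["-a", ...sources, dest];
}

// Run the backup; returns rsync's exit status (0 on success).
function runBackup() {
  const result = spawnSync("rsync", rsyncArgs(SOURCES, DEST), { stdio: "inherit" });
  if (result.error) throw result.error; // e.g. rsync not installed
  return result.status;
}
```

Without `--delete`, files removed from the server stay in the backup, which errs on the side of keeping data.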

Getting the domain name and DNS sorted out was very simple. I registered jiscpress.org via Dreamhost ($10/yr) and then used a free UK-based DNS host to host the record. WordPress MU can run using either sub-directories (e.g. http://jiscpress.org/site1) or sub-domains (e.g. http://site1.jiscpress.org). On the whole, it’s better to set up wildcard DNS and go with sub-domains, which I did by simply adding an A record for *.jiscpress.org pointing at the Elastic IP address, and adding ‘ServerAlias *.jiscpress.org’ to the <VirtualHost> section of the Apache config file.
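To illustrate the difference between the two layouts, the sketch below works out which site a request is for, given the base domain. It is a hypothetical illustration of the URL shapes only, not WPMU's own logic, which looks sites up in its database (so, for instance, a sub-directory that is really a page of the main site is resolved there rather than by string matching). With wildcard DNS and the `ServerAlias` in place, every `*.jiscpress.org` host reaches the same Apache virtual host, leaving this mapping to WordPress.

```javascript
// Hypothetical illustration: which site does a request belong to under each
// WPMU layout? Returns the site name, or null for the main site / other hosts.
function siteFromRequest(href, baseDomain, mode) {
  const url = new URL(href);
  if (mode === "subdomain") {
    const host = url.hostname;
    if (host === baseDomain) return null; // main site
    const suffix = "." + baseDomain;
    if (!host.endsWith(suffix)) return null; // not one of ours
    return host.slice(0, -suffix.length); // e.g. "site1"
  }
  if (mode === "subdirectory") {
    if (url.hostname !== baseDomain) return null;
    const first = url.pathname.split("/")[1]; // first path segment
    return first ? first : null; // "" means the main site
  }
  throw new RangeError(`Unknown mode: ${mode}`);
}
```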

Finally, it looks like sending mail from an EC2 instance is not as simple as you might expect. This is because the hostname for your machine is provided dynamically when you launch it and can’t be changed, which means you can’t add a PTR record in your DNS and therefore can’t set up reverse DNS. Without this, most mail hosts, such as Hotmail or Yahoo, will treat mail from your server as spam. So far, Google is treating mail from the server as ‘neutral’ and letting it through. The simplest way around this is to relay your mail through an external mail relay, which a lot of AWS users appear to be doing. For the time being, I’m not too worried about this, but I may have to do more work on it if we find that mail is regularly failing to get through.

I’ve enjoyed the learning process of setting JISCPress up on AWS. I’ve only really scratched the surface of what the platform offers, but once I’d got my head around how the different services work together, it seems pretty straightforward. The basic machine (1.2 GHz 2007 Opteron or 2007 Xeon processor, 1.7 GB RAM, 160 GB storage) feels very, very fast, as does the network it’s running on. WPMU can be pretty resource hungry, but for the purposes of our project, I think this will be sufficient.

Anyhow, http://jiscpress.org is now live and running a bare WPMU install. I’ll refine it over the next week in preparation for Eddie and Alex to begin work in early July.

If you’ve got experience working on AWS and can clarify or correct any of my assumptions, please do. I get the feeling that now JISCPress is in the cloud, I need to relax a bit and enjoy the flexibility of the platform and learn more about what it has to offer.

## JISCPress: A document discussion platform

We’re very pleased to announce that JISC have agreed to fund JISCPress, a six-month, £32,500 project led by the University of Lincoln, in partnership with the Open University and based on WriteToReply. JISCPress will provide a scalable community platform for publishing and discussing project calls and final reports, in order to support the grant bidding and project dissemination processes.

As you may know, WriteToReply is run in our spare time – lots of late nights and busy lunchtimes. Since launching the re-publication of the Digital Britain – Interim Report, we’ve been looking for ways to bring benefits from our work on WriteToReply into the Higher Education community where we work. JISC fund much of the UK development and innovation in the use of ICT in teaching and research, and in March they announced their Rapid Innovations funding call.

We quickly re-published the call on WriteToReply to demonstrate the benefits of publishing funding calls in this way and then went on to submit a bid which proposed a community platform for the JISC funding call process, based on our experience of setting up and running WriteToReply. As with WriteToReply, this will be an open, public project and all documentation and code will be available under open licenses.

JISCPress is a platform aimed at people working in UK Higher Education, but the platform itself could be easily adapted for other uses, just as WriteToReply is primarily focused on government consultation documents. The final platform will be available as an Amazon Machine Image, so anyone will be able to host their own multi-document discussion platform with all the benefits you see on WriteToReply plus the additional features we’ll be developing throughout this project. We’re already advocating the use of the platform in our own universities for the open (and closed) discussion of institutional strategies, for the critique of texts by students and for peer-review of research papers. What might you use it for?

Over on the JISCPress project blog, you’ll find links to a mailing list, wiki and code repository. Feel free to join us if this WriteToReply spin-off appeals to you. If you know anyone who might be interested, please do let them know.

You’re probably already aware that WriteToReply uses WordPress Multi-User and CommentPress. Eddie Tejeda, the developer of CommentPress, will be working with us on the project, and this will result in significant further development of CommentPress 2. So, if you’re interested in WPMU and CommentPress (as many people are), please consider following, contributing to and testing JISCPress.

We should also note that while the project is a spin-off of our work on WriteToReply, neither Tony nor Joss is personally receiving any funds from JISC. The contributions from JISC to cover our time on this project are paid directly to our employers and do not result in any financial benefit to us or WriteToReply (which is in the process of being formalised as a non-profit business). In other words, while WriteToReply is a personal project, JISCPress is part of our normal work as employees of our universities (both Tony and I are expected to routinely bid for and win project funds – you get used to it after a while!). Money has been allocated to fund dedicated developer time on the project, which will pay Eddie and Alex, a student at the University of Lincoln, for their work as freelancers.

Anyway, on with the project! Here’s the outline from our original bid document:

This project will deliver a demonstrator prototype publishing platform for the JISC funding call and dissemination process. It will seek to show how WordPress Multi-User (WPMU) can be used as an effective document authoring, publishing, discussion and syndication platform for JISC’s funding calls and final project reports, and demonstrate how the cumulative effect of publishing this way will lead to an improved platform for the discovery and dissemination of grant-related information and project outputs. In so doing, we hope to provide a means by which JISC project investigators can more effectively discover, and hence build on, related JISC projects. In general, the project will seek to promote openness and collaboration from the point of bid announcements onwards.

The proposed platform is inspired and informed by WriteToReply, a service developed by the principal project staff (Joss Winn and Tony Hirst) in Spring 2009 which re-publishes consultation documents for public comment and allows anyone to re-publish a document for comment by their target community. In our view, this model of publishing meets many of the intended benefits and deliverables of the Rapid Innovation call and Information Environment Programme. The project will exploit well understood and popular open source technologies to implement an alternative infrastructure that enables new processes of funding-related content creation, improves communication around funding calls and enables web-centric methods of dissemination and content re-use. The platform will be extensible and could therefore be the object of further future development by the HE developer community through the creation of plugins that provide desired functionality in the future.

Subject to user requirements, our planned project deliverables are:

• A WordPress Multi-User based platform for authoring and publishing JISC funding calls in a form that allows paragraph-level comment and discussion either locally or remotely.
• A meta-site that aggregates all document data into a single site for search, navigation by categories and tags and can syndicate searches, tags and categories.
• Develop CommentPress to meet WCAG 2.0 accessibility guidelines, meeting public sector requirements.
• Evaluation and integration of “related content” utilities to dynamically link related project calls and reports based on content and/or semantic analysis.
• Evaluation and possible integration of remote, realtime messaging services such as Twitter and XMPP integration.
• Evaluation and possible integration of enterprise authentication services such as LDAP and Shibboleth.
• Evaluation and possible integration of OpenCalais, a semantic tagging service.
• Documentation on how to exploit the benefits of AWS and clone the project instance for other uses.
• A documented suggested workflow for document authors
• Documented examples of how to fully exploit the platform for data extraction and syndication.
• Documented ‘user stories’ for the JISC funding call process.

If this sounds interesting, please do take a look at the full project proposal and join us on the mailing list.

