WebCite

WebCite® Consortium FAQ

As an individual (scholar/student/researcher), do I have to become a member of the consortium in order to use WebCite®?

No formal membership or even registration is required. For individual scholars who want to cite and archive (= "to webcite®") a web document, this is a free and open service, and you can use WebCite® right away, for example by installing the bookmarklet or going to the archive form, without being a member.

Why should editors, publishers, and libraries join the WebCite® consortium?

Webpages or websites which are cited in scholarly articles are presumably important documents worth preserving for readers, yet due to the volatility of the World Wide Web they are at risk of disappearing as a reference for future scholars. Adding WebCite® links to cited URLs increases the likelihood that the cited work remains accessible and ensures that readers have access to exactly the version as seen by the citing author.

How does this work?

The archiving process is simple: WebCite® takes a snapshot of a cited webpage and stores a copy of the HTML including images (or any other files, for example PDF) on the webcitation server.

The caching (archiving) process can be initiated prospectively (before publication) by the author or the editor, copyeditor or publisher at the time he/she authors, edits, or publishes the citing document. WebCite® also has a crawler capability, allowing retrospective (after publication) archiving. Here, the publisher would submit XML files of already published articles (or point us to a website where WebCite® can access the full text articles). The WebCite® crawler will then go through the references and attempt to retrospectively archive cited webpages, although there is no guarantee that these will still be available and that what is being archived is what the author saw when he/she cited the webdocument. We therefore urge authors, editors, and publishers to prospectively archive webcitations when they write or edit articles.
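
To illustrate the retrospective pathway, here is a minimal sketch of how cited URLs could be pulled out of an article's reference list. It is not the WebCite® crawler itself: the XML layout (`<ref>` elements with an `id` attribute, an optional `<uri>` child), the sample references, and the function name `extract_cited_urls` are all assumptions made for illustration, and real publisher XML will differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical article XML; the element names are illustrative assumptions,
# not a format prescribed by WebCite.
SAMPLE_XML = """
<article>
  <ref-list>
    <ref id="r1">Smith J. Report on web archiving. http://example.org/report.html (accessed 2007-01-15).</ref>
    <ref id="r2">Doe A. A journal article. J Example 2006;1:1-10.</ref>
    <ref id="r3">Survey instrument. URL: <uri>https://example.com/survey?id=7</uri>.</ref>
  </ref-list>
</article>
"""

# Deliberately simple pattern: a URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_cited_urls(xml_text):
    """Return (reference id, URL) pairs for every URL found in a <ref> element."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for ref in root.iter("ref"):
        # Join the text of the reference and all of its child elements.
        text = "".join(ref.itertext())
        for match in URL_PATTERN.findall(text):
            # Drop punctuation that belongs to the sentence, not to the URL.
            found.append((ref.get("id"), match.rstrip(".,;)")))
    return found


if __name__ == "__main__":
    for ref_id, url in extract_cited_urls(SAMPLE_XML):
        print(ref_id, url)
```

References without a URL (such as `r2` above) are simply skipped. A real crawler would then attempt to archive each URL it found, with the caveat already mentioned: the page may have changed or vanished since the author cited it.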

What is the goal of the WebCite® consortium?

WebCite® members are editors, publishers, libraries, and other interested entities pursuing a common goal: Digital archiving of webpages or websites which are referenced in scholarly journal articles, therefore preserving the scholarly record and our cultural heritage for future generations (for the broader societal implications see below). Beyond these broader objectives, WebCite® solves some practical problems, creating some utility and value for readers, providing a solution to "404 File Not Found" errors when clicking on cited webreferences. It also avoids the problems associated with the dynamic nature of Web content, making sure that the reader sees the webdocument exactly the way the author saw it when he/she cited it. For example, a citing author may critique a report published online. Responding to the critique, the cited author of the online report may subsequently make changes and republish the report at the same URL, without the reader of the critique being able to retrieve the version the citing author has critiqued - the scholarly record is lost.

In addition, WebCite® will provide "impact statistics" of webpages and websites (number of citations of a given website or webpage in the scholarly literature, combined with access data), helping for example tenure and promotion committees to establish the "impact" and importance of webservices and digital objects, complementing metrics like the Journal Impact Factor developed by ISI (the latter measures the impact of a scientific journal based on how often articles from a specific journal are cited).

What are the broader societal implications and objectives?

The people behind WebCite® are social entrepreneurs, driven by the desire to instigate social change. WebCite® is a simple yet disruptive application, which has wider societal implications and a potentially significant impact on scholarship and scholarly communication.

Simply put, WebCite® aims to make Internet material (any sort of digital objects) more "citable", long-term accessible, and hence more acceptable for scholarly purposes. Without WebCite®, Internet citations are deemed ephemeral and therefore are often frowned upon by authors and editors. However, it does not make much sense to ignore opinions, ideas, draft papers, or data published on the Internet (including wikis and blogs), not acknowledging them only because they are not "formally" published, and because they are difficult to cite. The reality is that in the age of the Internet, "publication" is a continuum, and it makes little sense to not cite (therefore acknowledge) for example the idea of a scholarly blogger, the collective wisdom of a wiki, ideas from an online discussion paper, or data from an online accessible dataset only because online material is not deemed "citable". By making Internet material more "citable" (and also by creating incentives such as mechanisms and metrics for measuring the "impact" of online material by calculating and publishing a WebCite® impact factor), we hope that this will encourage scholars to publish ideas and data online in a wide range of formats, which in turn should accelerate and facilitate the exchange of scientific ideas. While we do see the value of scholarly peer-reviewed journals for publishing research results (many of us are editors and publishers of peer-reviewed journals ourselves!), we also acknowledge that much of the scientific discourse takes place before it is "formally" published, and that peer-review can also take on other forms (e.g. post-publication peer-review, which is something WebCite® plans to implement).

Another broader societal aspect of the WebCite® initiative is advocacy and research in the area of copyright. We aim to develop a system which balances the legitimate rights of the copyright-holders (e.g. cited authors and publishers) against the "fair use" rights of society to archive and access important material. We also advocate and lobby for a non-restrictive interpretation of copyright which does not impede digital preservation of our cultural heritage, or free and open flow of ideas. This should not be seen as a threat by copyright-holders - we aim to keep material which is currently openly accessible online accessible for future generations without creating economic harm to the copyright holder. This is a challenging, but feasible goal, and future iterations of this service may include some sort of revenue sharing mechanism for copyright holders.

Yet another angle is that WebCite® enables "one-click self-archiving", making it very easy for scholarly authors to create a permanent, openly accessible record of their own work and their ideas. While the primary pathway in the WebCite® system is third-party initiated archiving (triggered by a citing author), WebCite® also provides a very simple mechanism for authors to self-archive their own work.

Who is behind the WebCite® Consortium?

WebCite® is run by editors for editors, and by publishers for publishers. The system was invented by Dr. Gunther Eysenbach, Editor and Publisher of the Journal of Medical Internet Research, Senior Scientist at the Centre for Global eHealth Innovation, and Associate Professor at the Department of Health Policy, Management and Evaluation at the University of Toronto.

What is the history of this service?

The WebCite® idea was first conceived in 1997 and mentioned in a 1998 article (BMJ 1998;317:1496-1502) on quality control on the Internet, alluding to the fact that such a service dubbed webcite.net would also be useful to measure the citation impact of webpages. In the same year, a pilot service was set up. However, shortly after, Google and the Internet Archive entered the market, both apparently making a service like WebCite® redundant, providing a basic mechanism to retrieve older versions of a given URL. The WebCite® idea was revived in 2003, when a study published in the journal Science concluded that there is still no appropriate and agreed-upon solution available in the publishing world, and that cited webreferences were to a large degree no longer accessible. In addition to the accessibility problem it is always unclear whether the reader sees exactly the version the author saw when he/she cited a webreference. Traditional Internet archives and Google do not allow for "on-demand" archiving by authors, and - more importantly - do not have dedicated services and interfaces to scholarly journals and publishers to automate the archiving of cited links. In 2005, the first journal announced using WebCite® routinely, and hundreds of other journals followed. In fact, what is visible to the public (this website)

Who owns and runs WebCite® at the moment?

WebCite® has been incubated and is still hosted at the University of Toronto / University Health Network's Centre for Global eHealth Innovation. Partners include the Faculty of Information Studies at the University of Toronto, University of Toronto Library, and Internet archives. An increasing number of journal editors and publishers are currently joining the growing WebCite® Consortium. We are currently in the process of incorporating WebCite® as a distinct entity. Many of the tools we use and develop are open source.

How can I be assured that archived material remains accessible and that webcitation.org doesn't disappear in the future?

WebCite® used to be a member of the International Internet Preservation Consortium (IIPC). Members of the IIPC (e.g. libraries) are concerned with the collection, preservation and long-term access of a rich body of Internet content from around the world, and develop and use common tools, techniques and standards for the creation of international archives, which may also include the exchange of data. WebCite® feeds its content to digital preservation partners such as libraries and the Internet Archive (archive.org). WebCite® is operated and supported by publishers, who are already using it for their journals and citations and therefore have a vital interest in keeping the service alive.

Finally, the DOI handle system enables a mechanism to cite and retrieve an archived copy of a webpage without having to rely on the functionality of WebCite® and the webcitation.org URL itself. WebCite® can, under certain circumstances, assign DOIs to archived copies, which are identified by their hash (a digital fingerprint). Archived copies can then be retrieved through http://dx.doi.org (either from WebCite®, or from other digital preservation organizations), providing a mechanism for cross-archive retrieval of archived material.
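
The idea of identifying an archived copy by its hash can be sketched as follows. This is only an illustration of content fingerprinting in general: the FAQ does not say which hash function WebCite® uses, so SHA-256 and the function name `fingerprint` are assumptions.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest that identifies a snapshot by its content.

    SHA-256 is an assumption chosen for illustration; the hash function
    actually used by WebCite is not stated in this FAQ.
    """
    return hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    snapshot_a = b"<html><body>Report, version 1</body></html>"
    snapshot_b = b"<html><body>Report, version 1</body></html>"
    snapshot_c = b"<html><body>Report, version 2</body></html>"

    # Identical content yields the same fingerprint, wherever the copy is stored.
    print(fingerprint(snapshot_a) == fingerprint(snapshot_b))
    # Any change to the content yields a different fingerprint.
    print(fingerprint(snapshot_a) == fingerprint(snapshot_c))
```

Because the fingerprint depends only on the bytes of the snapshot, two archives holding the same copy would compute the same identifier, which is what makes cross-archive retrieval conceivable.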

Doesn't something like this already exist? What about CrossRef, Google or the Wayback Machine?

No. While systems such as Digital Object Identifier (DOI) and CrossRef ensure to a certain degree the stability of cross-links to other journal articles and other materials carrying a Digital Object Identifier (DOI), cited non-journal webpages (for example links to questionnaires on the web, research reports, quotes from homepages etc.) do not usually have a DOI. To assign a DOI, a document has to be stable, and it must be formally "published". WebCite® can do both - "freezing" the status of a webdocument by taking a snapshot, and then - if desired by the author of the cited material - assign a DOI (this functionality is planned to be implemented in 2008).

Services such as the Internet Archive (Wayback Machine) or Google archive Internet documents in a shotgun-approach by a crawler, not focussing on academic references. The caching process cannot be initiated by authors, editors, or publishers wanting to archive a specific web reference as they saw it on a specific date when they quoted it. In contrast, WebCite® is a tool specifically designed to be used by authors, readers, editors and publishers of scholarly material, allowing them to permanently archive cited Internet references. It is now used by an increasing number of authors and journals, ensuring future availability of cited webreferences for scholars reading the citing article in 1, 3, 5 or 10 years from now. WebCite® has built an XML-based webservice architecture which enables for example publishers, webmasters, editors, institutions, and vendors of bibliographic software packages to exchange data (e.g. metadata) and to trigger an archiving request. As such, WebCite® can be seen as providing the "glue" between the scholarly community (authors/editors/publishers) and the digital preservation community.

What about copyright issues?

Caching and archiving webpages is widely done (e.g. by Google, Internet Archive etc.), and is not considered a copyright infringement, as long as the copyright owner has the ability to remove the archived material and to opt out. WebCite® honors robot exclusion standards, as well as no-cache and no-archive tags. Please contact us if you are the copyright owner of an archived webpage which you want to have removed.
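
As a rough sketch of what honoring these opt-out signals can look like, the following checks a site's robots.txt rules and a page's meta tags before archiving. It is not WebCite®'s actual implementation: the user-agent name `ExampleArchiver`, the function name `may_archive`, and the exact set of meta directives checked are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect opt-out directives from <meta> tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        content = attr.get("content", "").lower()
        # e.g. <meta name="robots" content="noindex, noarchive">
        if attr.get("name", "").lower() == "robots":
            self.directives.update(d.strip() for d in content.split(","))
        # e.g. <meta http-equiv="pragma" content="no-cache">
        if attr.get("http-equiv", "").lower() in ("pragma", "cache-control"):
            if "no-cache" in content:
                self.directives.add("no-cache")


def may_archive(url, robots_txt, html, user_agent="ExampleArchiver"):
    """Return True only if neither robots.txt nor the page's meta tags opt out.

    user_agent is a hypothetical name, not the real WebCite crawler's name.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False
    meta = RobotsMetaParser()
    meta.feed(html)
    return not (meta.directives & {"noarchive", "no-cache"})


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/"
    page = "<html><head><title>Report</title></head><body>Text</body></html>"
    print(may_archive("http://example.org/public/report.html", robots, page))
    print(may_archive("http://example.org/private/report.html", robots, page))
```

The sketch takes the robots.txt and page content as strings so that it runs without network access; a real archiver would fetch both from the cited site first.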

A U.S. court has recently (Jan 19th, 2006) ruled that caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL, see also news article on Government Technology). Implied license refers to the industry standards mentioned above: If the copyright holder does not use any no-archive tags and robot exclusion standards to prevent caching, WebCite® can (as Google does) assume that a license to archive has been granted. Fair use is even more obvious in the case of WebCite® than for Google, as Google uses a “shotgun” approach, whereas WebCite® archives selectively only material that is relevant for scholarly work. Fair use is therefore justifiable based on the fair-use principles of purpose (caching constitutes transformative and socially valuable use for the purposes of archiving, in the case of WebCite® also specifically for academic research), the nature of the cached material (previously made available for free on the Internet, in the case of WebCite® also mainly scholarly material), amount and substantiality (in the case of WebCite® only cited webpages, rarely entire websites), and effect of the use on the potential market for or value of the copyrighted work (in the case of Google it was ruled that there is no economic effect, the same is true for WebCite®).

Who is going to pay for this?

There are various revenue streams to cover the ongoing costs of operations. One revenue stream is from publishers who pay a membership fee (similar to PILA/CrossRef membership fees) to have their publications analyzed and cited webreferences archived. There is no fee for citing authors. In 2008, we plan to offer premium membership accounts for institutions and individual users, creating revenue streams for copyright owners ("cited authors") such as content-specific ads, enabling DOI® (Digital Object Identifier) assignment for cited non-journal material, and offering access to advanced access/webcitation statistics including a WebCite® impact factor.

I am a programmer or student looking for a project and ideas - How can I help?

If you are a programmer or a student looking for a project, here are some ideas for programming projects. We are happy to help with the architecture, use case development, specification of our XML engine etc. We will also publish the tools and (if desired) the source code developed by third parties on our website, with full credits to the developer(s).

  • develop browser plugins which enable people to cite webpages as they browse (we already offer a "bookmarklet" one-click archiving button, but a more sophisticated tool enabling the extraction/entering of metadata such as author name etc. would be nice to have)
  • develop a wikipedia bot which scans new wikipedia articles for cited URLs, submits an archiving request to WebCite®, and then adds a link to the archived URL behind the cited URL
  • develop plugins/add-ons to third-party software such as bibliographic reference management systems, word processors, content management systems, authoring tools, scientific software, publishing tools, manuscript/abstract submission and management systems such as OJS (Open Journal Systems), word-to-XML conversion tools such as eXtyles, wikis etc. These add-ons should simply detect when a URL is cited, send an archiving request to WebCite®, and analyze the XML returned by WebCite®
  • develop apps for Web 2.0 social networking sites such as Facebook (e.g., an app which allows friends to see what you webcited recently)
  • develop tools that help to extract metadata from cited webpages, using e.g. metadata extraction algorithms
  • develop crawlers or RSS feed analyzers which detect newly published articles and extract cited URLs from full text articles and XML files accessible through aggregators or archives such as Pubmed Central, ArXiv, or Highwire, as well as from open access journals, transmitting archiving requests to WebCite®.
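
As a starting point for projects like the ones listed above, here is a minimal sketch of the "detect when a URL is cited" step. The function names `find_cited_urls` and `request_archiving` are hypothetical, and the `submit` callback is a placeholder supplied by the caller: the sketch deliberately does not assume anything about the WebCite® request format or the XML it returns.

```python
import re
from typing import Callable, Dict, List

# Simple pattern: a URL runs until whitespace, a quote, or a closing bracket.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def find_cited_urls(text: str) -> List[str]:
    """Return the distinct URLs cited in a text, in order of first appearance."""
    seen = set()
    ordered = []
    for raw in URL_RE.findall(text):
        # Strip punctuation that ends the sentence rather than the URL.
        url = raw.rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


def request_archiving(text: str, submit: Callable[[str], str]) -> Dict[str, str]:
    """Call submit(url) once per distinct cited URL and collect the results.

    submit is a hypothetical hook; an add-on would plug in whatever
    actually sends the archiving request and interprets the response.
    """
    return {url: submit(url) for url in find_cited_urls(text)}


if __name__ == "__main__":
    manuscript = (
        "See http://example.org/a. Also (https://example.com/b?x=1), "
        "and again http://example.org/a."
    )
    # A stand-in submit function that only records the request.
    print(request_archiving(manuscript, lambda url: "queued"))
```

Keeping the submission step behind a callback means the same detection code could be reused in a word processor add-on, a wiki bot, or a feed analyzer.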

I am working on a grant-funded project / on a proposal for a grant-funded project. Can we do something together?

Most definitely - please contact us. We are already involved in a number of collaborative projects, for example involving wiki-like scholarly projects, where we have to develop mechanisms to make the work "citable", or projects which involve the preservation of more complex citable units such as datasets and interactive programs. Speak to us.

I am working as a project officer at a foundation - can we sponsor/fund you?

Most definitely - talk to us.

How can I become involved?

To become a member of the WebCite® Consortium as a publisher, editor, or digital preservation partner, fill in the membership form, or contact us at:

WebCite® Consortium
http://www.webcitation.org
c/o Centre for Global eHealth Innovation
Toronto General Hospital
R Fraser Elliott Building, 4th Floor,  Room 4S435
190 Elizabeth Street
Toronto ON, M5G 2C4
Telephone (+1) 416-340-4800 Ext. 6427
Fax (+1) 416-340-3595
geysenba@gmail.org

WebCite® Technical FAQ

What if I want to archive multiple pages, or an entire site?

We're working on a number of enhancements to the WebCite® service along these exact lines. Until such time as they are ready, the archive and comb pages are the only publicly available ways of initiating archiving operations. There are some simple workarounds, however. If you're trying to archive an article that is broken up into multiple pages (as many news sites do with longer stories), look for a 'Printable version' of the article -- many sites provide a single page version of content for printing purposes, also ideally suited for WebCite® archiving. Note that unless the copyright holder uses a liberal license such as Creative Commons, you must obey the fair use principle. Archiving multiple pages or an entire website may not be allowable.

Why do some archived pages appear to 'pop out' of the WebCite® interface immediately after they load?

Page authors frequently use a bit of JavaScript trickery to ensure that their pages aren't viewed within a frame. This is done to discourage the 'wrapping' of their content without permission, and is commonly used by many web sites (particularly news and reference sites). Because the WebCite® interface uses frames, the archived pages detect this as a wrapping attempt, and use JavaScript to 'pop out' of the WebCite® interface.

Why are some pages not archived successfully?

A page may not be archived for a number of reasons. The page owner may specifically prohibit archiving of their content through no-cache / no-archive tags, or via a robot exclusion policy on their site. The content may be inaccessible from the WebCite® network (this is particularly likely if you are attempting to access subscription based content which your institution subscribes to on its users' behalf). Also, the content may be unreadable by the WebCite® archiver (complex JavaScript based pages, or ones involving browser checks sometimes cause our archive engine to fail).

What if I need more detailed information about how to use WebCite® or about how WebCite® works?

The WebCite® team has created a WebCite® Technical Background and Best Practices Guide just for this purpose.