OpenDocument

From Example Problems

The OpenDocument format (ODF), short for the OASIS Open Document Format for Office Applications, is an open document file format for saving and exchanging editable office documents such as text documents (including memos, reports, and books), spreadsheets, charts, and presentations. This standard was developed by the OASIS industry consortium, based upon the XML-based file format originally created by OpenOffice.org.

The standard was publicly developed by a variety of organizations, is publicly accessible, and can be implemented by anyone without restriction. The OpenDocument format is intended to provide an open alternative to proprietary document formats including the popular but undocumented DOC, XLS, and PPT formats used by Microsoft Office, as well as Microsoft Office Open XML format (this latter format has various licensing requirements that prevent some competitors from using it). Organizations and individuals that store their data in an open format such as OpenDocument avoid being locked in to a single software vendor, leaving them free to switch software if their current vendor goes out of business, raises their prices, changes their software, or changes their licensing terms to something less favorable.

OpenDocument is the only standard for editable office documents that has been vetted by an independent recognized standards body, has been implemented by multiple vendors, and can be implemented by any supplier (including proprietary software vendors as well as developers using the non-proprietary GNU GPL).

Public policy implications

Since one objective of open formats like OpenDocument is to guarantee long-term access to data without legal or technical barriers, governments have become increasingly aware of open formats as a public policy issue. For example, in 2002, Dr. Edgar David Villanueva Nuñes, a lawyer and Congressman of the Republic of Perú, wrote a letter to Microsoft Peru raising questions about free and permanent document access with proprietary formats. Europe and Massachusetts in particular have been examining the ramifications of selecting a document format.

Europe

European governments have, since at least 2003, been investigating various options for storing documents in an XML-based format, commissioning technical studies such as the "Valoris Report" (Valoris). In March 2004, European governments asked an OpenOffice team and a Microsoft team to present on the relative merits of their XML-based office document formats (Bray, September 29 2004).

In May 2004, the Telematics between Administrations Committee (TAC) issued a set of recommendations, in particular noting that, "Because of its specific role in society, the public sector must avoid [a situation where] a specific product is forced on anyone interacting with it electronically. Conversely, any document format that does not discriminate against market actors and that can be implemented across platforms should be encouraged. Likewise, the public sector should avoid any format that does not safeguard equal opportunities to market actors to implement format-processing applications, especially where this might impose product selection on the side of citizens or businesses. In this respect standardisation initiatives will ensure not only a fair and competitive market but will also help safeguard the interoperability of implementing solutions whilst preserving competition and innovation." It then issued recommendations, including:

  • Industry actors not currently involved with the OASIS Open Document Format consider participating in the standardisation process in order to encourage a wider industry consensus around the format;
  • Microsoft considers issuing a public commitment to publish and provide non-discriminatory access to future versions of its WordML specifications;
  • Microsoft should consider the merits of submitting XML formats to an international standards body of their choice;
  • The public sector is encouraged to provide its information through several formats. Where by choice or circumstance only a single revisable document format can be used this should be for a format around which there is industry consensus, as demonstrated by the format's adoption as a standard. (TAC, May 25 2004)

OpenDocument is already a standard approved by a recognized independent standards body (OASIS), and is being submitted to ISO for standardization, while there is no evidence that the Microsoft XML formats or the older DOC/PPT/XLS formats will go through such a process. Many expect that ISO will accept and approve OpenDocument using its fast-track process, and that once ISO ratifies the standard, the European Union will require OpenDocument as its office suite standard. (Marson, October 18 2005)

Massachusetts

Massachusetts has also been examining its options for implementing XML-based document processing. In early 2005, Eric Kriss, Secretary of Administration and Finance in Massachusetts, was the first government official in the United States to publicly connect open formats to a public policy purpose: "It is an overriding imperative of the American democratic system that we cannot have our public documents locked up in some kind of proprietary format, perhaps unreadable in the future, or subject to a proprietary system license that restricts access." [1]

At a September 16 2005 meeting with the Mass Technology Leadership Council, Kriss stated that he believes this is fundamentally an issue of sovereignty. [2] While supporting the principle of private intellectual property rights, he said sovereignty trumped any private company's attempt to control the state's public records through claims of intellectual property. [3]

Subsequently, in September 2005, Massachusetts became the first state to formally endorse OpenDocument formats for its public records and, at the same time, reject Microsoft's proprietary XML format, now named Microsoft Office Open XML format (see WordprocessingML). This decision was made after a two-year examination of file formats, including many discussions with Microsoft, other vendors, and various experts. Microsoft Office, which has a nearly 100% market share among the state's employees, does not currently support OpenDocument formats. Microsoft has indicated that OpenDocument formats will not be supported in new versions of Office, even though Office supports many other formats (including ASCII, RTF, and WordPerfect), and analysts believe it would be easy for Microsoft to implement the standard. If Microsoft chooses not to implement OpenDocument, it will disqualify itself from future consideration. Several analysts (such as Ovum) believe that Microsoft will eventually support OpenDocument.

After this announcement by Massachusetts supporting OpenDocument, a large number of people and organizations spoke up about the policy, both pro and con (see the references section). Adobe, Corel, IBM, and Sun all sent letters to Massachusetts supporting the measure. In contrast, Microsoft sent in a letter highly critical of the measure. A group named "Citizens Against Government Waste" (CAGW) also opposed the decision. The group claimed that Massachusetts' policy established "an arbitrary preference for open source," though both open source software and proprietary software can implement the specification, and both kinds of developers were involved in creating the standard (CAGW, 2005). Many regarded this group's statement as simply a statement paid for by Microsoft; InternetNews and Linux Weekly News noted that CAGW has received funding from Microsoft, and that in 2001 CAGW was caught running an astroturfing campaign on behalf of Microsoft, when two letters it submitted supporting Microsoft in Microsoft's anti-trust case were found to bear the signatures of deceased persons (Linux Weekly News). James Prendergast, executive director of a coalition named "Americans for Technology Leadership" (ATL), also criticized the state's decision in a Fox News article (Prendergast 2005). In the article, Prendergast failed to disclose that Microsoft is a founding member of ATL. Fox News later published a follow-up article disclosing that fact (FOX News, 2005; Jones, September 29 2005).

Other governments

According to OASIS' OpenDocument datasheet, "Singapore's Ministry of Defense, France's Ministry of Finance and its Ministry of Economy, Finance, and Industry, Brazil's Ministry of Health, the City of Munich, Germany, UK's Bristol City Council, and the City of Vienna in Austria are all adopting applications that support OpenDocument." (OASIS, 2005b).

In November 2005 James Gallt, associate director for the National Association of State Chief Information Officers, said that a number of other state agencies are also exploring the use of OpenDocument (LaMonica, November 10, 2005).

BECTA (British Education Communication Technology Agency) is the UK agency in charge of defining information technology (IT) policy for all schools in the United Kingdom, including standards for all the schools' infrastructure. In 2005 they published a comprehensive document describing the policy for infrastructure in schools. This document requires the use of OpenDocument or a few other formats for office document data, and in particular it does not allow the use of Microsoft's binary (.doc/.xls/.ppt) or XML formats. BECTA explains this as follows: "Any office application used by institutions must be able to be saved to (and so viewed by others) using a commonly agreed format that ensures an institution is not locked into using specific software. The main aim is for all office based applications to provide functionality to meet the specifications described here (whether licensed software, open source or unlicensed freeware) and thus many application providers could supply the educational institution ICT market." (Lynch, 2005).

Standardization

Process

Version 1.0 of the OpenDocument specification was produced after lengthy development and discussion by multiple organizations. The first official OASIS meeting to discuss the standard was held on December 16 2002; OASIS approved OpenDocument as an OASIS standard on May 1 2005. The group decided to build on an earlier version of the OpenOffice.org format, since this was already an XML format with most of the desired properties, and had been in use since 2000 as the program's primary storage format (demonstrating its utility). Note, however, that OpenDocument is not the same as the older OpenOffice.org format; many changes and lessons learned were incorporated based on feedback from many different individuals and companies.

According to Gary Edwards, a member of the OpenDocument TC, the specification was developed in two phases. Phase one, which lasted from November 2002 through March 2004, had the goal of ensuring that the OpenDocument format could capture all the data from a vast array of older legacy systems. Edwards expressed this goal as perfecting "the Open Document XML as a transformation layer" (a universal intermediate format) where "interoperability with legacy information systems was our primary concern." This considered "at least 30 years of legacy information systems that cross an incredible spectrum of information and file format types," including various versions of Microsoft Office and many other products and formats as well. Phase two focused on open, Internet-based collaboration (Einfeldt, 2005).

Participants

The standardization process included the developers of many office suites or related document systems, including (in alphabetical order):

Notably absent from the group of active participants was Microsoft, especially since Microsoft is a member of OASIS and is the dominant vendor of office suite software. This absence was in spite of the European Union's TAC (Telematics between Administrations Committee) 2004 request for all industry actors to consider participating in the OASIS Open Document Format work (TAC, 2004). Instead, Microsoft decided to only develop their own incompatible format, without external input or review. Due to this lack of widespread independent and public review of Microsoft's format, many are concerned that Microsoft's format will be harder for others to implement or that Microsoft's format lacks important capabilities compared to OpenDocument. For example, the European Union commissioned a report (Valoris, 2004) which noted that, "It is quite trivial to add elements to an XML document that place processing requirements and restrictions on the document, thus preventing cross-platform processing capability... While properly developed XML should in theory be platform-neutral, experience has shown that vendors who wish to maintain and protect their platform's market will go to extents to encode elements that are capable of being processed only by their own application suites. The only counter-balance to this natural force is the development of open, cross-industry, widely adopted standards that serve to block the inclusion of application or platform specific encoding." Microsoft also imposes additional license conditions on users of their format; many believe these additional license conditions inhibit competition, as discussed below.

The OpenDocument standardization process also included many document users, especially those with the need to handle complex documents or to be able to retrieve documents for long periods of time after their development. Document-using organizations who initiated or were involved in the standardization process included (alphabetically):

As well as having many formal members, draft versions of the specification were released to the public and subject to worldwide review. Many others, who were not formal members of the standardization committee, submitted comments to the committee. These external comments were then adjudicated publicly by the committee.

Next Steps

OASIS has submitted the OpenDocument standard to a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) for approval as an international ISO/IEC standard. ISO spokesman Roger Frost stated that the committee will send the specification out to its members, probably at the end of this month, and they will have five months to study and vote on it (Sayer, 2005). Many expect that OpenDocument's broad support and demonstrated open development process will result in quick passage of OpenDocument as an ISO/IEC standard. OASIS is one of the few organizations that have been granted the right to propose standards directly to ISO as a proposed "publicly available specification" (PAS). This process is specifically designed to fast-track public specifications into becoming ISO standards when they have already been developed in an open manner. OpenDocument advocates note that, in contrast, there is no evidence that the competing Microsoft XML formats or the older DOC/PPT/XLS formats will go through an independent standardization process. The older DOC/PPT/XLS formats are not even publicly specified, which is one reason why documents written in these formats sometimes cannot be read by later versions of the same office suite.

Gary Edwards, a member of the OpenDocument TC, says that after ISO standardization, "there is no doubt in my mind that OpenDocument is heading to the W3C for ratification as the successor to HTML and XHTML." (Einfeldt, 2005). The W3C has not made any public statements supporting or denying this statement, however.

Licensing

The OpenDocument specification is available for free download and use [4]. An irrevocable intellectual property covenant made by key contributor Sun Microsystems [5] is the only IPR Statement connected with the specification, providing all implementers with the guarantee that it contains no material that necessitates licensing from any author. Reciprocal, royalty-free licensing terms are being promoted by some standards developing organizations, such as the W3C and OASIS, as a method for avoiding conflict over intellectual property concerns while still promoting innovation. See also software patent debate. In short, anyone can implement OpenDocument, without restraint, and as shown below both proprietary and open source software programs implement the format.

All of this is in contrast with the competing "Microsoft Office Open XML" developed by Microsoft. Microsoft has released their format royalty-free, but with additional conditions not imposed by OpenDocument. Independent analysts have stated that Microsoft's licensing requirements will prevent many competitors from ever implementing Microsoft's format. The extent of this incompatibility is the source of significant controversy between Microsoft and other parties. The text below attempts to capture these differences, since they are often one of the reasons people consider using OpenDocument.

Microsoft states, in its FAQ, that it believes that some open source software licenses are compatible with its license, and that if a developer believes that some license is in conflict, they must "choose other forms of open source licenses." Microsoft has not publicly issued its opinion about the compatibility of any particular open source software license. However, several independent analysts have determined that the legal obligations for the Microsoft format are such that it cannot be used by competing programs licensed under the GNU General Public License (GPL), and possibly many other open source software / Free-libre software licenses as well. This is important because the GPL is the most popular license by far for open source software. In particular, the GPL is used by many competing office applications such as the entire KOffice office suite, the Gnumeric spreadsheet program, and the Abiword word processor. Microsoft is well aware of widespread use of the GPL license by many of its competitors; at one time Microsoft CEO Ballmer referred to Linux as a "cancer" because of the effects of the GPL (the license the Linux kernel uses) (Greene, 2001). Thus, many independent analysts believe that Microsoft's license terms are designed to inhibit competition, in spite of Microsoft's claims otherwise. Some of these concerns are described as follows:

  • Richard Stallman, president of the Free Software Foundation and the author of the GPL, states that Microsoft's license was "designed to prohibit all free software. It covers only code that implements, precisely, the Microsoft formats, which means that a program under this license does not permit modification... The freedom to modify the software for private use and the freedom to publish modified versions are two of the essential components in the definition of free software. If these freedoms are lacking, the program is not free software." Thus, it would violate the GPL. (Galli, 2005)
  • Jean Paoli, senior director of XML architecture for Microsoft, acknowledged that their attribution requirement might preclude any program that uses the file formats from being used in Linux and other open-source software licensed under the GPL. Microsoft's license requires developers who use Office Open XML Formats to attribute the use of the file format in their code. Paoli admitted, "The GPL may not allow code that is attributable to another company like Microsoft to be included." (Galli, 2005)
  • Dan Ravicher, executive director of the Public Patent Foundation, says that "If [Microsoft has] rights and a license is needed, then the term in the license that requires attribution by the licensee of all of its downstream licenses is, in fact, not compatible with the GPL." (Galli, 2005)
  • User "gustl" on Brian Jones' blog stated on September 6 2005, that OpenDocument was far more open than Microsoft's format. He stated that OpenDocument can be implemented by any implementor, even using the GPL or BSD licenses. He argued that the "may not sublicense" clause covering Microsoft's format "effectively prohibits any open source project from using [their] specifications." He argues that Microsoft's XML license is prohibitively restrictive, while OpenDocument's license permits any competitor to implement the format. (Jones, 2005).
  • Groklaw posted a legal analysis by Marbux, a retired lawyer, whose detailed analysis found that Microsoft's specification excluded competition, in contrast with Microsoft's public claims. "Competitors are... effectively precluded from bidding against Microsoft or its suppliers for any... contract specifying use of Microsoft's software file formats." He first noted that the patent license for the format "is structured to be read restrictively, in Microsoft's favor... it states that: 'All rights not expressly granted in this license are reserved by Microsoft. No additional rights are granted by implication or estoppel or otherwise.' This is not the customary 'all rights reserved' phrase more commonly encountered... If you cannot find words in the license explicitly stating that you have the right to do something, you don't get that right." Then, by examining the patent license in detail, he found a number of omissions and conditions that suppress competition: there is no integration clause, no license for the schemas themselves, no grant of copyright was included in the patent license, no commitment to delivering any future changes to the schemas or right to develop software implementing them under the same or more liberal license (this particular issue may have been resolved later by Microsoft), no identification of the Microsoft patents involved, no identification of third-party patents, no right to sell or sublicense implementing software, a prohibition against sale and licensing of implementing software, a prohibition against software having functions other than to read and write files using the specification without modification, no license to convert files to and from other formats, no right to write files using the schemas, vagueness and ambiguities will deter implementation by developers and adoption by end users, and a discriminatory incompatibility with F/OSS licensing, and discriminatory incompatibility with proprietary software competitors. 
In short, he believes Microsoft's license prohibits effective competition from using the format. (Marbux, 2005)
  • David Berlind of ZDNet notes that the technical proviso in Microsoft's license that says, "You are not licensed to sublicense or transfer your rights" is a deal breaker. "Included in the notion of state sovereignty is the right of the state's agencies, employees, contractors and citizens to choose any type of software they want to read or write public documents. By not allowing its license to be transferred or sublicensed, Microsoft's patent license automatically prevents just about all open source software -- including OpenOffice.org -- from supporting Microsoft's XML formats." Berlind notes that the Internet Engineering Task Force (IETF) e-mail sender authentication standards (to combat spam) and the OASIS WS-Security specification have both foundered because some organizations would not permit sublicensing or transfer. (Berlind, October 17 2005)
  • Microsoft's Yates wrote, "Our license may not be compatible with the GPL, but it is compatible with many other open source licenses." (Berlind, October 17 2005)
  • Larry Rosen, author of a book on licensing of open source software, states that provisions that prevent sub-licensing and transferability are antithetical to open source. "[The Microsoft license] not only prevents transfer or sublicensing of the patent rights," said Rosen, "but it also requires that open source developers put Microsoft's patent notices in our licenses." These are terms that open source developers find to be unacceptable. For example, Rosen disputes Microsoft's claims of broad compatibility, stating that, "Open source depends on the right to sub-license... Among the licenses that are explicitly sublicenseable are the MIT, MPL, CPL, Apache 2.0, OSL/AFL, and all licenses derived from them. That's most, I believe. Microsoft's patent license is incompatible with all of them." He also stated, "The Microsoft license is incompatible with any open source license that explicitly authorizes sublicensing and is incompatible with open source processes that as of matter of practice do sublicensing. Every open source project operates on the basis that sublicensing is allowed. That's how open source works, even if not every license says so explicitly." (Berlind, October 17 2005)
  • The Computer and Communications Industry Association (CCIA) states that "Microsoft's disclosure... is inadequate for interoperability because it omits critical information necessary for full interoperability... some of the items stored in the fields of the schema are Microsoft-proprietary data, and Microsoft has not disclosed the information necessary for others to interpret and use those data... no one but Microsoft is able to create and consume the data in some parts of the schema, making the schema unusable for full interoperability." (CCIA, July 2, 2004).

Microsoft has stated that it has been granted a number of patents related to its format, and that it may have more pending. Microsoft states that it offers royalty-free rights both to its issued patents and patents that may be issued in the future as an outcome of the patent process in order to implement the Office 2003 XML Reference Schemas. However, these patents can be used to force anyone to strictly adhere to their license, and as noted above, many people have analyzed the license in detail and concluded that the license inhibits competition. The most common open source software license (the GPL) forbids these kinds of limitations; if software is included, it must be usable for any purpose. There is also concern by some that Microsoft could change its licensing terms at any time; no contract actually binds Microsoft to these terms. Microsoft did restate in a clarification that their terms were offered in perpetuity, but since no enforceable contract was signed, there appears to still be some suspicion. These concerns about patents were raised in part because formerly secret Microsoft documents (known as Halloween documents I and II), which were developed in collaboration with key people in Microsoft, recommended that Microsoft suppress competition by "de-commoditizing" protocols (creating proprietary formats that could not be used by others) and by attacking competitors through patent lawsuits.

Dan Ravicher argues that Microsoft's licenses may not be valid, saying, "we should not presume Microsoft has any valid rights here." For example, one of the relevant patents, covering the conversion of programming objects into XML files, was granted to Microsoft based on a filing made in June 2001. However, only a week after the announcement of the patent, independent analysts found that SXP, an open source software library for converting C++ programming objects into XML files, had been made available on Sourceforge in February 2000. Since SXP's release predates Microsoft's filing, many believe Microsoft's patent could be invalidated in court due to the existence of prior art. Ravicher and others speculate this may be true for all the patents; patent offices have no database for examining software patent claims, and spend very little time examining patent claims, so there is general consensus that many invalid software patents are granted (Galli, 2005). However, since software patent litigation typically costs millions of dollars, even patents that could be invalidated can still be used to intimidate and inhibit competition if the patent-holder chooses to do so.

After discussions with the European Union and Massachusetts, Microsoft issued a clarification. In particular, in the clarification Microsoft stated that, "We are acknowledging that end users who merely open and read government documents that are saved as Office XML files within software programs will not violate the license." However, observers quickly noted that this exception only applied to government documents (not other documents) and only for opening and reading them (not for writing them, and possibly not for printing them or translating them to another format). Neither governments nor software developers want formats that are limited for use only by governments; it is much better to have a single format for any such data. This exemption would not by itself permit open source software implementations, since the Open Source Definition forbids discrimination against persons (including non-government personnel), groups, or fields of endeavor; this exemption also contradicts the Free Software definition, which requires as freedom 0 the "freedom to run the program for any purpose". Also, the whole point of these formats is to permit editing, not just reading them; for read-only documents, other formats such as PDF tend to be used instead. If the term "reading" is interpreted as applying only to humans, then this grant is even more limited (prohibiting printing and transforming), but even a broad interpretation is limiting since it does not grant the privilege to write the format. Thus, independent analysts reported that none of these clarifications addressed the concern that Microsoft's XML format cannot be used by many of Microsoft's competitors, while OpenDocument can be used by anyone -- both Microsoft and its competitors.

Promotion

OASIS promotes OpenDocument (since it is their work). In October 2005 the Open Document Fellowship was founded with the aim of "[supporting] the work of community volunteers in promoting, improving and providing user assistance for the OASIS Open Document Format for Office Applications (OpenDocument) and software designed to operate on data in this format." It was founded by Friends of OpenDocument Inc., an incorporated association in the State of Queensland, Australia. [6] Some early reports incorrectly stated that it was founded by OASIS [7]. Other promotional websites include friendsofopendocument.org and spreadopendocument.org.

On November 4, 2005, IBM and Sun Microsystems convened the "OpenDocument (ODF) Summit" in Armonk, N.Y., to discuss how to boost OpenDocument adoption. The ODF Summit brought together representatives from several industry groups and technology companies, including Oracle, Google, Adobe, Novell, Red Hat, Computer Associates, Corel, Nokia, Intel, and Linux e-mail company Scalix (LaMonica, November 10, 2005). The providers committed resources to technically improve OpenDocument through existing standards bodies and to promote its usage in the marketplace, possibly through a stand-alone foundation.

Applications supporting OpenDocument

Current support

A number of office suite applications currently support OpenDocument; listed alphabetically they include:

  • Abiword 2.4 (reading)
  • Aukyla Document Management System 2.0, lightweight web-based document management system. Has OpenDocument viewer and indexing functions [8]
  • DocMgr 0.53.3, full featured document management system. Included search engine indexes OpenDocument files. [9]
  • docvert, web service software takes multiple word processor files (typically .doc) and converts them to OpenDocument (builds on OpenOffice.org). [10]
  • eZ publish 3.6, with OpenOffice extension
  • Google Desktop Search has an OpenDocument plug-in available, supporting ODT, OTT, ODG, OTG, ODP, OTP, ODS, OTS, and ODF OpenDocument formats. [11]
  • IBM Workplace
  • Knomos case management 1.0 [12]
  • KOffice 1.4.2, released on October 11 2005
  • ooo-word-filter, a plugin for Microsoft Word 2003 XML to open OpenOffice XML documents (alpha stage)
  • OpenOffice.org 1.1.5 (reading) and 2.0 (reading and writing)
  • OpenOpenOffice, a plug-in for Microsoft Office so it can read and write OpenDocument (alpha, expected end of November 2005)
  • Scribus 1.2.2, imports OpenDocument Text and Graphics
  • Sun StarOffice 8, proprietary commercially-supported product that reads and writes OpenDocument; based on OpenOffice.org
  • TextMaker 2005 beta [13]
  • Visioo Writer 0.6 [14]
  • Gnumeric, incomplete support for reading and writing OpenDocument Spreadsheet

Microsoft's letter to Massachusetts claimed that all current OpenDocument implementations were based on OpenOffice.org and its derivatives. However, this turns out to be untrue. For example, KOffice is a completely independent implementation of OpenDocument not based on OpenOffice.org -- their main functions have been implemented independently, and even their code for reading and writing the OpenDocument format was developed independently (Wallin, 2005). This is important, because independent implementations from the same specification are generally considered the best way to find and fix any problems in a specification. For example, the IETF even requires two independent implementations for its final stage of standardization.

The first application to implement OpenDocument was KOffice. OpenDocument was developed starting from an XML format developed for OpenOffice.org; OpenOffice.org has since been updated so that it also supports OpenDocument.

Corel WordPerfect status

Corel may add OpenDocument support to its WordPerfect office suite, although it has not yet made a formal announcement. Corel is an original member of the OASIS Technical Committee on the Open Document Format, and Paul Langille, a senior Corel developer, is one of the original four authors of the OpenDocument specification. Also, Corel sent a letter to Massachusetts supporting the state's selection of OpenDocument, saying, "Corel strongly supports the broad adoption of the open standards Massachusetts has outlined, including XML, the OASIS Open Document Format and PDF.... Corel remains committed to working alongside OASIS and other technology vendors to ensure the continued evolution of the ODF standard and the adoption of open standards industry-wide." [15] Many find it improbable that Corel would invest so much effort, encourage mandating the OpenDocument format, and say that it will work to ensure industry-wide adoption of OpenDocument, without implementing the format itself.

At the September 16 2005 "Town Meeting," an IBM representative said that they were implementing OpenDocument and that Corel was also actively implementing OpenDocument. Steven J. Vaughan-Nichols's eWeek article of September 26 2005, states without caveats that Corel is actively implementing OpenDocument in their WordPerfect suite. On September 28 2005, he clarified further that Corel's WordPerfect "will soon be supporting the OpenDocument format", noting that while "Corel won't commit to a date for adding OpenDocument to WordPerfect, the company made it clear that it is working towards that goal."

A month later, on October 18 2005, a Corel representative described a different position in an interview for BetaNews [16]: Corel does not see OpenDocument support as a priority just now, and cannot even estimate how long it would take to support the format, if ever. This report was immediately questioned; Berlind later reported that Corel "confirms OpenDocument commitment" (Berlind, October 25 2005). In November Corel was part of the ODF Summit, which was organized to promote the use of OpenDocument (LaMonica, November 10, 2005).

Microsoft

For most of 2005 Microsoft had publicly stated that it did not plan to support OpenDocument. Its stated rationale was that OpenDocument is missing some important functionality, though it has not identified any particular missing functionality (making this claim difficult to prove or refute). Many are very sceptical of this claim; ZDNet said, "Does OpenDocument, which is the result of a lot of hard work from people fully versed in contemporary corporate computing, really fail at the very things it was designed to provide?", and closes urging Microsoft to add support for OpenDocument (ZDNet UK, September 2 2005). InfoWorld's Neil McAllister noted that even if OpenDocument were missing important functionality, this statement is inconsistent; Microsoft Office already supports formats with far less functionality than OpenDocument (such as HTML and ASCII text). Instead, he believes that the real reason Microsoft will not support OpenDocument (so far) is because "An open document standard won't help Microsoft lock in its loyal addicts -- excuse me, customers -- so an open standard isn't in Microsoft's business interests. Microsoft refuses to support OpenDocument; it doesn't get more bald-faced than that" (McAllister 2005).

A Boston Globe article quoted Peter Quinn of Massachusetts saying that the state could implement OpenDocument without abandoning Microsoft Office: "We are not asking anybody to take anything off their desktop." Instead, they plan to modify an estimated 50,000 computers with software that would let Office users store their files in the OpenDocument format, instead of Microsoft's proprietary format, if Microsoft continues to refuse to support the format (Bray, September 23 2005).

Recent reports suggest that Microsoft is considering supporting OpenDocument in the future; at this time it has not committed itself either way. Nick Tsilas, a Senior Attorney at Microsoft, said that, "features are dictated by customer demand and, until the Massachusetts-related activity occurred, Open Document was not even on our radar screens." This is a surprising revelation, because in 2004 the European Union directed all parties (including Microsoft) to get involved with the OpenDocument standard. Microsoft General Manager of Information Worker Business Strategy Alan Yates confirmed that this was the company position; "For us this has been, and will continue to be a matter of evaluating the flow of customer requirements, and this is a new issue." (Updegrove, 2005)

On 25 September 2005, Alan Joch of Federal Computer Week reported that Microsoft has changed its stance and that its next Office release will support OpenDocument, though not natively. This "means users would have to select that format option every time they save a file." (Joch, 2005) As of this time this report has not been independently confirmed, however, and other reports suggest this is still merely being considered.

On October 25 2005, Dan Farber reported on his conversation with Microsoft CTO Ray Ozzie. "Ozzie told me that supporting ODF in Office isn't a matter of principle. Microsoft isn't opposed to supporting other formats. ... Ozzie attributed the tentativeness on ODF support in Office to resource allocation issues... Microsoft is working with a French company on translators to determine the scope of the problem in exporting Office documents to ODF." Farber then speculated, "It sounds to me that support for 'Save As' ODF in Office is a 'when,' not an 'if'" (Farber, October 25 2005).

Groklaw readers believe they have traced this unnamed "French company" to Clever Age (http://www.clever-age.com), which is developing a translator named ooo-word-filter. This project translates from an OpenOffice format into WordML. It is currently very incomplete (only a few constructs are translated). It is unclear whether the "OpenOffice format" it reads is OpenDocument or the old .sxi format, and it appears to generate WordML (the Office 2003 XML format) instead of the incompatible Open XML format to be used by Office 12. Note that it only reads the OpenOffice.org format (it cannot generate it), and it does not cover the OpenDocument features outside of word processing (Jones, October 27 2005).

Note that there are many other mechanisms for using Microsoft Office to support OpenDocument. Any office suite that can read and write both Microsoft Office binary formats and OpenDocument can be used as a translator. docvert translates to OpenDocument, and ooo-word-filter is a plug-in for Microsoft Word for the 2003 XML format. Those who want to use Microsoft Office without exiting the suite, yet use all of OpenDocument, are likely to consider using OpenOpenOffice -- discussed next.

Phase-n OpenOpenOffice Plug-in for Microsoft Office

Phase-n is developing OpenOpenOffice ("O3"), an open source software plug-in for Microsoft Office. With this free plug-in, Microsoft Office will be able to read and write OpenDocument documents (and any other formats supported by OpenOffice.org). Instead of installing a complete office application or even a large plug-in, O3 will install a tiny plug-in to the Microsoft Office system. This tiny plug-in will automatically send the file to a server, which will then do the conversion and send the result back. The server could be local to an organization (so private information won't go over the Internet) or accessed via the Internet (for those who do not want to set up a server).

The plug-in is expected to be available by the end of November 2005.

Phase-n argues that the main advantage of their approach is simplicity. Their website announces that O3 "requires no new concepts to be explored, no significant development, and leverages the huge existing body of work already created by the OpenOffice.org developers, the CPAN module authors, and the Microsoft .NET and Office teams. Initial ballpark estimates are for less than 2,000 lines of code and only a few hundred hours of development time to get to an initial stable release of the O3 client and server." They also argue that this approach significantly simplifies maintenance; when a new version of OpenOffice.org is released, only the server needs to be upgraded. Its developers have acknowledged that it would be easy to add support for calling a local installation of OpenOffice.org as well, and may add that capability after the plug-in's initial release.

The OpenOpenOffice project is a partnership between the software industry group Open Source Victoria, the technology company Phase N Australia, and the wider Open Source community. Open Source Victoria was convened by Con Zymaris and includes more than 100 Victorian firms and developers (Varghese, 2005).

Other Planned Support

The general manager of Software602 reports that they plan to release a new version of their commercial office suite, currently named 602PC Suite, as 602Office 2. The product 602Office 2 will be based on OpenOffice.org 2, so it will include native support for OpenDocument.

JustSystem is the producer of the Ichitaro office suite (the second most common Japanese office software). JustSystem has announced that they are developing a plug-in module for both reading and writing the OpenDocument format, for release by Summer 2006 (JustSystem, 2005).

Programmatic Support

An OpenDocument file is normally an ordinary Java archive (JAR) containing standard XML files. A JAR file is simply a set of files compressed together using the zip file format. Thus, any of the vast number of tools for handling zip/jar files and XML data can be used to handle OpenDocument. Nearly all programming languages have libraries (built-in or available) for processing XML files and zip files.
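As a minimal sketch of this point, the following Python fragment uses only the standard zipfile and xml.etree.ElementTree modules to open a package and parse one of its XML members. The function names read_package_xml and make_demo_package are hypothetical, and the demo archive is a tiny stand-in built in memory, not a valid OpenDocument file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_package_xml(source, member="content.xml"):
    """Return the parsed root element of one XML member of a zip package.

    `source` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        with package.open(member) as stream:
            return ET.parse(stream).getroot()


def make_demo_package():
    """Build a tiny in-memory zip archive (illustration only, not a real document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<doc><p>Hello</p></doc>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    root = read_package_xml(make_demo_package())
    print(root.tag)
```

With a real document, the same call would be given the path of an .odt or .ods file instead of the in-memory buffer.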

The following are programs or programming libraries that provide specialized support for OpenDocument:

  • com.catcode.odf.OpenDocumentTextInputStream is a Java class by J. David Eisenberg which extracts the text information from an OpenDocument text file. It extracts only the text within <text:p> and <text:h>, unless they are in <text:tracked-changes> (i.e., it automatically handles tracked changes). The lists of "capture" and "omit" elements are user-selectable.
  • ODT_to_XHTML is a Java program by J. David Eisenberg which converts OpenDocument files directly into XHTML. Note that many office suites can do this as well, by loading the OpenDocument file and then doing a "Save As" (X)HTML.
  • Some Perl extensions for OpenDocument file processing are available as CPAN packages, such as OpenOffice::OODoc, OpenOffice::OOCBuilder, OpenOffice::OOSheets, PBib::Document::OpenOffice, and others. These libraries allow Perl programs to retrieve, create, update or delete almost any piece of data (including text content, non-textual objects, and style definitions) in documents, and to create new documents from scratch.

File types

The recommended file extensions and MIME types are included in the official standard (OASIS, May 1 2005).

Documents

The most common file extensions used for OpenDocument documents are .odt for text documents, .ods for spreadsheets, .odp for presentations, .odg for graphics and .odb for database applications. These are easily remembered by considering ".od" as being short for "OpenDocument", and then noting that the last letter indicates the more specific type (such as t for text). Here is the complete list of document types, showing the type of file, the recommended file extension, and the MIME type:

File type        Extension  MIME Type
Text             .odt       application/vnd.oasis.opendocument.text
Spreadsheet      .ods       application/vnd.oasis.opendocument.spreadsheet
Presentation     .odp       application/vnd.oasis.opendocument.presentation
Drawing          .odg       application/vnd.oasis.opendocument.graphics
Chart            .odc       application/vnd.oasis.opendocument.chart
Formula          .odf       application/vnd.oasis.opendocument.formula
Database         .odb       application/vnd.oasis.opendocument.database
Image            .odi       application/vnd.oasis.opendocument.image
Master Document  .odm       application/vnd.oasis.opendocument.text-master

Templates

OpenDocument also supports a set of template types. Templates represent formatting information (including styles) for documents, without the content themselves. The recommended filename extension begins with ".ot" (which can be viewed as short for "OpenDocument template"), with the last letter indicating what kind of template (such as "t" for text). The supported set are:

File type          Extension  MIME Type
Text               .ott       application/vnd.oasis.opendocument.text-template
Spreadsheet        .ots       application/vnd.oasis.opendocument.spreadsheet-template
Presentation       .otp       application/vnd.oasis.opendocument.presentation-template
Drawing            .otg       application/vnd.oasis.opendocument.graphics-template
Chart template     .otc       application/vnd.oasis.opendocument.chart-template
Formula template   .otf       application/vnd.oasis.opendocument.formula-template
Image template     .oti       application/vnd.oasis.opendocument.image-template
Web page template  .oth       application/vnd.oasis.opendocument.text-web
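The document and template tables can be captured as a small lookup. The sketch below is illustrative only: the names DOCUMENT_TYPES, TEMPLATE_TYPES and recommended_mime_type are hypothetical, and the data is copied from the two tables in this article rather than taken from any library.

```python
from pathlib import PurePath

# Common prefix shared by every MIME type in the two tables.
MIME_PREFIX = "application/vnd.oasis.opendocument."

# Extension -> MIME suffix, copied from the document table.
DOCUMENT_TYPES = {
    ".odt": "text", ".ods": "spreadsheet", ".odp": "presentation",
    ".odg": "graphics", ".odc": "chart", ".odf": "formula",
    ".odb": "database", ".odi": "image", ".odm": "text-master",
}

# Extension -> MIME suffix, copied from the template table.
TEMPLATE_TYPES = {
    ".ott": "text-template", ".ots": "spreadsheet-template",
    ".otp": "presentation-template", ".otg": "graphics-template",
    ".otc": "chart-template", ".otf": "formula-template",
    ".oti": "image-template", ".oth": "text-web",
}


def recommended_mime_type(filename):
    """Return the recommended MIME type for a filename, or None if unknown."""
    extension = PurePath(filename).suffix.lower()
    suffix = DOCUMENT_TYPES.get(extension) or TEMPLATE_TYPES.get(extension)
    return MIME_PREFIX + suffix if suffix else None


if __name__ == "__main__":
    print(recommended_mime_type("report.odt"))
```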

Capabilities

As noted above, the OpenDocument format can describe text documents (e.g., those typically edited by a word processor), spreadsheets, presentations, drawings/graphics, images, charts, mathematical formulas, databases, and "master documents" (which can combine them). It can also represent templates for many of them.

The official OpenDocument standard (OASIS, May 1 2005) defines OpenDocument's capabilities. Haumacher (2005) provides a hyperlinked formal specification derived from the official standard. Eisenberg's book (2005) describes the format in more detail. The text below provides a brief summary of the format's capabilities.

Metadata

The OpenDocument format supports storing metadata (data about the data) by having a set of pre-defined metadata elements, as well as allowing user-defined and custom metadata. The predefined metadata are: Generator, Title, Description, Subject, Keywords, Initial Creator, Creator, Printed By, Creation Date and Time, Modification Date and Time, Print Date and Time, Document Template, Automatic Reload, Hyperlink Behavior, Language, Editing Cycles, Editing Duration, and Document Statistics.

Content

OpenDocument's text content format supports both typical and advanced capabilities. Headings of various levels, lists of various kinds (numbered and not), numbered paragraphs, and change tracking are all supported. Page sequences and section attributes can be used to control how the text is displayed. Hyperlinks, ruby text (which provides annotations and is especially critical for some languages), bookmarks, and references are supported as well. Text fields (for autogenerated content), and mechanisms for automatically generating tables such as tables of contents, indexes, and bibliographies, are included as well.

In the OpenDocument format, spreadsheets are an example of a set of tables. Thus, there are extensive capabilities for formatting the display of tables and spreadsheets. Database ranges, filters, and data pilots (known to Excel users as "pivot tables") are also supported. Change tracking is available for spreadsheets as well.

The graphics format supports a vector graphic representation, in which a set of layers and the contents of each layer are defined. Available drawing shapes include Rectangle, Line, Polyline, Polygon, Regular Polygon, Path, Circle, Ellipse, and Connector. 3D Shapes are also available; the format includes information about the Scene, Light, Cube, Sphere, Extrude, and Rotate (it is intended for office data exchange, however, and is not sufficient to represent movies or other extensive 3D scenes). Custom shapes can also be defined.

Presentations are supported. Animations can be included in presentations, with control over sound, showing or hiding a shape or text, or dimming something, and these effects can be grouped. In OpenDocument, many of the presentation format's capabilities are reused from the text format, simplifying implementations.

Charts define how to create graphical displays from numerical data. They support titles, subtitles, a footer, and a legend to explain the chart. The format defines the series of data that is to be used for the graphical display, and a number of different kinds of graphical displays (such as line charts, pie charts, and so on).

Forms are specially supported, building on the existing XForms standard.

Formatting

The style and formatting controls are numerous, providing a number of controls over how information is displayed.

Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.

Headers and footers can have defined fixed and minimum heights, margins, border, border line width, padding, background, shadow, and dynamic spacing.

There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting. The list is extremely extensive; see the references (in particular the actual standard) for details.

Spreadsheet formulas issue

OpenDocument is fully capable of describing mathematical formulas that are displayed on the screen. It is also fully capable of exchanging spreadsheet data, formats, pivot tables, and other information typically included in a spreadsheet. OpenDocument can exchange spreadsheet formulas (formulas that are recalculated in the spreadsheet); formulas are exchanged as values of the attribute table:formula.

However, some believe that the allowed syntax of table:formula is not defined in sufficient detail. The OpenDocument version 1.0 specification defines spreadsheet formulas using a set of simple examples which show, for example, how to specify ranges and the SUM() function. Some critics argue that a more detailed, precise specification for spreadsheet functions, including syntax and semantics, should be created to augment these examples. The OpenDocument committee argued that this was outside their scope, since the syntax of such formulas is not in XML. Others have argued that, while the specification is less specific than one might like, the intent is fairly clear (especially since formulas tend to follow decades-long traditions), and also because the vast majority of spreadsheets only use a small set of functions (such as SUM) which are universally supported by all spreadsheet implementations anyway. In practice, many developers look to OpenOffice.org as a "canonical implementation"; since its code is public for anyone to review, and its XML output can be trivially inspected, this can resolve many questions. There is draft work proposing a more detailed specification for spreadsheet formulas (e.g. OpenFormula). Such work is expected to simply clarify in more detail what is acceptable in a spreadsheet formula; no one expects such work to invalidate any of the current OpenDocument standard. For more information, see the OpenFormula article.

Note that this is not a disadvantage compared to Microsoft Open XML, which also does not specify formulas in detail. Nor is it a disadvantage compared to Microsoft Excel binary format, whose format and semantics have never been completely defined this way in public.
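To illustrate how formulas travel as values of the table:formula attribute, the sketch below collects them from a spreadsheet's content markup using Python's standard XML parser. The function name list_formulas and the sample fragment are assumptions made for illustration; the formula string is only an example in the style of the specification's SUM() examples, and a real content.xml contains far more structure.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the table: prefix in OpenDocument.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Hypothetical, heavily simplified fragment of a spreadsheet's content.
SAMPLE = """\
<table:table xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
             table:name="Sheet1">
  <table:table-row>
    <table:table-cell table:formula="=SUM([.A1:.A5])"/>
    <table:table-cell/>
  </table:table-row>
</table:table>
"""


def list_formulas(content_xml):
    """Return the table:formula value of every cell that carries one."""
    root = ET.fromstring(content_xml)
    cell_tag = "{%s}table-cell" % TABLE_NS
    formula_attribute = "{%s}formula" % TABLE_NS
    return [
        cell.get(formula_attribute)
        for cell in root.iter(cell_tag)
        if cell.get(formula_attribute) is not None
    ]


if __name__ == "__main__":
    print(list_formulas(SAMPLE))
```

The code treats each formula as an opaque string, which mirrors the issue described above: the XML carries the formula, but interpreting its syntax is left to the application.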

Format internals

An OpenDocument file can be either a simple XML file which uses <office:document> as the root element or a JAR compressed archive containing a number of files and directories. Because the simple XML format does not directly support embedding binary content or thumbnails, the JAR-based format is used almost exclusively. Applications that use OpenDocument might not support saving and loading of the plain XML file, but all should support the JAR-based format. Because the archive is compressed, OpenDocument files are normally significantly smaller than equivalent Microsoft ".doc" or ".ppt" files. This smaller size is important for organizations that store a vast number of documents for long periods of time, and for organizations that must exchange documents over low-bandwidth connections. Once uncompressed, most data is contained in simple text-based XML files, so the contents have the typical ease of modification and processing of XML files. Directories can be included to store non-SVG images, non-SMIL animations, and other files that are used by the document but cannot be expressed directly in the XML.

The zipped set of files and directories includes the following:

  • XML files
    • content.xml
    • meta.xml
    • settings.xml
    • styles.xml
  • Other files
    • mimetype
  • Directories
    • META-INF/
    • Thumbnails/
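The layout listed above can be checked with a few lines of Python. This is a rough sketch: describe_package and make_layout_demo are hypothetical names, and the demo archive contains placeholder members rather than real document data.

```python
import io
import zipfile

# Files and directories named in the list above.
EXPECTED_FILES = ("mimetype", "content.xml", "meta.xml", "settings.xml", "styles.xml")
EXPECTED_DIRECTORIES = ("META-INF/", "Thumbnails/")


def describe_package(source):
    """Report which of the usual package members are present in a zip archive."""
    with zipfile.ZipFile(source) as package:
        names = package.namelist()
    report = {name: name in names for name in EXPECTED_FILES}
    for directory in EXPECTED_DIRECTORIES:
        # A directory counts as present if any member path starts with it.
        report[directory] = any(name.startswith(directory) for name in names)
    return report


def make_layout_demo():
    """Build an in-memory archive with only some of the members (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<placeholder/>")
        package.writestr("META-INF/manifest.xml", "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for member, present in describe_package(make_layout_demo()).items():
        print(member, "present" if present else "missing")
```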

The OpenDocument format provides a strong separation between content, layout and metadata. The most notable components of the format are described in the subsections below. The files in XML format are further defined using the RELAX NG language for defining XML schemas. RELAX NG is itself defined by an OASIS specification, as well as by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).

content.xml

content.xml is the most important file. It carries the actual content of the document (except for binary data, like images). The base format is inspired by HTML, and though far more complex, it should be reasonably legible to humans:

<text:h text:style-name="Heading_2">This is a title</text:h>
<text:p text:style-name="Text_body"/>
<text:p text:style-name="Text_body">
   This is a paragraph. The formatting information is
   in the Text_body style. The empty text:p tag above
   is a blank paragraph (an empty line).
</text:p>
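As a sketch of how such markup might be processed, the Python fragment below pulls headings and paragraphs out of the sample above. The wrapping <office:text> element with its namespace declarations is added here so that the fragment parses on its own, and extract_text_blocks is a hypothetical helper, not part of any OpenDocument library.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the text: prefix in OpenDocument.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
HEADING_TAG = "{%s}h" % TEXT_NS
PARAGRAPH_TAG = "{%s}p" % TEXT_NS

# The fragment from the article, wrapped in a root element for parsing.
SAMPLE = """\
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:h text:style-name="Heading_2">This is a title</text:h>
<text:p text:style-name="Text_body"/>
<text:p text:style-name="Text_body">
   This is a paragraph. The formatting information is
   in the Text_body style. The empty text:p tag above
   is a blank paragraph (an empty line).
</text:p>
</office:text>
"""


def extract_text_blocks(xml_text):
    """Return (kind, text) pairs for each heading and paragraph, in document order.

    Limitation: paragraphs nested inside other paragraphs (for example in
    text boxes) would be reported twice by this simple traversal.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for element in root.iter():
        if element.tag in (HEADING_TAG, PARAGRAPH_TAG):
            kind = "heading" if element.tag == HEADING_TAG else "paragraph"
            # Join all nested text and collapse runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            blocks.append((kind, text))
    return blocks


if __name__ == "__main__":
    for kind, text in extract_text_blocks(SAMPLE):
        print(kind, repr(text))
```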

styles.xml

styles.xml contains style information. OpenDocument makes heavy use of styles for formatting and layout. Most of the style information is here (though some is in content.xml). Style types include:

  • Paragraph styles.
  • Page Styles.
  • Character Styles.
  • Frame Styles.
  • List styles.

The OpenDocument format is somewhat unusual in that you cannot avoid using styles for formatting. Even "manual" formatting is implemented through styles (the application dynamically makes new styles as needed).

meta.xml

meta.xml contains the file metadata, such as the author, "Last modified by", and the date of last modification. The contents look somewhat like this:

<meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
<dc:creator>Daniel Carrera</dc:creator>
<dc:date>2005-06-29T22:02:06</dc:date>
<dc:language>es-ES</dc:language>
<meta:document-statistic
      meta:table-count="6" meta:object-count="0"
      meta:page-count="59" meta:paragraph-count="676"
      meta:image-count="2" meta:word-count="16701"
      meta:character-count="98757"/>

The names of the <dc:...> tags are taken from the Dublin Core XML standard.
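A short Python sketch of reading these fields follows. The sample wraps part of the fragment above in an <office:document-meta> root so that the namespace prefixes are declared, and read_metadata is a hypothetical helper; a real meta.xml may contain many more elements.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the dc: and meta: prefixes.
DC_NS = "http://purl.org/dc/elements/1.1/"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
NAMESPACES = {"dc": DC_NS, "meta": META_NS}

# Part of the fragment from the article, wrapped so that it parses on its own.
SAMPLE = """\
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
    <dc:creator>Daniel Carrera</dc:creator>
    <dc:date>2005-06-29T22:02:06</dc:date>
    <dc:language>es-ES</dc:language>
    <meta:document-statistic meta:page-count="59" meta:word-count="16701"/>
  </office:meta>
</office:document-meta>
"""


def read_metadata(meta_xml):
    """Return a dictionary with a few common metadata fields."""
    root = ET.fromstring(meta_xml)
    info = {
        "created": root.findtext(".//meta:creation-date", namespaces=NAMESPACES),
        "creator": root.findtext(".//dc:creator", namespaces=NAMESPACES),
        "modified": root.findtext(".//dc:date", namespaces=NAMESPACES),
        "language": root.findtext(".//dc:language", namespaces=NAMESPACES),
    }
    statistics = root.find(".//meta:document-statistic", NAMESPACES)
    if statistics is not None:
        prefix = "{%s}" % META_NS
        # Strip the namespace from each attribute name and convert counts to int.
        info["statistics"] = {
            name[len(prefix):]: int(value)
            for name, value in statistics.attrib.items()
            if name.startswith(prefix)
        }
    return info


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```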

settings.xml

settings.xml includes settings such as the zoom factor or the cursor position. These are properties that are not content or layout.


mimetype (file)

mimetype is just a one-line file containing the MIME type of the document. One implication of this is that the file extension is actually immaterial to the format; the extension is only there for the benefit of the user.
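A minimal sketch of identifying a document from its mimetype entry rather than from its filename might look like this in Python. The names detect_mime_type and make_mimetype_demo are hypothetical, and the demo archive is a stand-in built in memory.

```python
import io
import zipfile


def detect_mime_type(source):
    """Return the contents of the package's mimetype entry, or None if absent."""
    with zipfile.ZipFile(source) as package:
        try:
            raw = package.read("mimetype")
        except KeyError:
            # ZipFile.read raises KeyError when the member does not exist.
            return None
    return raw.decode("ascii").strip()


def make_mimetype_demo(mime_type):
    """Build an in-memory archive holding only a mimetype entry (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", mime_type)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_mimetype_demo("application/vnd.oasis.opendocument.spreadsheet")
    print(detect_mime_type(demo))
```

Because the check reads the archive itself, it gives the same answer whatever the file happens to be named.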

Reuse of existing formats

OpenDocument is designed to reuse existing open XML standards whenever they are available, and it creates new tags only where no existing standard can provide the needed functionality. So, OpenDocument uses Dublin Core for metadata, MathML for displayed formulas, SVG for vector graphics, SMIL for multimedia, etc.


