2013-05-14

Adding further man pages to the HTML renderings on man7.org

As noted in my last post, I've expanded the set of HTML man page renderings at man7.org/linux/man-pages/ to include some projects other than man-pages. Man pages from 37 projects are now rendered, with about 1750 pages in all. The projects that I have included so far have a bias that matches my interests: man-pages, projects related to low-level C and system programming (e.g., the ACL and extended attribute libraries), toolchain projects (e.g., gcc, gdb, Git, coreutils, binutils, util-linux), other relevant tools (kmod, strace, ltrace, procps, expect), and tools relevant to manual pages (e.g., groff, man-db).

Although there are some other sites around that have renderings of a much larger set of pages, I am (so far) resisting the temptation to take a kitchen-sink approach on man7.org. Nevertheless, I'm open to adding further projects to the set, if they seem relevant. If you think there is a project that should be added to the rendered set, drop a note to man-pages@man7.org with the following information:
  • Name of the project.
  • Project description.
  • URL for the web site of the project.
  • (If you know it:) URL of a web page that provides information on how to report bugs in the man pages (or email list address).
  • Source URL for the man pages of the project. The project should provide pages by one of the following means:
    • Ideally: the URL of a Git repository for the project.
    • The URL of a Bazaar or Mercurial repository.
    • An HTTP or FTP address for the location that is updated with the latest release tarball on each release of the project.
    • If nothing else: the URL of a CVS or Subversion repository. Note: if there is a Git read-only mirror of the CVS or Subversion repository, that is preferred.
  • Instructions on how to build the man pages for the project. These instructions should be minimal, in the sense that they require the minimum CPU effort to build just the man pages. In other words, if possible, I'd like to avoid building the entire project just to obtain the manual pages.
  • Approximate number of manual pages in the project (actual pages, excluding links).

More man pages now rendered on man7.org

There are several web sites that provide renderings of a comprehensive set of Linux man pages. However, those sites typically have a number of faults.

For example, either the renderings are for out-of-date versions of the man pages (on one site, the rendered pages are close to ten years old) or the pages provide no timestamp information indicating their age. In many other cases, sites provide no information about the origin of the rendered pages, give no information about the extraction date or the project version from which the manual pages were taken, and provide no information on how to report manual page bugs to the corresponding upstream projects. Providing that information was the main goal when I started adding the COLOPHON sections to the man-pages pages in December 2007 (man-pages-2.69).

Furthermore, the renderings on many sites are either unattractive (obviously, a subjective judgement) or simply broken (for example, it looks like none of the groff tables in the pages at http://linux.die.net/man/ are rendered, so that the table of system calls in the syscalls(2) man page does not appear, making the page essentially useless). Finally, most of the sites provide no obvious information on how to report bugs in the man page renderings.

One of the few sites that does a reasonable job on the above criteria is the http://manpages.courier-mta.org/ site maintained by Sam Varshavchik. It is probably no coincidence that Sam has also provided numerous bug reports on formatting issues in the man-pages page set over the last several years. However, Sam provides a rather less comprehensive set of pages than on the other sites, taking pages from just seven projects.

So, it seems there's room out there for a web site that does a better job on many of the above criteria by providing a comprehensive set of page renderings that: (a) are up to date; (b) carefully document the origin of the rendered pages; (c) describe where to report bugs in the man pages; and (d) explain where to report bugs in the renderings.

With those goals in mind, I've extended the set of pages that are rendered at http://man7.org/linux/man-pages/ to include pages in addition to those provided by the man-pages project. Each page includes a COLOPHON section that shows the name and URL of the project from which the page comes, the URL of the tarball or source code repository from which the page was extracted, the date when the page was extracted, and (where I know it) information on where to report bugs in the manual page. So far, I've added about 35 projects to the set, with the pages for each project being taken either from the latest release tarball or directly from the project's source code repository. This raises the number of rendered pages at man7.org from the roughly 950 pages in man-pages to around 1750 with the addition of the other projects. (More projects may be added in the future; I'll say more on that later.) A full list of the projects and pages that are rendered can be found in the project page directory.

Sometimes different projects provide the same manual page. On all sites that I know of, where such conflicts occur, just one version of the page is rendered. I've dealt with such conflicts in a different way. One of the versions is treated as canonical (here, I've currently followed the lead of Fedora by choosing the page that it treats as canonical, though I may adjust that approach in the future), but I provide renderings of the other versions at different URLs, with cross-page links between the various versions. Thus, for example, three of the projects that I handle provide versions of the kill(1) man page, and the three versions are rendered at the following URLs:

The "@" syntax in the URLs is used to distinguish the various versions of the page. For completeness, the canonical version of each page also has an equivalent "@" link, so that the util-linux version of the page is also available at http://man7.org/linux/man-pages/man1/kill.1@util-linux.html. The intention is that the "@" URL scheme should be stable so that manual pages from specific projects can be referenced. If anyone sees any problems with the URL scheme http://.../page-name.section@project.html, please let me know (soon): I'll adjust the scheme if necessary.
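
To make the scheme concrete, here is a minimal Python sketch of how such a URL might be assembled. It is only an illustration, not the code that generates the site: the helper name man_page_url is hypothetical, and it assumes that the directory component is always "man" followed by the section (as in the kill(1) example above) and that the canonical version of a page lives at the same path without the "@project" part.

```python
from typing import Optional

# Base location of the renderings, taken from the URLs quoted above.
BASE_URL = "http://man7.org/linux/man-pages"


def man_page_url(name: str, section: str, project: Optional[str] = None) -> str:
    """Return the URL of a rendered page (hypothetical helper).

    Assumption: the directory is "man" + section, as in the kill(1) example.
    Assumption: omitting the project gives the canonical page URL.
    """
    page = f"{name}.{section}"
    if project is not None:
        # The "@" suffix selects the version from one specific project.
        page = f"{page}@{project}"
    return f"{BASE_URL}/man{section}/{page}.html"


print(man_page_url("kill", "1", "util-linux"))
# prints http://man7.org/linux/man-pages/man1/kill.1@util-linux.html
```

Because the project name is the only thing that varies between versions of a page, a reference to a specific project's page can be built without knowing which version is currently treated as canonical.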