Writing a Technical Book with reStructuredText

Preface

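Having translated some Flask-related documentation, I have written a fair amount of reStructuredText and grown very fond of Sphinx, the documentation generator. So when I started translating a technical book, I wanted to keep using the same toolchain. I still ran into a few problems, especially when converting to Word documents in .docx format. These are my notes.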

reStructuredText

reStructuredText traces back to StructuredText, developed by Zope, and is an improved version of it. Its abbreviation is reST, which stands for "revised, reworked, and reinterpreted StructuredText".

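Like Markdown, it is a plain-text markup syntax that stays readable in its source form. It is more powerful than Markdown, but also somewhat more complex.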

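In the Python world it is the de facto standard for documentation markup: the documentation of Python itself and of many third-party libraries is written in reStructuredText. As early as 2002, PEP 287 proposed reStructuredText as the markup syntax for Python docstrings.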

Sphinx

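Sphinx is a documentation generator written in Python and started by the pocoo team. After years of development it can now generate documentation for Python, C, C++, JavaScript, and other languages. In Python, reStructuredText is parsed by the docutils library, and Sphinx is built on top of docutils.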

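Writing a book does not really need Sphinx's advanced features (doctests, cross-references, and so on). I admit to a bias in favor of Sphinx, but here I only use it to generate a good-looking HTML e-book that I can send to friends for review. As a bonus, Sphinx reports reStructuredText syntax errors while it builds the HTML.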

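First install Sphinx, then run its bundled sphinx-quickstart script and follow the prompts to set up the basic directory structure and build scripts. The answers below are my suggested starting configuration.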

$ pip install sphinx
$ mkdir mybook && cd mybook
$ sphinx-quickstart
Welcome to the Sphinx 1.8.1 quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Selected root path: .

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/n) [n]:    

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:  

The project name will occur in several places in the built documentation.
> Project name: My Book
> Author name(s): Marisa
> Project release []:  

If the documents are to be written in a language other than English,
you can select a language here by its language code. Sphinx will then
translate text that it generates into that language.

For a list of supported codes, see
http://sphinx-doc.org/config.html#confval-language.
> Project language [en]: zh_CN

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:  

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index"
document is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:  
Indicate which of the following Sphinx extensions should be enabled:
> autodoc: automatically insert docstrings from modules (y/n) [n]: n
> doctest: automatically test code snippets in doctest blocks (y/n) [n]: n
> intersphinx: link between Sphinx documentation of different projects (y/n) [n]: n
> todo: write "todo" entries that can be shown or hidden on build (y/n) [n]: y   
> coverage: checks for documentation coverage (y/n) [n]: n
> imgmath: include math, rendered as PNG or SVG images (y/n) [n]: y
> mathjax: include math, rendered in the browser by MathJax (y/n) [n]: n
> ifconfig: conditional inclusion of content based on config values (y/n) [n]: n
> viewcode: include links to the source code of documented Python objects (y/n) [n]: n
> githubpages: create .nojekyll file to publish the document on GitHub pages (y/n) [n]: n

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build
directly.
> Create Makefile? (y/n) [y]:  
> Create Windows command file? (y/n) [y]:  

Creating file ./conf.py.
Creating file ./index.rst.
Creating file ./Makefile.
Creating file ./make.bat.

Finished: An initial directory structure has been created.

You should now populate your master file ./index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.

The todo extension is enabled so that Sphinx understands the .. todo:: directive, which can be used to mark pending or unfinished items.

The imgmath extension is enabled so that Sphinx calls the system's latex to render math formulas as images and insert them into the built document. This brings some extra dependencies. I work in Ubuntu under WSL, where they can be installed as follows.

$ sudo apt-get install poppler-utils texstudio texlive texlive-latex-extra dvipng
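
For reference, the quickstart answers above should end up in conf.py roughly as the settings below. This is a sketch of the relevant excerpt rather than the exact generated file. Setting imgmath_image_format explicitly is my own assumption about what you may want; the value shown is simply the default.

```python
# conf.py (excerpt): a sketch of what the quickstart answers translate to.
project = 'My Book'
author = 'Marisa'
language = 'zh_CN'

# Extensions chosen during sphinx-quickstart.
extensions = [
    'sphinx.ext.todo',     # enables the ".. todo::" directive
    'sphinx.ext.imgmath',  # renders math through latex and dvipng
]

# Show todo entries in the built output; set to False to hide them.
todo_include_todos = True

# Assumption: keep the default PNG output; 'svg' is the other accepted value.
imgmath_image_format = 'png'
```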

Initialization is now complete, and the directory structure should look like this.

├── _build
├── conf.py
├── index.rst
├── make.bat
├── Makefile
├── _static
└── _templates

pandoc

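pandoc is a universal document converter written in Haskell. It converts between Markdown, reStructuredText, Textile, HTML, DocBook, LaTeX, Word, and many other formats. Here I use it to convert reStructuredText into .docx Word documents. The basic usage looks like this, converting chapter1.rst into chapter1.docx.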

$ pandoc -o chapter1.docx -f rst+east_asian_line_breaks -s chapter1.rst

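By default pandoc turns a line break inside a paragraph into a space, which is a default meant for Western languages. For Chinese, enable the east_asian_line_breaks extension so that line breaks between East Asian characters do not introduce a space.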
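
To make the effect concrete, here is a rough Python illustration of the idea behind the extension: a line break is dropped when the characters on both sides are East Asian wide characters, and becomes a space otherwise. It is not pandoc's implementation, and join_soft_breaks is a hypothetical name used only for this sketch.

```python
import unicodedata


def is_wide(ch):
    """True for East Asian wide or fullwidth characters."""
    return unicodedata.east_asian_width(ch) in ('W', 'F')


def join_soft_breaks(paragraph):
    """Join the lines of one paragraph the way a CJK-aware reader expects."""
    lines = paragraph.split('\n')
    out = lines[0]
    for nxt in lines[1:]:
        if out and nxt and is_wide(out[-1]) and is_wide(nxt[0]):
            out += nxt          # CJK on both sides: no space
        else:
            out += ' ' + nxt    # otherwise keep the Western default
    return out


print(join_soft_breaks('天地有大美\n而不言'))  # 天地有大美而不言
print(join_soft_breaks('hello\nworld'))        # hello world
```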

File layout

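By default Sphinx starts from index.rst, walks through the referenced documents, builds a document tree, and produces the final output. pandoc, however, is only a converter and cannot build a document tree: it accepts several input files in one run, but it simply concatenates them instead of following a toctree. To accommodate both tools I settled on a compromise: create a contents.inc file that holds the TOC, whose entries are the names of the chapter .rst files, and include it from index.rst.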

contents.inc

.. toctree::
   :maxdepth: 3

   chapter1
   chapter2
   chapter3

index.rst

.. Mybook documentation master file, created by
   sphinx-quickstart on Thu Oct 19 22:17:03 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Mybook
=====================================

Table of contents
---------------------

.. include:: contents.inc

The next step is to read the chapters listed in contents.inc and call pandoc on them. I do this with a small Python script saved directly in the mybook directory as build_docx.py.

import pathlib
import subprocess

build_path = pathlib.Path('_build/docx')
build_path.mkdir(parents=True, exist_ok=True)

# Base pandoc invocation; the output path and input files are appended per run.
pandoc = ['pandoc', '-f', 'rst+east_asian_line_breaks', '-s']

# Collect the chapter names listed under the toctree directive.
build_files = []
within_toc_block = False
with open('contents.inc', encoding='utf-8') as idx_file:
    for line in idx_file:
        if not within_toc_block:
            within_toc_block = line.startswith('.. toctree::')
        elif not line.strip() or line.strip().startswith(':'):
            continue  # skip blank lines and directive options
        elif line.startswith('  '):
            build_files.append(line.strip())

sources = [f + '.rst' for f in build_files]

# One .docx per chapter; check=True aborts the build if pandoc fails.
for i, (name, src) in enumerate(zip(build_files, sources)):
    output_file = build_path / '{0}-{1}.docx'.format(i, name)
    subprocess.run(pandoc + ['-o', str(output_file), src], check=True)
    print('{0} converted successfully'.format(name))

# All chapters concatenated into a single document.
all_in_one = build_path / 'all-in-one.docx'
subprocess.run(pandoc + ['-o', str(all_in_one)] + sources, check=True)
print('all-in-one converted successfully')
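
Because the script depends on an external pandoc executable, it may be worth checking that the tool is on PATH before converting anything. The following is a minimal sketch of such a check; missing_tools is a hypothetical helper name, not part of the script above.

```python
import shutil


def missing_tools(names):
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Report the problem up front instead of failing halfway through the build.
missing = missing_tools(['pandoc'])
if missing:
    print('missing required tools: ' + ', '.join(missing))
```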

Then add a docx target to the Makefile (the recipe line must begin with a tab). After that, make docx generates the Word documents directly.

docx: Makefile
	@python build_docx.py

Markup syntax

The reStructuredText written here is really the dialect extended by Sphinx. Some commonly used syntax is shown below.

Titles and sections
#################
Book Title
#################

*******************
Chapter 1
*******************

1.2 Section
=====================

1.2.3 Subsection
^^^^^^^^^^^^^^^^^^^^^^

1.2.3.4 Paragraph
""""""""""""""""""""""

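With Chinese titles the adornment lines are easy to get wrong, because docutils warns when an underline is shorter than the title and, as far as I know, counts each wide character as two columns. The sketch below generates headings in the convention shown above; heading, display_width, and the level names are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Adornment character and whether an overline is used, per level,
# mirroring the convention in the examples above.
ADORNMENTS = {
    'part': ('#', True),
    'chapter': ('*', True),
    'section': ('=', False),
    'subsection': ('^', False),
    'paragraph': ('"', False),
}


def display_width(text):
    """Count wide and fullwidth characters as two columns, others as one."""
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)


def heading(title, level):
    """Return the title with adornment lines long enough for its width."""
    char, overline = ADORNMENTS[level]
    bar = char * display_width(title)
    lines = [bar, title, bar] if overline else [title, bar]
    return '\n'.join(lines)


print(heading('第一章', 'chapter'))
print(heading('1.2 Section', 'section'))
```
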
Bold
**Bold**

Italic
*Italic*

Inline code
``inline code``

Links
`Chamber of Kagami <http://kagami.jinkan.org>`_
`Chamber of Kagami`_

.. _Chamber of Kagami: https://domain.invalid/

Footnotes

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Pellentesque dignissim libero quis ipsum sagittis, vel dapibus justo dignissim [1]_.
Quisque scelerisque dictum sapien sit amet blandit.
Maecenas scelerisque feugiat urna in egestas. 

.. [1] this is a footnote

Code blocks
.. code-block:: python

    import antigravity

Tips
.. tip::

    lorem ipsum

Notes
.. note::

    lorem ipsum

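Also, when inline markup is not separated from the surrounding Chinese text by whitespace or a similar character, it is not recognized and causes a syntax error. A plain space works, but then the extra space shows up in the final document. Per the reST specification, a backslash-escaped space can be used instead, as follows.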

天地有\ **大美**\ 而不言,四时有明法而不议,万物\ [1]_\ 有成理而不说。圣人者,原天地之美而
达万物之理。是故至人无为,大圣不作,观于天地之谓也。

.. [1] 这是一个脚注
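
When text is assembled programmatically, the escaped spaces can be added by a small helper. This is only a sketch; inline is a hypothetical function name, and it always pads both sides, which is harmless because an escaped space produces no output.

```python
def inline(text, marker='**'):
    """Wrap text in inline markup, padded with backslash-escaped spaces."""
    return '\\ ' + marker + text + marker + '\\ '


sentence = '天地有' + inline('大美') + '而不言'
print(sentence)  # 天地有\ **大美**\ 而不言
```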

For more syntax, see the syntax introduction in the Sphinx documentation or the reST reference maintained by the docutils team.
