Adding Short Links to Hexo in Half an Hour

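I have been blogging with Hexo for several years. Its Markdown format, clean style, static output, and large plugin ecosystem have made it consistently pleasant to use.

One problem has bothered me for a long time, though: the permalinks Hexo generates are fairly long, which makes them awkward to share on social platforms.

There are plenty of third-party short-link services, and some of them also track click counts and similar statistics. Examples include https://www.shorturl.at/ and https://app.bitly.com/bbt2/.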

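These services are easy to use, but they come with several problems:

The generated URL uses the short-link service's domain instead of the blog's, which makes it harder for readers to recognize.
If the service shuts down, every link already shared stops working. (Short-link services are hard to monetize, and many have in fact closed; Google's goo.gl, for example, has already been taken offline.)
It is hard for a short-link service to be reachable from every region of the world.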

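That is what prompted me to implement short links myself.

A short-link service is not complicated in itself. In essence, it only needs to generate a sufficiently short, stable hash for the target URL and store the mapping from that hash to the original URL.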
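
As a rough illustration of that idea (not the scheme this post ends up using), a shortener can be sketched as a truncated hash plus a lookup table. The function names, the 6-character code length, and the in-memory dict are all assumptions made for this example.

```python
import hashlib

# Hypothetical in-memory store: short code -> original URL.
mapping = {}


def make_code(url: str, length: int = 6) -> str:
    """Derive a stable short code by truncating the URL's MD5 hex digest."""
    return hashlib.md5(url.encode('utf8')).hexdigest()[:length]


def register(url: str) -> str:
    """Store the mapping and return the code; fail loudly on a collision."""
    code = make_code(url)
    if mapping.get(code, url) != url:
        raise ValueError(f'code {code} already maps to {mapping[code]}')
    mapping[code] = url
    return code


def resolve(code: str):
    """Return the original URL, or None when the code is unknown."""
    return mapping.get(code)


url = 'https://brightliao.com/2022/07/28/modelling-examples/'
code = register(url)
print(code, '->', resolve(code))
```

The same URL always yields the same code, which is what "stable" means here; the shorter the code, the more a collision check like the one in `register` matters.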

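Generating the short link

My blog's permalink format is :year/:month/:day/:title/. Since I rarely publish more than one post a day, encoding the date alone already gives a good hash; the rest of the name can be reduced to one or two extra characters that guard against collisions.

Based on this analysis, I quickly arrived at a way to generate a stable hash. The corresponding Python code is: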

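import base64
import hashlib
from datetime import datetime


def shorten(post_name: str):
    # Parse the post date and take the number of days since the earliest post date (`2010-01-01`)
    days = (datetime.strptime(post_name[:10], '%Y-%m-%d') - datetime.strptime('2010-01-01', '%Y-%m-%d')).days
    # Hash the rest of the name with MD5 and keep the first two hex characters
    hash_suffix = hashlib.md5(post_name[10:].encode('utf8')).hexdigest()[:2]
    # Join the day count and the two hash characters, then Base64-encode them to get the short link
    return base64.b64encode(f'{days}{hash_suffix}'.encode('utf8')).decode('utf8').strip('=')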

A test run shows that the post 2022-07-28-modelling-examples is encoded as NDU5MTYx. Not bad: only 8 characters.
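
To see what those 8 characters contain, the code can be decoded back. The sketch below uses only the example values quoted above; `decode_short` is a helper name introduced here for illustration.

```python
import base64
from datetime import datetime


def decode_short(code: str) -> str:
    """Undo the Base64 step, restoring any '=' padding that was stripped."""
    padded = code + '=' * (-len(code) % 4)
    return base64.b64decode(padded).decode('utf8')


# Day offset of the example post relative to 2010-01-01.
days = (datetime(2022, 7, 28) - datetime(2010, 1, 1)).days

plain = decode_short('NDU5MTYx')
print(days)   # 4591
print(plain)  # 459161: the day offset followed by the two hash characters
```

As long as the day offset has four digits (posts dated from late September 2012 until some time in 2037), the text before encoding is six characters long, so the Base64 result is eight characters with no padding to strip.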

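Integrating with Hexo

How can this be integrated into Hexo?

My basic idea is:

The reader clicks the short link and lands on the blog's home page.
The home page looks up the post's permalink from the short link and redirects to it.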

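So the final short URL can take the form https://brightliao.com/#/POST_HASH. That is, the post's hash goes into the URL fragment (the part after #). The browser does not send the fragment to the server, so it simply opens the home page.

The home page then needs an embedded piece of JavaScript that reads this hash and redirects to the actual post.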
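
To make the role of the fragment concrete, here is a small Python sketch of the same lookup. The one-entry table holds the pair from the test above, and `resolve` is a hypothetical helper that only mirrors the planned front-end logic.

```python
from urllib.parse import urlsplit

# Sample table with the one pair mentioned earlier in this post.
links = {'NDU5MTYx': '/2022/07/28/modelling-examples/'}


def resolve(short_url: str):
    """Mimic the planned front-end logic: read the fragment, look it up."""
    parts = urlsplit(short_url)
    # For '#/CODE' the fragment is '/CODE'; drop the leading slash.
    code = parts.fragment.lstrip('/')
    return links.get(code)


parts = urlsplit('https://brightliao.com/#/NDU5MTYx')
print(parts.path)      # '/' (the only path the server is asked for)
print(parts.fragment)  # '/NDU5MTYx' (stays in the browser)
print(resolve('https://brightliao.com/#/NDU5MTYx'))
```

An unknown code returns None here, which corresponds to staying on the home page.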

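Implementation

With the approach laid out, here is the concrete implementation.

In the spirit of keeping things cheap and practical, I wanted to finish as quickly as possible and keep coding and debugging within half an hour (it actually took about an hour o(╥﹏╥)o).

If you also use Hexo, feel free to take this as is; if not, treat it as a reference only.

To make things easier to follow, the diagram below shows the whole flow of generating and resolving a short link:

Two pieces of code are involved:

JavaScript code: detects a short link and redirects.
Python code: reads the posts, builds the table that maps short links to real addresses, writes it into a file the front end can load, and bumps that file's version to defeat caching.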

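JavaScript code

I use the NexT theme. When Hexo generates the static files, every page includes themes/next/source/js/next-boot.js, so this file is where the mapping table and the accompanying JavaScript go.

Append the following code to the end of next-boot.js: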

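//...
// Short code -> permalink table. The Python script rewrites this line, so keep it on one line in exactly this form.
var links = {}
try {
  (function(){
    if (window.location.hash) {
      // The hash looks like "#/CODE"; drop the leading "#/"
      var link = window.location.hash.substring(2);
      // Only own keys count, so inherited names such as "constructor" are not treated as codes
      if (Object.prototype.hasOwnProperty.call(links, link)) {
        window.location.href=links[link];
      }
    }
  })()
} catch (e) {}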

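The links variable here holds the mapping table. Whenever a new post is added, the Python code rewrites this line.

The code reads the current URL and looks up the link table. If a match is found it redirects; otherwise it does nothing.

Python code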

import os
from datetime import datetime
import base64
import json
import re
import hashlib


curdir = os.path.dirname(os.path.abspath(__file__))


def shorten(post_name: str):
    days = (datetime.strptime(post_name[:10], '%Y-%m-%d') - datetime.strptime('2010-01-01', '%Y-%m-%d')).days
    hash_suffix = hashlib.md5(post_name[10:].encode('utf8')).hexdigest()[:2]
    return base64.b64encode(f'{days}{hash_suffix}'.encode('utf8')).decode('utf8').strip('=')


def create_links():
    links = {}
    postdir = os.path.join(curdir, 'source/_posts')
    for f in os.listdir(postdir):
        f = os.path.join(postdir, f)
        # Posts are kept in subdirectories of source/_posts
        if os.path.isdir(f):
            for p in os.listdir(f):
                # A post is a Markdown file whose name starts with YYYY-MM-DD
                if (p.endswith('.md') or p.endswith('.markdown')) and not os.path.isdir(os.path.join(f, p)) and re.match(r'^\d{4}-\d{2}-\d{2}.*', p):
                    # Permalink: /year/month/day/title/, with the file extension removed
                    long_link = f'/{p[0:4]}/{p[5:7]}/{p[8:10]}/{p[11:-3] if p.endswith(".md") else p[11:-9]}/'
                    short_link = shorten(p)
                    if short_link in links:
                        # Two posts produced the same short code: stop rather than overwrite
                        raise Exception(f'found short link {short_link} -> {long_link} conflict with existing {links[short_link]}')
                    links[short_link] = long_link
    # Print every permalink with its short link, sorted by permalink
    items = sorted(links.items(), key=lambda item: item[1])
    for item in items:
        print(f'{item[1]}: https://brightliao.com/#/{item[0]}')
    print(f'found {len(links)} links.')
    return links


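def update_bootjs(links):
    bootjs = os.path.join(curdir, 'themes/next/source/js/next-boot.js')
    if not os.path.exists(bootjs):
        raise Exception('bootjs file not found at: ' + bootjs)
    with open(bootjs, 'r', encoding='utf8') as f:
        content = f.read()
    # Matches the single line that holds the mapping table, e.g. `var links = {}`
    rex = r'\nvar links = {[^\n]*}\s*\n'
    if not re.search(rex, content):
        raise Exception('rex not found in bootjs')
    # Pass a function as the replacement so that backslashes in the JSON
    # (such as \uXXXX escapes for non-ASCII titles) are not read as regex escapes
    new_line = f'\nvar links = {json.dumps(links)}\n'
    content = re.sub(rex, lambda m: new_line, content, count=1)
    with open(bootjs, 'w', encoding='utf8') as f:
        f.write(content)
    print('updated file: ' + bootjs)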


def update_indexjs():
    indexjs = os.path.join(curdir, 'themes/next/layout/_scripts/index.swig')
    if not os.path.exists(indexjs):
        raise Exception('indexjs file not found at: ' + indexjs)
    with open(indexjs, 'r', encoding='utf8') as f:
        content = f.read()
    # Find the current version number in 'next-boot.js?N' and increase it by one
    rex = r"'next-boot\.js\?(\d+)'"
    m = re.search(rex, content)
    if not m:
        raise Exception('rex not found in indexjs')
    nextver = int(m.group(1)) + 1
    with open(indexjs, 'w', encoding='utf8') as f:
        f.write(re.sub(rex, f"'next-boot.js?{nextver}'", content, count=1))
    print('updated file: ' + indexjs)


if __name__ == '__main__':
    links = create_links()
    update_bootjs(links)
    update_indexjs()

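The update_bootjs function above updates the mapping table in next-boot.js. The regular expression \nvar links = {[^\n]*}\s*\n matches the line of code that holds the mapping table, and the update simply replaces that line, writing a JSON string at that position for the JavaScript code to read.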
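
The replacement can be tried on a string without touching any file. In this sketch `sample_js` is a made-up excerpt standing in for next-boot.js, and the table holds the single pair from the earlier test.

```python
import json
import re

# Hypothetical excerpt of next-boot.js; only the `var links` line matters.
sample_js = "// ...theme code...\nvar links = {}\ntry {\n  // redirect logic\n} catch (e) {}\n"

rex = r'\nvar links = {[^\n]*}\s*\n'
new_links = {'NDU5MTYx': '/2022/07/28/modelling-examples/'}

# A function replacement keeps backslashes in the JSON from being read as regex escapes.
new_line = f'\nvar links = {json.dumps(new_links)}\n'
updated = re.sub(rex, lambda m: new_line, sample_js, count=1)
print(updated)

# The pattern also matches the filled-in table, so the update can be repeated.
again = re.sub(rex, lambda m: '\nvar links = {}\n', updated, count=1)
```

Because the pattern requires the whole table to sit on one line, the JSON is written without line breaks, which is what `json.dumps` produces by default.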

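To keep browsers from serving a cached copy, the version of next-boot.js should be bumped after the mapping table is updated. The update_indexjs function takes care of this.

The file that loads next-boot.js is themes/next/layout/_scripts/index.swig. As before, a regular expression extracts the current version and replaces it with the incremented one.
Note that before running the script for the first time, you need to edit index.swig by hand and change next-boot.js to next-boot.js?1.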
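
The version bump can be checked the same way. The `sample` line below is only a guess at what the edited template might contain (the real index.swig may look different), and `bump` is a name introduced for this sketch.

```python
import re

# Hypothetical line from index.swig after the manual '?1' edit.
sample = "{{ next_js('next-boot.js?1') }}"

rex = r"'next-boot\.js\?(\d+)'"


def bump(content: str) -> str:
    """Increase the numeric query string after next-boot.js by one."""
    m = re.search(rex, content)
    if not m:
        raise ValueError('version marker not found')
    nextver = int(m.group(1)) + 1
    return re.sub(rex, f"'next-boot.js?{nextver}'", content, count=1)


print(bump(sample))
```

The pattern only matches when a numeric version is already present, which is why the one-time manual edit is needed.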

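With these changes in place, each time a post is added you only need to run the Python file above to create its short link, and all existing short links keep working.

Running the Python file prints the short link for every post; just copy one and share it!

Result

Finally, here is the end result. Try clicking https://brightliao.com/#/NDUyNjc2 and see whether it redirects to the post https://brightliao.com/2022/05/24/5-properties-of-good-code-cupid/.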