A Simple Monoalphabetic Substitution Cipher

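Problem description
The plaintext is an excerpt from 《聊斋志异》 (Strange Tales from a Chinese Studio). It was converted to pinyin and then encrypted with a monoalphabetic substitution cipher. The task is to decrypt it from the ciphertext alone.
The ciphertext is: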
LRZRLKOBNRLVIROZNKOHUKOBRSNRVUIRSNRXVRUVOXVRXYUYSNVOHGVRLROLROLDYIRBRLYOBNROFEKLRBNRURBNRJYKOSNKOHCKOCYBNKOMKOBNRMKOHSNRPROLRURCKOHBNYXYJRKOXVRUVONYBNVLKDPRBYDJYKOSNKOHMVRIRKDTKOHJROHJYVCYGKRKOIRKJRPRLRLROHZNRLRLRNKRLDYQDYBNRIRKJRWROHWROHMKOUKOJYVEKRBNRSNYGKRMVONVSNVOXVRUVOIRKDLYVPRVLKOSNRBRUYLYJYOHYIRKOHBNRLRJRYURZNYRPROHAKOWYDGYLRBNRQDOHPRKOBKRIRKMYCYLDYEYIROHYUVOBNVTKOHIRSYRLYPROZNYUKOBNVOIRJRKOPROKRGVRBNRVUGYBNRMVRUVOXVREYGRSNROFBYDPRZVOFJRVMYEYGYWROHOFLYVJYOSYDLRGYOVOHWVOHEKBNVWYLREYVUSNRHYKOZNYOPRYGKOHSNKOHEYUYJYOBNVJRUVOUYDGYWROHPRVNKOHPYLRTKOHBKOZDOHBNRSNKDPROHMKOHPRJRKDLROSDOHCYPRLYQVSYDOFGYBNRSYDBKRSNVOBNRSKOHSNRBNYVUEKDBNRSNYMYLROHJRNYLROFSYDLROZNYPYNKOSNYIRJRKOBNRBNRBNRJRYZNYHYDEVBNRNYBNRGYEDOHCYLRKRBNYOFOKRIRKLYVJYOBKRGYWROHEKOHIRKOHLDOHJYVLROSNRBNRPRAROHZNYAYBNRJYURLYKDIRVUTKOHLRSNYGYSNYPYOFGYBKRBVPRVJYKOTRYTKOQDOHMVROFJYVLROPYNKOSNYERGKJYKOBKNYOWKSYDLRXRBNRLRUREYNKOOFBNRJROHGYBNRJYVNYEYBNRJRLKOJYKOVUOFLRMKOHLREKJYXROHSDYBNYJYKOXRKDGYQVEVJRUVOHLYNKOSNYGKJYKOBNDOHEVBNRLVSNYGYSNYKOHLROBKRGKRBNYSNRGYCYEYOFOKRIRKLYBNRLRLYVSKOURGYHDOHEKOHCYPYBNRSKOURNYLRJYLROHOFVUBROFOKRIRSNDYLRIRKOSYDIRKOMYURHDOHLRPYTKOHSNDYLROHXYBNYMYIRKWKJRJRYBNRSYRSNDYLROHJRVGYJYVHYMYOFOKRURLYLROGDTKOHSYRTVVUMKOHEYOFLDYBDOHBNRZNYXVOSNRJRVQVLDYZRWRWKOHBNRXROHGKDBNYOFLYVBRQVLRZNYVUSNRLR |
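Decryption process
For a monoalphabetic cipher, the most common approach is frequency analysis.
If the plaintext were English, quipqiup would solve it almost instantly.
Here the plaintext is Chinese pinyin, though, and there seems to be no ready-made tool, so I did the frequency analysis the hard way.
Pinyin frequencies across the whole of 《聊斋志异》
First we need the typical frequency distribution of Chinese pinyin. I simply found a txt copy of 《聊斋志异》 online.
I stripped the preface and other front matter, kept only the stories, saved the result as sample.txt, and ran the following code (written by GPT-4o):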
import re |
This gives the following sorted frequency table:
Initial (shengmu) frequency counts: |
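The script and its output are not reproduced in full here. As a rough illustration of what counting initials involves, here is a minimal sketch. It is not the GPT-4o script: the names `INITIALS`, `initial_of` and `count_initials` are my own, it assumes the text has already been converted to toneless pinyin syllables, and it treats `y` and `w` as initials for simplicity.

```python
from collections import Counter

# Two-letter initials come first so "zh" is not mistaken for "z".
# Treating y and w as initials is a simplifying assumption.
INITIALS = ["zh", "ch", "sh"] + list("bpmfdtnlgkhjqxrzcsyw")

def initial_of(syllable):
    """Return the initial of a toneless pinyin syllable, or "" if it has none."""
    s = syllable.lower()
    for ini in INITIALS:
        if s.startswith(ini):
            return ini
    return ""

def count_initials(syllables):
    """Count how often each initial occurs in a list of syllables."""
    return Counter(initial_of(s) for s in syllables)

# Tiny hand-made sample, not real corpus data.
counts = count_initials(["zhi", "nv", "da", "yi", "zhi", "an"])
for ini, c in counts.most_common():
    print(ini or "(none)", c)
```

On the real corpus the syllable list would come from a pinyin converter run over sample.txt.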
N-gram frequencies of the ciphertext
I saved the ciphertext to ciphertext.txt and ran the following code (written by GPT-4o):
def getNWordList(n): |
This gives the following results (excerpt only):
1-gram Count Frequency |
2-gram Count Frequency |
3-gram Count Frequency |
4-gram Count Frequency |
5-gram Count Frequency |
6-gram Count Frequency |
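For readers who want to reproduce tables of this shape, here is a minimal n-gram counter. It is a sketch, not the script used above; `ngram_table` is a name I made up, and it simply slides a window of length n over the text.

```python
from collections import Counter

def ngram_table(text, n):
    """Return (ngram, count, frequency) rows, most frequent first."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return [(g, c, c / total) for g, c in grams.most_common()]

# Short made-up sample; on the real data, pass the contents of ciphertext.txt.
for gram, count, freq in ngram_table("BNRBNRKOH", 3)[:3]:
    print(gram, count, f"{freq:.3f}")
```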
From these it is easy to infer R = i, KOH = ang, BN = zh / sh, and so on.
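A quick way to sanity-check guesses like these is to apply only the letters guessed so far and see whether plausible syllables appear. The helper below is a hypothetical sketch (`partial_decrypt` is my own name), with unknown letters shown as dots.

```python
def partial_decrypt(cipher, guesses, unknown="."):
    """Apply a partial cipher-to-plain mapping, marking unknown letters."""
    return "".join(guesses.get(c, unknown) for c in cipher)

# Only the guesses R = i and KOH = ang are filled in here.
guesses = {"R": "i", "K": "a", "O": "n", "H": "g"}
print(partial_decrypt("BNRMKOH", guesses))
```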
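But I soon realized there was no need to guess the cipher table from frequencies at all.
pinyin.txt, the pinyin of the full text of 《聊斋志异》, is only about 1.12 million characters long.
So we can simply brute-force match a ciphertext fragment against it in O(N) time!
Take a ciphertext fragment like this: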
BNROFEKLRBNRURBNRJYKOSNKOHCKOCYBNKOMKOBNR
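A monoalphabetic substitution preserves where letters repeat, so the fragment's pattern, with ABC standing for a repeated three-letter block and {nX} for n arbitrary letters, is
ABC {6X} ABC {2X} ABC {21X} ABC
This pattern has exactly one match in the source text, and the corresponding plaintext pinyin is: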
ZHINVDAYIZHIRIZHIJUANSHANGFANFUZHANWANZHI |
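The scan itself can be sketched in a few lines. This is my own minimal version, not the code used for the write-up: `find_repeats` is a hypothetical name, and it only checks that the same block recurs after the given gaps (it does not require A, B and C to be distinct letters).

```python
def find_repeats(text, unit_len, gaps):
    """Return start indices where a unit_len-letter block recurs after each gap."""
    # Offset of every repeated block relative to the first one.
    offsets = [0]
    for g in gaps:
        offsets.append(offsets[-1] + unit_len + g)
    span = offsets[-1] + unit_len
    hits = []
    for i in range(len(text) - span + 1):
        unit = text[i:i + unit_len]
        if all(text[i + o:i + o + unit_len] == unit for o in offsets[1:]):
            hits.append(i)
    return hits

cipher = "BNROFEKLRBNRURBNRJYKOSNKOHCKOCYBNKOMKOBNR"
plain = "ZHINVDAYIZHIRIZHIJUANSHANGFANFUZHANWANZHI"
# The ciphertext fragment and its plaintext share the same repeat pattern.
print(find_repeats(cipher, 3, [6, 2, 21]), find_repeats(plain, 3, [6, 2, 21]))
```

On the real data, `text` would be the contents of pinyin.txt, and each hit is a candidate position for the fragment.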
Searching pinyin.txt for that string makes it easy to recover the plaintext pinyin for the whole ciphertext:
YICIYANZHIYEXINCHANGRANZISHIERXISHIMEIRENMEIMURUSHENGBEIYINYINYOUXIZIYUNZHINVDAYIZHIRIZHIJUANSHANGFANFUZHANWANZHIWANGSHIQINYIRIFANGZHUMUJIANMEIRENHUZHEYAOQIZUOJUANSHANGWEIXIAOLANGJINGJUEFUBAIANXIAJIQIYIYINGCHIYIYIHAIYOUKOUZHIXIAJITINGTINGWANRANJUEDAIZHISHUBAIWENHESHENMEIRENXIAOYUEQIEYANSHIZIRUYUJUNGUXIANGZHIYIJIURICHUIQINGPANTUOBUYIZHIKONGQIANZAIXIAWUFUYOUDUXINGURENZHELANGXISUIYUQINCHURANZHENXIJIANQINAIBEIZHIERBUZHIWEIRENMEIDUBISHINVZUOQICENVJIEWUDUBUTINGNVYUEJUNSUOYIBUNENGTENGDAZHETUYIDUERSHIGUANCHUNQIUBANGSHANGDURUJUNZHEJIRENRUOBUTINGQIEXINGQUYILANGZANCONGZHISHAOQINGWANGQIJIAOYINSONGFUQIYUKESUONVBUZHISUOZAISHENZHISANGSHIZHUERDAOZHISHUWUYINGJIHUYINVSUOYINCHUQUHANSHUXIJIANZHIZHIZHIJIUCHUGUODEZHIHUZHIBUDONGFUYIAIZHUNVNAIXIAYUEJUNZAIBUTINGDANGXIANGYONGJUEYINSHIZHIQIPINGCHUPUZHIJURIYUAOXIERLANGYISHUBUSHUQUNVBUZAIZEQIEJUANLIULANKONGWEINVJUEYINQUHANSHUDIBAJUANZAHUNTASUOYIMIZHIYIRIDUHANNVZHIJINGBUZHIJUEHUDUZHIJIYANJUANERNVYIWANGYIDAJUMINGSOUZHUJUANMIAOBUKEDEJIRENGYUHANSHUBAJUANZHONGDEZHIYESHUBUSHUANGYINZAIBAIZHUSHIBUFUDUNVNAIXIAYUZHIYIYUESANRIBUGONGDANGFUQUZHISANRIHUYIJUYINGNVERZINVNAIXISHOUYIXIANSUOXIANWURIGONGYIQULANGSHOUYINGMUZHUWUXIATAJIJIUZHISUISHOUYINGJIEBUJUEGUWUNVNAIRIYUYINBOLANGSUILEERWANGDUNVYOUZONGZHICHUMENSHIJIEKEYOUCITITANGZHIMINGBAOZHUNVYUEZIKEYICHUERSHIYI |
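Recovering the original text from the pinyin
My reading level is not good enough to recognize the original text from the pinyin at a glance.
So I had to write code to do it…
GPT-4o could not produce code that actually ran, so I ended up writing it myself.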
import re |
Result: 织女大异之日置卷上反复瞻玩至 |
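The idea behind this lookup can be sketched as follows. This is not the script above: `locate` is a hypothetical helper, and it assumes you already have the corpus as parallel sequences of characters and their pinyin syllables (for example from a converter such as `lazy_pinyin`). It records where each character's pinyin starts in the joined string, so that a match can be mapped back to characters.

```python
def locate(chars, syllables, query):
    """Return the characters whose pinyin spells out `query`, or None."""
    joined = "".join(syllables)
    # starts[i] is the offset in `joined` where the pinyin of chars[i] begins.
    starts = []
    pos = 0
    for s in syllables:
        starts.append(pos)
        pos += len(s)
    hit = joined.find(query)
    if hit == -1:
        return None
    end = hit + len(query)
    # Keep every character whose syllable overlaps the matched span.
    return "".join(
        c for c, st, s in zip(chars, starts, syllables)
        if st < end and st + len(s) > hit
    )

# Hand-made sample taken from the passage; not the full corpus.
chars = "云织女大异之日"
syllables = ["YUN", "ZHI", "NV", "DA", "YI", "ZHI", "RI"]
print(locate(chars, syllables, "ZHINVDAYIZHI"))
```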
With that, the original passage is easy to locate:
...以此验之耶?”心怅然自失。而细视美人,眉目如生;背隐隐有细字云:“织女。”大异之。日置卷上,反复瞻玩,至忘食寝。一日方注目间,美人忽折腰起,坐卷上微笑。郎惊绝,伏拜案下。既起,已盈尺矣。益骇,又叩之。下几亭亭,宛然绝代之姝。拜问:“何神?”美人笑曰:“妾颜氏,字如玉,君固相知已久。日垂青盼,脱不一至,恐千载下无复有笃信古人者。”郎喜,遂与寝处。然枕席间亲爱倍至,而不知为人。 |
Excerpted from 《聊斋志异》, Volume 11, 书痴
Recovering the cipher table
with open("en.txt", "r") as f: |
Error at position 545: The mapping is N -> H but X is expected. |
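Two letters fail to match, most likely because of a polyphonic character (多音字), which does no real harm.
The cipher table is as follows:
Ciphertext letter | Plaintext letter |
---|---|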
A | P |
B | Z |
C | F |
D | O |
E | D |
F | V |
G | B |
H | G |
I | X |
J | J |
K | A |
L | Y |
M | W |
N | H |
O | N |
P | Q |
Q | K |
R | I |
S | S |
T | L |
U | R |
V | E |
W | T |
X | M |
Y | U |
Z | C |
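A table like this falls out of aligning the ciphertext with the plaintext pinyin letter by letter. Below is a minimal sketch of that alignment. It is not the script shown earlier: `build_table` is my own name, and it assumes the two strings have equal length and are already aligned. Any letter that maps to two different plaintext letters is recorded as a conflict, which is how a polyphonic character shows up.

```python
def build_table(cipher, plain):
    """Align cipher and plain text; return (mapping, conflicts)."""
    table = {}
    conflicts = []
    for i, (c, p) in enumerate(zip(cipher, plain)):
        known = table.setdefault(c, p)  # first sighting fixes the mapping
        if known != p:
            conflicts.append((i, c, known, p))
    return table, conflicts

cipher = "BNROFEKLRBNRURBNRJYKOSNKOHCKOCYBNKOMKOBNR"
plain = "ZHINVDAYIZHIRIZHIJUANSHANGFANFUZHANWANZHI"
table, conflicts = build_table(cipher, plain)
for c in sorted(table):
    print(c, "->", table[c])
```

Even this short fragment fixes 15 of the 26 letters; the full ciphertext fills in the rest.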
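Finding the polyphonic character
A senior classmate wanted to pin down the polyphonic character, so I threw together some code: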
import re |
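The output is 听 妾 行, which shows that 行 is the polyphonic character. lazy_pinyin gives its pinyin as XING, whereas the ciphertext encodes HANG. The correct reading here is clearly XING.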
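One simple way to pinpoint such a character, sketched here under my own assumptions rather than taken from the code above, is to walk the expected syllables and compare each one with the decrypted text at the same offset. `find_mismatches` is a hypothetical helper, and it only works when the two readings have the same length (as XING and HANG do), since otherwise the alignment drifts.

```python
def find_mismatches(chars, syllables, decrypted):
    """Return (char, expected, got) for every syllable that disagrees."""
    out = []
    pos = 0
    for ch, syl in zip(chars, syllables):
        got = decrypted[pos:pos + len(syl)]
        if got != syl:
            out.append((ch, syl, got))
        pos += len(syl)
    return out

# Illustrative three-character sample around the mismatch.
print(find_mismatches("妾行去", ["QIE", "XING", "QU"], "QIEHANGQU"))
```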
- Title: A Simple Monoalphabetic Substitution Cipher
- Author: Coast23
- Created: 2024-12-19 19:16:29
- Updated: 2025-01-21 14:48:29
- Link: https://coast23.github.io/2024/12/19/一道简单的单表代换加密/
- License: This article is licensed under CC BY-NC-SA 4.0.