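Notes: Handling a Failed GitLab Backup
Copyright notice: Attribution-NonCommercial-ShareAlike
During a recent routine backup of my self-hosted GitLab, I noticed that the backup file was more than 2 GB smaller than usual:
1683790997_2023_05_11_13.7.4-ee_gitlab_backup.tar 3.2G <- good backup
1686805131_2023_06_15_13.7.4-ee_gitlab_backup.tar 1.1G <- faulty backup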
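A size drop like this is easy to miss when backups run unattended. The following is a minimal sketch of a check that compares the newest backup against the previous one. The function name `check_backup_shrink` and the 80% default threshold are my own assumptions, not part of GitLab.

```shell
#!/bin/sh
# check_backup_shrink NEW_SIZE OLD_SIZE [MIN_PERCENT]
# Hypothetical helper: succeeds if the new backup is at least MIN_PERCENT
# (default 80, an arbitrary choice) of the previous backup's size.
# Sizes can be in any unit as long as both use the same one.
check_backup_shrink() {
    new=$1
    old=$2
    min=${3:-80}
    if [ "$old" -gt 0 ] && [ $((new * 100 / old)) -lt "$min" ]; then
        echo "WARNING: newest backup is only $((new * 100 / old))% of the previous one"
        return 1
    fi
    echo "OK"
}

# Possible usage (assumes GNU stat and the backup directory used in this post):
#   cd /var/opt/gitlab/backups
#   new=$(stat -c %s "$(ls -t | sed -n 1p)")
#   old=$(stat -c %s "$(ls -t | sed -n 2p)")
#   check_backup_shrink "$new" "$old"
```

With the two sizes above expressed in MB (1100 and 3200), the check would report the newest backup at 34% of the previous one.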
Checking the backup output revealed the following errors:
# gitlab-rake gitlab:backup:create && cd /var/opt/gitlab/backups/ && mv $(ls -t | head -n1) /backups
2023-06-15 04:52:57 +0000 -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... [DONE]
2023-06-15 04:53:03 +0000 -- done
2023-06-15 04:53:03 +0000 -- Dumping repositories ...
* gitlab-instance-5044e1ff/monitoring (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ...
* gitlab-instance-5044e1ff/monitoring (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b) ... [SKIPPED]
* gitlab-instance-5044e1ff/monitoring.wiki (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.wiki) ...
* gitlab-instance-5044e1ff/monitoring.wiki (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.wiki) ... [SKIPPED]
* gitlab-instance-5044e1ff/monitoring.design (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.design) ...
* gitlab-instance-5044e1ff/monitoring.design (@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.design) ... [SKIPPED]
* m*****per/pc/components/uiframe (@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35) ...
* m*****per/pc/components/uiframe (@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35) ... [DONE]
* m*****per/pc/components/uiframe.wiki (@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.wiki) ...
* m*****per/pc/components/uiframe.wiki (@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.wiki) ... [SKIPPED]
* m*****per/pc/components/uiframe.design (@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.design) ...
* m*****per/pc/components/uiframe.design (@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.design) ... [SKIPPED]
...
* m*****per/pc/public_binaries (@hashed/81/17/811786ad1ae74adfdd20dd0372abaaebc6246e343aebd01da0bfc4c02bf0106c) ...
[Failed] backing up m*****per/pc/public_binaries (@hashed/81/17/811786ad1ae74adfdd20dd0372abaaebc6246e343aebd01da0bfc4c02bf0106c)
Error 13:CreateBundle: cmd wait failed: exit status 1.
...
* m*****per/pc/components/libicloud (@hashed/b1/55/b1556dea32e9d0cdbfed038fd7787275775ea40939c146a64e205bcb349ad02f) ...
[Failed] backing up m*****per/pc/components/libicloud (@hashed/b1/55/b1556dea32e9d0cdbfed038fd7787275775ea40939c146a64e205bcb349ad02f)
Error 13:CreateBundle: cmd wait failed: exit status 1.
...
2023-06-15 04:58:50 +0000 -- done
2023-06-15 04:58:50 +0000 -- Dumping uploads ...
2023-06-15 04:58:50 +0000 -- done
2023-06-15 04:58:50 +0000 -- Dumping builds ...
2023-06-15 04:58:50 +0000 -- done
2023-06-15 04:58:50 +0000 -- Dumping artifacts ...
2023-06-15 04:58:50 +0000 -- done
2023-06-15 04:58:50 +0000 -- Dumping pages ...
2023-06-15 04:58:50 +0000 -- done
2023-06-15 04:58:50 +0000 -- Dumping lfs objects ...
2023-06-15 04:58:50 +0000 -- done
2023-06-15 04:58:50 +0000 -- Dumping container registry images ...
2023-06-15 04:58:50 +0000 -- [DISABLED]
Creating backup archive: 1686805131_2023_06_15_13.7.4-ee_gitlab_backup.tar ... done
Uploading backup archive to remote storage ... skipped
Deleting tmp directories ... done
done
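Failures like the ones in the log above are buried among hundreds of normal lines. As a rough sketch, the failed repositories can be pulled out of a saved log with `sed`. The helper name `list_failed_repos` and the file name `backup.log` are hypothetical, and the pattern only assumes the `[Failed] backing up ... (@hashed...)` line format shown in this log.

```shell
#!/bin/sh
# list_failed_repos: read backup output on stdin and print the project path
# of every repository whose backup line starts with "[Failed] backing up".
# Hypothetical helper; the pattern is derived from the log format above.
list_failed_repos() {
    sed -n 's/^[[:space:]]*\[Failed\] backing up \(.*\) (@hashed.*$/\1/p'
}

# Possible usage (backup.log is an assumed file name):
#   gitlab-rake gitlab:backup:create 2>&1 | tee backup.log
#   list_failed_repos < backup.log
```

If the output is non-empty, the archive should not be trusted even though the task still finished and produced a file.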
The log shows that two repositories failed to back up. Pushing to these two repositories from a local clone also produced errors:
$ git push origin dev
Enumerating objects: 95, done.
Counting objects: 100% (95/95), done.
Delta compression using up to 8 threads
Compressing objects: 100% (53/53), done.
remote: error: unknown object type 5 at offset 22424540 in /var/opt/gitlab/git-data/repositories/@hashed/81/17/811786ad1ae74adfdd20dd0372abaaebc6246e343aebd01da0bfc4c02bf0106c.git/objects/pack/pack-c1969b89a1a5e7dbf4285cfc1bc3703c5584a605.pack
remote: error: bad offset for revindex
remote: fatal: packed object 113bb3c686ae727da0c7baaad5d61f4a1ba6be91 (stored in /var/opt/gitlab/git-data/repositories/@hashed/81/17/811786ad1ae74adfdd20dd0372abaaebc6246e343aebd01da0bfc4c02bf0106c.git/objects/pack/pack-c1969b89a1a5e7dbf4285cfc1bc3703c5584a605.pack) is corrupt
Writing objects: 100% (54/54), 4.29 MiB | 1.86 MiB/s, done.
Total 54 (delta 38), reused 0 (delta 0), pack-reused 0
error: remote unpack failed: unpack-objects abnormal exit
To http://192.168.3.225:4080/m*****per/pc/public_binaries.git
! [remote rejected] dev -> dev (unpacker error)
error: failed to push some refs to 'http://192.168.3.225:4080/m*****per/pc/public_binaries.git'
I suspected file corruption, and from experience I guessed that the hard disk was most likely acting up again.
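One way to confirm this kind of suspicion on the server side is to run `git fsck` directly against the bare repository named in the error message. This is only a sketch: the wrapper name `check_repo` is hypothetical, and it assumes a `git` binary on the PATH and read access to the repository directory.

```shell
#!/bin/sh
# check_repo REPO_DIR
# Hypothetical wrapper: run a full object check on a (bare) repository and
# return git's exit status, which is non-zero if fsck finds errors or the
# path is not a repository.
check_repo() {
    git -C "$1" fsck --full
}

# Possible usage with the path from the error message above:
#   check_repo /var/opt/gitlab/git-data/repositories/@hashed/81/17/811786ad1ae74adfdd20dd0372abaaebc6246e343aebd01da0bfc4c02bf0106c.git
```

A corrupt pack like the one reported during the push would be expected to show up here as an error, independent of any client.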
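Background
GitLab runs on a retired desktop PC. Originally, a RAID controller card with 2 × 1 TB mechanical disks in RAID 1 stored the GitLab data, while the operating system lived on a separate 500 GB disk.
There are about 20 users in total, with roughly 5 active on an average day, and the instance mainly serves the team's project development. The company's official version control system is SVN, so I set up the GitLab service myself because the project needed it.
This setup ran around the clock for about a year and a half and worked well. Then one day the service became abnormally slow, and after logging in to the server remotely I was greeted with a "disk is about to fail" warning. I no longer remember the details, but in the end I removed the RAID card and used a single standalone disk to store the data. (The card was a Mojie (摩羯) PCI-E RAID card. Whether it was simply too old or something else, after every power loss it needed a manual key press on the keyboard before the system would boot.)
As for data safety, frequent backups would have to do. Fortunately, the repositories mostly have multiple copies. Only the non-repository data in GitLab (such as user data) would be truly gone if lost, but with a routine weekly backup that is not a big problem.
And now, a little over a year has passed since that last failure.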
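Resolution
I opened the case and touched the disk first. The familiar feeling: yep, hot to the touch!
Oddly enough, a disk always seems to run hot once it has problems. I don't know the mechanism behind this, or whether the heat comes first and then causes the disk failure.
I pulled the disk and checked it with a tool. Not too bad: 4 bad sectors, 4 × 512 bytes = 2 KB of data in total, so far showing up mainly in the files of the two large repositories.
Since I have local copies, I didn't bother with sector repair or file recovery. I prepared another disk and used WinHex's disk cloning to copy all the data.
WinHex's report after the disk copy completed (4 sectors indeed could not be copied):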
2023/06/15, 16:50:47.9
WDC WD5000AAKX-08ERMA0 --> TOSHIBA DT01ACA100
Sector 0 --> Sector 0
976,773,168 sectors
Unreadable sectors:
141,497,932
141,525,070
169,426,408
955,454,810
2023/06/15, 18:13:28.0
976,773,164 sectors copied successfully.
4 bad sectors encountered.
Corresponding target sectors filled with: 0x00
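To relate the sector numbers in the report to positions on the disk, a sector number can be multiplied by the sector size. The sketch below assumes 512-byte logical sectors, which matches the 4 × 512 figure used in this post. The helper name `sector_offset` is hypothetical.

```shell
#!/bin/sh
# sector_offset SECTOR [SECTOR_SIZE]
# Hypothetical helper: print the byte offset at which a sector starts.
# SECTOR_SIZE defaults to 512 bytes (an assumption about this disk).
sector_offset() {
    echo $(( $1 * ${2:-512} ))
}

# Total data lost to 4 unreadable sectors:
sector_offset 4            # prints 2048

# Byte offset of the first unreadable sector from the report:
sector_offset 141497932    # prints 72446941184
```

On the cloned disk, each of those 512-byte ranges now holds zeros instead of the original data, which is consistent with a pack file that no longer parses.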
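After cloning, I put the new disk back in the machine and GitLab started successfully. Then, for each problem repository, I did the following:
- First, reset the local clone to one of the earliest commits (create a branch beforehand to keep the latest commits). A hard reset is used here so the working tree stays clean for the later merge.
git branch dump
git reset --hard <early-commit>
- Then force-push (this has to overwrite every branch on the server; for master, turn off branch protection first)
git push --force origin <branch>
- Finally, push the current latest commits again
git merge dump
git push origin <branch>
git branch -d dump
The rough idea: reset the repository to a version that does not depend on the corrupted file, then force-push so that the server drops the broken file. After that, restore the repository to the latest commit and push once more, so that the complete content reaches the server again.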
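The steps above can be wrapped into a small script for a single branch. This is a sketch under assumptions, not the exact commands I ran: the function name `repush_history` is hypothetical, the remote is assumed to be called `origin`, and the early commit has to be chosen by hand.

```shell
#!/bin/sh
# repush_history WORKTREE EARLY_COMMIT BRANCH
# Hypothetical helper: rewind BRANCH to EARLY_COMMIT, force-push it,
# then fast-forward back to the saved tip and push again.
repush_history() {
    worktree=$1
    early=$2
    branch=$3
    (
        cd "$worktree" || exit 1
        git checkout -q "$branch" || exit 1
        git branch -f dump || exit 1                      # remember the latest commit
        git reset -q --hard "$early" || exit 1            # step 1: back to an early commit
        git push -q --force origin "$branch" || exit 1    # step 2: overwrite the server branch
        git merge -q --ff-only dump || exit 1             # step 3: fast-forward to the saved tip
        git push -q origin "$branch" || exit 1            #         and push the full history again
        git branch -q -d dump                             # clean up the helper branch
    )
}

# Possible usage (commit and branch are placeholders):
#   repush_history ~/src/public_binaries <early-commit> dev
```

The `--ff-only` merge is a deliberate choice: it refuses to continue if the saved tip is not a direct descendant of the early commit, so the script cannot silently create a merge commit.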
The result of backing up again after the fix:
1683790997_2023_05_11_13.7.4-ee_gitlab_backup.tar 3.2G <- good backup
1686805131_2023_06_15_13.7.4-ee_gitlab_backup.tar 1.1G <- faulty backup
1686832250_2023_06_15_13.7.4-ee_gitlab_backup.tar 3.5G <- good backup