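Two friends of mine ran into a case where, after changing the system architecture of a Greenplum cluster, even a full recovery with gprecoverseg -F would not run.
The error message was:
Unable to connect to database. Retrying 1
gprecoverseg failed. (Reason='Unable to connect to database and start transaction') exiting...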
I was lucky enough to get a copy of all the files from the virtual machines and analyzed them.
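It turns out that fixing this problem only requires modifying CT_METADATA under pg_changetracking; in other words, copying that file from another healthy primary instance to the primary instance that has the problem is enough.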
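As a rough sketch of that copy step: the function below backs up the suspect CT_METADATA and then overwrites it with the one from a healthy primary. The function name `replace_ct_metadata` and the data directory paths in the usage note are hypothetical, not taken from the affected cluster, and the sketch assumes both data directories are reachable on the same host (across hosts, something like scp would be needed instead).

```shell
#!/usr/bin/env bash
# Sketch only: replace_ct_metadata is a hypothetical helper, not a Greenplum utility.
# Usage: replace_ct_metadata <healthy_primary_datadir> <broken_primary_datadir>
replace_ct_metadata() {
    local healthy="$1/pg_changetracking/CT_METADATA"
    local broken="$2/pg_changetracking/CT_METADATA"

    # Refuse to proceed if the healthy copy is missing.
    [ -f "$healthy" ] || { echo "missing: $healthy" >&2; return 1; }

    # Keep a backup of the suspect file before overwriting it.
    if [ -f "$broken" ]; then
        cp -p "$broken" "$broken.bak" || return 1
    fi

    # Overwrite the suspect file with the healthy one.
    cp -p "$healthy" "$broken"
}
```

A call might look like `replace_ct_metadata /data/primary/gpseg1 /data/primary/gpseg0`, where both paths are placeholders. Doing this with the database stopped is probably the safer choice, though that is an assumption rather than something the analysis confirms.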
What follows is a rough walkthrough of the analysis, for anyone who is interested.
-- Start the database; one mirror is marked down.
[gpadmin@gpmaster ~]$ gpstart -a
20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a
20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment...
20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.4.1 build 2'
20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20150727:22:28:29:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode
20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Master Started...
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master
20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<<
20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...............................................
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Process results...
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 3
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 1 <<<<<<<<
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances, skipped 1 other segments
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-There are 1 segment(s) marked down in the database
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20150727:22:29:23:001922 gpstart:gpmaste |