[stork-users] Removing a node and re-adding it back causes a certificate error

Marek Hajduczenia mxhajduczenia at gmail.com
Tue May 7 16:25:05 UTC 2024


Please see my replies inline, marked with [mh0507] tags.

-----Original Message-----
From: Slawek Figiel <slawek at isc.org> 
Sent: Tuesday, May 7, 2024 10:21 AM
To: Marek Hajduczenia <mxhajduczenia at gmail.com>
Cc: stork-users at lists.isc.org
Subject: Re: [stork-users] Removing a node and re-adding it back causes a certificate error

Marek,

 >     That has not solved my problem. I went through the following process
 >
 >     1. Remove the previous registration for the .130 machine in the Stork GUI
 >     (Action > Remove)
 >     2. Remove all content from /var/lib/stork-agent/certs and
 >     /var/lib/stork-agent/tokens
 >     3. Re-run registration
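
For reference, steps 2 and 3 above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function names `clear_agent_credentials`, `reregister_agent`, and `restart_stork_agent` are made up for illustration, the default directory comes from the paths quoted in this thread, and the systemd unit name `isc-stork-agent` is assumed from a packaged install, so adjust it if your unit is named differently. The `register` invocation is the one quoted further down in this thread.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of Stork.

# Directory holding the agent's certs/ and tokens/ subdirectories
# (path as quoted in this thread; override via STORK_AGENT_DIR).
STORK_AGENT_DIR="${STORK_AGENT_DIR:-/var/lib/stork-agent}"

# Step 2: remove every regular file under certs/ and tokens/,
# leaving the directories themselves (and their permissions) in place.
clear_agent_credentials() {
    dir="${1:-$STORK_AGENT_DIR}"
    for sub in certs tokens; do
        [ -d "$dir/$sub" ] || continue
        find "$dir/$sub" -type f -exec rm -f {} +
    done
}

# Step 3: re-run registration as the stork-agent user.
# $1 is the Stork server URL, e.g. http://172.17.129.251:8080
reregister_agent() {
    sudo su stork-agent -s /bin/sh -c "stork-agent register --server-url $1"
}

# Restart the Stork agent itself (the unit name is an assumption).
restart_stork_agent() {
    sudo systemctl restart isc-stork-agent
}

# Usage, on the agent machine:
#   clear_agent_credentials
#   reregister_agent http://172.17.129.251:8080
#   restart_stork_agent
# Afterwards, re-authorize the machine in the Stork UI.
```

Note that the restart transcript quoted further down targets `isc-kea-ctrl-agent`, which is the Kea Control Agent rather than the Stork agent. Whether that matters for this problem is not confirmed in the thread.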

Did you re-authorize the machine? (Machines => Unauthorized => click the Authorize button.) I assume so, but I would like to double-check.

[mh0507] Correct, I did re-authorize the machine; that is part of the standard workflow covered in the Stork documentation.

 > I am back where I was

Hmm... Could you verify whether the Stork server and Stork agent versions are the same? You can check them with the `stork-server --version` and `stork-agent --version` commands.

[mh0507] As requested, they are both on 1.16.0 as shown below. 

root@server-kea-control:/etc/stork# stork-server --version
1.16.0

root@server-kea-node1:/var/lib/stork-agent# stork-agent --version
1.16.0

Slawek

On 07/05/2024 16:23, Marek Hajduczenia wrote:
> The certs have been regenerated on the node, for what it is worth
> 
> root@server-kea-node1:/var/lib/stork-agent# ls -lah certs/
> total 20K
> drwx------ 2 stork-agent root        4.0K May  7 11:47 .
> drwxr-xr-x 4 stork-agent root        4.0K May  6 19:08 ..
> -rw------- 1 stork-agent stork-agent  664 May  7 11:47 ca.pem
> -rw------- 1 stork-agent stork-agent  656 May  7 11:47 cert.pem
> -rw------- 1 stork-agent stork-agent  241 May  7 11:47 key.pem
> 
> but it seems that the Stork Server side is holding onto old certs? Not 
> sure where they would be stored - likely in the backend DB, but I do 
> not want to delete things at random.
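
One way to check whether the regenerated files are at least consistent with each other is to verify `cert.pem` against `ca.pem` with `openssl`. The sketch below assumes that `openssl` is installed on the agent machine and that `ca.pem` holds the CA certificate that signed `cert.pem`; the function name `check_agent_cert` is hypothetical. A successful check only shows that the files on the agent match each other. It says nothing about what the Stork server has stored.

```shell
#!/bin/sh
# Sketch only: check_agent_cert is a hypothetical helper, not a Stork command.

# Verify cert.pem against ca.pem in the given certs directory, then
# print the certificate's subject, issuer, and validity dates.
# Returns 2 if an expected file is missing and 1 if verification fails.
check_agent_cert() {
    certs_dir="${1:-/var/lib/stork-agent/certs}"
    for f in ca.pem cert.pem; do
        if [ ! -f "$certs_dir/$f" ]; then
            echo "missing: $certs_dir/$f" >&2
            return 2
        fi
    done
    openssl verify -CAfile "$certs_dir/ca.pem" "$certs_dir/cert.pem" || return 1
    openssl x509 -in "$certs_dir/cert.pem" -noout -subject -issuer -dates
}

# Usage, on the agent machine:
#   check_agent_cert /var/lib/stork-agent/certs
```

The listing above shows the files as readable only by the `stork-agent` user, so the check needs to run as that user or as root.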
> 
> Regards
> 
> Marek
> 
> On Tue, May 7, 2024 at 5:56 AM Marek Hajduczenia
> <mxhajduczenia at gmail.com> wrote:
> 
>     Hi Slawek,
> 
>     That has not solved my problem. I went through the following process
> 
>     1. Remove the previous registration for the .130 machine in the Stork GUI
>     (Action > Remove)
>     2. Remove all content from /var/lib/stork-agent/certs and
>     /var/lib/stork-agent/tokens
>     3. Re-run registration
> 
>     root@server-kea-node1:/var/lib/stork-agent/tokens# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.251:8080'
>      >>>> Server access token (optional):
>      >>>> IP address or FQDN of the host with Stork Agent (for the Stork
>     Server connection) [server-kea-node1]: 172.17.129.130
>      >>>> Port number that Stork Agent will listen on [8080]:
>     INFO[2024-05-07 11:47:14]         register.go:81    There are no
>     agent certificates - they will be generated.
>     INFO[2024-05-07 11:47:14]         register.go:406  
>     =============================================================================
>     INFO[2024-05-07 11:47:14]         register.go:407   AGENT TOKEN:
>     B777710F0547C3EA237002537E4B18202F888F4D0F6C2C00BA105167DE1688CE
>     INFO[2024-05-07 11:47:14]         register.go:408  
>     =============================================================================
>     INFO[2024-05-07 11:47:14]         register.go:411   Authorize the
>     machine in the Stork web UI
>     INFO[2024-05-07 11:47:14]         register.go:425   Try to register
>     agent in Stork Server
>     INFO[2024-05-07 11:47:14]         register.go:262   Machine registered
>     INFO[2024-05-07 11:47:14]         register.go:283   Stored
>     agent-signed cert and CA cert
>     INFO[2024-05-07 11:47:14]             main.go:215   Registration
>     completed successfully
> 
>     4. I am back where I was
> 
>     [attachment: image.png]
> 
>     I did restart the local Stork agent but that did not change anything
> 
>     root@server-kea-node1:/var/lib/stork-agent/tokens# service isc-kea-ctrl-agent restart
>     root@server-kea-node1:/var/lib/stork-agent/tokens# service isc-kea-ctrl-agent status
>     ● isc-kea-ctrl-agent.service - Kea Control Agent
>           Loaded: loaded
>     (/lib/systemd/system/isc-kea-ctrl-agent.service; enabled; vendor
>     preset: enabled)
>           Active: active (running) since Tue 2024-05-07 11:50:16 UTC; 3s ago
>             Docs: man:kea-ctrl-agent(8)
>         Main PID: 10543 (kea-ctrl-agent)
>            Tasks: 5 (limit: 9343)
>           Memory: 1.4M
>              CPU: 7ms
>           CGroup: /system.slice/isc-kea-ctrl-agent.service
>                   └─10543 /usr/sbin/kea-ctrl-agent -c
>     /etc/kea/kea-ctrl-agent.conf
> 
>     May 07 11:50:16 server-kea-node1 systemd[1]:
>     isc-kea-ctrl-agent.service: Deactivated successfully.
>     May 07 11:50:16 server-kea-node1 systemd[1]: Stopped Kea Control Agent.
>     May 07 11:50:16 server-kea-node1 systemd[1]:
>     isc-kea-ctrl-agent.service: Consumed 48.595s CPU time.
>     May 07 11:50:16 server-kea-node1 systemd[1]: Started Kea Control Agent.
> 
>     For what it is worth, the message in the logs has changed
> 
>     May  7 11:54:39 server-kea-control stork-server[719]:
>     time="2024-05-07 11:54:39" level="info" msg="Completed pulling lease
>     stats from Kea apps: 0/1 succeeded" file="      statspuller.go:71   "
>     May  7 11:54:39 server-kea-control stork-server[719]:
>     time="2024-05-07 11:54:39" level="warning" msg="rpc error: code =
>     Unavailable desc = connection error: desc = \"error reading server
>     preface: remote error: tls: bad certificate\""
>     file="          manager.go:124  " agent="172.17.129.130:8080"
>     May  7 11:54:39 server-kea-control stork-server[719]:
>     time="2024-05-07 11:54:39" level="warning" msg="Failed to get state
>     from the Stork agent; the agent is still not responding"
>     file="           grpcli.go:326  " agent="172.17.129.130:8080"
>     May  7 11:54:39 server-kea-control stork-server[719]:
>     time="2024-05-07 11:54:39" level="warning" msg="failed to get state
>     from agent 172.17.129.130:8080: grpc manager is unable to
>     re-establish connection with the agent 172.17.129.130:8080: rpc
>     error: code = Unavailable desc = connection error: desc = \"error
>     reading server preface: remote error: tls: bad certificate\""
>     file="      statepuller.go:247  "
> 
>     Not sure whether it is for the better or worse
> 
>     Regards
> 
>     Marek
> 
>     On Tue, May 7, 2024 at 4:06 AM Slawek Figiel <slawek at isc.org> wrote:
> 
>         Hello Marek!
> 
>         Stork server reports that the agent introduced itself with a "bad
>         certificate." Several things can cause this. I think you should
>         remove the existing cert files and re-register the agent. Please
>         do the following steps:
> 
>         1. On the agent machine, remove the files in the
>         `/var/lib/stork-agent` directory (you need to remove all files
>         from the `certs` and `tokens` subdirectories).
>         2. If you manually registered the agent (by the `register`
>         command), you need to call it again and restart the agent. If you
>         used the self-registration flow, just restart the agent.
>         3. Open the Stork UI, go to the machines list, switch to the
>         "Unauthorized" tab, and re-authorize the agent.
> 
>         I hope it'll solve your problem.
>         Don't hesitate to ask for more details if you have any questions.
> 
>         Regards,
>         Slawek Figiel
> 
>         On 07/05/2024 00:05, mxhajduczenia at gmail.com wrote:
>          > Dear Forum,
>          >
>          > I had two nodes added to Stork: .130 and .131 and they were
>          > working correctly. Node .130 had a kernel failure due to
>          > changes I was trying to apply and I did not make a copy,
>          > unfortunately. Long story short, I had to re-install node
>          > .130, and then I wanted to add it back to Stork
>          >
>          > No matter what I do, I am getting the error shown above,
>          > i.e., Cannot get state of machine.
>          >
>          > Syslog review shows only one error message following two
>          > warning messages.
>          >
>          > May  6 21:58:38 server-kea-control stork-server[719]:
>          > time="2024-05-06 21:58:38" level="warning" msg="rpc error:
>          > code = Unavailable desc = connection error: desc = \"error
>          > reading server preface: remote error: tls: bad certificate\""
>          > file="          manager.go:124  " agent="172.17.129.130:8080"
>          >
>          > May  6 21:58:38 server-kea-control stork-server[719]:
>          > time="2024-05-06 21:58:38" level="warning" msg="Failed to get
>          > state from the Stork agent; the agent is still not responding"
>          > file="           grpcli.go:326  " agent="172.17.129.130:8080"
>          >
>          > May  6 21:58:38 server-kea-control stork-server[719]:
>          > time="2024-05-06 21:58:38" level="warning" msg="failed to get
>          > state from agent 172.17.129.130:8080: grpc manager is unable to
>          > re-establish connection with the agent 172.17.129.130:8080: rpc
>          > error: code = Unavailable desc = connection error: desc =
>          > \"error reading server preface: remote error: tls: bad
>          > certificate\"" file="      statepuller.go:247  "
>          >
>          > I suspect that the TLS certificate does not get cleared when
>          > the machine is removed and a machine with the same IP address
>          > is re-added.
>          >
>          > I have not found a remedy for it so far, and I do not fancy a
>          > complete re-install of Stork if I can avoid it.
>          >
>          > Any suggestions on how to fix it?
>          >
>          > Regards
>          >
>          > Marek
>          >
>          >
>         -- 
>         Stork-users mailing list
>         Stork-users at lists.isc.org
>         https://lists.isc.org/mailman/listinfo/stork-users
> 


