[stork-users] Removing a node and re-adding it back causes a certificate error
Marek Hajduczenia
mxhajduczenia at gmail.com
Tue May 7 16:25:05 UTC 2024
Replies inline, marked with [mh0507] tags.
-----Original Message-----
From: Slawek Figiel <slawek at isc.org>
Sent: Tuesday, May 7, 2024 10:21 AM
To: Marek Hajduczenia <mxhajduczenia at gmail.com>
Cc: stork-users at lists.isc.org
Subject: Re: [stork-users] Removing a node and re-adding it back causes a certificate error
Marek,
> That has not solved my problem. I went through the following process:
>
> 1. Remove the previous registration for the .130 machine in the Stork GUI
> (Action > Remove)
> 2. Remove all content from /var/lib/stork-agent/certs and
> /var/lib/stork-agent/tokens
> 3. Re-run registration
Did you re-authorize the machine? (Machines => Unauthorized => click the Authorize button.) I assume so, but I would like to double-check.
[mh0507] Correct, I did re-authorize the machine; that is part of the standard workflow covered in the Stork documentation.
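
For reference, the agent-side clean-up and re-registration discussed above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function name `clear_agent_state` is made up for illustration, the paths and the `register` invocation are the ones quoted in this thread, and the service name `isc-stork-agent` is an assumption about how the agent package is installed.

```shell
#!/bin/sh
# Hypothetical helper: delete the files inside the certs/ and tokens/
# subdirectories of the agent state directory, keeping the directories.
clear_agent_state() {
    state_dir="${1:-/var/lib/stork-agent}"
    for sub in certs tokens; do
        if [ -d "$state_dir/$sub" ]; then
            rm -f -- "$state_dir/$sub"/*
        fi
    done
}

# Sequence on the agent host (left commented out because it changes system state):
#   clear_agent_state /var/lib/stork-agent
#   sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.251:8080'
#   sudo systemctl restart isc-stork-agent   # assumed service name
# Afterwards, re-authorize the machine in the Stork web UI.
```

Keeping the destructive part in a function that takes the directory as an argument makes it easy to try on a scratch directory before pointing it at the real agent state.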
> I am back where I was
Hmm... Could you verify that the Stork server and Stork agent versions are the same? You can check them with the `stork-server --version` and `stork-agent --version` commands.
[mh0507] As requested, they are both on 1.16.0 as shown below.
root@server-kea-control:/etc/stork# stork-server --version
1.16.0
root@server-kea-node1:/var/lib/stork-agent# stork-agent --version
1.16.0
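
Because the two commands run on different hosts, the comparison itself is just a string check once both values are collected. A minimal sketch, assuming the version strings have already been gathered by hand; the function name `compare_versions` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print "match" only when both version strings are
# non-empty and identical, otherwise print "mismatch".
compare_versions() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Values as reported in this thread; on a live system they would come from
# `stork-server --version` (server host) and `stork-agent --version` (agent host).
server_version="1.16.0"
agent_version="1.16.0"

echo "server=$server_version agent=$agent_version: $(compare_versions "$server_version" "$agent_version")"
```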
Slawek
On 07/05/2024 16:23, Marek Hajduczenia wrote:
> The certs have been regenerated on the node, for what it is worth
>
> root@server-kea-node1:/var/lib/stork-agent# ls -lah certs/
> total 20K
> drwx------ 2 stork-agent root 4.0K May 7 11:47 .
> drwxr-xr-x 4 stork-agent root 4.0K May 6 19:08 ..
> -rw------- 1 stork-agent stork-agent 664 May 7 11:47 ca.pem
> -rw------- 1 stork-agent stork-agent 656 May 7 11:47 cert.pem
> -rw------- 1 stork-agent stork-agent 241 May 7 11:47 key.pem
>
> but it seems that the Stork Server side is holding onto old certs? Not
> sure where they would be stored - likely in the backend DB, but I do
> not want to delete things at random.
>
> Regards
>
> Marek
>
> On Tue, May 7, 2024 at 5:56 AM Marek Hajduczenia
> <mxhajduczenia at gmail.com> wrote:
>
> Hi Slawek,
>
> That has not solved my problem. I went through the following process:
>
> 1. Remove the previous registration for the .130 machine in the Stork GUI
> (Action > Remove)
> 2. Remove all content from /var/lib/stork-agent/certs and
> /var/lib/stork-agent/tokens
> 3. Re-run registration
>
> root@server-kea-node1:/var/lib/stork-agent/tokens# sudo su
> stork-agent -s /bin/sh -c 'stork-agent register --server-url
> http://172.17.129.251:8080'
> >>>> Server access token (optional):
> >>>> IP address or FQDN of the host with Stork Agent (for the Stork
> Server connection) [server-kea-node1]: 172.17.129.130
> >>>> Port number that Stork Agent will listen on [8080]:
> INFO[2024-05-07 11:47:14] register.go:81 There are no
> agent certificates - they will be generated.
> INFO[2024-05-07 11:47:14] register.go:406
> =============================================================================
> INFO[2024-05-07 11:47:14] register.go:407 AGENT TOKEN:
> B777710F0547C3EA237002537E4B18202F888F4D0F6C2C00BA105167DE1688CE
> INFO[2024-05-07 11:47:14] register.go:408
> =============================================================================
> INFO[2024-05-07 11:47:14] register.go:411 Authorize the
> machine in the Stork web UI
> INFO[2024-05-07 11:47:14] register.go:425 Try to register
> agent in Stork Server
> INFO[2024-05-07 11:47:14] register.go:262 Machine registered
> INFO[2024-05-07 11:47:14] register.go:283 Stored
> agent-signed cert and CA cert
> INFO[2024-05-07 11:47:14] main.go:215 Registration
> completed successfully
>
> 4. I am back where I was
>
> image.png
>
> I did restart the local Stork agent, but that did not change anything
>
> root@server-kea-node1:/var/lib/stork-agent/tokens# service
> isc-kea-ctrl-agent restart
> root@server-kea-node1:/var/lib/stork-agent/tokens# service
> isc-kea-ctrl-agent status
> ● isc-kea-ctrl-agent.service - Kea Control Agent
> Loaded: loaded
> (/lib/systemd/system/isc-kea-ctrl-agent.service; enabled; vendor
> preset: enabled)
> Active: active (running) since Tue 2024-05-07 11:50:16 UTC; 3s ago
> Docs: man:kea-ctrl-agent(8)
> Main PID: 10543 (kea-ctrl-agent)
> Tasks: 5 (limit: 9343)
> Memory: 1.4M
> CPU: 7ms
> CGroup: /system.slice/isc-kea-ctrl-agent.service
> └─10543 /usr/sbin/kea-ctrl-agent -c
> /etc/kea/kea-ctrl-agent.conf
>
> May 07 11:50:16 server-kea-node1 systemd[1]:
> isc-kea-ctrl-agent.service: Deactivated successfully.
> May 07 11:50:16 server-kea-node1 systemd[1]: Stopped Kea Control Agent.
> May 07 11:50:16 server-kea-node1 systemd[1]:
> isc-kea-ctrl-agent.service: Consumed 48.595s CPU time.
> May 07 11:50:16 server-kea-node1 systemd[1]: Started Kea Control Agent.
>
> For what it is worth, the message in the logs has changed
>
> May 7 11:54:39 server-kea-control stork-server[719]:
> time="2024-05-07 11:54:39" level="info" msg="Completed pulling lease
> stats from Kea apps: 0/1 succeeded" file=" statspuller.go:71 "
> May 7 11:54:39 server-kea-control stork-server[719]:
> time="2024-05-07 11:54:39" level="warning" msg="rpc error: code =
> Unavailable desc = connection error: desc = \"error reading server
> preface: remote error: tls: bad certificate\"" file="
> manager.go:124 " agent="172.17.129.130:8080"
> May 7 11:54:39 server-kea-control stork-server[719]:
> time="2024-05-07 11:54:39" level="warning" msg="Failed to get state
> from the Stork agent; the agent is still not responding" file="
> grpcli.go:326 " agent="172.17.129.130:8080"
> May 7 11:54:39 server-kea-control stork-server[719]:
> time="2024-05-07 11:54:39" level="warning" msg="failed to get state
> from agent 172.17.129.130:8080: grpc manager is unable to
> re-establish connection with the agent 172.17.129.130:8080: rpc
> error: code = Unavailable desc = connection error: desc = \"error
> reading server preface: remote error: tls: bad certificate\"" file="
> statepuller.go:247 "
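
To tell whether the TLS failure persists after a re-registration, the server log can be filtered for that specific message and agent address. A minimal sketch, assuming the Stork server messages land in a plain-text syslog file such as `/var/log/syslog` (the log location and the function name `count_bad_cert_lines` are assumptions, not something confirmed in this thread):

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention both the TLS failure
# and the given agent address. Prints the count (0 when nothing matches).
count_bad_cert_lines() {
    log_file="$1"
    agent_addr="$2"
    grep -F 'tls: bad certificate' "$log_file" | grep -c -F "$agent_addr" || true
}

# Example invocation on the server host (path is an assumption):
#   count_bad_cert_lines /var/log/syslog 172.17.129.130:8080
```

Running it before and after re-authorizing the machine shows whether new "bad certificate" entries are still being written for that agent.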
>
> Not sure whether it is for the better or worse
>
> Regards
>
> Marek
>
> On Tue, May 7, 2024 at 4:06 AM Slawek Figiel <slawek at isc.org> wrote:
>
> Hello Marek!
>
> Stork server reports that the agent introduced itself with a "bad
> certificate." Several things can cause this. I think you should remove
> the existing cert files and re-register the agent. Please follow these
> steps:
>
> 1. On the agent machine, remove the files in the `/var/lib/stork-agent`
> directory (you need to remove all files from the `certs` and `tokens`
> subdirectories).
> 2. If you registered the agent manually (with the `register` command),
> you need to run it again and restart the agent. If you used the
> self-registration flow, just restart the agent.
> 3. Open the Stork UI, go to the machines list, switch to the
> "Unauthorized" tab, and re-authorize the agent.
>
> I hope it'll solve your problem.
> Don't hesitate to ask for more details if you have any questions.
>
> Regards,
> Slawek Figiel
>
> On 07/05/2024 00:05, mxhajduczenia at gmail.com wrote:
> > Dear Forum,
> >
> > I had two nodes added to Stork, .130 and .131, and they were working
> > correctly. Node .130 had a kernel failure due to changes I was trying to
> > apply and I did not make a copy, unfortunately. Long story short, I had
> > to re-install node .130, and then I wanted to add it back to Stork.
> >
> > No matter what I do, I am getting the error shown above, i.e., Cannot
> > get state of machine.
> >
> > Syslog review shows only one error message following two warning
> > messages.
> >
> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06
> > 21:58:38" level="warning" msg="rpc error: code = Unavailable desc =
> > connection error: desc = \"error reading server preface: remote error:
> > tls: bad certificate\"" file=" manager.go:124 "
> > agent="172.17.129.130:8080"
> >
> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06
> > 21:58:38" level="warning" msg="Failed to get state from the Stork agent;
> > the agent is still not responding" file=" grpcli.go:326 "
> > agent="172.17.129.130:8080"
> >
> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06
> > 21:58:38" level="warning" msg="failed to get state from agent
> > 172.17.129.130:8080: grpc manager is unable to re-establish connection
> > with the agent 172.17.129.130:8080: rpc error: code = Unavailable desc =
> > connection error: desc = \"error reading server preface: remote error:
> > tls: bad certificate\"" file=" statepuller.go:247 "
> >
> > I suspect that the TLS certificate does not get cleared when the machine
> > is removed and a machine with the same IP address is re-added.
> >
> > I have not found a remedy for it so far, and I do not fancy a complete
> > re-install of Stork if I can avoid it.
> >
> > Any suggestions on how to fix it?
> >
> > Regards
> >
> > Marek
> >
> >
> --
> Stork-users mailing list
> Stork-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/stork-users
>