[stork-users] Removing a node and re-adding it back causes a certificate error
mxhajduczenia at gmail.com
Wed May 8 23:27:34 UTC 2024
Just to close off my saga: I rebooted the Stork server VM today and everything seems fine now. I am stumped, honestly, since nothing has changed in terms of networking, connectivity, or configuration. It just started working out of the blue.
Not a satisfying result, honestly, since something got fixed during the reboot but I do not know what.
Marek
From: Marek Hajduczenia <mxhajduczenia at gmail.com>
Sent: Tuesday, May 7, 2024 10:47 AM
To: Slawek Figiel <slawek at isc.org>
Cc: stork-users at lists.isc.org
Subject: Re: [stork-users] Removing a node and re-adding it back causes a certificate error
I did go with the recommendation and even though I am 100% sure I have IP reachability, the registration process with server token fails.
root@server-kea-node1:/home/ace# ping 172.17.129.133
PING 172.17.129.133 (172.17.129.133) 56(84) bytes of data.
64 bytes from 172.17.129.133: icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from 172.17.129.133: icmp_seq=2 ttl=64 time=0.063 ms
64 bytes from 172.17.129.133: icmp_seq=3 ttl=64 time=0.147 ms
^C
--- 172.17.129.133 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2054ms
rtt min/avg/max/mdev = 0.063/0.094/0.147/0.037 ms
root@server-kea-node1:/home/ace# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.133:8080'
>>>> Server access token (optional):
>>>> IP address or FQDN of the host with Stork Agent (for the Stork Server connection) [server-kea-node1]: 172.17.129.130
>>>> Port number that Stork Agent will listen on [8080]:
INFO[2024-05-07 16:44:26] register.go:84 Forced agent certificates regeneration.
INFO[2024-05-07 16:44:26] register.go:406 =============================================================================
INFO[2024-05-07 16:44:26] register.go:407 AGENT TOKEN: E9EE6D836E249B0E9A8898E638DECFCAD35A6577A70672E8F639D4A46CEBC211
INFO[2024-05-07 16:44:26] register.go:408 =============================================================================
INFO[2024-05-07 16:44:26] register.go:413 Machine will be automatically registered using the server token
INFO[2024-05-07 16:44:26] register.go:414 Agent token is printed above for informational purposes only
INFO[2024-05-07 16:44:26] register.go:415 User does not need to copy or verify the agent token during registration via the server token
INFO[2024-05-07 16:44:26] register.go:416 It will be sent to the server but it is not directly used in this type of machine registration
INFO[2024-05-07 16:44:26] register.go:425 Try to register agent in Stork Server
INFO[2024-05-07 16:44:26] register.go:262 Machine registered
INFO[2024-05-07 16:44:26] register.go:283 Stored agent-signed cert and CA cert
ERRO[2024-05-07 16:44:26] register.go:454 Retrying ping 1/3 due to error error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:44:28] register.go:454 Retrying ping 2/3 due to error error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:44:32] register.go:459 Cannot ping machine error="problem pinging machine: Cannot ping machine"
FATA[2024-05-07 16:44:32] main.go:217 Registration failed
I did try to add the --server-token flag but the net result is the same
root@server-kea-node1:/home/ace# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.133:8080 --server-token OQYuMxkWmc3dySolt6uytLY4NrSkLWpo'
>>>> IP address or FQDN of the host with Stork Agent (for the Stork Server connection) [server-kea-node1]: 172.17.129.130
>>>> Port number that Stork Agent will listen on [8080]:
INFO[2024-05-07 16:46:52] register.go:84 Forced agent certificates regeneration.
INFO[2024-05-07 16:46:52] register.go:406 =============================================================================
INFO[2024-05-07 16:46:52] register.go:407 AGENT TOKEN: D43AA9AA37F03B1D24A0ADC9CB23E4137FCC284429A1CC87AE397CC78E3DE4FC
INFO[2024-05-07 16:46:52] register.go:408 =============================================================================
INFO[2024-05-07 16:46:52] register.go:413 Machine will be automatically registered using the server token
INFO[2024-05-07 16:46:52] register.go:414 Agent token is printed above for informational purposes only
INFO[2024-05-07 16:46:52] register.go:415 User does not need to copy or verify the agent token during registration via the server token
INFO[2024-05-07 16:46:52] register.go:416 It will be sent to the server but it is not directly used in this type of machine registration
INFO[2024-05-07 16:46:52] register.go:425 Try to register agent in Stork Server
INFO[2024-05-07 16:46:52] register.go:262 Machine registered
INFO[2024-05-07 16:46:52] register.go:283 Stored agent-signed cert and CA cert
ERRO[2024-05-07 16:46:52] register.go:454 Retrying ping 1/3 due to error error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:46:54] register.go:454 Retrying ping 2/3 due to error error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:46:58] register.go:459 Cannot ping machine error="problem pinging machine: Cannot ping machine"
FATA[2024-05-07 16:46:58] main.go:217 Registration failed
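A note on the "Cannot ping machine" failures above: the registration step itself succeeds ("Machine registered"), and the error comes from the follow-up check in which the server connects back to the agent address and port entered at the prompts. The ICMP ping from the node to the server does not exercise that path. One thing that may be worth ruling out on the agent host is whether anything is listening on the chosen port at all. A minimal sketch, assuming the `ss` utility from iproute2 is available; `listening_on_port` is a hypothetical helper, not part of Stork:

```shell
# listening_on_port PORT
# Reads `ss -ltn` output on stdin and returns 0 if some TCP socket is in the
# LISTEN state on PORT, 1 otherwise.
# Hypothetical helper for illustration only; not part of Stork.
listening_on_port() {
    awk -v port="$1" '
        # In `ss -ltn` output the 4th column is "Local Address:Port".
        $1 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the agent host (8080 is the port accepted at the prompt above):
#   ss -ltn | listening_on_port 8080 && echo "listening" || echo "nothing listening on 8080"
```

If nothing is listening, the server-side ping cannot succeed regardless of routing; if something is listening, the problem is more likely on the path from the server to the node or in the TLS handshake.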
Regards
Marek
On Tue, May 7, 2024 at 10:38 AM Slawek Figiel <slawek at isc.org> wrote:
Marek,
it is an interesting case. But don't worry, I'm sure we will find the cause
of the problem soon.
I see you performed the manual registration using the "register"
command. Could you use this command again, but this time provide the
`--server-token` flag? Your server token is on the machines page.
An additional check is performed when the `--server-token` flag is used.
After a successful registration, the server sends a Ping request
over the gRPC protocol to the agent. It verifies whether the provided
agent host is accessible from the server machine.
If the operation fails, you must check your network configuration and
the IP address provided as the agent host.
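
The check described above runs from the server towards the agent's host and port, so an ICMP ping in the opposite direction does not cover it. A rough way to test the TCP path by hand from the Stork server machine is sketched below. It assumes `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are installed; `check_agent_port` is a hypothetical helper name, not a Stork command. A successful TCP connection only shows that the port is reachable; it does not prove that the gRPC/TLS ping will pass.

```shell
# check_agent_port HOST PORT
# Returns 0 if a TCP connection to HOST:PORT opens within 3 seconds,
# 1 if the connection fails or times out, 2 if PORT is not a valid port.
# Hypothetical helper for illustration only; not part of Stork.
check_agent_port() {
    cap_host="$1"
    cap_port="$2"
    # Reject empty, non-numeric, or overly long port values.
    case "$cap_port" in
        ''|*[!0-9]*|??????*)
            echo "invalid port: $cap_port" >&2
            return 2
            ;;
    esac
    if [ "$cap_port" -lt 1 ] || [ "$cap_port" -gt 65535 ]; then
        echo "port out of range: $cap_port" >&2
        return 2
    fi
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout bounds the wait if packets are silently dropped.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$cap_host" "$cap_port" 2>/dev/null; then
        return 0
    fi
    return 1
}

# Usage on the Stork server machine, with the agent address from this thread:
#   check_agent_port 172.17.129.130 8080 && echo "port reachable" || echo "port NOT reachable"
```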
I'm waiting for your feedback.
Regards,
Slawek
On 07/05/2024 18:25, Marek Hajduczenia wrote:
> Inline, please, with [mh0507] tags
>
> -----Original Message-----
> From: Slawek Figiel <slawek at isc.org>
> Sent: Tuesday, May 7, 2024 10:21 AM
> To: Marek Hajduczenia <mxhajduczenia at gmail.com>
> Cc: stork-users at lists.isc.org
> Subject: Re: [stork-users] Removing a node and re-adding it back causes a certificate error
>
> Marek,
>
> > That has not solved my problem. I went through the following process
> >
> > 1. Remove the previous registration for .130 machine at Stork GUI
> > (Action > Remove)
> > 2. Remove all content from /var/lib/stork-agent/certs and
> > /var/lib/stork-agent/tokens
> > 3. Re-run registration
>
> Did you re-authorize the machine? (Machines => Unauthorized => Click the Authorize button). I suppose yes but I would like to double-check.
>
> [mh0507] Correct, I did re-authorize the machine; that is part of the standard workflow covered in the Stork documentation.
>
> > I am back where I was
>
> Hmm... Could you verify if the Stork server and Stork agent versions are the same? You can check them by `stork-server --version` and `stork-agent --version` commands.
>
> [mh0507] As requested, they are both on 1.16.0 as shown below.
>
> root@server-kea-control:/etc/stork# stork-server --version
> 1.16.0
>
> root@server-kea-node1:/var/lib/stork-agent# stork-agent --version
> 1.16.0
>
> Slawek
>
> On 07/05/2024 16:23, Marek Hajduczenia wrote:
>> The certs have been regenerated on the node, for what it is worth
>>
>> root@server-kea-node1:/var/lib/stork-agent# ls -lah certs/
>> total 20K
>> drwx------ 2 stork-agent root 4.0K May 7 11:47 .
>> drwxr-xr-x 4 stork-agent root 4.0K May 6 19:08 ..
>> -rw------- 1 stork-agent stork-agent 664 May 7 11:47 ca.pem
>> -rw------- 1 stork-agent stork-agent 656 May 7 11:47 cert.pem
>> -rw------- 1 stork-agent stork-agent 241 May 7 11:47 key.pem
>>
>> but it seems that the Stork Server side is holding onto old certs? Not
>> sure where they would be stored - likely in the backend DB, but I do
>> not want to delete things at random.
>>
>> Regards
>>
>> Marek
>>
>> On Tue, May 7, 2024 at 5:56 AM Marek Hajduczenia
>> <mxhajduczenia at gmail.com> wrote:
>>
>> Hi Slawek,
>>
>> That has not solved my problem. I went through the following process
>>
>> 1. Remove the previous registration for .130 machine at Stork GUI
>> (Action > Remove)
>> 2. Remove all content from /var/lib/stork-agent/certs and
>> /var/lib/stork-agent/tokens
>> 3. Re-run registration
>>
>> root@server-kea-node1:/var/lib/stork-agent/tokens# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.251:8080'
>> >>>> Server access token (optional):
>> >>>> IP address or FQDN of the host with Stork Agent (for the Stork Server connection) [server-kea-node1]: 172.17.129.130
>> >>>> Port number that Stork Agent will listen on [8080]:
>> INFO[2024-05-07 11:47:14] register.go:81 There are no agent certificates - they will be generated.
>> INFO[2024-05-07 11:47:14] register.go:406 =============================================================================
>> INFO[2024-05-07 11:47:14] register.go:407 AGENT TOKEN: B777710F0547C3EA237002537E4B18202F888F4D0F6C2C00BA105167DE1688CE
>> INFO[2024-05-07 11:47:14] register.go:408 =============================================================================
>> INFO[2024-05-07 11:47:14] register.go:411 Authorize the machine in the Stork web UI
>> INFO[2024-05-07 11:47:14] register.go:425 Try to register agent in Stork Server
>> INFO[2024-05-07 11:47:14] register.go:262 Machine registered
>> INFO[2024-05-07 11:47:14] register.go:283 Stored agent-signed cert and CA cert
>> INFO[2024-05-07 11:47:14] main.go:215 Registration completed successfully
>>
>> 4. I am back where I was
>>
>> image.png
>>
>> I did restart the local Stork agent but that did not change anything
>>
>> root@server-kea-node1:/var/lib/stork-agent/tokens# service isc-kea-ctrl-agent restart
>> root@server-kea-node1:/var/lib/stork-agent/tokens# service isc-kea-ctrl-agent status
>> ● isc-kea-ctrl-agent.service - Kea Control Agent
>> Loaded: loaded (/lib/systemd/system/isc-kea-ctrl-agent.service; enabled; vendor preset: enabled)
>> Active: active (running) since Tue 2024-05-07 11:50:16 UTC; 3s ago
>> Docs: man:kea-ctrl-agent(8)
>> Main PID: 10543 (kea-ctrl-agent)
>> Tasks: 5 (limit: 9343)
>> Memory: 1.4M
>> CPU: 7ms
>> CGroup: /system.slice/isc-kea-ctrl-agent.service
>> └─10543 /usr/sbin/kea-ctrl-agent -c /etc/kea/kea-ctrl-agent.conf
>>
>> May 07 11:50:16 server-kea-node1 systemd[1]: isc-kea-ctrl-agent.service: Deactivated successfully.
>> May 07 11:50:16 server-kea-node1 systemd[1]: Stopped Kea Control Agent.
>> May 07 11:50:16 server-kea-node1 systemd[1]: isc-kea-ctrl-agent.service: Consumed 48.595s CPU time.
>> May 07 11:50:16 server-kea-node1 systemd[1]: Started Kea Control Agent.
>>
>> For what it is worth, the message in the logs has changed
>>
>> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="info" msg="Completed pulling lease stats from Kea apps: 0/1 succeeded" file=" statspuller.go:71 "
>> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="warning" msg="rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" manager.go:124 " agent="172.17.129.130:8080"
>> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="warning" msg="Failed to get state from the Stork agent; the agent is still not responding" file=" grpcli.go:326 " agent="172.17.129.130:8080"
>> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="warning" msg="failed to get state from agent 172.17.129.130:8080: grpc manager is unable to re-establish connection with the agent 172.17.129.130:8080: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" statepuller.go:247 "
>>
>> Not sure whether it is for the better or worse
>>
>> Regards
>>
>> Marek
>>
>> On Tue, May 7, 2024 at 4:06 AM Slawek Figiel <slawek at isc.org> wrote:
>>
>> Hello Marek!
>>
>> Stork server reports that the agent introduced itself with a "bad
>> certificate." Several reasons may cause it. I think you should remove
>> the existing cert files and re-register the agent. Please do the
>> following steps:
>>
>> 1. On the agent machine, remove the files in the `/var/lib/stork-agent`
>> directory (you need to remove all files from the `certs` and `tokens`
>> subdirectories)
>> 2. If you manually registered the agent (with the `register` command), you
>> need to call it again and restart the agent. If you used the
>> self-registration flow, just restart the agent.
>> 3. Open the Stork UI, go to the machines list, switch to the
>> "Unauthorized" tab, and re-authorize the agent.
>>
>> I hope it'll solve your problem.
>> Don't hesitate to ask for more details if you have any questions.
>>
>> Regards,
>> Slawek Figiel
>>
>> On 07/05/2024 00:05, mxhajduczenia at gmail.com wrote:
>> > Dear Forum,
>> >
>> > I had two nodes added to Stork: .130 and .131 and they were working
>> > correctly. Node .130 had a kernel failure due to changes I was trying to
>> > apply and I did not make a copy, unfortunately. Long story short, I had
>> > to re-install node .130, and then I wanted to add it back to Stork.
>> >
>> > No matter what I do, I am getting the error shown above, i.e., Cannot
>> > get state of machine.
>> >
>> > Syslog review shows only one error message following two warning messages.
>> >
>> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06 21:58:38" level="warning" msg="rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" manager.go:124 " agent="172.17.129.130:8080"
>> >
>> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06 21:58:38" level="warning" msg="Failed to get state from the Stork agent; the agent is still not responding" file=" grpcli.go:326 " agent="172.17.129.130:8080"
>> >
>> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06 21:58:38" level="warning" msg="failed to get state from agent 172.17.129.130:8080: grpc manager is unable to re-establish connection with the agent 172.17.129.130:8080: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" statepuller.go:247 "
>> >
>> > I suspect that the TLS certificate does not get cleared when the machine
>> > is removed and a machine with the same IP address is re-added.
>> >
>> > I did not find a remedy for it for now and I do not fancy a complete
>> > re-install of Stork if I can avoid it.
>> >
>> > Any suggestions on how to fix it?
>> >
>> > Regards
>> >
>> > Marek
>> >
>> >
>> --
>> Stork-users mailing list
>> Stork-users at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/stork-users
>>
>
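
The cleanup steps quoted in the thread above (empty the `certs` and `tokens` subdirectories, re-register, restart the agent, re-authorize in the UI) can be sketched as a small shell helper. This is only an illustration under stated assumptions: `clear_agent_state` is a hypothetical function name, the directory layout is the one shown in the thread (`/var/lib/stork-agent/certs` and `/var/lib/stork-agent/tokens`), and the systemd unit name in the usage comment is an assumption that may differ per installation.

```shell
# clear_agent_state BASE_DIR
# Deletes everything inside BASE_DIR/certs and BASE_DIR/tokens while keeping
# the two directories themselves, so their ownership and mode are preserved.
# Hypothetical helper for illustration only; not part of Stork.
clear_agent_state() {
    cas_base="$1"
    if [ -z "$cas_base" ]; then
        echo "usage: clear_agent_state BASE_DIR" >&2
        return 2
    fi
    for cas_sub in certs tokens; do
        if [ -d "$cas_base/$cas_sub" ]; then
            # -mindepth 1 skips the directory itself; -delete removes its contents.
            find "$cas_base/$cas_sub" -mindepth 1 -delete
        fi
    done
}

# Usage on the agent machine, as root (SERVER_URL is a placeholder):
#   clear_agent_state /var/lib/stork-agent
#   sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url SERVER_URL'
#   systemctl restart isc-stork-agent   # unit name is an assumption
# Then re-authorize the machine on the "Unauthorized" tab in the Stork UI.
```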
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 38733 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/stork-users/attachments/20240508/3711cf96/attachment-0001.png>