[stork-users] Removing a node and re-adding it back causes a certificate error

Marek Hajduczenia mxhajduczenia at gmail.com
Tue May 7 16:47:18 UTC 2024


I followed the recommendation. Even though I am 100% sure I have IP
reachability, the registration process with the server token still fails.

root@server-kea-node1:/home/ace# ping 172.17.129.133
PING 172.17.129.133 (172.17.129.133) 56(84) bytes of data.
64 bytes from 172.17.129.133: icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from 172.17.129.133: icmp_seq=2 ttl=64 time=0.063 ms
64 bytes from 172.17.129.133: icmp_seq=3 ttl=64 time=0.147 ms
^C
--- 172.17.129.133 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2054ms
rtt min/avg/max/mdev = 0.063/0.094/0.147/0.037 ms
root@server-kea-node1:/home/ace# sudo su stork-agent -s /bin/sh -c
'stork-agent register --server-url http://172.17.129.133:8080'
>>>> Server access token (optional):
>>>> IP address or FQDN of the host with Stork Agent (for the Stork Server
connection) [server-kea-node1]: 172.17.129.130
>>>> Port number that Stork Agent will listen on [8080]:
INFO[2024-05-07 16:44:26]         register.go:84    Forced agent
certificates regeneration.
INFO[2024-05-07 16:44:26]         register.go:406
=============================================================================

INFO[2024-05-07 16:44:26]         register.go:407   AGENT TOKEN:
E9EE6D836E249B0E9A8898E638DECFCAD35A6577A70672E8F639D4A46CEBC211
INFO[2024-05-07 16:44:26]         register.go:408
=============================================================================

INFO[2024-05-07 16:44:26]         register.go:413   Machine will be
automatically registered using the server token
INFO[2024-05-07 16:44:26]         register.go:414   Agent token is printed
above for informational purposes only
INFO[2024-05-07 16:44:26]         register.go:415   User does not need to
copy or verify the agent token during registration via the server token
INFO[2024-05-07 16:44:26]         register.go:416   It will be sent to the
server but it is not directly used in this type of machine registration
INFO[2024-05-07 16:44:26]         register.go:425   Try to register agent
in Stork Server
INFO[2024-05-07 16:44:26]         register.go:262   Machine registered

INFO[2024-05-07 16:44:26]         register.go:283   Stored agent-signed
cert and CA cert
ERRO[2024-05-07 16:44:26]         register.go:454   Retrying ping 1/3 due
to error                error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:44:28]         register.go:454   Retrying ping 2/3 due
to error                error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:44:32]         register.go:459   Cannot ping machine
                      error="problem pinging machine: Cannot ping machine"
FATA[2024-05-07 16:44:32]             main.go:217   Registration failed


I also tried adding the --server-token flag, but the net result is the same:

root@server-kea-node1:/home/ace# sudo su stork-agent -s /bin/sh -c
'stork-agent register --server-url http://172.17.129.133:8080
--server-token OQYuMxkWmc3dySolt6uytLY4NrSkLWpo'
>>>> IP address or FQDN of the host with Stork Agent (for the Stork Server
connection) [server-kea-node1]: 172.17.129.130
>>>> Port number that Stork Agent will listen on [8080]:
INFO[2024-05-07 16:46:52]         register.go:84    Forced agent
certificates regeneration.
INFO[2024-05-07 16:46:52]         register.go:406
=============================================================================

INFO[2024-05-07 16:46:52]         register.go:407   AGENT TOKEN:
D43AA9AA37F03B1D24A0ADC9CB23E4137FCC284429A1CC87AE397CC78E3DE4FC
INFO[2024-05-07 16:46:52]         register.go:408
=============================================================================

INFO[2024-05-07 16:46:52]         register.go:413   Machine will be
automatically registered using the server token
INFO[2024-05-07 16:46:52]         register.go:414   Agent token is printed
above for informational purposes only
INFO[2024-05-07 16:46:52]         register.go:415   User does not need to
copy or verify the agent token during registration via the server token
INFO[2024-05-07 16:46:52]         register.go:416   It will be sent to the
server but it is not directly used in this type of machine registration
INFO[2024-05-07 16:46:52]         register.go:425   Try to register agent
in Stork Server
INFO[2024-05-07 16:46:52]         register.go:262   Machine registered

INFO[2024-05-07 16:46:52]         register.go:283   Stored agent-signed
cert and CA cert
ERRO[2024-05-07 16:46:52]         register.go:454   Retrying ping 1/3 due
to error                error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:46:54]         register.go:454   Retrying ping 2/3 due
to error                error="problem pinging machine: Cannot ping machine"
ERRO[2024-05-07 16:46:58]         register.go:459   Cannot ping machine
                      error="problem pinging machine: Cannot ping machine"
FATA[2024-05-07 16:46:58]             main.go:217   Registration failed
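
The ping at the top of this message goes from the agent node to the
server and uses ICMP. The check that fails here is, per the explanation
quoted below, a gRPC Ping sent from the server to the agent address and
port entered at the prompts (172.17.129.130 and 8080 in these runs). A
minimal sketch for testing that direction over TCP is shown here. It is
not part of Stork, and the function name tcp_port_open is hypothetical.
It only shows whether something accepts TCP connections on that port; it
does not exercise TLS or gRPC.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, intended to be run on the Stork Server host:
#   tcp_port_open("172.17.129.130", 8080)
```

If this returns False when run on the server host, likely candidates are
a firewall between the two machines or nothing listening on the agent
port at the time of the check.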

Regards

Marek

On Tue, May 7, 2024 at 10:38 AM Slawek Figiel <slawek at isc.org> wrote:

> Marek,
>
> this is an interesting case. But don't worry, I'm sure we will find the
> cause of the problem soon.
>
> I see you performed the manual registration using the "register"
> command. Could you use this command again, but this time provide the
> `--server-token` flag? Your server token is on the machines page.
>
> An additional check is performed when the `--server-token` flag is used.
> After a successful registration, the server sends a Ping request
> over the gRPC protocol to the agent. It verifies whether the provided
> agent host is accessible from the server machine.
>
> If the operation fails, you must check your network configuration and
> the IP address provided as the agent host.
>
> I'm waiting for your feedback.
>
> Regards,
> Slawek
>
> On 07/05/2024 18:25, Marek Hajduczenia wrote:
> > Inline, please, with [mh0507] tags
> >
> > -----Original Message-----
> > From: Slawek Figiel <slawek at isc.org>
> > Sent: Tuesday, May 7, 2024 10:21 AM
> > To: Marek Hajduczenia <mxhajduczenia at gmail.com>
> > Cc: stork-users at lists.isc.org
> > Subject: Re: [stork-users] Removing a node and re-adding it back causes
> > a certificate error
> >
> > Marek,
> >
> >   >     That has not solved my problem. I went through the following
> >   >     process
> >   >
> >   >     1. Remove the previous registration for .130 machine at Stork GUI
> >   >     (Action > Remove)
> >   >     2. Remove all content from /var/lib/stork-agent/certs and
> >   >     /var/lib/stork-agent/tokens
> >   >     3. Re-run registration
> >
> > Did you re-authorize the machine? (Machines => Unauthorized => Click the
> > Authorize button). I suppose yes but I would like to double-check.
> >
> > [mh0507] Correct, I did re-authorize the machine; that is part of the
> > standard workflow covered in the documentation for Stork.
> >
> >   > I am back where I was
> >
> > Hmm... Could you verify if the Stork server and Stork agent versions are
> > the same? You can check them with the `stork-server --version` and
> > `stork-agent --version` commands.
> >
> > [mh0507] As requested, they are both on 1.16.0 as shown below.
> >
> > root@server-kea-control:/etc/stork# stork-server --version
> > 1.16.0
> >
> > root@server-kea-node1:/var/lib/stork-agent# stork-agent --version
> > 1.16.0
> >
> > Slawek
> >
> > On 07/05/2024 16:23, Marek Hajduczenia wrote:
> >> The certs have been regenerated on the node, for what it is worth
> >>
> >> root@server-kea-node1:/var/lib/stork-agent# ls -lah certs/
> >> total 20K
> >> drwx------ 2 stork-agent root        4.0K May  7 11:47 .
> >> drwxr-xr-x 4 stork-agent root        4.0K May  6 19:08 ..
> >> -rw------- 1 stork-agent stork-agent  664 May  7 11:47 ca.pem
> >> -rw------- 1 stork-agent stork-agent  656 May  7 11:47 cert.pem
> >> -rw------- 1 stork-agent stork-agent  241 May  7 11:47 key.pem
> >>
> >> but it seems that the Stork Server side is holding onto old certs? Not
> >> sure where they would be stored - likely in the backend DB, but I do
> >> not want to delete things at random.
> >>
> >> Regards
> >>
> >> Marek
> >>
> >> On Tue, May 7, 2024 at 5:56 AM Marek Hajduczenia
> >> <mxhajduczenia at gmail.com> wrote:
> >>
> >>      Hi Slawek,
> >>
> >>      That has not solved my problem. I went through the following
> >>      process
> >>
> >>      1. Remove the previous registration for .130 machine at Stork GUI
> >>      (Action > Remove)
> >>      2. Remove all content from /var/lib/stork-agent/certs and
> >>      /var/lib/stork-agent/tokens
> >>      3. Re-run registration
> >>
> >>      root@server-kea-node1:/var/lib/stork-agent/tokens# sudo su
> >>      stork-agent -s /bin/sh -c 'stork-agent register --server-url
> >>      http://172.17.129.251:8080'
> >>       >>>> Server access token (optional):
> >>       >>>> IP address or FQDN of the host with Stork Agent (for the
> >>      Stork Server connection) [server-kea-node1]: 172.17.129.130
> >>       >>>> Port number that Stork Agent will listen on [8080]:
> >>      INFO[2024-05-07 11:47:14]         register.go:81    There are no
> >>      agent certificates - they will be generated.
> >>      INFO[2024-05-07 11:47:14]         register.go:406
> >>      =============================================================================
> >>      INFO[2024-05-07 11:47:14]         register.go:407   AGENT TOKEN:
> >>      B777710F0547C3EA237002537E4B18202F888F4D0F6C2C00BA105167DE1688CE
> >>      INFO[2024-05-07 11:47:14]         register.go:408
> >>      =============================================================================
> >>      INFO[2024-05-07 11:47:14]         register.go:411   Authorize the
> >>      machine in the Stork web UI
> >>      INFO[2024-05-07 11:47:14]         register.go:425   Try to register
> >>      agent in Stork Server
> >>      INFO[2024-05-07 11:47:14]         register.go:262   Machine
> >>      registered
> >>      INFO[2024-05-07 11:47:14]         register.go:283   Stored
> >>      agent-signed cert and CA cert
> >>      INFO[2024-05-07 11:47:14]             main.go:215   Registration
> >>      completed successfully
> >>
> >>      4. I am back where I was
> >>
> >>      image.png
> >>
> >>      I did restart the local Stork agent but that did not change
> >>      anything
> >>
> >>      root@server-kea-node1:/var/lib/stork-agent/tokens# service
> >>      isc-kea-ctrl-agent restart
> >>      root@server-kea-node1:/var/lib/stork-agent/tokens# service
> >>      isc-kea-ctrl-agent status
> >>      ● isc-kea-ctrl-agent.service - Kea Control Agent
> >>            Loaded: loaded
> >>      (/lib/systemd/system/isc-kea-ctrl-agent.service; enabled; vendor
> >>      preset: enabled)
> >>            Active: active (running) since Tue 2024-05-07 11:50:16 UTC;
> >>      3s ago
> >>              Docs: man:kea-ctrl-agent(8)
> >>          Main PID: 10543 (kea-ctrl-agent)
> >>             Tasks: 5 (limit: 9343)
> >>            Memory: 1.4M
> >>               CPU: 7ms
> >>            CGroup: /system.slice/isc-kea-ctrl-agent.service
> >>                    └─10543 /usr/sbin/kea-ctrl-agent -c
> >>      /etc/kea/kea-ctrl-agent.conf
> >>
> >>      May 07 11:50:16 server-kea-node1 systemd[1]:
> >>      isc-kea-ctrl-agent.service: Deactivated successfully.
> >>      May 07 11:50:16 server-kea-node1 systemd[1]: Stopped Kea Control
> >>      Agent.
> >>      May 07 11:50:16 server-kea-node1 systemd[1]:
> >>      isc-kea-ctrl-agent.service: Consumed 48.595s CPU time.
> >>      May 07 11:50:16 server-kea-node1 systemd[1]: Started Kea Control
> >>      Agent.
> >>
> >>      For what it is worth, the message in the logs has changed
> >>
> >>      May  7 11:54:39 server-kea-control stork-server[719]:
> >>      time="2024-05-07 11:54:39" level="info" msg="Completed pulling
> >>      lease stats from Kea apps: 0/1 succeeded"
> >>      file="      statspuller.go:71  "
> >>      May  7 11:54:39 server-kea-control stork-server[719]:
> >>      time="2024-05-07 11:54:39" level="warning" msg="rpc error: code =
> >>      Unavailable desc = connection error: desc = \"error reading server
> >>      preface: remote error: tls: bad certificate\"" file="
> >>        manager.go:124  " agent="172.17.129.130:8080"
> >>      May  7 11:54:39 server-kea-control stork-server[719]:
> >>      time="2024-05-07 11:54:39" level="warning" msg="Failed to get state
> >>      from the Stork agent; the agent is still not responding" file="
> >>             grpcli.go:326  " agent="172.17.129.130:8080"
> >>      May  7 11:54:39 server-kea-control stork-server[719]:
> >>      time="2024-05-07 11:54:39" level="warning" msg="failed to get state
> >>      from agent 172.17.129.130:8080: grpc manager is unable to
> >>      re-establish connection with the agent 172.17.129.130:8080: rpc
> >>      error: code = Unavailable desc = connection error: desc = \"error
> >>      reading server preface: remote error: tls: bad certificate\""
> >>      file="
> >>        statepuller.go:247  "
> >>
> >>      Not sure whether it is for the better or worse
> >>
> >>      Regards
> >>
> >>      Marek
> >>
> >>      On Tue, May 7, 2024 at 4:06 AM Slawek Figiel <slawek at isc.org>
> >>      wrote:
> >>
> >>          Hello Marek!
> >>
> >>          Stork server reports that the agent introduced itself with a
> >>          "bad certificate." Several reasons may cause it. I think you
> >>          should remove the existing cert files and re-register the
> >>          agent. Please do the following steps:
> >>
> >>          1. On the agent machine, remove the files in the
> >>          `/var/lib/stork-agent` directory (you need to remove all files
> >>          from the `certs` and `tokens` subdirectories).
> >>          2. If you manually registered the agent (by the `register`
> >>          command), you need to call it again and restart the agent. If
> >>          you used the self-registration flow, just restart the agent.
> >>          3. Open the Stork UI, go to the machines list, switch to the
> >>          "Unauthorized" tab, and re-authorize the agent.
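
Step 1 of the list above could be sketched in Python roughly as follows.
This is an illustration only and not a Stork tool: the function name
clear_agent_credentials and its dry_run parameter are hypothetical, while
the base directory and the certs and tokens subdirectory names are the
ones given in the step. It defaults to a dry run that only lists what
would be deleted.

```python
from pathlib import Path


def clear_agent_credentials(base="/var/lib/stork-agent", dry_run=True):
    """List (and, unless dry_run, delete) regular files in certs/ and tokens/.

    Returns the list of affected paths. Subdirectories are left alone.
    """
    affected = []
    for sub in ("certs", "tokens"):
        directory = Path(base) / sub
        if not directory.is_dir():
            # Nothing to clean if the subdirectory does not exist.
            continue
        for path in sorted(directory.iterdir()):
            if path.is_file():
                if not dry_run:
                    path.unlink()
                affected.append(path)
    return affected
```

Calling clear_agent_credentials() prints nothing and deletes nothing; it
returns the candidate files so they can be reviewed first. Passing
dry_run=False performs the deletion and needs write access to those
directories. Steps 2 and 3 (re-registering, restarting the agent, and
re-authorizing in the UI) still have to be done separately.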
> >>
> >>          I hope it'll solve your problem.
> >>          Don't hesitate to ask for more details if you have any
> >>          questions.
> >>
> >>          Regards,
> >>          Slawek Figiel
> >>
> >>          On 07/05/2024 00:05, mxhajduczenia at gmail.com wrote:
> >>           > Dear Forum,
> >>           >
> >>           > I had two nodes added to Stork: .130 and .131 and they
> >>           > were working correctly. Node .130 had a kernel failure due
> >>           > to changes I was trying to apply and I did not make a copy,
> >>           > unfortunately. Long story short, I had to re-install node
> >>           > .130, and then I wanted to add it back to Stork
> >>           >
> >>           > No matter what I do, I am getting the error shown above,
> >>           > i.e., Cannot get state of machine.
> >>           >
> >>           > Syslog review shows only one error message following two
> >>           > warning messages.
> >>           >
> >>           > May  6 21:58:38 server-kea-control stork-server[719]:
> >>           > time="2024-05-06 21:58:38" level="warning" msg="rpc error:
> >>           > code = Unavailable desc = connection error: desc = \"error
> >>           > reading server preface: remote error: tls: bad
> >>           > certificate\"" file="          manager.go:124  "
> >>           > agent="172.17.129.130:8080"
> >>           >
> >>           > May  6 21:58:38 server-kea-control stork-server[719]:
> >>           > time="2024-05-06 21:58:38" level="warning" msg="Failed to
> >>           > get state from the Stork agent; the agent is still not
> >>           > responding" file="
> >>           > grpcli.go:326  "
> >>           > agent="172.17.129.130:8080"
> >>           >
> >>           > May  6 21:58:38 server-kea-control stork-server[719]:
> >>           > time="2024-05-06 21:58:38" level="warning" msg="failed to
> >>           > get state from agent 172.17.129.130:8080: grpc manager is
> >>           > unable to re-establish connection with the agent
> >>           > 172.17.129.130:8080: rpc error: code = Unavailable desc =
> >>           > connection error: desc = \"error reading server preface:
> >>           > remote error: tls: bad certificate\""
> >>           > file="      statepuller.go:247  "
> >>           >
> >>           > I suspect that the TLS certificate does not get cleared when
> >>           > the machine is removed and a machine with the same IP address
> >>           > is re-added.
> >>           >
> >>           > I did not find a remedy for it for now and I do not fancy a
> >>           > complete re-install of Stork if I can avoid it.
> >>           >
> >>           > Any suggestions on how to fix it?
> >>           >
> >>           > Regards
> >>           >
> >>           > Marek
> >>           >
> >>           >
> >>
> >
>