[stork-users] Removing a node and re-adding it back causes a certificate error
Slawek Figiel
slawek at isc.org
Wed May 8 13:43:40 UTC 2024
Marek,
your logs show that the connection from the Stork agent (172.17.129.130)
to the Stork server (172.17.129.133) is established properly. The problem
is that the Stork server (172.17.129.133) cannot reach the Stork agent
(172.17.129.130).
Please try the following tests:
1. From the 172.17.129.133 host, ping the 172.17.129.130 host. Does it work?
2. From the 172.17.129.133 host, open or fetch
http://172.17.129.130:9547/metrics. Does it return an HTTP 200 OK status
and some metrics? If you specified the "--listen-stork-only" flag (or
the "STORK_AGENT_LISTEN_STORK_ONLY" environment variable), remove it
temporarily.
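
If it is easier to script test 2 than to use a browser, here is a minimal Python sketch of the same check. It is not part of Stork; the function name `probe_metrics` and the 5-second timeout are my own assumptions. It only reports the HTTP status and the size of the response body.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe_metrics(url: str, timeout: float = 5.0) -> Tuple[Optional[int], int]:
    """Fetch `url` and return (HTTP status, body length in bytes).

    The status is None when nothing answered at all (connection refused,
    timeout, no route to host), which points at a network or firewall issue.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as exc:
        # Something answered, but with an error status (4xx/5xx).
        return exc.code, 0
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # No usable HTTP answer from that host and port.
        return None, 0
```

Run it on the server host (172.17.129.133), for example `print(probe_metrics("http://172.17.129.130:9547/metrics"))`. A result of `(200, <non-zero size>)` means the agent's HTTP port is reachable from the server; `(None, 0)` means the request never got an HTTP answer.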
Please also verify the ports opened by your containers/VMs (these are the
default values; adjust them if you configured custom ones):
- Stork server: 8080 (HTTP)
- Stork agent: 8080 (gRPC), 9547 (HTTP)
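
A quick way to confirm that these ports accept connections is a plain TCP connect test run from the server host. The sketch below is only an illustration under my own assumptions: the names `port_is_open` and `check_agent_ports` are made up, and the port numbers are the defaults listed above. A successful connect shows that the port is reachable; it does not prove that the gRPC/TLS handshake will succeed.

```python
import socket
from typing import Dict

# Default agent ports from the list above; change them if you use custom ones.
DEFAULT_AGENT_PORTS = {"gRPC": 8080, "HTTP metrics": 9547}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


def check_agent_ports(host: str, ports: Dict[str, int] = DEFAULT_AGENT_PORTS,
                      timeout: float = 3.0) -> Dict[str, bool]:
    """Probe every named port on `host` and map each name to the result."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

For example, `check_agent_ports("172.17.129.130")` run on 172.17.129.133 should return `True` for both entries if nothing blocks the traffic between the two hosts.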
Regards,
Slawek
On 07/05/2024 18:47, Marek Hajduczenia wrote:
> I did go with the recommendation and even though I am 100% sure I have
> IP reachability, the registration process with server token fails.
>
> root@server-kea-node1:/home/ace# ping 172.17.129.133
> PING 172.17.129.133 (172.17.129.133) 56(84) bytes of data.
> 64 bytes from 172.17.129.133: icmp_seq=1 ttl=64 time=0.074 ms
> 64 bytes from 172.17.129.133: icmp_seq=2 ttl=64 time=0.063 ms
> 64 bytes from 172.17.129.133: icmp_seq=3 ttl=64 time=0.147 ms
> ^C
> --- 172.17.129.133 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2054ms
> rtt min/avg/max/mdev = 0.063/0.094/0.147/0.037 ms
> root@server-kea-node1:/home/ace# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.133:8080'
> >>>> Server access token (optional):
> >>>> IP address or FQDN of the host with Stork Agent (for the Stork Server connection) [server-kea-node1]: 172.17.129.130
> >>>> Port number that Stork Agent will listen on [8080]:
> INFO[2024-05-07 16:44:26] register.go:84 Forced agent certificates regeneration.
> INFO[2024-05-07 16:44:26] register.go:406 =============================================================================
> INFO[2024-05-07 16:44:26] register.go:407 AGENT TOKEN: E9EE6D836E249B0E9A8898E638DECFCAD35A6577A70672E8F639D4A46CEBC211
> INFO[2024-05-07 16:44:26] register.go:408 =============================================================================
> INFO[2024-05-07 16:44:26] register.go:413 Machine will be automatically registered using the server token
> INFO[2024-05-07 16:44:26] register.go:414 Agent token is printed above for informational purposes only
> INFO[2024-05-07 16:44:26] register.go:415 User does not need to copy or verify the agent token during registration via the server token
> INFO[2024-05-07 16:44:26] register.go:416 It will be sent to the server but it is not directly used in this type of machine registration
> INFO[2024-05-07 16:44:26] register.go:425 Try to register agent in Stork Server
> INFO[2024-05-07 16:44:26] register.go:262 Machine registered
> INFO[2024-05-07 16:44:26] register.go:283 Stored agent-signed cert and CA cert
> ERRO[2024-05-07 16:44:26] register.go:454 Retrying ping 1/3 due to error error="problem pinging machine: Cannot ping machine"
> ERRO[2024-05-07 16:44:28] register.go:454 Retrying ping 2/3 due to error error="problem pinging machine: Cannot ping machine"
> ERRO[2024-05-07 16:44:32] register.go:459 Cannot ping machine error="problem pinging machine: Cannot ping machine"
> FATA[2024-05-07 16:44:32] main.go:217 Registration failed
>
> I did try to add the --server-token flag but the net result is the same
>
> root@server-kea-node1:/home/ace# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.133:8080 --server-token OQYuMxkWmc3dySolt6uytLY4NrSkLWpo'
> >>>> IP address or FQDN of the host with Stork Agent (for the Stork Server connection) [server-kea-node1]: 172.17.129.130
> >>>> Port number that Stork Agent will listen on [8080]:
> INFO[2024-05-07 16:46:52] register.go:84 Forced agent certificates regeneration.
> INFO[2024-05-07 16:46:52] register.go:406 =============================================================================
> INFO[2024-05-07 16:46:52] register.go:407 AGENT TOKEN: D43AA9AA37F03B1D24A0ADC9CB23E4137FCC284429A1CC87AE397CC78E3DE4FC
> INFO[2024-05-07 16:46:52] register.go:408 =============================================================================
> INFO[2024-05-07 16:46:52] register.go:413 Machine will be automatically registered using the server token
> INFO[2024-05-07 16:46:52] register.go:414 Agent token is printed above for informational purposes only
> INFO[2024-05-07 16:46:52] register.go:415 User does not need to copy or verify the agent token during registration via the server token
> INFO[2024-05-07 16:46:52] register.go:416 It will be sent to the server but it is not directly used in this type of machine registration
> INFO[2024-05-07 16:46:52] register.go:425 Try to register agent in Stork Server
> INFO[2024-05-07 16:46:52] register.go:262 Machine registered
> INFO[2024-05-07 16:46:52] register.go:283 Stored agent-signed cert and CA cert
> ERRO[2024-05-07 16:46:52] register.go:454 Retrying ping 1/3 due to error error="problem pinging machine: Cannot ping machine"
> ERRO[2024-05-07 16:46:54] register.go:454 Retrying ping 2/3 due to error error="problem pinging machine: Cannot ping machine"
> ERRO[2024-05-07 16:46:58] register.go:459 Cannot ping machine error="problem pinging machine: Cannot ping machine"
> FATA[2024-05-07 16:46:58] main.go:217 Registration failed
>
> Regards
>
> Marek
>
> On Tue, May 7, 2024 at 10:38 AM Slawek Figiel <slawek at isc.org> wrote:
>
> Marek,
>
> it is an interesting case. But don't worry, I'm sure we will find the
> cause of the problem soon.
>
> I see you performed the manual registration using the "register"
> command. Could you use this command again, but this time provide the
> `--server-token` flag? Your server token is on the machines page.
>
> An additional check is performed when the `--server-token` flag is used.
> After a successful registration, the server sends a Ping request over
> the gRPC protocol to the agent. It verifies whether the provided agent
> host is accessible from the server machine.
>
> If the operation fails, you must check your network configuration and
> the IP address provided as the agent host.
>
> I'm waiting for your feedback.
>
> Regards,
> Slawek
>
> On 07/05/2024 18:25, Marek Hajduczenia wrote:
> > Inline, please, with [mh0507] tags
> >
> > -----Original Message-----
> > From: Slawek Figiel <slawek at isc.org>
> > Sent: Tuesday, May 7, 2024 10:21 AM
> > To: Marek Hajduczenia <mxhajduczenia at gmail.com>
> > Cc: stork-users at lists.isc.org
> > Subject: Re: [stork-users] Removing a node and re-adding it back causes a certificate error
> >
> > Marek,
> >
> > > That has not solved my problem. I went through the following process
> > >
> > > 1. Remove the previous registration for .130 machine at Stork GUI
> > > (Action > Remove)
> > > 2. Remove all content from /var/lib/stork-agent/certs and
> > > /var/lib/stork-agent/tokens
> > > 3. Re-run registration
> >
> > Did you re-authorize the machine? (Machines => Unauthorized => Click the Authorize button). I suppose yes but I would like to double-check.
> >
> > [mh0507] Correct, I did re-authorize the machine, that is part of the standard workflow covered in the documentation for Stork.
> >
> > > I am back where I was
> >
> > Hmm... Could you verify if the Stork server and Stork agent versions are the same? You can check them with the `stork-server --version` and `stork-agent --version` commands.
> >
> > [mh0507] As requested, they are both on 1.16.0 as shown below.
> >
> > root@server-kea-control:/etc/stork# stork-server --version
> > 1.16.0
> >
> > root@server-kea-node1:/var/lib/stork-agent# stork-agent --version
> > 1.16.0
> >
> > Slawek
> >
> > On 07/05/2024 16:23, Marek Hajduczenia wrote:
> >> The certs have been regenerated on the node, for what it is worth
> >>
> >> root@server-kea-node1:/var/lib/stork-agent# ls -lah certs/
> >> total 20K
> >> drwx------ 2 stork-agent root 4.0K May 7 11:47 .
> >> drwxr-xr-x 4 stork-agent root 4.0K May 6 19:08 ..
> >> -rw------- 1 stork-agent stork-agent 664 May 7 11:47 ca.pem
> >> -rw------- 1 stork-agent stork-agent 656 May 7 11:47 cert.pem
> >> -rw------- 1 stork-agent stork-agent 241 May 7 11:47 key.pem
> >>
> >> but it seems that the Stork Server side is holding onto old certs? Not
> >> sure where they would be stored - likely in the backend DB, but I do
> >> not want to delete things at random.
> >>
> >> Regards
> >>
> >> Marek
> >>
> >> On Tue, May 7, 2024 at 5:56 AM Marek Hajduczenia
> >> <mxhajduczenia at gmail.com> wrote:
> >>
> >> Hi Slawek,
> >>
> >> That has not solved my problem. I went through the following
> >> process
> >>
> >> 1. Remove the previous registration for .130 machine at Stork GUI
> >> (Action > Remove)
> >> 2. Remove all content from /var/lib/stork-agent/certs and
> >> /var/lib/stork-agent/tokens
> >> 3. Re-run registration
> >>
> >> root@server-kea-node1:/var/lib/stork-agent/tokens# sudo su stork-agent -s /bin/sh -c 'stork-agent register --server-url http://172.17.129.251:8080'
> >> >>>> Server access token (optional):
> >> >>>> IP address or FQDN of the host with Stork Agent (for the Stork Server connection) [server-kea-node1]: 172.17.129.130
> >> >>>> Port number that Stork Agent will listen on [8080]:
> >> INFO[2024-05-07 11:47:14] register.go:81 There are no agent certificates - they will be generated.
> >> INFO[2024-05-07 11:47:14] register.go:406 =============================================================================
> >> INFO[2024-05-07 11:47:14] register.go:407 AGENT TOKEN: B777710F0547C3EA237002537E4B18202F888F4D0F6C2C00BA105167DE1688CE
> >> INFO[2024-05-07 11:47:14] register.go:408 =============================================================================
> >> INFO[2024-05-07 11:47:14] register.go:411 Authorize the machine in the Stork web UI
> >> INFO[2024-05-07 11:47:14] register.go:425 Try to register agent in Stork Server
> >> INFO[2024-05-07 11:47:14] register.go:262 Machine registered
> >> INFO[2024-05-07 11:47:14] register.go:283 Stored agent-signed cert and CA cert
> >> INFO[2024-05-07 11:47:14] main.go:215 Registration completed successfully
> >>
> >> 4. I am back where I was
> >>
> >> image.png
> >>
> >> I did restart the local Stork agent but that did not change
> >> anything
> >>
> >> root@server-kea-node1:/var/lib/stork-agent/tokens# service isc-kea-ctrl-agent restart
> >> root@server-kea-node1:/var/lib/stork-agent/tokens# service isc-kea-ctrl-agent status
> >> ● isc-kea-ctrl-agent.service - Kea Control Agent
> >> Loaded: loaded (/lib/systemd/system/isc-kea-ctrl-agent.service; enabled; vendor preset: enabled)
> >> Active: active (running) since Tue 2024-05-07 11:50:16 UTC; 3s ago
> >> Docs: man:kea-ctrl-agent(8)
> >> Main PID: 10543 (kea-ctrl-agent)
> >> Tasks: 5 (limit: 9343)
> >> Memory: 1.4M
> >> CPU: 7ms
> >> CGroup: /system.slice/isc-kea-ctrl-agent.service
> >> └─10543 /usr/sbin/kea-ctrl-agent -c /etc/kea/kea-ctrl-agent.conf
> >>
> >> May 07 11:50:16 server-kea-node1 systemd[1]: isc-kea-ctrl-agent.service: Deactivated successfully.
> >> May 07 11:50:16 server-kea-node1 systemd[1]: Stopped Kea Control Agent.
> >> May 07 11:50:16 server-kea-node1 systemd[1]: isc-kea-ctrl-agent.service: Consumed 48.595s CPU time.
> >> May 07 11:50:16 server-kea-node1 systemd[1]: Started Kea Control Agent.
> >>
> >> For what it is worth, the message in the logs has changed
> >>
> >> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="info" msg="Completed pulling lease stats from Kea apps: 0/1 succeeded" file=" statspuller.go:71 "
> >> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="warning" msg="rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" manager.go:124 " agent="172.17.129.130:8080"
> >> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="warning" msg="Failed to get state from the Stork agent; the agent is still not responding" file=" grpcli.go:326 " agent="172.17.129.130:8080"
> >> May 7 11:54:39 server-kea-control stork-server[719]: time="2024-05-07 11:54:39" level="warning" msg="failed to get state from agent 172.17.129.130:8080: grpc manager is unable to re-establish connection with the agent 172.17.129.130:8080: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" statepuller.go:247 "
> >>
> >> Not sure whether it is for the better or worse
> >>
> >> Regards
> >>
> >> Marek
> >>
> >> On Tue, May 7, 2024 at 4:06 AM Slawek Figiel <slawek at isc.org> wrote:
> >>
> >> Hello Marek!
> >>
> >> Stork server reports that the agent introduced itself with a "bad
> >> certificate." Several reasons may cause it. I think you should remove
> >> the existing cert files and re-register the agent. Please do the
> >> following steps:
> >>
> >> 1. On the agent machine, remove the files in the `/var/lib/stork-agent`
> >> directory (you need to remove all files from the `certs` and `tokens`
> >> subdirectories).
> >> 2. If you manually registered the agent (by the `register` command), you
> >> need to call it again and restart the agent. If you used the
> >> self-registration flow, just restart the agent.
> >> 3. Open the Stork UI, go to the machines list, switch to the
> >> "Unauthorized" tab, and re-authorize the agent.
> >>
> >> I hope it'll solve your problem.
> >> Don't hesitate to ask for more details if you have any questions.
> >>
> >> Regards,
> >> Slawek Figiel
> >>
> >> On 07/05/2024 00:05, mxhajduczenia at gmail.com wrote:
> >> > Dear Forum,
> >> >
> >> > I had two nodes added to Stork: .130 and .131 and they were working
> >> > correctly. Node .130 had a kernel failure due to changes I was trying to
> >> > apply and I did not make a copy, unfortunately. Long story short, I had
> >> > to re-install node .130, and then I wanted to add it back to Stork
> >> >
> >> > No matter what I do, I am getting the error shown above, i.e., Cannot
> >> > get state of machine.
> >> >
> >> > Syslog review shows only one error message following two warning messages.
> >> >
> >> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06 21:58:38" level="warning" msg="rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" manager.go:124 " agent="172.17.129.130:8080"
> >> >
> >> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06 21:58:38" level="warning" msg="Failed to get state from the Stork agent; the agent is still not responding" file=" grpcli.go:326 " agent="172.17.129.130:8080"
> >> >
> >> > May 6 21:58:38 server-kea-control stork-server[719]: time="2024-05-06 21:58:38" level="warning" msg="failed to get state from agent 172.17.129.130:8080: grpc manager is unable to re-establish connection with the agent 172.17.129.130:8080: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: remote error: tls: bad certificate\"" file=" statepuller.go:247 "
> >> >
> >> > I suspect that the TLS certificate does not get cleared when the machine
> >> > is removed and a machine with the same IP address is re-added.
> >> >
> >> > I did not find a remedy for it for now and I do not fancy a complete
> >> > re-install of Stork if I can avoid it.
> >> >
> >> > Any suggestions on how to fix it?
> >> >
> >> > Regards
> >> >
> >> > Marek
> >> >
> >> >
> >> --
> >> Stork-users mailing list
> >> Stork-users at lists.isc.org
> >> https://lists.isc.org/mailman/listinfo/stork-users
> >>
> >
>