Add continuous testing for Scaleway server + OS update combinations #12

Open
abitrolly opened this issue Feb 4, 2019 · 12 comments

@abitrolly
Member

A Scaleway ARM64-2GB instance with Ubuntu 18.04 hangs after updating and rebooting (whether unattended, with Ansible, or with plain apt-get). This is repeatable. It could have been avoided with a continuous testing scenario for this case. This ticket is to confirm whether Scaleway does this testing, and to implement it otherwise.
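
A continuous test for this case could be as small as the sketch below: upgrade, reboot, then poll until SSH answers again or give up. The wait_until helper and the HOST variable are hypothetical names used for illustration, not part of any Scaleway tooling, and the retry counts are arbitrary.

```shell
#!/bin/sh
# Sketch only: wait_until and HOST are hypothetical names, not Scaleway tooling.

# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds after each failure.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Intended use against a throwaway test server (assumed reachable as root@$HOST):
#   ssh "root@$HOST" 'apt-get update && apt-get -y dist-upgrade; reboot'
#   sleep 60   # give the server time to actually go down first
#   wait_until 30 10 ssh -o ConnectTimeout=5 "root@$HOST" true \
#       || echo "FAIL: $HOST did not come back after update + reboot"
```

Passing the probe as a command keeps the retry loop independent of SSH, so the same helper could poll a serial console or an HTTP health check instead.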

@abitrolly
Member Author

abitrolly commented May 28, 2019

Finally got time to troubleshoot this. Switched the server to the rescue bootscript.

✗ podman run -it --rm --volume=$HOME/.scwrc:/.scwrc:Z scaleway/cli --region=ams1 inspect -f "{{.Bootscript.Title}}" server:xe3
arm64 rescue

@abitrolly
Member Author

Connected to the xe3 instance. The volumes are different from the article.

root@xe3:~# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda     253:0    0 46.6G  0 disk 
├─vda15 253:15   0  100M  0 part 
└─vda1  253:1    0 46.5G  0 part 

@abitrolly
Member Author

Mounted volumes.

mkdir -p /mnt/volume1
mount /dev/vda1 /mnt/volume1
mkdir -p /mnt/volume15
mount /dev/vda15 /mnt/volume15

@abitrolly
Member Author

/dev/vda15 is just the EFI boot partition.

ls -lR volume15
volume15:
total 1
drwxr-xr-x 3 root root 512 Mar  5 10:14 EFI

volume15/EFI:
total 1
drwxr-xr-x 2 root root 512 Mar  5 10:16 BOOT

volume15/EFI/BOOT:
total 120
-rwxr-xr-x 1 root root 122880 Mar  5 10:16 BOOTAA64.EFI

@abitrolly
Member Author

Inspecting last logs.

/var/log# ls -lat
total 1832
-rw-rw-r--   1 root   utmp              19200 May 12 14:54 wtmp
-rw-rw-r--   1 root   utmp             296296 May 12 14:54 lastlog
-rw-rw----   1 root   utmp            1091600 May 12 14:54 btmp
-rw-r--r--   1 root   root             533976 May 12 14:18 dpkg.log
-rw-r--r--   1 root   root              16671 May 12 14:16 alternatives.log
drwxr-xr-x   2 root   root               4096 May 12 14:10 apt
drwxr-x---   2 root   adm                4096 May 12 06:52 unattended-upgrades
drwxr-xr-x   2 syslog syslog             4096 May 12 01:42 landscape
-rw-r--r--   1 root   adm               93777 May 12 01:21 cloud-init.log
-rw-r--r--   1 root   root               4593 May 12 01:21 cloud-init-output.log
-rw-------   1 root   root              64064 May 12 01:20 tallylog
-rw-r--r--   1 root   root              32032 May 12 01:20 faillog
drwxr-xr-x   8 root   root               4096 May 12 01:20 .
drwxr-sr-x+  3 root   systemd-journal    4096 May 12 01:20 journal
drwxr-xr-x  13 root   root               4096 Mar  5 09:50 ..
drwxr-xr-x   2 root   root               4096 Jan 16 23:53 dist-upgrade
drwxr-xr-x   2 root   root               4096 Nov 23  2018 lxd

@abitrolly
Member Author

No messages file, no dmesg log...

@abitrolly
Member Author

lastlog won't help, because it only records each user's most recent login.

@abitrolly
Member Author

Parsing wtmp and btmp just in case...

# last -f btmp
root     ssh:notty    218.92.0.207     Sun May 12 14:54    gone - no logout
root     ssh:notty    218.92.0.207     Sun May 12 14:54 - 14:54  (00:00)
root     ssh:notty    218.92.0.207     Sun May 12 14:54 - 14:54  (00:00)
root     ssh:notty    218.92.0.207     Sun May 12 14:53 - 14:54  (00:00)
...
# last -f wtmp 
root     pts/0        x.x.x.127    Sun May 12 14:54 - down   (00:00)
ubuntu   pts/0        x.x.x.127    Sun May 12 14:54 - 14:54  (00:00)
root     pts/0        x.x.x.127    Sun May 12 14:49 - 14:49  (00:00)
...
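
With a btmp this large, a per-address summary is easier to scan than the raw list. The sketch below is one way to get it; count_failed_by_source is a hypothetical helper name, and it assumes the address is the third column, as in the output above.

```shell
#!/bin/sh
# Sketch: count_failed_by_source is a hypothetical helper name.
# Reads `last -f btmp` style output on stdin and prints "count address",
# most frequent first. Assumes the address is the third column; lines
# whose second column is not an ssh tty (blank lines, the trailing
# "btmp begins ..." line) are skipped.
count_failed_by_source() {
    awk '$2 ~ /^ssh/ { n[$3]++ } END { for (a in n) print n[a], a }' | sort -rn
}

# Intended use from /var/log on the mounted volume:
#   last -f btmp | count_failed_by_source | head
```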

@abitrolly
Member Author

/var/log/journal contains the systemd logs, but the rescue image cannot read them, apparently because its journalctl (Ubuntu 16.04) is older than the systemd that wrote them.

# journalctl -D journal
Journal file journal/71e7aa5b46f048658dbfde2a92c24320/system.journal uses an unsupported feature, ignoring file.
-- No entries --

# cat /etc/os-release | grep VERSION=
VERSION="16.04.2 LTS (Xenial Xerus)"

Sent logs through https://transfer.sh

tar -czf - /var/log/journal | curl --upload-file - https://transfer.sh/journal.tar.gz
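
Streaming straight into curl leaves no local copy to compare against. A variant, sketched below on the assumption that the rescue image has sha256sum available, writes the archive to a file first and prints its checksum so the download on the other machine can be verified. pack_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: pack_logs is a hypothetical helper; assumes tar and sha256sum exist.
# pack_logs SRC_DIR ARCHIVE: create a gzip-compressed tar of SRC_DIR and
# print the archive's SHA-256 line for later comparison.
pack_logs() {
    src=$1
    archive=$2
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    sha256sum "$archive"
}

# Intended use on the rescue system, then upload as in the command above:
#   pack_logs /var/log/journal /tmp/journal.tar.gz
#   curl --upload-file /tmp/journal.tar.gz https://transfer.sh/journal.tar.gz
# On the receiving machine, `sha256sum journal.tar.gz` should print the same hash.
```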

@abitrolly
Member Author

The server went down for the reboot and never came back up. Last lines from journalctl:

$ journalctl -D journal 
...
May 12 17:54:34 xe3 systemd[1]: Reached target Final Step.
May 12 17:54:34 xe3 systemd[1]: Starting Reboot...
May 12 17:54:34 xe3 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
May 12 17:54:34 xe3 systemd[1]: Stopping LVM2 metadata daemon...
May 12 17:54:34 xe3 systemd[1]: Stopped LVM2 metadata daemon.
May 12 17:54:34 xe3 systemd[1]: Shutting down.
May 12 17:54:34 xe3 systemd-shutdown[1]: Syncing filesystems and block devices.
May 12 17:54:34 xe3 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
May 12 17:54:34 xe3 systemd-journald[7690]: Journal stopped
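
The journal ends with an orderly shutdown sequence, which suggests the hang happens on the way back up rather than during shutdown. A small check like the sketch below could flag that pattern automatically. ended_with_clean_shutdown is a hypothetical helper, and matching on the "Journal stopped" message is an assumption based on the output above.

```shell
#!/bin/sh
# Sketch: ended_with_clean_shutdown is a hypothetical helper name.
# Reads journalctl output on stdin and succeeds only if the last non-empty
# line is systemd-journald's "Journal stopped" message.
ended_with_clean_shutdown() {
    awk 'NF { last = $0 }
         END {
             if (last ~ /systemd-journald\[[0-9]+\]: Journal stopped[ \t]*$/) exit 0
             exit 1
         }'
}

# Intended use on a machine that can read the journal:
#   journalctl -D journal | ended_with_clean_shutdown \
#       && echo "shutdown was clean; look at the boot side"
```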
