BALUG VM was down for a fair while earlier today. Has been up again for over 7 hours now.
Looks like there was an I/O hiccup on the physical host, which didn't particularly impact the physical host itself, but was enough of an interruption (delay) that the BALUG VM kernel panicked.
Did have a 3rd hard drive being tested, etc. on the physical host at the time ... it might've hit issues and possibly caused a bus reset? Who knows for sure. Anyway ...
Went down sometime after: 2018-09-02T01:27:36-07:00 and was brought back up around: 2018-09-02T13:39:30-07:00
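For a rough upper bound on the outage, the two timestamps above can be subtracted. This is only a sketch: it assumes GNU date (for the -d option), and the variable names are made up for illustration.

```shell
# Assumption: GNU date, which parses these ISO 8601 timestamps via -d.
down='2018-09-02T01:27:36-07:00'   # VM still known to be up at this time
up='2018-09-02T13:39:30-07:00'     # approximate time it was brought back up

# Convert both to seconds since the epoch and take the difference.
down_s=$(date -d "$down" +%s)
up_s=$(date -d "$up" +%s)
outage_s=$((up_s - down_s))

# Print the difference as hours, minutes, seconds.
printf 'outage: at most about %dh %dm %ds\n' \
  $((outage_s / 3600)) $((outage_s % 3600 / 60)) $((outage_s % 60))
```

That works out to at most about 12h 11m 54s, since the crash happened sometime after the first timestamp rather than exactly at it.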
Various bits I noted in log:

$ curl -s --range 375155-378925 http://www.archive.balug.org/log.txt
2018-09-02 Michael Paoli
host crashed sometime after: 2018-09-02T01:27:36-07:00
but probably before about: 2018-09-02T01:35:00-07:00
on console, we got:
# [54894.969741] sd 0:0:0:0: [sda] tag#3 ABORT operation started
[54900.078084] sd 0:0:0:0: ABORT operation timed-out.
[54900.080312] sd 0:0:0:0: [sda] tag#2 ABORT operation started
[54905.198438] sd 0:0:0:0: ABORT operation timed-out.
[54905.200517] sd 0:0:0:0: [sda] tag#1 ABORT operation started
[54905.357128] Kernel panic - not syncing: assertion "i && sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file "/build/linux-AcJpTp/linux-4.9.110/drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
[54905.357128]
[54905.367774] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
[54905.370776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[54905.372768] 0000000000000000 ffffffff84f31e54 ffff9e2f75d5a300 ffff9e2f7fc03e50
[54905.375471] ffffffff84d7f6ad 0000000000000020 ffff9e2f7fc03e60 ffff9e2f7fc03df8
[54905.378226] 3ea9db08406f9671 0000000100d04ae4 ffffffffc048a250 ffffffffc0489e80
[54905.380982] Call Trace:
[54905.381867] <IRQ>
[54905.382541] [<ffffffff84f31e54>] ? dump_stack+0x5c/0x78
[54905.384428] [<ffffffff84d7f6ad>] ? panic+0xe4/0x23f
[54905.386164] [<ffffffffc048512e>] ? sym_interrupt+0x1c9e/0x1e80 [sym53c8xx]
[54905.388543] [<ffffffffc03aa010>] ? usb_hcd_poll_rh_status+0x170/0x170 [usbcore]
[54905.391102] [<ffffffffc03a9fc9>] ? usb_hcd_poll_rh_status+0x129/0x170 [usbcore]
[54905.393627] [<ffffffffc03aa010>] ? usb_hcd_poll_rh_status+0x170/0x170 [usbcore]
[54905.396144] [<ffffffff84ce7562>] ? call_timer_fn+0x32/0x120
[54905.398071] [<ffffffffc047ea4b>] ? sym53c8xx_intr+0x3b/0x70 [sym53c8xx]
[54905.400386] [<ffffffff84cd418e>] ? __handle_irq_event_percpu+0x7e/0x1a0
[54905.402673] [<ffffffff84cd42e0>] ? handle_irq_event_percpu+0x30/0x70
[54905.404898] [<ffffffff84cd4359>] ? handle_irq_event+0x39/0x60
[54905.406901] [<ffffffff84cd7870>] ? handle_fasteoi_irq+0xa0/0x170
[54905.409001] [<ffffffff84c27faf>] ? handle_irq+0x1f/0x30
[54905.410834] [<ffffffff852187ee>] ? do_IRQ+0x4e/0xe0
[54905.412528] [<ffffffff85216556>] ? common_interrupt+0x96/0x96
[54905.414523] <EOI>
[54905.415216] [<ffffffff852151f0>] ? __sched_text_end+0x1/0x1
[54905.417231] [<ffffffff852154c2>] ? native_safe_halt+0x2/0x10
[54905.419235] [<ffffffff8521520a>] ? default_idle+0x1a/0xd0
[54905.421137] [<ffffffff84cbc7da>] ? cpu_startup_entry+0x1ca/0x240
[54905.423215] [<ffffffff8593df5e>] ? start_kernel+0x447/0x467
[54905.425186] [<ffffffff8593d120>] ? early_idt_handler_array+0x120/0x120
[54905.427438] [<ffffffff8593d408>] ? x86_64_start_kernel+0x14c/0x170
[54905.429842] Kernel Offset: 0x3c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[54905.433484] ---[ end Kernel panic - not syncing: assertion "i && sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file "/build/linux-AcJpTp/linux-4.9.110/drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
[54905.433484]
... also noted within that same timeframe, on physical host,
there were some storage related events ... but no hard failues
seen on that physical host and no outages or failures or such
observed on that physical host:
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 69
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 70
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 66
$