BALUG-Talk November 2024

balug-talk@lists.balug.org

1 participants
3 discussions

device mapper! (dmsetup, etc.) (bits from BALUG meeting 2024-11-19)
by Michael Paoli 26 Nov '24

So, 2024-11-19 BALUG meeting, one of our discussion topics: device mapper - and dmsetup(8) and related.

So, device mapper - it is used to create/manage a block device, which in turn has a specification of how its blocks are mapped to zero or more other device(s). It operates in units of traditional 512 byte blocks, and can handle quite a wide variety of possible ways of doing mapping. It has at least:
cache, clone, dust, crypt, delay, ebs, error, flakey, linear, mirror, multipath, raid, snapshot, striped, thin, zero

Some of the bits I mentioned. It's relatively lower level, but many other services and the like leverage the device mapper. E.g. LVM uses device mapper for its lower level configuration, as does cryptsetup, e.g. for LUKS.

Sometimes using device mapper more directly can be quite useful. E.g. want to test I/O issues, it has dust, error, and flakey, so one can set up a device that gives errors on I/O based upon various criteria as specified. Can also specify different specific data to be read back - which may be different than that which was written, etc.

Also has RAID capabilities - which brings me to an example. Some while back, had case to assist someone in coming up with a solution to something they wanted to do. They had a fair sized hardware RAID array, RAID-5 with 4x12TB drives. They wanted to migrate to md raid5, with 4x12TB (new) drives. And they wanted to minimize downtime to the extent feasible.

Well, easiest way to do that would be to effectively layer RAID-1 - at least temporarily - atop that, sync, then split that mirror. But LVM or md, etc. RAID-1 are not so great for that - as they'd all generally want to write their relevant headers on the devices, etc., and it's at best non-trivial to write that data, and block status tracking data, etc., somewhere else - if they even support that at all. So, solution for that? Use device mapper.
Can then just directly raid1 mirror the blocks of the two devices - the original RAID-5 hardware RAID device, and the newer replacement md raid5 software RAID device.

So, first of all, documentation, etc. There's the dmsetup(8) man page. Pretty good, but it leaves a lot to be desired. Notably it doesn't cover well a lot of details that are or may be necessary. So next stop - and what it largely refers to - kernel documentation. E.g.:
file:///usr/share/doc/linux-doc-6.1/html/admin-guide/device-mapper/
etc. Pretty good ... but not so great on, e.g. more complete examples, etc. and even some relevant information turned out to be quite missing from the kernel documentation (though hey, could probably read the relevant source ... but that generally wouldn't be the easiest way). So next, some bits of Internet searching ... and found some good resources, e.g.:
https://wiki.gentoo.org/wiki/Device-mapper#Mirror_and_RAID1
So, at least taken together, there was sufficient information.

So ... I earlier did a test demo run, to show how it could be done ... but didn't save all my information/notes on that, so let me repeat that. And a bit better this time - including log devices - that way even if, e.g. system were to crash while the sync was in progress, it could be cleanly resumed. So, I don't have 8x14TB in spare drives sitting around, so I'll do much smaller with some space on /tmp and some files there, using losetup(8) to create block devices suitable for use.

So, below, my comments on lines starting with //:

// So, don't have 8x14TB to work with, but on /tmp at present, can
// easily spare 64GiB. So will instead do ~24GiB to emulate the old and
// 4x8GiB to emulate the new. Since the "old" source is hardware
// RAID-5, will just do appropriately sized storage for that, as I don't
// have hardware RAID-5 available for that. First let's create the
// backing files and loop devices for our first 4 devices to represent
// our target drives.
# mkdir /tmp/dmr1 && cd /tmp/dmr1
# truncate -s $(expr 8 \* 1024 \* 1024 \* 1024) f{1,2,3,4} && (for n in 1 2 3 4; do losetup -f --show f"$n"; done)
/dev/loop2
/dev/loop3
/dev/loop4
/dev/loop5
#
// And now create our software md raid5 device:
# mdadm --create --level=raid5 --raid-devices=4 /dev/md24 /dev/loop[2-5]
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md24 started.
#
// And now let's get its exact size:
$ cat /sys/block/md24/size
50276352
$
// That's in 512 byte blocks. As for bytes
$ expr 50276352 \* 512
25741492224
$
// So let's create our source device of exactly that size:
# truncate -s 25741492224 f0 && losetup -f --show f0
/dev/loop6
#
// In our actual case, source would need to be not larger than the
// target. If source were larger, we'd need to shrink the (relevant)
// data before copying, e.g. reduce the size of the filesystem a bit,
// possibly repartition slightly, etc.
// And let's confirm our sizes match:
$ cmp /sys/block/md24/size /sys/block/loop6/size && echo MATCHED
MATCHED
$
// And let's create some data on our source device:
# mkfs -t ext3 -L 24gr5 -m 0 /dev/loop6 && mount -o nosuid,nodev /dev/loop6 /mnt && { dd if=/dev/urandom of=/mnt/urandom bs=1048576 status=none; < /mnt/urandom sha512sum && umount /mnt; }
// ...
dd: error writing '/mnt/urandom': No space left on device
6e0487bed425a7bb667d169e001415a4a18c6413ee5be56f032ebac7ea827dae9caee9ab0d0801e1d1b537eabc75cee9842de00a1089fd0afd7e7630752128aa  -
#
// Now let's set up our device mapper device, will do this with metadata
// devices to track, so, e.g. if interrupted (e.g. system crash
// or abrupt power down, or drives disconnected, etc.) can still safely
// resume after (we'll essentially ignore that our backing store is on
// the volatile /tmp - this is still just demo after all).
// Unfortunately the kernel documentation doesn't say how large these
// need to be.
// Probably a relatively small % of the total space, I'll
// give it more than ample space on sparse file, then we can look at
// actual block usage after. So, devices are 50276352 512 byte blocks,
// let's say 5% of that - that ought to be much more than enough - but
// sparse files, won't much matter.
$ echo '50276352*512/20' | bc -l
1287074611.20000000000000000000
$
// And let's round that up to 4KiB boundary:
$ echo '1287074611.2/4/1024' | bc -l
314227.20000000000000000000
$ expr 314228 \* 4 \* 1024
1287077888
$
# truncate -s 1287077888 m{0,1}
# (for n in 0 1; do losetup -f --show m"$n"; done)
/dev/loop7
/dev/loop8
#
// Will also look at status right after creation and 30 seconds later,
// and will mount right after we create it too:
# dmsetup create dmr1 --table '0 50276352 raid raid1 5 0 region_size 32 rebuild 1 2 /dev/loop7 /dev/loop6 /dev/loop8 /dev/md24' && dmsetup status dmr1 && mount -o nosuid,nodev /dev/mapper/dmr1 /mnt && sleep 30 && dmsetup status dmr1
0 50276352 raid raid1 2 Aa 0/50276352 recover 0 0 -
0 50276352 raid raid1 2 Aa 2046848/50276352 recover 0 0 -
#
// And a while later we have:
# dmsetup status dmr1
0 50276352 raid raid1 2 AA 50276352/50276352 idle 0 0 -
#
// That field of all uppercase "A" characters tells us the RAID-1
// devices are fully synced up.
// Let's deconstruct our RAID-1 device and compare the files on the
// filesystems - which have now been copied via mirroring. Will also
// read and recompute hash of one of the files to also check that still
// matches. Also, so they don't conflict on the filesystems, will
// change the label and UUID on the "old" one, so the original data of
// that remains on the "new" target one.
# umount /mnt
// check again that we're synced before removal:
# dmsetup status dmr1
0 50276352 raid raid1 2 AA 50276352/50276352 idle 0 0 -
# dmsetup remove dmr1
# tune2fs -L 24gr5.old -U random /dev/loop6
tune2fs 1.47.0 (5-Feb-2023)
# mkdir mnt-old mnt-new
# mount -o ro,nosuid,nodev /dev/loop6 mnt-old
# mount -o ro,nosuid,nodev /dev/md24 mnt-new
# cmp mnt-{old,new}/urandom && echo MATCHED
MATCHED
# < mnt-new/urandom sha512sum
6e0487bed425a7bb667d169e001415a4a18c6413ee5be56f032ebac7ea827dae9caee9ab0d0801e1d1b537eabc75cee9842de00a1089fd0afd7e7630752128aa  -
#
// And we can see that the files match and the hash matches our earlier.
// Let's do it one more time, except this time with the filesystem very
// busy while it's mounted and doing the RAID-1 sync.
// And this time we mirror from the new, to old, as the new now has
// exactly the data we want.
# umount mnt-old && umount mnt-new
# dmsetup create dmr1 --table '0 50276352 raid raid1 5 0 region_size 32 rebuild 0 2 /dev/loop7 /dev/loop6 /dev/loop8 /dev/md24' && mount -o nosuid,nodev /dev/mapper/dmr1 /mnt && { dd if=/dev/urandom of=/mnt/urandom bs=1048576 status=none; < /mnt/urandom sha512sum; }
dd: error writing '/mnt/urandom': No space left on device
c818fb535d037b01868252a3f2464cc17fa70b8f4cb21436a0f7d3d9c85b4783ac7e7835f47d55c588287688013b687743886f5640ddfb91ff9e2f8177dd5b38  -
#
// We check until we see it's synced:
# dmsetup status dmr1
0 50276352 raid raid1 2 AA 50276352/50276352 idle 0 0 -
#
// Then we unmount, and again reconfirm it's synced:
# umount /mnt
# dmsetup status dmr1
0 50276352 raid raid1 2 AA 50276352/50276352 idle 0 0 -
#
// Now we again deconstruct the RAID-1, update label and UUID on old,
// and mount and compare, and also again compute hash on one of the
// files to see that also matches:
# dmsetup remove dmr1
# tune2fs -L 24gr5.old -U random /dev/loop6
tune2fs 1.47.0 (5-Feb-2023)
# mount -o ro,nosuid,nodev /dev/loop6 mnt-old
# mount -o ro,nosuid,nodev /dev/md24 mnt-new
# cmp mnt-{old,new}/urandom && echo MATCHED
MATCHED
# < mnt-old/urandom sha512sum
c818fb535d037b01868252a3f2464cc17fa70b8f4cb21436a0f7d3d9c85b4783ac7e7835f47d55c588287688013b687743886f5640ddfb91ff9e2f8177dd5b38  -
// So, files again matched, and our hash again matches. Also, checked
// how much space was actually used for the meta devices to track RAID-1
// status:
# stat -c '%b %n' m[01]
400 m0
400 m1
#
// So that's 400 512 byte blocks, 200 KiB.
// That size apparently depends upon size of device and region_size,
// and does apparently have a limit on total maximum size it will use
// for that meta device.

So, our original scenario to replicate the RAID-5 from hardware RAID to software (md) RAID while minimizing downtime would go about like this:
o create the new target md device, note its precise size
o stop all I/O on the old device (e.g. unmount it).
o note size of old - must not be larger than new, if necessary shrink
  and/or repartition, etc. or the like as appropriate.
o create dm device as RAID-1 between the old and new, syncing to new
o return I/O to service (using the dm device in place of the old device)
o after sync has completed (may take hours to days for larger/slower
  storage), as before, stop all I/O - except now on the dm device
  instead of the old device. Again check/wait until it's fully synced.
o tear down the dm device
o Adjust labels, UUIDs, etc. if/as applicable on old to not conflict
  with new
o return I/O to service, using new in place of old
o if/as desired, tear down or decommission old

Key advantage is relative minimization of downtime - across the hours (or days or more) while the RAID-1 mirror is syncing to duplicate the storage data, no downtime is needed, and the storage can be used per usual. A bit of downtime to shuffle about, notably from old, to dm, then to new, but other than that, things generally remain online and actively available.
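
The "check/wait until it's fully synced" steps in the procedure above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and the field positions are taken from the dmsetup status lines shown in the demo (field 6 the per-device health characters, field 7 the synced/total sector ratio), not from any wider survey of dm-raid status formats.

```shell
#!/bin/sh
# Sketch only: decide from one line of `dmsetup status` output whether a
# dm-raid raid1 mapping looks fully in sync. Based on lines like:
#   0 50276352 raid raid1 2 AA 50276352/50276352 idle 0 0 -
# raid1_line_synced and wait_raid1_synced are hypothetical helper names.
raid1_line_synced() {
    # $1: a single status line
    printf '%s\n' "$1" | awk '
        $3 == "raid" && $6 ~ /^A+$/ {
            # all devices report "A"; now compare synced vs. total sectors
            split($7, r, "/")
            if (r[1] != "" && r[1] == r[2]) ok = 1
        }
        END { exit (ok ? 0 : 1) }'
}

# Poll a mapping until it reports in sync (needs root and a real device).
wait_raid1_synced() {
    # $1: dm device name, $2: poll interval in seconds (default 60)
    while :; do
        line=$(dmsetup status "$1") || return 1   # give up if status fails
        raid1_line_synced "$line" && return 0
        sleep "${2:-60}"
    done
}
```

E.g. `wait_raid1_synced dmr1 30 && echo synced` before stopping I/O, and then once more after unmounting, same as the manual rechecks in the demo.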
Additionally, no headers or encapsulation need happen on the old or new storage itself - that's all handled externally, so that keeps it clean and relatively simple.
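
For repeat runs, the table line from the demo can be put together by a small helper, with the metadata devices kept separate from the data devices so nothing gets written into old or new themselves. A minimal sketch, assuming the same parameters as the demo (raid1, chunk size 0, region_size 32, two metadata/data pairs); raid1_table and dev_sectors are illustrative names, not part of dmsetup.

```shell
#!/bin/sh
# Sketch only: print a dm-raid raid1 table line like the one in the demo.
# Arguments (names are illustrative):
#   $1 size in 512-byte sectors    $2 index of the device to rebuild (0 or 1)
#   $3 metadata dev 0  $4 data dev 0  $5 metadata dev 1  $6 data dev 1
raid1_table() {
    [ "$#" -eq 6 ] || {
        echo "usage: raid1_table SECTORS REBUILD_IDX META0 DATA0 META1 DATA1" >&2
        return 2
    }
    # The "5" counts the raid parameters that follow it:
    #   0 region_size 32 rebuild N
    # region_size 32 is simply the value used in the demo run.
    printf '0 %s raid raid1 5 0 region_size 32 rebuild %s 2 %s %s %s %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Size of a block device in 512-byte sectors, as read from sysfs.
dev_sectors() {
    # $1: kernel device name, e.g. md24
    cat "/sys/block/$1/size"
}
```

E.g. `dmsetup create dmr1 --table "$(raid1_table "$(dev_sectors md24)" 1 /dev/loop7 /dev/loop6 /dev/loop8 /dev/md24)"` reproduces the first run above, where device 1 (the new md device) is the one rebuilt.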

Web automation, e.g. AT&T's "Unified Messaging" (voicemail) and downloading all that content.
by Michael Paoli 19 Nov '24

Ah, lovely web automation! :-)

So, lately gave myself a little mini-project. AT&T's "Unified Messaging" (voicemail). Wanted to "cut the cord" - bye-bye landline - porting ye olde landline # to mobile. But first, wanted to download all of my content from AT&T's "Unified Messaging" (voicemail).

AT&T's "Unified Messaging" (UM/um), in addition to ye olde phone DTMF ("Touch Tone") interface to the voicemail, also has a web interface. So, web interface. Essentially works as web GUI interface to email in "INBOX": messages are stored as email, and within an email item, voicemail as .wav attachment, and a text attachment having the transcript as body - which will generally have empty body if it wasn't able to transcribe it. And generally an html attachment, an html version of that text attachment.

And, "of course", Perl also has the lovely WWW::Mechanize. So ... I got to programming. mitmproxy was also handy to figure out some bits going on within the SSL/TLS communications between client (e.g. web browser) and AT&T server(s). And got the key bits of that finished up this past Sunday. And got 'er all nicely downloaded.

$ um.att.com
um.att.com: Inbox is empty. Exiting
$

That's what it outputs at the end, when there's nothing left to download. It also handles deleting the "email" item (message and related) from the AT&T "INBOX" once it's successfully downloaded.

$ cd ~/.um.att.com.d/data
$ ls -A1 | sed -e 's/^.*\././' | sort | uniq -c | sort -k 1,1bnr -k 2,2
    117 .eml
    117 .wav
    113 .txt
    112 .html
$

Very nicely handles it all. .eml is the full raw "original" email as AT&T has it in the "INBOX", .wav files are the raw audio portion thereof, .txt the text transcript (or no file if that part was empty), and .html the html equivalent of that text.

Ah, I was wondering about why one less .html than .txt ... peeking further, the .txt has:
Message too short for transcription
And that original .eml has no html part, and the .wav ... yeah, no words in that audio.
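
The one-fewer .html than .txt can also be tracked down mechanically rather than by peeking. A small sketch, assuming (not confirmed above) that the parts of one message share the same file name up to the final extension; txt_without_html is just an illustrative name.

```shell
#!/bin/sh
# Sketch only: list message stems that have a .txt transcript but no
# matching .html file in directory $1 (defaults to the current directory).
# Assumes the files for one message differ only in their final ".ext".
txt_without_html() {
    dir=${1:-.}
    for t in "$dir"/*.txt; do
        [ -e "$t" ] || continue          # glob matched nothing
        stem=${t%.txt}
        # print just the base name of any stem lacking its .html sibling
        [ -e "$stem.html" ] || printf '%s\n' "${stem##*/}"
    done
}
```

E.g. `txt_without_html ~/.um.att.com.d/data` would, under that naming assumption, print the stem of each message that has a transcript but no html part.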
Alas, I didn't clean out quite all the junk before downloading everything ... and the slight mismatch makes that bit of junk pretty easy to spot ... likewise grep on the .txt files is rather handy.

So, the file names start with ISO date and time, which is derived from the Date: header, which is the timestamp of when the end of the message was received. Likewise that same time data is used to set the mtime on the files. File names also contain data from Subject: and From: fields, generally identifying caller name/number, or when not (CNID) identified, otherwise unknown caller / Identity Withheld, e.g.:
... unknown caller ... Identity withheld <unknown_caller...>

https://www.mpaoli.net/~michael/bin/um.att.com

Ah, one of these days I need to tweak Apache configuration so it "knows", e.g. that file (and that name and location), can be handled like plain text, not a binary. Yeah, I know there's a "magic" type option that can read the files and make an intelligent guess on that, but that's excessive overhead for most cases - so really need to just configure the exceptions ... down to directory or even per-file basis. (On my to-do list ... with thousands of other items yet to be done ... at least maybe when I get around to it).

And ... maybe even others might find it handy, or a handy starting point. Though this one was done almost / mostly as a one-off/one-shot. Though until the number completes being ported over, very handy to still check if anything has shown up there, and download it if so.

It might need some adjustments to handle some other email messages. E.g. the ones from AT&T about the INBOX being nearly full. And looks like I probably won't have need for that (nor example data to match it to and test it on). And I didn't handle the more general email case (which I think UM will also accept and have in "INBOX"), as I only ever used UM for voicemail.
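
As for the ISO date/time names and mtimes derived from the Date: header mentioned above, the same idea can be shown in a few lines of shell. This is a sketch of the general technique only, not how the Perl script does it: it needs GNU date and touch, the UTC stamp format is chosen just for illustration, and both function names are made up.

```shell
#!/bin/sh
# Sketch only (needs GNU date and GNU touch): turn an email "Date:" header
# value into a UTC ISO 8601 stamp, and set a file's mtime from the same
# value. The stamp format is illustrative, not the script's naming scheme.
header_to_iso() {
    # $1: Date: header value, e.g. 'Tue, 19 Nov 2024 10:00:00 -0800'
    date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

set_mtime_from_header() {
    # $1: Date: header value, $2: file whose mtime should be set
    touch -d "$1" -- "$2"
}
```

With names starting with such a stamp, a plain `ls` already sorts the messages in the order they were received.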

+Web interface: https://www.digitalwitness.org/ Re: Announcing digitalwitness.org - free public signing (witnessing) service
by Michael Paoli 05 Nov '24

Now +Web interface:
https://www.digitalwitness.org/

On Thu, Oct 31, 2024 at 12:12 AM Michael Paoli via BALUG-Talk <balug-talk(a)lists.balug.org> wrote:
> ---------- Forwarded message ----------
> From: Michael Paoli <michael.paoli(a)berkeley.edu>
> To: BALUG-Talk <balug-talk(a)lists.balug.org>
> Date: Thu, 31 Oct 2024 00:11:26 -0700
> Subject: [BALUG-Talk] Announcing digitalwitness.org - free public signing (witnessing) service
> Announcing digitalwitness.org - free public signing (witnessing) service
> At present it's in open/public Beta.
>
> At present the ssh interface is open, other(s) (most notably web) will
> likely follow in time.
>
> Let's say you have a (possibly) large digital artifact (or archive file
> of such items). Or even possibly much smaller item. Suggested approach
> is to get secure hash/digest of the item, e.g.:
> $ sha512sum digital_artifact_file
> 3ddd987f33a96b50777d15f7850d80d8e30badf12501289d28d5ee4857d62c25c2c700b6a1313cace8b128fe1e4d1ff4787d70c46e1f633e5e4589bf3f2343ba  digital_artifact_file
> $
> Then get that hash/digest signed, e.g.:
> $ ssh -nT digitalwitness@digitalwitness.org -- 3ddd987f33a96b50777d15f7850d80d8e30badf12501289d28d5ee4857d62c25c2c700b6a1313cace8b128fe1e4d1ff4787d70c46e1f633e5e4589bf3f2343ba
> -----BEGIN PGP SIGNATURE-----
>
> iQIzBAABCgAdFiEEEMoY7NqvKKtE7xkCOAS8z7K+pwUFAmcjEtkACgkQOAS8z7K+
> pwWH3A//cDtGIHokwF+GEvKnFFE+Cw2hiPVTe0PkBPuymGnLPAC9Um7YkVt1vP8u
> ZZhGOVrAivFV75gVASszq6Au4OKY/2GOO0+SMkVaxd9VzprxBH+j8BVixGiDvU3L
> k9JbSzzLIKNoTpPptWbphPoEO6cE4WSm1HubMFeODE7znQeVfk5UlpIEE/7XcT0r
> lmwnUaoSwhZUL5HxWE3Pt6x4QSTOin38DHCS55LRfDwHC5og5fc7eiC9TSqr/kVB
> goyxYg/lCNrKa7HQto03zJ2ZctZnsNj5n81WhPYzBwlpGfb1T/3htqP17PCLJ19W
> amWC+kn+9vR5cwGwuEnBxfOlE//CI7d3gWXSSsvHCDhX49Drh1m7OIw4I9vnYYcV
> h9fgajSR1SAMiQwo7uQjrmByZnUdXCTfRJ6ywBfuKaZdBX1Y+AGD8cUWc5qnCwrR
> rX4bznL2JZhrwrNh6abmDcyqRzMmXr+jM7WqZiJbcxkhoJZ78c+UtCR6ETreVui0
> Ah7a7HSAuZ6E3dek1oBzA5zTV/bpZHEcdz8UcRYWryEF7wtzJ0gi4SXsWLLGq/LA
> LaSOPgcA8KMB3A5NtSYdn0ax/MEao/r/spNrxiQI5jKTYBuoI3qSA7I8Pp1qIkkY
> DS/XHzT2CNVw6T/YxvqrhqvSPfaQkQWy0lse6WVIG3CfW7Bp0Pk=
> =5mCo
> -----END PGP SIGNATURE-----
> $
> For more information, help, public signing key, etc., just run without
> any "command" or arguments thereof, e.g.:
> $ ssh -nT digitalwitness@digitalwitness.org
> And that will provide one with much relevant information, and including
> important details (e.g. the signed data is signed without first adding
> any implicit newline on the end, though one can explicitly provide that
> if desired).
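
The hash-then-sign steps quoted above, plus a later check of the result, might be wrapped up along these lines. A sketch under assumptions: it takes the returned signature to be a detached signature over exactly the digest string with no trailing newline (as the note about implicit newlines suggests), and it assumes the service's public key has already been imported into the local gpg keyring. artifact_digest and verify_witness are illustrative names.

```shell
#!/bin/sh
# Sketch only: print the SHA-512 hex digest of a file with no trailing
# newline, i.e. the bare string one would hand to the signing service.
artifact_digest() {
    # reading via stdin keeps odd file names out of sha512sum's output
    sha512sum < "$1" | awk '{ printf "%s", $1 }'
}

# Check a saved signature against that digest. ASSUMPTIONS: the signature
# is detached and covers the digest string itself (no newline), and the
# signing key is already in the local keyring.
verify_witness() {
    # $1: artifact file, $2: file holding the PGP signature
    tmp=$(mktemp) || return 1
    artifact_digest "$1" > "$tmp"
    gpg --verify "$2" "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

E.g. `ssh -nT digitalwitness@digitalwitness.org -- "$(artifact_digest digital_artifact_file)" > digital_artifact_file.sig`, and at some later date `verify_witness digital_artifact_file digital_artifact_file.sig`.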
