Ah, lovely web automation! :-)
So, lately I gave myself a little mini-project: AT&T's "Unified Messaging" (UM) voicemail. I wanted to "cut the cord" (bye-bye landline) and port ye olde landline # to mobile. But first, I wanted to download all of my content from UM. In addition to ye olde phone DTMF ("Touch Tone") interface to the voicemail, UM also has a web interface.
So, web interface. It essentially works as a web GUI to email in an "INBOX": messages are stored as email, and each email item generally carries the voicemail as a .wav attachment, a text attachment with the transcript as its body (generally empty if it wasn't able to transcribe the message), and an html attachment, which is an html version of that text attachment.
And, "of course", Perl also has the lovely WWW::Mechanize. So ... I got to programming. mitmproxy was also handy to figure out some bits going on within the SSL/TLS communications between client (e.g. web browser) and AT&T server(s).
And got the key bits of that finished up this past Sunday. And got 'er all nicely downloaded.

    $ um.att.com
    um.att.com: Inbox is empty. Exiting
    $

That's what it outputs at the end, when there's nothing left to download. It also handles deleting the "email" item (message and related) from the AT&T "INBOX" once it's successfully downloaded.

    $ cd ~/.um.att.com.d/data
    $ ls -A1 | sed -e 's/^.*\././' | sort | uniq -c | sort -k 1,1bnr -k 2,2
        117 .eml
        117 .wav
        113 .txt
        112 .html
    $

Very nicely handles it all.
.eml is the full raw "original" email as AT&T has it in the "INBOX", .wav files are the raw audio portion thereof, .txt the text transcript (or no file if that part was empty), and .html the html equivalent of that text.
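For anyone wanting to try the same split on their own saved messages, here's a rough sketch of that .eml/.wav/.txt/.html breakdown. It's in Python with just the standard library, not the Perl the actual script is written in, and it's only an illustration: the function names and the content types it matches (audio/*, text/plain, text/html) are my assumptions, not something confirmed from AT&T's messages.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_voicemail(raw: bytes) -> dict:
    """Map file extension -> bytes for one raw message (hypothetical helper)."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = {".eml": raw}  # always keep the full raw original
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        ctype = part.get_content_type()
        payload = part.get_payload(decode=True) or b""
        if ctype.startswith("audio/"):      # assumed type of the voicemail audio
            parts[".wav"] = payload
        elif ctype == "text/plain" and payload.strip():
            parts[".txt"] = payload         # skipped when the transcript is empty
        elif ctype == "text/html":
            parts[".html"] = payload
    return parts


def demo_message() -> bytes:
    """Build a made-up message: short transcript, audio, no html part."""
    msg = EmailMessage()
    msg["Subject"] = "Voice message"
    msg["From"] = "unknown caller <unknown@example.invalid>"
    msg.set_content("Message too short for transcription\n")
    msg.add_attachment(b"RIFF....WAVE", maintype="audio", subtype="wav",
                       filename="voice.wav")
    return msg.as_bytes()


if __name__ == "__main__":
    for ext, data in split_voicemail(demo_message()).items():
        print(ext, len(data))
```

Writing each value out under a common base name plus its extension would give the same four-files-per-message layout.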
Ah, I was wondering why there's one less .html than .txt ... peeking further, the .txt has:

    Message too short for transcription

And that original .eml has no html part, and the .wav ... yeah, no words in that audio.
Alas, I didn't clean out quite all the junk before downloading everything ... and the slight mismatch makes that bit of junk pretty easy to spot ... likewise grep on the .txt files is rather handy.
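Spotting that kind of mismatch can also be done mechanically. A small Python sketch (again, not part of the actual Perl script), assuming the files that belong to one message share everything in the name except the extension; the function name is made up for illustration:

```python
from collections import defaultdict
from pathlib import Path

EXPECTED = (".eml", ".wav", ".txt", ".html")


def find_incomplete(names, expected=EXPECTED):
    """Return {base name: [missing extensions]} for incomplete sets of files."""
    found = defaultdict(set)
    for name in names:
        p = Path(name)
        found[p.stem].add(p.suffix)  # group by name without the last extension
    report = {}
    for stem, exts in found.items():
        missing = sorted(set(expected) - exts)
        if missing:
            report[stem] = missing
    return report


if __name__ == "__main__":
    import os
    data_dir = os.path.expanduser("~/.um.att.com.d/data")
    if os.path.isdir(data_dir):
        for stem, missing in sorted(find_incomplete(os.listdir(data_dir)).items()):
            print(stem, "is missing", " ".join(missing))
```

Anything it reports as missing .txt or .html is a candidate for the "no transcript" or "too short" junk.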
So, the file names start with the ISO date and time, derived from the Date: header, which is the timestamp of when the end of the message was received. Likewise that same time data is used to set the mtime on the files. File names also contain data from the Subject: and From: fields, generally identifying caller name/number or, when the caller isn't identified (CNID), unknown caller / Identity Withheld, e.g.: ... unknown caller ... Identity withheld <unknown_caller...>
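The Date:-to-filename-and-mtime bit is easy to sketch. This is Python rather than the script's Perl, and the exact timestamp layout below is an assumption (the real script's format may differ); the helper names are hypothetical:

```python
import os
from email.utils import parsedate_to_datetime


def iso_stamp(date_header: str):
    """Turn an RFC 5322 Date: value into (ISO-style string, epoch seconds)."""
    when = parsedate_to_datetime(date_header)
    # Assumed layout for the file name prefix; not taken from the real script.
    return when.strftime("%Y-%m-%dT%H:%M:%S%z"), when.timestamp()


def stamp_file(path: str, date_header: str) -> None:
    """Set atime and mtime of path to the time in the Date: header."""
    _, epoch = iso_stamp(date_header)
    os.utime(path, (epoch, epoch))


if __name__ == "__main__":
    prefix, epoch = iso_stamp("Sun, 02 Jun 2024 14:05:09 -0700")
    print(prefix, epoch)
```

With the mtime set that way, plain `ls -lt` sorts the files in the order the messages arrived, regardless of when they were downloaded.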
https://www.mpaoli.net/~michael/bin/um.att.com

Ah, one of these days I need to tweak the Apache configuration so it "knows" that, e.g., that file (with that name and location) can be served as plain text, not binary. Yeah, I know there's a "magic" type option that can read the files and make an intelligent guess on that, but that's excessive overhead for most cases, so I really just need to configure the exceptions ... down to a directory or even per-file basis. (On my to-do list ... with thousands of other items yet to be done ... at least maybe when I get around to it.)
And ... maybe others might even find it handy, or a handy starting point, though this one was done mostly as a one-off/one-shot. Until the number finishes being ported over, it's still very handy for checking whether anything has shown up there, and downloading it if so. It might need some adjustments to handle some other email messages, e.g. the ones from AT&T about the INBOX being nearly full. Looks like I probably won't have need for that (nor example data to match it to and test it on). And I didn't handle the more general email case (which I think UM will also accept and have in "INBOX"), as I only ever used UM for voicemail.