Sadly, it still doesn't work. I've made some progress, though.
Let me retrace the steps I took to debug this.
I created a Filter step to rename the field I'm working on, and it recognizes the field and renames it, so that's not it.
To clarify, I put the scan format in there:
Regular Expression: ^\w{3}\,\s\d{1,2}\s\w{3}\s\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}\s[\+\-]\d{4}(?:\s\(\w+\))?
Scan Pattern: EEE, dd MMM yyyy HH:mm:ssZ
I also tried a simplified regex:
^\w{3}\,\s\d{1,2}\s\w{3}\s\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}
and scan pattern:
EEE, dd MMM yyyy HH:mm:ss
HCI still fails to recognize it.
I then tried the first format with several different scan patterns:
Scan Pattern: EEE, dd MMM yyyy HH:mm:ss Z
Scan Pattern: EEE, dd MMM yyyy HH:mm:ss(Z)
Scan Pattern: EEE, dd MMM yyyy HH:mm:ss (Z)
Still didn't work. I realized that my days are written as 1-31 (not zero-padded 01-31), so I retried with this scan pattern:
EEE, d MMM yyyy HH:mm:ssZ
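As a sanity check of the d vs. dd reasoning outside HCI, here is a minimal sketch in plain Java. It assumes java.time's DateTimeFormatter as a stand-in parser; I don't know which parser HCI actually uses, so this only shows that the distinction can matter in a strict parser, not how HCI behaves. The class and the helper name parses are made up for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class DayWidthCheck {
    // Hypothetical helper: true if the value parses with the given pattern.
    // Assumption: java.time stands in for whatever parser HCI uses.
    static boolean parses(String pattern, String value) {
        try {
            OffsetDateTime.parse(value, DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "Fri, 2 Feb 2018 07:57:34 +0100";
        // "dd" demands exactly two digits in java.time, so the single-digit day is rejected.
        System.out.println(parses("EEE, dd MMM yyyy HH:mm:ss Z", sample)); // false
        // "d" accepts one or two digits.
        System.out.println(parses("EEE, d MMM yyyy HH:mm:ss Z", sample));  // true
    }
}
```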
In HCI, that still didn't work. I realized that the step probably stops at the first regular expression that matches, so I removed all duplicate regular expressions and retried with just one item. Still nothing.
Just to see if this step is actually reading the field, I put the date into two other custom metadata fields and formatted them differently: one as yyyy-MM-dd'T'HH:mm:ssZ, and another with the weekday and month names translated into my local language (just in case HCI is configured with locale pl_PL). The first one was recognized and transformed (so the step does read the custom metadata field); the second one was still ignored (so, luckily, it's not about the locale).
Finally, just to rule out the possibility that the step is not doing anything for whatever reason, I added a step to transform the format above (yyyy-MM-dd'T'HH:mm:ssZ), and it worked.
The last thing I did was remove the literal time zone name, changing this:
Fri, 2 Feb 2018 07:57:34 +0100 (CET)
to this:
Fri, 2 Feb 2018 07:57:34 +0100
That one finally worked. However, I currently have 1.5 million objects to upload with associated metadata; some of them have the time zone name and some don't. Is there any way to make sure that both formats are recognized?
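One fallback would be to normalize the values before upload, so that only the working format ever reaches HCI. The following is a minimal sketch in plain Java, not an HCI feature: the class NormalizeDate and its parse helper are hypothetical names, and java.time is my choice of parser. It strips an optional trailing " (CET)"-style suffix and then parses what is left.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class NormalizeDate {
    // java.time pattern letters; "Z" here matches numeric offsets such as +0100.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("EEE, d MMM yyyy HH:mm:ss Z", Locale.ENGLISH);

    // Hypothetical helper: drop an optional trailing "(NAME)" suffix, then parse.
    static OffsetDateTime parse(String raw) {
        String cleaned = raw.trim().replaceFirst("\\s*\\(\\w+\\)$", "");
        return OffsetDateTime.parse(cleaned, INPUT);
    }

    public static void main(String[] args) {
        // Both variants yield the same instant and offset.
        System.out.println(parse("Fri, 2 Feb 2018 07:57:34 +0100 (CET)")); // 2018-02-02T07:57:34+01:00
        System.out.println(parse("Fri, 2 Feb 2018 07:57:34 +0100"));       // 2018-02-02T07:57:34+01:00
    }
}
```

The numeric offset already carries the information needed to place the timestamp, so dropping the parenthesized name loses nothing for conversion purposes. I would still prefer a solution inside HCI, though.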
What is functionally different between "+0100" and "+0100 (CET)" such that a catch-all scan pattern recognizes the former but not the latter? Should I make two separate scan formats, one with:
^\w{3},\s\d{1,2}\s\w{3}\s\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}\s[\+\-]\d{4}$
EEE, d MMM yyyy HH:mm:ss X
And this one:
^\w{3},\s\d{1,2}\s\w{3}\s\d{4}\s\d{1,2}:\d{1,2}:\d{1,2}\s[\+\-]\d{4}(\s\(\w+\))$
EEE, d MMM yyyy HH:mm:ss X (X)
Would that actually work (with two "X" references)?
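To at least confirm that the two regular expressions split the data cleanly, here is a small check in plain Java. It assumes HCI's regex dialect behaves like java.util.regex, which I haven't verified; the class and constant names are made up for illustration. It says nothing about whether the scan patterns themselves are accepted.

```java
import java.util.regex.Pattern;

class ScanRegexCheck {
    // First regex from above: numeric offset only, anchored at the end.
    static final Pattern OFFSET_ONLY = Pattern.compile(
            "^\\w{3},\\s\\d{1,2}\\s\\w{3}\\s\\d{4}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}\\s[\\+\\-]\\d{4}$");
    // Second regex from above: numeric offset followed by a mandatory "(NAME)".
    static final Pattern OFFSET_AND_NAME = Pattern.compile(
            "^\\w{3},\\s\\d{1,2}\\s\\w{3}\\s\\d{4}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}\\s[\\+\\-]\\d{4}(\\s\\(\\w+\\))$");

    public static void main(String[] args) {
        String plain = "Fri, 2 Feb 2018 07:57:34 +0100";
        String named = "Fri, 2 Feb 2018 07:57:34 +0100 (CET)";
        // Each sample value should match exactly one of the two expressions.
        System.out.println(OFFSET_ONLY.matcher(plain).find());     // true
        System.out.println(OFFSET_ONLY.matcher(named).find());     // false
        System.out.println(OFFSET_AND_NAME.matcher(plain).find()); // false
        System.out.println(OFFSET_AND_NAME.matcher(named).find()); // true
    }
}
```

For what it's worth, if the scan patterns follow Java's SimpleDateFormat letters (an assumption on my part), a single X denotes an hours-only offset such as +01, XX corresponds to +0100, and z is the letter for zone names such as CET. So the second pattern may need something other than "(X)" for the name.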