I am looking to implement a data obfuscation step. Most of the transformations and jobs I build revolve around dodgy CSV/Excel report extracts. I would like to be able to share "non-confidential" versions of these files internally and externally (e.g. on this forum), so I want to maintain the data structure but strip out identifying content, for example:
- Scale each parseable number by a random factor between 0 and 10
- Find and replace identifying text strings, using either:
  - Option 1: replace occurrences of specific text strings, e.g. company name "XYZ" with a substitute value "ABC", or
  - Option 2: build a dictionary of all strings in the file and replace each with an obfuscated alternative of the same length, e.g.
    - 1-character strings are all changed to "j"
    - 6-character strings are all changed to "street"
    - 8-character strings are all changed to "elephant"
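
To make the idea concrete, here is a rough Python sketch of the kind of step I have in mind. It is only an illustration under several assumptions of mine: the data is already CSV text, the first row is a header that is safe to keep, one random factor is applied to the whole file (so ratios between numbers survive), and strings with no listed placeholder fall back to a run of "x". All function and variable names (`obfuscate_csv`, `FIXED_SUBSTITUTIONS`, `PLACEHOLDERS_BY_LENGTH`, and so on) are made up for this example. Note that "street" is keyed under length 6 here, since that is its actual length.

```python
import csv
import io
import random

# Option 1: hypothetical table of specific identifying strings and substitutes.
FIXED_SUBSTITUTIONS = {"XYZ": "ABC"}

# Option 2: hypothetical placeholders keyed by string length.
PLACEHOLDERS_BY_LENGTH = {1: "j", 6: "street", 8: "elephant"}


def is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def scale_number(text, factor):
    """Multiply a numeric cell by the factor, rounded to 2 decimals."""
    return str(round(float(text) * factor, 2))


def substitute_known(text, substitutions=FIXED_SUBSTITUTIONS):
    """Option 1: replace specific identifying substrings."""
    for old, new in substitutions.items():
        text = text.replace(old, new)
    return text


def placeholder_for(text, placeholders=PLACEHOLDERS_BY_LENGTH):
    """Option 2: swap a string for a placeholder of the same length."""
    # Assumption: lengths without a listed placeholder become a run of "x".
    return placeholders.get(len(text), "x" * len(text))


def obfuscate_csv(csv_text, factor=None, use_dictionary=False):
    """Return an obfuscated copy of csv_text with the same shape."""
    if factor is None:
        # Assumption: one random factor per file, not per cell.
        factor = random.uniform(0, 10)
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_index, row in enumerate(reader):
        if row_index == 0:
            # Assumption: the header row is not confidential.
            writer.writerow(row)
            continue
        new_row = []
        for cell in row:
            if is_number(cell):
                new_row.append(scale_number(cell, factor))
            elif use_dictionary:
                new_row.append(placeholder_for(cell))
            else:
                new_row.append(substitute_known(cell))
        writer.writerow(new_row)
    return out.getvalue()
```

Calling `obfuscate_csv(text)` would use Option 1 with a random factor, and `obfuscate_csv(text, use_dictionary=True)` would use Option 2. A factor very close to 0 would flatten all the numbers, so a narrower range may be preferable in practice.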
Any thoughts on how I might achieve this are appreciated.