KCrock88

REGEX formula help

We have REGEX formulas that scan incoming support tickets for “sensitive data”, i.e. credit card numbers, SSNs, and CVV numbers. A workflow looks at the subject/description of each incoming support ticket and checks a box if any sensitive information is identified, so our security team can review it. However, the regex formulas we currently have in place are way too sensitive: they flag almost every incoming support ticket as sensitive.

What we would like to do is have the regex formula scan the field and then automatically “mask” the sensitive data, e.g.
Ticket comes in, with the CC# - 4444 4444 4444 4444
New value – 4444 44** **** 4444

This is crucial to our PCI compliance, as we can’t have this data floating around in Salesforce and in emails. Every time we get one of these, we have to go around to everyone who received the email notification and watch them permanently delete it from their computers.

Can you look at the formulas below, and:
1.)    Fix them so they are not so hypersensitive?
2.)    Let us know how we can auto-mask them as they come in.

CVV (ignore dashes and spaces)
REGEX(Description_Copy__c, ".*\\D(?:\\d[ -]*?){3}\\D.*"),

SSN# (ignore dashes and spaces)
REGEX(Description_Copy__c, ".*\\D(?:\\d[ -]*?){9}\\D.*"),

Credit Card# (ignore dashes and spaces)
REGEX(Description_Copy__c, ".*(?:\\d[ -]*?){13,19}.*"),

Any help would be appreciated!!!!
Best Answer chosen by KCrock88
pcon
I think you're just being too greedy with your regex.  For example, when I use your Credit Card regex on regexpal [1]:

Regex
.*(?:\d[ -]?){13,19}.*
Sample data
the start 4444 1234 5643 1234 the rest
the start 1234541233241423 the rest
the start 1234-5634-2342-5341 the rest
this is just a normal 1234 line
I get the entire line.  That is because of the .* at the start and end of your regex.  If you change your regex to
(?:\d[ -]?){13,19}
only the suspected credit card numbers on each line are highlighted.

I have a feeling that if you do the rest of your regexes without the .* you will find that it matches better.

[1] http://regexpal.com/
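
The difference described above can be reproduced outside regexpal. The following is a minimal sketch in plain Java, not Salesforce formula syntax; the class and method names (`CardScanDemo`, `firstMatch`, `findCandidates`) are made up for illustration. It runs both patterns from the answer against the same sample lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any Salesforce API.
final class CardScanDemo {
    // Original pattern from the answer, with the leading and trailing ".*".
    static final Pattern WHOLE_LINE = Pattern.compile(".*(?:\\d[ -]?){13,19}.*");
    // Trimmed pattern: 13 to 19 digits, each optionally followed by one space or dash.
    static final Pattern CARD_ONLY = Pattern.compile("(?:\\d[ -]?){13,19}");

    // Returns the first matched text, or null when nothing matches.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Returns every substring that looks like a card number, without trailing separators.
    static List<String> findCandidates(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = CARD_ONLY.matcher(text);
        while (m.find()) {
            hits.add(m.group().trim());
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] lines = {
            "the start 4444 1234 5643 1234 the rest",
            "the start 1234541233241423 the rest",
            "the start 1234-5634-2342-5341 the rest",
            "this is just a normal 1234 line"
        };
        for (String line : lines) {
            System.out.println("whole-line pattern: " + firstMatch(WHOLE_LINE, line));
            System.out.println("card-only pattern:  " + findCandidates(line));
        }
    }
}
```

For the first three sample lines the whole-line pattern returns the full line, while the card-only pattern returns just the digit group; the fourth line matches neither. The sketch uses `find()`, which, like regexpal, reports partial matches. An API that requires the pattern to cover the entire input (Java's `matches()`, for instance) would still need the surrounding `.*`, so it is worth checking which behavior the target function has before removing it.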

KCrock88
Thanks @pcon! I will update the scanners. Any ideas on how to auto-mask the sensitive data as it comes in?
pcon
You could just do it in a before trigger.  This would then be applied before the data was ever written to the SFDC "database."
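
As a rough illustration of the masking step such a trigger could apply, here is a sketch in plain Java rather than Apex. Apex's Pattern and Matcher classes are modeled on Java's, but this code has not been run on the platform and would need adapting. The names `CardMasker`, `maskOne`, and `maskAll` are hypothetical, and keeping the first six and last four digits simply follows the example in the question (4444 44** **** 4444); it is an assumption, not a statement of what PCI requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a before trigger could apply similar logic to the description field.
final class CardMasker {
    // Same candidate pattern as in the best answer: 13 to 19 digits with optional separators.
    static final Pattern CARD = Pattern.compile("(?:\\d[ -]?){13,19}");
    static final int KEEP_FIRST = 6; // assumption taken from the question's example
    static final int KEEP_LAST = 4;  // assumption taken from the question's example

    // Masks the middle digits of one matched candidate, leaving spaces and dashes in place.
    static String maskOne(String candidate) {
        int totalDigits = 0;
        for (char c : candidate.toCharArray()) {
            if (Character.isDigit(c)) {
                totalDigits++;
            }
        }
        StringBuilder out = new StringBuilder();
        int seen = 0;
        for (char c : candidate.toCharArray()) {
            if (Character.isDigit(c)) {
                boolean keep = seen < KEEP_FIRST || seen >= totalDigits - KEEP_LAST;
                out.append(keep ? c : '*');
                seen++;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Replaces every candidate in the text with its masked form.
    static String maskAll(String text) {
        Matcher m = CARD.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(maskOne(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskAll("Ticket comes in, with the CC# - 4444 4444 4444 4444"));
    }
}
```

Because the pattern accepts any run of 13 to 19 digits, this would also mask long order or tracking numbers; that trade-off is inherited from the regex, not introduced by the masking.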