function readOnly(count){ }
Starting November 20, the site will be set to read-only. On December 4, 2023,
forum discussions will move to the Trailblazer Community.
+ Start a Discussion
RyanAtTGHRyanAtTGH 

How to create map of Attachment, EmailMessage, and Case to find duplicates

A client is over their storage limit due to large media attachments on emails which are related to cases. When exchanging emails, often the attachments are duplicated. I'd like to identify the duplicate email attachments, based on the parent Email Message assosciation with Case, size, and name. 

I can query the attachments, size, and parentId (EmailMessage) using the soql
SELECT Id, Name, ContentType, BodyLength, ParentId, Parent.Type
    FROM Attachment
    WHERE ContentType = 'image/jpeg'
    AND ParentId IN (SELECT Id FROM EmailMessage WHERE HasAttachment = TRUE)

But, I would also like the Email Message's RelatedTo and/or ParentId field(s) as well, which would give me the Case Id. If an attachment has the same CaseId, BodyLength, and Name, it's 99% a duplicate and I can delete it. 

I've tried all the SOQL I know to get it in one query, so I am thinking now I need a custom map or sObject using Apex. 

thoughts? 
AnkaiahAnkaiah (Salesforce Developers) 
Hi Ryan,

The query which you have written will be the result of case related attachement. emailMessage ParentId supports for only case.

Thanks!!
RyanAtTGHRyanAtTGH
Ankaiah, The query provided does not return information about a case, but only about the attachment to the email. I would like to get the Case ID that is in the EmailMessage ParentId or RelatedToId.
AnkaiahAnkaiah (Salesforce Developers) 
Hi Ryan,

Try with below code.
 
List<Case> cs = [Select Id from Case where origin='email'];
Map<Id,EmailMessage> caseEmails = new Map<Id,EmailMessage>([Select Id from EmailMessage where ParentId=:cs.Id and HasAttachment = true]);
Set<Id> caseRelatedIds = new Set<Id>();
if(caseEmails != null){
     caseRelatedIds.addAll(caseEmails.keySet());
}
List<Attachment> AttachedFiles = [SELECT Id, Name, Body, ContentType FROM Attachment WHERE ParentId IN :caseRelatedIds];

If this helps, Please mark it as best answer.

Thanks!!
RyanAtTGHRyanAtTGH
Yes, thank you, Ankaiah. After a couple tweaks I was able to get the code working:
List<Case> cs = [SELECT Id FROM Case WHERE origin='email' LIMIT 1000];
Map<Id,EmailMessage> caseEmails = new Map<Id,EmailMessage>([SELECT Id FROM EmailMessage WHERE ParentId IN :cs AND HasAttachment = true]);
Set<Id> caseRelatedIds = new Set<Id>();
if(caseEmails != null){
     caseRelatedIds.addAll(caseEmails.keySet());
}
List<Attachment> attachedFiles = [SELECT Id, Name, BodyLength, ContentType FROM Attachment WHERE ParentId IN :caseRelatedIds];
if(attachedFiles != null){
    for(Attachment a : attachedFiles){
        system.debug('attachmentId:' + a.Id);
        system.debug('attachmentSize:' + a.BodyLength);
    }
}

So, this returns data about the attachments. I feel I was also successful with this code:
 
List<EmailMessage> ems = [SELECT Id, ParentId,
    (SELECT Id, Name, ContentType, BodyLength 
    FROM Attachments
    ORDER BY BodyLength DESC,Name ASC)
FROM EmailMessage
WHERE HasAttachment = TRUE
LIMIT 100];


for(EmailMessage em : ems){
    //system.debug('attachments' + em.Attachments);
    for(Attachment a : em.Attachments){
        system.debug('CaseId' + em.ParentId);
        system.debug('size: ' + a.BodyLength);
    }

}

My challenge now, is to have the Case Id, Attachment Name, and Attachment BodyLength in the same row/record, ordered by Case ID and Size, so I can loop over them, and compare if the current one is 'exactly' the same as the one before it, identifying a likely duplicate...