Can we de-duplicate in Purview eDiscovery Premium now?
There was a confusing M365 roadmap item that I still don't understand, and I'm salty about it.
Before we discuss what Microsoft seems to be doing with MD5 and SHA hashes in a review set, I want to share how we arrived at this topic this week.
With Teams meeting transcripts scheduled to start landing in OneDrive for Business in June 2024, I had planned to share some more details this week. I mentioned it before, when the rollout was pushed from April to June. Now it's July 1, and the change appears to be rolling out, just not to any tenant I have access to.
That’s not that big a deal. Rollouts take time across the M365 universe, and being a few days off isn’t surprising.
I have found it a bit curious how many Microsoft Partners have been telling people in blog posts, webinars, etc., that Microsoft has already made that change. And no, I'm not talking about claims made during this rollout period. They've been saying it for quite some time now. Maybe they saw it with access to Preview features and assumed it would be available to everyone by the time anyone noticed. Except it isn't, and I wonder how many people heard that, went back to their own environment, and found the claim wasn't true.
Or maybe some of these partners never tested it themselves. They heard it was coming on an MS Partner call and updated their pitch decks immediately.
Either way, it’d be nice if these folks could be more careful.
In the absence of meeting transcripts to talk about, I looked at some Roadmap items and came across this little nugget from March that I hadn’t paid much attention to.
Microsoft Purview compliance portal: eDiscovery (Premium) - Review set data minimization - Identify exact duplicates by default
Harness the built-in review set query within eDiscovery (Premium) review sets, specifically designed to pinpoint exact duplicates and encompassing email threads, to systematically reduce your review set data by an average of 30%, eliminating the need to trigger analytics manually post ingestion. Gain full visibility into all hash values utilized in de-duplication and email threading algorithms, ensuring a seamless and efficient review process.
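For anyone newer to the concept, "exact duplicate" identification generally works by computing a content hash (MD5, SHA-1, SHA-256, etc.) for each item and treating items with matching hashes as identical. The sketch below illustrates that general technique in Python using SHA-256 from the standard library. It is an assumption-laden illustration, not how Purview implements it: the document IDs, the plain-bytes input, and the "keep the first copy" rule are all hypothetical choices for this example.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(docs):
    """Group document IDs whose content has an identical SHA-256 hash.

    docs: mapping of hypothetical document ID -> raw content bytes.
    Returns {hash: [ids]} only for hashes shared by two or more documents.
    """
    groups = defaultdict(list)
    for doc_id, content in docs.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(doc_id)
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


def unique_documents(docs):
    """Simple de-dup pass: keep the first document seen for each hash."""
    seen = set()
    keep = []
    for doc_id, content in docs.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(doc_id)
    return keep


if __name__ == "__main__":
    sample = {
        "doc-001": b"Quarterly report draft",
        "doc-002": b"Quarterly report draft",  # exact duplicate of doc-001
        "doc-003": b"Meeting notes",
    }
    print(group_exact_duplicates(sample))
    print(unique_documents(sample))  # ['doc-001', 'doc-003']
```

Note that a single changed byte produces a different hash, which is why this only catches exact duplicates; near-duplicates and email threading need separate analysis.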
What caught my eye in the description was the mention of a built-in review set query, because I don't recall any built-in queries existing before analytics had been triggered.
So I went looking: I started a brand-new case, collected data, and added it to a review set.
I could not find a built-in query for duplicates, and I did not see any hash values in the document metadata within the review set.
But wait. There’s more.