Why every selected document should be in the data source for Relativity Analytics clustering

Relativity Analytics clustering relies on a complete data source so every selected document participates in grouping. If any item is missing, it cannot be clustered and is labeled Not Clustered, creating gaps in results. A complete data source yields clearer clusters and more reliable, actionable insights.

Why every document matters in Relativity Analytics

Let’s imagine you’re sorting a giant inbox of emails, reports, and memos. You want to find clusters—groups of messages that feel similar because of what they say and how they’re described. In Relativity, that instinct translates into an analytics index that organizes data into meaningful patterns. Here’s the essential bit: for the clustering to work well, every document you want included has to live in the data source. If even a single item is missing, it’s like leaving a key piece out of a puzzle. The result? Some documents won’t join a cluster and end up labeled Not Clustered. And that label isn’t a badge of honor—it’s a signal that something in your data didn’t make it into the analysis.

What the Relativity Analytics Index actually does

Think of the Analytics Index as a smart librarian who looks at content and context. It examines text, metadata, and relationships between documents to group similar items. Clustering helps you spot themes, identify outliers, and understand the landscape of your data without reading every single file cover to cover. It’s powerful because it turns chaos into a map you can navigate, audit, and refine.

But there’s a catch that trips people up if they’re not paying attention: the clusters only reflect what’s present in the data source. If a document is outside that source, it doesn’t even show up on the librarian’s radar. The whole exercise hinges on having a complete, accurate set of materials in hand before you run the clustering. In other words, completeness is not a nice-to-have—it’s the backbone of reliable analytics.

Why missing documents lead to Not Clustered

Here’s the plain-language truth: clustering works by comparing documents to one another. When a document isn’t in the data source, there’s nothing to compare it against. It sits outside the analysis, and the analytics index can’t place it into any cluster. It’s not that the document is bad or irrelevant. It’s just not visible to the clustering engine. The result is Not Clustered, which can be a quiet but costly misstep because you might assume all documents were considered when, in fact, a portion wasn’t.
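To make that concrete, here is a minimal sketch in Python. It is an illustration, not Relativity’s API: the document IDs are invented, and the point is simply that anything selected for analysis but absent from the data source can receive no cluster assignment.

```python
# Hypothetical illustration: documents selected for clustering vs. the
# documents actually present in the analytics data source.
selected_docs = {"DOC-001", "DOC-002", "DOC-003", "DOC-004"}
data_source_docs = {"DOC-001", "DOC-002", "DOC-004"}  # DOC-003 was never added

# The clustering engine can only compare documents it can see.
clusterable = selected_docs & data_source_docs
not_clustered = selected_docs - data_source_docs

for doc in sorted(not_clustered):
    print(f"{doc}: Not Clustered (missing from data source)")
```

The set difference is the whole story: DOC-003 isn’t bad or irrelevant, it’s simply invisible to the comparison step, so it falls out of every cluster.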

That mismatch can skew results in small but meaningful ways. A handful of missing items might dilute a cluster’s theme, hide a potential outlier, or obscure a trend you hoped to catch. And in a project setting, that can ripple into decisions, timelines, and even risk assessments. So while the label Not Clustered sounds technical, the human effect is straightforward: you don’t want to miss the forest for a few stray trees.

A mental model you can rely on

Here’s a simple way to picture it. Picture a bookshelf where you’ve drawn up a map of topics. Each book represents a document. If you don’t place every relevant book on the shelf, the map you build afterward will only cover what’s actually on the shelf. Clusters resemble neighborhoods on that map—areas where books share similar vibes: subject matter, keywords, authorship, or dates. When a book is left off the shelf, it never gets assigned to a neighborhood, so you end up with gaps in your understanding. The integrity of the data source, then, isn’t a niche concern; it’s the difference between a coherent scene and a blurred silhouette.

Practical steps to keep the data source complete

  • Map the scope before you start. Decide which documents should be analyzed and why they matter. Make a clear list of sources, custodians, and file types. This reduces the chance you’ll accidentally omit something important later.

  • Include related and supporting items. Sometimes related materials live outside the primary data source but remain relevant for clustering. If policy or business rules say “all related docs matter,” bring those into the analytics index so they can participate fully.

  • Verify the data source before running analytics. Do a quick inventory: count the documents, check for common gaps, and run a pre-check to confirm that all intended items are present. A little diligence up front pays off in cleaner results.

  • Check for metadata completeness. Clustering often leans on metadata alongside content, and missing fields can affect similarity signals. If a document is present but its metadata is incomplete, consider enriching it before indexing.

  • Reconcile duplicates thoughtfully. Duplicate content can muddy clusters, but so can omitting the only copy of a relevant item. A balanced de-duplication strategy helps ensure each relevant item contributes without overwhelming the results with near-duplicates.

  • Re-run when the data changes. If new documents get added, or if you adjust the scope, give the analytics index a fresh run. The clusters should reflect the current, complete set rather than a stale snapshot.

  • Document your assumptions. It’s easy to forget why a particular item wasn’t included. Keep a short log of decisions about scope and exclusions so you can audit or adjust later.
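The inventory and metadata steps above can be sketched as a small pre-check script. This is a generic illustration under stated assumptions, not Relativity functionality: the document records, the field names, and the required-metadata list are all made up for the example.

```python
# Hypothetical pre-check before building an analytics index.
# Each dict stands in for a document; field names are illustrative.
REQUIRED_FIELDS = ("custodian", "date", "doc_type")

documents = [
    {"id": "DOC-001", "custodian": "Lee", "date": "2023-01-05", "doc_type": "email"},
    {"id": "DOC-002", "custodian": "Lee", "date": "2023-01-06", "doc_type": "memo"},
    {"id": "DOC-003", "custodian": None,  "date": "2023-01-07", "doc_type": "email"},
]

def precheck(docs, expected_count):
    """Return (count_ok, ids_with_incomplete_metadata)."""
    # Inventory: does the document count match the agreed scope?
    count_ok = len(docs) == expected_count
    # Metadata: flag any document missing a required field value.
    incomplete = [
        d["id"] for d in docs
        if any(not d.get(field) for field in REQUIRED_FIELDS)
    ]
    return count_ok, incomplete

ok, gaps = precheck(documents, expected_count=3)
print(f"count matches scope: {ok}; incomplete metadata: {gaps}")
```

A check like this takes minutes to run and catches exactly the gaps that would otherwise surface later as Not Clustered documents or weak similarity signals.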

A relatable analogy that lands

Think of clustering like organizing a citywide scavenger hunt. Each clue is a document, and clusters are neighborhoods where clues share a vibe—say, a theme around “legal issues by date” or “project management terms.” If a chunk of clues never makes it into the city box you’ve set up, you’ll end up with a map that feels incomplete. The players might still have fun, but the routes and connections won’t be as accurate or insightful. The same logic applies to Relativity’s analytics: completeness unlocks the true power of clustering, and a missing document is like a stray clue that points you nowhere.

Real-world considerations (without the drama)

  • Data source hygiene matters. Bad or inconsistent data will undermine even the best clustering algorithms. Clean, standardized data helps the system recognize patterns rather than get hung up on quirks.

  • Metadata quality is as crucial as the content. Dates, authors, tags, and custodians aren’t decorative; they guide who, what, where, and when. Strong metadata helps clusters form more precise connections.

  • The human layer still counts. You don’t replace judgment with automation. Use clustering as a map, not a final verdict. The insights should prompt human review, especially when clusters reveal surprising or sensitive themes.

A quick, practical checklist you can use now

  • Have you identified all relevant data sources and confirmed they’re in scope for analysis?

  • Are all documents that should participate actually in the data source?

  • Is the metadata for these documents complete and consistent?

  • Have you run a pre-check to spot any obvious gaps or anomalies?

  • Will you re-run the analysis after adding missed items or updating scope?

If the answer to the first question is yes and the rest align, you’re in a strong position. When all selected documents are present, the Relativity Analytics Index can do its best work, grouping like with like and revealing patterns you might not notice at a glance.

Not clustered: what it actually means in practice

When a document isn’t clustered, it doesn’t vanish into the void. It just sits outside the neighborhood map created by the index. If you’re trying to understand a topic, you might see the dominant themes clearly and miss a corner where that missing document would contribute. In some workflows, that gap is minor; in others, it could skew the interpretation or lead to missed connections between related items. Either way, the lesson is the same: ensure visibility for every relevant document so the map reflects reality.

Bringing it together: why data integrity is your best ally

In the end, the rule is simple and surprisingly practical: for effective clustering, include all selected documents in the data source. It’s not about clicking a checkbox and calling it a day. It’s about respecting the integrity of the data you’re working with. When you do, Relativity’s analytics index can reveal coherent clusters, meaningful patterns, and actionable insights. If a document is left out, you’ll know it by the Not Clustered label, which serves as a friendly reminder to go back, check your data, and bring it into the fold.

A closing thought

If you enjoy a tidy analysis, you’ll appreciate the quiet efficiency that comes from data completeness. The moment you ensure every relevant item is present, the clustering process feels smoother, the results clearer, and the path to insight easier to tread. It’s a small step with a tangible payoff—a bit of discipline that pays off with better understanding and better decisions. And if you’re navigating Relativity’s tools, you’ll find that aligning your data source with your analytical goals isn’t just good practice; it’s the backbone of finding clarity in a sea of documents.

In sum: every selected document needs to be in the data source for the analytics index, or it will surface as Not Clustered. Treat that as a guiding principle, not a box to check, and your clustering will reflect the full story your data has to tell.
