Embedding Similarity Calculator

About Embedding Similarity

Compare two embedding vectors using multiple distance and similarity metrics. Cosine similarity measures angular similarity (direction), while Euclidean and Manhattan measure absolute distance. Dot product captures both magnitude and direction.
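The four metrics can be sketched in a single pass over both vectors. This is a minimal illustration, not the tool's actual source: `compareEmbeddings` and the `Metrics` type are hypothetical names, and returning `NaN` for a zero vector is an assumption about how the undefined case might be handled.

```typescript
// Hypothetical helper; the tool's own implementation is not shown on this page.
type Metrics = {
  cosine: number;
  dot: number;
  euclidean: number;
  manhattan: number;
};

function compareEmbeddings(a: Float64Array, b: Float64Array): Metrics {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  let sqDiff = 0;
  let absDiff = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    dot += a[i] * b[i];      // dot product: magnitude and direction
    normA += a[i] * a[i];
    normB += b[i] * b[i];
    sqDiff += d * d;         // squared Euclidean distance
    absDiff += Math.abs(d);  // Manhattan distance
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return {
    // Cosine is undefined for a zero vector; this sketch returns NaN in that case.
    cosine: denom === 0 ? NaN : dot / denom,
    dot,
    euclidean: Math.sqrt(sqDiff),
    manhattan: absDiff,
  };
}

// Two vectors with the same direction but different lengths.
const m = compareEmbeddings(
  Float64Array.of(1, 2, 3),
  Float64Array.of(2, 4, 6),
);
console.log(m);
```

In this example the cosine similarity is approximately 1 because the vectors point the same way, while the dot product and both distances still reflect the difference in length.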

All computation uses Float64Array for numerical precision. No data leaves your browser.
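To see why the storage type matters, the short sketch below stores the same value in a 32-bit and a 64-bit typed array. It only illustrates the precision difference and makes no claim about the tool's internals.

```typescript
// 0.1 has no exact binary representation; Float32Array rounds it more coarsely.
const asFloat32 = new Float32Array([0.1])[0];
const asFloat64 = new Float64Array([0.1])[0];

console.log(asFloat32 === 0.1); // false: the value was rounded to 32-bit precision
console.log(asFloat64 === 0.1); // true: JavaScript numbers are already 64-bit floats

// The per-element rounding error is small, but it is summed over every dimension.
const roundingError = Math.abs(asFloat32 - 0.1);
console.log(roundingError);
```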

What This Tool Does

Embedding Similarity Calculator is built for deterministic developer and agent workflows.

Calculate cosine similarity, dot product, Euclidean distance, and Manhattan distance between embedding vectors from OpenAI, Cohere, and other providers.

See "How to Use" for execution steps and "Frequently Asked Questions" for constraints, policies, and edge cases.

This tool is provided as-is for convenience. Output should be verified before use in any production or critical context.

Agent Invocation

Best Path For Builders

Browser Workflow

This tool is optimized for instant in-browser execution with local data handling. Run it here and copy/export the output directly.

/embedding-similarity-calculator/

For automation planning, fetch the canonical contract at /api/tool/embedding-similarity-calculator.json.

How to Use Embedding Similarity Calculator

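  1. Calculate cosine similarity between two embeddings

    Paste two embedding vectors as comma- or space-separated floats. The tool computes cosine similarity, which ranges from -1 to 1: 1 means the vectors point in the same direction, 0 means they are orthogonal (unrelated), and negative values mean they point in opposing directions. Use it to check whether two pieces of text or code are semantically similar.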

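  2. Verify embedding quality in RAG pipelines

    Embed a query and a retrieved document, then calculate their cosine similarity. As a rough rule of thumb, a score below 0.7 suggests the retrieval ranking may be wrong, while a score above 0.85 suggests a good match to pass to the LLM. Typical score ranges vary by embedding model, so calibrate these thresholds on your own data.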

  3. Debug semantic search ranking issues

    Calculate similarity between user query embedding and multiple candidate document embeddings. Compare scores to understand why a 'wrong' result ranked high. Helps tune embedding model choice.

  4. Find near-duplicate content in a corpus

    Embed multiple documents and calculate the similarity of each pair. Documents scoring above 0.95 are likely duplicates, though the right cutoff depends on the embedding model. Useful for deduplication before indexing or for clustering similar content.

  5. Validate embedding model performance

    Embed semantically similar sentence pairs (synonyms, paraphrases) and dissimilar pairs. As a rough guide, similar pairs should score above 0.8 and dissimilar pairs below 0.3. If the two groups do not separate clearly, consider fine-tuning or replacing the embedding model.
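The workflows above can also be scripted when there are more vectors than is practical to paste by hand. The sketch below is illustrative only: `parseVector`, `cosineSimilarity`, `rankCandidates`, and `findNearDuplicates` are hypothetical helper names, the sample vectors are made up, and the 0.95 cutoff is the rule of thumb from the near-duplicate step rather than a universal constant.

```typescript
// Hypothetical helpers illustrating the steps above; not the tool's actual code.

// Parse comma- or space-separated floats into a Float64Array.
function parseVector(text: string): Float64Array {
  const tokens = text.trim().split(/[\s,]+/).filter((t) => t.length > 0);
  const values = tokens.map(Number);
  if (values.length === 0 || values.some((v) => !Number.isFinite(v))) {
    throw new Error("Input must contain only finite numbers");
  }
  return Float64Array.from(values);
}

function cosineSimilarity(a: Float64Array, b: Float64Array): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate against the query, highest similarity first.
function rankCandidates(query: Float64Array, candidates: Float64Array[]) {
  return candidates
    .map((vector, index) => ({ index, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score);
}

// List pairs whose similarity exceeds a threshold (0.95 is an assumed cutoff).
function findNearDuplicates(vectors: Float64Array[], threshold = 0.95) {
  const pairs: Array<{ i: number; j: number; score: number }> = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      const score = cosineSimilarity(vectors[i], vectors[j]);
      if (score > threshold) pairs.push({ i, j, score });
    }
  }
  return pairs;
}

// Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
const query = parseVector("0.9, 0.1, 0.0");
const docs = ["0.8 0.2 0.0", "0.0 0.1 0.9", "0.81, 0.19, 0.0"].map(parseVector);

console.log(rankCandidates(query, docs));   // candidates ordered by similarity to the query
console.log(findNearDuplicates(docs));      // pairs of documents above the threshold
```

Comparing the scores side by side, rather than looking at one pair in isolation, is what makes a surprising ranking easier to explain. Note that the pairwise loop grows quadratically with corpus size, so it suits spot checks more than large-scale deduplication.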

Frequently Asked Questions

What is cosine similarity?
Cosine similarity is the cosine of the angle between two vectors. It ranges from -1 (opposite directions) to 1 (same direction), with 0 meaning the vectors are orthogonal. It's the most common metric for comparing text embeddings in RAG and search applications.
What embedding dimensions are supported?
Any dimension from 1 to 10,000+. Common dimensions include 384 (MiniLM), 768 (BERT), 1024 (Cohere), 1536 (OpenAI text-embedding-3-small), and 3072 (text-embedding-3-large).
When should I use cosine similarity vs euclidean distance?
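Cosine similarity compares direction only and ignores vector length, which makes it the usual choice for text search. Euclidean distance is sensitive to both direction and magnitude, so it is better suited to detecting outliers or to cases where vector length carries meaning. For unit-normalized embeddings the two metrics produce the same ranking.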
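For unit-length vectors the two metrics are tied together by the identity distance² = 2 - 2 × cosine, so a higher cosine always means a smaller Euclidean distance. The sketch below checks this numerically; `normalize`, `dotProduct`, and `euclideanDistance` are hypothetical helper names, and the sample vectors are made up.

```typescript
// Scale a vector to unit length.
function normalize(v: Float64Array): Float64Array {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

function dotProduct(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function euclideanDistance(a: Float64Array, b: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

const a = normalize(Float64Array.of(3, 4));
const b = normalize(Float64Array.of(4, 3));

// For unit vectors, cosine similarity equals the dot product.
const cosine = dotProduct(a, b);
const dist = euclideanDistance(a, b);

// Both values are approximately 0.08, up to floating-point rounding.
console.log(dist * dist, 2 - 2 * cosine);
```

When vectors are not normalized this identity no longer holds, and the choice of metric can change the ranking.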
Is this tool free and private?
Yes. Free to use. All calculations run in your browser using JavaScript typed arrays. Your embedding vectors are not sent to external services.
Can I compare multiple embeddings at once?
Not in a single run. The tool compares one pair of vectors at a time and reports four metrics for that pair: cosine similarity, dot product, Euclidean distance, and Manhattan distance.