Chapter 5

Vectors & Deep Learning

From points to vectors: the foundation of modern AI

From Classification to Vectors: A Deeper Understanding

In the previous chapter, we learned how machines draw hyperplanes (decision boundaries) to separate customers into RENEW vs CHURN. We represented each customer with two numbers: months subscribed and usage hours. Customer A was (10, 35), Customer B was (12, 38), and so on.

There's a more general way to think about this data. Those customer coordinates aren't just "points" - they're vectors, and we have mathematical tools to compare them. Let's explore how.

The Problem: How Do We Measure Similarity?

In Chapter 4, we successfully classified customers as RENEW or CHURN using a decision boundary. But we never answered a fundamental question:

The Question We Couldn't Answer

  • "Which customers are most SIMILAR to each other?"
  • "If Customer A renewed, which other customers are likely to renew?"
  • "How do we measure if two behavior patterns are the same?"

To answer these questions, we need mathematical tools to compare multi-dimensional data. That's what this chapter covers.

What We'll Cover:

Two fundamental tools used in search engines, recommendation systems, and language models: dot product and cosine similarity. We'll understand when to use each and how AI systems use them to find patterns.

Scalars: The Simple Numbers

First, let's understand what a scalar is. It's just a single number - a magnitude without direction.

Scalar Examples

  • Temperature: 72°F - just a number, how hot or cold
  • Price: $450,000 - one number representing cost
  • Months: 10 - a single value from our customer data
  • Weight: 1.5 - a single learned parameter from our model

But What About Multiple Values Together?

What if we want to represent something that has multiple dimensions? Like Customer A who has BOTH 10 months subscribed AND 35 hours usage? That's where vectors come in!

Vectors: Everything Becomes Numbers

A vector is an ordered list of numbers that represents something in multi-dimensional space. In machine learning, everything—customers, products, text, images—gets represented as vectors.

Customer A as a Vector

Customer A = [10 months, 35 hrs/week]

This is the RENEW customer from our scatter plot!

Customer E as a Vector

Customer E = [2 months, 8 hrs/week]

This is the CHURN customer from our scatter plot!

Key Insight

Every customer in our classification problem is a vector! When we see scatter plots showing data points in 2D space, those are visualizations of vectors. Machine learning is really about finding patterns in high-dimensional vector spaces.

🎮 Interactive Vector Playground

Click anywhere in the canvas to create vectors and see them come to life!

[Interactive canvas: each click creates a vector and displays its position, magnitude, and angle, along with a running count of vectors created.]

Try this: Create several vectors and notice how they're all arrows pointing from the origin (center). Longer arrows = larger magnitude, different directions = different angles!

Vector Properties: Direction and Magnitude

Every vector has two fundamental properties. Think of them like an arrow:

1. Magnitude (Length)

What it means: How long is the arrow? This tells us the "strength" or "intensity" of the customer's behavior.


Magnitude is the length of the vector arrow from origin to the endpoint

How we calculate it: Using the Pythagorean theorem (like finding the hypotenuse of a triangle):

||v|| = √(v₁² + v₂² + v₃² + ...)

Example: Customer A = [10, 35]

||A|| = √(10² + 35²) = √(100 + 1,225) = √1,325 ≈ 36.40

Example: Customer E = [2, 8]

||E|| = √(2² + 8²) = √(4 + 64) = √68 ≈ 8.25

Interpretation: Customer A has a magnitude of 36.40 while Customer E has only 8.25. This means Customer A shows much stronger engagement overall.
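Here is the same calculation as a minimal Python sketch (the vectors and the rounded results are the ones from the examples above):

    import math

    def magnitude(v):
        """Length of a vector: the square root of the sum of squared components."""
        return math.sqrt(sum(x * x for x in v))

    customer_a = [10, 35]   # [months subscribed, usage hours/week]
    customer_e = [2, 8]

    print(round(magnitude(customer_a), 2))   # 36.4
    print(round(magnitude(customer_e), 2))   # 8.25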

2. Direction

What it means: Which way is the arrow pointing? This tells us the "type" or "pattern" of behavior.


Direction shows where vectors point — A & B are similar, E is different

Why it matters: Two customers might have similar strength (magnitude) but completely different patterns (direction).

Example: Customer A [10, 35] and Customer B [12, 38] point in nearly the same direction — both have high usage hours relative to months subscribed (both RENEW).

But Customer A [10, 35] and Customer E [2, 8] point in different directions, indicating different behavioral patterns!

When Size and Direction Tell Different Stories

A Puzzling Situation

Let's look at a situation that reveals something important about comparing vectors.

Consider These Customers

Customer A

[10, 35]

10 logins, 35 support calls

Outcome: RENEW

Customer B

[12, 38]

12 logins, 38 support calls

Outcome: RENEW

Customer E

[2, 8]

2 logins, 8 support calls

Outcome: CHURN

Customer F

[1, 5]

1 login, 5 support calls

Outcome: CHURN

The Puzzle

Look at these two customers:

Customer A: [10 logins, 35 support calls]
Customer B: [12 logins, 38 support calls]

You'd probably say: "These are very similar customers." Both are power users with lots of activity.

Now look at these:

Customer E: [2 logins, 8 support calls]
Customer F: [1 login, 5 support calls]

These look different, right? Customer E has twice as many logins and noticeably more support calls than Customer F.

But wait—let's look at their behavior, not just the numbers.

Customer E's behavior:

  • 2 logins → 8 support calls
  • That's a ratio of 1:4 (for every login, 4 support calls)
  • This customer needs a lot of help when they use the product

Customer F's behavior:

  • 1 login → 5 support calls
  • That's a ratio of 1:5 (for every login, 5 support calls)
  • This customer also needs a lot of help when they use the product

The insight: E and F have the same type of relationship with the product. They both struggle and need lots of support relative to how often they log in.

The only difference? Customer E is more active overall. But their behavior pattern is nearly identical.

This is the key insight: Sometimes we want to find similar patterns, regardless of scale. That's when cosine similarity shines—it ignores the magnitude and focuses purely on the ratio between dimensions.
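As a quick arithmetic check of that insight, here is a tiny Python sketch (using the numbers above) showing that the ratio of support calls to logins is what E and F share, and that scaling a customer's activity up or down leaves that ratio untouched:

    customer_e = [2, 8]   # [logins, support calls]
    customer_f = [1, 5]

    # Support calls per login: the behavior "pattern", independent of activity level
    print(customer_e[1] / customer_e[0])   # 4.0
    print(customer_f[1] / customer_f[0])   # 5.0

    # Doubling F's overall activity changes the vector's size, not the pattern
    f_scaled = [2 * x for x in customer_f]
    print(f_scaled[1] / f_scaled[0])       # still 5.0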

So Which Tool Should We Use?

It depends on what we're trying to find. Let's look at two different business questions:

Scenario 1: Find Best Customers

The goal: Find customers similar to Customer A who are high-value power users to invite to an exclusive beta program.

Customer A: [10 logins, 35 support calls] ← Reference

Who should we invite?

  • Customer B: [12, 38] → Also a power user ✓
  • Customer E: [2, 8] → Barely active ✗

Use Dot Product:

A · B = (10×12) + (35×38) = 120 + 1,330 = 1,450 → Very similar!
A · E = (10×2) + (35×8) = 20 + 280 = 300 → Much less similar

Why dot product works here: We want someone who is BOTH a heavy user AND has similar values. Dot product rewards high magnitude—it finds power users.

Scenario 2: Find Customers with the Same Problem

Your goal: Customer E is struggling (lots of support calls per login). You want to find others with the same struggle pattern to send them a tutorial, regardless of how active they are.

Customer E: [2 logins, 8 support calls] ← Your reference (1:4 ratio = struggling)

Who else is struggling the same way?

  • Customer F: [1, 5] → a 1:5 ratio, nearly the same = also struggling ✓
  • Customer A: [10, 35] → a 1:3.5 ratio = a similar struggle ✓

Use Cosine Similarity:

E [2, 8] and F [1, 5]:
High similarity (≈ 0.99)!
Even though F is half as active, they have nearly the same behavior pattern
E [2, 8] and A [10, 35]:
Also high similarity (≈ 0.99)!
A is 5x more active, but same struggle pattern (lots of support needed)

Why cosine similarity works here: You don't care if someone is super active or barely active. You only care if they have the same behavioral pattern—same ratio of logins to support calls.

The Key Difference:

  • Dot Product: "Find me similar high-value customers" (magnitude matters)
  • Cosine Similarity: "Find me customers with the same behavior" (pattern matters, size doesn't)
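Here is a short sketch contrasting the two tools on the customers above. It is a minimal illustration; the helper functions dot and cosine_similarity are defined inline rather than taken from any library:

    import math

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cosine_similarity(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

    A, B, E, F = [10, 35], [12, 38], [2, 8], [1, 5]

    # Scenario 1: find power users like A (dot product rewards magnitude)
    print(dot(A, B))   # 1450 -> similar AND highly active
    print(dot(A, E))   # 300  -> E is barely active

    # Scenario 2: find the same behavior pattern regardless of scale
    print(round(cosine_similarity(E, F), 3))   # ~0.999: same pattern, half the activity
    print(round(cosine_similarity(E, A), 3))   # ~0.999: A is far more active, same pattern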

Comparing Vectors: Two Fundamental Tools

Two Tools for Two Different Questions

Now that we understand why we need two tools, let's see what each one does.

Dot Product

Measures: Direction AND magnitude together

Use when: Scale matters. You want to know if vectors align AND have strong signals.

Example: Recommendation systems where high engagement + right preferences = strong recommendation

Cosine Similarity

Measures: Direction ONLY (ignores magnitude)

Use when: You want patterns regardless of scale. Small and large vectors with the same direction are equally similar.

Example: Finding similar customer behavior patterns (E and F both churn, even though they're low-engagement)

Both operations power modern AI—from search engines to ChatGPT. Let's learn how each one works.

Visualizing Direction: The Core Concept

Each customer is an arrow (vector) starting from the origin. Similar customers have arrows pointing in the same direction.

Same direction: Customer A [10, 35] and Customer B [12, 38]. Nearly parallel arrows; both RENEW.

Different direction: Customer A [10, 35] and Customer E [2, 8]. A large angle between the arrows. A: RENEW, E: CHURN.

The key insight: Stand at the origin and look along each arrow. If two arrows point the same way, the customers are similar!

Tool #1: The Dot Product

The dot product is a fundamental way to compare two vectors. It combines information about both their direction (alignment) and their magnitude (strength). The recipe is simple: multiply matching numbers, then add them all up.

The Dot Product Recipe

a · b = (a₁ × b₁) + (a₂ × b₂) + (a₃ × b₃) + ...

Pair up the numbers, multiply each pair, add the results

Simple Example

Our Vectors

Vector a = [3, 4]

Vector b = [2, 5]

Multiply Matching Pairs

First components: 3 × 2 = 6

Second components: 4 × 5 = 20

Add Them Up

a · b = 6 + 20 = 26

What Does This Number Tell Us?

The dot product combines two things: how aligned the vectors are (direction) and how large they are (magnitude).

  • Large positive number: the vectors point in similar directions and/or have large magnitudes
  • Zero: the vectors are perpendicular (90° angle)
  • Negative: the angle between them is greater than 90°, so they point in broadly opposite directions

Key insight: The dot product value depends on BOTH the angle between vectors AND their lengths. Two vectors perfectly aligned (0° angle) will have a larger dot product if they're longer.
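The recipe is only a couple of lines of Python. Here is a minimal sketch, with one example from each of the three cases above:

    def dot(a, b):
        """Multiply matching components, then add up the products."""
        return sum(x * y for x, y in zip(a, b))

    print(dot([3, 4], [2, 5]))     # 26 -> positive: similar directions
    print(dot([1, 0], [0, 1]))     # 0  -> perpendicular vectors
    print(dot([1, 1], [-1, -1]))   # -2 -> opposite directions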

🎮 Interactive Dot Product Calculator

Enter two vectors and watch the dot product calculation unfold step by step!

[Interactive widget: enter Vector A = [3, 4] and Vector B = [2, 5] and watch each step appear: 3 × 2 = 6, then 4 × 5 = 20, then 6 + 20 = 26. A strong positive result: the vectors point in similar directions.]

Tool #2: Cosine Similarity

Measuring Direction Only

Cosine similarity measures only the direction (angle) between vectors, completely ignoring their magnitude. It answers the question: "Are these vectors pointing in the same direction?"

The key idea: normalize vectors to length 1 first, then take the dot product. This removes magnitude from the equation, leaving only directional information.

Why normalize to length 1?

When all vectors have the same length, the only difference between their dot products comes from their direction (angle). Size no longer affects the result—we're purely comparing which way they point.

Before normalization: a (length 5.0) and b (length 2.0) point in the same direction but have different lengths.

After normalization: a and b both have length 1.0. Only their direction matters now.

Key idea: After normalization, both arrows have length 1. Any difference in their dot product now comes only from their direction, not their size.

The Cosine Similarity Formula

cosine_similarity(a, b) = (a · b) / (||a|| × ||b||)

Numerator (top): a · b, the dot product, which measures alignment.

Denominator (bottom): ||a|| × ||b||, the product of the magnitudes, which removes the effect of size.

The Key Insight: Normalizing Vectors

Imagine a teacher grading how well students point in the right direction when asked "Where is north?"

Student A:

Points north with a 1.5 foot arm

Student B:

Also points north with a 2.5 foot arm

The question: Should Student B get a better grade just because they have a longer arm?
No! They both point in the same direction. Arm length shouldn't matter.

The Solution: Unit Vectors

The cosine similarity formula divides the dot product by the magnitudes (||a|| × ||b||). This is mathematically equivalent to converting both vectors to unit vectors—vectors with length exactly 1—and then comparing them.

Why this doesn't change the vector's meaning:

A vector has two properties: direction (which way it points) and magnitude (how long it is). When we normalize to a unit vector, we're scaling it down (or up) to length 1, but the direction stays exactly the same.

Think of it like this: An arrow pointing northeast is still pointing northeast whether it's 1 inch long or 10 feet long. The direction is preserved—only the scale changes.

Original vector: v = [3, 4], with ||v|| = 5, pointing northeast.
Unit vector: û = [0.6, 0.8], with ||û|| = 1, still pointing northeast!

How to normalize: divide each component by the magnitude:

û = v / ||v|| = [3/5, 4/5] = [0.6, 0.8]

✓ Same direction (northeast)
✓ Now length = 1
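Here is the normalization step as a small Python sketch, using the same v = [3, 4]:

    import math

    def normalize(v):
        """Scale a vector to length 1 without changing its direction."""
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    v = [3, 4]
    u = normalize(v)
    print(u)                                            # [0.6, 0.8]
    print(round(math.sqrt(sum(x * x for x in u)), 6))   # 1.0 -> unit length
    print(v[1] / v[0], u[1] / u[0])                     # both ~1.333: direction unchanged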

What Cosine Similarity Actually Does

By dividing by (||a|| × ||b||), the formula asks: "If both vectors had length 1, what would their dot product be?"

This removes magnitude from the equation, leaving only direction. It's like making all students' arms the same length before judging which way they point. Now we can fairly compare: Do they point the same way?

Output Range: Always -1 to +1

  • +1.0: same direction, perfect similarity
  • 0.0: perpendicular, no similarity
  • -1.0: opposite direction, inverse relationship

Note: For customer data (where all feature values are positive), cosine similarity is always between 0 and 1. Negative values only appear if features can be negative.

Why "Cosine"? The Angle Connection

The formula doesn't just happen to be called "cosine similarity" — it literally computes the cosine of the angle between the two vectors.

The Angle → Cosine → Similarity Mapping

  • angle = 0° → cos(0°) = 1.0: same direction, perfect similarity
  • angle ≈ 5° → cos(5°) ≈ 0.996: very small angle, high similarity
  • angle = 90° → cos(90°) = 0.0: perpendicular, no similarity
  • angle = 180° → cos(180°) = -1.0: opposite direction, inverse relationship

The Key Insight

Cosine Similarity = cos(angle between vectors)

  • Small angle → vectors point same way → cos close to 1 → high similarity
  • Large angle → vectors point different ways → cos close to 0 or negative → low similarity

The name isn't random — the formula literally computes the cosine of the geometric angle between arrows in vector space. This is why it perfectly measures direction while ignoring magnitude.

Calculating Cosine Similarity: Step by Step

Example Calculation

Step 1: Define the vectors
Vector a = [3, 4]
Vector b = [6, 8]
(Note: b is exactly 2× larger than a, same direction)

Step 2: Calculate the dot product
a · b = (3×6) + (4×8) = 18 + 32 = 50

Step 3: Calculate the magnitudes
||a|| = √(3² + 4²) = √(9 + 16) = √25 = 5
||b|| = √(6² + 8²) = √(36 + 64) = √100 = 10

Step 4: Multiply the magnitudes
||a|| × ||b|| = 5 × 10 = 50

Step 5: Divide to get the cosine similarity
cosine_similarity = 50 / 50 = 1.0

Vectors a and b have cosine similarity of 1.0 (perfect similarity). Even though b is twice as large, they point in exactly the same direction!
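The same five steps, condensed into a small Python sketch that reproduces this example:

    import math

    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))       # Step 2: dot product
        mag_a = math.sqrt(sum(x * x for x in a))     # Step 3: magnitudes
        mag_b = math.sqrt(sum(x * x for x in b))
        return dot / (mag_a * mag_b)                 # Steps 4-5: multiply and divide

    print(cosine_similarity([3, 4], [6, 8]))   # 1.0 -> same direction, different length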

Putting It All Together: What to Remember

Similarity = Direction

Two customers are similar when their vectors (arrows) point in the same direction in feature space.

Dot Product: Measures Direction + Magnitude

Multiply matching numbers, add them up. Captures BOTH alignment and size—perfect when both matter!

a · b = (a₁ × b₁) + (a₂ × b₂) + ...

Cosine Similarity: Measures Direction Only

Normalize by dividing by magnitudes. Removes size, isolates direction.

cosine_similarity = (a · b) / (||a|| × ||b||)

Small Angle = High Similarity

Cosine similarity measures the angle between vectors:

  • 0° → 1.0: same direction
  • 90° → 0.0: unrelated
  • 180° → -1.0: opposite

Interactive Cosine Similarity Explorer

Adjust the vectors with sliders and watch how the angle and cosine similarity change in real-time!

[Interactive widget: for example, a 45° angle between A and B gives a cosine similarity of about 0.71 (similar direction).]

Seeing Tools in Action

Real-World Application

Let's see cosine similarity applied to a concrete problem. Dot product applications are most clear in neural networks (attention mechanisms, learned weights), which we'll explore in later chapters.

Customer Classification (Use Cosine Similarity)

The Scenario

Predict if customers will RENEW or CHURN based on their behavior: [months_subscribed, usage_hours]. Different customers have different engagement levels (some use the product a lot, some don't).

Question: Should you use dot product or cosine similarity?

The Data

Let's compare what dot product and cosine similarity tell us about customer pairs:

Customer Pair | Dot Product (direction × magnitude) | Cosine Similarity (direction only) | Actual Outcome
A · B | 1,450 | High | Both RENEW
A · C | 1,260 | High | Both RENEW
E · F | 42 | High | Both CHURN
A · E | 300 | Low | A renews, E churns

Why Cosine Similarity Works Here

Look at E · F: Their dot product is small (42) because both have low overall engagement (small vectors). But when we normalize by dividing by magnitudes, we remove the scale difference and focus purely on direction. Result: high cosine similarity—similar behavioral pattern.

E and F both churn not because they have low engagement, but because they follow a similar behavioral ratio (usage hours relative to months subscribed). A machine learning classifier learns to recognize this pattern as predictive of churn. Cosine similarity lets us find customers with similar patterns regardless of their activity level.

Meanwhile, A · E have low cosine similarity—very different directions. A renews, E churns. They follow different behavioral patterns, which cosine similarity correctly identifies.

General Guidelines for Interpreting Cosine Similarity

Cosine Similarity | Label | Interpretation
0.90–1.00 | High | Same pattern
0.70–0.89 | Moderate | Somewhat similar
< 0.70 | Low | Different pattern

Important: These ranges are general guidelines only. Optimal thresholds vary significantly by domain, data distribution, and task requirements. Always validate with your specific data using metrics like precision-recall curves.

Use cosine similarity when grouping by behavioral pattern, ignoring differences in scale or intensity.
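If you wanted those guidelines as code, a sketch might look like the following. The 0.90 and 0.70 cut-offs are the illustrative values from the table above, not universal constants; tune them on your own data:

    def similarity_label(cos_sim, high=0.90, moderate=0.70):
        """Map a cosine similarity to a rough, domain-dependent label."""
        if cos_sim >= high:
            return "High: same pattern"
        if cos_sim >= moderate:
            return "Moderate: somewhat similar"
        return "Low: different pattern"

    print(similarity_label(0.97))   # High: same pattern
    print(similarity_label(0.75))   # Moderate: somewhat similar
    print(similarity_label(0.40))   # Low: different pattern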

The Space We've Been Exploring

Let's Revisit Our Customer Scatter Plot

Throughout Chapters 4 and 5, we've been plotting our customers on a 2D graph. Let's take a step back and really look at what we've created. Something profound is happening in this simple plot.

[Scatter plot: x-axis = Months Subscribed, y-axis = Usage Hours per week]
Customer A: [10, 35] - RENEW
Customer B: [12, 38] - RENEW
Customer C: [14, 32] - RENEW
Customer D: [11, 33] - RENEW
Customer E: [2, 8] - CHURN
Customer F: [1, 5] - CHURN
Customer G: [3, 7] - CHURN
Customer H: [2, 10] - CHURN

What Do You Notice?

  • RENEW customers (green) cluster together in the upper-right
  • CHURN customers (red) cluster together in the lower-left
  • There's clear space between the two groups
  • Each cluster has a "center" where points are most dense

Understanding the Mathematical Structure

This scatter plot isn't just a convenient visualization — it represents a vector space, a mathematical structure where:

  • Each customer is represented as a point (determined by their feature vector)
  • Distance between points indicates how similar or different customers are
  • The geometry of the space naturally reveals patterns in the data

Key Insight: Clusters form automatically because customers with similar behaviors have similar vectors, which places them close together in this space. The mathematical structure reflects real-world patterns!

What Is a Vector Space?

We've been plotting customers as points in a 2D space. Each customer is a vector [months, hours], and each vector becomes a point we can plot. Similar customers cluster together because their vectors are similar. This geometric space where vectors live is called a "vector space."

The key insight: every vector is a point, and every point is a vector. The geometry of the space — distances, angles, clusters — captures real relationships in the data.

Vector Spaces Can Have Any Number of Dimensions

Our customer example uses 2 dimensions (months and hours). But vector spaces work with any number of dimensions — and the same mathematical principles apply to all of them:

  • 1D space (1 axis): just a number line, one feature only. Example: just "months subscribed"
  • 2D space (x, y): a plane, two features. Example: our customer space!
  • 3D space (x, y, z): physical space, three features. Example: add "support tickets"
  • N-D space (N axes): can't be visualized, but the same rules apply. Example: 10+ customer features

The Two Rules That Make It Work

Whether dealing with 2 dimensions or 100 dimensions, vector spaces follow two simple mathematical rules. These rules are what make all the operations possible:

Two Mathematical Rules Always Work:

1. Adding any two vectors gives another valid vector

This is called "closure under addition" - the result always stays in the same space.

Example with our customers:
Customer A = [10 months, 35 hours]
Customer B = [12 months, 38 hours]
────────────────────────────────
A + B = [22 months, 73 hours] ← Still a valid customer vector!

Why this matters: When we combine customer patterns, we get meaningful results. The sum represents "combined behavior" that makes sense in our feature space.

2. Multiplying a vector by any number gives another valid vector

This is called "closure under scalar multiplication" - stretching or shrinking always stays in the space.

Example with our customers:
Customer A = [10 months, 35 hours]
────────────────────────────────
2 × A = [20 months, 70 hours] ← Still a valid customer vector!
0.5 × A = [5 months, 17.5 hours] ← Also valid!

Why this matters: We can scale patterns up or down. Doubling the vector represents "twice as intense" of that behavior pattern.
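Both rules are one line each in Python. A minimal sketch with the customer vectors from this chapter:

    customer_a = [10, 35]   # [months subscribed, usage hours/week]
    customer_b = [12, 38]

    # Rule 1: adding two vectors gives another vector in the same space
    combined = [x + y for x, y in zip(customer_a, customer_b)]
    print(combined)            # [22, 73]

    # Rule 2: scaling a vector gives another vector in the same space
    doubled = [2 * x for x in customer_a]
    halved = [0.5 * x for x in customer_a]
    print(doubled, halved)     # [20, 70] [5.0, 17.5]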

The Power of These Simple Rules

These two properties might seem simple, but they enable:

  • Finding patterns: Machine learning models can combine and scale feature vectors to discover customer segments
  • Making predictions: Classification algorithms use vector addition to compute decision boundaries
  • Measuring similarity: Dot product and cosine similarity work because these operations preserve the vector space structure

Looking ahead: In Chapter 7 (Embeddings), we'll see these exact same rules applied to word meanings instead of customer features. The mathematical principles are identical - only what the vectors represent changes!

What Do Dimensions Actually Represent?

We just learned that vector spaces can have 2 dimensions, 3 dimensions, or many more dimensions. But here's the question we haven't answered yet: What are these dimensions actually measuring?

In our customer example, we chose 2 specific things to measure: months subscribed and usage hours. Those became our 2 dimensions. But we could have chosen different measurements. Let's understand this crucial concept.

Dimensions = Traits/Features

Every dimension (axis) in our vector space represents one measurable trait or feature. Let's start with our simple example and build up to understand how machine learning models use these features.

Our 2D Customer Space

Dimension 1 (X-axis)
= Months Subscribed
Range: 0-20 months

This dimension captures loyalty/tenure

Dimension 2 (Y-axis)
= Usage Hours per Week
Range: 0-50 hours

This dimension captures engagement level

These 2 dimensions give us a 2D snapshot of customer behavior. But what if we added more?

Expanding to 3D: Adding Another Trait

Dimension 3 (Z-axis)
= Support Tickets Opened
Range: 0-10 tickets

Now we capture satisfaction/issues

Now each customer is a point in 3D space!

Each new dimension adds one more way to describe customers. More dimensions = richer, more nuanced descriptions.

Expanding Further: More Customer Features

We could keep adding dimensions to capture more customer characteristics:

Dimension 4
= Number of Feature Requests

Captures product engagement

Dimension 5
= Days Since Last Login

Captures recent activity

Dimension 6
= Number of Team Members Added

Captures growth/expansion

The Key Principle

Each dimension is simply one measurable feature. More dimensions = more detailed customer profile. Each dimension is like one column in a spreadsheet.

Our 2D space uses 2 features (months, usage). We could use 6 features, 10 features, or even more. The mathematics works the same way - vectors still represent points, distance still measures similarity.
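In code, adding a dimension is just appending one more number per customer. A small sketch; the extra feature values below are made up purely for illustration:

    # 2 features: [months subscribed, usage hours/week]
    customer_a_2d = [10, 35]

    # 3 features: add "support tickets opened" (illustrative value)
    customer_a_3d = [10, 35, 4]

    # 6 features: add feature requests, days since last login, team members added
    # (all illustrative values; each entry is one column in the spreadsheet)
    customer_a_6d = [10, 35, 4, 2, 1, 7]

    print(len(customer_a_2d), len(customer_a_3d), len(customer_a_6d))   # 2 3 6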

Looking ahead: In Chapter 7, we'll see how this same principle applies when AI systems automatically learn which features matter for representing word meanings.

What's Next

We've covered vectors—how they represent data as points in space, how dot products and cosine similarity measure relationships, and how vector spaces work in any dimension.

But what about processing thousands of customer vectors at once? Or transforming an entire dataset from one vector space to another? That's where matrices come in.

Coming Up in Chapter 6: Matrices

We'll see how matrices let us operate on many vectors simultaneously, how neural networks use matrix multiplication to process entire batches of data, and why GPUs are essential for modern AI. The vector operations covered here become the building blocks for understanding how deep learning computes.

Key Takeaways

Vectors & Deep Learning Foundations

Core concepts from this chapter:

1. Two Tools, Two Questions
Dot product combines direction and magnitude. Cosine similarity isolates direction only. Choose based on your problem.

2. The Formulas Are Simple
Dot product: multiply matching elements, add them up. Cosine similarity: dot product ÷ magnitudes.

3. Both Tools Power Machine Learning
Classification models use dot product for decision boundaries and cosine similarity for comparing feature patterns - understanding both is essential.

4. What Each Tool Measures
Dot product: larger when vectors align AND are large. Cosine similarity: ignores size, measures the angle between vectors.

5. Vector Spaces Reveal Patterns
Vectors aren't just numbers - they're points in space. Similar vectors cluster together, and this geometry naturally reveals patterns in data.

6. Dimensions = Features
Each dimension represents one measurable trait. Our 2D customer space has 2 traits (months subscribed, usage hours). Real systems add more features. Same math, more detail.

7. High Dimensions Work the Same Way
We can't visualize 10D or 100D customer spaces, but the mathematics is identical. Distance, angles, and clusters work exactly the same in any dimension.

The Foundation:
Vectors represent data as points in space. Dot product measures alignment and magnitude. Cosine similarity measures pure direction.

These operations are used in search engines, recommendation systems, and language models. Understanding these fundamentals helps make sense of more complex deep learning concepts.

Test Your Understanding

Test what you've learned about vectors and deep learning!

1. What does a vector represent in machine learning?

2. When should you use cosine similarity instead of dot product?

3. If two vectors have a cosine similarity of 1.0, what does this mean?

4. What does a "dimension" represent in vector space?

5. Why do similar items form clusters in vector space?