Understanding the Nuances of Insertion Sort Analysis
Written on
Chapter 1: Overview of Insertion Sort
The efficiency of the INSERTION-SORT algorithm is influenced significantly by the nature of its input. For instance, sorting a thousand elements will generally take more time than sorting just three. Additionally, the time complexity can vary even among inputs of the same size, depending on their initial order. As a rule, an algorithm's execution time tends to increase with input size, making it common to express the running time as a function of the input size. To clarify this, we must define "running time" and "input size" more precisely.
The most suitable measure of input size can vary depending on the specific problem at hand. For many scenarios, including sorting or calculating discrete Fourier transforms, the most intuitive metric is simply the number of items involved — for instance, the size of the array, denoted as n in sorting tasks. Conversely, for other problems, like multiplying integers, the optimal measure might be the total number of bits required to represent the input in standard binary form. In certain cases, it may be more effective to describe input size using two dimensions rather than one; for example, in graph-related algorithms, input size can be characterized by the number of vertices and edges present.
The running time for an algorithm with a specific input refers to the count of primitive operations or "steps" carried out. It is useful to frame the concept of a step in a way that minimizes machine dependence. For our current purposes, we will assume that executing any given line of pseudocode requires a constant amount of time. While some lines may take longer than others, we will treat the execution time for the i-th line as cᵢ, where cᵢ is a constant.
In the sections that follow, we will refine our expression for the running time of INSERTION-SORT from a complex formula involving all statement costs cᵢ to a more straightforward notation that is both concise and manageable. This streamlined approach will facilitate comparisons of efficiency between different algorithms.
To begin, we will outline the INSERTION-SORT procedure, detailing the time "cost" associated with each statement and the frequency of their execution. For each j = 2, 3, …, n, where n equals A.length, let tᵢ represent the number of times the condition in the while loop (line 5) is evaluated for that specific j. When a for or while loop is structured in the conventional manner (i.e., based on the condition in the loop header), the condition is checked one additional time compared to the loop body. We will disregard comments as they do not constitute executable statements, thus incurring no time cost.
The overall running time of the algorithm can be calculated as the sum of the running times for each executed statement; if a statement requires cᵢ steps to complete and runs n times, it contributes cᵢ to the total running time. To derive T(n), the running time of INSERTION-SORT for an input of n values, we sum the products of the costs and frequencies, resulting in:
Even among inputs of identical size, the running time of an algorithm can differ based on the specific input provided. In the case of INSERTION-SORT, the optimal scenario occurs when the array is already sorted. For each j = 2, 3, …, n, we find that A[j] ≤ key at line 5 when i starts with the value of j - 1. Thus, tᵢ = 1 for j = 2, 3, …, n, leading to the best-case running time being expressible as an + C, where a and b are constants dependent on the costs cᵢ; consequently, it forms a linear function of n.
Section 1.1: Video Analysis of Insertion Sort
To further enhance your understanding, check out the following video that provides a detailed algorithmic analysis of Insertion Sort, focusing on its best and worst-case scenarios.
Subsection 1.1.1: Visualizing Time Complexity
Section 1.2: Further Exploration
For an in-depth exploration of space and time complexities associated with Insertion Sort, view the next video that analyzes the algorithm comprehensively.