3.17 Algorithm Efficiency Lesson

How to use lessons

Lessons will guide you through each concept using questions.

It is vital to try your best at each question, even if they are difficult. The only way to build understanding and remember things is to actively interact with the concepts, so do not guess answers unless as a last resort.

Read everything that is shown to you and do not skip any text sections. Everything has a purpose and will further your understanding.

Click continue to continue to the next section.

Imagine you are doing a research project. You have found a digital library containing every existing published book. You want to find books that contain specific keywords. For example, if you were researching bats, you would probably want to find books containing the keyword "bat".

Assume that the words of every book in that library are stored in a list and sorted alphabetically, and you can use a procedure from an API that takes in an index from that list and returns a book corresponding to the word located at that index. Also assume that the > and < operators work on strings by comparing their alphabetical position. You have two different algorithms that can search that list for a keyword's index.

Linear Search

PROCEDURE linearSearch(list, targetValue)
{
    index ← 1
    REPEAT UNTIL(index > LENGTH(list))
    {
        IF(list[index] = targetValue)
        {
            RETURN(index)
        }
        index ← index + 1
    }
}

Binary Search

PROCEDURE binarySearch(list, targetValue)
{
    startIndex ← 1
    endIndex ← LENGTH(list)
    REPEAT UNTIL(endIndex < startIndex)
    {
        middleIndex ← startIndex + (endIndex - startIndex) / 2
        middleIndex ← middleIndex - middleIndex % 1
        IF(list[middleIndex] = targetValue)
        {
            RETURN(middleIndex)
        }
        ELSE
        {
            IF(list[middleIndex] > targetValue)
            {
                endIndex ← middleIndex - 1
            }
            ELSE
            {
                startIndex ← middleIndex + 1
            }
        }
    }
    RETURN(-1)
}

An algorithm's efficiency can be determined by counting the number of operations it performs.

Every iteration of the REPEAT UNTIL loops in the algorithms above is considered an "operation".

How many operations does the following procedure call perform? linearSearch(["and", "because", "many", "one", "with"], "with")

1:4

Linear search checks if each value in a list is equal to a target value, iterating through the list from start to finish.

Since "with" is at the end of the list inputted into linearSearch, and there are five elements in that list, the REPEAT UNTIL loop runs 5 times, meaning the call performs 5 operations.

How many operations does the following call perform? binarySearch(["and", "because", "many", "one", "with"], "with")

2:3

First, "with" is compared to the middle element of the list "many". Since "many" comes before "with" in alphabetical order, all the elements with an index less than or equal to the middle index are removed. The elements "one" and "with" are left behind. This is one operation.

Next, since the algorithm chooses the element with a lower index if there are two middle elements, "one" is compared to "with". "one" comes before "with" in alphabetical order, so "one" is removed. This is another operation

Finally, the last remaining element "with" is compared to "with". Since they are equal, the procedure returns the last index of the list, which is where "with" is stored. This is the final operation.

Summing up the operations, the procedure call performs 3 operations.

The questions above demonstrate a concept called worst-case scenario.

Worst-case Scenario Definition Scenarios where a procedure performs the maximum possible number of operations for some input size \(n\)

The linear search call above is a worst-case scenario. Since the target value was at the end of the list in the call, the procedure had to go through every single element in the list before finding the target value. If, instead, the target value was at the start of the list in the call (i.e. linearSearch(["with", "and", "because", "many", "one"], "with")), the procedure would immediately find the target value, which would not have been a worst-case scenario.

The binary search call above is also a worst-case scenario.

You call linear search again: linearSearch(["and", "because", "many", "one", "with"], "with"). What is the minimum number of elements you need to add to the start of this list for the procedure to perform one extra operation?

3:1

Every element added before the target value makes the REPEAT UNTIL loop in linear search run an extra time. Thus, each added element adds one extra operation, so you only need to add one element to add one operation.

You call binary search again: binarySearch(["and", "because", "many", "one", "with"], "with"). What is the minimum number of elements you need to add to the start of this list for the procedure to perform one extra operation?

4:3

Each time the REPEAT UNTIL loop in binary search runs, either half or one more than half of the list is eliminated, so each time this happens, the procedure performs one operation.

Adding 3 elements to the start of the list creates a list with 8 elements. In the first operation, the first half of the 8 elements are eliminated. 4 elements remain. In the second operation, the first half of those are eliminated. 2 remain. After the third operation, 1 remains. In the fourth and final operation, the one remaining element "with" is compared to the target value.

Following the same logic for a list with 7 elements: 3 operations are performed on a list with 7 elements. 3 operations were also performed for the original 5-element list.

Thus, you need to add a minimum of 3 elements to the list to add one operation.

You call binary search one final time with a list of 8 elements. Assuming the worst-case scenario, what is the minimum number of elements you need to add to this list for the procedure to perform one extra operation?

5:4

Following the same logic as in the explanation for the previous problem, binary search performs 5 operations on a list of 16 elements (8 elements from the original list + the 8 elements you added).

Binary search will perform 4 operations for any list with 8 to 15 elements.

Thus, you need to add a minimum of 8 elements to the list to add an operation.

For a list with \(n\) number of elements, how many operations will linear search perform in the worst-case scenario?

6:1

In the worst-case scenario, every element in the list of \(n\) elements must be checked. Thus, linear search will perform \(n\) operations for a list of \(n\) elements.

For a list with \(n\) number of elements, how many operations will binary search perform in the worst-case scenario?

Log is the inverse function of an exponent, so the following equation is true:
\(\log_y(y^{x})=x\)

\(\text{floor}(x)\) rounds \(x\) to the nearest integer less than or equal to \(x\), and \(\text{ciel}(x)\) rounds \(x\) to the nearest integer greater than or equal to \(x\).

Notice how the number of operations increase by one at each power of 2 number of list elements. \(n=2^1=2\) list elements require 2 operations, \(n=2^2=4\) require 3 operations, \(n=2^3=8\) require 4 operations, and so on.

7:3

Notice how the number of operations increase by one at each power of 2 number of list elements. \(n=2^1=2\) list elements require 2 operations, \(n=2^2=4\) require 3 operations, \(n=2^3=8\) require 4 operations, and so on.

By using the inverse function of the exponent, \(\log\), we can find which power of 2 equals \(n\). \(\log_2(n)\) is a good start.

However, \(\log_2(n)\) outputs one less operation than the actual amount for each power of 2 number of list elements (e.g. \(\log_2(8)\) outputs 3 but 4 operations are performed in actuality on a list of 8 elements). Thus, we need to add 1 to this log: \(\log_2(n)+1\).

For any \(n\) between two powers of 2, \(\log_2(n)+1\) outputs a decimal number greater than the output for the power of 2 less than \(n\), which is nonsense because there cannot be a decimal amount of operations! Since any \(n\) between a power of 2 number of list elements require the same amount of operations as the closest power of 2 less than \(n\) list elements, the \(\log_2(n)\) can be surrounded by a \(\text{floor}\) function to get the correct number of operations binary search performs on a list of \(n\) elements: \(\text{floor}(\log_2(n))+1\).

Going back to our digital library of books, let's say the library contains 129,864,880 books, with each book containing 100,000 words. You need to search a list of 12,986,488,000,000 words (129,864,880 books times 100,000 words) for specific keywords.

Your computer has a CPU that can perform 3,700,000,000 operations per second. You run binary and linear search on your computer.

Using the formulas we built in the last two problems, how many seconds would it take linear search to find a keyword in this list of words in the worst-case scenario? Round your answer to the nearest integer. Use a calculator!

3510

The number of operations linear search performs is the same as the number of elements in the inputted list. Thus, we only need to divide the list length by 3,700,000,000.

\(12,986,488,000,000\div3,700,000,000\approx3510\)

How many seconds would it take binary search to find a keyword in this list of words in the worst-case scenario? Round your answer to the nearest integer. Use a calculator!

0

Plugging the number of elements in the list into the binary search operations formula, we get:
\(\text{floor}(\log_2(12,986,488,000,000))+1=44\)

Dividing this number by 3,700,000,000, we get:
\(44\div3,700,000,000\approx0\)

Notice the giant difference in run times for binary and linear search. Linear search takes approximately an hour to search the library, while binary search takes mere milliseconds!

While binary search may seem like the go-to algorithm for searching a list, it requires the list to be sorted first.

\(n^2\) describes the worst-case scenario efficiency of a sorting algorithm.

Let's say we have a list with 10 unsorted elements. What is the minimum number of binary and linear searches that must be made for the total number of operations performed by those linear searches to exceed the total number of operations performed by both those binary searches and the sorting algorithm?

The sorting algorithm is only run once and assume worst-case scenario.

8:4

One run of the sorting algorithm costs 100 operations (\(10^2\)). Every run of binary search costs 4 operations (\(\text{floor}(\log_2(10))+1=4\)).

Every run of linear search costs 10 operations.

\(x\) runs of linear search on the list costs \(10x\) operations. \(x\) runs of binary search on the list costs \(4x+100\) operations.

We can solve those two linear equations for the value of \(x\) at which they are equal: $$10x=4x+100$$ $$6x=100$$ $$x\approx16.67$$

Since the equation for the total cost of \(x\) runs of linear search has a larger slope than the equation for binary search, the cost of binary search falls under linear search for any \(x\) over 16.67. 17 is closest integer greater than 16.67, so 17 is the answer.

Binary search is only worth using when either the list you have is already sorted or you need to search the list many times.

An algorithm's efficiency is often described by formulas like the ones you made above. We often say an algorithm's efficiency is a function of \(n\) (\(f(n)\)) where \(n\) is the input size. In the case of binary and linear search, the input size is the number of elements in the inputted lists.

When the function \(f(n)\) that describes an algorithm's efficiency is a polynomial, we say that that algorithm runs in polynomial time. Similarly for exponential and factorial functions.

Algorithms are classified into either reasonable or unreasonable time based on the functions that describe their efficiency.

When a problem cannot be solved in reasonable time, heuristics are often used.

Heuristic Definition A technique used to solve a problem that may not be completely accurate when 100% accurate solutions are impractical

Unfortunately, the list of words from the digital library is unsorted, so you will need to make a decision: to use binary search or to use linear search.

Summary Notes Finish Lesson