Mean, median, mode and other statistical functions

From Computer Science Wiki

Many students have tried this problem and found it difficult. Please be patient and don't give up!!

This is a problem set for you to work through.[1]

This is a problem set. Some of these problems are easy; others are far more difficult. The purpose of these problem sets is to HELP YOU THINK THROUGH problems. The solution is at the bottom of this page, but please don't look at it until you have tried (and failed) at least three or four times.


What is this problem set trying to do?

You are going to use a number of Python's built-in functions, methods and operators here. If you complete this problem set, you will have shown me you understand:

  1. sorting lists
  2. counting occurrences in a list
  3. you will return to an old friend, modulo
  4. the max function
  5. the min function
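The built-ins listed above can be tried out on a small list before you start. The list `values` below is made up purely for illustration and is not part of the problem:

```python
# A quick tour of the built-ins listed above, on a small made-up list.
values = [5, 3, 8, 3, 1]

print(sorted(values))     # sorted() returns a NEW list: [1, 3, 3, 5, 8]
print(values.count(3))    # how many times 3 appears: 2
print(len(values) % 2)    # modulo: 1 means the list has an odd number of items
print(max(values))        # largest value: 8
print(min(values))        # smallest value: 1
print(values)             # the original list is unchanged: [5, 3, 8, 3, 1]
```

Note that `sorted()` leaves the original list alone, which is why it is handy for the median: you can sort a copy without changing the data you were given.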

The Problem

Please program the following functions:

  1. Mean: For a data set, the terms arithmetic mean, mathematical expectation, and sometimes average are used synonymously to refer to a central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values.[2]
  2. Mode: The mode is the value that appears most often in a set of data.[3]
  3. Median: In statistics and probability theory, a median is the number separating the higher half of a data sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 3, 5, 9, 11} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.[4]
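One way to check your understanding of the three definitions is Python's standard-library `statistics` module, which already implements them. The sketch below applies it to the median example from the definition, plus a made-up six-value list to show the even-length rule. Use it to check your answers, not as a replacement for writing your own functions:

```python
import statistics

odd_data = [3, 3, 5, 9, 11]       # the example from the median definition above
even_data = [3, 3, 5, 9, 11, 12]  # made-up list with an even number of values

print(statistics.mean(odd_data))     # (3 + 3 + 5 + 9 + 11) / 5 = 31 / 5 = 6.2
print(statistics.mode(odd_data))     # 3 appears twice, every other value once
print(statistics.median(odd_data))   # middle of the 5 sorted values: 5
print(statistics.median(even_data))  # no single middle value: (5 + 9) / 2 = 7.0
```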

Some Code to Get You Started

list=[2,3,3,2,3,2,3,9,7,3,4,8,1,2,8,7,6,5,8,9,1,2,3,2,1,4,3,2,1,4,5,4,1,6,9,6,1,4,2,3,5]


def mean(list):
    answer = sum(list)   
    ...
    return mean

def mode(list):
    frequency = {}
    highest = max(list)
    lowest = min(list)
    # in this loop, we update the dictionary named "frequency" with the count of each value.
    # we use highest + 1 because the range function doesn't include the last value.
    for i in range(lowest,highest+1):
        ...
    return mode

def median(list):
    new_list = sorted(list)
    ...
    return median
   
# your program must return the correct answers for the questions below: 

print("the mean of list is: " + str(mean(list)))
print("the median of list is: " + str(median(list)))
print("the mode of list is: " + str(mode(list)))
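To find out which answers your functions should print, you can run the same numbers through the `statistics` module. This is only a sketch for checking results; it uses the name `data` instead of `list` so that Python's built-in `list` type is not hidden. Note that in this data set the values 2 and 3 each appear eight times, so there are two modes: `statistics.mode` (Python 3.8 and later) reports the one it meets first, and `statistics.multimode` (also Python 3.8 and later) reports both.

```python
import statistics

# same numbers as the starter list above
data = [2,3,3,2,3,2,3,9,7,3,4,8,1,2,8,7,6,5,8,9,1,2,3,2,1,4,3,2,1,4,5,4,1,6,9,6,1,4,2,3,5]

print("mean:", statistics.mean(data))            # 164 / 41 = 4
print("median:", statistics.median(data))        # the 21st of the 41 sorted values: 3
print("mode:", statistics.mode(data))            # first of the tied most common values: 2
print("all modes:", statistics.multimode(data))  # [2, 3]
```

Your own `mean` will most likely print `4.0` rather than `4`, because `/` in Python 3 always produces a float; both describe the same value.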

Take This Further

  1. plot (graphically, with ASCII art) the range of numbers
  2. calculate the standard deviation of a range of numbers
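For the first extension, one possible text-only approach is to print one row per whole number, with one `*` for each time that number occurs. This sketch assumes the data are whole numbers, and the function name `ascii_histogram` is hypothetical:

```python
def ascii_histogram(numbers):
    """Return one text row per whole number from min to max, e.g. '3 | ***'."""
    rows = []
    for value in range(min(numbers), max(numbers) + 1):
        # one star per occurrence; values that never occur get an empty bar
        rows.append(f"{value} | {'*' * numbers.count(value)}")
    return "\n".join(rows)


print(ascii_histogram([2, 3, 3, 2, 3, 1, 5]))
```

Looping over every whole number from the minimum to the maximum (rather than only the values present) keeps gaps visible: in the example, the row for 4 is printed with an empty bar.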

How you will be assessed

Every problem set is a formative assignment.

References

A few different possible solutions

Below are a few possible solutions, but do NOT look at them before you have tried and failed!

list=[2,3,3,2,3,2,3,9,7,3,4,8,1,2,8,7,6,5,8,9,1,2,3,2,1,4,3,2,1,4,5,4,1,6,9,6,1,4,2,3,5]

def mean(list):
    answer = sum(list)   
    mean = answer / len(list)
    return mean

def mode(list):
    frequency = {}
    highest = max(list)
    lowest = min(list)
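    # in this loop, we update the dictionary named "frequency" with the count of each value.
    for i in range(lowest, highest + 1):
        frequency[i] = list.count(i)
    # in Python 3, dictionary views can't be indexed, so copy them into tuples
    # (the name "list" is taken by the parameter here, so list() isn't available)
    values = tuple(frequency.values())
    keys = tuple(frequency.keys())
    # the key at the same position as the highest count is the mode
    mode = keys[values.index(max(values))]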
    return mode

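def median(list):
    new_list = sorted(list)
    middle = len(new_list) // 2   # integer division: list indices must be ints
    if len(new_list) % 2 == 1:
        median = new_list[middle]
    else:
        # even number of values: the median is the mean of the two middle values
        median = (new_list[middle - 1] + new_list[middle]) / 2
    return median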
   
print("the mean of list is: " + str(mean(list)))
print("the median of list is: " + str(median(list)))
print("the mode of list is: " + str(mode(list)))

The example below also calculates the standard deviation and draws a bar graph (a histogram) using matplotlib.

import matplotlib.pyplot as plt
 
list =[2,3,3,2,3,2,3,9,7,3,4,8,1,2,8,7,6,5,8,9,1,2,3,2,1,4,3,2,1,4,5,4,1,6,9,6,1,4,2,3,5,5]
numbers = [1,2,3,4]
 
def graph(graph_list):
    # bin edges lowest, lowest + 1, ..., highest + 1, so every whole number gets its own bar
    count = []
    sorted_list = sorted(graph_list)
    highest = max(sorted_list)
    lowest = min(sorted_list)
    for i in range(lowest, highest + 2):
        count.append(i)
    plt.hist(sorted_list, bins=count)
    plt.ylabel('Occurrences')
    plt.xlabel('Number')
    plt.show()
   
def mean(mean_list):
    # average of the list that was passed in
    mean = sum(mean_list) / len(mean_list)
    return mean
 
def mode(mode_list):
    frequency = []
    count = []
    sorted_list = sorted(mode_list)
    highest = max(sorted_list)
    lowest = min(sorted_list)
    for i in range(lowest,highest+1):
        frequency.append(sorted_list.count(i))
        count.append(i)
    highest_frequency = max(frequency)
    index_count = frequency.index(highest_frequency)
    mode = count[index_count]
    return mode
 
def median(median_list):
    sorted_list = sorted(median_list)
    middle = len(sorted_list) // 2   # integer division: list indices must be ints
    if len(sorted_list) % 2 == 1:
        median = sorted_list[middle]
    else:
        # even number of values: the median is the mean of the two middle values
        median = (sorted_list[middle - 1] + sorted_list[middle]) / 2
    return median
 
 
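def standev(stdev_list):
    # population standard deviation: the square root of the
    # average squared difference from the mean
    mean_num = sum(stdev_list) / len(stdev_list)
    squared_differences = []
    for value in stdev_list:
        squared_differences.append((value - mean_num) ** 2)
    variance = sum(squared_differences) / len(stdev_list)
    stdev = variance ** 0.5   # ** 0.5 takes the square root (** 1/2 would halve the value instead)
    return stdev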
   
# your program must return the correct answers for the questions below:
 
print("the mean of numbers is: " + str(mean(numbers)))
print("the median of numbers is: " + str(median(numbers)))
print("the mode of numbers is: " + str(mode(numbers)))
print("the standard deviation of numbers is: " + str(standev(numbers)))
graph(numbers)
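A standard deviation function can be checked against `statistics.pstdev`, which computes the population standard deviation: the square root of the mean of the squared differences from the mean. The sketch below writes that formula out by hand (the name `population_stdev` is hypothetical) and compares the two on the list [1, 2, 3, 4]. If you want the sample standard deviation instead (dividing by n - 1 rather than n), `statistics.stdev` is the matching function.

```python
import math
import statistics


def population_stdev(values):
    """Square root of the average squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_differences = [(value - mean) ** 2 for value in values]
    return math.sqrt(sum(squared_differences) / len(values))


numbers = [1, 2, 3, 4]
print(population_stdev(numbers))   # sqrt(5 / 4) = sqrt(1.25), about 1.118
print(statistics.pstdev(numbers))  # same value from the standard library
```

For [1, 2, 3, 4] the mean is 2.5, the squared differences are 2.25, 0.25, 0.25 and 2.25, and their average is 1.25, which is where the square root above comes from.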

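The code below approaches mode differently from the two examples above, which first build a table of counts (a dictionary in the first example, two parallel lists in the second). Here, the loop only remembers the highest count seen so far and the value that produced it.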

list=[2,3,3,2,3,2,3,9,7,3,4,8,1,2,8,7,6,5,8,9,1,2,3,2,1,4,3,2,1,4,5,4,1,6,9,6,1,4,2,3,5]
 
def mean(list):
    mean = sum(list)/len(list)  
    return mean
 
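def median(list):
    new_list = sorted(list)
    middle = len(new_list) // 2   # integer division: list indices must be ints
    if len(new_list) % 2 == 1:
        return new_list[middle]
    # even number of values: average the two middle values
    return (new_list[middle - 1] + new_list[middle]) / 2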
   
def mode(list):
    current_top = 0
    highest = max(list)
    lowest = min(list)
    for i in range(lowest,highest+1):
        new_possible_top = list.count(i)
        if new_possible_top > current_top:
            current_top = new_possible_top
            mode = i
    return mode
   
# your program must return the correct answers for the questions below:
print("the mean of list is: " + str(mean(list)))
print("the median of list is: " + str(median(list)))
print("the mode of list is: " + str(mode(list)))
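All three solutions report a single mode, but in this list 2 and 3 both appear eight times. Yet another way to find the mode, swapping in `collections.Counter` from the standard library instead of calling `list.count` once for every value, is sketched below; it returns every value tied for the highest count. The function name `all_modes` is hypothetical.

```python
from collections import Counter


def all_modes(values):
    """Return every value tied for the highest count, smallest first."""
    counts = Counter(values)  # maps each value to how often it appears
    highest_count = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest_count)


data = [2,3,3,2,3,2,3,9,7,3,4,8,1,2,8,7,6,5,8,9,1,2,3,2,1,4,3,2,1,4,5,4,1,6,9,6,1,4,2,3,5]
print(all_modes(data))  # [2, 3]
```

`Counter` counts everything in a single pass over the list, whereas calling `count` for each value between the minimum and maximum walks the whole list once per value.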