Statistics 1

Section 3.1 Statistics 1

Statistics is the branch of mathematics that deals with collecting, organizing, analyzing and interpreting data in order to make decisions or draw conclusions.


In our daily lives, we are constantly surrounded by data: school exam results, sports scores, population figures, business sales, weather reports, and health records.


Understanding statistics helps us make sense of this information and use it to solve real life problems.


Subsection 3.1.1 Collection of Data

Subsubsection 3.1.1.1 Sources of Data

Activity 3.1.1.

\(\textbf{Work in groups}\)

🔗

A teacher assigned students a research topic on the effects of social media on teenage mental health in Kenya.

🔗

Suggest one potential primary data collection method that could be used.
🔗

🔗
Suggest one potential secondary data source that could be used.
🔗

🔗
What is the difference between primary and secondary data?
🔗

🔗
Why is secondary data important for research and decision-making?
🔗

🔗
Discuss and share with other groups.
🔗

🔗

🔗

\(\textbf{Key Takeaway and Definitions}\)

🔗

Data can be obtained from primary sources, which are original first-hand data collected directly for a specific purpose (e.g., surveys, interviews),

🔗

secondary sources, which are previously collected data used for a different purpose (e.g., government reports, research articles),

🔗

and tertiary sources, which summarize and compile information from primary and secondary sources (e.g., dictionaries, textbooks, encyclopedias).

🔗

Example 3.1.1.

Classify each of the following as a primary or secondary data source:

🔗

A student conducts a survey to find out the favorite sports of their classmates.

🔗
A teacher uses last year’s national exam results to analyze student performance trends.

🔗
A researcher reads a government report on the most common diseases in Kenya.

🔗
A doctor observes a patient’s symptoms and records them for a medical study.

🔗
A scientist conducts an experiment to test the growth rate of plants under different conditions.

🔗

🔗

Solution.

Primary because the student is collecting first-hand data directly from people.

🔗
Secondary because the exam results were already collected and recorded by an external body.

🔗
Secondary because the report was collected and published by someone else for a different purpose.

Primary because the doctor is directly collecting new data from real-life observation.

Primary because the scientist is generating new data through an experiment.

🔗

🔗

Example 3.1.2.

A community group wants to understand the needs of youth in their area.

🔗

What primary data collection methods could the group use?
🔗

🔗
What problems might they have when trying to find this information?
🔗

🔗
How can they use the information they find to help the community?
🔗

🔗
What secondary data sources could be useful to this community group?
🔗

🔗

🔗

Solution.

Focus groups, surveys, interviews.

- Overcoming language barriers.
- Getting honest responses.

- They can start new programs for young people.
- They can tell leaders what young people need.

- Academic studies on youth issues.
- Government reports on youth.

🔗

Exercises

1.

Grade \(10\) students from Korinyang Primary School went to Lake Nakuru National Park and counted the flamingos they saw for their Biology project.

🔗

Is this a primary or secondary data source?
🔗

🔗
Give a reason for your answer.
🔗

🔗

🔗

2.

In a market, a shop owner watches what customers buy most often and looks for trends based on what they see in the store.

🔗

Which sources of data can the shop owner use to collect the data?

🔗

3.

Define the following terms.

🔗

Data Source

🔗
Raw Data

🔗
Tertiary Sources

🔗
Primary sources

🔗

🔗

4.

What are the advantages of using secondary data over primary data in some research situations?

🔗

Subsubsection 3.1.1.2 Methods of Data Collection

Activity 3.1.2.

\(\textbf{Work in groups}\)

🔗

The following data represents the number of hours people spend watching television per week.
🔗
- \(0 - 5\) hours: \(20\) people
  🔗
  
  🔗
- \(6 - 10\) hours: \(35\) people
  🔗
  
  🔗
- \(11 - 15\) hours: \(25\) people
  🔗
  
  🔗
- \(16 - 20\) hours: \(10\) people
  🔗
  
  🔗
🔗
🔗
Create a table to organize this data.
🔗

🔗
What type of data collection method was most likely used to gather this data?
🔗

🔗
What are some other questions that could be asked to further explore this topic?
🔗

🔗
Discuss and share with other groups.
🔗

🔗

🔗

\(\textbf{Key Takeaway and Definitions}\)

🔗

Methods of collecting data help us gather information. We can collect data directly through surveys, interviews, or experiments, or indirectly from books, reports, and online sources.
🔗

🔗

🔗

Example 3.1.3.

A student wants to find out the most popular extracurricular activities among their classmates.

🔗

What data collection method would be most appropriate?
🔗

🔗
What are two examples of specific questions they could ask?
🔗

🔗

🔗

Solution.

Survey/Questionnaire.
🔗

🔗
Two questions the student could ask are:
🔗
- What extracurricular activities do you participate in?
  🔗
  
  🔗
- How often do you participate in these activities?
  🔗
  
  🔗
🔗
🔗

🔗

Example 3.1.4.

A local bakery conducted a study asking customers about their favorite types of pastries. The results are:

🔗

Cakes: \(45\)%
🔗

🔗
Cookies: \(30\)%
🔗

🔗
Breads: \(20\)%
🔗

🔗
Doughnuts: \(5\)%
🔗

🔗

🔗

What type of data collection method was most likely used to gather this data?
🔗

🔗
What is the most popular type of pastry among customers?
🔗

🔗
What percentage of customers prefer either Cakes or Cookies?
🔗

🔗
What type of pastry is the least popular among customers?
🔗

🔗

🔗

Solution.

A survey.
🔗

🔗
Cakes
🔗

🔗
\(45\% + 30\% = 75\%\)

Therefore, the percentage of customers who prefer either Cakes or Cookies is \(75\%\text{.}\)
🔗

🔗
Doughnuts
🔗

🔗

🔗

Exercises

1.

Which data collection method would be most suitable for finding out the most common types of litter found in your school compound?

🔗

Interviews
🔗

🔗
Surveys sent to parents
🔗

🔗
Observations
🔗

🔗
Analyzing government reports
🔗

🔗

🔗

2.

Describe one ethical consideration you should keep in mind when conducting interviews with community members.

🔗

3.

Why is it important to keep accurate records when conducting observations?

🔗

4.

Give one example of a situation where you would use secondary data collection in a Geography class.

🔗

Subsection 3.1.2 Representing Data using a Frequency Distribution Table

Activity 3.1.3.

\(\textbf{Work in groups}\)

🔗

Below are the weekly pocket expenses (in Ksh) of a randomly selected group of \(25\) students.
🔗

\(120, 150, 180, 200, 220, 250, 270, 290, 300, 320, 350, 370, 390, 400,\)
🔗

\(420, 450, 470, 480, 490, 500, 340, 230, 280, 410, 330\)
🔗

🔗
Draw a grouped frequency distribution table with \(5\) classes to represent the data.

Identify the class with the highest frequency.
🔗

🔗
Compare your answer with others in class.
🔗

🔗

🔗

\(\textbf{Key Takeaway and Definitions}\)

🔗

A frequency distribution table is a table that lists each value (or class of values) in a dataset together with the number of times it occurs.

🔗

There are two types:

🔗

\(\textbf{Ungrouped Frequency Distribution}\text{:}\) for small datasets with individual values.
🔗

🔗
\(\textbf{Grouped Frequency Distribution}\text{:}\) for large datasets where values are grouped into intervals.
🔗

\(\textbf{Steps to construct a grouped frequency distribution table}\)
🔗
1. Determine the Range of Data
  🔗
  
  \begin{align*} \textbf{Range} = \amp \textbf{ Maximum Value} - \textbf{ Minimum Value} \end{align*}
  
  🔗
  
  🔗
2. Decide the Number of Classes (Groups)
  🔗
  
  🔗
3. Calculate the Class Width
  🔗
  
  \begin{align*} \textbf{Class width} = \amp \frac{\textbf{ Range}}{\textbf{ Number of classes}} \end{align*}
  
  🔗
  
  🔗
4. Establish Class Boundaries
  🔗
  - Begin with the Lowest Value: Use the smallest data value as the lower limit of the first class.
    🔗
    
    🔗
  - Determine the Upper Limit: Add the class width to establish the upper limit of the first class and the lower limit of the next class.
    🔗
    
    🔗
  - Repeat the Process: Continue this pattern until all class intervals are created.
    🔗
    
    🔗
  🔗
  🔗
5. Tally the Frequencies
  🔗
  - Count how many data points fall within each class interval and record the frequency.
    🔗
    
    🔗
  🔗
  🔗
6. Complete the Table
  🔗
  - Class Interval
    🔗
    
    🔗
  - Tally Marks
    🔗
    
    🔗
  - Frequency(f)
    🔗
    
    🔗
  🔗
  🔗
🔗
🔗
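The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function name `grouped_frequency_table` and the sample list `scores` are hypothetical, the data is assumed to consist of whole numbers, and the class width is rounded up to a whole number so that every class covers the same count of values.

```python
import math


def grouped_frequency_table(data, num_classes):
    """Return (lower limit, upper limit, frequency) for each class.

    Assumes whole-number data. The last class is stretched, if needed,
    so that the largest value is not left out.
    """
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                  # Step 1: range
    width = math.ceil(data_range / num_classes)    # Step 3: class width
    rows = []
    for i in range(num_classes):                   # Step 4: class limits
        lower = lowest + i * width
        upper = lower + width - 1
        if i == num_classes - 1:
            upper = max(upper, highest)
        # Step 5: count the values that fall inside this class
        frequency = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, frequency))
    return rows


# Hypothetical data, used only to show the function at work.
scores = [12, 15, 21, 22, 25, 30, 31, 38, 40, 42]
for lower, upper, frequency in grouped_frequency_table(scores, 3):
    print(f"{lower} - {upper}: {frequency}")
# 12 - 21: 3
# 22 - 31: 4
# 32 - 42: 3
```

A quick check on any table built this way is that the frequencies add up to the number of observations.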

🔗

Example 3.1.5.

The following data represents test scores of \(20\) students in a grade \(10\) class.

🔗

\(45, 50, 55, 50, 60, 70, 75, 80, 70, 55, 60, 65, 50, 55, 45, 60, 75, 80, 70, 50\)

🔗

Prepare an ungrouped frequency distribution table for the dataset.

🔗

Solution.

Table 3.1.6.

🔗

Test scores	Tally	Frequency
\(45\)	\(//\)	\(2\)
\(50\)	\(////\)	\(4\)
\(55\)	\(///\)	\(3\)
\(60\)	\(///\)	\(3\)
\(65\)	\(/\)	\(1\)
\(70\)	\(///\)	\(3\)
\(75\)	\(//\)	\(2\)
\(80\)	\(//\)	\(2\)
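An ungrouped table like this one can be checked with a short Python sketch. The variable names are illustrative choices, and `collections.Counter` from the standard library does the counting.

```python
from collections import Counter

# Test scores from Example 3.1.5
scores = [45, 50, 55, 50, 60, 70, 75, 80, 70, 55,
          60, 65, 50, 55, 45, 60, 75, 80, 70, 50]

frequency = Counter(scores)

# Print one row per distinct score: value, tally marks, frequency.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value}\t{'/' * count}\t{count}")

# The frequencies must add up to the number of observations.
assert sum(frequency.values()) == len(scores)
```

The final check is worth doing by hand as well: if the frequencies do not add up to the number of values, a tally has been miscounted.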

🔗

Example 3.1.7.

The number of customers visiting a supermarket each day over \(30\) days was recorded as follows:

🔗

\(135, 125, 140, 160, 145, 120, 150, 140, 130, 125, 135, 155, 140, 135, 130,\)

🔗

\(155, 150, 160, 145, 140, 120, 145, 135, 140, 150, 130, 150, 125, 145, 120\)

🔗

Draw a grouped frequency distribution table with \(5\) classes to represent the data.

🔗

Solution.

To prepare a grouped frequency table for the data above, we first find the range of the data.

🔗

\begin{align*} \textbf{Range} = \amp \textbf{ Maximum Value} - \textbf{ Minimum Value} \end{align*}

🔗

Maximum value = 160

🔗

Minimum value = 120

🔗

\begin{align*} \textbf{Range} = \amp 160 - 120\\ = \amp 40 \end{align*}

🔗

Range is \(40\)

🔗

Next, determine the class width.

\begin{align*} \textbf{Class width} = \amp \frac{\textbf{ Range}}{\textbf{ Number of classes}}\\ = \amp \frac{40}{5}\\ = \amp 8 \end{align*}

The class width is \(8\text{.}\)

Create the class intervals.

Starting from \(120\text{,}\) we create intervals that each cover \(8\) whole-number values. Five such intervals reach only \(159\text{,}\) so the last interval is extended by one to include the maximum value, \(160\text{:}\)

\(120 - 127\)

\(128 - 135\)

\(136 - 143\)

\(144 - 151\)

\(152 - 160\)

Tally the data.

We count how many values fall into each interval.

\(120 - 127\text{:}\) \(120, 120, 120, 125, 125, 125\)

\(128 - 135\text{:}\) \(130, 130, 130, 135, 135, 135, 135\)

\(136 - 143\text{:}\) \(140, 140, 140, 140, 140\)

\(144 - 151\text{:}\) \(145, 145, 145, 145, 150, 150, 150, 150\)

\(152 - 160\text{:}\) \(155, 155, 160, 160\)

Then construct the frequency table. The frequencies add up to \(6 + 7 + 5 + 8 + 4 = 30\text{,}\) the number of days recorded.

Table 3.1.8.

Number of customers	Tally	Frequency
\(120 - 127\)	\(\cancel{////}\ /\)	\(6\)
\(128 - 135\)	\(\cancel{////}\ //\)	\(7\)
\(136 - 143\)	\(\cancel{////}\)	\(5\)
\(144 - 151\)	\(\cancel{////}\ ///\)	\(8\)
\(152 - 160\)	\(////\)	\(4\)
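When the class intervals are already fixed, as they are here, the tally can be cross-checked with a small Python sketch. The helper name `tally` is a hypothetical choice for illustration; the class limits are treated as inclusive, matching the intervals in the solution.

```python
# Daily customer counts from Example 3.1.7
customers = [135, 125, 140, 160, 145, 120, 150, 140, 130, 125,
             135, 155, 140, 135, 130, 155, 150, 160, 145, 140,
             120, 145, 135, 140, 150, 130, 150, 125, 145, 120]

# (lower limit, upper limit) of each class, both limits inclusive
classes = [(120, 127), (128, 135), (136, 143), (144, 151), (152, 160)]


def tally(data, class_limits):
    """Count how many values fall inside each inclusive class interval."""
    counts = {limits: 0 for limits in class_limits}
    for x in data:
        for lower, upper in class_limits:
            if lower <= x <= upper:
                counts[(lower, upper)] += 1
                break
        else:
            # No class contained x, so the intervals do not cover the data.
            raise ValueError(f"{x} does not fall in any class")
    return counts


for (lower, upper), frequency in tally(customers, classes).items():
    print(f"{lower} - {upper}: {frequency}")
```

Raising an error for a value that fits no class is a deliberate choice: it catches intervals that stop short of the maximum value.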

🔗

Exercises

1.

Twenty-five students in Grade 10 recorded their travel time to school in minutes as follows:

🔗

\(15, 8, 22, 30, 12, 25, 18, 10, 35, 20, 5, 28, 15, 40, 17, 23, 12, 32, 7, 19, 27, 14, 21, 9, 33\)

🔗

Draw a frequency distribution table to represent the data.

🔗

2.

Thirty customers at a service center recorded their wait times in minutes as follows:

🔗

\(5, 12, 8, 15, 3, 10, 18, 6, 13, 9, 2, 16, 11, 7, 19, 14, 4, 8, 12, 20, 5, 17, 9, 13, 6, 11, 3, 15, 8, 14.\)

🔗

Prepare a frequency distribution table for the set of data.

🔗

3.

The costs (in Ksh.) of manufacturing equipment across different factories were recorded as follows:

🔗

\(1250, 1425, 1580, 1720, 1850, 1975, 2100, 2235, 2370, 2480, 2610, 4310, 4425,\)

🔗

\(2750, 2880, 3025, 3150, 3280, 3410, 3525, 3640, 3750, 3870, 3975, 4080, 4195,\)

🔗

\(4550, 4680, 4820, 4950, 1380, 1625, 1890, 2340, 2570, 2780, 3120, 3390, 3610, \)

🔗

\(4520, 4750, 3150, 2840, 1950, 2640, 3470, 4180, 3840, 4030, 4270.\)

🔗

Prepare a frequency distribution table for the grouped data.

🔗

4.

The annual rainfall (in mm) recorded in a region was as follows:

🔗

\(625, 645, 670, 695, 720, 745, 770, 790, 810, 835, 860, 880, 905, \)

🔗

\(1000, 1025, 1050, 1075, 1100, 1125, 1150, 930, 950, 975, 1180.\)

🔗

Construct a grouped frequency distribution table for the data.

🔗

Subsection 3.1.3 Measures of Central Tendency

Measures of central tendency are statistical values that describe the center or average of a set of data.

🔗

They help us identify a single number that represents the entire distribution of data, giving us an idea of what is “typical” or “common.”

🔗

The three main measures are Mean, Median and Mode.

🔗

Together, these measures give us different perspectives of the center of the data.

🔗

Subsubsection 3.1.3.1 Ungrouped Data for Measures of Central Tendency

Ungrouped data is the raw form of data, where individual observations are listed as they are collected, without being organized into groups or classes.

It is simply a list of numbers, facts, or values recorded in the order they are obtained.

🔗

Activity 3.1.4.

\(\textbf{Work in groups}\)

🔗

Consider the following data set:
🔗

\(32, 33, 35,36, 38, 40, 41, 42, 44, 45, 47, 48, 50,52, 54, 55, 56, 57,\) \(58, 60, 62, 63, 65,66, 68, 70, 72, 74, 75, 78, 80, 82, 85\)
🔗

🔗
Construct a frequency distribution table for the data.
🔗

🔗
Find the Mean, Mode and Median for the data.
🔗

🔗
Discuss with other groups
🔗

🔗

🔗

\(\textbf{Key Takeaway and Definitions}\)

🔗

There are three main measures of central tendency:

🔗

\(\textbf{Mean}\)
🔗

Mean is the sum of all values divided by the total number of values. It is also known as Arithmetic Average.
🔗

\begin{align*} \textbf{Mean} = \amp \frac{ \textbf{∑X}}{ \textbf{N}} \end{align*}

🔗

Where \(\textbf{X}\) represents the values in the dataset and \(\textbf{N}\) is the total number of values.
🔗

A frequency distribution table can also be used to find the mean of ungrouped data, using the formula below:
🔗

\begin{align*} \overline{\textbf{x}} = \amp \frac{ \textbf{∑fx}}{ \textbf{∑f}} \end{align*}

🔗

Where;
🔗
- \(\textbf{x}\) represents the values in the dataset
  🔗
  
  🔗
- \(\overline{\textbf{x}}\) is the mean
  🔗
  
  🔗
- \(\textbf{∑fx}\) is the sum of products of \(\textbf{ x}\) and \(\textbf{ f}\)
  🔗
  
  🔗
- \(\textbf{∑f}\) is the Sum of frequencies
  🔗
  
  🔗
🔗
🔗
\(\textbf{Median}\)
🔗

Median is the middle value when the data is arranged in ascending or descending order. If the dataset has an odd number of values \(n\text{,}\) the median is the \(\left(\frac{n+1}{2}\right)^{\text{th}}\) value. If the dataset has an even number of values, the median is the average of the two middle values.

When \(n\) is even, the formula is:

\(\textbf{Median} \) = \(\frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2}\)
🔗

🔗
\(\textbf{Mode}\)
🔗

Mode is the most frequently occurring value in the dataset. A dataset can have:
🔗
- No mode: if no value repeats.
  
  🔗
- One mode (Unimodal): if one value appears most frequently.
  
  🔗
- Two modes (Bimodal): if two values appear equally most frequently.
  
  🔗
- Multimodal: if more than two values share the highest frequency.
  
  🔗
🔗
🔗
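The three definitions above translate directly into code. The following Python sketch is illustrative only: the function names and the list `sample` are hypothetical, and `modes` returns an empty list when no value repeats, following the "no mode" case described above.

```python
from collections import Counter


def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # Positions n/2 and n/2 + 1 are indices middle - 1 and middle (counting from 0).
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(values):
    """All values that share the highest frequency; empty if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data, used only to show the functions at work.
sample = [3, 5, 5, 8, 9, 12]
print(mean(sample))    # 7.0
print(median(sample))  # 6.5
print(modes(sample))   # [5]
```

Python's standard `statistics` module offers `mean`, `median` and `multimode` for the same purpose; the hand-written versions are shown here because they follow the definitions step by step.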

🔗

Example 3.1.9.

In a class of \(30\) students, the test scores of students are:

🔗

\(45, 67, 89, 56, 45, 78, 90, 67, 81, 73, 55, 62, 77, 84, 91,\) \(69, 58, 72, 88, 95, 60, 75, 45, 67, 80, 92, 87, 79, 68, 55\)

🔗

Find the:

🔗

Mean
🔗

🔗
Median
🔗

🔗
Mode
🔗

🔗

🔗

Solution.

The formula for the mean is:
🔗

\begin{align*} \textbf{Mean} = \amp \frac{ \textbf{∑X}}{ \textbf{N}} \end{align*}

🔗

Where;
🔗
- \(∑X\) is the sum of all values
  🔗
  
  🔗
- \(N\) is the total number of values
  🔗
  
  🔗
🔗
\(∑X\) = \(45+67+89+56+45+78+90+67+81+73+55+62\)

\(+77+84+91+69+58+72+88+95+60+75+45+67\)

\(+80+92+87+79+68+55\)

= \(2150\)

\begin{align*} \textbf{Mean} = \amp \frac{2150}{30}\\ \approx \amp 71.67 \end{align*}

Therefore, the mean is approximately \(71.67\text{.}\)
🔗

🔗
Median
🔗

The median is the middle value when data is arranged in ascending or descending order.
🔗

\(45, 45, 45, 55, 55, 56, 58, 60, 62, 67, 67, 67, 68, 69, 72,\)
🔗

\(73, 75, 77, 78, 79, 80, 81, 84, 87, 88, 89, 90, 91, 92, 95\)
🔗

Since there are \(30\) values (even number), the median is;
🔗

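\(\textbf{Median} \) = \(\frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2}\)

= \(\frac{\left(\frac{30}{2}\right)^{\text{th}} \text{ value} + \left(\frac{30}{2} + 1\right)^{\text{th}} \text{ value}}{2}\) = \(\frac{15^{\text{th}} \text{ value} + 16^{\text{th}} \text{ value}}{2}\)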
🔗

\(15\textbf{th}\) = \(72\)
🔗

\(16\textbf{th}\) = \(73\)
🔗

\begin{align*} \textbf{Median} = \amp \frac{72+73}{2}\\ = \amp \frac{145}{2}\\ = \amp 72.5 \end{align*}

🔗

Therefore the median is \(72.5\)
🔗

🔗
Mode
🔗

The mode is the most frequently occurring value.
🔗

From the sorted data:
🔗
- \(45\) appears \(3\) times
  🔗
  
  🔗
- \(67\) appears \(3\) times
  🔗
  
  🔗
🔗
Since \(45\) and \(67\) appear most frequently, this dataset is bimodal with modes: \(45\) and \(67\text{.}\)
🔗

🔗

🔗

Example 3.1.10.

The frequency distribution table below shows marks of \(20\) students in a Grade 10 class.

🔗

Table 3.1.11.

🔗

Marks (x)	Frequency (f)	fx
\(2\)	\(3\)	\(6\)
\(4\)	\(2\)	\(8\)
\(6\)	\(4\)	\(24\)
\(7\)	\(3\)	\(21\)
\(9\)	\(5\)	\(45\)
\(11\)	\(2\)	\(22\)
\(12\)	\(1\)	\(12\)
\(Σx = 51\)	\(Σf = 20\)	\(Σfx = 138\)

Find the mean.

Find the mode.
🔗

🔗

🔗

Solution.

To find the mean
🔗

\begin{align*} \overline{\textbf{x}} = \amp \frac{ \textbf{∑fx}}{ \textbf{∑f}} \\ = \amp \frac{138}{20}\\ = \amp 6.9 \end{align*}

🔗

Therefore, the mean is \(6.9\)
🔗

🔗
The mode is the mark with the highest frequency. In this case, the highest frequency is \(5\text{,}\) which corresponds to \(9\text{.}\)
🔗

Therefore, Mode is \(9\)
🔗

🔗

🔗
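The calculation of the mean from a frequency table can be sketched in Python as follows. The variable names are illustrative; the marks and frequencies are those of Table 3.1.11.

```python
# Marks (x) and their frequencies (f) from Table 3.1.11
marks = [2, 4, 6, 7, 9, 11, 12]
frequencies = [3, 2, 4, 3, 5, 2, 1]

total_f = sum(frequencies)                                   # sum of f
total_fx = sum(f * x for x, f in zip(marks, frequencies))    # sum of f times x

mean = total_fx / total_f

# The mode is the mark paired with the largest frequency.
mode = marks[frequencies.index(max(frequencies))]

print(total_f, total_fx)  # 20 138
print(mean)               # 6.9
print(mode)               # 9
```

Note that `index` returns only the first mark with the largest frequency, so this shortcut suits a table with a single mode, as here.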

Exercises

1.

The number of books borrowed by students from a school library in a week is as follows:

🔗

\(3, 5, 2, 4, 6, 3, 5, 7, 4, 3, 6, 2, 4, 5, 3, 6\)

🔗

Find the Mean (Average) number of books borrowed.
🔗

🔗
Find the Median number of books borrowed.
🔗

🔗
Find the Mode of the books borrowed.
🔗

🔗

🔗

2.

The following frequency distribution table represents volume of water (in liters) contained in different bottles:

🔗

Table 3.1.12.

🔗

Volume of water (liters)	Number of bottles
\(34.5\)	\(3\)
\(35.8\)	\(4\)
\(37.2\)	\(2\)
\(39.0\)	\(3\)
\(40.4\)	\(3\)

Find the mean volume of water in the bottles.
🔗

🔗
Determine the mode of the data.
🔗

🔗
Find the median volume.
🔗

🔗

🔗

3.

A company records the monthly salaries (in KES) of its employees.

🔗

Table 3.1.13.

🔗

Salary (KES)	Frequency
\(25,000\)	\(8\)
\(30,000\)	\(15\)
\(35,000\)	\(12\)
\(40,000\)	\(10\)
\(50,000\)	\(5\)

Calculate the mean, median and mode.

🔗

4.

A factory records the number of products manufactured in a week:

🔗

\(150, 130, 220, 135, 180, 140, 125, 250, 145, 230, 200, 205, 145, 190, 155,\)

🔗

\(210, 225, 240, 135, 165, 245, 170, 175, 185, 130, 190, 195, 160, 170, 150,\)

🔗

\(160, 120, 200, 210, 220, 235, 180, 230, 240, 215\)

🔗

Make a frequency distribution table for the set of data.
🔗

🔗
Calculate the mean, mode and median.
🔗

🔗

🔗

Subsubsection 3.1.3.2 Grouped Data for Measures of Central Tendency

Grouped data is data that has been organized into classes or intervals together with their frequencies.

🔗

Instead of listing every single value (as in ungrouped data), the observations are arranged into groups (class intervals) to make large data sets easier to understand, analyze, and interpret.

🔗

Activity 3.1.5.

\(\textbf{Work in groups}\)

🔗

The table below shows the amount of pocket money, in shillings, that parents give to students per week.

🔗

Table 3.1.14.

🔗

Pocket Money (Ksh)	\(100 - 199\)	\(200 - 299\)	\(300 - 399\)	\(400 - 499\)	\(500 - 599\)
Number of students	\(8\)	\(15\)	\(22\)	\(20\)	\(10\)

What is the modal class of pocket money given to students?
🔗

🔗
Calculate the mean of pocket money given to students per week.
🔗

🔗
Find the median amount of pocket money from the given data.
🔗

🔗
Discuss with other groups.
🔗

🔗

🔗

\(\textbf{Key Takeaway and Definitions}\)

🔗

\(\textbf{Mean}\)
🔗

A frequency distribution table can be used to find the mean of grouped data, using the formula below:
🔗

\begin{align*} \overline{\textbf{x}} = \amp \frac{ \textbf{∑fx}}{ \textbf{∑f}} \end{align*}

🔗

Where;
🔗
- \(\textbf{x}\) is the midpoint
  🔗
  
  🔗
- \(\overline{\textbf{x}}\) is the mean
  🔗
  
  🔗
- \(\textbf{∑fx}\) is the sum of products of \(\textbf{ x}\) and \(\textbf{ f}\)
  🔗
  
  🔗
- \(\textbf{∑f}\) is the Sum of frequencies
  🔗
  
  🔗
🔗
\(\textbf{Midpoint}\) is the average of the lower and upper boundaries of a class interval. It represents the central value of each class.
🔗

\begin{align*} \textbf{Midpoint} = \amp \frac{\textbf{Lower boundary + Upper boundary}}{2} \end{align*}

🔗

🔗
\(\textbf{Median}\)
🔗

For grouped frequency data, we use interpolation to estimate the median using the following formula;
🔗

\begin{align*} \textbf{Median} = \amp \textbf{L} + (\frac{\frac{n}{2} - \textbf{CF}}{\textbf{F}})\times \textbf{C} \end{align*}

🔗

Where;
🔗
- \(\textbf{L}\) = Lower boundary of the median class.
  🔗
  
  🔗
- \(\textbf{n}\) = sum of all frequencies.
  🔗
  
  🔗
- \(\textbf{CF}\) = Cumulative frequency before the median class.
  🔗
  
  🔗
- \(\textbf{F}\) = Frequency of the median class.
  🔗
  
  🔗
- \(\textbf{C}\) = class width
  🔗
  
  🔗
🔗
🔗
\(\textbf{Mode}\)
🔗

The Modal class is the class with highest frequency.
🔗

🔗

🔗
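The grouped-data formulas above can be sketched in Python. Everything named here is an assumption made for illustration: the function names, the layout of each row as (lower boundary, upper boundary, frequency), and the hypothetical table `classes`.

```python
def grouped_mean(table):
    """Estimate the mean as (sum of f * midpoint) / (sum of f)."""
    total_f = sum(f for _, _, f in table)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return total_fx / total_f


def grouped_median(table):
    """Estimate the median by interpolation: L + ((n/2 - CF) / F) * C."""
    n = sum(f for _, _, f in table)
    cumulative = 0  # CF: cumulative frequency before the current class
    for lower, upper, f in table:
        if cumulative + f >= n / 2:        # this is the median class
            width = upper - lower          # C: class width
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f


def modal_class(table):
    """Return the boundaries of the class with the highest frequency."""
    lower, upper, _ = max(table, key=lambda row: row[2])
    return (lower, upper)


# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 3), (10, 20, 6), (20, 30, 8), (30, 40, 3)]

print(grouped_mean(classes))    # 20.5
print(grouped_median(classes))  # 21.25
print(modal_class(classes))     # (20, 30)
```

In the hypothetical table, \(n = 20\) so \(\frac{n}{2} = 10\text{;}\) the cumulative frequency reaches \(9\) before the class \(20 - 30\text{,}\) which makes that class the median class.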

Example 3.1.15.

A company records the monthly salaries (in KES) of \(50\) employees in a frequency distribution table below:

🔗

Table 3.1.16.

🔗

Salary Range(KES)	Number of Employees
\(20,000 - 29,999\)	\(3\)
\(30,000 - 39,999\)	\(5\)
\(40,000 - 49,999\)	\(7\)
\(50,000 - 59,999\)	\(10\)
\(60,000 - 69,999\)	\(9\)
\(70,000 - 79,999\)	\(6\)
\(80,000 - 89,999\)	\(5\)
\(90,000 - 99,999\)	\(3\)
\(100,000 - 109,999\)	\(2\)

Find the mean of the data.
🔗

🔗
Find the modal class of the data.
🔗

🔗

🔗

Solution.

To calculate the mean of the grouped data we need to find;

🔗

\(\textbf{∑fx}\) which is the sum of products of \(\textbf{x}\) and \(\textbf{f}\)
🔗

🔗
\(\textbf{∑f}\) is the sum of frequencies
🔗

🔗

🔗

Table 3.1.17.

🔗

Salary Range(KES)	Midpoint(x)	Number of Employees(f)	xf
\(20,000 - 29,999\)	\(25,000\)	\(3\)	\(25,000 \times 3 = 75,000\)
\(30,000 - 39,999\)	\(35,000\)	\(5\)	\(35,000 \times 5 = 175,000\)
\(40,000 - 49,999\)	\(45,000\)	\(7\)	\(45,000 \times 7 = 315,000\)
\(50,000 - 59,999\)	\(55,000\)	\(10\)	\(55,000 \times 10 = 550,000\)
\(60,000 - 69,999\)	\(65,000\)	\(9\)	\(65,000 \times 9 = 585,000\)
\(70,000 - 79,999\)	\(75,000\)	\(6\)	\(75,000 \times 6 = 450,000\)
\(80,000 - 89,999\)	\(85,000\)	\(5\)	\(85,000 \times 5 = 425,000\)
\(90,000 - 99,999\)	\(95,000\)	\(3\)	\(95,000 \times 3 = 285,000\)
\(100,000 - 109,999\)	\(105,000\)	\(2\)	\(105,000 \times 2 = 210,000\)
\(\textbf{Total (∑)}\)	\(585,000\)	\(50\)	\(3,070,000\)

\begin{align*} \overline{\textbf{x}} = \amp \frac{ \textbf{∑fx}}{ \textbf{∑f}} \\ = \amp \frac{ 3,070,000}{ 50}\\ = \amp 61,400 \end{align*}

🔗

Therefore, the mean is \(61,400\)

🔗

The modal class is \(50,000 - 59,999 \textbf{ KES}\text{,}\) since it is the class with the highest frequency (\(10\)).
🔗

🔗

🔗

Example 3.1.18.

The data below represents the times (in seconds) recorded in the heats of a \(100 \textbf{ m}\) race during an athletics event:

🔗

\(14.5, 13.7, 14.8, 15.3, 15.1, 14.2, 14.9, 12.6, 11.9, 13.1, 12.3, 14.7, 14.1, 15.0,\)

🔗

\(14.3, 15.2, 11.7, 12.9, 13.5, 15.4, 12.8, 12.1, 14.4, 13.2, 14.6, 11.6, 12.7, 15.5\)

🔗

\(14.0, 14.9, 13.9, 12.0, 13.8, 15.2, 13.3\)

🔗

Create a frequency distribution table using class intervals:
🔗
- \(\displaystyle 11.5 - 11.9\)
  
  🔗
- \(\displaystyle 12.0 - 12.4\)
  
  🔗
- \(\displaystyle 12.5 - 12.9\)
  
  🔗
- \(\displaystyle 13.0 - 13.4\)
  
  🔗
- \(\displaystyle 13.5 - 13.9\)
  
  🔗
- \(\displaystyle 14.0 - 14.4\)
  
  🔗
- \(\displaystyle 14.5 - 14.9\)
  
  🔗
- \(\displaystyle 15.0 - 15.4\)
  
  🔗
- \(\displaystyle 15.5 - 15.9\)
  
  🔗
🔗
🔗
Determine the modal class
🔗

🔗
Estimate median based on the frequency table.
🔗

🔗
Find the mean based on the frequency table.
🔗

🔗

🔗

Solution.

To create a frequency distribution table, we count how many values fall into each class interval.

Table 3.1.19.

Class Interval (seconds)	Midpoint(x)	Frequency(f)	Cumulative frequency(CF)	fx
\(11.5 - 11.9\)	\(11.7\)	\(3\)	\(3\)	\(35.1\)
\(12.0 - 12.4\)	\(12.2\)	\(3\)	\(6\)	\(36.6\)
\(12.5 - 12.9\)	\(12.7\)	\(4\)	\(10\)	\(50.8\)
\(13.0 - 13.4\)	\(13.2\)	\(3\)	\(13\)	\(39.6\)
\(13.5 - 13.9\)	\(13.7\)	\(4\)	\(17\)	\(54.8\)
\(14.0 - 14.4\)	\(14.2\)	\(5\)	\(22\)	\(71.0\)
\(14.5 - 14.9\)	\(14.7\)	\(6\)	\(28\)	\(88.2\)
\(15.0 - 15.4\)	\(15.2\)	\(6\)	\(34\)	\(91.2\)
\(15.5 - 15.9\)	\(15.7\)	\(1\)	\(35\)	\(15.7\)
\(\textbf{Total (∑)}\)	\(123.3\)	\(35\)	\(35\)	\(483.0\)

The modal class is the class interval with the highest frequency.

From the table, the highest frequency is \(6\text{,}\) which appears in two class intervals: \(14.5 - 14.9\) and \(15.0 - 15.4\text{.}\)

So, the data has two modal classes: \(14.5 - 14.9\) and \(15.0 - 15.4\text{.}\)

Total frequency (∑f) = \(35\)

Median position is \(\frac{35}{2}\) = \(17.5\)

The cumulative frequency just before \(17.5\) is \(17\text{,}\) and the next class reaches \(22\text{,}\) so the median class is \(14.0 - 14.4\text{.}\)

Using the median formula:

\begin{align*} \textbf{Median} = \amp \textbf{L} + (\frac{\frac{n}{2} - \textbf{CF}}{\textbf{F}})\times \textbf{C} \end{align*}

Where;
- L = \(14.0\) (lower boundary of median class)
- \(\frac{n}{2}\) = \(17.5\)
- CF = \(17\) (cumulative frequency before the median class)
- F = \(5\) (frequency of the median class)
- C = \(0.5\) (class width)

\begin{align*} \textbf{Median} = \amp 14.0 + (\frac{17.5 - 17}{5})\times 0.5\\ = \amp 14.0 + (\frac{0.5}{5}\times 0.5)\\ = \amp 14.0 + 0.05\\ = \amp 14.05 \end{align*}

Thus, the median time is \(14.05 \textbf{ seconds}\text{.}\)

The mean is given by;

\begin{align*} \overline{\textbf{x}} = \amp \frac{ \textbf{∑fx}}{ \textbf{∑f}} \end{align*}

From the table;

\(\textbf{∑fx}\) = \(483.0\)

\(\textbf{∑f}\) = \(35\)

\begin{align*} \overline{\textbf{x}} = \amp \frac{ 483.0}{ 35}\\ = \amp 13.8 \end{align*}

Thus, the mean time is \(13.8 \textbf{ seconds}\text{.}\)
🔗

🔗

🔗

Exercises

1.

Mathematics test scores for \(60\) students in a class from Ichina primary school are:

🔗

\(32, 45, 12, 56, 38, 74, 60, 29, 41, 50, 55, 67, 72, 31, 47,\)

🔗

\(39, 18, 26, 64, 42, 48, 52, 69, 77, 35, 58, 23, 19, 61, 54,\)

🔗

\(70, 33, 28, 37, 44, 46, 30, 49, 79, 62, 21, 16, 53, 57, 40,\)

🔗

\(34, 25, 68, 66, 51, 59, 71, 27, 20, 36, 43, 63, 65, 75, 80\)

🔗

Create a frequency distribution table using class intervals of \(10\text{,}\) starting from \(10 - 19\text{,}\) \(20 - 29\text{,}\) \(30 - 39\text{,}\) ..., \(80 - 89\text{.}\)
🔗

🔗
Determine the modal class
🔗

🔗
Estimate the mean and median from the frequency table.
🔗

🔗

🔗

2.

The populations of \(45\) towns in Kakamega were recorded as follows:

🔗

\(152, 168, 140, 155, 172, 184, 176, 193, 150, 160, 175, 143, 182, 164, 149,\)

🔗

\(170, 185, 157, 169, 188, 154, 178, 166, 147, 190,\)

🔗

\(145, 180, 158, 137, 174, 192, 141, 165, 187, 144,\)

🔗

\(162, 153, 171, 139, 148, 156, 183, 177, 186, 159\)

🔗

Create a grouped frequency table with class intervals of \(10\text{,}\) starting from \(135 - 144\text{.}\)
🔗

🔗
Determine the modal class.
🔗

🔗
Estimate the mean and median from the distribution.
🔗

🔗

🔗

3.

The monthly electricity bills (in KES) of households in a town are recorded in the table below:

🔗

Table 3.1.20.

🔗

Electricity Bill (KES)	Frequency (f)
\(1,000 - 1,999\)	\(6\)
\(2,000 - 2,999\)	\(10\)
\(3,000 - 3,999\)	\(14\)
\(4,000 - 4,999\)	\(12\)
\(5,000 - 5,999\)	\(8\)

Find the median electricity bill.
🔗

🔗
Identify the modal class.
🔗

🔗

🔗

4.

A researcher collects data on daily rainfall (in mm) over a month and organizes it into \(10\) equal class intervals.

🔗

Table 3.1.21.

🔗

Rainfall (mm)	Frequency (f)
\(0 - 9\)	\(2\)
\(10 - 19\)	\(4\)
\(20 - 29\)	\(6\)
\(30 - 39\)	\(8\)
\(40 - 49\)	\(10\)
\(50 - 59\)	\(12\)
\(60 - 69\)	\(9\)
\(70 - 79\)	\(6\)
\(80 - 89\)	\(4\)
\(90 - 100\)	\(2\)

Identify the modal class.
🔗

🔗
Calculate the Mean rainfall.
🔗

🔗
Determine the Median rainfall.
🔗

🔗

🔗

Subsection 3.1.4 Representation of Data

Representation of data is the process of presenting collected information (data) in an organized and visual form so that it is easy to understand, interpret and analyze.

🔗

Instead of leaving data as raw numbers, we use tables, charts and graphs to show patterns, trends, and comparisons more clearly.

🔗

Subsubsection 3.1.4.1 Drawing Histograms and Frequency Polygons of Data

Activity 3.1.6.

Work in groups

🔗

The heights of the \(500\) students in a school were measured and recorded in the following table.

🔗

Table 3.1.22.

🔗

Height (cm)	Number of Students (Frequency)
\(140 - 149\)	\(30\)
\(150 - 159\)	\(70\)
\(160 - 169\)	\(110\)
\(170 - 179\)	\(150\)
\(180 - 189\)	\(90\)
\(190 - 199\)	\(50\)

Choose a suitable scale and represent the data on a histogram and a frequency polygon.
🔗

🔗
Compare and discuss your graphs with other groups.
🔗

🔗

🔗

\(\textbf{Key Takeaway and Definitions}\)

🔗

A histogram uses adjacent bars to show a frequency distribution, while a frequency polygon joins the midpoints of the tops of the bars with straight lines to show the pattern of the data.

🔗

\(\textbf{Class Width}\) is the difference between the upper and lower boundaries of a class.

🔗

\(\textbf{Equal class width}\) means all bars have the same width.

🔗

\(\textbf{Unequal class width}\) means bars have different widths to better represent uneven data.

🔗

\(\textbf{Frequency density}\) is a measure used in histograms to ensure that the area of each bar represents the actual frequency of observations, especially when class widths are unequal.

🔗

Frequency density is calculated using the formula:

🔗

\begin{align*} \textbf{Frequency density} = \amp \frac{\textbf{Frequency}}{\textbf{Class width}} \end{align*}

🔗

Where;

🔗

\(\textbf{Frequency}\) is the number of observations in a class interval.

🔗

\(\textbf{Why Use Frequency Density Instead of Frequency?}\)

🔗

In a histogram, the area of each bar (not just the height) represents the frequency.
🔗

🔗
If class widths are unequal, simply plotting frequency would distort the representation.
🔗

🔗
Using frequency density ensures that the area of each bar remains proportional to the actual frequency.
🔗

🔗

🔗

\(\textbf{Midpoint}\) of a class interval represents the central value of that range. It is the average of the lower and upper boundaries of the class.

🔗

Formula for midpoint:

🔗

\begin{align*} \textbf{Midpoint} = \amp \frac{\textbf{Lower bound + Upper bound}}{2} \end{align*}

🔗

Example 3.1.23.

The table below presents the salary distribution of employees in a company.

🔗

Table 3.1.24.

🔗

Salary Range (KSh)	Frequency
\(1000 - 1500\)	\(42\)
\(1500 - 2000\)	\(35\)
\(2000 - 2500\)	\(20\)
\(2500 - 3000\)	\(15\)
\(3000 - 4000\)	\(18\)
\(4000 - 5000\)	\(42\)

Draw a histogram and a frequency polygon to represent the data.

🔗

Solution.

To draw a histogram and a frequency polygon, we need to find the frequency density and the midpoint of each class interval.

🔗

We use frequency density instead of frequency to draw the histogram because the class widths are unequal.

🔗

The formula for frequency density:

🔗

\begin{align*} \textbf{Frequency density} = \amp \frac{\textbf{Frequency}}{\textbf{Class width}} \end{align*}

🔗

The formula for Midpoint:

🔗

\begin{align*} \textbf{Midpoint} = \amp \frac{\textbf{Lower bound + Upper bound}}{2} \end{align*}

🔗

Table 3.1.25.

🔗

Salary range(Ksh.)	Frequency	Class width	Frequency density	Midpoint
\(1000 - 1500\)	\(42\)	\(500\)	\(0.084\)	\(1250\)
\(1500 - 2000\)	\(35\)	\(500\)	\(0.070\)	\(1750\)
\(2000 - 2500\)	\(20\)	\(500\)	\(0.040\)	\(2250\)
\(2500 - 3000\)	\(15\)	\(500\)	\(0.030\)	\(2750\)
\(3000 - 4000\)	\(18\)	\(1000\)	\(0.018\)	\(3500\)
\(4000 - 5000\)	\(42\)	\(1000\)	\(0.042\)	\(4500\)
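The columns of this table can be reproduced with a short Python sketch. The function name `histogram_rows` and the tuple layout are hypothetical choices for illustration; the class bounds and frequencies come from Table 3.1.24.

```python
import math

# (lower bound, upper bound, frequency) for each class in Table 3.1.24
salary_classes = [
    (1000, 1500, 42),
    (1500, 2000, 35),
    (2000, 2500, 20),
    (2500, 3000, 15),
    (3000, 4000, 18),
    (4000, 5000, 42),
]


def histogram_rows(classes):
    """Return (class width, frequency density, midpoint) for each class."""
    rows = []
    for lower, upper, frequency in classes:
        width = upper - lower
        density = frequency / width          # height of the histogram bar
        midpoint = (lower + upper) / 2       # x-value for the frequency polygon
        rows.append((width, round(density, 3), midpoint))
    return rows


for row in histogram_rows(salary_classes):
    print(row)

# Check: the area of each bar (density x width) gives back the frequency.
for lower, upper, frequency in salary_classes:
    width = upper - lower
    assert math.isclose((frequency / width) * width, frequency)
```

The final loop illustrates why frequency density is used: although the first and last classes both have a frequency of \(42\text{,}\) the last bar is twice as wide, so it is drawn half as tall and the two areas stay equal.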

🔗

Example 3.1.26.

The following frequency distribution shows the daily rainfall amounts (in mm) recorded at a weather station over a \(60\) day period.

🔗

Table 3.1.27.

🔗

Rainfall(mm)	Frequency
\(0 - 5\)	\(22\)
\(6 - 10\)	\(15\)
\(11 - 15\)	\(12\)
\(16 - 25\)	\(8\)
\(26 - 40\)	\(3\)

Create a histogram to represent this data.

Solution.

To draw the histogram, we need to find the frequency density because the class widths are unequal.

The formula for frequency density:

\begin{align*} \textbf{Frequency density} = \amp \frac{\textbf{Frequency}}{\textbf{Class width}} \end{align*}

Table 3.1.28.

Rainfall (mm)	Frequency	Class Width	Frequency Density
\(0 - 5\)	\(22\)	\(5\)	\(4.4\)
\(6 - 10\)	\(15\)	\(5\)	\(3.0\)
\(11 - 15\)	\(12\)	\(5\)	\(2.4\)
\(16 - 25\)	\(8\)	\(10\)	\(0.8\)
\(26 - 40\)	\(3\)	\(15\)	\(0.2\)
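
To see why frequency density is the right height for each bar, we can check that the area of every bar (frequency density multiplied by class width) gives back the class frequency. The sketch below is only illustrative: it uses the frequencies and class widths as they are listed in Table 3.1.28, and the variable names are made up for this example.

```python
import math

# Frequencies and class widths as listed in Table 3.1.28.
frequencies = [22, 15, 12, 8, 3]
class_widths = [5, 5, 5, 10, 15]

# Height of each bar: frequency density = frequency / class width.
densities = [f / w for f, w in zip(frequencies, class_widths)]

# Area of each bar: height x width. This should equal the class frequency.
areas = [d * w for d, w in zip(densities, class_widths)]

# The total area should equal the number of days recorded.
total_area = sum(areas)

print(densities)
print(areas, total_area)

# Confirm (up to floating-point rounding) that area matches frequency.
for area, frequency in zip(areas, frequencies):
    assert math.isclose(area, frequency)
```

The total area of the bars matches the \(60\) days in the record, which is what makes the histogram a fair picture of the data even though the bars have different widths.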

Exercises

1.

The following data represents the heights (in cm) of \(30\) students in a class:

\(150, 155, 160, 162, 165, 158, 170, 172, 168, 153, 163, 167, 175, 178, 161,\)

\(156, 169, 171, 159, 164, 173, 176, 157, 166, 174, 177, 154, 165, 179, 160\)

Create a frequency table with class intervals of width \(5 \textbf{ cm}\) and include the midpoint of each class.

Using the frequency table you created, draw a histogram to represent the data.

Draw a frequency polygon on the same axes as your histogram.

Label your axes and give your graphs appropriate titles.

2.

The following data represents the ages of \(25\) people in a community meeting:

\(20, 25, 30, 35, 40, 22, 28, 33, 38, 42, 27, 32, 37, 41,\)

\(24, 29, 34, 39, 43, 26, 31, 36, 44, 23, 45\)

Create a frequency table with class intervals of \(5\) years.

Draw a histogram to represent the data.

Draw a frequency polygon on the same axes.

Label the axes and provide titles for your graphs.

3.

A school collected data on the number of books read by students in a term. The following frequency table shows the results:

Table 3.1.29.

Number of Books Read	Frequency
\(0 - 2\)	\(15\)
\(3 - 5\)	\(25\)
\(6 - 8\)	\(35\)
\(9 - 11\)	\(15\)
\(12 - 14\)	\(10\)

Draw a histogram to represent the data in the frequency table.

On the same axes, draw a frequency polygon.

Estimate the median number of books read. Explain your reasoning.

4.

A survey was conducted to find out how much time people spend on social media daily. The following data was collected:

Table 3.1.30.

Time (Minutes)	Frequency	Class Width	Frequency Density
\(0 - 10\)	\(15\)	\(10\)	\(1.5\)
\(10 - 20\)	\(25\)	\(10\)	\(2.5\)
\(20 - 30\)	\(30\)	\(10\)	\(3.0\)
\(30 - 60\)	\(40\)	\(30\)	\(1.33\)
\(60 - 120\)	\(20\)	\(60\)	\(0.33\)

Draw a histogram to represent the social media usage data.

Draw a frequency polygon to represent the social media usage data.

Label your axes and provide appropriate titles for your graphs.

Subsection 3.1.5 Interpretation of data

Interpretation of data is the process of examining and explaining the meaning of organized or represented data in order to draw conclusions, make decisions or solve problems.

Once data is collected and represented in tables, graphs or charts, we look for patterns, relationships and trends to understand what the data is telling us.

Subsubsection 3.1.5.1 Interpreting Histograms and Frequency Polygons of Data

Activity 3.1.7.

\(\textbf{Work in groups}\)

The histogram below represents a household’s daily water consumption (in litres) recorded over a month.

Determine the day when the water consumption was highest.

Determine the day when the water consumption was lowest.

Discuss and share with other groups.

\(\textbf{Key Takeaway and Definitions}\)

Interpretation of data helps us understand collected information by finding patterns, trends, and connections so we can make better decisions.

Example 3.1.31.

The histogram below represents the ages of attendees recorded by the organizers at a community event.

How many age groups are represented in the histogram?

What is the total number of attendees recorded in the histogram?

Which age group has the highest number of attendees?

Solution.

By counting the number of bars in the histogram, we can determine the number of age groups.

There are \(5\) bars.

Therefore, five age groups attended the event.

The total number of attendees is the sum of all frequencies (heights of the bars).

\(50+25+40+35+15 = 165\)

Therefore, the total number of attendees was \(165\).

The age group corresponding to the tallest bar has the highest number of attendees.

Therefore, the age group \(10 - 15\) had the highest number of attendees, with a total of \(50\) participants.
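
The same three readings can be reproduced with a short program once the bar heights have been read off the histogram. This is only an illustrative sketch: the variable names are made up for this example, and the frequencies are listed in the order in which they are added in the solution, which may not be the left-to-right order of the bars.

```python
# Bar heights (frequencies) read from the histogram, in the order
# they appear in the sum 50 + 25 + 40 + 35 + 15 in the solution.
bar_frequencies = [50, 25, 40, 35, 15]

# (a) Each bar stands for one age group.
number_of_groups = len(bar_frequencies)

# (b) The total number of attendees is the sum of the frequencies.
total_attendees = sum(bar_frequencies)

# (c) The tallest bar marks the age group with the most attendees.
highest_frequency = max(bar_frequencies)

print(number_of_groups, total_attendees, highest_frequency)
```

Reading the class label under the tallest bar then identifies the age group, which the solution gives as \(10 - 15\).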

Example 3.1.32.

The graph below represents a histogram and frequency polygon of the distribution of exam scores of students in a Grade 10 class.

Describe the shape of the distribution of exam scores.

What is the midpoint of the class interval \(70 - 85\text{?}\)

Compare the height of the first bar (\(40 - 55\) score range) to the height of the last bar (\(85 - 95\) score range). What does this tell you about the number of students in those score ranges?

Solution.

The distribution is skewed to the right (positively skewed).

The midpoint is \(\displaystyle \frac{70 + 85}{2} = 77.5\).

The first bar is much taller than the last bar. This means that many more students got scores in the \(40 - 55\) range than in the \(85 - 95\) range.

Exercises

1.

The following histogram shows sales of milk (in litres) sold by Akiru.

What does the y-axis represent?

What does the x-axis represent?

Which day did Akiru

Describe the shape of the histogram. Is it symmetrical or skewed? If skewed, is it skewed left or right?

2.

The following histogram shows the heights of students in a Grade 10 class.

Use the information from the graph to answer the following questions:

Calculate the frequency of individuals with heights between \(145 \textbf{ cm}\) and \(155 \textbf{ cm}\). Show your working.

Identify the modal class.

Estimate the total number of individuals represented in the histogram.

Explain one difference between a histogram and a bar graph.

Describe the overall shape of the height distribution shown in the histogram.

3.

The following graph shows a histogram and frequency polygon of the weight of girls in a class.

4.

Interpret the histogram and frequency polygon graph below and answer the questions given.

Describe the overall shape of this rainfall distribution graph.

At which rainfall ranges do the frequencies seem to decline?

Calculate the total frequency across all rainfall ranges.

What range of rainfall appears most frequently?

Estimate the median rainfall range from this distribution.

Which rainfall range appears to be the mode of this distribution?

What might cause variations in rainfall distribution?
