Define and distinguish between different types of data (primary, secondary, and tertiary).
Identify appropriate data collection methods, including surveys, experiments, and observations.
Organize and present data using frequency distribution tables for both grouped and ungrouped data.
Compute and interpret measures of central tendency (mean, median, and mode) to summarize datasets.
Visualize data effectively through histograms and frequency polygons.
Apply statistical methods to analyze real-world problems and draw meaningful conclusions.
Subsection 3.1.1 Collection of Data
Subsubsection 3.1.1.1 Sources of Data
Activity3.1.1.
\(\textbf{Work in groups}\)
A teacher assigned students a research topic on the effects of social media on teenage mental health in Kenya.
Suggest one potential primary data collection method that could be used.
Suggest one potential secondary data source that could be used.
What is the difference between primary and secondary data?
Why is secondary data important for research and decision-making?
Discuss and share with other groups.
\(\textbf{Key Takeaway}\)
Data can be obtained from primary sources, which are original first-hand data collected directly for a specific purpose (e.g., surveys, interviews), secondary sources, which are previously collected data used for a different purpose (e.g., government reports, research articles), and tertiary sources, which summarize and compile information from primary and secondary sources (e.g., dictionaries, textbooks, encyclopedias).
Example3.1.1.
Classify each of the following as a primary or secondary data source:
A student conducts a survey to find out the favorite sports of their classmates.
A teacher uses last year’s national exam results to analyze student performance trends.
A researcher reads a government report on the most common diseases in Kenya.
A doctor observes a patient’s symptoms and records them for a medical study.
A scientist conducts an experiment to test the growth rate of plants under different conditions.
Solution.
Primary because the student is collecting first-hand data directly from people.
Secondary because the exam results were already collected and recorded by an external body.
Secondary because the report was compiled and published by someone else for a different purpose.
Primary because the doctor is directly collecting new data from real-life observation.
Primary because the scientist is generating new data through an experiment.
Example3.1.2.
A community group wants to understand the needs of youth in their area.
What primary data collection methods could the group use?
What problems might they have when trying to find this information?
How can they use the information they find to help the community?
What secondary data sources could be useful to this community group?
Solution.
Focus groups, surveys, interviews.
Overcoming language barriers.
Getting honest responses.
They can start new programs for young people.
They can tell leaders what young people need.
Academic studies on youth issues.
Government reports on youth.
Exercises
1.
Grade \(10\) students from Korinyang Primary School went to Lake Nakuru National Park and counted the flamingos they saw for their Biology project.
Is this a primary or secondary data source?
Give a reason for your answer.
2.
In a market, a shop owner watches what customers buy most often and looks for trends based on what they see in the store.
Which sources of data can the shop owner use to collect the data?
3.
Define the following terms.
Data Source
Raw Data
Tertiary Sources
Primary sources
4.
What are the advantages of using secondary data over primary data in some research situations?
Subsubsection 3.1.1.2 Methods of Data Collection
Activity3.1.2.
\(\textbf{Work in groups}\)
The following data represents the number of hours people spend watching television per week.
\(0 - 5\) hours: \(20\) people
\(6 - 10\) hours: \(35\) people
\(11 - 15\) hours: \(25\) people
\(16 - 20\) hours: \(10\) people
Create a table to organize this data.
What type of data collection method was most likely used to gather this data?
What are some other questions that could be asked to further explore this topic?
Discuss and share with other groups.
\(\textbf{Key Takeaway}\)
Methods of collecting data help us gather information. We can collect data directly through surveys, interviews, or experiments, or indirectly from books, reports, and online sources.
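As a rough illustration, the television data from Activity 3.1.2 can be organized into a frequency table with a short Python sketch. The variable names, the helper `build_table`, and the extra percentage column are assumptions made for this example and are not part of the activity.

```python
# Frequency table for the television data in Activity 3.1.2.
# Class labels and counts are copied from the activity; the
# percentage column is an extra added only for illustration.
tv_hours = [
    ("0 - 5", 20),
    ("6 - 10", 35),
    ("11 - 15", 25),
    ("16 - 20", 10),
]


def build_table(rows):
    """Return (label, frequency, percentage of total) for each class."""
    n = sum(count for _, count in rows)
    return [(label, count, round(100 * count / n, 1)) for label, count in rows]


total = sum(count for _, count in tv_hours)
table = build_table(tv_hours)

# Print the table with aligned columns.
print(f"{'Hours per week':<16}{'Frequency':>10}{'Percent':>10}")
for label, count, percent in table:
    print(f"{label:<16}{count:>10}{percent:>10}")
print(f"{'Total':<16}{total:>10}")
```

The total row gives a quick check that no class was left out when the data was copied into the table.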
Example3.1.3.
A student wants to find out the most popular extracurricular activities among their classmates.
What data collection method would be most appropriate?
What are two examples of specific questions they could ask?
Solution.
Survey/Questionnaire.
Two questions the student could ask are:
What extracurricular activities do you participate in?
How often do you participate in these activities?
Example3.1.4.
A local bakery conducted a study asking customers about their favorite types of pastries. The results are:
Cakes: \(45\)%
Cookies: \(30\)%
Breads: \(20\)%
Doughnuts: \(5\)%
What type of data collection method was most likely used to gather this data?
What is the most popular type of pastry among customers?
What percentage of customers prefer either Cakes or Cookies?
What type of pastry is the least popular among customers?
Solution.
A survey.
Cakes
\(45\% + 30\% = 75\%\)
Therefore, \(75\%\) of customers prefer either Cakes or Cookies.
Doughnuts
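The answers in this solution can be checked with a small Python sketch. The dictionary name and variable names are illustrative assumptions; the percentages are those given in Example 3.1.4.

```python
# Pastry preferences from Example 3.1.4, as percentages of customers.
preferences = {"Cakes": 45, "Cookies": 30, "Breads": 20, "Doughnuts": 5}

# The category with the largest and smallest percentage.
most_popular = max(preferences, key=preferences.get)
least_popular = min(preferences, key=preferences.get)

# Customers who prefer either Cakes or Cookies.
cakes_or_cookies = preferences["Cakes"] + preferences["Cookies"]

# The percentages should account for every customer.
total_percent = sum(preferences.values())

print(most_popular)       # Cakes
print(cakes_or_cookies)   # 75
print(least_popular)      # Doughnuts
print(total_percent)      # 100
```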
Exercises
1.
Which data collection method would be most suitable for finding out the most common types of litter found in your school compound?
Interviews
Surveys sent to parents
Observations
Analyzing government reports
2.
Describe one ethical consideration you should keep in mind when conducting interviews with community members.
3.
Why is it important to keep accurate records when conducting observations?
4.
Give one example of a situation where you would use secondary data collection in a Geography class.
Subsection 3.1.2 Representing Data using a Frequency Distribution Table
Activity3.1.3.
\(\textbf{Work in groups}\)
Below are the weekly pocket expenses (in Ksh) of a randomly selected group of \(25\) students.
\(\textbf{x}\) represents the values in the dataset
\(\overline{\textbf{x}}\) is the mean
\(\sum \textbf{fx}\) is the sum of the products of \(\textbf{x}\) and \(\textbf{f}\)
\(\sum \textbf{f}\) is the sum of the frequencies
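Taken together, these symbols correspond to the usual formula for the mean of a frequency distribution, \(\overline{x} = \frac{\sum fx}{\sum f}\). A minimal Python sketch of that calculation follows. The data values and the function name `frequency_mean` are made up for illustration and do not come from the activity.

```python
# Mean of a frequency distribution: sum of f*x divided by sum of f.
# The values below are hypothetical, chosen only to illustrate the formula.
values = [10, 20, 30, 40]      # x
frequencies = [2, 3, 4, 1]     # f


def frequency_mean(xs, fs):
    """Return sum(f * x) / sum(f) for paired values and frequencies."""
    if len(xs) != len(fs):
        raise ValueError("xs and fs must have the same length")
    total_f = sum(fs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    return sum_fx / total_f


mean = frequency_mean(values, frequencies)
print(mean)  # 240 / 10 = 24.0
```

For grouped data, the same function could be used with the class midpoints in place of the individual values, which gives an estimate of the mean rather than its exact value.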
\(\textbf{Median}\)
Median is the middle value when the data is arranged in ascending or descending order. If the dataset has an even number of values, the median is the average of the two middle values.
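When the dataset has an even number of values, \(n\text{,}\) the median is found using the formula: \(\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2}\)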
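The rule for the median described above, covering both odd and even numbers of values, can be sketched in Python as follows. The function name `find_median` and the two small datasets are hypothetical and are used only to show the two cases.

```python
def find_median(data):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(data)          # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[middle]
    # Even number of values: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = find_median([7, 3, 9, 5, 1])   # sorted: 1, 3, 5, 7, 9
even_example = find_median([8, 2, 6, 4])     # sorted: 2, 4, 6, 8

print(odd_example)   # 5
print(even_example)  # (4 + 6) / 2 = 5.0
```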
Create a frequency distribution table using class intervals of \(10\text{,}\) starting from \(10 - 19\text{,}\) \(20 - 29\text{,}\) \(30 - 39\text{,}\) ..., \(70 - 79\text{.}\)
Determine the modal class
Estimate the mean and median from the frequency table.
2.
The population of \(50\) towns in Kakamega was recorded as follows:
Create a grouped frequency table with class intervals of \(10\text{,}\) starting from \(135 - 144\text{.}\)
Determine the modal class.
Estimate the mean and median from the distribution.
3.
The monthly electricity bills (in KES) of households in a town are recorded in the table below:
Table3.1.20.
| Electricity Bill (KES) | Frequency (f) |
|---|---|
| \(1,000 - 1,999\) | \(6\) |
| \(2,000 - 2,999\) | \(10\) |
| \(3,000 - 3,999\) | \(14\) |
| \(4,000 - 4,999\) | \(12\) |
| \(5,000 - 5,999\) | \(8\) |
Find the median electricity bill.
Identify the modal class.
4.
A researcher collects data on daily rainfall (in mm) over a month and organizes it into \(10\) equal class intervals, as shown in the table below.
Table3.1.21.
| Rainfall (mm) | Frequency (f) |
|---|---|
| \(0 - 9\) | \(2\) |
| \(10 - 19\) | \(4\) |
| \(20 - 29\) | \(6\) |
| \(30 - 39\) | \(8\) |
| \(40 - 49\) | \(10\) |
| \(50 - 59\) | \(12\) |
| \(60 - 69\) | \(9\) |
| \(70 - 79\) | \(6\) |
| \(80 - 89\) | \(4\) |
| \(90 - 100\) | \(2\) |
Identify the modal class.
Calculate the Mean rainfall.
Determine the Median rainfall.
Subsection 3.1.4 Representation of Data
Subsubsection 3.1.4.1 Drawing Histograms and Frequency Polygons of Data
Activity3.1.6.
\(\textbf{Work in groups}\)
The heights of \(500\) students in a school were measured and recorded in the following table.
Table3.1.22.
| Height (cm) | Number of Students (Frequency) |
|---|---|
| \(140 - 149\) | \(30\) |
| \(150 - 159\) | \(70\) |
| \(160 - 169\) | \(110\) |
| \(170 - 179\) | \(150\) |
| \(180 - 189\) | \(90\) |
| \(190 - 199\) | \(50\) |
Choose a suitable scale and represent the data on a histogram and a frequency polygon.
Compare and discuss your graphs with other groups.
\(\textbf{Key Takeaway}\)
A histogram uses adjacent bars to show a frequency distribution, while a frequency polygon joins the midpoints of the tops of the bars with straight lines to show the pattern of the distribution.
\(\textbf{Class Width}\) is the difference between the upper and lower boundaries of a class.
\(\textbf{Equal class width}\) means all bars have the same width.
\(\textbf{Unequal class width}\) means bars have different widths to better represent uneven data.
\(\textbf{Frequency density}\) is a measure used in histograms to ensure that the area of each bar represents the actual frequency of observations, especially when class widths are unequal.
Frequency density is calculated using the formula: \(\text{Frequency density} = \frac{\text{Frequency}}{\text{Class width}}\)
\(\textbf{Frequency}\) is the number of observations in a class interval.
\(\textbf{Why Use Frequency Density Instead of Frequency?}\)
In a histogram, the area of each bar (not just the height) represents the frequency.
If class widths are unequal, simply plotting frequency would distort the representation.
Using frequency density ensures that the area of each bar remains proportional to the actual frequency.
\(\textbf{Midpoint}\) of a class interval represents the central value of that range. It is the average of the lower and upper boundaries of the class.
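The definitions of class width, midpoint, and frequency density above can be tried out with a short Python sketch. It uses the social media data listed in Table 3.1.30 of the exercises, treating each class as running from its lower limit to its upper limit. The function name `describe_class` is an assumption made for this example.

```python
# Classes as (lower boundary, upper boundary, frequency), from Table 3.1.30.
classes = [
    (0, 10, 15),
    (10, 20, 25),
    (20, 30, 30),
    (30, 60, 40),
    (60, 120, 20),
]


def describe_class(lower, upper, frequency):
    """Return (class width, midpoint, frequency density to 2 d.p.)."""
    width = upper - lower                  # difference of the boundaries
    midpoint = (lower + upper) / 2         # average of the boundaries
    density = frequency / width            # frequency per unit of width
    return width, midpoint, round(density, 2)


rows = [describe_class(*c) for c in classes]

for (lower, upper, frequency), (width, midpoint, density) in zip(classes, rows):
    print(f"{lower}-{upper}: width={width}, midpoint={midpoint}, "
          f"frequency={frequency}, density={density}")
```

Multiplying each unrounded frequency density by its class width gives back the frequency, which is why the area of each bar, and not its height, represents the frequency when class widths are unequal.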
Create a frequency table with class intervals of \(5\) years
Draw a histogram to represent the data.
Draw a frequency polygon on the same axes.
Label the axes and provide titles for your graphs.
3.
A school collected data on the number of books read by students in a term. The following frequency table shows the results:
Table3.1.29.
| Number of Books Read | Frequency |
|---|---|
| \(0 - 2\) | \(15\) |
| \(3 - 5\) | \(25\) |
| \(6 - 8\) | \(35\) |
| \(9 - 11\) | \(15\) |
| \(12 - 14\) | \(10\) |
Draw a histogram to represent the data from your frequency table.
On the same axes, draw a frequency polygon.
Estimate the median number of books read. Explain your reasoning.
4.
A survey was conducted to find out how much time people spend on social media daily. The following data was collected:
Table3.1.30.
| Time (Minutes) | Frequency | Class Width | Frequency Density |
|---|---|---|---|
| \(0 - 10\) | \(15\) | \(10\) | \(1.5\) |
| \(10 - 20\) | \(25\) | \(10\) | \(2.5\) |
| \(20 - 30\) | \(30\) | \(10\) | \(3.0\) |
| \(30 - 60\) | \(40\) | \(30\) | \(1.33\) |
| \(60 - 120\) | \(20\) | \(60\) | \(0.33\) |
Draw a histogram to represent the data.
Draw a frequency polygon to represent the data.
Label your axes and provide appropriate titles for your graphs.
Subsection 3.1.5 Interpretation of Data
Subsubsection 3.1.5.1 Interpreting Histograms and Frequency Polygons of Data
Activity3.1.7.
\(\textbf{Work in groups}\)
The histogram below represents a household’s daily water consumption (in liters) recorded over a month.
Identify the day on which water consumption was highest.
Identify the day on which water consumption was lowest.
Discuss and share with other groups.
\(\textbf{Key Takeaway}\)
Interpretation of data helps us understand collected information by finding patterns, trends, and connections so we can make better decisions.
Example3.1.31.
The histogram below represents the ages of attendees recorded by the organizers at a community event.
How many age groups are represented in the histogram?
What is the total number of attendees recorded in the histogram?
Which age group has the highest number of attendees?
Solution.
By counting the number of bars in the histogram, we can determine the number of age groups.
There are \(5\) bars.
Therefore, five age groups are represented in the histogram.
The total number of attendees is the sum of all frequencies (heights of the bars).
\(50 + 25 + 40 + 35 + 15 = 165\)
Therefore, the total number of attendees was \(165\text{.}\)
The age group corresponding to the tallest bar has the highest number of attendees.
Therefore, the age group \(10 - 15\) had the highest number of attendees, with a total of \(50\) participants.
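The three steps in this solution can be mirrored in a brief Python sketch. The bar heights are taken from the sum in the solution; the class labels are not reproduced because only the \(10 - 15\) group is named in the text, and the variable names are illustrative assumptions.

```python
# Bar heights from Example 3.1.31, in the order used in the solution's sum.
frequencies = [50, 25, 40, 35, 15]

# One bar per age group, so counting bars counts age groups.
number_of_groups = len(frequencies)

# The total number of attendees is the sum of all bar heights.
total_attendees = sum(frequencies)

# The tallest bar marks the age group with the most attendees.
tallest_bar = max(frequencies)
tallest_position = frequencies.index(tallest_bar) + 1  # 1-based position in this list

print(number_of_groups)   # 5
print(total_attendees)    # 165
print(tallest_bar)        # 50
```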
Example3.1.32.
The graph below represents a histogram and frequency polygon of the distribution of exam scores of students in a Grade 10 class.
Describe the shape of the distribution of exam scores.
What is the midpoint of the class interval \(70 - 85\text{?}\)
Compare the height of the first bar (\(40 - 55\) score range) to the height of the last bar (\(85 - 95\) score range). What does this tell you about the number of students in those score ranges?
Solution.
The distribution is skewed to the right (positively skewed).
\(\displaystyle \frac{70 + 85}{2} = 77.5\)
The first bar is much taller than the last bar. This means that many more students got scores in the \(40 - 55\) range than in the \(85 - 95\) range.
Exercises
1.
The following histogram shows sales of milk (in litres) sold by Akiru.
What does the y-axis represent?
What does the x-axis represent?
Which day did Akiru
What does the shape of the histogram tell you about the sales pattern of milk?
Describe the shape of the histogram. Is it symmetrical or skewed? If skewed, is it skewed left or right?
2.
The following histogram shows the heights of students in a Grade 10 class.
Use the information from the graph to answer the following questions:
Calculate the frequency of individuals with heights between \(145\) cm and \(155\) cm. Show your working.
Identify the modal class.
Estimate the total number of individuals represented in the histogram.
Explain one difference between a histogram and a bar graph.
Describe the overall shape of the height distribution shown in the histogram.
3.
The following graph shows a histogram and frequency polygon of the weight of girls in a class.
4.
Interpret the histogram and frequency polygon graph below and answer the questions given.
Describe the overall shape of this rainfall distribution graph.
At which rainfall ranges do the frequencies seem to decline?
Calculate the total frequency across all rainfall ranges.
What range of rainfall appears most frequently?
Estimate the median rainfall range from this distribution.
Which rainfall range appears to be the mode of this distribution?
What might cause variations in rainfall distribution?