🐍 Python and Numbers: The Definitive Guide for Social Researchers [Part 1 of 2]

Imagine this: you have 10,000 responses from a national survey. You need to calculate demographic percentages by region, create composite indices, and detect patterns in the responses. In Excel, this task would take days and would be full of risks: broken formulas, hard-to-track errors, and the constant threat of losing work due to a corrupt file.

As a social researcher, you deserve a tool that:

  • Automates the repetitive tasks that consume your time.

  • Guarantees the accuracy and reproducibility of your analyses.

  • Handles large volumes of data without complications.

Python offers exactly this. In this practical guide, I'll show you how to use Python to fundamentally transform your analysis process, saving you hours of manual work and eliminating common calculation errors.

What You'll Learn

  • Python number fundamentals: integers and decimals for social data.

  • Basic operations for processing survey data (percentages, averages, counts).

  • How to maintain precision in simple but essential calculations.

  • Basics for organizing and documenting your numerical analyses.

Don't worry if you've never programmed before: this guide assumes you're starting from scratch. We'll go step by step, using social research examples. In later parts, we'll explore advanced techniques, but for now, we'll focus on the basic concepts to get started.

Numbers in Social Research

Let's imagine a typical day analyzing research data. You'll primarily work with four types of numbers:

  • Simple counts: For example, when you count how many people responded to your survey. If you have 350 responses, you want to make sure that number is exact.

  • Averages: Like when you calculate the average age of your participants; if it's 34.5 years, you need to keep that decimal to be precise.

  • Percentages: Imagine you're measuring support for a public policy. 78% approval is clearer to communicate than 0.78 or "78 out of 100".

  • Proportions: Sometimes you need numbers between 0 and 1, like when calculating participation rates. A rate of 0.82 means 82% of your sample participated.

Each of these number types requires special treatment. In Excel, they all look the same in a cell, which can lead to confusion. Python will help us handle them more clearly and precisely, avoiding common errors like losing important decimals or confusing percentages with proportions.

Although each type of number has its particularities, we'll start by learning how Python helps us work with them in a basic and reliable way.
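To make the four kinds of numbers concrete, here is a minimal sketch. The figures and variable names are made up for illustration (they reuse the examples from the list above), and the only operation shown is moving between a proportion and a percentage:

```python
# Made-up figures, used only to illustrate the four kinds of numbers
survey_responses = 350        # simple count (integer)
average_age = 34.5            # average (decimal)
approval_percentage = 78.0    # percentage, on a 0-100 scale
participation_rate = 0.82     # proportion, on a 0-1 scale

# Moving between the two scales
participation_percentage = participation_rate * 100
approval_proportion = approval_percentage / 100

print(f"Participation: {participation_percentage:.0f}%")        # Participation: 82%
print(f"Approval as a proportion: {approval_proportion:.2f}")   # Approval as a proportion: 0.78
```

Keeping the scale in the variable name (percentage vs. rate or proportion) is one simple way to avoid mixing up 78 and 0.78 later.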

From Excel to Python: A New Paradigm

If you're used to Excel, switching to Python might seem challenging at first. But don't worry, it's like learning to use a new tool that will make your work easier. Let's look at a simple example we've all faced:

Imagine you're recording participants' ages in your study:

In Excel:

  • You write "42" in cell A1.

  • Is it a number or text? Depends on the cell format.

  • To calculate the average, you need to write =AVERAGE(A1:A100).

  • If you insert a row, your formulas might break!

In Python:

  • You write age = 42

    • Python automatically recognizes an integer.

    • You can use age in any calculation without worrying about its location.

    • Python raises a clear error if you try something invalid (like dividing by zero).

    • Operations are clearer: average = sum(ages) / len(ages)

This difference might seem small, but it makes your work more:

  • Secure: Python prevents common errors.

  • Clear: The code says exactly what it does.

  • Reproducible: Other researchers can verify your calculations.

  • Flexible: You can process thousands of data points with the same code.

It's like moving from a basic calculator to a tool that understands what you want to do and helps you do it correctly.
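As a rough sketch of the Python side of this comparison, the snippet below computes an average with sum() and len(). The list of ages is invented for illustration, and lists themselves are an assumption here, since this guide has not introduced them yet:

```python
# Hypothetical ages for five participants
ages = [42, 35, 28, 51, 44]

# The same idea as =AVERAGE(A1:A100), but without cell references
average = sum(ages) / len(ages)

print(f"Participants: {len(ages)}")       # Participants: 5
print(f"Average age: {average:.1f}")      # Average age: 40.0
```

Adding or removing an age changes the result automatically, because the calculation refers to the data by name rather than by position.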

Let's see how Python helps us work with numbers:

How Python Understands Numbers

Imagine you're starting to process data from your latest survey. When you write a number in Python, like:

participant_age = 42

Python automatically does several things for you:

1. Understands the data type

  • Knows that 42 is an integer (for counting people).

  • If you wrote 42.5, it would know you need decimals (for averages).

  • You don't need to "format cells" like in Excel.

2. Protects you from common errors

  • If you try to divide by zero, Python stops with a ZeroDivisionError instead of silently producing a wrong result.

  • If you accidentally mix text with numbers (like "42" + 1), it raises a TypeError.

  • Helps you maintain data integrity.

3. Makes calculations more intuitive

  • To add one person: participant_age + 1 # Gives 43.

  • To calculate percentages: (women / total) * 100

  • Operations read like you'd write them on paper.

4. Includes useful tools (built-in functions)

  • You can easily convert between number types.

  • Example: float(42) converts to decimal: 42.0

  • Example: int(42.8) converts to integer: 42 (it drops the decimals instead of rounding).

These tools are like "actions" you can perform on your numbers. Don't worry about memorizing everything now; we'll use them as needed in practical examples.
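The four points above can be tried out in a few lines. This is only a sketch: the try/except blocks are an addition that this guide has not covered yet, used here so the script keeps running after each error:

```python
participant_age = 42

# type() reports how Python classified each value
print(type(participant_age))    # <class 'int'>
print(type(42.5))               # <class 'float'>

# Dividing by zero stops the calculation with a ZeroDivisionError
try:
    participant_age / 0
except ZeroDivisionError as error:
    print(f"Division stopped: {error}")

# Mixing text and numbers raises a TypeError
try:
    "42" + 1
except TypeError as error:
    print(f"Mixing text and numbers stopped: {error}")

# Converting between number types
print(float(42))    # 42.0
print(int(42.8))    # 42
```

Without the try/except blocks, the script would stop at the first error and show the message, which is usually what you want while exploring data.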

The Two Basic Types of Numbers in Python

Python has two basic numeric types that cover most survey work: integers for counts and floats for values with decimals.

1. Integer Numbers (Integer): For counting whole things

Imagine you're processing an electoral survey:

# Electoral participation data
registered_voters = 5000    # Can't have half a voter
votes_cast = 4850          # We count whole votes
minimum_age = 18           # Ages are integer numbers

# Checking if a number is really an integer
decimal_number = 42.0
is_integer = decimal_number.is_integer()
print(f"Is 42.0 an integer?: {is_integer}")    # Shows: True

another_decimal = 42.5
is_integer = another_decimal.is_integer()
print(f"Is 42.5 an integer?: {is_integer}")    # Shows: False

💡 When to use integers? Use them when:

  • Counting people: "350 participants responded to the survey".

  • Recording exact ages: "the participant is 42 years old".

  • Need whole numbers: "we received 128 responses".

💡 Practical tip: The is_integer() method is useful when:

  • Processing data from files or surveys where numbers might come with decimals.

  • Need to verify if a decimal number is actually an integer (like 42.0).

  • Want to validate data before making conversions.

2. Decimal Numbers (Float): For averages and percentages

# Survey analysis
participation = 97.5          # Response percentage
average_age = 34.7           # Average age (with decimals)
women_proportion = 0.62      # 62% of the sample

# Rounding numbers for reports
rounded_age = round(average_age)      # Rounds to 35
print(f"Rounded age: {rounded_age} years")

# Controlling decimals with round()
adjusted_participation = round(participation, 1)    # One decimal: 97.5
print(f"Participation: {adjusted_participation}%")

adjusted_proportion = round(women_proportion, 2)    # Two decimals: 0.62
print(f"Women proportion: {adjusted_proportion}")

💡 When to use decimals? Use them when:

  • Calculating percentages: "75.5% agree".

  • Working with averages: "average age of 34.7 years".

  • Handling proportions: "0.62 represents 62% women".

  • Need precision: "development index of 0.823".

💡 Practical tip:

  • If you're counting something that can't be divided (people, responses), use integers.

  • If you need decimals for precision (averages, percentages), use decimals.

  • Python will help you maintain this distinction automatically.

  • The round() function is essential when:

    • You need to present results without too many decimals.

    • You want to control your data precision:

      • round(number) rounds to the nearest integer (exact halves such as 2.5 go to the nearest even integer).

      • round(number, 1) keeps one decimal.

      • round(number, 2) keeps two decimals.

    • Preparing data for reports or visualizations.
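One detail of round() is worth seeing in action before relying on it in a report. Python rounds exact halves to the nearest even number, and some decimals are stored very slightly below the value you typed. A short sketch with illustrative values:

```python
# Values that are not exact halves round as you would expect
print(round(34.7))      # 35
print(round(34.2))      # 34

# Exact halves go to the nearest even integer
print(round(2.5))       # 2
print(round(3.5))       # 4

# Some decimals are stored slightly below what you typed
print(round(2.675, 2))  # 2.67
```

If your field's reporting rules require halves to always round up, round() alone will not do that, so it is worth checking borderline values by hand.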

A Practical Example: Basic Analysis of a Social Intervention

Let's put into practice what we've learned with a real example. Imagine you're evaluating the impact of a job training workshop that was delivered in two different formats (in-person and virtual):

# 1. Recording workshop participants
in_person_participants = 150    # Group A (in-person)
virtual_participants = 150      # Group B (virtual)

# 2. Storing satisfaction scores (0-10)
in_person_satisfaction = 8.7     # Decimal scores for more precision
virtual_satisfaction = 7.5       # Python knows we need decimals here

# 3. Now Python helps us with calculations
# First, total participants
total_participants = in_person_participants + virtual_participants
print(f"Total participants: {total_participants}")    # Shows: 300

# Calculate and round average satisfaction
average_satisfaction = (in_person_satisfaction + virtual_satisfaction) / 2
rounded_average = round(average_satisfaction, 1)    # Round to one decimal
print(f"Average satisfaction: {rounded_average}")    # Shows: 8.1

# Check if the average is an integer
is_integer = average_satisfaction.is_integer()
print(f"Is the average an integer?: {is_integer}")    # Shows: False

# See the difference between groups
difference = in_person_satisfaction - virtual_satisfaction
print(f"Difference between groups: {difference:.1f}")    # Shows: 1.2

💡 What's happening here?

  • We use integers to count people (can't have half a participant).

  • We use decimals for ratings (we need that precision).

  • Python automatically maintains the correct number type.

  • We control decimals when displaying results (.1f shows one decimal).

💡 Advantages over Excel:

  • No need for complex formulas with cell references.

  • Calculations are clearer and easier to verify.

  • You can easily repeat the analysis with different data.

  • The code documents exactly what you did.

How Python Protects Your Numerical Data

As your analysis grows more complex, it helps to know that numbers in Python are immutable: a value such as 150 can never be changed in place. A variable can still be reassigned to a different value, so the safe habit is to store each update under a new name. Imagine you're processing survey data and want to update the number of participants:

# Initial participant registration
participants = 150

# If we try to modify the number in place...
# participants[0] = 2  # ❌ TypeError: 'int' object does not support item assignment

# The correct way is to create a new variable
updated_participants = participants + 10  # ✅ Correct: 160

💡 Why is this good for your research?

  • Protects your original data from accidental modifications.

  • Forces you to document each change you make.

  • Facilitates the reproducibility of your analysis.

  • Allows you to track the history of your calculations.

Practical tip: Create new variables with descriptive names that indicate what they represent:

  • day1_participants = 150 # Original data.

  • updated_participants = day1_participants + 50 # After more registrations.

  • final_participants = updated_participants + 25 # When registration closes.
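A small sketch may help show why new names are safer. A number itself cannot be altered, but a name can be pointed at a different number, and that silently discards the old value. The name original_count is hypothetical, introduced only for this illustration:

```python
day1_participants = 150
original_count = day1_participants          # a second name for the same value

# Reassigning the first name does not touch the second one
day1_participants = day1_participants + 50

print(f"original_count: {original_count}")          # original_count: 150
print(f"day1_participants: {day1_participants}")    # day1_participants: 200
```

Here day1_participants no longer holds the day 1 figure, which is exactly the kind of confusion that names like updated_participants avoid.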

Example: Tracking an Ongoing Survey

# First day of surveys
day1_responses = 50
print(f"First day responses: {day1_responses}")    # Shows: 50

# Second day: add more responses
cumulative_responses = day1_responses + 30
print(f"Cumulative responses: {cumulative_responses}")    # Shows: 80

# First day data remains intact
print(f"First day responses: {day1_responses}")    # Still shows: 50

# Calculate daily average
daily_average = cumulative_responses / 2
print(f"Daily average: {daily_average}")    # Shows: 40.0

💡 Notice how:

  • Each variable has a name that clearly describes its content.

  • Original data (day1_responses) remains unchanged.

  • Each new calculation generates a new variable with its own name.

  • Comments document what each line of code shows.

  • You can track the entire process from initial data to final result.

Basic Operations with Numbers

With a solid understanding of data protection, let's explore the essential operations you'll use most in your social research. We'll start with practical examples of demographic analysis:

Fundamental Operations

# Demographic analysis of study participants
total_sample = 120           # Total sample size
women = 75                  # Female participants
men = 45                    # Male participants

# 1. Addition (+): Total verification
calculated_total = women + men
print(f"Total participants: {calculated_total}")    # Should be 120
print(f"Do the numbers add up?: {calculated_total == total_sample}")    # True

# 2. Subtraction (-): Gap analysis
gender_gap = women - men
print(f"Gender difference: {gender_gap} more women")    # 30

# 3. Division (/): Percentage calculation
women_percentage = (women / total_sample) * 100
men_percentage = (men / total_sample) * 100
print(f"Gender distribution:")
print(f"- Women: {women_percentage:.1f}%")    # 62.5%
print(f"- Men: {men_percentage:.1f}%")    # 37.5%

# 4. Multiplication (*): Scale conversion
satisfaction_scale5 = 4.2    # Rating on a 1-5 scale
satisfaction_scale10 = satisfaction_scale5 * 2    # Doubling maps 1-5 onto 2-10, not 1-10
print(f"Satisfaction (doubled, 2-10 range): {satisfaction_scale10:.1f}")    # 8.4

💡 Notice how:

  • Each operation has a specific purpose in the analysis:

    • Addition → verify that data adds up.

    • Subtraction → analyze gaps or differences.

    • Division → calculate percentages and proportions.

    • Multiplication → convert between scales.

  • Variable names are descriptive and follow a clear convention.

  • Comments explain the purpose of each operation.

  • The code produces formatted results ready for reports.
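A note on scale conversion: doubling a rating that starts at 1 produces values from 2 to 10. If a true 1-10 scale is needed, one common option is a linear rescaling that maps the old minimum to the new minimum and the old maximum to the new maximum. The sketch below assumes that approach; the variable names are illustrative:

```python
satisfaction_scale5 = 4.2    # rating on a 1-5 scale (same value as above)

old_min, old_max = 1, 5
new_min, new_max = 1, 10

# How far along the old scale the rating sits (0 = minimum, 1 = maximum)
position = (satisfaction_scale5 - old_min) / (old_max - old_min)

# Place the rating at the same relative position on the new scale
satisfaction_rescaled = new_min + position * (new_max - new_min)

print(f"Position within the old scale: {position:.2f}")            # 0.80
print(f"Satisfaction (1-10 scale): {satisfaction_rescaled:.1f}")   # 8.2
```

Whichever conversion you choose, state it in your methods section, since 8.4 and 8.2 are both defensible depending on the rule used.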

💪 Practice What You've Learned!

Now that you understand the basics, let's strengthen your skills with some real-world challenges:

Challenge 1: Analyzing a Pilot Survey

You have the following data from a pilot survey:

  • 45 participants answered "Yes".

  • 30 participants answered "No".

  • 15 participants answered "Maybe".

Your mission:

  1. Calculate the total number of participants.

  2. Calculate the percentage for each response and round to one decimal using round()

  3. Verify that the percentages sum to 100%.

⚠️ Try solving it on your own before looking at the solution!

Solution:

# Survey data
yes_response = 45      # Participants who said "Yes"
no_response = 30      # Participants who said "No"
maybe_response = 15   # Participants who said "Maybe"

# 1. Calculate total
total_participants = yes_response + no_response + maybe_response
print(f"Total participants: {total_participants}")    # 90

# 2. Calculate and round percentages
yes_percentage = round((yes_response / total_participants) * 100, 1)
no_percentage = round((no_response / total_participants) * 100, 1)
maybe_percentage = round((maybe_response / total_participants) * 100, 1)

# 3. Show results
print("\nResponse distribution:")
print(f"- Yes:    {yes_percentage}%")      # 50.0%
print(f"- No:     {no_percentage}%")      # 33.3%
print(f"- Maybe:  {maybe_percentage}%")  # 16.7%

# 4. Verify sum equals 100%
total_percentages = yes_percentage + no_percentage + maybe_percentage
print(f"\nPercentage sum: {total_percentages}%")  # 100.0%

💡 What did we learn?

  • How to calculate precise percentages using round()

  • The importance of verifying percentages sum to 100%.

  • The proper format for presenting statistical results.
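In this challenge the rounded percentages happen to add up to exactly 100. That is not guaranteed: rounding each percentage separately can leave the total slightly off. A minimal sketch with three hypothetical groups of equal size, using an arbitrary tolerance of 0.2 percentage points:

```python
group_size = 10    # three groups of 10 respondents each (made-up data)
total = 30

percentage = round((group_size / total) * 100, 1)
print(f"Each group: {percentage}%")              # Each group: 33.3%

percentage_sum = round(percentage * 3, 1)
print(f"Percentage sum: {percentage_sum}%")      # Percentage sum: 99.9%

# Accept a small gap caused by rounding instead of demanding exactly 100
tolerance = 0.2
print(abs(percentage_sum - 100) <= tolerance)    # True
```

A total of 99.9% or 100.1% in a table is normal and is usually explained with a note such as "percentages may not sum to 100 due to rounding".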

Challenge 2: Comparing Budgets

In a research project you have:

  • Survey takers budget: 5000

  • Materials budget: 3000

  • Transportation budget: 2000

Your mission:

  1. Calculate the total budget

  2. Calculate the percentage of each item and round to one decimal

  3. Calculate how much more is spent on survey takers than on transportation

⚠️ Try solving it on your own before looking at the solution!

Solution:

# Budget data
survey_takers_budget = 5000
materials_budget = 3000
transportation_budget = 2000

# 1. Total budget
total_budget = survey_takers_budget + materials_budget + transportation_budget
print(f"Total budget: ${total_budget:,}")    # 10,000

# 2. Percentages by item and rounding
survey_takers_percentage = round((survey_takers_budget / total_budget) * 100, 1)
materials_percentage = round((materials_budget / total_budget) * 100, 1)
transportation_percentage = round((transportation_budget / total_budget) * 100, 1)

print("\nBudget distribution:")
print(f"- Survey takers: {survey_takers_percentage}%")    # 50.0%
print(f"- Materials:     {materials_percentage}%")       # 30.0%
print(f"- Transportation: {transportation_percentage}%")      # 20.0%

# 3. Difference between survey takers and transportation
difference = survey_takers_budget - transportation_budget
print(f"\nDifference between survey takers and transportation: ${difference:,}")    # 3,000

💡 What did we learn?

  • How to handle budgets and calculate distributions.

  • Using thousands separators for better readability.

  • Professional presentation of rounded percentages.

Challenge 3: Analyzing Grades

You have three grades from the same participant:

  • First evaluation: 8.5

  • Second evaluation: 7.5

  • Third evaluation: 9.0

Your mission:

  1. Calculate the average of the three grades.

  2. Verify if the average needs decimals using is_integer() and display the result appropriately.

  3. Calculate the percentage of the maximum possible (10 points) and round it to two decimals.

  4. Calculate how many points are needed to reach 10.

⚠️ Try solving it on your own before looking at the solution!

Solution:

# Grade data
evaluation1 = 8.5
evaluation2 = 7.5
evaluation3 = 9.0

# 1. Calculate average
evaluation_sum = evaluation1 + evaluation2 + evaluation3
average = evaluation_sum / 3

# 2. Verify if decimals needed and display appropriately
if average.is_integer():
    print(f"Average: {int(average)}")
else:
    rounded_average = round(average, 1)
    print(f"Average: {rounded_average}")

# 3. Calculate and round percentage
percentage = (average / 10) * 100
rounded_percentage = round(percentage, 2)
print(f"Achieved percentage: {rounded_percentage}%")

# 4. Calculate missing points
missing_points = round(10 - average, 1)
print(f"Missing points: {missing_points}")

💡 What did we learn?

  • Precise handling of averages and grades.

  • When and how to use is_integer() to improve presentation.

  • The importance of rounding in educational data analysis.

Diving Deeper into Operations

Now that you've practiced basic operations with the previous challenges, let's see how to apply them in more complex analyses. We'll start with a detailed example of demographic analysis.

Detailed Demographic Analysis

# Age analysis in our sample
minimum_age = 18              # Minimum age to participate
maximum_age = 65             # Maximum age recorded
young_participants = 30      # Participants under 25
adult_participants = 90      # Participants 25 or older

# 1. Calculate basic age statistics
age_range = maximum_age - minimum_age
print(f"Study age range: {age_range} years")    # 47 years

# 2. Analyze age group distribution
total_participants = young_participants + adult_participants
print(f"Total participants: {total_participants}")    # 120

# 3. Calculate percentages by group
young_percentage = (young_participants / total_participants) * 100
adult_percentage = (adult_participants / total_participants) * 100

# 4. Present demographic results
print("\nAge group distribution:")
print(f"- Under 25: {young_percentage:.1f}%")    # 25.0%
print(f"- 25 or older: {adult_percentage:.1f}%")    # 75.0%

💡 Notice how:

  • We combine subtraction for the range (maximum_age - minimum_age = 47) and division for percentages.

  • We use names that describe the data (minimum_age, young_participants).

  • We explain the data with comments (# Participants under 25).

  • We format percentages with one decimal ({young_percentage:.1f}%).

  • We verify that percentages sum to 100% (25% + 75%).

Comparing Data Between Groups

A crucial aspect of social research is comparing different groups or categories. Let's see how Python helps us analyze the distribution of participants by age groups:

# Participation by age group
group_18_25 = 150    # Participants aged 18-25
group_26_35 = 230    # Participants aged 26-35
group_36_plus = 120  # Participants aged 36 or more

# Calculate total
total = group_18_25 + group_26_35 + group_36_plus

print(f"Total participants: {total}")  # Shows: 500

# Calculate percentages
percentage_18_25 = (group_18_25 / total) * 100
percentage_26_35 = (group_26_35 / total) * 100
percentage_36_plus = (group_36_plus / total) * 100

# Show distribution
print("\nAge distribution:")
print(f"- 18-25 years: {percentage_18_25:.1f}%")  # Shows: 30.0%
print(f"- 26-35 years: {percentage_26_35:.1f}%")  # Shows: 46.0%
print(f"- 36 or more: {percentage_36_plus:.1f}%")   # Shows: 24.0%

💡 Notice how:

  • We organize data in variables with descriptive names (group_18_25).

  • We use the same formula for all percentages (group / total * 100).

  • We format the output for better presentation ({percentage_18_25:.1f}%).

  • We verify that percentages sum to 100% (30 + 46 + 24 = 100).
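Because the same formula is applied to every group, the calculation can also be written once and repeated. The sketch below uses a dictionary and a for loop, two features this guide has not introduced yet, so treat it as a preview rather than a required technique:

```python
# Same counts as above, stored together with their labels
groups = {
    "18-25 years": 150,
    "26-35 years": 230,
    "36 or more": 120,
}

total = sum(groups.values())
print(f"Total participants: {total}")   # Total participants: 500

# One formula, applied to every group in turn
for label, count in groups.items():
    percentage = (count / total) * 100
    print(f"- {label}: {percentage:.1f}%")
```

Adding a fourth age group then means adding one line to the dictionary, not three new lines of calculation.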

The Order of Operations: Avoiding Common Errors

When analyzing survey data with different sample sizes, we often need to calculate weighted averages to account for groups that carry more weight than others. Here, the order of operations is crucial:

# Calculating the weighted average of two groups
group_a = 20    # Number of participants group A
grade_a = 8.5   # Average grade group A
group_b = 30    # Number of participants group B
grade_b = 7.5   # Average grade group B

# Without parentheses (incorrect)
wrong_average = group_a * grade_a + group_b * grade_b / (group_a + group_b)
print("\nIncorrect calculation (without parentheses):")
print(f"group_a * grade_a + group_b * grade_b / (group_a + group_b)")
print(f"Incorrect result: {wrong_average:.1f}")  # Gives 174.5 (wrong!)

# With parentheses (correct)
correct_average = (group_a * grade_a + group_b * grade_b) / (group_a + group_b)
print("\nCorrect calculation (with parentheses):")
print(f"(group_a * grade_a + group_b * grade_b) / (group_a + group_b)")
print(f"Correct result: {correct_average:.1f}")  # Gives 7.9 (right!)

# Let's see why they're different
print("\nStep by step of correct calculation:")
print(f"1. Group A weight = {group_a} × {grade_a} = {group_a * grade_a}")        # = 170
print(f"2. Group B weight = {group_b} × {grade_b} = {group_b * grade_b}")        # = 225
print(f"3. Sum of weights = {group_a * grade_a + group_b * grade_b}")           # = 395
print(f"4. Total participants = {group_a + group_b}")                       # = 50
print(f"5. Final average = 395 ÷ 50 = {correct_average:.1f}")                 # = 7.9

💡 Notice how:

  • Without parentheses, Python divides only group_b * grade_b by the total (225 / 50 = 4.5) and then adds group_a * grade_a (170), giving a meaningless 174.5.

  • With parentheses, it first adds the two group totals (170 + 225 = 395) and then divides by the number of participants (50).

  • The result 7.9 makes sense because it's between the lowest grade (7.5) and the highest grade (8.5).

  • Comments with values help us verify each step of the calculation.

Order of Operations in Python:

  1. First resolves what's inside parentheses.

  2. Then does multiplications and divisions (from left to right).

  3. Finally does additions and subtractions (from left to right).

Practical tip: When calculating weighted averages:

  1. Group weight multiplications and sums in parentheses.

  2. Group the total participants in parentheses.

  3. Add comments showing partial results.
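The rules and the tip above can be checked with a tiny example, followed by the same weighted average split into named steps. Storing the intermediate results in their own variables is one possible way to make the order of operations explicit; the names weighted_sum and weighted_average are illustrative:

```python
# Multiplication happens before addition unless parentheses say otherwise
print(2 + 3 * 4)      # 14
print((2 + 3) * 4)    # 20

# The weighted average from this section, one step per line
group_a, grade_a = 20, 8.5
group_b, grade_b = 30, 7.5

weighted_sum = group_a * grade_a + group_b * grade_b     # 170.0 + 225.0 = 395.0
total_participants = group_a + group_b                   # 50
weighted_average = weighted_sum / total_participants

print(f"Weighted average: {weighted_average:.1f}")       # Weighted average: 7.9
```

With one operation per line there is nothing for operator precedence to rearrange, and each partial result can be printed and checked.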

💪 Practice the Order of Operations!

Challenge 4: Calculating Group Averages

In a job satisfaction survey you have:

  • Group A: 40 people, average rating 7.5

  • Group B: 60 people, average rating 8.2

Your mission:

  1. Calculate the overall average (hint: you'll need parentheses).

  2. Verify that your result is between 7.5 and 8.2

  3. Explain why the result is closer to 8.2

⚠️ Try solving it on your own before looking at the solution!

Solution:

# Group data
group_a = 40          # Group A participants
grade_a = 7.5         # Average grade A
group_b = 60          # Group B participants
grade_b = 8.2         # Average grade B

# Incorrect calculation (without parentheses)
wrong_average = group_a * grade_a + group_b * grade_b / (group_a + group_b)
print(f"Incorrect result: {wrong_average:.1f}")    # 304.9 (makes no sense!)

# Correct calculation (with parentheses)
correct_average = (group_a * grade_a + group_b * grade_b) / (group_a + group_b)
print(f"Correct result: {correct_average:.1f}")     # 7.9
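The solution above covers the first step of the mission. Steps 2 and 3 could be approached as in the following sketch, which checks that the result lies between the two group averages and shows why it sits closer to 8.2. The variable names is_between and share_b are illustrative:

```python
group_a, grade_a = 40, 7.5
group_b, grade_b = 60, 8.2

overall_average = (group_a * grade_a + group_b * grade_b) / (group_a + group_b)

# 2. Is the result between the two group averages?
is_between = grade_a <= overall_average <= grade_b
print(f"Between 7.5 and 8.2?: {is_between}")                    # True

# 3. Group B holds 60% of the participants, so it pulls the result toward 8.2
share_b = group_b / (group_a + group_b)
print(f"Group B share of the sample: {share_b:.0%}")            # 60%
print(f"Distance to 7.5: {overall_average - grade_a:.2f}")      # 0.42
print(f"Distance to 8.2: {grade_b - overall_average:.2f}")      # 0.28
```

A weighted average always leans toward the larger group, which is why 7.92 is nearer to Group B's 8.2 than to Group A's 7.5.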

Challenge 5: Percentages and Proportions

You have the following participation data:

  • 120 people were invited

  • 85% confirmed attendance

  • Of those who confirmed, 90% attended

Your mission:

  1. Calculate how many people confirmed

  2. Calculate how many people attended

  3. Calculate the final attendance percentage of total invitees

⚠️ Try solving it on your own before looking at the solution!

Solution:

# Initial data
invitees = 120
confirmation_rate = 85    # 85%
attendance_rate = 90      # 90%

# 1. People who confirmed
confirmed = (invitees * confirmation_rate) / 100
print(f"Confirmed: {confirmed}")    # 102.0

# 2. People who attended
attendees = (confirmed * attendance_rate) / 100
print(f"Attended: {attendees}")    # 91.8 (an expected value; round it to report whole people)

# 3. Final attendance percentage
final_percentage = (attendees / invitees) * 100
print(f"Attendance percentage: {final_percentage:.1f}%")    # 76.5%

💡 Note: These challenges emphasize the importance of operation order and using parentheses. Remember that parentheses help you control which operations are performed first.
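Challenge 5 produces 91.8 attendees, which works as an estimate but not as a head count. If you need whole people, one option is to round each count as you go, as in this sketch. Note that rounding at each step is an assumption about how you want to report, and it shifts the final percentage slightly:

```python
invitees = 120
confirmation_rate = 85    # 85%
attendance_rate = 90      # 90%

# Round each count so that it represents whole people
confirmed = round(invitees * confirmation_rate / 100)
attendees = round(confirmed * attendance_rate / 100)

final_percentage = attendees / invitees * 100

print(f"Confirmed: {confirmed}")                           # Confirmed: 102
print(f"Attended: {attendees}")                            # Attended: 92
print(f"Attendance percentage: {final_percentage:.1f}%")   # Attendance percentage: 76.7%
```

The unrounded calculation gives 76.5% and the rounded head count gives 76.7%, so say in your report which one you used.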

Converting Between Number Types

Up until now, we've worked with integers and decimals as they come in our data. However, in social research, we frequently need to convert numbers between different formats: transforming percentages to proportions, rounding results for reports, or converting text to numbers when importing data. Let's see how Python makes these tasks easier:

# Data from a job satisfaction survey
responses = "150"          # Number as text (for example, from a CSV file)
satisfied = 120           # Integer number
percentage = 80.0         # Decimal number

# 1. Convert text to number
total = int(responses)     # Converts "150" to 150

print(f"Total responses: {total}")  # Shows: 150

# 2. Calculate and round percentages
exact_percentage = (satisfied / total) * 100    # Gives 80.0
rounded_percentage = round(exact_percentage)   # Gives 80

print(f"Exact percentage: {exact_percentage:.1f}%")      # Shows: 80.0%
print(f"Rounded percentage: {rounded_percentage}%")  # Shows: 80%

# 3. Convert integer to decimal for precise calculations
proportion = float(satisfied) / float(total)

print(f"Proportion as decimal: {proportion:.2f}")  # Shows: 0.80

💡 Notice how:

  • We use int() to convert text to integer numbers ("150" → 150).

  • We use round() to turn a decimal into a whole number (80.0 → 80); int() would cut off the decimals instead of rounding.

  • We use float() when we need precise decimals.

  • The format :.1f helps us show only one decimal in percentages.

Most common conversion functions:

  1. int(): To get integer numbers.

  2. float(): To get decimal numbers.

  3. str(): To convert numbers to text.

Practical tip: When working with numerical data:

  1. Convert texts to numbers as soon as you read them from your files.

  2. Use decimals (float) for percentage and average calculations.

  3. Round results only at the end, for showing them in reports.

Avoiding Common Errors When Converting Numbers

When working with survey data, it's common to encounter some problems when converting numbers. Let's see how to avoid them:

# 1. Be careful with decimals when converting to integers
average_age = 45.7
age_int = int(average_age)        # Gives 45 (truncates, doesn't round)
rounded_age = round(average_age)   # Gives 46 (rounds correctly)

print(f"Original age: {average_age}")
print(f"With int(): {age_int}")          # Warning! We lost information
print(f"With round(): {rounded_age}")     # Better for reports

# 2. Converting text with numbers
response = "18.5"
# First convert to decimal, then to integer
age = int(float(response))   # float("18.5") → 18.5, then int(18.5) → 18
print(f"Age: {age}")         # Shows: 18

# 3. Rounding percentages
percentage = 76.7234
rounded_percentage = round(percentage, 1)  # Round to one decimal
print(f"Percentage: {rounded_percentage}%")  # Shows: 76.7%

💡 Notice how:

  • int() always removes decimals without rounding (45.7 → 45).

  • To round, use round() instead of int(); round(45.7) already returns the integer 46.

  • For text with decimals, we first use float() and then int()

  • round(number, 1) allows us to keep one decimal in percentages.

Practical tip: Before converting numbers:

  1. Decide if you need to round or truncate.

  2. If you have text with decimals, convert it to float() first.

  3. For reports, use round() to control decimals.
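Real survey files often contain entries that are not numbers at all, such as "N/A" or an empty cell, and float() stops with a ValueError when it meets one. The sketch below shows one possible way to handle that. The helper to_number is hypothetical (it is not a built-in), and it relies on functions and try/except, which this guide has not covered yet:

```python
def to_number(text):
    """Return the text as a float, or None when it is not a valid number."""
    try:
        return float(text.strip())    # strip() removes surrounding spaces
    except ValueError:
        return None

# Made-up raw answers, as they might arrive from a CSV file
raw_answers = ["150", " 42 ", "18.5", "N/A", ""]

for raw in raw_answers:
    print(f"{raw!r} -> {to_number(raw)}")
```

Returning None for invalid entries is a design choice for this illustration; it lets you count or inspect the problem cases instead of losing them silently.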

Working with Large Numbers

Analyzing demographic data often involves working with large numbers, such as census data or research budgets. Python offers us two ways to make them more readable:

# 1. Using underscores to write large numbers
participants = 12_500      # Easier to read than 12500
budget = 1_500_000        # Easier to read than 1500000

# Underscores don't affect the number
print(f"Participants: {participants}")       # Shows: 12500
print(f"Budget: {budget}")                  # Shows: 1500000

# 2. Using formatting to display large numbers
print(f"Participants: {participants:,}")     # Shows: 12,500
print(f"Budget: ${budget:,}")               # Shows: $1,500,000

# 3. Combining with decimals
response_rate = 92.5678
average_budget = budget / participants

print(f"Response rate: {response_rate:.1f}%")              # Shows: 92.6%
print(f"Cost per participant: ${average_budget:,.2f}")     # Shows: $120.00

💡 Notice how:

  • Underscores (12_500) make the code more readable.

  • The :, format adds commas to show thousands (12,500).

  • You can combine :, with .2f for decimals ($120.00).

  • Underscores are only visual, they don't affect the number.

Practical tip: For working with large numbers:

  1. Use underscores when writing numbers in your code.

  2. Use :, to display numbers with thousand separators.

  3. Combine with .1f or .2f to control decimals in reports.
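The reverse situation also comes up: a large number arrives as text that already contains thousands separators, and int() cannot read it directly. A minimal sketch, assuming the commas are thousands separators (in some countries a comma marks the decimals, so check your source first):

```python
raw_budget = "1,500,000"                      # text as it might appear in an exported file

# Remove the commas, then convert to an integer
budget = int(raw_budget.replace(",", ""))

print(budget)            # 1500000
print(f"${budget:,}")    # $1,500,000
```

After the conversion, the value behaves like any other integer and can be formatted again with :, for display.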

Different Types of Division for Different Needs

Different analytical tasks call for different kinds of division: exact ratios for percentages and averages, whole-number results for forming groups, and remainders for what is left over. Let's explore how Python's division operators handle each scenario:

# 1. Normal division (/) for percentages and averages
participants = 350
responses = 280
response_rate = responses / participants

print(f"Response rate: {response_rate:.2f}")     # Shows: 0.80
print(f"Percentage: {response_rate * 100:.1f}%")     # Shows: 80.0%

# 2. Integer division (//) for forming groups
total_students = 32
group_size = 5
number_of_groups = total_students // group_size
students_without_group = total_students % group_size

print(f"With {total_students} students you can form:")
print(f"- {number_of_groups} groups of {group_size}")   # Shows: 6 groups of 5
print(f"- {students_without_group} remain without group")   # Shows: 2 without group

# 3. Combining divisions in analysis
budget = 1000

# First calculate complete groups
complete_groups = total_students // group_size    # Gives 6

# Calculate budget per group
budget_per_group = budget / complete_groups

print(f"Budget per group: ${budget_per_group:.2f}")  # Shows: $166.67

💡 Notice how:

  • Normal division (/) always gives decimals (280/350 = 0.80).

  • Integer division (//) rounds down (32//5 = 6).

  • The remainder (%) tells you how much is left over (32%5 = 2).

  • You can combine types according to your need.

Practical tip: Choose the type of division based on your goal:

  1. Use / for exact percentages and averages.

  2. Use // when you need complete groups.

  3. Use % to know how many elements are left over.
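Two related built-in tools are worth knowing once // and % feel familiar. divmod() returns the quotient and the remainder in a single step, and math.ceil() rounds up, which answers a different question: how many groups are needed so that nobody is left out. A short sketch using the same student numbers:

```python
import math

total_students = 32
group_size = 5

# Quotient and remainder in one call (same results as // and %)
complete_groups, students_left_over = divmod(total_students, group_size)
print(complete_groups, students_left_over)      # 6 2

# Groups needed if the leftover students form one extra, smaller group
groups_needed = math.ceil(total_students / group_size)
print(groups_needed)                            # 7
```

Whether you want 6 (complete groups only) or 7 (everyone placed) depends on your study design, so pick the operation that matches the question.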

Controlling Precision in Our Calculations

The credibility of your research often depends on presenting numbers with appropriate precision. Let's master Python's tools for controlling decimal places:

# 1. Percentages in reports
participants = 350
responses = 280
rate = responses / participants * 100

# Different levels of precision
print(f"Response rate: {rate}%")           # Shows: 80.0%
print(f"With 1 decimal: {rate:.1f}%")           # Shows: 80.0%
print(f"With 2 decimals: {rate:.2f}%")         # Shows: 80.00%

# 2. Averages
group_a = 25
group_b = 30
total = group_a + group_b
average = total / 2
print(f"Average: {average:.1f}")   # Shows: 27.5

# 3. Budgets
budget = 1000
participants = 33
cost_per_person = budget / participants

print(f"Total budget: ${budget:,.2f}")           # Shows: $1,000.00
print(f"Cost per person: ${cost_per_person:.2f}")      # Shows: $30.30

💡 Notice how:

  • For percentages, one decimal is usually enough (.1f).

  • For money, we use two decimals (.2f).

  • For averages, we normally use one decimal.

  • We can combine thousands format (,) with decimals (.2f).
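
It may help to remember that a format like .1f only changes how a number is displayed; the stored value keeps all its decimals. If you need a rounded number to reuse in later calculations, round() creates one. A small sketch with invented values (2 responses out of 3 participants):

```python
# Illustrative values only
responses = 2
participants = 3
rate = responses / participants * 100

# Formatting produces text for display; rate itself is unchanged
shown_rate = f"{rate:.1f}"
print(shown_rate)       # Shows: 66.7
print(rate)             # Still has many decimals

# round() returns a new number that you can store and reuse
rounded_rate = round(rate, 1)
print(rounded_rate)     # Shows: 66.7
```

A common approach is to keep full precision while calculating and to format only at the moment of reporting.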

Practical tip: Choose precision according to context:

  1. Percentages → one decimal (80.5%).

  2. Money → two decimals ($30.30).

  3. Averages → one decimal (27.5).

  4. Rates and proportions → two decimals (0.80).
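
As a possible shortcut for percentages, Python's format specification also has a % type: it multiplies the value by 100 and adds the percent sign for you. The sketch below reuses the response-rate numbers from the earlier examples and assumes the value is stored as a proportion between 0 and 1:

```python
participants = 350
responses = 280
response_rate = responses / participants   # a proportion, here 0.8

# .2f shows the proportion; .1% shows the same value as a percentage
proportion_text = f"{response_rate:.2f}"
percentage_text = f"{response_rate:.1%}"

print(f"Proportion: {proportion_text}")    # Shows: Proportion: 0.80
print(f"Percentage: {percentage_text}")    # Shows: Percentage: 80.0%
```

Be careful not to multiply by 100 yourself when using this format, or the percentage will be 100 times too large.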

💪 Final Challenge: Your First Research Project with Python!

Ready to put everything together? Let's tackle a comprehensive research scenario that combines all the concepts we've covered.

You have the following data from a national education survey:

  • "12500" students from public universities (data as text).

  • "7500" students from private universities (data as text).

  • Budget per public student: 8500.50

  • Budget per private student: 15750.75

Your mission as a researcher:

  1. Data Preparation

  • Convert student numbers from text to integers.

  • Calculate total number of students.

  • Calculate total budget for each sector, rounding to two decimals.

  2. Percentage Analysis

  • Calculate the percentage each sector represents, rounded to one decimal.

  • Calculate the average investment per student, rounded to two decimals.

  • Determine how many complete groups of 30 students can be formed and how many students are left over.

  3. Comparative Analysis

  • Calculate the total budget and display it with thousands separators.

  • Compare budgets by sector (percentages rounded to one decimal).

  • Determine the investment difference between sectors.

⚠️ Try solving it on your own before looking at the solution!

Solution:

# 1. Data Preparation
# Convert text to numbers
public_students = int("12500")
private_students = int("7500")

# Budget per student
public_budget_per_student = 8500.50
private_budget_per_student = 15750.75

# Calculate totals
total_students = public_students + private_students
print(f"Total students: {total_students:,}")    # 20,000

# Calculate total budgets (rounded to 2 decimals)
total_public_budget = round(public_students * public_budget_per_student, 2)
total_private_budget = round(private_students * private_budget_per_student, 2)

# 2. Percentage Analysis
# Calculate and round percentages
public_percentage = round((public_students / total_students) * 100, 1)
private_percentage = round((private_students / total_students) * 100, 1)

print("\nStudent Distribution:")
print(f"- Public:  {public_percentage}%")    # 62.5%
print(f"- Private:  {private_percentage}%")    # 37.5%

# Calculate average investment per student
average_investment = round((total_public_budget + total_private_budget) / total_students, 2)
print(f"\nAverage investment per student: ${average_investment:,.2f}")    # $11,219.34

# Group formation
group_size = 30
complete_groups = total_students // group_size
remaining_students = total_students % group_size

print(f"\nYou can form {complete_groups} groups of {group_size}")    # 666 groups of 30
print(f"With {remaining_students} students remaining")    # 20 students

# 3. Comparative Analysis
total_budget = total_public_budget + total_private_budget
print(f"\nTotal budget: ${total_budget:,.2f}")    # $224,386,875.00

# Budget percentages (rounded)
public_budget_percentage = round((total_public_budget / total_budget) * 100, 1)
private_budget_percentage = round((total_private_budget / total_budget) * 100, 1)

print("\nBudget Distribution:")
print(f"- Public:  {public_budget_percentage}%")    # 47.4%
print(f"- Private:  {private_budget_percentage}%")    # 52.6%

# Difference between sectors
budget_difference = round(abs(total_private_budget - total_public_budget), 2)
print(f"\nInvestment difference between sectors: ${budget_difference:,.2f}")    # $11,874,375.00

💡 Tips for solving the challenge:

  • Start by converting texts to numbers.

  • Organize your calculations in clear sections.

  • Use integer division (//) for complete groups.

  • Use modulo (%) for remaining students.

  • Verify that percentages sum to 100%.

  • Use comma formatting for large numbers.

  • Keep two decimals for money amounts.
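
The tip about verifying that percentages sum to 100% can be sketched as follows. Because each percentage is rounded separately, the sum can sometimes land on 99.9 or 100.1, so this example uses math.isclose() with a small tolerance. The tolerance of 0.1 is an assumption chosen for one-decimal percentages, not a fixed rule:

```python
import math

# Student counts from the challenge
public_students = 12500
private_students = 7500
total_students = public_students + private_students

public_percentage = round(public_students / total_students * 100, 1)
private_percentage = round(private_students / total_students * 100, 1)

# Compare the sum with 100, allowing a small rounding margin (assumed: 0.1)
percentage_sum = public_percentage + private_percentage
sums_to_100 = math.isclose(percentage_sum, 100, abs_tol=0.1)

print(f"Sum of percentages: {percentage_sum}")   # Shows: Sum of percentages: 100.0
print(f"Sums to 100%: {sums_to_100}")            # Shows: Sums to 100%: True
```

If the check shows False, a count was probably left out or counted twice.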

Key Concepts:

  1. Types of Numbers

  • Integers for counts (participants, responses).

  • Decimals for percentages and averages.

  2. Basic Operations

  • Normal division (/) for percentages.

  • Integer division (//) for groups.

  • Remainder (%) for leftover elements.

  3. Format and Precision

  • Decimal control (.1f, .2f).

  • Thousands separators for readability.

  • Professional result presentation.

Upcoming Topics:

In the second part we'll explore:

  • Comparison and logical operators for data analysis.

  • Advanced mathematical operations without libraries.

  • Numerical data input and output techniques.

  • Best practices for keeping your code clean and reliable.

  • Advanced tricks for efficient number manipulation.

Don't miss the next part where we'll learn these essential tools for data analysis in social research!

 