🐍 Built-in Python Functions: Essential Tools for Social Scientists

📚 Introduction

As a social scientist, you face unique challenges in data processing:

  • Cleaning survey responses with thousands of participants.

  • Analyzing trends across multiple years.

  • Combining data from different research sources.

  • Ensuring your analyses are reproducible.

Tired of spending hours in Excel performing these tasks manually? Python can turn days of repetitive work into minutes of automated processing. In this guide, you'll learn 11 essential functions that:

  • Automate survey analysis.

  • Simplify longitudinal studies.

  • Speed up data cleaning.

  • Guarantee consistent and reproducible results.

This article is an early step in your Python journey, focusing on the most basic yet powerful built-in functions. Python also offers more specialized libraries for social research:


  • Pandas: library specifically designed for data analysis.

  • NumPy: advanced functions for statistical calculations.

  • Matplotlib: professional result visualization.

  • SciPy: statistical tests and advanced analysis.

Mastering today's basic functions will prepare you to leverage these advanced tools in the future.

🔍 Building on What We've Learned

In previous articles, we explored these basic Python functions:

Function               Purpose
print(), input()       Program communication
type()                 Data type identification
int(), float(), str()  Type conversion
round()                Rounding numbers

Now we'll take the next step: built-in functions that will transform your research data processing.

🛠️ The Functions We'll Learn

1. 📊 Basic Analysis

Functions for instant descriptive statistics:

Function  Purpose         Example         Result
len()     Count elements  len([1, 2, 3])  3
sum()     Sum values      sum([1, 2, 3])  6
max()     Find maximum    max(1, 2, 3)    3
min()     Find minimum    min(1, 2, 3)    1

2. 🔄 Generation and Sorting

Functions for creating and reorganizing sequences:

Function    Purpose             Example              Result
range()     Generate sequences  range(1, 4)          1, 2, 3
reversed()  Reverse order       reversed([1, 2, 3])  3, 2, 1

3. 📋 Data Transformation

Functions for efficiently organizing information:

Function     Purpose            Example                Result
sorted()     Sort elements      sorted([3, 1, 2])      [1, 2, 3]
enumerate()  Number elements    enumerate(['a', 'b'])  (0, 'a'), (1, 'b')
zip()        Combine sequences  zip(['a','b'], [1,2])  ('a', 1), ('b', 2)

4. ⚡ Sequence Processing

Functions for automating operations:

Function  Purpose             Example                  Result
map()     Transform elements  map(float, ["1", "2"])   1.0, 2.0
filter()  Select elements     filter(None, [0, 1, 0])  1
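One detail the Result columns above gloss over: range(), reversed(), enumerate(), zip(), map(), and filter() do not return lists. They return objects that produce their values on demand, so wrapping them in list() is the usual way to see the contents. The short sketch below reuses the table's own examples; the variable names are chosen only for illustration.

```python
# These built-ins return lazy objects rather than lists.
converted = map(float, ["1", "2"])
print(type(converted).__name__)  # the object is a 'map', not a list

# list() pulls out the values so they can be displayed or reused.
years_list = list(range(1, 4))
reversed_list = list(reversed([1, 2, 3]))
numbered_list = list(enumerate(["a", "b"]))
pairs_list = list(zip(["a", "b"], [1, 2]))
converted_list = list(converted)
kept_list = list(filter(None, [0, 1, 0]))

print(years_list)      # [1, 2, 3]
print(reversed_list)   # [3, 2, 1]
print(numbered_list)   # [(0, 'a'), (1, 'b')]
print(pairs_list)      # [('a', 1), ('b', 2)]
print(converted_list)  # [1.0, 2.0]
print(kept_list)       # [1]
```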

💡 Practical Examples

The functions we've just seen are powerful on their own, but their true value becomes apparent in real social research challenges. Next, we'll explore four common scenarios that social scientists face, and how these functions can automate and simplify your work.

Example 1: From Survey to Insights in Seconds

Using basic analysis functions (len, sum, max, min)

Imagine this common scenario. You've just completed a community satisfaction survey, and your director asks for a statistical summary for the meeting in 15 minutes. With Excel, you'd have to create separate formulas for each calculation. With Python, you can get all results in seconds:

  • Total number of participants.

  • Average satisfaction.

  • Maximum and minimum levels.

  • All with just a few lines of code.

# Community satisfaction survey data
responses = [5, 4, 5, 3, 4, 5, 2, 5, 4, 3]  # 10 responses on 1-5 scale

# 1. Basic participation analysis

total_participants = len(responses)
average_satisfaction = sum(responses) / len(responses)
maximum_level = max(responses)
minimum_level = min(responses)

# Display results

print("=== Community Satisfaction Analysis ===")
print(f"Total participants: {total_participants}")
print(f"Average satisfaction level: {average_satisfaction:.1f}/5")
print(f"Highest level reported: {maximum_level}")
print(f"Lowest level reported: {minimum_level}")

This code shows:

  • How many people responded to the survey.

  • The average satisfaction level in the community.

  • The maximum and minimum satisfaction levels reported.
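One edge case worth keeping in mind: dividing by len() fails when the list is empty, for instance before any responses have arrived. The sketch below shows one possible guard. The empty list and the choice of None as a placeholder are assumptions made for illustration, not part of the survey example above.

```python
responses = [5, 4, 5, 3, 4, 5, 2, 5, 4, 3]
no_responses = []  # hypothetical: a survey wave with no answers yet

# sum(...) / len(...) raises ZeroDivisionError on an empty list,
# so compute the average only when the list has elements.
average = sum(responses) / len(responses) if responses else None
empty_average = sum(no_responses) / len(no_responses) if no_responses else None

print(average)        # 4.0
print(empty_average)  # None
```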

💡 Notes on Optional Arguments

Each function has additional options useful in your research:

len()

  • Takes no optional arguments.

  • Works with any type of sequence (lists, tuples, strings).

sum()

  • Accepts an initial value: sum([1, 2, 3], start=10) → 16

  • Useful for adding a base value to your sum.

max() and min()

  • Accept a default value. For example, max([], default=0) → 0

  • Useful for handling empty sequences without errors.

  • Can use key to define the comparison criterion:

    • max([1, -5, 3], key=abs) → -5 (because |-5| is the largest absolute value).

    • max(["a", "bb", "ccc"], key=len) → "ccc" (because it has the greatest length).

    • min(["Ana", "juan", "María"], key=str.lower) → "Ana" (case-insensitive comparison).
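The options above can be tried together in a short script. This is a minimal sketch: the variable names are illustrative, the second names list is a hypothetical variation added to make the effect of key visible, and passing start as a keyword assumes Python 3.8 or later.

```python
scores = [1, 2, 3]

# start adds a base value to the total (keyword form needs Python 3.8+).
total_with_base = sum(scores, start=10)

# default avoids an error when the sequence is empty.
largest_or_zero = max([], default=0)

# key changes what is compared, not what is returned.
largest_magnitude = max([1, -5, 3], key=abs)
longest_word = max(["a", "bb", "ccc"], key=len)

# Hypothetical list with mixed capitalization: without key,
# uppercase letters sort before lowercase ones.
names = ["ana", "Juan", "María"]
first_case_sensitive = min(names)
first_case_insensitive = min(names, key=str.lower)

print(total_with_base)         # 16
print(largest_or_zero)         # 0
print(largest_magnitude)       # -5
print(longest_word)            # ccc
print(first_case_sensitive)    # Juan
print(first_case_insensitive)  # ana
```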

Example 2: Temporal Analysis of Social Trends

Organizing chronological data with range() and reversed()

As a social researcher, you need to work with sequences of years and identifier numbers. For example:

  • Generate year ranges for a longitudinal study.

  • Create sequential IDs for participants.

  • Analyze data in chronological or reverse order.

Let's see how Python makes these tasks easier:

# 1. Generating year sequences for a study
print("=== Ranges for Longitudinal Study ===")
study_years = list(range(2019, 2024))  # Generates years from 2019 to 2023
print("Study period:", study_years)
print("Reverse chronological order:", list(reversed(study_years)))

# 2. Creating IDs for participants

print("\n=== Participant IDs ===")
participant_ids = list(range(101, 106))  # Generates IDs from 101 to 105
print("Assigned IDs:", participant_ids)

# 3. Generating measurement scales

print("\n=== Likert Scale ===")
likert_scale = list(range(1, 6))  # Generates scale from 1 to 5
print("Scale levels:", likert_scale)
print("Inverted scale:", list(reversed(likert_scale)))

💡 Notes on Optional Arguments

The functions for generating and sorting sequences have useful additional features:

range()

  • Can take 1, 2, or 3 arguments:

    • range(5) → generates numbers from 0 to 4.

    • range(2, 5) → generates numbers from 2 to 4.

    • range(0, 10, 2) → generates even numbers: 0, 2, 4, 6, 8.

  • The step can be negative. For example, range(10, 0, -1) → countdown from 10 to 1.

reversed()

  • Takes no optional arguments.

  • Useful for inverting sequences.
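As a quick illustration of the step argument, the sketch below generates the years of a hypothetical biennial survey (the 2015 to 2023 period is an assumption for the example) along with a countdown and a reversed range.

```python
# One, two, and three arguments.
first_five = list(range(5))
two_to_four = list(range(2, 5))
biennial_waves = list(range(2015, 2024, 2))  # hypothetical survey every 2 years

# A negative step counts downward; the stop value is still excluded.
countdown = list(range(10, 0, -1))

# reversed() also works directly on a range.
recent_first = list(reversed(range(2019, 2024)))

print(first_five)      # [0, 1, 2, 3, 4]
print(two_to_four)     # [2, 3, 4]
print(biennial_waves)  # [2015, 2017, 2019, 2021, 2023]
print(countdown)       # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(recent_first)    # [2023, 2022, 2021, 2020, 2019]
```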

Example 3: Organizing and Combining Data

Using sorted(), enumerate() and zip()

As we saw earlier, lists are ordered collections of elements in square brackets []. In addition to reversing their order with reversed(), we can also:

  • Sort them in different ways with sorted()

  • Automatically number them with enumerate()

  • Combine lists with zip()

For example, if we have a list of grades [7.5, 6.0, 9.0], we can sort it from lowest to highest or vice versa. Let's see how to use these functions with research data:

# Basic examples with lists
ages = [25, 30, 28]        # List of ages
names = ["Ana", "Juan", "María"]  # List of names

# 1. Sort ages from lowest to highest using sorted()

print("=== sorted() Function ===")
print("Original list:", ages)
print("Sorted list:", sorted(ages))  # Shows: [25, 28, 30]

# 2. Create indices with enumerate()

print("\n=== enumerate() Function ===")
print("Original list:", names)
print("With indices:", list(enumerate(names)))  # Shows: [(0, 'Ana'), (1, 'Juan'), (2, 'María')]

# 3. Combine lists with zip()

print("\n=== zip() Function ===")
print("Names:", names)
print("Ages:", ages)
print("Combined:", list(zip(names, ages)))  # Shows: [('Ana', 25), ('Juan', 30), ('María', 28)]

💡 Notes on Optional Arguments

The functions we've seen have useful additional options:

sorted()

  • Accepts reverse=True to sort from highest to lowest. For example, sorted([1, 2, 3], reverse=True) → [3, 2, 1].

  • Can use key to sort by specific criteria (we'll cover this in upcoming articles).

enumerate()

  • Accepts start to begin from another number. For example, list(enumerate(names, start=1)) → [(1, 'Ana'), (2, 'Juan'), (3, 'María')]

  • By default, if we don't specify start, it begins from 0.

zip()

  • Can combine more than two sequences using zip(names, ages, other_data).

  • Stops at the end of the shortest sequence.
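The sketch below tries these options on the same names and ages lists. The cities list is hypothetical and deliberately one item short, to show where zip() stops.

```python
names = ["Ana", "Juan", "María"]
ages = [25, 30, 28]
cities = ["Lima", "Quito"]  # hypothetical, one item shorter on purpose

# sorted() returns a new list and leaves the original untouched.
oldest_first = sorted(ages, reverse=True)

# Number participants starting from 1 instead of 0.
numbered = list(enumerate(names, start=1))

# zip() stops at the shortest sequence, so María is dropped here.
combined = list(zip(names, ages, cities))

print(oldest_first)  # [30, 28, 25]
print(ages)          # [25, 30, 28]
print(numbered)      # [(1, 'Ana'), (2, 'Juan'), (3, 'María')]
print(combined)      # [('Ana', 25, 'Lima'), ('Juan', 30, 'Quito')]
```

Because zip() drops unmatched items silently, comparing the lengths of your lists with len() before combining them is a sensible habit.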

Note: In upcoming articles, we'll learn to work with the results of these functions using loops and other advanced methods.

Example 4: Survey Data Cleaning and Preparation

Automating repetitive tasks with map() and filter()

As a social researcher, much of your time is spent preparing data for analysis. For example:

  • Converting text ages to numbers. For example, "25" becomes 25.

  • Filtering valid survey responses.

  • Transforming response codes to a standard format.

Let's see how to automate these common tasks:

# 1. Convert demographic data
print("=== Demographic Data Processing ===")
text_ages = ["25", "30", "28", "35", "29"]
numeric_ages = list(map(int, text_ages))
print("Original ages (text):", text_ages)
print("Converted ages (numbers):", numeric_ages)

# 2. Filter valid responses

print("\n=== Response Validation ===")
responses = [1, 0, 1, 1, 0, 1]  # 1=Completed, 0=Not completed
valid_responses = list(filter(None, responses))
print("Total responses:", len(responses))
print("Valid responses:", len(valid_responses))

💡 Notes on Optional Arguments

The transformation and filtering functions have useful additional features:

map()

  • Accepts any value-transforming function as its first argument:

    • map(float, ["1.5", "2.5"]) → converts strings to decimals.

    • map(int, ["1", "2"]) → converts strings to integers.

    • map(str, [1, 2]) → converts numbers to strings.

  • The function must be applicable to each element; for example, int("abc") raises a ValueError.

  • Needs at least two arguments: the function and a sequence.

filter()

  • Can use None to filter true values:

    • filter(None, [0, 1, 0, 1]) → keeps only the 1s.

    • filter(None, ["", "hello", "", "world"]) → keeps only non-empty strings.

    • filter(None, [0, False, None, 1, True]) → keeps only the truthy values, 1 and True.

  • Also accepts a function that returns True or False. We'll cover this when we study functions.

Note: The map() and filter() functions are useful with large amounts of data, as they allow us to transform or filter all elements at once.
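The sketch below runs a few of these cases, using hypothetical income, comment, and rating data. It also shows a caveat worth knowing: filter(None, ...) discards every falsy value, so a legitimate answer of 0 on a 0 to 10 scale is removed along with the genuinely missing ones.

```python
# Hypothetical survey fields stored as text.
text_incomes = ["1500.50", "2300", "980.75"]
incomes = list(map(float, text_incomes))

# Empty strings are falsy, so filter(None, ...) removes them.
comments = ["", "Good service", "", "Needs more staff"]
non_empty_comments = list(filter(None, comments))

# Only truthy values survive.
truthy_values = list(filter(None, [0, False, None, 1, True]))

# Caveat: a real rating of 0 is also dropped.
ratings = [0, 7, 10, 0, 3]  # hypothetical 0-10 scale
filtered_ratings = list(filter(None, ratings))

print(incomes)             # [1500.5, 2300.0, 980.75]
print(non_empty_comments)  # ['Good service', 'Needs more staff']
print(truthy_values)       # [1, True]
print(filtered_ratings)    # [7, 10, 3]
```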

🏋️ Test Your Knowledge!

Challenge: NGO Data Analysis

An NGO collected information about community projects and needs your help processing it. Use the functions you learned to solve these exercises:

Available data:

project_years = ["2021", "2022", "2020", "2023", "2022"]
participants = [45, 60, 35, 80, 55]
budgets = [5000, 6000, 4500, 8000, 5500]
statuses = [1, 1, 0, 1, 0]  # 1 = Completed, 0 = In progress

Exercises:

  1. Convert the text years to numbers using map().

  2. Calculate:

    • Total participants (using sum()).

    • The largest and smallest participant counts using max() and min().

  3. Filter completed projects using filter().

  4. Use range() to generate sequential IDs for the projects.

Bonus: Use sorted() to order the budgets from highest to lowest.

Expected result:

Years as numbers: [2021, 2022, 2020, 2023, 2022]
Total participants: 275
Largest group: 80
Smallest group: 35
Completed projects: 3
Assigned IDs: [101, 102, 103, 104, 105]
Budgets from highest to lowest: [8000, 6000, 5500, 5000, 4500]

⚠️ Try it first! Solve these exercises on your own. Practice is the best way to learn. The solution is below.


View solution:

# NGO Data

project_years = ["2021", "2022", "2020", "2023", "2022"]
participants = [45, 60, 35, 80, 55]
budgets = [5000, 6000, 4500, 8000, 5500]
statuses = [1, 1, 0, 1, 0]  # 1 = Completed, 0 = In progress

# 1. Convert text years to numbers
# map(int, ...) applies the int() function to each element of project_years
# list() converts the map() result into a viewable list

numeric_years = list(map(int, project_years))
print("Years as numbers:", numeric_years)

# 2. Participant analysis
# sum() adds all numbers in the participants list

total_participants = sum(participants)

# max() finds the highest value in participants

max_participants = max(participants)

# min() finds the lowest value in participants

min_participants = min(participants)
print(f"Total participants: {total_participants}")
print(f"Largest group: {max_participants}")
print(f"Smallest group: {min_participants}")

# 3. Filter completed projects
# filter(None, statuses) keeps only true values (1s)
# list() converts the filter() result into a list
# len() counts how many completed projects there are

completed_projects = list(filter(None, statuses))
print(f"Completed projects: {len(completed_projects)}")

# 4. Generate sequential IDs
# range(101, 106) generates numbers from 101 to 105
# - We start at 101 for more professional IDs
# - We end at 106 because range() doesn't include the last number

project_ids = list(range(101, 106))
print("Assigned IDs:", project_ids)

# Bonus: Sort budgets from highest to lowest
# sorted() orders numbers from lowest to highest by default
# reverse=True inverts the order to get highest to lowest

sorted_budgets = sorted(budgets, reverse=True)
print("Budgets from highest to lowest:", sorted_budgets)

💡 Notice how:

  • map(int, list): Transformed each text year to a number, enabling numerical operations.

  • sum(list): Calculated the total participants by adding all values.

  • max(list) and min(list): Found the largest and smallest groups.

  • filter(None, list): Selected only completed projects (value 1).

  • range(start, end): Generated sequential IDs from 101 to 105.

  • sorted(list, reverse=True): Ordered budgets from highest to lowest.

  • list(): Converted map(), filter(), and range() results into viewable lists.

Each function solved a specific task that would require multiple manual steps in Excel. Their combination enables efficient and reproducible data processing.
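As one possible illustration of combining these functions, the sketch below builds one record per project from the NGO data and derives two summary figures. The record layout and the names records, records_by_year, average_participants, and completion_rate are choices made for this example, not part of the exercise.

```python
project_years = ["2021", "2022", "2020", "2023", "2022"]
participants = [45, 60, 35, 80, 55]
statuses = [1, 1, 0, 1, 0]  # 1 = Completed, 0 = In progress

project_ids = list(range(101, 106))
numeric_years = list(map(int, project_years))

# One tuple per project: (ID, year, participants, status).
records = list(zip(project_ids, numeric_years, participants, statuses))

# Tuples sort by their first element, so putting the year first
# orders the projects chronologically (ties fall back to the ID).
records_by_year = sorted(zip(numeric_years, project_ids, participants))

# Because statuses are coded 1/0, sum() counts the completed projects.
average_participants = sum(participants) / len(participants)
completion_rate = sum(statuses) / len(statuses)

print(records[0])            # (101, 2021, 45, 1)
print(records_by_year[0])    # (2020, 103, 35)
print(average_participants)  # 55.0
print(completion_rate)       # 0.6
```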

🎯 Conclusion

You've begun transforming your data processing. The 11 functions we've explored are equivalent to dozens of manual Excel operations:

📊 Instant Analysis

  • len(), sum(), max(), min() replace manual cell counting and the =SUM(), =MAX(), =MIN() formulas.

  • Benefit: Descriptive statistics in seconds, without dragging formulas.

⏱️ Sequence Automation

  • range() and reversed() replace manual series filling and column reordering.

  • Benefit: Automatic generation of years, IDs, and scales.

🔄 Efficient Organization

  • sorted(), enumerate(), zip() replace manual sorting and column combining.

  • Benefit: Data organization without copy and paste.

⚡ Mass Processing

  • map(), filter() replace cell-by-cell editing and manual filtering.

  • Benefit: Data transformation and cleaning in seconds.

📚 Additional Resources

In this article, we've covered some of the most essential built-in functions for social scientists, but Python offers many more. You can explore the complete list in the official documentation: https://docs.python.org/3/library/functions.html

Some of these additional functions will be explored in upcoming articles, while others you can discover on your own based on your specific research needs.

🚀 Next Steps

In the next article, "Python Lists: The Fundamental Data Structure", you'll learn to:

  • Organize your data in flexible structures.

  • Efficiently manipulate datasets.

  • Combine these functions with more advanced operations.


Did you find this guide useful? Share it with other social researchers and leave us your comments about what other topics you'd like us to cover!

 