🐍 Built-in Python Functions: Essential Tools for Social Scientists
📚 Introduction
As a social scientist, you face unique challenges in data processing:
Cleaning survey responses with thousands of participants.
Analyzing trends across multiple years.
Combining data from different research sources.
Ensuring your analyses are reproducible.
Tired of spending hours in Excel performing these tasks manually? Python can transform days of repetitive work into minutes of automated processing. In this guide, you'll learn 12 essential functions that:
Automate survey analysis.
Simplify longitudinal studies.
Speed up data cleaning.
Guarantee consistent and reproducible results.
This article is one of your first steps in Python, focusing on the most basic yet powerful built-in functions. It's important to know that Python offers even more specialized tools for social research:
Pandas: library specifically designed for data analysis.
NumPy: advanced functions for statistical calculations.
Matplotlib: professional result visualization.
SciPy: statistical tests and advanced analysis.
Mastering today's basic functions will prepare you to leverage these advanced tools in the future.
🔍 Building on What We've Learned
In previous articles, we explored these basic Python functions:
Function | Purpose |
---|---|
print() , input() | Program communication |
type() | Data type identification |
int() , float() , str() | Type conversion |
round() | Working with precise numbers |
Now we'll take the next step: built-in functions that will transform your research data processing.
🛠️ The Functions We'll Learn
1. 📊 Basic Analysis
Functions for instant descriptive statistics:
Function | Purpose | Example | Result |
---|---|---|---|
len() | Count elements | len([1, 2, 3]) | 3 |
sum() | Sum values | sum([1, 2, 3]) | 6 |
max() | Find maximum | max(1, 2, 3) | 3 |
min() | Find minimum | min(1, 2, 3) | 1 |
2. 🔄 Generation and Sorting
Functions for creating and reorganizing sequences:
Function | Purpose | Example | Result |
---|---|---|---|
range() | Generate sequences | range(1, 4) | 1, 2, 3 |
reversed() | Reverse order | reversed([1, 2, 3]) | 3, 2, 1 |
3. 📋 Data Transformation
Functions for efficiently organizing information:
Function | Purpose | Example | Result |
---|---|---|---|
sorted() | Sort elements | sorted([3, 1, 2]) | [1, 2, 3] |
enumerate() | Number elements | enumerate(['a', 'b']) | (0,'a'), (1,'b') |
zip() | Combine sequences | zip(['a','b'], [1,2]) | ('a',1), ('b',2) |
4. ⚡ Sequence Processing
Functions for automating operations:
Function | Purpose | Example | Result |
---|---|---|---|
map() | Transform elements | map(float, ["1", "2"]) | 1.0, 2.0 |
filter() | Select elements | filter(None, [0, 1, 0]) | 1 |
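One detail the Result columns above simplify: in Python 3, range(), reversed(), enumerate(), zip(), map(), and filter() return lazy objects rather than finished lists. The following is a minimal sketch (the variable names are illustrative, not from the original examples) of how wrapping each call in list() makes the values visible:

```python
# These built-ins produce their values on demand in Python 3.
# list() collects those values so they can be printed or inspected.
sequence = list(range(1, 4))                 # [1, 2, 3]
backwards = list(reversed([1, 2, 3]))        # [3, 2, 1]
numbered = list(enumerate(["a", "b"]))       # [(0, 'a'), (1, 'b')]
paired = list(zip(["a", "b"], [1, 2]))       # [('a', 1), ('b', 2)]
as_floats = list(map(float, ["1", "2"]))     # [1.0, 2.0]
truthy = list(filter(None, [0, 1, 0]))       # [1]

print(sequence, backwards, numbered, paired, as_floats, truthy)
```

Printing the bare call instead, for example print(range(1, 4)), shows a description of the object rather than its contents, which is why the examples in this guide use list().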
💡 Practical Examples
The functions we've just seen are powerful on their own, but their true value becomes apparent in real social research challenges. Next, we'll explore four common scenarios that every scientist faces, and how these functions can automate and simplify your work.
Example 1: From Survey to Insights in Seconds
Using basic analysis functions (len, sum, max, min)
Imagine this common scenario. You've just completed a community satisfaction survey, and your director asks for a statistical summary for the meeting in 15 minutes. With Excel, you'd have to create separate formulas for each calculation. With Python, you can get all results in seconds:
Total number of participants.
Average satisfaction.
Maximum and minimum levels.
All with just a few lines of code.
# Community satisfaction survey data
responses = [5, 4, 5, 3, 4, 5, 2, 5, 4, 3] # 10 responses on 1-5 scale
# 1. Basic participation analysis
total_participants = len(responses)
average_satisfaction = sum(responses) / len(responses)
maximum_level = max(responses)
minimum_level = min(responses)
# Display results
print("=== Community Satisfaction Analysis ===")
print(f"Total participants: {total_participants}")
print(f"Average satisfaction level: {average_satisfaction:.1f}/5")
print(f"Highest level reported: {maximum_level}")
print(f"Lowest level reported: {minimum_level}")
This code shows:
How many people responded to the survey.
The average satisfaction level in the community.
The maximum and minimum satisfaction levels reported.
💡 Notes on Optional Arguments
Each function has additional options useful in your research:
len()
Takes no optional arguments.
Works with any type of sequence (lists, tuples, strings).
sum()
Accepts an initial value: sum([1, 2, 3], start=10) → 16. Useful for adding a base value to your sum.
max() and min()
Accept a default value: max([], default=0) → 0. Useful for handling empty sequences without errors.
Can use key to define the comparison criterion:
max([1, -5, 3], key=abs) → -5 (because |-5| is the largest absolute value).
max(["a", "bb", "ccc"], key=len) → "ccc" (because it has the greatest length).
min(["Ana", "juan", "María"], key=str.lower) → "Ana" (case-insensitive comparison).
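To see these options working together, here is a small sketch on made-up data. The scores, the empty list, and the names are hypothetical, chosen only to illustrate start, default, and key:

```python
# Hypothetical survey scores, used only for illustration
scores = [4, 5, 3]
no_scores = []  # e.g. a question that nobody answered

# sum() with a starting value (the keyword form needs Python 3.8 or later)
total_with_base = sum(scores, start=10)      # 10 + 4 + 5 + 3 = 22

# default= avoids a ValueError when the sequence is empty
highest_or_zero = max(no_scores, default=0)  # 0

# key= changes what is compared, not what is returned
largest_change = max([1, -5, 3], key=abs)             # -5
longest_name = max(["Ana", "Juan", "Beatriz"], key=len)  # 'Beatriz'

print(total_with_base, highest_or_zero, largest_change, longest_name)
```

Note that key only affects the comparison: max() still returns the original element (-5), not the value the key function computed (5).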
Example 2: Temporal Analysis of Social Trends
Organizing chronological data with range() and reversed()
As a social researcher, you need to work with sequences of years and identifier numbers. For example:
Generate year ranges for a longitudinal study.
Create sequential IDs for participants.
Analyze data in chronological or reverse order.
Let's see how Python makes these tasks easier:
# 1. Generating year sequences for a study
print("=== Ranges for Longitudinal Study ===")
study_years = list(range(2019, 2024)) # Generates years from 2019 to 2023
print("Study period:", study_years)
print("Reverse chronological order:", list(reversed(study_years)))
# 2. Creating IDs for participants
print("\n=== Participant IDs ===")
participant_ids = list(range(101, 106)) # Generates IDs from 101 to 105
print("Assigned IDs:", participant_ids)
# 3. Generating measurement scales
print("\n=== Likert Scale ===")
likert_scale = list(range(1, 6)) # Generates scale from 1 to 5
print("Scale levels:", likert_scale)
print("Inverted scale:", list(reversed(likert_scale)))
💡 Notes on Optional Arguments
The functions for generating and sorting sequences have useful additional features:
range()
Can take 1, 2, or 3 arguments:
range(5) → generates numbers from 0 to 4.
range(2, 5) → generates numbers from 2 to 4.
range(0, 10, 2) → generates even numbers: 0, 2, 4, 6, 8.
The step can be negative. For example, range(10, 0, -1) → countdown from 10 to 1.
reversed()
Takes no optional arguments.
Useful for inverting sequences.
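The step argument is handy when data is not collected every year. As a brief sketch, assume a hypothetical survey run every two years from 2015 to 2023:

```python
# Hypothetical biennial survey waves (every 2 years, 2015 through 2023)
survey_waves = list(range(2015, 2024, 2))

# A negative step counts downward; the stop value (0) is still excluded
countdown = list(range(10, 0, -1))

# reversed() lists the same waves with the most recent first
latest_first = list(reversed(survey_waves))

print("Survey waves:", survey_waves)   # [2015, 2017, 2019, 2021, 2023]
print("Countdown:", countdown)         # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print("Latest first:", latest_first)   # [2023, 2021, 2019, 2017, 2015]
```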
Example 3: Organizing and Combining Data
Using sorted(), enumerate() and zip()
As we saw earlier, lists are ordered collections of elements in square brackets []. In addition to reversing their order with reversed(), we can also:
Sort them in different ways with sorted()
Automatically number them with enumerate()
Combine lists with zip()
For example, if we have a list of grades [7.5, 6.0, 9.0], we can sort it from lowest to highest or vice versa. Let's see how to use these functions with research data:
# Basic examples with lists
ages = [25, 30, 28] # List of ages
names = ["Ana", "Juan", "María"] # List of names
# 1. Sort ages from lowest to highest using sorted()
print("=== sorted() Function ===")
print("Original list:", ages)
print("Sorted list:", sorted(ages)) # Shows: [25, 28, 30]
# 2. Create indices with enumerate()
print("\n=== enumerate() Function ===")
print("Original list:", names)
print("With indices:", list(enumerate(names))) # Shows: [(0, "Ana"), (1, "Juan"), (2, "María")]
# 3. Combine lists with zip()
print("\n=== zip() Function ===")
print("Names:", names)
print("Ages:", ages)
print("Combined:", list(zip(names, ages))) # Shows: [("Ana", 25), ("Juan", 30), ("María", 28)]
💡 Notes on Optional Arguments
The functions we've seen have useful additional options:
sorted()
Accepts reverse=True to sort from highest to lowest. For example, sorted([1, 2, 3], reverse=True) → [3, 2, 1].
Can use key to sort by specific criteria (we'll cover this in upcoming articles).
enumerate()
Accepts start to begin from another number. For example, enumerate(names, start=1) → [(1, "Ana"), (2, "Juan"), (3, "María")].
By default, if we don't specify start, it begins from 0.
zip()
Can combine more than two sequences using zip(names, ages, other_data).
Stops at the end of the shortest sequence.
Note: In upcoming articles, we'll learn to work with the results of these functions using loops and other advanced methods.
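The options described in these notes can be tried on the same names and ages lists. The sketch below adds a third list, cities, which is hypothetical and deliberately one item short so that the "shortest sequence" behavior of zip() is visible:

```python
names = ["Ana", "Juan", "María"]
ages = [25, 30, 28]
cities = ["Lima", "Quito"]  # hypothetical third list, deliberately shorter

# reverse=True sorts from highest to lowest
oldest_first = sorted(ages, reverse=True)

# start=1 numbers participants from 1 instead of 0
numbered_from_one = list(enumerate(names, start=1))

# zip() stops at the shortest input, so María is silently left out
combined = list(zip(names, ages, cities))

print(oldest_first)        # [30, 28, 25]
print(numbered_from_one)   # [(1, 'Ana'), (2, 'Juan'), (3, 'María')]
print(combined)            # [('Ana', 25, 'Lima'), ('Juan', 30, 'Quito')]
```

Because zip() drops unmatched items without warning, it is worth comparing len() of each list before combining data from different sources.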
Example 4: Survey Data Cleaning and Preparation
Automating repetitive tasks with map() and filter()
As a social researcher, much of your time is spent preparing data for analysis. For example:
Converting text ages to numbers. For example, "25" becomes 25.
Filtering valid survey responses.
Transforming response codes to a standard format.
Let's see how to automate these common tasks:
# 1. Convert demographic data
print("=== Demographic Data Processing ===")
text_ages = ["25", "30", "28", "35", "29"]
numeric_ages = list(map(int, text_ages))
print("Original ages (text):", text_ages)
print("Converted ages (numbers):", numeric_ages)
# 2. Filter valid responses
print("\n=== Response Validation ===")
responses = [1, 0, 1, 1, 0, 1] # 1=Completed, 0=Not completed
valid_responses = list(filter(None, responses))
print("Total responses:", len(responses))
print("Valid responses:", len(valid_responses))
💡 Notes on Optional Arguments
The transformation and filtering functions have useful additional features:
map()
Accepts any value-transforming function as its first argument:
map(float, ["1.5", "2.5"]) → converts strings to decimals.
map(int, ["1", "2"]) → converts strings to integers.
map(str, [1, 2]) → converts numbers to strings.
The function must be applicable to each element.
Needs at least two arguments: the function and a sequence.
filter()
Can use None to filter true values:
filter(None, [0, 1, 0, 1]) → keeps only the 1s.
filter(None, ["", "hello", "", "world"]) → keeps only non-empty strings.
filter(None, [0, False, None, 1, True]) → keeps only true values.
Also accepts a function that returns True or False. We'll cover this when we study functions.
Note: The map() and filter() functions are useful with large amounts of data, as they allow us to transform or filter all elements at once.
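The two functions also chain naturally. As a rough sketch, assume a hypothetical export in which unanswered age questions arrive as empty strings; filter() removes the blanks first, and map() then converts what remains:

```python
# Hypothetical raw export: blank strings mark unanswered questions
raw_ages = ["25", "", "30", "", "28"]

# Step 1: drop the empty strings (they count as false values)
answered = list(filter(None, raw_ages))

# Step 2: convert the remaining text to integers
numeric_ages = list(map(int, answered))

# Step 3: the cleaned list is ready for the basic analysis functions
average_age = sum(numeric_ages) / len(numeric_ages)

print("Answered:", answered)          # ['25', '30', '28']
print("Numeric ages:", numeric_ages)  # [25, 30, 28]
print(f"Average age: {average_age:.2f}")
```

Running map(int, raw_ages) directly would fail with a ValueError on the first empty string, so the order of the two steps matters. Also keep in mind that filter(None, ...) removes every false value, including the number 0, so it is not suitable when 0 is a legitimate answer.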
🏋️ Test Your Knowledge!
Challenge: NGO Data Analysis
An NGO collected information about community projects and needs your help processing it. Use the functions you learned to solve these exercises:
Available data:
project_years = ["2021", "2022", "2020", "2023", "2022"]
participants = [45, 60, 35, 80, 55]
budgets = [5000, 6000, 4500, 8000, 5500]
statuses = [1, 1, 0, 1, 0] # 1 = Completed, 0 = In progress
Exercises:
1. Convert the text years to numbers using map().
2. Calculate:
Total participants (using sum()).
Project with most and least participants using max() and min().
3. Filter completed projects using filter().
4. Use range() to generate sequential IDs for the projects.
Bonus: Use sorted() to order the budgets from highest to lowest.
Expected result:
Years as numbers: [2021, 2022, 2020, 2023, 2022]
Total participants: 275
Largest group: 80
Smallest group: 35
Completed projects: 3
Assigned IDs: [101, 102, 103, 104, 105]
Budgets from highest to lowest: [8000, 6000, 5500, 5000, 4500]
⚠️ Try it first! Solve these exercises on your own. Practice is the best way to learn. The solution is below.
View solution:
# NGO Data
project_years = ["2021", "2022", "2020", "2023", "2022"]
participants = [45, 60, 35, 80, 55]
budgets = [5000, 6000, 4500, 8000, 5500]
statuses = [1, 1, 0, 1, 0] # 1 = Completed, 0 = In progress
# 1. Convert text years to numbers
# map(int, ...) applies the int() function to each element of project_years
# list() converts the map() result into a viewable list
numeric_years = list(map(int, project_years))
print("Years as numbers:", numeric_years)
# 2. Participant analysis
# sum() adds all numbers in the participants list
total_participants = sum(participants)
# max() finds the highest value in participants
max_participants = max(participants)
# min() finds the lowest value in participants
min_participants = min(participants)
print(f"Total participants: {total_participants}")
print(f"Largest group: {max_participants}")
print(f"Smallest group: {min_participants}")
# 3. Filter completed projects
# filter(None, statuses) keeps only true values (1s)
# list() converts the filter() result into a list
# len() counts how many completed projects there are
completed_projects = list(filter(None, statuses))
print(f"Completed projects: {len(completed_projects)}")
# 4. Generate sequential IDs
# range(101, 106) generates numbers from 101 to 105
# - We start at 101 for more professional IDs
# - We end at 106 because range() doesn't include the last number
project_ids = list(range(101, 106))
print("Assigned IDs:", project_ids)
# Bonus: Sort budgets from highest to lowest
# sorted() orders numbers from lowest to highest by default
# reverse=True inverts the order to get highest to lowest
sorted_budgets = sorted(budgets, reverse=True)
print("Budgets from highest to lowest:", sorted_budgets)
💡 Notice how:
map(int, list): Transformed each text year to a number, enabling numerical operations.
sum(list): Calculated the total participants by adding all values.
max(list) and min(list): Found the largest and smallest groups.
filter(None, list): Selected only completed projects (value 1).
range(start, end): Generated sequential IDs from 101 to 105.
sorted(list, reverse=True): Ordered budgets from highest to lowest.
list(): Converted map(), filter(), and range() results into viewable lists.
Each function solved a specific task that would require multiple manual steps in Excel. Their combination enables efficient and reproducible data processing.
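The solution reports the size of the largest and smallest groups but not which project they belong to. One possible extension, offered as a sketch rather than as part of the original solution, pairs each participant count with its ID using zip() and relies on the fact that Python compares tuples by their first item:

```python
project_ids = list(range(101, 106))
participants = [45, 60, 35, 80, 55]

# Put the participant count first so max() and min() compare on it
pairs = list(zip(participants, project_ids))

largest = max(pairs)    # (80, 104)
smallest = min(pairs)   # (35, 103)

print(f"Largest group: {largest[0]} participants (project {largest[1]})")
print(f"Smallest group: {smallest[0]} participants (project {smallest[1]})")
```

If two projects had the same number of participants, the tie would be decided by the second item in the tuple, the project ID.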
🎯 Conclusion
You've begun transforming your data processing. The 12 functions we've explored are equivalent to dozens of manual Excel operations:
📊 Instant Analysis
len(), sum(), max(), min() replace manual cell counting and =SUM(), =MAX(), =MIN() formulas.
Benefit: Descriptive statistics in seconds, without dragging formulas.
⏱️ Sequence Automation
range() and reversed() replace manual series filling and column reordering.
Benefit: Automatic generation of years, IDs, and scales.
🔄 Efficient Organization
sorted(), enumerate(), zip() replace manual sorting and column combining.
Benefit: Data organization without copy and paste.
⚡ Mass Processing
map(), filter() replace cell-by-cell editing and manual filtering.
Benefit: Data transformation and cleaning in seconds.
📚 Additional Resources
In this article, we've covered some of the most essential built-in functions for social scientists, but Python offers many more. You can explore the complete list in the official documentation: https://docs.python.org/3/library/functions.html
Some of these additional functions will be explored in upcoming articles, while others you can discover on your own based on your specific research needs.
🚀 Next Steps
In the next article, "Python Lists: The Fundamental Data Structure", you'll learn to:
Organize your data in flexible structures.
Efficiently manipulate datasets.
Combine these functions with more advanced operations.
Did you find this guide useful? Share it with other social researchers and leave us your comments about what other topics you'd like us to cover!