Taught: 2024-2025
EPF, Paris-Cachan, France
This course introduces the foundational concepts and techniques of data analysis, focusing on the practical skills needed to explore, interpret, and communicate insights from data. Students will learn how to handle real-world datasets, clean and preprocess data, and apply statistical and computational methods to uncover patterns and trends. The course emphasizes data visualization and reporting tools for presenting results effectively, so that students can make data-driven decisions in domains such as business, healthcare, and the social sciences.
By the end of the course, students will have a solid understanding of the data analysis pipeline and the ability to apply their knowledge to real-world problems using Python and industry-standard libraries.
Key Learning Outcomes:
- Understand the end-to-end process of data analysis, from collection to reporting.
- Perform data cleaning, transformation, and preprocessing.
- Apply exploratory data analysis (EDA) techniques to uncover insights.
- Use statistical methods to test hypotheses and validate results.
- Create effective data visualizations and dashboards to communicate findings.
- Gain proficiency in Python and popular data analysis libraries.
Course Topics:
Introduction to Data Analysis
- Overview of the data analysis lifecycle
- Importance of data-driven decision-making
Data Collection and Preparation
- Types of data: structured, unstructured, and semi-structured
- Data cleaning: handling missing, inconsistent, and duplicate data
- Data transformation: scaling, normalization, and encoding
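As an illustration of the cleaning and transformation steps listed above, here is a minimal sketch using pandas. The dataset, the column names (`city`, `income`), and the choice of median imputation and min-max scaling are assumptions made for this example, not prescribed course material.

```python
import pandas as pd

# Hypothetical toy dataset, invented for illustration only.
raw = pd.DataFrame({
    "city": ["Paris", "paris ", "Lyon", "Lyon", None],
    "income": [42000.0, 42000.0, None, 38000.0, 51000.0],
})

# Inconsistent data: normalize whitespace and capitalization in labels.
clean = raw.copy()
clean["city"] = clean["city"].str.strip().str.title()

# Missing data: drop rows without a city, then fill missing income
# with the median (one of several possible imputation choices).
clean = clean.dropna(subset=["city"])
clean["income"] = clean["income"].fillna(clean["income"].median())

# Duplicate data: remove rows that are now exact duplicates.
clean = clean.drop_duplicates().reset_index(drop=True)

# Scaling: min-max scale income to the range [0, 1].
income_min = clean["income"].min()
income_range = clean["income"].max() - income_min
clean["income_scaled"] = (clean["income"] - income_min) / income_range

# Encoding: one-hot encode the categorical city column.
encoded = pd.get_dummies(clean, columns=["city"])
print(encoded)
```

The order matters here: labels are normalized before deduplication so that "Paris" and "paris " are recognized as the same value.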
Exploratory Data Analysis (EDA)
- Descriptive statistics (mean, median, standard deviation, etc.)
- Identifying trends, patterns, and anomalies
- Data visualization: histograms, scatterplots, box plots, etc.
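The descriptive statistics and anomaly detection mentioned above can be sketched in a few lines of pandas. The values and the series name `response_time_ms` are invented for illustration, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```python
import pandas as pd

# Hypothetical measurements, invented for illustration; 95 is a planted outlier.
values = pd.Series([12, 15, 14, 13, 16, 15, 14, 95], name="response_time_ms")

# Descriptive statistics.
mean = values.mean()
median = values.median()
std = values.std()  # sample standard deviation (ddof=1 by default in pandas)

# Anomaly detection with the interquartile range (IQR) rule:
# flag points more than 1.5 * IQR beyond the first or third quartile.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

print(f"mean={mean:.2f}, median={median:.2f}, std={std:.2f}")
print("flagged outliers:", outliers.tolist())
```

The gap between the mean and the median in this sample hints at the outlier before any plot is drawn, which is why both statistics are usually reported together.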
Statistical Analysis
- Probability distributions and their applications
- Hypothesis testing (e.g., t-tests, chi-square tests)
- Correlation and regression analysis
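A two-sample t-test of the kind listed above might look like the following sketch, which uses `scipy.stats.ttest_ind`. The two samples and the 0.05 significance level are assumptions chosen for illustration. Welch's variant (`equal_var=False`) is used so that equal variances need not be assumed.

```python
from scipy import stats

# Hypothetical measurements for two independent groups, invented for illustration.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3]
group_b = [6.2, 6.0, 5.9, 6.5, 6.1, 6.4]

# Null hypothesis: both groups have the same population mean.
# equal_var=False selects Welch's t-test.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # assumed significance level
reject_null = result.pvalue < alpha

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.5f}")
print("reject null hypothesis:", reject_null)
```

A small p-value indicates only that the observed difference would be unlikely under the null hypothesis. It does not measure the size or practical importance of the effect.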
Data Visualization and Communication
- Principles of effective visualization
- Tools: Matplotlib, seaborn, and Plotly
- Designing dashboards and reports
Practical Applications
- Case studies in business, finance, healthcare, and social sciences
- Interpreting results and deriving actionable insights
Advanced Topics (Optional)
- Introduction to time-series analysis
- Basics of machine learning for data analysis
Course Format:
- Lectures: Introduction to theoretical concepts and methods.
- Practical Labs: Hands-on sessions with Python and data analysis libraries.
- Assignments: Data cleaning, visualization, and hypothesis testing tasks.
- Final Project: Analyze a dataset and present findings through visualizations and reports.
Prerequisites:
- Basic programming knowledge (preferably Python).
- Familiarity with high-school-level mathematics, including algebra and basic statistics.