Project

The final project is designed to give you experience implementing a machine learning model on a real-world dataset, without me telling you what method to use. I want you to explore any creative or unconventional methods you can think of! This is a completely open-ended problem, and the most important thing I want to see is effort. So let’s go over the project details:

Topics

There are three project topics covering a range of machine learning tasks:

  1. Identify Chinese characters (computer vision): You will be given images containing a mix of handwritten numeric digits and Chinese characters. Your task is to identify the Chinese characters and draw bounding boxes around them.
  2. Informative tweets (sentiment analysis): You are given a dataset of tweets related to the COVID-19 pandemic, based on the WNUT-2020 competition. Your job is to identify which tweets are informative.
  3. Virtual bidding in electricity markets (trend prediction): Develop a classification algorithm and trading strategy to identify profitable trading opportunities.

Dataset

The datasets are hosted on GitHub here.

Kaggle Competition

To make things more exciting, we are hosting a competition on Kaggle! You will submit your best runs to Kaggle (details are in the project description files).
The top individual or team for each topic will receive a prize:

We hope this will encourage you to try new methods and explore the current state-of-the-art models in machine learning.
We will verify the winners and confirm that the submitted results come from models that follow the competition rules.
Important: Make sure to set a random seed so that your training is reproducible.
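
Which generators need seeding depends on the libraries you use, and the project files do not prescribe a framework. As a minimal sketch, assuming a Python workflow, a hypothetical helper `set_seed` (the name and the default seed of 42 are illustrative, not part of the course materials) might seed the standard library and, if they are installed, NumPy and PyTorch:

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed the random number generators this sketch knows about.

    Hypothetical helper for illustration; extend it for any other
    library your project relies on.
    """
    # Python's built-in generator (used by random.shuffle, random.random, ...).
    random.seed(seed)

    # NumPy's legacy global generator, only if NumPy is installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's generators, only if PyTorch is installed.
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


if __name__ == "__main__":
    set_seed(123)
    first = [random.random() for _ in range(3)]
    set_seed(123)
    second = [random.random() for _ in range(3)]
    # Re-seeding with the same value reproduces the same sequence.
    print(first == second)
```

Call it once at the top of your training script, before creating data splits or initializing a model. Seeding alone may not make GPU training bit-for-bit reproducible, since some operations are nondeterministic unless the framework is configured otherwise.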

Teams

You may work in teams of up to three people and only need to complete one of the three project topics. There’s a variety to choose from, so pick the one that interests you most!

Deliverables

You must submit:

Due Dates

There are two important deadlines: