Projects


Technology Fundamentals for Business Analytics

For each of these solutions, all writing should be your own. The final delivered document will be inspected using the plagiarism tool available via Blackboard.

Kaggle Assignment 1 (Due on 10/23)

Public solutions/tutorials to Kaggle problems can be tremendous opportunities to learn data science. Use the Kaggle Scripts service to find working solutions for R and Python.

The goal of the first assignment is to understand the data behind an analytics problem and to see the overall process in action. Overall, you should assess 2 different datasets and 3 total solutions.

Solutions 1-2: Titanic Dataset

Solutions 1-2 should be for the Titanic problem, and you should be able to get a prediction from each. One should be in Python and one in R, and neither should be the simple naive/gender solution we did in class. You will compare the solutions in terms of performance, so both solutions must work.

(1) Overview of Titanic Analytics Problem and Data. You could provide descriptions of key fields.

(2) Solutions. Provide an overall summary of the approach, including which features were created, how the data was sampled, and the model used for the prediction. Going line by line through the solutions and commenting on what they are doing is useful. You should do background research on the models used and provide detailed line-by-line comments, as in the labs.

(3) Compare the predictive performance of the R solution vs the Python solution.
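
As an illustration of item (3), here is a minimal sketch of one way to put the two solutions on the same footing: score both sets of predictions against the same held-out labels. It assumes accuracy (the share of passengers classified correctly) as the measure. The function name accuracy and all of the values below are hypothetical placeholders, not results from any real solution.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("no labels to score")
    # Count the positions where the prediction agrees with the true label.
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical held-out labels (1 = survived, 0 = did not survive).
actual = [0, 1, 1, 0, 1, 0, 0, 1]

# Hypothetical predictions exported from each solution for the same passengers.
r_predictions = [0, 1, 0, 0, 1, 0, 1, 1]
python_predictions = [0, 1, 1, 0, 1, 1, 0, 1]

r_accuracy = accuracy(r_predictions, actual)
python_accuracy = accuracy(python_predictions, actual)

print(f"R solution accuracy:      {r_accuracy:.3f}")
print(f"Python solution accuracy: {python_accuracy:.3f}")
```

Scoring both solutions on the same passengers keeps the comparison fair. If you only have Kaggle leaderboard scores, you can report those directly instead.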

Solution 3: Choose a solution to a problem from an active or closed Kaggle competition.

(4) Overview of Analytics Problem and Data (1 page).

(5) Provide an overview of the solution/approach (as above, this should include line-by-line comments on the code).

(6) Prepare a 3-minute presentation on the dataset and solution: 1 slide on the problem, 1 slide on the data, and 1 slide on the solution.

Kaggle Assignment 2 (Due on 11/30)

For the second Kaggle assignment, we are all going to enter the What’s Cooking contest. This contest provides an overview of a variety of items.

For this assignment, I’d like you to provide the following:

(1) Introduction and overview of the data.

(2) A series of blog-style entries including your code and the different ways that you have processed the data to develop features or identify different models. You can base your model on an existing dataset, but you should be working here on your own to enhance and improve it using new methods. As part of the assignment, you should have at least 3 entries into Kaggle. You should include evidence of this and indicate the performance of each model.

(3) You should attempt to develop a table that clearly represents the results from the 3 submissions and summarizes what was tried for each model.
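
One possible way to build the table in item (3) is sketched below. It assumes each submission is recorded as a (number, description, score) tuple. The helper name format_results_table, the descriptions, and the scores are all hypothetical placeholders to be replaced with your own entries.

```python
def format_results_table(rows):
    """Format (submission number, description, score) tuples as a text table."""
    headers = ("Submission", "What was tried", "Score")
    # Convert every cell to text so column widths can be measured.
    table = [headers] + [(str(n), desc, f"{score:.3f}") for n, desc, score in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = []
    for index, row in enumerate(table):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(" | ".join(cells).rstrip())
        if index == 0:
            # Separator line under the header row.
            lines.append("-+-".join("-" * width for width in widths))
    return "\n".join(lines)


# Placeholder entries only; these are not real results.
submissions = [
    (1, "Baseline model from a public script", 0.700),
    (2, "Added cleaned-up text features", 0.750),
    (3, "Switched to a different model type", 0.780),
]

print(format_results_table(submissions))
```

Keeping the description column short and specific makes it easy to see which change produced which score.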

Kaggle Assignment 2 Advanced Option (Due on 11/30)