Write some Software

This project was awarded to SabidHabib for $80 USD.

Project Budget
$10 - $30 USD
Project Description

There are three datasets for the assignments. The same data will be used for the project. The project is a group project; the assignments are individual.

Dataset 1: This dataset is taken from the UCI public dataset repository.

It contains cellphone messages labelled as Spam or Good. The dataset is well described; please read the description before you start working on it.

Link to download dataset: [url removed, login to view]+Spam+Collection

Dataset 2: This dataset is taken from the UCI public dataset repository.

It contains sentences labelled with positive or negative sentiment, extracted from reviews of products, movies, and restaurants. Please read the description before you start working with it.

Link to download dataset: [url removed, login to view]+Labelled+Sentences
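Loading the two UCI datasets can be sketched as below. This is only a sketch under an assumption: that each file holds one record per line, with the label and the text separated by a tab (label first for the spam data, label last for the sentiment data). Please confirm this against the dataset descriptions. The function name `parse_labelled_lines` and the sample rows are made up for illustration.

```python
def parse_labelled_lines(text, label_first=True):
    """Split tab-separated records into (label, message) pairs.

    ASSUMPTION: one record per line, label and text separated by a tab.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        if label_first:
            label, message = line.split("\t", 1)
        else:
            message, label = line.rsplit("\t", 1)
        records.append((label.strip(), message.strip()))
    return records


# Made-up sample rows, not taken from the real files.
spam_sample = "ham\tSee you at lunch\nspam\tWin a free prize now"
sentiment_sample = "Great phone, works well.\t1\nBattery died in a week.\t0"

spam_records = parse_labelled_lines(spam_sample, label_first=True)
sentiment_records = parse_labelled_lines(sentiment_sample, label_first=False)
```

For the real files, read the file contents into a string first and pass that string in place of the samples.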

Dataset 3: Wikipedia data!

I chose the category CLASSIFICATION_Algorithms. It has 3 categories listed under it: Artificial Intelligence, Decision Tree, Ensemble Learning. We will use these categories as class labels. From each of these categories, sample 14 pages. Do not sample pages listed directly under CLASSIFICATION_Algorithms! Use these pages for the assignments and the project.

Task Description:


There are two sets of Wikipedia articles. The first set consists of Wikipedia featured articles of a certain type and becomes the class Featured. The second set consists of non-featured Wikipedia articles of a similar type and becomes the class Non-Featured. This is a binary classification problem.

To create attributes, extract all possible tokens from the entire dataset after stemming and stop-word removal. Create 1-grams, 2-grams and 3-grams from these tokens. Use these n-grams as the attributes for the ARFF files.
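One possible way to turn token lists into n-gram attributes and ARFF text is sketched here. It assumes the tokens are already stemmed and stop-word filtered, that each attribute is a 0/1 presence flag (counts would work too), and that tokens contain no quote characters. The names `ngrams`, `build_arff` and the relation name `wiki_ngrams` are illustrative, not part of the task.

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_arff(docs, n, relation="wiki_ngrams"):
    """Build ARFF text from docs, a list of (tokens, class_label) pairs.

    ASSUMPTION: every n-gram becomes a nominal {0,1} presence attribute,
    and tokens contain no quote characters.
    """
    per_doc = [(set(ngrams(tokens, n)), label) for tokens, label in docs]
    vocab = sorted(set().union(*(grams for grams, _ in per_doc)))
    classes = sorted({label for _, label in per_doc})

    lines = ["@relation %s_%d" % (relation, n), ""]
    for gram in vocab:
        # Quote names because 2-grams and 3-grams contain spaces.
        lines.append("@attribute '%s' {0,1}" % gram)
    lines.append("@attribute class {%s}" % ",".join("'%s'" % c for c in classes))
    lines += ["", "@data"]
    for grams, label in per_doc:
        row = ["1" if gram in grams else "0" for gram in vocab]
        lines.append(",".join(row + ["'%s'" % label]))
    return "\n".join(lines)


# Made-up, already preprocessed token lists.
sample_docs = [
    (["decis", "tree", "learn"], "Featured"),
    (["ensembl", "learn", "method"], "Non-Featured"),
]
arff_text = build_arff(sample_docs, n=2)
```

Calling `build_arff` once each with `n=1`, `n=2` and `n=3` gives one ARFF text per n-gram size, which can then be written to separate files.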

Perform attribute selection on each of the 1-gram, 2-gram and 3-gram attribute sets using information gain and gain ratio. Perform classification using a decision tree and naïve Bayes.
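To make the two selection measures concrete, here is a small sketch that computes them by hand for one attribute. It uses the usual definitions: information gain is the class entropy minus the class entropy after splitting on the attribute, and gain ratio divides that gain by the entropy of the attribute itself. WEKA does this for you; the function names here are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def info_gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of one attribute.

    values: attribute value per document (e.g. 0/1 n-gram presence)
    labels: class label per document, in the same order
    """
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute itself
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


labels = ["Featured", "Featured", "Non-Featured", "Non-Featured"]
perfect = info_gain_and_ratio([1, 1, 0, 0], labels)  # separates the classes
useless = info_gain_and_ratio([1, 0, 1, 0], labels)  # tells us nothing
```

An attribute that separates the classes perfectly gets the full class entropy as its gain, while one that is spread evenly across both classes gets a gain of zero.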

Make a Wiki report on your findings, including the various statistical evaluation measures given by WEKA for each classifier.
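WEKA's output includes measures such as precision, recall and F-measure. As a reminder of what these numbers mean, here is a minimal sketch that computes them from actual and predicted labels. The function name `binary_metrics` and the sample predictions are made up; the report itself should use the values WEKA gives.

```python
def binary_metrics(actual, predicted, positive):
    """Compute precision, recall, F-measure and accuracy for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Made-up labels for six articles.
actual = ["Featured", "Featured", "Featured",
          "Non-Featured", "Non-Featured", "Non-Featured"]
predicted = ["Featured", "Featured", "Non-Featured",
             "Featured", "Non-Featured", "Non-Featured"]
metrics = binary_metrics(actual, predicted, positive="Featured")
```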

Link: Classification_algorithms: [url removed, login to view]:Classification_algorithms

Link: Artificial Intelligence: [url removed, login to view]:Artificial_neural_networks

Link: Decision Tree: [url removed, login to view]:Decision_trees

Link: Ensemble Learning: [url removed, login to view]:Ensemble_learning


Stemming and Stop-Word removal: You can use NLTK!!

Stemming: Convert each word to its root form, e.g. Running --> Run

Stop words: Words with high frequency but little meaning
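The two preprocessing steps can be sketched as follows. To keep the sketch runnable without any downloads, it uses a tiny hand-written stop-word list and a crude suffix stripper (`toy_stem`). Both are stand-ins invented for illustration and are NOT what NLTK does internally; for the real work a proper stemmer and stop-word list should be plugged in.

```python
import re

# Tiny illustrative stop-word list (ASSUMPTION: stand-in for a real list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def toy_stem(word):
    """Crude suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant: "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


def preprocess(text, stem=toy_stem):
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The dogs are running in the park")
```

With NLTK installed, `PorterStemmer().stem` (from `nltk.stem`) can be passed as the `stem` argument, and `nltk.corpus.stopwords.words("english")` can replace the small list after running `nltk.download("stopwords")`.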

[url removed, login to view]
