Journal

[expand title="Open Journal" swaptitle="Close Journal" swapalt="Close Journal"]

Week of March 23, 2018
Will, Felix, and I have begun working on the RFID authorization system. To start, Will and I set up a computer dedicated to it. We needed to install the MySQL database software, which has a few prerequisites: .NET 4.5.1, the Microsoft Visual C++ 2013 Redistributable, and the Microsoft Visual C++ 2015 Redistributable. MySQL itself hasn't been installed yet, but the prerequisites are in place. I've also begun to look at a few Arduino code examples involving the RFID hardware and the PowerSwitch Tail.

Week of February 16, 2018
While waiting for the NAO robot to be repaired, I've been looking for something to do in the meantime. I recently read an article about a database of 200,000 tweets by paid Russian agents attempting to influence the election, which got me thinking about using that database to train a machine learning program to identify future tweets by Russian agents. Since each tweet belongs to one of exactly two classes (written by a paid Russian agent or not), I began researching binary classification models. I'm still doing a lot of reading, but I've determined that after being analyzed by the trained program, each tweet should receive a percentage likelihood of having been created by a paid Russian agent.
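To get a feel for how a binary classifier could output a percentage likelihood, here is a minimal hand-rolled naive Bayes sketch. The toy tweets, labels, and word-frequency model are all illustrative assumptions, not the project's actual data or method:

```python
from collections import Counter

def train(tweets, labels):
    """Count word frequencies per class (1 = troll, 0 = not) for naive Bayes."""
    counts = {0: Counter(), 1: Counter()}
    totals = {0: 0, 1: 0}
    for text, label in zip(tweets, labels):
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def prob_troll(text, counts, totals):
    """Return P(troll | text) via naive Bayes with add-one smoothing."""
    vocab = set(counts[0]) | set(counts[1])
    # Start from the class priors.
    score = {c: totals[c] / sum(totals.values()) for c in (0, 1)}
    for word in text.lower().split():
        for c in (0, 1):
            n = sum(counts[c].values())
            score[c] *= (counts[c][word] + 1) / (n + len(vocab))
    return score[1] / (score[0] + score[1])

# Toy training data (entirely made up for illustration).
tweets = ['crooked election rigged', 'lovely weather today',
          'election fraud everywhere', 'great game last night']
labels = [1, 0, 1, 0]
counts, totals = train(tweets, labels)
p = prob_troll('election rigged', counts, totals)   # a value between 0 and 1
```

In practice a library classifier's probability output (e.g. scikit-learn's `predict_proba`) would replace this hand-rolled version, but the shape of the result is the same: one probability per tweet.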

Week of February 9, 2018
After trying to factory reset the NAO, we’ve determined that it needs to be repaired, and we will have to send it in.

Week of February 2, 2018
This week, Will and I got started with the NAO robot, Lucy. We discovered that its user partition was corrupted, which is a likely cause of the long boot times, so we decided to factory reset it. Flashing a USB drive with a copy of NAOqi, the OS that the NAO runs on, took some time, and we will finish the factory reset process next week.

Week of January 26, 2018
Today we researched how to get started with the NAO robot. We ran into some problems booting it up and are currently researching a solution: the robot stays in the startup phase for an extended period of time and remains unresponsive.

Week of January 19, 2018
This week I missed classes due to soccer games.

Week of January 12, 2018
Will and I have looked into working with the NAO robot. We are thinking of using it for something related to understanding and reacting to human emotion, taught through machine learning.

Week of December 8, 2017
Will and I have decided to focus solely on researching the libraries used for AI and ML. One of the first libraries I looked at for this machine learning project was scikit-learn. The kinds of machine learning models it supports include classification, regression, and clustering. We had previously planned to use its classification capabilities for our project, but we have since stepped back from that plan. Another popular use of classification is image recognition and analysis. Regression analysis is used to make predictions based on a continuous relationship in past data, and clustering is used to group data into "clusters." With Bitcoin's recent spectacular rise in value, another potential idea Will thought of was to predict its price. In scikit-learn or any other library, we would use regression analysis for this.
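To make the regression idea concrete, here is a minimal sketch of fitting a least-squares line to a short price history and extrapolating one step ahead. The prices are made up, not real Bitcoin data, and a serious attempt would use a library's regression models and far more features than a simple trend line:

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = a*x + b over x = 0, 1, ..., n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of (x, y) divided by variance of x.
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) /
         sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Hypothetical daily closing prices (illustration only).
prices = [100.0, 110.0, 120.0, 130.0]
a, b = fit_line(prices)
next_price = a * len(prices) + b   # extrapolate one day ahead -> 140.0
```

The same fit-then-extrapolate shape carries over directly to scikit-learn's regression estimators via their `fit` and `predict` methods.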

Week of December 1, 2017
We continued to brainstorm either a solution to our problem or a different project idea. One solution we came up with was to keep only tweets that contain emojis, assign certain emojis to certain emotions, and use this to determine the overall emotion of the tweet. We've also looked at moving away from Twitter and using different datasets (https://www.kaggle.com/datasets). For now, we may decide to do more research to understand the full capabilities of machine learning and to know what we can realistically accomplish.
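The emoji idea could be prototyped with a simple lookup table. The emoji-to-emotion mapping below is a guess at plausible pairs for our six emotions, not a vetted methodology:

```python
from collections import Counter

# Hypothetical emoji-to-emotion mapping (illustrative only).
EMOJI_EMOTIONS = {
    '😍': 'love', '❤': 'love',
    '😂': 'joy', '😊': 'joy',
    '😮': 'surprise',
    '😡': 'anger',
    '😢': 'sadness',
    '😱': 'fear',
}

def tweet_emotion(text):
    """Tally the emotions of every mapped emoji in the tweet and return the
    most common one, or None if the tweet contains no mapped emoji."""
    tally = Counter(EMOJI_EMOTIONS[ch] for ch in text if ch in EMOJI_EMOTIONS)
    if not tally:
        return None
    return tally.most_common(1)[0][0]

tweet_emotion('what a game 😂😂😊')   # -> 'joy'
```

Tweets where `tweet_emotion` returns None would simply be excluded from the dataset, which matches the "only keep tweets with emojis" part of the idea.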

Week of November 10, 2017
This week, after receiving some concern and advice from Joe, Will and I decided to take another look at our methodology for determining the emotion of tweets. Up until now, we have been creating our dataset using our subjective interpretations of each tweet's intended emotion. We only kept the tweets we reached consensus on, and this was done blindly. We thought this would be fine as long as we acknowledged the subjective nature of the process, but we would now like to either create an alternate methodology or pivot the project in another direction. We've begun to look at other social media platforms and how they could be mined for data to discover what makes a post popular. We've also thought about interpreting and analyzing the usage of emojis in social media posts. There is still more brainstorming to be done, which we will continue next week.

Week of October 27, 2017
Today has been another day of identifying the emotions in tweets and feeding them into our database. This should be finished by next week, after which I will build the machine learning program that processes our data.

Week of October 20, 2017
This week I created a Python script to load the database with one thousand tweets scraped from Twitter. After that, I created a script for rating the tweets' emotions within the database. I sent it over to Will, and we began rating the tweets, which we will continue over the next few weeks.

Below is the source code for the program that displays the tweets to a user and prompts them for an emotion.

import pymysql
import pymysql.cursors
import sys

# Map each emotion name to the integer code stored in the database.
emotions = {'love': 1, 'joy': 2, 'surprise': 3, 'anger': 4, 'sadness': 5, 'fear': 6}

connection = pymysql.connect(host='127.0.0.1', user='root', password='meme',
                             db='twitter', charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    while True:
        with connection.cursor() as cursor:
            # Fetch one tweet that has not been rated yet.
            cursor.execute("SELECT `tweet` FROM `data` WHERE `rated_by_ferdi`=0 LIMIT 1;")
            row = cursor.fetchone()
            if row is None:
                print('All tweets have been rated.')
                break
            tweetText = row['tweet']
            print('Love (1), Joy (2), Surprise (3), Anger (4), Sadness (5), Fear (6)')
            answer = input(tweetText + ':\n').lower()
            # Accept either the number or the name of the emotion.
            for name, code in emotions.items():
                if str(code) in answer or name in answer:
                    emotion = code
                    break
            else:
                print('error: unrecognized emotion')
                sys.exit(1)
            # A parameterized query avoids breaking on quotes inside the tweet text.
            cursor.execute('UPDATE `data` SET `emotion_ferdi` = %s, `rated_by_ferdi` = 1 '
                           'WHERE `tweet` = %s', (emotion, tweetText))
            connection.commit()
except KeyboardInterrupt:
    connection.close()
    sys.exit()

Below is the source code for the program that mines tweets from Twitter.

import pymysql
import pymysql.cursors
from twitterscraper import query_tweets

keywords = ['nfl', 'ripaim']

connection = pymysql.connect(host='127.0.0.1', user='root', password='meme',
                             db='twitter', charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        for keyword in keywords:
            # Scrape up to ten tweets per keyword.
            for tweet in query_tweets(keyword, 10)[:10]:
                tweetText = tweet.text
                # Skip tweets that embed images; their links add noise to the text.
                if 'pic.twitter' in tweetText:
                    continue
                cursor.execute('INSERT INTO `data` (`tweet`) VALUES (%s)', (tweetText,))
                print(tweetText)
    connection.commit()
finally:
    connection.close()


Week of October 6, 2017
I finally set up a MySQL database for our dataset and have begun to populate it with an assortment of tweets. I've also started work on the program for manually determining the emotion of each tweet in the dataset.

Week of September 22, 2017
This week, Will finished the methodology for categorizing each tweet, and I've begun to write the program to create the dataset. I will be using a Python MySQL client library to create a database that stores the tweets and their assigned emotions. Here is a breakdown of how I envision this process:

1. Tweets matching a certain keyword will be scraped from twitter and added to the database.
2. Will or I will open the secondary program, and tweets that have not been classified will enter a queue to be classified.
3. The tweet will then be sent back to the database with the updated emotion and a note that says it has been classified by one of us.
4. Tweets that we assign different emotions will be discarded to ensure that the emotions we classify them with are accurate (each tweet needs to be looked at by both of us).

Once our dataset is created, we will be able to create and train our machine learning program.
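The four steps above can be sketched end to end. The sketch below uses SQLite in place of MySQL so it runs without a server, and the table and column names are assumptions about the eventual schema, not the final design:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE data (
    tweet TEXT PRIMARY KEY,
    emotion_ferdi INTEGER, rated_by_ferdi INTEGER DEFAULT 0,
    emotion_will INTEGER,  rated_by_will INTEGER DEFAULT 0)''')

# Step 1: scraped tweets are added to the database.
conn.executemany('INSERT INTO data (tweet) VALUES (?)',
                 [('tweet one',), ('tweet two',)])

# Step 2: each rater pulls the tweets they have not yet classified.
queue = conn.execute('SELECT tweet FROM data WHERE rated_by_ferdi = 0').fetchall()

# Step 3: the chosen emotion is written back with a "classified by" flag
# (here one rater picks joy = 2 and the other picks sadness = 5).
conn.execute('UPDATE data SET emotion_ferdi = 2, rated_by_ferdi = 1 '
             'WHERE tweet = ?', ('tweet one',))
conn.execute('UPDATE data SET emotion_will = 5, rated_by_will = 1 '
             'WHERE tweet = ?', ('tweet one',))

# Step 4: tweets the two raters disagree on are discarded.
conn.execute('DELETE FROM data WHERE rated_by_ferdi = 1 AND rated_by_will = 1 '
             'AND emotion_ferdi != emotion_will')
```

Because the two ratings above disagree, 'tweet one' is deleted in step 4; only tweets with matching ratings survive into the training set.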

Week of September 15, 2017
This week, Will and I completed our whitepaper and put it on the website. Individually, I continued to familiarize myself with the scikit-learn library; I will be utilizing its supervised learning and classification capabilities. Next week, the methodology should be complete, and, from there, I will create a program to begin building the dataset.

Week of September 8, 2017
This week Will P. and I continued our research, and we began writing the white paper. We've settled on the six emotions we will use to classify tweets. We will continue to develop the methodology we will use to classify these tweets, making sure that it is done effectively. I also did some more research regarding the program's development. For now, I've narrowed down the main Python library we will use for the machine learning: scikit-learn.

Week of August 28, 2017
This week Will P. and I were brainstorming, and we came up with the idea of detecting emotion in tweets using machine learning. We developed a rough timeline of what our next steps will be. It includes further research on the subject, development of the core program, the creation of our dataset, and potentially a browser extension. For now, we will continue researching the topic and begin to write a white paper.

[/expand]