Python - How to properly loop through two files, comparing strings in both files against each other


I'm having trouble doing sentiment analysis of tweets (file 1, standard Twitter JSON response) against a list of words (file 2, tab-delimited, 2 columns) with a sentiment assigned to each of them (either positive or negative).

The problem is that the top loop runs only once and then the script ends. I am looping through file 1, with a loop through file 2 nested inside it, trying to compare the two and keep a running sum of the combined sentiment for each tweet.

So I have:

    import json

    def get_sentiments(tweet_file, sentiment_file):

        sent_score = 0
        for line in tweet_file:

            document = json.loads(line)
            tweets = document.get('text')

            if tweets != None:
                tweet = str(tweets.encode('utf-8'))

                #print tweet

                for z in sentiment_file:
                    line = z.split('\t')
                    word = line[0].strip()
                    score = int(line[1].rstrip('\n').strip())

                    #print score

                    if word in tweet:
                        print "+++++++++++++++++++++++++++++++++++++++"
                        print word, tweet
                        sent_score += score

                print "====", sent_score, "====="      # problem: it's only doing the first tweet

    file1 = open('tweetsfile.txt')
    file2 = open('sentimentfile.txt')

    get_sentiments(file1, file2)

I've spent the better half of the day trying to figure out why it prints out all the tweets without the nested loop over file2, but with it, it processes only the first tweet and then exits.

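The reason it's doing this only once is that the inner loop has reached the end of the sentiment file, so it stops since there are no more lines to read.

In other words, the first time the inner loop runs, it steps through the entire file, and since there are no more lines to read (it has reached the end of the file), it doesn't loop again for the following tweets, resulting in only 1 tweet being processed.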
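The effect can be reproduced without any real files. The following is a minimal sketch in Python 3, assuming `io.StringIO` objects as stand-ins for the two open files; the sample contents and the names `tweets`, `words`, and `inner_counts` are made up for illustration.

```python
import io

# Hypothetical stand-ins for the two files; io.StringIO behaves like an open text file.
tweets = io.StringIO("tweet one\ntweet two\ntweet three\n")
words = io.StringIO("good\t1\nbad\t-1\n")

inner_counts = []
for tweet in tweets:
    seen = 0
    for entry in words:  # iterating consumes the file; it does not restart by itself
        seen += 1
    inner_counts.append(seen)

# The inner loop sees both lines for the first tweet and nothing afterwards.
print(inner_counts)  # [2, 0, 0]
```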

So one way to solve this is to "rewind" the file, which you can do with the seek method of the file object.
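As a rough sketch of the rewind approach in Python 3, assuming in-memory `io.StringIO` objects in place of the real files (the sample data and variable names are hypothetical), calling `seek(0)` before each pass moves the read position back to the start:

```python
import io

# Hypothetical sample data standing in for the tweet text and the sentiment file.
tweets = io.StringIO("good day\nbad day\n")
sentiments = io.StringIO("good\t1\nbad\t-1\n")

totals = []
for tweet in tweets:
    sentiments.seek(0)  # rewind the sentiment file before every pass
    total = 0
    for entry in sentiments:
        word, score = entry.rstrip("\n").split("\t")
        if word in tweet:
            total += int(score)
    totals.append(total)

print(totals)  # [1, -1]
```

This keeps the nested-loop structure of the question, at the cost of re-reading and re-parsing the sentiment file once per tweet.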

If your files aren't big, another approach is to read them into a list or a similar structure, and then loop through that.
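A minimal sketch of the list approach in Python 3 might look like this. It assumes an `io.StringIO` object in place of the sentiment file and a small hypothetical list of tweet texts; a list can be looped over as many times as needed, unlike the file itself.

```python
import io

# Hypothetical stand-in for the tab-delimited sentiment file.
sentiment_file = io.StringIO("good\t1\nbad\t-1\n")

# Parse the file once into a list of (word, score) pairs.
entries = []
for line in sentiment_file:
    word, score = line.rstrip("\n").split("\t")
    entries.append((word, int(score)))

# Hypothetical tweet texts; the list of entries is reused for each one.
tweets = ["good day", "bad day", "good and bad"]
totals = [sum(score for word, score in entries if word in tweet) for tweet in tweets]

print(totals)  # [1, -1, 0]
```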

However, since your sentiment score is a simple lookup, the best approach is to build a dictionary with the sentiment scores, and then look up each word in the dictionary to calculate the overall sentiment of the tweet:

    import csv
    import json

    scores = {}  # empty dictionary to store the score for each word

    with open('sentimentfile.txt') as f:
        reader = csv.reader(f, delimiter='\t')
        for row in reader:
            scores[row[0].strip()] = int(row[1].strip())

    with open('tweetsfile.txt') as f:
        for line in f:
            tweet = json.loads(line)
            # On Python 3, drop .encode('utf-8') so the words stay str and match the keys
            text = tweet.get('text', '').encode('utf-8')
            if text:
                total_sentiment = sum(scores.get(word, 0) for word in text.split())
                print("{}: {}".format(text, total_sentiment))

The with statement automatically closes the file handles. I am using the csv module to read the file (it works with tab-delimited files as well).

This line does the calculation:

    total_sentiment = sum(scores.get(word, 0) for word in text.split())

It is a shorter way to write this loop:

    tweet_score = []
    for word in text.split():
        if word in scores:
            # collect the score of every word that has one
            tweet_score.append(scores[word])

    total_score = sum(tweet_score)

The get method of dictionaries takes an optional second argument to return a custom value when the key cannot be found; if you omit the second argument, it returns None. In my loop I am using it to return 0 if the word has no score.
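A small illustration of that behaviour, using a made-up `scores` dictionary and a made-up sentence:

```python
# Hypothetical scores, for illustration only.
scores = {"good": 1, "bad": -1}

missing_default = scores.get("table")   # key absent, no default given -> None
missing_zero = scores.get("table", 0)   # key absent, default given -> 0
found = scores.get("good", 0)           # key present -> its stored value

# With the 0 default, words without a score simply add nothing to the sum.
total = sum(scores.get(word, 0) for word in "a good table".split())

print(missing_default, missing_zero, found, total)  # None 0 1 1
```

Without the 0 default, `sum` would hit a `None` for the first unknown word and raise a `TypeError`, which is why the default matters here.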

