Saturday, 7 September 2013

Get unique count sorted

I have a bunch of files in the following format:
15
17
18
21
14
18
14
13
17
11
11
18
15
15
12
17
9
10
12
17
14
17
etc
The following script reads those files:
import os
from collections import Counter

def main():
    p = './newR'
    fd = os.listdir(p)
    countUniq(p, fd)

def writeFile(fd, fhp, fcount):
    fo = './nnewR/'+fd+'.txt'
    with open(fo, 'a') as f:
        r = '%s %s\n' % (fhp, fcount)
        f.write(r)

def countUniq(path, dirs):
    for pfiles in dirs:
        pathN = os.path.join(path, pfiles)
        with open(pathN, 'r') as infile:
            # infile = open(pathN, 'r')
            data = infile.read()
        fileN = os.path.basename(pathN)
        stripFN = os.path.splitext(fileN)[0]
        fDate = stripFN.split('_')[0]
        countr = Counter()
        countr.update([int(d) for d in data.split()])
        for line, count in countr.items():
            writeFile(fDate, line, count)

main()
This outputs the following files:
20130813.txt
20130819.txt
20130825.txt
20130831.txt
etc
Let's have a look at the first file to check whether it does the job:
51 4
9 4
10 36
11 48
12 132
13 144
14 148
15 133
16 52
17 105
18 61
19 20
20 12
21 16
22 20
23 8
This is strange: why does the output start with 51 instead of the
smallest number, 9?
Another file, picked at random:
28 4
9 20
10 122
11 136
12 298
13 302
14 397
15 314
16 218
17 264
18 148
19 93
20 32
21 49
22 16
23 13
24 8
25 4
60 4
Again, it doesn't start with the smallest number, so the output order is
wrong. I suspect it has to do with the loop that reads the file, but I'm
not sure; I have been stuck on this point for a while.
I could really use some input here.
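
One likely explanation, offered as a sketch rather than a confirmed diagnosis: Counter is a dict subclass, and items() does not return keys in numeric order (the order is arbitrary on older Pythons and insertion order on 3.7+), so 51 can come out before 9. Sorting the items before writing them would give ascending output. The helper name sorted_counts below is hypothetical and not part of the script above.

```python
from collections import Counter

def sorted_counts(text):
    """Return (value, count) pairs ordered by value, smallest first.

    Hypothetical helper for illustration; it mirrors the counting step
    of countUniq but sorts the result before it is written out.
    """
    counts = Counter(int(token) for token in text.split())
    # items() has no numeric ordering of its own; sorted() orders the
    # (value, count) tuples by their first element, the value.
    return sorted(counts.items())

sample = "15 17 18 21 14 18 14 13 17 11"
for value, count in sorted_counts(sample):
    print('%s %s' % (value, count))
```

In the original script the equivalent change would be to loop over sorted(countr.items()) instead of countr.items(). Note also that writeFile opens the output in append mode ('a'), so rows from other input files with the same date prefix, or from earlier runs, end up in the same output file; that can also leave values out of order.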
