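This assignment has two programming problems: one counts the target values that can be written as the sum of two numbers from the input, and the other computes running medians.
The 2-SUM problem
Problem description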
The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications). The file contains 1 million integers, both positive and negative (there might be some repetitions!). This is your array of integers, with the ith row of the file specifying the ith entry of the array. Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t. (NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)
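Approach:
The input holds 1,000,000 numbers. Every number x is visited once, and for each x we look for matching values y. That second step is the crux. If a hash splits the numbers by magnitude into buckets of width 2^15, then for each x only two buckets need to be scanned: the valid partners y lie in [-10000-x, 10000-x], a window of 20001 values, which is narrower than one bucket (2^15 = 32768) and so overlaps at most two adjacent buckets. The data is sparse, so each bucket probably holds only one or two values, which makes the overall complexity O(N).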
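The bucket idea can be tried on a toy input before running it on the full file. The following is only a minimal sketch under stated assumptions: the function name `count_targets` and its parameters `lo`, `hi` and `width` are hypothetical, and it keeps the buckets in a dictionary keyed by floor division instead of shifting by the minimum value.

```python
from collections import defaultdict


def count_targets(nums, lo, hi, width):
    """Count targets t in [lo, hi] with x + y == t for distinct x, y in nums.

    Assumes width > hi - lo, so the window of valid partners for any x
    overlaps at most two adjacent buckets.
    """
    values = set(nums)
    buckets = defaultdict(list)
    for v in values:
        # Floor division keeps bucket boundaries consistent for negatives too.
        buckets[v // width].append(v)

    found = set()
    for x in values:
        # Partners y must satisfy lo - x <= y <= hi - x.
        for key in {(lo - x) // width, (hi - x) // width}:
            for y in buckets.get(key, ()):
                if y != x and lo <= x + y <= hi:
                    found.add(x + y)
    return len(found)


# Toy input: the reachable targets in [-2, 4] are -2, -1, 0, 1 and 3.
print(count_targets([-3, -1, 1, 2, 9, 11, 15], -2, 4, 8))
```

For the assignment itself the call would presumably use `lo=-10000`, `hi=10000` and `width=2**15`, matching the bucket size described above.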
The implementation is as follows:
from time import perf_counter  # time.clock is not available in Python 3.8+

start = perf_counter()

def myhash(val):
    # Bucket index: each bucket covers 2**15 consecutive values.
    return val >> 15

# Read the whole file and drop duplicates.
with open('algo1-programming_prob-2sum.txt', 'r') as f:
    tmp = f.read()
print('read complete')
vallist = set(int(val) for val in tmp.split())
print('convert to set@int complete')

# Shift by the minimum so that every stored bucket index is non-negative.
minval = min(vallist)
# One slot per bucket (6103503 slots for this data set); None marks an empty bucket.
valnew = [None] * (myhash(max(vallist) - minval) + 1)
for val in vallist:
    val_key = myhash(val - minval)
    if valnew[val_key] is None:
        valnew[val_key] = [val]
    else:
        valnew[val_key].append(val)
print('hash complete', len(valnew), len(vallist))

# tlist[t + 10000] becomes 1 once some pair with x + y == t is found.
tlist = [0] * 20001
for val in vallist:
    # Partners y lie in [-10000 - val, 10000 - val]; that window is
    # narrower than one bucket, so it touches at most these two buckets.
    firkey = myhash(-10000 - val - minval)
    seckey = myhash(10000 - val - minval)
    for key in {firkey, seckey}:
        # Keys outside the table have no bucket to scan.
        if 0 <= key < len(valnew) and valnew[key] is not None:
            for other in valnew[key]:
                # x and y must be distinct.
                if other != val and -10000 <= other + val <= 10000:
                    tlist[other + val + 10000] = 1
print('output: ', sum(tlist))
finish = perf_counter()
print(finish - start)

## Output of two recorded runs (Python 2 print format):
##read complete
##convert to set@int complete
##('hash complete', 6103503, 999752)
##('output: ', ***)
##480.193410146
##user@hn:~/pyscripts$ python 2sum_hash.py
##read complete
##convert to set@int complete
##('hash complete', 6103503, 999752)
##('output: ', ***)
##183.92
The run took about 480 s on win32 but only about 180 s on Debian. Someone on the forum reported 0.53 s, so I still have a lot of room for improvement.
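One possible direction for closing that gap, offered only as a sketch and not as the method behind the 0.53 s result: sort the distinct values once and, for each x, use binary search to find the slice of partners y whose sum falls inside the target interval. The function name `count_targets_sorted` is hypothetical, and no timing is claimed for it.

```python
from bisect import bisect_left, bisect_right


def count_targets_sorted(nums, lo=-10000, hi=10000):
    """Count targets t in [lo, hi] reachable as x + y with distinct x, y."""
    values = sorted(set(nums))
    found = set()
    for x in values:
        # All y with lo <= x + y <= hi form one contiguous slice of values.
        start = bisect_left(values, lo - x)
        stop = bisect_right(values, hi - x)
        for y in values[start:stop]:
            if y != x:
                found.add(x + y)
    return len(found)


# Toy input: the reachable targets in [-2, 4] are -2, -1, 0, 1 and 3.
print(count_targets_sorted([-3, -1, 1, 2, 9, 11, 15], lo=-2, hi=4))
```

Sorting costs O(N log N), and each x then needs two binary searches plus a scan of a slice that stays short when the data is sparse.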
The median problem
Problem description:
The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x_i denote the ith number of the file, the kth median m_k is defined as the median of the numbers x_1,…,x_k. (So, if k is odd, then m_k is the ((k+1)/2)th smallest number among x_1,…,x_k; if k is even, then m_k is the (k/2)th smallest number among x_1,…,x_k.) In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute (m_1+m_2+m_3+⋯+m_10000) mod 10000.
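Besides re-sorting the array after every new number and taking its median, this problem can be solved quickly with two heaps. As the data streams in, two heaps are maintained so that their sizes never differ by more than 1: a max-heap holding the smaller half of the numbers seen so far and a min-heap holding the larger half. If the max-heap is never the smaller of the two, the median as defined above is always the top of the max-heap.
In Python, heapq only provides a min-heap, but negating the values turns it into a max-heap.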
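The negation trick is easy to check in isolation. A minimal sketch, with illustrative values and variable names that are not taken from the assignment:

```python
from heapq import heappush, heappop

heap = []
for v in [3, 9, 4, 7]:
    heappush(heap, -v)       # store the negated value

largest = -heap[0]           # peek at the maximum without removing it
popped = -heappop(heap)      # remove and return the maximum
next_largest = -heap[0]      # the new maximum after the pop

print(largest, popped, next_largest)
```

The smallest negated value sits at the root of the min-heap, so negating it again gives the largest original value.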
This time I implemented two algorithms, and the difference in speed is obvious. The implementation:
from time import perf_counter  # time.clock is not available in Python 3.8+
from heapq import heappush, heappop

start = perf_counter()
with open('Median.txt', 'r') as f:
    data = [int(val) for val in f.read().split()]
out = [0] * len(data)

# Brute force with high complexity: re-sort the prefix for every k.
# 17s running time
def rudeway(data, out):
    for ind in range(len(data)):
        b = data[0:ind+1]
        b.sort()
        # ((k+1)/2)th smallest for odd k, (k/2)th smallest for even k.
        out[ind] = b[(len(b)+1)//2 - 1]
    return sum(out) % 10000
#print(rudeway(data,out))

# Use heapq; a min-heap of negated values acts as a max-heap.
# 0.231407100855s
def heapway(data, out):
    # Assumes len(data) >= 2.
    lheap = []   # max-heap (values stored negated): smaller half
    rheap = []   # min-heap: larger half
    out[0] = data[0]
    tmp = sorted(data[0:2])
    out[1] = tmp[0]
    heappush(lheap, -tmp[0])
    heappush(rheap, tmp[1])
    for ind in range(2, len(data)):
        if data[ind] > rheap[0]:
            heappush(rheap, data[ind])
        else:
            heappush(lheap, -data[ind])
        # Rebalance so that len(lheap) is len(rheap) or len(rheap) + 1.
        if len(rheap) > len(lheap):
            heappush(lheap, -heappop(rheap))
        if len(lheap) > len(rheap) + 1:
            heappush(rheap, -heappop(lheap))
        # The median is the largest value of the smaller half.
        out[ind] = -lheap[0]
    return sum(out) % 10000

print(heapway(data, out))
finish = perf_counter()
print(finish - start)
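Since the problem asks to treat the input as a stream, the same two-heap idea can also be written as a generator that never needs the whole list up front and works for streams of any length. This is a sketch rather than a replacement for the code above; the names `running_medians` and `medians_by_sorting` are hypothetical, and the sorting version is there only as a cross-check on small inputs.

```python
from heapq import heappush, heappop


def running_medians(stream):
    """Yield the kth median after each of the first k numbers arrives."""
    low = []    # max-heap (values stored negated): smaller half
    high = []   # min-heap: larger half
    for x in stream:
        if low and x > -low[0]:
            heappush(high, x)
        else:
            heappush(low, -x)
        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(high) > len(low):
            heappush(low, -heappop(high))
        elif len(low) > len(high) + 1:
            heappush(high, -heappop(low))
        yield -low[0]


def medians_by_sorting(data):
    """Reference version: sort each prefix and pick the lower median."""
    return [sorted(data[:k])[(k - 1) // 2] for k in range(1, len(data) + 1)]


sample = [5, 1, 3, 4, 2]
print(list(running_medians(sample)))
print(medians_by_sorting(sample))
```

For the assignment, the answer would then be `sum(running_medians(data)) % 10000`, with `data` read from the file as above.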