1. Handling missing keys with setdefault
    import re
    import sys

    WORD_RE = re.compile(r'\w+')

    index = {}
    print(sys.argv)

    # Example 3-2
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                # finditer yields match objects such as
                # <_sre.SRE_Match object; span=(0, 4), match='User'>.
                # Each one carries the matched text and its position:
                # match.start() and match.end() give the start and end offsets.
                word = match.group()  # the matched text, e.g. 'User'
                column_no = match.start() + 1
                location = (line_no, column_no)
                # The conventional approach:
                occurrences = index.get(word, [])
                occurrences.append(location)
                index[word] = occurrences

    for word in sorted(index, key=str.upper):  # keys in case-insensitive order
        print(word, index[word])

    print("-----------------------")

    # Example 3-4: handling missing keys with setdefault
    index2 = {}
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                location = (line_no, column_no)
                # setdefault: use the existing value if the key is present,
                # otherwise set it to the given default.
                # Get the list of occurrences for word, or set it to [] if not found;
                # setdefault returns the value, so it can be updated without
                # requiring a second search.
                index2.setdefault(word, []).append(location)

    for word in sorted(index2, key=str.upper):
        print(word, index2[word])

    # Sample output:
    # flasgger [(3, 6), (4, 6)]
    # flask [(2, 6)]
    # Flask [(2, 19)]
    # from [(2, 1), (3, 1), (4, 1)]
    # import [(1, 1), (2, 12), (3, 15), (4, 21)]
    # jsonify [(2, 26)]
    # random [(1, 8)]
    # request [(2, 35)]
    # Swagger [(3, 22)]
    # swag_from [(4, 28)]
    # utils [(4, 15)]

    """
    The result of this line ...

        my_dict.setdefault(key, []).append(new_value)

    ... is the same as running ...

        if key not in my_dict:
            my_dict[key] = []
        my_dict[key].append(new_value)

    ... except that the latter code performs at least two searches for key
    (three if it is not found), while setdefault does it all with a single lookup.
    """
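The equivalence described in the note above can be checked without an input file. The following is a minimal sketch; the `pairs` list is made-up sample data, not output from the script above.

```python
# Hypothetical sample data: (word, location) pairs, not read from a real file.
pairs = [("from", (1, 1)), ("import", (1, 12)), ("from", (2, 1))]

# Version 1: explicit membership test, then further lookups to store and append.
index_a = {}
for word, location in pairs:
    if word not in index_a:
        index_a[word] = []
    index_a[word].append(location)

# Version 2: setdefault returns the existing list, or inserts and returns
# the new empty list, so a single call covers both cases.
index_b = {}
for word, location in pairs:
    index_b.setdefault(word, []).append(location)

print(index_a == index_b)  # True
print(index_b)             # {'from': [(1, 1), (2, 1)], 'import': [(1, 12)]}
```

Both loops build the same mapping; the setdefault version simply needs fewer key lookups per word.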
2. Mapping with Flexible Key Lookup
2.1 defaultdict: Another Take on Missing Keys
Example code:
    import collections
    import re
    import sys

    WORD_RE = re.compile(r'\w+')

    index = collections.defaultdict(list)
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                location = (line_no, column_no)
                # defaultdict in action: a missing word gets a new list automatically.
                index[word].append(location)

    for word in sorted(index, key=str.upper):
        print(word, index[word])

    # Output:
    # flasgger [(3, 6), (4, 6)]
    # flask [(2, 6)]
    # Flask [(2, 19)]
    # from [(2, 1), (3, 1), (4, 1)]
    # import [(1, 1), (2, 12), (3, 15), (4, 21)]
    # jsonify [(2, 26)]
    # random [(1, 8)]
    # request [(2, 35)]
    # Swagger [(3, 22)]
    # swag_from [(4, 28)]
    # utils [(4, 15)]

    """
    How defaultdict works:

    When instantiating a defaultdict, you provide a callable that is used to
    produce a default value whenever __getitem__ is passed a nonexistent key
    argument.

    For example, given an empty defaultdict created as dd = defaultdict(list),
    if 'new_key' is not in dd, the expression dd['new_key'] does the following:
      1. Calls list() to create a new list.
      2. Inserts the list into dd using 'new_key' as the key.
      3. Returns a reference to that list.

    The callable that produces the default values is held in an instance
    attribute called default_factory. If no default_factory is provided,
    the usual KeyError is raised for missing keys.

    The default_factory of a defaultdict is only invoked to provide default
    values for __getitem__ calls, and not for the other methods. For example,
    if dd is a defaultdict and k is a missing key, dd[k] will call the
    default_factory to create a default value, but dd.get(k) still returns None.

    The mechanism that makes defaultdict work by calling default_factory is
    actually the __missing__ special method, a feature supported by all
    standard mapping types.
    """
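The three steps and the difference between `dd[k]` and `dd.get(k)` described above can be observed directly. This is a small sketch; the key names are arbitrary.

```python
from collections import defaultdict

dd = defaultdict(list)

# __getitem__ on a missing key calls default_factory, inserts the result
# under that key, and returns it.
value = dd['new_key']
print(value)              # []
print('new_key' in dd)    # True

# get() does not invoke default_factory and does not insert the key.
print(dd.get('other_key'))   # None
print('other_key' in dd)     # False

# The factory is stored on the instance.
print(dd.default_factory is list)  # True

# Without a default_factory, a missing key raises KeyError as usual.
plain = defaultdict()
try:
    plain['missing']
except KeyError as exc:
    print('KeyError:', exc)
```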
2.2 The __missing__ Method
Example code:
""" StrKeyDict0 converts nonstring keys to str on lookup """ class StrKeyDict0(dict): def __missing__(self, key): if isinstance(key, str): # 如果没有这个判断,self[k] 在没有的情况下会无限递归调用 __missing__ raise KeyError(key) return self[str(key)] def get(self, key, default=None): """ The get method delegates to __getitem__ by using the self[key] notation; that gives the opportunity for our __missing__ to act. :param key: :param default: :return: """ try: return self[key] except KeyError: return default def __contains__(self, key): # 此时不能用 key in self (self 指 StrKeyDict0 的实例,就是一个字典)进行判断, # 因为 k in dict 也会调用 __contains__ ,所以会出现无限递归调用 __contains__ return key in self.keys() or str(key) in self.keys() # A better way to create a user-defined mapping type is to subclass collections.UserDict instead of dict. """ Underlying the way mappings deal with missing keys is the aptly named __missing__ method. This method is not defined in the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError. The __missing__ method is just called by __getitem__ (i.e., for the d[k] operator). The presence of a __missing__ method has no effect on the behavior of other methods that look up keys, such as get or __contains__ . """
Summary: a key that is missing from a dictionary can be handled in three ways: (1) setdefault, (2) collections.defaultdict, (3) the __missing__ method.
3. Variations of dict: UserDict
UserDict is designed to be subclassed.
Example code:
""" convert non-string keys to str -- on insertion, update and lookup """ import collections class StrKeyDict(collections.UserDict): def __missing__(self, key): if isinstance(key, str): raise KeyError(key) return self[str(key)] def __contains__(self, key): # self.data : UserDict 并不继承 dict,但它内部有一个 dict 的实例,叫 data, 这个 data 保存着 UserDict 实例的真正数据 return str(key) in self.data def __setitem__(self, key, value): # UserDict 实例中的数据存放在 data 属性中 # This method is easier to overwrite when we can delegate to the self.data attribute. self.data[str(key)] = value """ It's almost always easier to create a new mapping type by extending UserDict rather than dict. The main reason is that the built-in has some implementation shortcuts that end up forcing us to override methods that we can just inherit from UserDict with no problem. UserDict does not inherit from dict, but has an internal dict instance, call data, which holds the actual items. This avoids undesired recursion when coding special methods like __setitem__ , and simplify the coding of __contains__ . """
4. Immutable Mappings
Example code:
    >>> from types import MappingProxyType
    >>> d = {1: 'A'}
    >>> d_proxy = MappingProxyType(d)
    >>> d_proxy
    mappingproxy({1: 'A'})
    >>> d_proxy[1]  # Items in d can be seen through d_proxy
    'A'
    >>> d_proxy[2] = 'x'  # Changes cannot be made through d_proxy
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'mappingproxy' object does not support item assignment
    >>> d[2] = 'B'
    >>> d_proxy  # d_proxy is dynamic: any change in d is reflected
    mappingproxy({1: 'A', 2: 'B'})

    """
    The mapping types provided by the standard library are all mutable, but
    you may need to guarantee that a user cannot change a mapping by mistake.

    Since Python 3.3, the types module provides a wrapper class called
    MappingProxyType, which, given a mapping, returns a mappingproxy instance
    that is a read-only but dynamic view of the original mapping. So updates
    to the original mapping can be seen in the mappingproxy, but changes
    cannot be made through it.
    """
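One possible use of this wrapper is to let a class expose its internal dict without letting callers modify it. The sketch below assumes a hypothetical `Settings` class; the class, its `options` property and its `set_option` method are illustrative names only.

```python
from types import MappingProxyType


class Settings:
    """Hypothetical class that exposes its options as a read-only mapping."""

    def __init__(self, **options):
        self._options = dict(options)

    @property
    def options(self):
        # Callers receive a read-only view, not the underlying dict.
        return MappingProxyType(self._options)

    def set_option(self, name, value):
        # Changes go through the owner of the real dict.
        self._options[name] = value


settings = Settings(debug=False)
view = settings.options

try:
    view['debug'] = True           # writing through the proxy fails
except TypeError as exc:
    print('TypeError:', exc)

settings.set_option('debug', True)
print(view['debug'])               # True: the view reflects the change
```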