In this post we'll see how to compute the mean of the max temperatures of every month for the city of Milan.
The temperature data are taken from http://archivio-meteo.distile.it/tabelle-dati-archivio-meteo/. The site only shows them in tabular form, so we had to inspect the HTTP traffic to find out that the data come from this URL and are in JSON format.
Using Jackson, we transformed this JSON into a format that is simpler to use with Hadoop: CSV. Each line holds the date, the minimum temperature and the maximum temperature. The result of the conversion looks like this:
01012000,-4.0,5.0
02012000,-5.0,5.1
03012000,-5.0,7.7
04012000,-3.0,9.7
...
If you're curious to see how we transformed it, take a look at the source code.
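Before moving to Hadoop, it may help to see what the job is meant to compute on a handful of lines. The following is only a plain-Java sketch, not part of the original job: the class name MonthlyMeanSketch and its methods are hypothetical, and it assumes (judging from the sample above) that the date is in ddMMyyyy form, so dropping the first two characters leaves a month-and-year key such as "012000".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used here only for illustration.
public class MonthlyMeanSketch {

    // Assumes a "ddMMyyyy" date: drops the day and keeps "MMyyyy".
    static String monthKey(String date) {
        return date.substring(2);
    }

    // Computes the mean of the max temperatures for each month key.
    static Map<String, Double> meanMaxByMonth(List<String> lines) {
        // For each month we keep a two-element array: [sum, count].
        Map<String, double[]> sumCount = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue; // skip malformed lines
            }
            double max = Double.parseDouble(fields[2]);
            double[] acc = sumCount.computeIfAbsent(monthKey(fields[0]), k -> new double[2]);
            acc[0] += max;
            acc[1] += 1;
        }
        // Turn every [sum, count] pair into a mean.
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sumCount.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "01012000,-4.0,5.0",
            "02012000,-5.0,5.1",
            "03012000,-5.0,7.7",
            "04012000,-3.0,9.7");
        System.out.println(meanMaxByMonth(sample));
    }
}
```

Keeping a running sum and count per month, rather than the final mean, is what makes the computation easy to split across many machines: partial sums and counts can be added together in any order, while partial means cannot.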
Let's look at the mapper class for this job:
public static class MeanMapper extends Mapper<Object, Text, Text, SumCount> {

    // positions of the fields in each CSV line
    private static final int DATE = 0;
    private static final int MIN = 1;
    private static final int MAX = 2;

    // max temperatures collected so far, grouped by month
    private Map<Text, List<Double>> maxMap = new HashMap<>();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        // gets the fields of the CSV line
        String[] values = value.toString().split(",");

        // defensive check: skip lines that do not have exactly three fields
        if (values.length != 3) {
            return;
        }

        // gets date and max temperature; dropping the first two characters
        // of the date leaves the month and year, which we use as the key
        String date = values[DATE];
        Text month = new Text(date.substring(2));
        double max = Double.parseDouble(values[MAX]);

        // if not present, put this month into the map
        if (!maxMap.containsKey(month)) {
            maxMap.put(month, new ArrayList<Double>());
        }

        // adds the max temperature for this day to the list of temperatures
        maxMap.get(month).add(max);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {

        // loops over the months collected in the map() method
for