I decided to write an article outside amateur astronomy, which is the main scope of this blog. Why? Because I have the means and I find it fun :)
In this short paper I describe the correlation between the traffic that the website radiocluj.ro received and the weather conditions recorded in the geographically relevant area. My goal was to measure how much weather conditions affect visitor behaviour. I found that the general hunch (good weather, fewer visitors; bad weather, more visitors) is not only true, but that weather can change traffic by about 20%, depending on the definitions used. With further refinement the findings described below might be used predictively to maximize the impact of an article; however, this is outside the scope of this paper.
Intuition says that weather affects website traffic, and that the effect depends on whether it is a weekday or a weekend. The scale of the effect is unknown.
Available and used data*
1) server log
All available visitor data comes from the server logs from August 2011 to March 2014. The data is filtered to exclude the traffic generated by editors (the building has a fixed IP) and by known web crawlers. Only about 47% of the traffic can be geographically traced back to a city or county, since some ISPs provide this information (rds, upc) and others either do not or just report Bucharest (romtelecom, orange, vodafone).
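The filtering step described above could be sketched roughly as follows. This is only an illustration, not the script actually used: the `Hit` record, the `EDITOR_IP` value (taken from a documentation-only address range) and the `CRAWLER_MARKERS` list are all assumptions, since the real IP and crawler list are not given here.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the real editor IP and the
# real list of known crawlers are not part of this article.
EDITOR_IP = "203.0.113.10"
CRAWLER_MARKERS = ("googlebot", "bingbot", "yandex", "crawler", "spider")


@dataclass
class Hit:
    """One parsed log entry (hypothetical structure)."""
    ip: str
    user_agent: str
    path: str


def is_visitor_hit(hit: Hit) -> bool:
    """True if the hit comes neither from the editors' IP nor from a known crawler."""
    if hit.ip == EDITOR_IP:
        return False
    agent = hit.user_agent.lower()
    return not any(marker in agent for marker in CRAWLER_MARKERS)


def filter_hits(hits):
    """Keep only the hits attributed to ordinary visitors."""
    return [h for h in hits if is_visitor_hit(h)]
```

Matching crawlers by user-agent substring is a simplification; a crawler that hides its identity would pass this filter.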
2) geographical info
Since Cluj is by far the most important source of site traffic (diagram), it is safe to assume that the weather conditions in Cluj county and Cluj-Napoca city are the relevant ones.
3) weather info
The Radio subscribes to and uses data provided by the National Meteorology Administration (ANM). This data is detailed (temperature, cloud cover, precipitation, air pressure, wind strength) and has a roughly hourly resolution. Data from two intervals has been processed, October 2011-February 2013 and June 2013-March 2014, so these two make up the studied days. A day with missing data or a data gap is skipped. A total of 778 days have been processed. ANM data is copyrighted, so I will not provide any further details about weather.
4) ignored data
The website in question is mainly a news portal, so traffic also depends on public events, which in turn may or may not be connected to weather and may distort the data; I ignored this aspect. Holidays and similar events are also ignored, whether they happen on weekdays or weekends and no matter the weather.
1) indoor days, outdoor days
Let us define two types of days: outdoor days are the days thought to be favorable for outdoor activities (i.e. getting away from the computer), and indoor days are the rest. Favorable weather is a very subjective matter, so I used a rather rudimentary and arbitrary definition for it. Consider the hours between 6 AM and 9 PM local time in Cluj-Napoca city. A day is an outdoor day if the sky tends to be more or less clear and there is no precipitation for about half of the interval, and the temperature peaks above 15°C. This definition is as good as any.
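One possible way to turn this definition into code is sketched below. The concrete thresholds are my assumptions for the sake of the example: "more or less clear" is taken as at most 50% cloud cover, and "about half of the interval" as at least half of the available hourly observations. The `HourlyObs` record and its units are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class HourlyObs:
    """One hourly weather observation (hypothetical structure and units)."""
    hour: int            # local hour, 0-23
    cloud_cover: float   # fraction of the sky covered, 0.0-1.0
    precip_mm: float     # precipitation during the hour, in mm
    temp_c: float        # air temperature in degrees Celsius


# Assumed thresholds; the definition in the text is deliberately loose.
CLEAR_MAX_CLOUD = 0.5    # "more or less clear"
MIN_FAIR_SHARE = 0.5     # "about half of the interval"
MIN_PEAK_TEMP_C = 15.0   # the temperature must peak above this value


def is_outdoor_day(observations):
    """Return True/False for an outdoor/indoor day, or None if there is no data."""
    daytime = [o for o in observations if 6 <= o.hour <= 21]  # 6 AM to 9 PM
    if not daytime:
        return None  # data gap: the day is skipped
    fair = [o for o in daytime
            if o.cloud_cover <= CLEAR_MAX_CLOUD and o.precip_mm == 0.0]
    fair_share = len(fair) / len(daytime)
    peak_temp = max(o.temp_c for o in daytime)
    return fair_share >= MIN_FAIR_SHARE and peak_temp > MIN_PEAK_TEMP_C
```

Changing any of the three thresholds changes which days count as outdoor days, which is why the size of the measured effect depends on the definitions used.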
2) pageviews not visits
Since the number of unique visitors depends on the definition of a visit and on the visitor tracking methods deployed, I considered pageviews, which have no client-side implications (such as cookies, private browsing, ambiguity induced by NATs etc.).
3) combining the above
This way each day has three attributes: number of pageviews, whether it is an outdoor day, and whether it is a weekday or a weekend day.
Now let’s count the days, the indoor days, the outdoor days and calculate the average number of pageviews for each type of day. The ratio between these averages shows how much effect weather has on site traffic and ultimately on visitor behaviour.
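The calculation can be sketched as below, with weekdays and weekends kept apart. The `Day` record and the function names are hypothetical, and any numbers fed into it in an example would be made up, since the real traffic data cannot be published.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Day:
    """One studied day with its three attributes (hypothetical structure)."""
    pageviews: int
    outdoor: bool
    weekend: bool


def average_pageviews(days):
    """Average pageviews per day for each (weekend, outdoor) combination."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of pageviews, day count]
    for d in days:
        key = (d.weekend, d.outdoor)
        totals[key][0] += d.pageviews
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def outdoor_to_indoor_ratio(days, weekend):
    """Ratio of the outdoor-day average to the indoor-day average.

    Raises KeyError if one of the two groups has no days at all.
    """
    avg = average_pageviews(days)
    return avg[(weekend, True)] / avg[(weekend, False)]
```

A ratio of 0.8 for weekdays, for example, would mean that an average outdoor weekday gets 20% fewer pageviews than an average indoor weekday.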
Pageviews per day:
| nid | idpw | idpv/day | nod | odpw | odpv/day |
Weather has a notable impact on our website’s traffic: depending on the definitions used, the effect can reach about 20%. This is a significant number by any measure.
* Internal regulations of the Radio forbid disclosure of sensitive data (such as traffic), so I omit data as necessary. The ANM meteorology data to which the Radio subscribes is copyrighted, so I will not publish that either.