Data Mining: 03/01/2012

One of the many areas of property and casualty risk analysis is the creation of effective rating territories based on some measure of risk. In Canada and the U.S., geographical areas are grouped into territories based on claim risk. For example, postal codes in Canada and their U.S. equivalent, zip codes, are grouped into rating territories.

As analytics continues to adopt more scientific and mathematical techniques, insurance organizations would be remiss not to deploy these disciplines. Before adopting them, however, the usual rigour of developing any analytics solution still applies. This means that the information or analytical file must contain the right data before any of the more advanced techniques are used. In properly assessing risk, it is self-evident that any rating assessment should reflect the most recent information. Under this scenario, determining the proper post period (the window of claims history used in the analysis) is critical to developing this solution. For instance, do we have enough data to look at the last 12 months, or do we need to broaden the period to 24 or 36 months? For smaller organizations, this can be a significant factor, as analysts need enough claims to build an effective solution. In theory, more claims for analysis can translate into more robust models.
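One way to make the post-period decision concrete is to pick the shortest window that still yields enough claims. The sketch below is a minimal illustration under stated assumptions: the candidate windows (12, 24, 36 months) come from the text, but the `min_claims` threshold and the function names are hypothetical, and a window is approximated by whole calendar months back from an as-of date.

```python
from datetime import date

def months_before(claim_date: date, as_of: date) -> int:
    """Whole calendar months from the claim's month to the as-of month."""
    return (as_of.year - claim_date.year) * 12 + (as_of.month - claim_date.month)

def choose_post_period(claim_dates, as_of, min_claims, candidates=(12, 24, 36)):
    """Return the shortest candidate window (in months) holding at least
    min_claims claims, or None if even the longest window falls short.

    min_claims is a hypothetical credibility threshold chosen by the analyst.
    """
    for months in sorted(candidates):
        # A claim is inside the window if it occurred in one of the last `months` calendar months.
        n_claims = sum(1 for d in claim_dates if 0 <= months_before(d, as_of) < months)
        if n_claims >= min_claims:
            return months
    return None

# Illustrative (made-up) claim dates, evaluated as of March 2012.
as_of = date(2012, 3, 1)
claims = [date(2011, 6, 1)] * 3 + [date(2010, 6, 1)] * 4 + [date(2009, 6, 1)] * 5
print(choose_post_period(claims, as_of, min_claims=5))  # 24
```

Preferring the shortest adequate window reflects the text's emphasis on recency; a smaller organization with fewer claims would naturally be pushed toward the 24- or 36-month windows.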

Another factor in this determination may be how claims develop over time within a given business. AB (accident benefits) claims can often take more than 12 months to reach their final payout. If loss cost rather than claim frequency is the key determinant, one would want a timeframe longer than 12 months when creating rating territories, particularly if AB is a significant driver of overall loss cost. The trade-off is that we sacrifice recency of information in exchange for information that better reflects the specific business conditions of the company.

Besides determining the proper post period for analyzing claims, one needs to consider information that is extraneous to the solution. Often this so-called ‘extraneous’ information refers to claims classified as catastrophic. Including auto claims caused by a hailstorm, for example, introduces information that is not indicative of any behavioural tendencies, since those claims are due solely to ‘Acts of God’. Yet it is behavioural tendencies that we want to better understand and, more importantly, to act on in future business initiatives. Grouping geographic areas around behavioural tendencies yields consistency when projecting this learning forward. ‘Acts of God’ represent noise or variation in the data that should not be considered in any analysis. Accordingly, these kinds of observations should be excluded from any insurance risk analysis, whether it be the development of pricing models or the determination of new rating territories.
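The exclusion step amounts to a simple filter on the claims data. This is a minimal sketch assuming each claim carries a catastrophe event code (the hypothetical `cat_event` field below) that is empty for ordinary claims; real claim systems flag catastrophes in their own ways.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Claim:
    area: str                        # geographic area, e.g. a postal code
    incurred: float                  # incurred loss amount
    cat_event: Optional[str] = None  # hypothetical catastrophe event code; None for non-CAT claims

def exclude_catastrophes(claims: List[Claim]) -> List[Claim]:
    """Keep only claims not tagged to a catastrophe event (e.g. a hailstorm)."""
    return [c for c in claims if c.cat_event is None]

claims = [
    Claim("M5V", 2500.0),
    Claim("M5V", 8000.0, cat_event="HAIL-2011-07"),  # hypothetical hailstorm event code
    Claim("K1A", 1200.0),
]
kept = exclude_catastrophes(claims)
print([c.area for c in kept])  # ['M5V', 'K1A']
```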

Once this analytical file is created, the ‘analysis’ portion of this exercise can begin. The file needs to contain only two fields: one holding loss cost or loss frequency, and the other holding the specific geographic area.
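Producing that two-field file means rolling claims up to the geographic area and dividing by exposure. The sketch below assumes loss cost is defined as incurred losses per unit of earned exposure and frequency as claim count per unit of earned exposure; the input shapes, function name, and postal codes are illustrative assumptions, not part of the original method.

```python
from collections import defaultdict

def build_analytical_file(claims, exposure_by_area, measure="loss_cost"):
    """Aggregate claim records into the two-field file: (area, loss measure).

    claims: iterable of (area, incurred_amount) pairs, CAT claims already removed.
    exposure_by_area: {area: earned exposure}, e.g. earned car-years.
    measure: "loss_cost" (losses per exposure) or "frequency" (claims per exposure).
    """
    if measure not in ("loss_cost", "frequency"):
        raise ValueError(f"unknown measure: {measure}")
    losses = defaultdict(float)
    counts = defaultdict(int)
    for area, amount in claims:
        losses[area] += amount
        counts[area] += 1
    rows = []
    for area, exposure in sorted(exposure_by_area.items()):
        if exposure <= 0:
            continue  # no earned exposure, so no rate can be computed for this area
        numerator = losses[area] if measure == "loss_cost" else counts[area]
        rows.append((area, numerator / exposure))
    return rows

claims = [("M5V", 1000.0), ("M5V", 3000.0), ("K1A", 500.0)]
exposure = {"M5V": 100, "K1A": 50, "H2X": 20}
print(build_analytical_file(claims, exposure))  # [('H2X', 0.0), ('K1A', 10.0), ('M5V', 40.0)]
```

Areas with exposure but no claims (H2X above) still appear with a zero rate, which matters when every area must eventually be assigned to a territory.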

One creative approach is to use a technique traditionally used to develop models. CHAID, which stands for Chi-Square Automatic Interaction Detection, is a non-parametric approach to building models. Essentially, the solution is a decision tree whose nodes or branches represent segments defined by business rules. Whereas traditional parametric modeling produces an equation with variables and a weight assigned to each variable, in CHAID (a non-parametric approach) the segments and their business rules are themselves the model solution. Nodes or segments can be ranked sequentially from the highest loss rate node to the lowest, and policies can be selected in this manner. In most scenarios, though, there will be more than 4 nodes, which allows more granularity in the selection of policies.
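Ranking the terminal nodes is a sort by loss rate, and tracking the cumulative share of policies shows how far down the ranking one must go to reach a given selection. This is a minimal sketch assuming each node is summarized as a hypothetical `(node_id, loss_rate, policy_count)` tuple taken from the tree output.

```python
def rank_nodes(nodes):
    """Rank terminal nodes from highest to lowest loss rate.

    nodes: list of (node_id, loss_rate, policy_count) tuples.
    Returns (node_id, loss_rate, cumulative_share_of_policies) tuples, so policies
    can be selected by taking nodes from the top until a target share is reached.
    """
    ranked = sorted(nodes, key=lambda node: node[1], reverse=True)
    total_policies = sum(count for _, _, count in nodes)
    result, running = [], 0
    for node_id, loss_rate, count in ranked:
        running += count
        result.append((node_id, loss_rate, running / total_policies))
    return result

# Illustrative node summaries (made-up figures).
nodes = [("n1", 0.05, 400), ("n2", 0.12, 100), ("n3", 0.08, 500)]
for row in rank_nodes(nodes):
    print(row)
```

With more nodes, the cumulative share rises in smaller steps, which is the extra granularity the text refers to.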

However, one truly significant advantage of CHAID is its data exploration capability. Through CHAID, character-type (categorical) variables can be grouped into meaningful segments based on a statistical criterion: a chi-square or F significance test of whether their outcomes differ. One obvious application is grouping character-type data such as geographic areas into meaningful segments or territories. Here the node output represents the groups or segments determined by the CHAID routine, and loss rate is the key metric driving this statistical output.
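The category-grouping step can be illustrated with the core of CHAID's merge procedure: repeatedly merge the pair of groups whose outcomes are least significantly different, and stop once every remaining pair differs significantly. This is a simplified sketch, not a full CHAID implementation. It assumes a binary claim/no-claim outcome per policy (so a 2x2 chi-square test applies) rather than loss cost, which CHAID would handle with an F-test; it omits the re-splitting check and Bonferroni adjustment; and the area codes, counts, and `alpha` are illustrative assumptions.

```python
import math
from itertools import combinations

def chi_square_p_value(claims1, policies1, claims2, policies2):
    """p-value of a 2x2 chi-square test (1 df, no continuity correction)."""
    a, b = claims1, policies1 - claims1
    c, d = claims2, policies2 - claims2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a zero margin: no evidence that the two groups differ
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def merge_areas(stats, alpha=0.05):
    """stats: {area: (claim_count, policy_count)}. Returns territories as sorted lists of areas."""
    groups = {(area,): counts for area, counts in stats.items()}
    while len(groups) > 1:
        # Find the pair of groups whose claim rates are least significantly different.
        best_pair, best_p = None, -1.0
        for g1, g2 in combinations(groups, 2):
            p = chi_square_p_value(*groups[g1], *groups[g2])
            if p > best_p:
                best_pair, best_p = (g1, g2), p
        if best_p <= alpha:
            break  # every remaining pair differs significantly, so stop merging
        g1, g2 = best_pair
        c1, n1 = groups.pop(g1)
        c2, n2 = groups.pop(g2)
        groups[g1 + g2] = (c1 + c2, n1 + n2)
    return sorted(sorted(g) for g in groups)

# Illustrative (made-up) claim and policy counts by area.
stats = {"A": (10, 100), "B": (11, 100), "C": (30, 100), "D": (32, 100)}
print(merge_areas(stats))  # [['A', 'B'], ['C', 'D']]
```

Areas A and B have similar claim rates and are merged first, then C and D; the two resulting groups differ strongly, so merging stops with two candidate territories.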

Certainly, proposing CHAID as a tool for territory assignment does not imply that what was done in the past was incorrect. Due diligence has always been exercised by combining analysis of historical loss rate data with the domain knowledge of the actuary. Rather than discarding what has been used in the past, we simply gain another, more mathematically based option for assigning geographic areas to territories.


Saturday, March 31, 2012

Using Analytics to create Rating Territories