April 25, 2008

Predictive versus Descriptive Modeling: some points to consider

This is a fantastic article which I think very clearly describes the difference between descriptive and predictive analytics; I often find these terms blurred and blended very casually when discussing our work.

As the article suggests, understanding the difference along with the appropriate applications is fundamental to any good analytics shop. I personally believe the author is a little too critical of historically based projections and forecasts (basic descriptive analytics), but the article does raise some important limitations, including resource scarcity (the infamous pipeline), economic influences, and even potential competitors.

Woods also suggests productive applications of descriptive performance metrics, such as “identifying broken systems” (perhaps a gift officer portfolio analysis). Many of us invest a great amount of effort in building complex and nuanced predictive models. I find it useful (and sometimes efficient) to run some descriptive models (average growth rate formulas, logarithmic projections) at the same time to get a wider analytics perspective. You may surprise yourself with what you find, or discover that something is missing…
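To make the two descriptive models mentioned above concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the giving totals are made up, the function names are hypothetical, and the "logarithmic projection" is taken to mean a least-squares fit of y = a + b·ln(t), which is one common reading of the term rather than anything prescribed by the article.

```python
import math

def average_growth_rate(values):
    """Compound average growth rate per period, from first to last value."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1

def project_compound(values, periods_ahead):
    """Extend the last value forward at the average growth rate."""
    rate = average_growth_rate(values)
    return values[-1] * (1 + rate) ** periods_ahead

def fit_log_trend(values):
    """Least-squares fit of y = a + b * ln(t) for t = 1, 2, ..., n."""
    xs = [math.log(t) for t in range(1, len(values) + 1)]
    n = len(values)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def project_log(values, periods_ahead):
    """Project forward along the fitted logarithmic trend."""
    a, b = fit_log_trend(values)
    return a + b * math.log(len(values) + periods_ahead)

# Hypothetical annual giving totals (illustrative numbers only)
annual_totals = [100_000, 110_000, 121_000, 133_100]

print("Average growth rate:", average_growth_rate(annual_totals))
print("Compound projection, next year:", project_compound(annual_totals, 1))
print("Logarithmic projection, next year:", project_log(annual_totals, 1))
```

Running both side by side is the point: the compound projection keeps accelerating, while the logarithmic one flattens out, and the gap between them is a quick reminder of how much a historical forecast depends on the curve you assume.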

Many organizations use historical analytics data as a basis for forecasting future growth, and establishing performance goals and budgets. This application for analytics data can blur the distinction between predictive and descriptive data. Understanding this difference is critical to an effective analytics program. It generally falls to the analytics professional to ensure that the difference is clearly understood within the organization.

I'm going to start out with a couple of definitions. What do I mean when I say predictive versus descriptive modeling?


January 18, 2008

Segmentation and Shakespeare

Interesting news release out of Stratford, England: the Royal Shakespeare Company has partnered with an American analytics firm to segment its database and to identify and engage audiences with different ticketing behavior.

DonorCast has been moving into the ticketing side of predictive modeling, and this technique looks promising given adequate data (isn’t that always the case, though…).

The Two-Step Cluster feature in SPSS is very powerful; our practice has only just scratched the surface of its possible applications. This technique can be used as a final “sorting” of records after modeling, or to sort records on key variables before modeling (it can handle both categorical and continuous variables).

I will find some more relevant articles to share about clustering and segmentation techniques in the next edition. In the meantime, play around with this SPSS feature and consider how it might be applied in your work.

Advanced Analytics Move Centre Stage at the Royal Shakespeare Company

SAN FRANCISCO & LONDON--(BUSINESS WIRE)--Analytics software from KXEN is helping boost audiences at Royal Shakespeare Company (RSC) productions in a pioneering arts marketing move. The initiative, an Accenture-led program to segment audiences, has seen a 50% rise in ticket buyers at RSC's Stratford-upon-Avon theatre, a more than 70% increase in regular attendees and significantly earlier sell-outs for London bookings.


November 12, 2007

(Re)-emerging strategies for the “narrative” or “unstructured data” problem.

This article discusses a re-emerging field in predictive analytics called Text Analytics. I say re-emerging because, as the author points out, narrative analysis was a cornerstone of the earliest business intelligence strategies. Today this concept may have utility, especially when combined with segmentation or donor-targeting strategies. From prospect management report sheets and phone-a-thon caller logs to the infamous “other” box on a simple survey question, Text Analytics can provide opportunities for more nuanced insight into the “narrative” data we do have, as well as applications to the quantitative models we construct.

One of the fundamental problems of using mathematics to analyze human behavior is the unstructured, or as I like to call it, “narrative” data problem. The amount of purely numerical or quantifiable information available to those in the predictive analytics field is limited, and what that information can tell you varies as well. I consider non-profit or fundraising analytics to be more opaque than for-profit sectors in this respect. Individuals, on a basic level, need to purchase goods and services, so intent and preference are more transparent. In for-profits, purchasing a product can imply a variety of affinity relationships: this product is a necessity, I prefer this product to other similar products, etc.

Philanthropic giving, monetary or in-kind, is less clear about which quantifiable variables indicate a specific affinity. Attitudes towards institutions or missions may often be more personal than the type of soap you buy, so a donation may imply high affinity. The source of affinity, however, can differ greatly: I am an alumnus, my child was a patient, the institution is important to the community, I like the sports teams, etc. The absence of immediately available options (there are no supermarkets for choosing between charitable organizations) also makes comparisons difficult. Giving data, capacity rating, and alumni classification are all quantifiable values, but more “narrative” fields, like the answer to the basic question “why is giving to us important to you?”, are more complex.
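As a very rough illustration of turning a “narrative” field into something a model can use, the sketch below tags free-text answers with the affinity sources mentioned above by simple keyword matching. This is far cruder than what commercial text analytics tools do; the keyword lists, category names, and function names are all hypothetical and would need to be tuned to your own data.

```python
import re
from collections import Counter

# Hypothetical keyword lists, one per affinity source (assumptions, not a standard)
AFFINITY_KEYWORDS = {
    "alumni": {"alumnus", "alumna", "alum", "graduated", "degree"},
    "patient_family": {"patient", "treated", "hospital", "care"},
    "community": {"community", "local", "town", "neighbors"},
    "athletics": {"team", "teams", "sports", "football", "basketball"},
}

def tag_affinity(response):
    """Return the sorted affinity tags whose keywords appear in the response."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return sorted(
        tag for tag, keywords in AFFINITY_KEYWORDS.items() if words & keywords
    )

def summarize(responses):
    """Count how often each tag occurs; untagged responses are counted too."""
    counts = Counter()
    for response in responses:
        counts.update(tag_affinity(response) or ["untagged"])
    return counts

# Made-up survey answers to "why is giving to us important to you?"
sample_responses = [
    "I graduated in 1985 and I love the football team",
    "My child was a patient here",
    "It just feels right",
]
print(summarize(sample_responses))
```

The resulting tags can be added to a record as simple yes/no variables alongside giving history and capacity ratings. The size of the "untagged" count is itself useful, since it shows how much of the narrative data the keyword lists are missing.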

While the technology for Text Analytics may be more complex and costly than many organizations care to absorb, I believe this represents a very exciting frontier, one that can make predictive modeling more accurate, dynamic, and relevant.

Text analytics is a new IT discipline that has already proved itself in applications ranging from pharmaceutical drug discovery to counter-terrorism to survey analysis, in science, government, and industry. It is poised to break out into the broader analytics market, in workbench form, integrated with business intelligence solutions, embedded in line-of-business applications, and enabling semantic search.

Text analytics is an answer to the “unstructured data” problem, which is best expressed by the truism that eighty percent of enterprise information originates and is locked in “unstructured” form. That problem has been recognized for decades. In fact, the first definition of business intelligence (BI) itself, in an October 1958 IBM Journal article by H.P. Luhn, A Business Intelligence System, describes a system that will:

“…utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the ‘action points’ in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points.”

So we see that the earliest BI focus was on text – on extraction, categorization, and classification rather than on numerical data!



April 26, 2007

Basic Data Mining Definitions

I am often asked to define some of the commonly used analytics terms. I have heard "data mining" used to refer to everything from screening, to sorting names in Excel, to querying from data marts. Below are a handful of very common terms and how I define them.

Analytics
A broad set of mathematical tools used to reveal trends and patterns and harvest additional value from existing data. Analytics departments generally offer the following services:
  • Descriptive Analytics: Analyzing constituencies to understand core segments according to behaviors and demographics. Also, analyzing programs to understand performance and the key factors and metrics impacting this performance.
  • Predictive Analytics: Using internal and/or external data to predict behaviors and segment constituents according to probabilities.
  • Decision logic / Decision Support: Metrics-based forecasting and simulation studies to determine database potential, capacity or philanthropic potential of constituent segments, and investment priorities.

Data Mining
Finding useful information by identifying patterns and trends within data--typically in large databases. Often this statistical pattern recognition is married with predictive analytics to produce predictive models.

Predictive Modeling
An outcome of predictive analytics, predictive models are formulas producing probability scores predicting future behaviors. Typically, these are built using statistical tools such as regression analysis, decision trees, and neural networks.
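To show what “a formula producing a probability score” can look like, here is a minimal sketch of scoring one record with a logistic regression equation. The variable names and coefficients are invented for illustration; in practice they would be estimated from your own data by a statistical package.

```python
import math

# Hypothetical intercept and coefficients, as if estimated by logistic regression
INTERCEPT = -3.0
COEFFICIENTS = {
    "gifts_last_5_years": 0.6,  # count of gifts in the last five years
    "attended_event": 0.8,      # 1 if the constituent attended an event, else 0
    "is_alumnus": 0.5,          # 1 if the constituent is an alumnus, else 0
}

def probability_score(record):
    """Apply the logistic formula to one record and return a 0-1 probability."""
    z = INTERCEPT + sum(
        coef * record.get(name, 0) for name, coef in COEFFICIENTS.items()
    )
    return 1 / (1 + math.exp(-z))

# A made-up constituent record
donor = {"gifts_last_5_years": 3, "attended_event": 1, "is_alumnus": 1}
print(round(probability_score(donor), 3))
```

Every record in the database can be run through the same formula, which is why a predictive model can score everyone, including people who have never given.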


March 15, 2007

Lifetime Value and Fundraising

Most prospecting strategies concentrate on the capacity rating. This is generally an amount a person can give in an ideal scenario if your organization is their top philanthropic priority. This amount is generally compared to gift officer yield rates and/or target ask amounts to project portfolio performance.

In the for-profit arena, lifetime value is the preferred metric for rating customers. From an annual giving perspective, it makes sense to consider lifetime value in segmentation. However, if an annual giving director's only goal is the participation rate, are they likely to risk overall participation for the sake of acquiring donors with high lifetime value? Doing so is likely preferable for the big picture.

Customer lifetime value is a way of measuring how much your customers are worth to you, over the length of time that they remain your customers. The lifetime for customers will vary from industry to industry, and from brand to brand. The lifetime of customers should come to an end when their contribution ceases to be profitable unless steps are taken to revitalize them.
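As a rough sketch of the idea, the function below adds up the expected gifts from one donor over a fixed number of years, shrinking each year's gift by the chance the donor is still giving and, optionally, by a discount rate. This is one common simplified formulation, not necessarily the methodology in the linked article, and the parameter names and figures are my own assumptions.

```python
def lifetime_value(annual_gift, retention_rate, years, discount_rate=0.0):
    """Sum of expected yearly gifts, weighted by retention and discounting.

    Year 0 is the current year, so the first gift is counted in full.
    """
    total = 0.0
    for year in range(years):
        still_giving = retention_rate ** year  # chance the donor is retained
        total += annual_gift * still_giving / (1 + discount_rate) ** year
    return total

# Hypothetical donor: $100 per year, 50% retention, three-year horizon
print(lifetime_value(100, 0.5, 3))
```

Comparing this figure across segments shows why a small, loyal group can be worth more over time than a large group of one-time givers, even though the large group does more for this year's participation rate.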

Here is an article with methodology for determining lifetime value


January 4, 2007

Analytics vs. Screening

I am often asked to compare and contrast analytics and screening. Analytics generally refers to statistical data analysis tools such as data mining, segment profiling, modeling, etc. Screening generally refers to data purchasing to enhance a file with wealth or biographical data.

Screening and analytics answer different questions. If you were to ask, "who has the means to give a major gift?" you would need to acquire wealth data either through screening or prospect research. If you were to ask, "who is likely to give a major gift to us?" you would need to use analytics or surveys. However, surveys are limited by response rates.

Analytics may be used to determine which records to screen. Screening acquires external assets and biographical information to address the ability to give. However, not all asset data is public, only some records will receive ratings, and the data is often best used for major gift identification.

Analytics incorporates internal data and external data to address the likelihood of giving. It is a science of probability. All records receive a score, and all departments can benefit from analytics. Some of my clients had me build models predicting giving to specific colleges or units like the library or the law school. Organizations with in-house data mining/analytics programs frequently build models for specific constituency groups.
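Here is a small sketch of how the two can work together: use model scores to decide which records to send for screening, then rank the records that come back with a rating. Multiplying likelihood by capacity is just one simple way to combine them, and the record IDs, scores, capacities, and function names are all made up for illustration.

```python
def select_for_screening(scores, budget):
    """Return the IDs of the `budget` records with the highest model scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [record_id for record_id, _ in ranked[:budget]]

def prioritize(scores, capacities):
    """Rank rated records by likelihood x capacity, highest first.

    Records that came back from screening without a rating are skipped.
    """
    rated = [rid for rid in scores if rid in capacities]
    return sorted(rated, key=lambda rid: scores[rid] * capacities[rid], reverse=True)

# Hypothetical likelihood scores from an in-house model
model_scores = {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.4}

# Screen only the top two records
to_screen = select_for_screening(model_scores, budget=2)

# Hypothetical capacity ratings returned by the screening vendor
capacity_ratings = {"A": 10_000, "C": 50_000}

print(to_screen)
print(prioritize(model_scores, capacity_ratings))
```

Note that the record with the highest likelihood is not necessarily first once capacity is included, which is exactly why the two questions need to be asked separately.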

Screening:
  • Pros: Actual wealth data, helps inform capacity to give, can bring efficiency to research
  • Cons: Limited to information in public databases, not everyone gets rated, people can hide from these sources, cost of outsourcing

Analytics:
  • Pros: Every record can be scored, can bring efficiency to research, builds understanding of the database, effectively done in-house
  • Cons: Provides probability data rather than actual data, requires investment in skill development or outsourcing, can be difficult to explain to non-technical staff


December 7, 2006

CRISP-DM: What is it?

People new to data mining often ask me about CRISP-DM. They may have heard about it at conferences or from peers. However, we statisticians often assume everyone knows all of the acronyms. In the interest of demystifying data mining for the novice, I will try to provide some basic definitions and resources for common terminology. CRISP-DM is a great place to start. Basically, it is just the standard method for a data mining / predictive modeling project. The letters stand for "CRoss-Industry Standard Process for Data Mining."

"CRISP-DM has not been built in a theoretical, academic manner working from technical principles, nor did elite committees of gurus create it behind closed doors. Both these approaches to developing methodologies have been tried in the past, but have seldom led to practical, successful and widely-adopted standards. CRISP-DM succeeds because it is soundly based on the practical, real-world experience of how people do data mining projects. And in that respect, we are overwhelmingly indebted to the many practitioners who contributed their efforts and their ideas throughout the project."
