Will My Data Science Project Succeed?

Assessing the probability of commercial success - Part 2.


In the previous blog, we discussed a methodology for assessing the probability of technical success - P(T). The aim was to use objective criteria and a mathematical formulation to derive the probability that a data science or AI prototype will succeed. The resulting number guides the decision to either activate or abandon the prototype phase of the project.

In this blog post we continue the topic of Feasibility Assessment. We focus on deriving the probability of commercial success - P(C). We defined P(C) as the likelihood that

  1. Our prototype can be scaled and delivered to end users, and that

  2. End users will utilize the features as they were designed.

The end users could be customers who purchase the products in which the prototype technology is embedded, or company users who utilize the commercial prototype within internal operations. In the latter case it may be more accurate to say “operationalized prototype,” but we will continue with “commercialized” as our terminology of choice.

Assessing the probability of commercial success P(C) is similar in principle to assessing P(T), but differs in one important detail. While the P(T) function comprises mostly technical parameters evaluating the data and the expertise on hand, P(C) depends largely on a psychological evaluation of the user base and corporate culture. This presents a challenge. Our goal is to be objective, but we will be discussing measures that are inherently difficult to quantify. Are we wading dangerously into subjective territory? More than I would like, I must admit, but the process still retains rigor.

I will demonstrate with an example. Let’s pick a measure from the list of parameters that we inject into our probability function. We will cover the full list later in this post, but for now let’s take “willingness to accept technological change”. It seems a nebulous metric, but with a little effort we can connect it to measurable historical events. Here is a possible process:

  1. Create a list of instances in the near past when your organization rolled out a replacement piece of software, went through a major version upgrade, or implemented an entirely new piece of technology. A few instances are better than one. Too many instances may clutter the analysis, so stick to a salient few.

  2. For each instance, interview prior project champions and solicit their perspectives on user pushback. Ask each champion for the value they would assign to the metric on a 10-point scale, then take the average.

  3. Collect information on the projected adoption timeline vs the actual adoption timeline.

  4. Calculate the timeline overrun as a percentage of the planned timeline.

  5. Estimate the share of that overrun attributable to user pushback.

  6. Review comments about the project on available company or user forums. Run a simple statistical analysis of how representative the negative reactions have been.

  7. Use this additional info to nudge the project champion assessment numbers up or down based on what you found.
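The steps above can be sketched in code. This is a minimal illustration, not a prescribed formula: all the numbers, the clamping to the 1–10 scale, and the size of the penalty and nudge terms are hypothetical choices you would tune to your own organization.

```python
# Sketch of steps 1-7: derive a "willingness to accept change" score
# from historical rollouts. All numbers below are hypothetical.

def willingness_score(champion_scores, planned_days, actual_days,
                      pushback_share, forum_negative_ratio):
    """Blend champion interviews with adoption-timeline evidence.

    champion_scores: 1-10 ratings gathered in step 2
    planned_days / actual_days: adoption timelines from step 3
    pushback_share: fraction of the overrun blamed on pushback (step 5)
    forum_negative_ratio: share of negative forum comments (step 6)
    """
    base = sum(champion_scores) / len(champion_scores)           # step 2
    overrun = max(actual_days - planned_days, 0) / planned_days  # step 4
    # Steps 5-7: nudge the interview average down in proportion to the
    # pushback-driven overrun and the negativity observed on forums.
    penalty = 10 * overrun * pushback_share
    nudge = 1 if forum_negative_ratio > 0.5 else 0
    return max(1.0, min(10.0, base - penalty - nudge))

score = willingness_score(
    champion_scores=[7, 6, 8],   # three past rollouts (step 1)
    planned_days=90, actual_days=120,
    pushback_share=0.5,          # half the overrun blamed on pushback
    forum_negative_ratio=0.3,
)
```

The point of the sketch is the shape of the calculation: interviews set the baseline, and the documented evidence only adjusts it up or down, which keeps any single subjective input from dominating.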

The approach of determining an initial metric number from interviews, then making small adjustments as new information arrives, has surprising efficacy. A variant of the approach is described and documented in Tetlock and Gardner’s book Superforecasting: The Art and Science of Prediction (2015). I would recommend it for inspiration.

Commercial Success Evaluation

Now let’s evaluate what it takes to commercialize our data science or AI prototype. We will pursue two different tracks: product and operational improvement. The product track represents commercializing the technology with a product delivered to end users in the market. Amazon product recommendations or Google Maps “next destination suggestions” are examples of commercialized data science or AI projects. Conversely, the operational improvement track focuses on delivering benefits to internal users within the company. Demand forecasting for products and machine vision for production quality assurance are two examples.

The parameters that we use for our formula echo those from the P(T) evaluation. However, each category must be reviewed through a production lens. When evaluating the data for the prototype, we ask whether the data is clean and accurate. If the data is not clean when generated at the source, can we properly clean and prepare it for consumption? We may be able to accomplish this in the prototype phase, but in production a new set of complexities arises. For example, we may receive data in very high volume and at extreme velocity; both complicate cleaning and validating the data in near-real time, which production deployments often require. Sometimes these added complexities turn a tractable task in the prototype phase into an impossible one in production. Thus similar questions posed for P(T) and P(C) may produce diametrically opposed answers. Keep this shift in perspective in mind when conducting the P(C) analysis.
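To make the production-lens point concrete, here is a minimal sketch of the kind of per-record validation that must run at production speed. The field names, the value range, and the sample records are all hypothetical; real pipelines would add schema, deduplication, and latency concerns on top of this.

```python
# Minimal sketch of near-real-time record validation: each record
# must pass cheap checks before entering the model pipeline.
# Field names and the accepted range are hypothetical.

def validate(record):
    """Return a cleaned record if it passes basic checks, else None."""
    try:
        if record["sensor_id"] is None:
            return None
        value = float(record["value"])   # coerce to numeric
    except (KeyError, TypeError, ValueError):
        return None
    if not (0.0 <= value <= 1000.0):     # plausible physical range
        return None
    record["value"] = value
    return record

stream = [
    {"sensor_id": "a1", "value": "42.5"},
    {"sensor_id": None, "value": "10"},    # dropped: missing id
    {"sensor_id": "a2", "value": "oops"},  # dropped: not numeric
]
clean = [r for r in (validate(rec) for rec in stream) if r is not None]
```

In a prototype, the two bad records could be fixed by hand; at production volume and velocity, every such decision must be automated and cheap, which is exactly the shift in difficulty described above.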

The following list is geared towards internal company rollouts. Commercializing a market-facing product requires a different focus, which I may cover in the future. I invite you to treat this list as a mutable starting point that you can adjust to fit a particular vertical and scenario. Once you establish your own list, evaluate each item on a 1-10 point scale, then plug those values into the ORA Score formula. The resulting ORA Score is a probability of commercial success expressed as a percentage. If it is above your comfort threshold, you do the project; if it hovers just around the threshold, you may give the project further consideration.

Corporate Support and Business:

  1. Defined project champion at executive level

  2. Buy-in from corporate stakeholders

  3. Stakeholder alignment on the post-prototype scope

  4. Clearly articulated business value

  5. Adequate budget allocation to projects (past and current)


Data:

  1. We have all required data features

  2. The data for each feature is complete

  3. The data for each feature is clean and accurate

  4. We can validate and adjust data as necessary at production speeds

  5. We can access data without bureaucratic roadblocks

Infrastructure and Team:

  1. We have tools to handle high throughput and high volume

  2. We have a process for data science model testing, re-training and deployment

  3. We have experience with and tools for Continuous Integration

  4. Adequate assignment of development resources (current and past)

  5. Familiarity with the data science libraries

  6. Charter to expand the team as required

Deployment and Rollout:

  1. End-user involvement in prototype development

  2. Existing process for training users

  3. Existing process for technology rollout to users

  4. Adherence to timelines during development in the past

  5. Adherence to timelines during rollout in the past

  6. Users’ willingness to accept technological change

  7. Process for Beta testing and validation prior to launch

  8. Process and budget for post-launch maintenance engineering
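The checklist above can be captured as a simple data structure and scored. The exact ORA Score formula is given in the previous post and is not reproduced here, so this sketch substitutes a plain weighted average scaled to a percentage as a stand-in; the item scores, uniform weights, and the 80% gate are hypothetical placeholders.

```python
# The checklist above, scored on a 1-10 scale per item. The weighted
# average below is a stand-in for the real ORA Score formula from the
# previous post. All scores and weights here are hypothetical.

checklist = {
    "Corporate Support and Business": [8, 7, 6, 9, 7],
    "Data":                           [6, 7, 5, 4, 8],
    "Infrastructure and Team":        [7, 6, 8, 7, 9, 6],
    "Deployment and Rollout":         [5, 7, 6, 8, 7, 4, 6, 7],
}
# Safest default per the excerpt below: every weight equal to 1.
weights = {cat: [1.0] * len(items) for cat, items in checklist.items()}

def commercial_score(checklist, weights):
    """Weighted average of all item scores, as a percentage."""
    num = sum(w * s
              for cat in checklist
              for s, w in zip(checklist[cat], weights[cat]))
    den = sum(w for cat in weights for w in weights[cat]) * 10
    return 100 * num / den

score = commercial_score(checklist, weights)
decision = "activate" if score >= 80 else "reconsider"
```

Raising a weight on a critical item (say, data cleanliness) pulls the score down sharply when that item scores low, which mirrors the suppression behavior the weights are meant to produce.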

The following is an (almost) verbatim excerpt from the previous blog. There is no point in rewriting the formulation in different words, and it is more convenient to restate it here than to reference another article.

"We assign a score to each of the 20 parameters, then plug the numbers into the formula we developed. We call the resulting number the “ORA Score” -- Ordinal Risk Assessment Score.

ORA Score, where:

  - s = (s_1, ..., s_N) is an array of individual parameter scores

  - N is the number of parameters; in our case N = 20

  - C is a compression coefficient, C = aN, where a = 3 (empirically chosen)

  - w = (w_1, ..., w_N) is an array of parameter weights that determine the criticality of each parameter. A weight w = 6 on a single parameter when s = 1 will have the effect of suppressing P(T) below 0.5.

  - Keeping w = 1 for all parameters is the safest choice, unless the reasons for particular weights are well understood.

If you said “Huh?” to the above, no worries—give us a call and we will do the Feasibility assessment for you. That is our function. If you followed along then a word of caution on the weights. They matter very much and must make sense in the context of your business. They are not equal to 1 for all features all the time. How we derive our weight vectors is a bit of a trade secret, but with much experimentation you may arrive at a reasonable set for your context."

The primary purpose of defining P(C) is to decide whether to continue to the next phase. What is the right threshold for moving forward? I don’t know. We generally gate our projects at an 80% likelihood of commercial success, but context matters, as does the risk tolerance of the client. The proper threshold should be determined by a business discussion with stakeholders.
