
Probability math notes part 2 – Bayes Theorem – Example with Software Quality Control

Bayes’ Theorem can be derived from the conditional probability principles stated in Part 1 below.

⁃ Pr(A|B) = Pr(A and B)/Pr(B), if Pr(B) ≠ 0
⁃ Pr(B|A) = Pr(A and B)/Pr(A), if Pr(A) ≠ 0
⁃ Pr(A and B) = Pr(A|B)*Pr(B) = Pr(B|A)*Pr(A)
Dividing the last equality by Pr(B) gives Bayes’ Theorem: Pr(A|B) = Pr(B|A)*Pr(A)/Pr(B)

Note: Pr(A and B) can be visualised with a Venn diagram. Where two or more sets overlap, the overlapping region is the “and” space. Example: of 10 people that like fruit, five people like apples and four people like oranges. Two people like both apples and oranges. In this case, three people like apples but not oranges, two people like oranges but not apples, and three people like neither apples nor oranges.
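
The Venn diagram example can be checked with sets. This is only a minimal sketch in Python, assuming the ten people are numbered 1 to 10 (the numbering and the variable names are made up for illustration):

```python
from fractions import Fraction

# Hypothetical numbering of the ten people in the example.
people = set(range(1, 11))
apples = {1, 2, 3, 4, 5}   # five people like apples
oranges = {4, 5, 6, 7}     # four people like oranges

both = apples & oranges              # the "and" space (intersection)
apples_only = apples - oranges
oranges_only = oranges - apples
neither = people - (apples | oranges)

print("%d %d %d %d" % (len(both), len(apples_only), len(oranges_only), len(neither)))  # 2 3 2 3

# Pr(A and B) over all ten people, and Pr(A|B) restricted to the orange lovers.
pr_both = Fraction(len(both), len(people))
pr_apples_given_oranges = Fraction(len(both), len(oranges))
print("%s %s" % (pr_both, pr_apples_given_oranges))  # 1/5 1/2
```

Counting the sizes of the sets instead of hard-coding them makes it easy to see where each number in the diagram comes from.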

Example: Quality and Control with Software

Suppose you have 7 programmers at your company developing Java classes. The following table describes their track record with quality:

Programmer | Proportion of development | Probability of defective class (code)
1          | .10                       | .03
2          | .05                       | .03
3          | .20                       | .02
4          | .15                       | .02
5          | .25                       | .01
6          | .15                       | .02
7          | .10                       | .03

Suppose that the Quality Assurance (QA) department reviewed a class at random and found it to be defective. What is the probability that Programmer 1 developed this defective code?

If “the class was developed by programmer i” is an event, then these are seven mutually exclusive events and their union is the entire sample space: each class was developed by one and only one of the programmers. (We’re assuming in this case that one programmer develops one class, which usually isn’t the case since programmers work in teams. However, this example may be extrapolated to groups, i.e. group 1 develops 10% of the code with a quality rate of 97%.)

Let Bi (i = 1,2, … 7) be the event that the class was developed by programmer i, and let A be the event that the class is defective. Then, for example:

Pr(B1) = .10 and Pr(A|B1) = .03

In this case we have to calculate the reversed conditional probability. That is, we need to calculate the probability of the event B1 (Programmer 1 developed the code) given that the event A (the code is defective) has occurred. Applying Bayes’ Theorem, with Pr(A) expanded over the seven programmers, the formula would be:

Pr(B1|A) = Pr(A|B1)*Pr(B1) / [Pr(A|B1)*Pr(B1) + Pr(A|B2)*Pr(B2) + … + Pr(A|B7)*Pr(B7)]
Pr(B1|A) = (.10)(.03) / [(.10)(.03) + (.05)(.03) + (.20)(.02) + (.15)(.02) + (.25)(.01) + (.15)(.02) + (.10)(.03)]
Pr(B1|A) = .003 / .020 = .15

So there is a 15% probability that the defective code is from Programmer 1.
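
The same calculation can be run for all seven programmers at once. Here is a minimal sketch in Python using the numbers from the table above; the names programmers, pr_defective and posterior are my own and not part of any library:

```python
# (proportion of development Pr(Bi), probability of a defective class Pr(A|Bi))
programmers = {
    1: (0.10, 0.03),
    2: (0.05, 0.03),
    3: (0.20, 0.02),
    4: (0.15, 0.02),
    5: (0.25, 0.01),
    6: (0.15, 0.02),
    7: (0.10, 0.03),
}

# Denominator of Bayes' Theorem: Pr(A) = sum of Pr(A|Bi) * Pr(Bi) over all i.
pr_defective = sum(share * defect for share, defect in programmers.values())

# Pr(Bi|A) = Pr(A|Bi) * Pr(Bi) / Pr(A) for every programmer.
posterior = {}
for i, (share, defect) in programmers.items():
    posterior[i] = share * defect / pr_defective

for i in sorted(posterior):
    print("Programmer %d: %.3f" % (i, posterior[i]))
```

The seven results add up to 1, which is a handy sanity check: a defective class had to come from somebody.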

Probability math notes part 1 – Combinations and Permutations

I found these notes for finite math principles to be helpful. They have to do with combinations and permutations…there are plenty of resources on the web but you may find these useful. I personally learned this stuff a long time ago but it’s always helpful to brush up on the basics. Enjoy!

Combinations and Permutations

  • If the order doesn’t matter it’s a combination.
  • If the order does matter it’s a permutation.

Two types of Permutations (order does matter):

  • Repetition is allowed: 
  • Formula: n^r
  • Example: a lock code of three numbers, each chosen from ten possible numbers (0, 1, … 9). This would equate to 10^3 = 1,000.
  • Repetition is not allowed 
  • Formula: n! / (n – r)!
  • Example: if you were to figure out how to order 16 pool balls, the result would be 16! = 20,922,789,888,000. If however you only wanted to calculate the number of permutations required to choose 3 out of the 16 pool balls then you would have 16! / (16-3)! = 16! / 13! = 3,360
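
Both permutation formulas are easy to check. A small sketch in Python (the function names are made up for this post):

```python
from math import factorial

def permutations_with_repetition(n, r):
    """Ordered selections of r items from n options, repeats allowed: n^r."""
    return n ** r

def permutations_without_repetition(n, r):
    """Ordered selections of r items from n options, no repeats: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)

print(permutations_with_repetition(10, 3))      # 1000
print(permutations_without_repetition(16, 16))  # 20922789888000
print(permutations_without_repetition(16, 3))   # 3360
```

Integer division (//) keeps the results as exact integers instead of floats.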

Two types of Combinations (order doesn’t matter):

* Note: there will always be at least as many permutations as combinations (strictly more when choosing two or more items from two or more options).

  • Repetition is allowed: 
  • Formula: (n + r - 1)! / [r!(n - 1)!]
  • Example: choose three scoops of ice-cream out of five possible flavours. In this case, the result would be (5+3-1)! / [3!(5-1)!] = 7! / (3! * 4!) = 35.
  • Repetition is not allowed:
  • Formula: n! / [r!(n - r)!]. Also known as the binomial coefficient and read as “n choose r”, for example 16 choose 3 (choose 3 pool balls out of the 16 available).
  • Example: choose 3 pool balls out of the available 16, but in this case the order does not matter. The result would be 16! / [3! * (16-3)!] = 16! / (3! * 13!) = 560.
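
The two combination formulas can be checked the same way. Again just a sketch in Python with made-up function names:

```python
from math import factorial

def combinations_with_repetition(n, r):
    """Unordered selections of r items from n options, repeats allowed."""
    return factorial(n + r - 1) // (factorial(r) * factorial(n - 1))

def combinations_without_repetition(n, r):
    """Unordered selections of r items from n options, no repeats (n choose r)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(combinations_with_repetition(5, 3))      # 35 ways to pick three scoops
print(combinations_without_repetition(16, 3))  # 560 ways to pick three pool balls
```

Compare the 560 combinations with the 3,360 permutations of 3 pool balls out of 16: the difference is exactly a factor of 3! = 6, the number of ways to order three balls.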

Probability of an Event

The probability of an event is:

Pr(E) = [Number of outcomes in E] / N

Where E is the event and N is the number of equally likely outcomes in the sample space.

Example: there are 8 white balls and two green balls in an urn. If you were to choose 3 balls out of the urn, what are the chances that they are all white balls?

Solution: there are 8 white balls. Since order doesn’t matter and repetition is not allowed, the number of ways to choose 3 white balls is a combination without repetition: C(8,3) = 56. To get the probability, divide this by the size of the sample space, which is C(10,3) = 120. So the result would be C(8,3) / C(10,3) = 56/120 = 7/15.
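
If you don’t trust the formula, you can also list every possible draw and count. A brute-force sketch in Python, assuming every set of three balls is equally likely (the variable names are my own):

```python
from fractions import Fraction
from itertools import combinations

# Eight white balls and two green balls.
urn = ["W"] * 8 + ["G"] * 2

# Every way to pick 3 of the 10 balls, order ignored.
draws = list(combinations(range(len(urn)), 3))

# Keep only the draws where all three balls are white.
all_white = [draw for draw in draws if all(urn[i] == "W" for i in draw)]

pr_all_white = Fraction(len(all_white), len(draws))
print("%d of %d draws, probability %s" % (len(all_white), len(draws), pr_all_white))
# 56 of 120 draws, probability 7/15
```

Enumerating works fine for an urn this small, but the counting formulas are the way to go once the numbers get large.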

Calculating ‘at least’ questions (these are always on quizzes and tests and are brain teasers)

Two possible ways may be used to calculate ‘at least’ questions. For example, what is the probability of selecting at least one green ball when three balls are selected from an urn that has 2 green balls and 8 white balls?

One: use the multiplication and addition principles

  • Number of ways to select exactly one green ball: C(2,1) X C(8,2) = 56.
  • Number of ways to select exactly two green balls: C(2,2) X C(8,1) = 8.
  • The two counts are added, since the events are mutually exclusive: C(2,1) X C(8,2) + C(2,2) X C(8,1) = 64.
  • Divide by the size of the sample space using the probability of an event formula: [C(2,1) X C(8,2) + C(2,2) X C(8,1)] / C(10,3) = 64/120 = 8/15

Two: use the complement rule

  • Complement rule: Pr(E) = 1 – Pr(E’)
  • Let E be the event ‘all three balls are white’, with Pr(E) = C(8,3) / C(10,3) = 7/15.
  • Let F be the event ‘at least one ball is green’.
  • The complement of F, F’, is equal to E. Therefore E = F’.
  • So by the complement rule, Pr(F) = 1 – Pr(F’) = 1 – Pr(E) = 1 – 7/15 = 8/15.
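
Both approaches should give the same answer, which is easy to confirm. A minimal sketch in Python; C is my own helper for “n choose r”, and Fraction keeps the arithmetic exact:

```python
from fractions import Fraction
from math import factorial

def C(n, r):
    """Number of combinations without repetition (n choose r)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

sample_space = C(10, 3)

# One: add the counts for exactly one and exactly two green balls.
direct = Fraction(C(2, 1) * C(8, 2) + C(2, 2) * C(8, 1), sample_space)

# Two: complement rule, 1 minus the probability that all three are white.
complement = 1 - Fraction(C(8, 3), sample_space)

print("%s %s" % (direct, complement))  # 8/15 8/15
```

The complement rule usually means less work, because “none at all” is a single case while “at least one” can be many.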

Conditional Probability

Pr(E|F) is the probability of E given that F has occurred; the sample space is restricted to F. Written as Pr(E|F) = Pr(E and F) / Pr(F).

If E and F are independent, then Pr(E and F) = Pr(E) * Pr(F). Example: rolling dice, one roll has no effect on the other.

Independence may also be expressed as Pr(E|F) = Pr(E) and Pr(F|E) = Pr(F). This is obtained from the above by: Pr(E|F) = Pr(E and F)/Pr(F) = Pr(E)*Pr(F)/Pr(F), so Pr(F) cancels out leaving Pr(E|F) = Pr(E).

A set of events are independent of each other if Pr(E1 and E2 and E3 and … En) = Pr(E1) * Pr(E2) * Pr(E3) * … * Pr(En). For example, a stereo has a 2% probability of having a defective CD player, 3% of having a defective amplifier and 7% of having a defective speaker, and it has two speakers. Let these be events E, F and S (once per speaker), respectively. Then the system is not defective when E’, F’ and S’ (for both speakers) all occur. So the result would be Pr(E’) * Pr(F’) * Pr(S’)^2 = .98*.97*.93^2 = .822.
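
The stereo example takes only a few lines to check. A quick sketch in Python, assuming the squared term means the system has two speakers that fail independently (the variable names are my own):

```python
# Probability that each component is defective.
p_cd = 0.02
p_amplifier = 0.03
p_speaker = 0.07
speakers = 2  # assumption: the squared term stands for two independent speakers

# Independent events: multiply the probabilities that each part is fine.
p_system_ok = (1 - p_cd) * (1 - p_amplifier) * (1 - p_speaker) ** speakers

print(round(p_system_ok, 3))  # 0.822
```

So there is roughly an 18% chance that at least one component is defective, which is 1 minus the result above.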

 

Example Python Script – Calculating ABV percentage for beer or wine

Hi everyone,

I have been playing around with Python 2.7 on my Mac running OS X Lion. If you need to confirm the version, just type this in your Terminal:

$ python --version

I came up with this handy script to check the ABV (alcohol percentage) for Beer or Wine. It’s a handy script if you’re into home brewing. It also helped get me going on how simple it is to use Python and it’s also handy for prototyping different algorithms before getting to some serious coding in another language!

By the way, you may use this script without restrictions, but if I ever meet you buy me a beer, preferably a DIPA :=)).

/*
* ----------------------------------------------------------------------------
* "THE BEER-WARE LICENSE" (Revision 42):
* wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff. If we meet some day, and you think
* this stuff is worth it, you can buy me a beer in return Poul-Henning Kamp
* ----------------------------------------------------------------------------
*/

# Program used to calculate the alcohol percentage by volume (ABV) in beer.
# @ Greg Werner
# Written for Python 2.7. On Python 3, replace raw_input with input.

# Variable definitions for command prompt inputs. Includes conversion from strings to floats.

o = float(raw_input('Please enter your Original Gravity reading: '))
tempOG = float(raw_input('Please enter the temperature, in F, when obtaining your Original Gravity: '))
f = float(raw_input('Please enter your Final Gravity reading: '))
tempFG = float(raw_input('Please enter the temperature, in F, when obtaining your Final Gravity: '))

# Correction added to a gravity reading according to the temperature in F.

def correction(temp):
    if temp < 70:
        return 0.0
    elif temp < 77:
        return 0.001
    elif temp < 84:
        return 0.002
    elif temp < 95:
        return 0.003
    elif temp < 105:
        return 0.004
    else:
        return 0.007

# Function definition: rounds both gravity readings to three decimal places,
# adjusts each one for its own temperature and returns the ABV percentage.

def abv(o, f, tempOG, tempFG):
    og = round(o, 3) + correction(tempOG)
    fg = round(f, 3) + correction(tempFG)
    result = ((1.05 * (og - fg)) / fg) / 0.79 * 100
    return round(result, 3)

ABV = abv(o, f, tempOG, tempFG)

# Print results for ABV calculation.

if ABV <= 5.00:
    print('Your beer has an alcohol content of ' + str(ABV) + '%. You have a low alcohol beer.')
elif ABV <= 8.00:
    print('Your beer has an alcohol content of ' + str(ABV) + '%. Drink in moderation, you have a medium alcohol beer.')
else:
    print('Your beer has an alcohol content of ' + str(ABV) + '%. Be careful, you have yourself a bomb!')

November: living without the Internet and watching TV…

Ok! Internet is mission critical and TV sucks.

I moved during November and lost my Internet for about a month. I did have Internet at work, but needless to say it didn’t feel right to work on my personal stuff at work…but wow was I impressed with how much we rely on the Internet! Some of the things that really caught me off guard:

  • Had to have my wife pay my bills remotely; she turned into my virtual private assistant (straight from The 4-Hour Workweek – thanks Ferriss). Except I could understand her and was fully audited on all my transactions…
  • Could only check my emails on my clunky 3G phone. Blah.
  • Getting home at night and watching junk on TV is WAY worse than surfing the web, even if you’re not surfing for anything in particular. At least you can read, watch videos, virtually socialize. Man, TV is such a dud. SUCH A DUD.
  • Not being able to work at night stinks. When else can you catch up on all those ignored emails? During the day in the middle of meetings? Don’t think so.
  • Missed out on Cyber Monday, dang.
  • Couldn’t write in this blog, aggh!
  • Couldn’t conference with family and co-workers in other time zones

And the list goes on.

Maybe Santa Claus can bring me a redundant fiber link and a permanent Internet connection via an LTE chip embedded in my nose.

 

How much cash do the large tech companies have?

A few weeks ago I published an article in Spanish about how much revenue large tech companies have in comparison to some of the countries around the world. To make a long story short, just two to three of the big tech companies put together have more revenue than the Gross Domestic Product (GDP) of many countries around the world. And with money comes power.

But how liquid are these companies, i.e. how much do they have in cash and other short term investments (current assets)? As the old saying goes, cash is king, and boy do these tech juggernauts hold the crown. After a few minutes on Google Finance, Wikipedia and some other sites, I found some information that, when compared to several governments, helps put things into perspective.

Cash and short term assets on some of the big tech companies’ balance sheets

  • Microsoft has about 75 billion dollars in current assets. It’s not uncommon for them to add between 1-4 billion dollars in cash per quarter.
  • Apple has about 47 billion dollars in current assets. They are also generating cash on a quarterly basis at an insane level.
  • IBM also has about 47 billion in current assets. Do they generate cash every quarter? Just about, say a billion here, a billion there.
  • Google is no slouch either and is in good company, they hold about 47 billion in current assets as well. They hold about 39 billion dollars in cash and short term investments.
  • HP surprised me. Even with all their troubles of late they “only” have about 56 billion dollars in current assets.
  • Cisco is also insane. They have about 57 billion in current assets.
  • Let’s add EMC2 into the mix as well. They have about 10.2 billion dollars in current assets. Their operating cash flow is also solid.
  • Dell…. about 30 billion in current assets. Not bad huh?
Let’s add the current assets from these eight companies: that’s about 370 billion dollars in current assets. Remember we’re not talking about total assets here, only cash and stuff they can sell relatively quickly to get cash in a hurry.

Cash balances for some governments

Governments don’t follow the same accounting principles that we mortals do. The closest thing to current assets is international reserves, also known as official reserves or foreign exchange reserves. These are holdings in international currencies (such as the US dollar and Yen), gold, special drawing rights (SDRs) and International Monetary Fund reserve positions.
Ok, so we all know China has a lot of cash and cash equivalents handy, to the tune of about 3.2 trillion US dollars. They are far and away the country with the most international reserves.
So, where would the eight companies above rank if they were a country? Number six in the world. That’s right. A few of the countries that would be ranked below them:
Brazil (350 billion), United States (142 billion – isn’t that nuts?), India (311 billion), Germany (230 billion), Mexico (131 billion), the list goes on and on.
Considering how much cash these companies have generated in the last ten years, and considering how most developed countries have not improved their international reserve positions, how will they compare in 2022?

 

Using regular expressions with your web site and Ruby on Rails application

Regular expressions, also known as regex or regexp, are a useful and powerful way to match strings of text. I first ran into regular expressions when reading chapter 6 of Michael Hartl’s Rails Tutorial. After testing some examples and doing some research, I realized that using regex is also helpful for filtering Google Analytics reports and improving your organic search results in Google. (I don’t know if regex works with Bing but would assume so). So it is definitely worth it to learn a little regex, especially if your site is similar to Twitter or Groupon, i.e. your web application is very tied into your www website.

A brief overview

Regular expressions are used to match strings of text. This is done by using characters and metacharacters.

  • Characters are literal and case sensitive. The letter A matches the letter A, the letter b matches the letter b, the number 1 matches the number 1, etc.
  • Metacharacters are not treated literally. Here are the most common ones:
    • Dot . is a wildcard for any character. .ind will match with Windows, windows, or lindows.
    • Backslashes \ are used to escape metacharacters so that they are matched literally, as in 192.168.1.1. Since the dot (.) is a metacharacter, matching 192.168.1.1 needs to be done with 192\.168\.1\.1, adding a backslash before each metacharacter.
    • Brackets [], similar to lists in most programming languages, are used to group a set of characters that you need to match. If you need to match a character regardless of case, you could match [Dd]og for matching Dog and dog.
    • Repetition ? + * {} allows you to search for sequences of characters.
      • The ? matches zero or one of the preceding character. For example, 123? would match 12 and 123 but not 1233 (when the whole string has to match).
      • The + matches one or more of the preceding character. For example, 123+ would match 123, 1233 or 1233333.
      • The * combines ? and +; in other words, it matches zero or more of the preceding character. For example, 123* matches 12, 123, 123333.
      • The {} matches a set number of repetitions. [0-9]{2} will match any two digits, [a-z]{6,8} will match between six and eight lower case letters.
    • Grouping () |. Parentheses are used to group characters together. If you wanted to match results with your short name and your full name, for example (in my case it would be Greg or Gregory), you could use Greg(ory)?. Pipes (|) are used for the OR operator. For example you could use (Greg|greg)ory for Gregory and gregory.
    • Anchors ^ $ are used for matching the beginning or the end of a string, which is handy for URL matching. The caret anchors the start: ^services/ would match services/myservices and services/myservices/theseservices. The $ works the other way around and anchors the end, that is, /services$ would match myservices/services but not services/theseservices.
By the way, kudos to http://www.seomoz.org/ for helping clarify some of the above definitions.
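
The metacharacters above can be tried out in just about any language. Here is a small sketch using Python’s re module, assuming Python 3 (re.fullmatch needs 3.4 or later); the helper names whole and inside are made up for this post. Note the difference between matching the whole string and finding the pattern somewhere inside it:

```python
import re

def whole(pattern, text):
    """True when the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

def inside(pattern, text):
    """True when the pattern is found anywhere in the string."""
    return re.search(pattern, text) is not None

# (check to run, pattern, text, expected result)
cases = [
    (inside, r".ind", "Windows", True),
    (whole, r"192\.168\.1\.1", "192.168.1.1", True),
    (whole, r"192\.168\.1\.1", "192a168b1c1", False),
    (whole, r"192.168.1.1", "192a168b1c1", True),   # unescaped dots match anything
    (whole, r"[Dd]og", "dog", True),
    (whole, r"123?", "123", True),
    (whole, r"123?", "1233", False),
    (whole, r"123+", "1233333", True),
    (whole, r"123*", "12", True),
    (whole, r"[a-z]{6,8}", "regexp", True),
    (whole, r"[a-z]{6,8}", "regex", False),
    (whole, r"Greg(ory)?", "Greg", True),
    (whole, r"Greg(ory)?", "Gregory", True),
    (whole, r"(Greg|greg)ory", "gregory", True),
    (inside, r"^services/", "services/myservices", True),
    (inside, r"^services/", "myservices/services", False),
    (inside, r"/services$", "myservices/services", True),
    (inside, r"/services$", "services/theseservices", False),
]

results = [check(pattern, text) == expected for check, pattern, text, expected in cases]
for (check, pattern, text, expected), ok in zip(cases, results):
    print("%-18s %-26s %-6s %s" % (pattern, text, expected, "ok" if ok else "MISMATCH"))
```

Regex flavors differ in the details, so treat this as a check of the basic constructs only and test your patterns in the tool where you will actually use them.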

Using Regular Expressions to Improve Organic Search Results

Gone are the days of focusing on plastering your logo on a big billboard over your local interstate, particularly if you are a new company with limited marketing resources. And if you sell your products and services online, having your site positioned well with organic search results is extremely important to improve web site traffic and sales. Well, regular expressions can also help you with this task.

Filter Internal IP Addresses from Web Site Traffic

If you are serious about SEO you are probably using Google Analytics to monitor the quantity and quality of your web site traffic. Therefore your first step is to make sure your data within Google Analytics reflects external traffic only, not traffic generated by your internal network. You can use a regular expression with Google Analytics to filter out your internal network IP addresses. Below is a screen shot on how to use User Defined filters with regex:

Here the regular expression ^201\.159\.133\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))$ looks for any IP address from 201.159.133.1 through 201.159.133.254.
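
Before trusting a filter like this, it helps to test which addresses it really matches. A quick sketch in Python that runs the expression above against every possible last octet (the variable names are my own, and Python’s regex flavor is assumed to behave like the one in Google Analytics for this simple pattern):

```python
import re

internal = re.compile(
    r"^201\.159\.133\.([1-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-4]))$"
)

# Try every value from 0 to 255 as the last octet and keep the ones that match.
matching = [n for n in range(256) if internal.match("201.159.133.%d" % n)]

print("%d to %d, %d addresses" % (matching[0], matching[-1], len(matching)))
# 1 to 254, 254 addresses

# An address on a different network is not filtered.
print(internal.match("201.158.133.10") is None)  # True
```

The network address (.0) and the broadcast address (.255) are left out, which is usually what you want for a filter on hosts.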

If you don’t want to code from scratch, Google Analytics has a handy feature to create the regex for you at http://www.google.com/support/analytics/bin/answer.py?answer=55572. Their new custom filter feature allows you to create all kinds of filters without necessarily having to know regular expressions, but understanding them will help you tweak how the filters work.

Rewrite URLs

As you have probably heard, having your keywords within your URL can improve organic placement in search results. Most content managers these days, particularly WordPress, allow you to edit how your URL displays within your visitors’ web browsers without knowing regex. Lots of web sites still use Apache though, so knowing a little regex will help when editing your .htaccess files to rewrite URLs (make sure that RewriteEngine On is set).

Using regular expressions within your RoR application

One of the cool topics in the Rails Tutorial by Michael Hartl is the explanation on how to simplify field validations. The following example shows how the code looks in your *.rb file, for example in <your application>/app/models/user.rb:

class User < ActiveRecord::Base
  attr_accessible :name, :email

  email_regex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

  validates :name,  :presence => true,
                    :length   => { :maximum => 50 }
  validates :email, :presence => true,
                    :format   => { :with => email_regex }
end

The above example looks for a specific pattern of characters in order to determine if it complies with an email format. At first, regular expressions look like gibberish but if you learn how to use regex it will make your life a lot easier.
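
To see what the email pattern accepts and rejects, here is a rough port to Python’s re module. This is a sketch and not the Rails validation itself: Python (before 3.14) has no \z, so \Z stands in for the end-of-string anchor, re.ASCII is my attempt to keep \w close to Ruby’s ASCII-only default, and the sample addresses are made up:

```python
import re

# Python port of the Ruby pattern; \Z replaces Ruby's \z (end of string).
email_regex = re.compile(
    r"\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\Z",
    re.IGNORECASE | re.ASCII,
)

def valid_email(address):
    """True when the address fits the simple pattern used in the model."""
    return email_regex.match(address) is not None

samples = [
    "user@example.com",      # plain address
    "THE_USER@foo.bar.org",  # upper case, underscore and subdomain
    "first.last@foo.jp",     # dot in the local part
    "user@example,com",      # comma instead of a dot
    "user_at_foo.org",       # no @ sign
    "foo@bar_baz.com",       # underscore in the domain
]

for address in samples:
    print("%-22s %s" % (address, valid_email(address)))
```

The first three addresses pass and the last three fail. The pattern is deliberately simple, so it is a format check and not proof that the mailbox exists.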

For Ruby, there is a site named Rubular (www.rubular.com), which allows you to use the site as a reference guide for Ruby based regex. It also allows you to test your regex before you add it to your code.

So next time you are adding validations or other types of string matching tasks to your Ruby on Rails site, consider using regex in order to simplify your life. Just practice a little and in no time you’ll be able to use regex within your RoR application for field validations and the like, use it to filter Google Analytics reports and improve your organic search results by modifying your web server URLs.

Yahoo! needs to change its Roots!

There’s an old saying out there: really good management is only necessary if you need to cover up for faulty product. Not like everything takes care of itself by just having a great product, but it certainly makes things a lot easier, doesn’t it?

A Little History

After several underperforming quarters, the board brought co-founder Yang back in to try to get the company back to its roots. For the most part it worked: Microsoft bid 33 bucks a share for Yahoo!, but Yang and the board got greedy and rejected the offer. Then Silicon Valley veteran Bartz was brought in to right the ship… that didn’t work so well either and investors acted. How would you like to be fired over the phone?

Bartz’s recent ouster from Yahoo! is no surprise, considering the company has been underperforming the S&P 500 for some time now. I mean, really, you could put any CEO into that position and anyone could flounder, unless their product and strategy were to change drastically. Issue is, they lost track of their core business.

Yahoo!’s Roots Are No Longer Applicable

I go to Yahoo!’s site once in a while, but it’s stale. News items are from Thomson Reuters or from the Associated Press. They also have a Travel site. Have you ever bought a plane ticket from Yahoo!? I bet you haven’t, you probably use Expedia, Priceline, Orbitz or go directly to your favorite airline’s website. By the way, Yahoo!’s site is very handy for testing your Internet connection: just type “ping www.yahoo.com” and the site responds, one of the few to do so.

What’s the first thing we think about when we think of Yahoo!? We think Internet portal and content, a.k.a. Internet mag, but I bet you don’t think of it as a premier search engine, heck that’s an afterthought as it’s not even their technology anymore. “I’m going to Bing for my favorite Yoga studio through Yahoo!”…no way. I also bet you don’t think of it as a B2B auction site, such as Alibaba, even though Yahoo! has a significant stake in the company.

When Yang started Yahoo! back in the 90s he set it up to categorize information that was on the Internet so that he could find it better. So at its core, Yahoo! was and still attempts to be a portal that categorizes information so that you can find it. Trouble is the site is so full of content now that it’s more like a news magazine than a site to manage content via categories. Just type http://www.yahoo.com into your favorite web browser and you’ll see what I mean. Then Google came along and we all know how that went. Searching for content by first placing it into categories is cumbersome and no longer applicable!

Yahoo!’s Future

If I were CEO of Yahoo! I would…hide under my desk. Anyone that has to take over as CEO has a huge challenge in front of them. Basically, there would be two choices: sell the company in pieces or shift their strategy. The former would be more appealing to the likes of Proxy-Battle-Guru Icahn, the latter would be more appealing to long term strategic investors. In any case, here are some ideas:

  • Start selling applications in the Cloud: they could either penetrate this space via acquisition or could build their own technology by leveraging data centers. Google does it, Microsoft does it, heck all companies are doing it one way or another. They could sell applications specialized in SMB’s, such as something similar to Google Docs.
  • Sell infrastructure to provide content on an automated basis: similar to the way Amazon sells its fulfillment services to third parties, Yahoo! could sell content publishing infrastructure to those who need it. I bet they have some significant IP to offer, not a lot of companies offer so much content at an aggregate level.
  • Categorize Internet content by rank: SEO companies help position websites organically within Google results, but these are specific strategies to improve the relevancy of a website based on certain key words. However, it would be nice if Yahoo! could leverage all their categories and place “Top 10 Autoparts Websites” using their ranking algorithms.

What else could they do? The next few years shall be very interesting for Yahoo! Time to die a slow death or reinvent themselves!
