Hello,
I am using a neural network model to build a logistic regression. How can I see the formula of the model, that is, the parameters that came out and their coefficients? I have looked at the model content but couldn't find them.
Thanks
The coefficients of the regression formula are available in the model content. Here are some queries which should help you find them. Assume you have a logistic regression model named [BikeBuyer] which attempts to predict BikeBuyer based on other attributes:
1. Detect the mappings of all the inputs:
SELECT NODE_UNIQUE_NAME, NODE_RULE FROM [BikeBuyer].CONTENT WHERE NODE_TYPE=21
(21 is the node type code for logistic regression and neural network input nodes.)
The result will look more or less like this:
NODE_UNIQUE_NAME: 6000000000, NODE_RULE: <NormDiscrete field="Gender" value="M">
NODE_UNIQUE_NAME: 6000000001, NODE_RULE: <NormDiscrete field="Gender" value="F">
Basically, there is one node unique name for each continuous input attribute and one for each state of each discrete input attribute.
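If you want to work with these mappings outside of DMX, a minimal sketch in Python might look like the following (this is plain string parsing for illustration, not an official API, and it assumes the NODE_RULE text looks like the samples above):

import re

# Sample output from the query above; node IDs and rule fragments are the illustrative values shown there.
input_nodes = {
    "6000000000": '<NormDiscrete field="Gender" value="M">',
    "6000000001": '<NormDiscrete field="Gender" value="F">',
}

def parse_rule(rule):
    # Pull the (field, value) pair out of the NormDiscrete fragment.
    m = re.search(r'field="([^"]*)"\s+value="([^"]*)"', rule)
    return m.group(1), m.group(2)

input_map = {node: parse_rule(rule) for node, rule in input_nodes.items()}
# input_map == {'6000000000': ('Gender', 'M'), '6000000001': ('Gender', 'F')}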
2. Identify the output node associated with your target attribute (BikeBuyer)
SELECT NODE_UNIQUE_NAME, NODE_RULE FROM [BikeBuyer].CONTENT WHERE NODE_TYPE=23 AND ATTRIBUTE_NAME='BikeBuyer'
(23 is the code for output nodes. One output node is associated with each state of the output. The query above returns all the output nodes for the BikeBuyer attribute; NODE_RULE will allow you to identify the respective state.) Typical result:
NODE_UNIQUE_NAME=800000000000, NODE_RULE=<NormDiscrete field="BikeBuyer" value="1">
NODE_UNIQUE_NAME=800000000001, NODE_RULE=<NormDiscrete field="BikeBuyer" value="0">
3. Choose which regression formula you need to extract. Assume you want the formula for BikeBuyer=1; the interesting node is then 800000000000.
4. Extract the regression coefficients computed for each input for this node:
SELECT FLATTENED (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM NODE_DISTRIBUTION WHERE VALUETYPE=7) FROM [BikeBuyer].CONTENT WHERE NODE_UNIQUE_NAME='800000000000'
The query:
- returns the attribute name and attribute value fields of the nested distribution table for those distribution rows with VALUETYPE=7 (coefficients)
- applies only to your node of interest (BikeBuyer=1)
Typical result:
ATTRIBUTE_NAME: 6000000000, ATTRIBUTE_VALUE:-0.16951552773
ATTRIBUTE_NAME: 6000000001, ATTRIBUTE_VALUE:0.11941652788
The last row will have an empty attribute name; that is the free coefficient (intercept) score.
ATTRIBUTE_VALUE is the score associated with the input described by ATTRIBUTE_NAME. For example, -0.16951552773 is the coefficient associated with Gender=M (see the first query).
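Using the sample numbers above, attaching each coefficient to the input it describes could look like this in Python (the intercept value here is invented purely for illustration):

# Rows returned by the step 4 query: (ATTRIBUTE_NAME, ATTRIBUTE_VALUE).
coefficient_rows = [
    ("6000000000", -0.16951552773),   # Gender=M (see step 1)
    ("6000000001",  0.11941652788),   # Gender=F
    ("",            0.05),            # empty name = free coefficient; value made up here
]

# Input-node mapping built in step 1.
input_map = {"6000000000": ("Gender", "M"), "6000000001": ("Gender", "F")}

coefficients = {}
intercept = 0.0
for node, value in coefficient_rows:
    if node == "":
        intercept = value
    else:
        coefficients[input_map[node]] = value
# coefficients == {('Gender', 'M'): -0.16951552773, ('Gender', 'F'): 0.11941652788}, intercept == 0.05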
5. Put everything together. Compute:
z(BikeBuyer=1) = Sum(InputNormalizedValue * InputScore) + FreeCoeffScore
ez(BikeBuyer=1) = exp(z(BikeBuyer=1))
z is computed in the same way for each of the BikeBuyer states, then:
Prob(BikeBuyer=1) = ez(BikeBuyer=1) / Sum(ez(BikeBuyer=*))
The only thing that is not directly available from the content is the InputNormalizedValue, the numerical value associated with each individual input. It is computed as a z-score based on the data distribution in the training set. That distribution is exposed in the content; you can retrieve it with a query like:
SELECT FLATTENED NODE_DISTRIBUTION FROM [BikeBuyer].CONTENT WHERE NODE_TYPE=24 // distribution node
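As a rough sketch of step 5 in Python (the BikeBuyer=0 coefficients, both intercepts, and the normalized input values below are invented for illustration; in a real model they come from the queries above and from the z-score normalization just described):

import math

# One set of coefficients and one intercept per output state (BikeBuyer=1 and BikeBuyer=0).
models = {
    "1": {"coefficients": {("Gender", "M"): -0.16951552773, ("Gender", "F"): 0.11941652788},
          "intercept": 0.05},
    "0": {"coefficients": {("Gender", "M"): 0.16951552773, ("Gender", "F"): -0.11941652788},
          "intercept": -0.05},
}

# Normalized values for the case being scored (z-scores of the raw inputs).
normalized_inputs = {("Gender", "M"): 0.98, ("Gender", "F"): -1.02}

def z(state):
    m = models[state]
    return sum(normalized_inputs[key] * coef for key, coef in m["coefficients"].items()) + m["intercept"]

exp_z = {state: math.exp(z(state)) for state in models}
total = sum(exp_z.values())
probabilities = {state: e / total for state, e in exp_z.items()}
# probabilities sum to 1 across the BikeBuyer states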
Hope this helps
|||Thanks very much
|||Hi Bogdan,
Thanks for the very useful information. Could you also provide examples of how to compute the InputNormalizedValue (i.e. z-score) for Discrete attributes (NODE_DISTRIBUTION.VALUETYPE = 4)? In this case, it would seem that the NODE_DISTRIBUTION.PROBABILITY represents the MEAN, but NODE_DISTRIBUTION.VARIANCE is always set to 0. I tried computing the STDEV based on a Bernoulli distribution with the specified probabilities, but the resulting z-scores didn't seem to jibe with an MS SQL example I saw online (http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/1509.aspx).
Also, just to verify, for Continuous attributes (NODE_DISTRIBUTION.VALUETYPE = 3), the z-score would equal (Input Value - NODE_DISTRIBUTION.ATTRIBUTE_VALUE) / SQRT(NODE_DISTRIBUTION.VARIANCE), correct?
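For concreteness, the two normalizations described above could be written like this (the sample numbers are invented, and the Bernoulli-based discrete case is exactly the part in question):

import math

def continuous_z(x, mean, variance):
    # VALUETYPE=3: mean taken from ATTRIBUTE_VALUE, variance from VARIANCE.
    return (x - mean) / math.sqrt(variance)

def discrete_z_bernoulli(indicator, p):
    # VALUETYPE=4: PROBABILITY treated as the mean, Bernoulli variance p*(1-p)
    # tried because VARIANCE is reported as 0.
    return (indicator - p) / math.sqrt(p * (1 - p))

continuous_z(35.0, 30.0, 25.0)    # (35 - 30) / 5 = 1.0
discrete_z_bernoulli(1, 0.6)      # 0.4 / sqrt(0.24) ~ 0.816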
Thanks,
Doug
|||I'd love to see the answers as well...
Best, Adam.