From perlman@mail.cis.ohio-state.edu Wed Jul 24 10:44:40 1996
From: Gary Perlman <perlman@cis.ohio-state.edu>
Date: Wed, 24 Jul 1996 10:44:30 -0400
Message-Id: <199607241444.KAA16272@colon.cis.ohio-state.edu>
To: perlman@cis.ohio-state.edu
Subject: Notes/stat

Suggested changes for |STAT

PC-SIG #s 990-992

@name1 name2 ... nameN

use in regress anova pair contab etc (any parseline)
also used as comment lines for easy filtering
	@labels: name1 name2 ... nameN
in colex - output is automatic for fields
	(unless type checking or formatting is used)
to extract columns requires lookup by specol
Only do analysis of named columns (e.g., in regress)
1. Use of column names in expressions
	dm s1 or x1 -> height width
		e.g., height*width -> x2*x3
	colex height width
		height width -> 2 3
2. Outputting names
	linex - just echo, but don't count?
	colex - output, but truncate to fmt?
	dm - use ptress to produce new column label?
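
A minimal sketch of the name lookup in item 1, assuming the plain
"@name1 name2 ... nameN" form of the label line (the "@labels:" variant
is not handled).  The function names parse_labels and column_of and the
limit MAXCOLS are made up for illustration; they are not existing |STAT
routines.

```c
#include <string.h>

#define MAXCOLS 100   /* hypothetical limit on labeled columns */

/* Split an "@name1 name2 ... nameN" label line into names.
   The line is modified in place (strtok writes NULs into it).
   Returns the number of names, or 0 if this is not a label line. */
int parse_labels(char *line, char *names[], int maxnames)
{
    int n = 0;
    char *tok;

    if (line[0] != '@')
        return 0;
    for (tok = strtok(line + 1, " \t\n"); tok != NULL && n < maxnames;
         tok = strtok(NULL, " \t\n"))
        names[n++] = tok;
    return n;
}

/* Return the 1-based column number for name, or 0 if it is unknown. */
int column_of(const char *name, char *names[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(name, names[i]) == 0)
            return i + 1;
    return 0;
}
```

With a label line such as "@subject height width", column_of maps
height to 2 and width to 3, which is the "height width -> 2 3" rewrite
that colex would need.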

See Mail/unixbox for background

	Modular design vs monolithic design
		how to avoid creeping featurism - POET
	Analyze WRT Norman's 7 stages of execution/evaluation
		A | B | C | D
		result can be probed at any point (using less or (ugh) intermediate files)
		remember the IO program? /n/ship/0/perlman/stat/Old/io.c
		How does this generalize to the good aspects of the UNIX philosophy?
	

Problems with |STAT survey
	little idea of background of respondent
	most respondents will be UNIX users

|STAT Survey
 
The following survey is to help me answer a question about user interfaces
that has been interesting me for several years:  Given that many packages
are available at low cost, especially to educational institutions, why
do people continue to use |STAT programs?  Despite having a command-line
interface, minimal documentation, and missing functionality, |STAT seems
to satisfy some needs that are not met by other packages that are available. 
It might be that |STAT can be copied freely, but my intuition is that there
are some fundamental user interface issues involved.  I do not want to
discuss them before I try to get some survey results.  So, if you are a
|STAT user, then please take a few minutes to answer the survey by placing
numbers and text after the double colons.
 
I am concerned about this appearing to be a market survey, which it is not.
I will, in anti-marketing fashion, say that I have no plans to change |STAT.
|STAT will not suddenly become a commercial product but will continue to be
distributed under the same conditions as it has for the past ten years.
Anyone who wants the raw data or analyzed data from the survey can request
that I email it to them.  I reserve the right to remove identifying marks
from the responses.
 
For answers requiring an estimate of frequency, use the scale:
	1 never
	2 seldom
	3 sometimes
	4 often
	5 all the time
For answers requesting multiple answers, feel free to add extra lines.
 
[1] How often do you use |STAT programs? :: 
[2] Which |STAT programs do you use most often for analyzing data?
	[2.1] ________ :: 
	[2.2] ________ :: 
	[2.3] ________ :: 
	[2.4] ________ :: 
	[2.5] ________ :: 
[3] How often do you use:
	[3.1] individual commands :: 
	[3.2] pipelines of commands :: 
	[3.3] scripts or batch files :: 
[4] What non-|STAT programs do you use most often with |STAT?
	[4.1] ________ :: 
	[4.2] ________ :: 
	[4.3] ________ :: 
	[4.4] ________ :: 
	[4.5] ________ :: 
[5] What |STAT programs do you use outside of data analysis (e.g., formatting)?
	[5.1] ________ :: 
	[5.2] ________ :: 
	[5.3] ________ :: 
	[5.4] ________ :: 
	[5.5] ________ :: 
[6] What other data analysis packages do you have easy access to, and how
often do you use them?
	[6.1] ________ :: 
	[6.2] ________ :: 
	[6.3] ________ :: 
	[6.4] ________ :: 
	[6.5] ________ :: 
[7] In decreasing order of importance, list 5 reasons why you use |STAT.
	[7.1] 
	[7.2] 
	[7.3] 
	[7.4] 
	[7.5] 
[8] In decreasing order of importance, list 5 ways that |STAT is easy to use.
	[8.1] 
	[8.2] 
	[8.3] 
	[8.4] 
	[8.5] 
[9] In decreasing order of importance, list 5 ways that |STAT is hard to use.
	[9.1] 
	[9.2] 
	[9.3] 
	[9.4] 
	[9.5] 
[10] What new functionality would you like to see in |STAT?
	[10.1] 
	[10.2] 
	[10.3] 
	[10.4] 
	[10.5] 
[11] Please write comments on the following areas about |STAT:
	[11.1] The functionality provided :: 
	[11.2] The documentation (paper and online) :: 
	[11.3] The input formats :: 
	[11.4] The command line options :: 
	[11.5] The output formats :: 

topic index for stat
keyword section for manual entries

:desc (runtime allocation)
:desc incorporate stats printout options
:is dm the only program using strings (yes) convert
:			anova: polynomial
:			number: length checks for loss of precision
:			contab: crosstab breakdown

pair	add way to add labels to points
Possible, and relatively easy to add a label to the same line as the data pair,
however, I am not sure how to handle something like:
	X   Y   A
	X   Y   B
	where X and Y are data pairs and A and B are labels.  What would you plot?
	I guess something like a * could indicate a conflict of labels.
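
One way the conflict rule could look, as a sketch only.  It assumes
single-character labels and a blank (' ') for an empty plot cell; the
name merge_label is hypothetical and not part of pair.

```c
/* Merge a point label into a plot cell.
   An empty cell takes the label, a matching label is kept,
   and any disagreement is shown as '*'. */
char merge_label(char cell, char label)
{
    if (cell == ' ' || cell == label)
        return label;
    return '*';
}
```

Once a cell holds '*' it stays '*', so the order in which conflicting
labels arrive does not change what is plotted.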

:use NA for unknown values in programs
:   calc:UNDEFINED
:   incorporate in validata for checking
:   NA would cause removal of all related data
:	NA is an integer
:   # of NA data should be mentioned
	NA handling in DM
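
A sketch of the "NA removes all related data" idea, assuming fields are
already split into strings and that the literal token NA marks an
unknown value.  NCOLS, row_has_na, and column_mean are illustrative
names, not existing |STAT code.

```c
#include <stdlib.h>
#include <string.h>

#define NCOLS 2   /* hypothetical fixed row width for the example */

/* Return 1 if any field in the row is the literal token "NA". */
int row_has_na(const char *fields[], int nfields)
{
    int i;

    for (i = 0; i < nfields; i++)
        if (strcmp(fields[i], "NA") == 0)
            return 1;
    return 0;
}

/* Mean of column col (0-based) over rows that contain no NA.
   A row with NA in any column is dropped entirely, and the number
   of dropped rows is stored in *nskipped so it can be reported. */
double column_mean(const char *rows[][NCOLS], int nrows, int col,
                   int *nskipped)
{
    double sum = 0.0;
    int used = 0, r;

    *nskipped = 0;
    for (r = 0; r < nrows; r++) {
        if (row_has_na(rows[r], NCOLS)) {
            (*nskipped)++;
            continue;
        }
        sum += atof(rows[r][col]);
        used++;
    }
    return used > 0 ? sum / used : 0.0;
}
```

Returning the skipped count separately keeps the "# of NA data should
be mentioned" requirement out of the arithmetic.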

:	comment lines (could make own fgets, or make ncols == 0 (blank line))

:confidence intervals for desc -c and -i used (-i could be made -w)

dsort could produce the same effect as Jay McClelland's dt
	in the event of a tie, data columns are averaged, listed, or summed
	a	b	x
	a	c	y
	a	b	z
	==============
	a	b	f(x,z)
	a	c	y
	tie resolution can be done on output
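
A sketch of the tie handling, assuming one string key and one numeric
data column per record, with averaging as f.  The struct and function
names are invented for the example; this is not how dsort or dt is
implemented.

```c
#include <stdlib.h>
#include <string.h>

struct rec {
    const char *key;   /* sort key, e.g. "a b" */
    double value;      /* data column */
};

/* qsort comparison: order records by key. */
static int by_key(const void *p, const void *q)
{
    const struct rec *a = p, *b = q;

    return strcmp(a->key, b->key);
}

/* Sort recs by key, then replace each run of tied keys with a single
   record holding the mean of the run.  Returns the new record count. */
int collapse_ties(struct rec *recs, int n)
{
    int i, out = 0;

    qsort(recs, (size_t)n, sizeof recs[0], by_key);
    for (i = 0; i < n; ) {
        int j = i;
        double sum = 0.0;

        while (j < n && strcmp(recs[j].key, recs[i].key) == 0)
            sum += recs[j++].value;
        recs[out].key = recs[i].key;
        recs[out].value = sum / (j - i);
        out++;
        i = j;
    }
    return out;
}
```

Summing or listing instead of averaging only changes the body of the
inner loop, which fits the note that tie resolution can happen on
output.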

anova	onlint declared as int/void between UNIX and DOS
linex	print all but the lines named

series	add -n option to specify the number of elements in the series
	series 10 20 15
		incr = (high - low) / n;
		for (i = 0; i < n; i++)
			element = low + incr * i;
	simple transformations (e.g., exp, log, geom) in dm?
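
The loop above written out as a function, as a sketch.  It keeps the
increment (high - low) / n from the note, so the series starts at low
and stops one step short of high; dividing by n - 1 instead would
include high, and the note does not say which is wanted.  The name
series_n is hypothetical.

```c
/* Fill out[0..n-1] with n values starting at low and stepping by
   (high - low) / n.  high itself is not produced.
   Returns n, or 0 if n is not positive. */
int series_n(double low, double high, int n, double *out)
{
    double incr;
    int i;

    if (n <= 0)
        return 0;
    incr = (high - low) / n;
    for (i = 0; i < n; i++)
        out[i] = low + incr * i;
    return n;
}
```

For low 10, high 20, and n 5 this yields 10 12 14 16 18.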

parseline could work by setting pointers to fields and not clobbering lines

parseline:
	leading space can be part of a field, with no effect on atoi or atof
	(some effect on strings, especially if left justified)
	This would allow outputs of parselined inputs to match the inputs
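
A sketch of the pointer-based approach, assuming whitespace-separated
fields.  Each field is recorded as a start pointer and a length, so the
input line is never written to.  struct field and split_fields are
illustrative names, not the existing parseline interface.

```c
#include <ctype.h>
#include <stddef.h>

struct field {
    const char *start;   /* first character of the field, inside line */
    size_t len;          /* number of characters in the field */
};

/* Record up to maxfields whitespace-separated fields of line in f.
   The line is left untouched.  Returns the number of fields found. */
int split_fields(const char *line, struct field *f, int maxfields)
{
    const char *p = line;
    int n = 0;

    while (n < maxfields) {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                      /* skip separators */
        if (*p == '\0')
            break;
        f[n].start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                      /* scan to end of field */
        f[n].len = (size_t)(p - f[n].start);
        n++;
    }
    return n;
}
```

This version skips leading space.  To keep leading space inside a
field, as the second note suggests, start could instead be set where
the previous field ended, at the cost of strings that carry their
padding.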

colspec needs to be able to refer to Nth, N-1st column
	1-N
	N-1 --> N..1
	may need to change range character to : or ..
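
A sketch of resolving a single column reference, assuming "N" means the
last column and "N-k" means k columns before it.  Under that reading
the "-" is taken as subtraction, which is why a range would then need a
different character such as : or .. as noted above.  resolve_col is a
hypothetical helper, not current colspec behavior.

```c
#include <stdlib.h>

/* Resolve one column reference against a line with ncols columns.
   Accepts "3", "N", or "N-1" style tokens.
   Returns the 1-based column, or 0 if the token is bad or out of range. */
int resolve_col(const char *tok, int ncols)
{
    char *end;
    long k;

    if (tok[0] == 'N') {
        if (tok[1] == '\0')
            return ncols;
        if (tok[1] != '-')
            return 0;
        k = strtol(tok + 2, &end, 10);
        if (end == tok + 2 || *end != '\0' || k < 0 || k >= ncols)
            return 0;
        return ncols - (int)k;
    }
    k = strtol(tok, &end, 10);
    if (end == tok || *end != '\0' || k < 1 || k > ncols)
        return 0;
    return (int)k;
}
```

With five columns, "N" resolves to 5 and "N-1" to 4, so a descending
range could be written N-1:1 once the range character is changed.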

Multivariate
	factor analysis
	discriminant analysis
	canonical correlation
	multidimensional scaling
GLIM
Comparisons
	confidence intervals
	contrasts
	polynomial decomposition
Graphics Library
	ticklines
	label lines
Storage
	compare time for calloc (HUGE) vs data[HUGE]