Here's a wicked simple example of how you would use the Vilno programming language:
directoryref a="/home/tom/mydata" ;
printoptions a/printout1 ;
convertfileformat asciitobinary("/home/tom/asciidata.txt"->a/datafile1) delimiter=',' varnames(name age weight) datatypes(str int float) strlengths(15) ;
inlist a/datafile1 ;
if (name=="Sam") name="Samuel" ;
sendoff(a/datafile2) name age weight ;
turnoff ;
convertfileformat binarytoascii(a/datafile2->"/home/tom/newasciidata.txt") delimiter='|' varnames(name age weight) ;
print(a/datafile1) "Here is the input dataset" ;
print(a/datafile2) "Here is the new output dataset, created by the data processing function" ;
OK, first off, the core of the Vilno programming language is the data processing function (DPF for short). See the paragraph of code that begins with "inlist" and ends with "turnoff"? That's a data processing function. This one is as simple as you can get: it makes a single small modification, changing the spelling of "Sam" to "Samuel" in the NAME column. In general, a data processing function reads in input datasets, crunches data, and writes out output datasets. The range of data transformations it can do is huge; I deliberately made a wicked simple one here. To keep things simple, this program also has only one data processing function, and that function reads only one input dataset and writes only one output dataset.
The data processing function is where you get the real work done, the data transformation.
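For readers who don't know Vilno, here is a rough Python analogue of what the sample data processing function does. It is only a sketch under assumptions: the function name rename_sam and the two sample rows are made up for illustration, and real Vilno works on its own binary datasets rather than on in-memory lists.

```python
# Illustrative Python analogue of the sample data processing function.
# This is NOT Vilno code; rename_sam and the sample rows are hypothetical.

def rename_sam(rows):
    """Yield each row with name "Sam" changed to "Samuel", keeping name, age, weight."""
    for row in rows:
        # Copy only the columns listed in the sendoff statement.
        out = {"name": row["name"], "age": row["age"], "weight": row["weight"]}
        if out["name"] == "Sam":
            out["name"] = "Samuel"
        yield out


# Made-up rows standing in for a/datafile1.
datafile1 = [
    {"name": "Sam", "age": 34, "weight": 172.5},
    {"name": "Ann", "age": 29, "weight": 130.0},
]

# Stand-in for a/datafile2, the output dataset.
datafile2 = list(rename_sam(datafile1))
print(datafile2)
```

The input rows are left untouched and a new output list is built, which mirrors the way the DPF reads one dataset and writes a separate one.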
"turnoff" really means "end of paragraph" or "end of this data processing function". The spelling was a bad choice.
If you are reading and writing datasets that are already in the binary format native to Vilno, there is no need for the convertfileformat statements shown above. The first convertfileformat statement (just before the DPF) creates a binary dataset from the ascii data file. This binary dataset is a/datafile1, which, via the directoryref statement, is actually /home/tom/mydata/datafile1.dat. Then the data processing function reads in a/datafile1, makes its modification, and writes out a/datafile2 (which is actually /home/tom/mydata/datafile2.dat). Finally, the second convertfileformat statement (just after the DPF) converts a/datafile2 to a new ascii data file.
So the convertfileformat statement imports data from ascii data files (typically with a comma or vertical bar as the delimiter) and exports data out to ascii data files. The current version of Vilno does not yet directly read or write specialized formats (such as Oracle, MySQL, SAS, SPSS, etc.). But of course, you can always move data in and out of such products as ascii data files.
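To make the import/export round trip concrete, here is a small Python sketch of the same idea: read comma-delimited ascii data with typed columns, then write it back out with a vertical bar as the delimiter. The helper names and sample data are hypothetical, I am assuming the ascii file has no header line (the column names come from the program, as with varnames), and the strlengths(15) option is not modeled.

```python
import csv
import io

COLUMNS = ["name", "age", "weight"]  # mirrors varnames(name age weight)
CASTS = [str, int, float]            # mirrors datatypes(str int float)


def read_delimited(text, delimiter):
    """Parse delimited ascii text into a list of typed row dictionaries."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        {col: cast(value) for col, cast, value in zip(COLUMNS, CASTS, record)}
        for record in reader
    ]


def write_delimited(rows, delimiter):
    """Serialize row dictionaries back to delimited ascii text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([row[col] for col in COLUMNS])
    return buffer.getvalue()


# Made-up sample data standing in for /home/tom/asciidata.txt.
ascii_in = "Sam,34,172.5\nAnn,29,130.0\n"
rows = read_delimited(ascii_in, ",")
ascii_out = write_delimited(rows, "|")
print(ascii_out)
```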
The print statements create data listings, printed to /home/tom/mydata/printout1.prt (because of the printoptions statement). These are data listings of the input dataset (a/datafile1) and the output dataset (a/datafile2). The file printout1.prt is an ascii file, but it is not an ascii data file: it has page breaks, and the columns are aligned, with a title at the top of each page. It's a printout for the human eye to look at, not a dataset that a later program can read.
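The difference between a listing and a data file is easy to see in a few lines of Python. This sketch is not how Vilno produces its .prt files (that layout is not described here); the function format_listing, the page size, the column width, and the use of a form-feed character as the page break are all assumptions chosen for illustration.

```python
def format_listing(rows, columns, title, rows_per_page=2, width=12):
    """Return a paginated, column-aligned text listing (pages joined by form feeds)."""
    header = "".join(col.upper().ljust(width) for col in columns)
    pages = []
    for start in range(0, len(rows), rows_per_page):
        # Every page starts with the title and the column header.
        lines = [title, header]
        for row in rows[start:start + rows_per_page]:
            lines.append("".join(str(row[col]).ljust(width) for col in columns))
        pages.append("\n".join(lines))
    return "\f".join(pages)


# Made-up rows for illustration.
sample_rows = [
    {"name": "Sam", "age": 34, "weight": 172.5},
    {"name": "Ann", "age": 29, "weight": 130.0},
    {"name": "Bob", "age": 41, "weight": 198.2},
]
listing = format_listing(sample_rows, ["name", "age", "weight"], "Here is the input dataset")
print(listing)
```

The padding, repeated titles, and page breaks are what make the output pleasant to read and awkward to parse, which is why a listing is not a substitute for an ascii data file.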
If you are tempted to play around with this thing, and you have a Linux computer, and you haven't updated your Linux distribution for eighteen months, by all means go to the www.my.opera.com/datahelper site and find the August 31 blog article. There you will find a tarball to download, called vilnoAUG2006package.tgz. I'm sorry about the old-distribution requirement, but the GCC 3.x and GCC 4.x toolchains appear to be binary incompatible, based on my testing last summer. I will upload a tarball compatible with the newer GCC at a later date.
Here's an interesting idea: suppose one added functionality to import and export data from a variety of different sources (not just ascii data files). And, if you like a user interface, suppose one put a graphical user interface on top of the programming language. The GUI gives ease of use to beginners in very simple data situations, while the programming language gives the power and flexibility to deal with unexpected and messy data situations. That's why a GUI layer on top of a programming language layer is really the best of both worlds, kind of like SPSS. So suppose you added that sort of stuff. What do you get?
ETL software (Extract, Transform, Load).
The data processing function does the "T" part of "ETL" (Transform).
The convertfileformat statements do the "E" and "L" parts, but only with ascii data files.
So the current version of Vilno is very strong on Transformation, but pretty simplistic on Extract and Load.
Vilno has transformation features (data preparation) but not yet statistical features (ANOVA, regression).
Well, if you have a Linux computer, you can use Vilno for the preparation of (often messy) data, and use R for statistics after the data is ready. (R implements a dialect of the S programming language, which is not used that much for data preparation.)
Posted 1/30/2007 7:25 PM