I have a database of patterns containing 252,295 records.
Each record contains:
1. item #
2. group #
3. sequence #
4. context
5. Type1 sequence pattern
6. Type2 sequence pattern
7. Next in sequence - Type 1
8. Next in sequence - Type 2
9. Next "next in sequence" - Type 1
Both sequence types are classified using letters. The letters come from two sets: A-H and S-Z.
For example, a pattern might be AZUBD. The next in sequence will be a single letter, e.g., "E". The next record will then have a pattern of ZUBDE.
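To make the example concrete, here is a minimal Python sketch of how a pattern rolls forward and how a letter maps to its set. The function names `letter_set` and `advance` are hypothetical and used only for illustration; they are not part of my database.

```python
# The two letter sets described above.
SET_A_H = set("ABCDEFGH")
SET_S_Z = set("STUVWXYZ")

def letter_set(letter):
    """Return the name of the set a single letter belongs to."""
    if letter in SET_A_H:
        return "A-H"
    if letter in SET_S_Z:
        return "S-Z"
    raise ValueError("letter outside both sets: %r" % letter)

def advance(pattern, next_letter):
    """Drop the oldest letter and append the next in sequence."""
    return pattern[1:] + next_letter

print(advance("AZUBD", "E"))  # ZUBDE
print(letter_set("E"))        # A-H
```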
**Major Study #1:**
For each of the last 113,871 of these records I want to query all prior records (i.e., prior to the current record) for similar patterns.
From the query results, I want to know, from the preponderance of the evidence, which set of letters the pattern foreshadowed in previous history for the next in sequence: whether the next in sequence was likely to be a member of set A-H or set S-Z. I'll want to know the statistics. [For type1 I will also want to know whether the next "next in sequence" was in the same set of letters as the next in sequence.]
I want the study of each record to be limited to records prior to the current record, to records of the same group#, and of the same context; and, of course, limited to the pattern of the current record.
In addition, I want this study done on the 5-letter pattern of the record, as well as on a 4-letter pattern, a 3-letter pattern, and a 2-letter pattern, each constructed from the 5-letter pattern. [My thinking here is that the 5-letter pattern might be too restrictive, so I want to compare the results with less restrictive patterns.]
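One possible way to build the shorter patterns is sketched below in Python. It assumes the shorter patterns are the trailing letters of the 5-letter pattern (the letters closest to the next in sequence); that choice is an assumption for illustration and would need to be confirmed. The name `sub_patterns` is hypothetical.

```python
def sub_patterns(pattern, lengths=(5, 4, 3, 2)):
    """Map each requested length to the trailing letters of the pattern.

    Assumption: shorter patterns keep the most recent letters.
    """
    return {n: pattern[-n:] for n in lengths if n <= len(pattern)}

print(sub_patterns("AZUBD"))
# {5: 'AZUBD', 4: 'ZUBD', 3: 'UBD', 2: 'BD'}
```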
In addition, I want the study run 4 ways:
I want to see how type1 predicts type1, how type1 predicts type2, how type2 predicts type1, and how type2 predicts type2.
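Combined with the four pattern lengths, the four ways give 16 runs per study. A small Python sketch that enumerates them follows; the labels `type1` and `type2` and the function name `study_runs` are placeholders, not names from my database.

```python
from itertools import product

TYPES = ("type1", "type2")
LENGTHS = (5, 4, 3, 2)

def study_runs():
    """Return every (predictor type, predicted type, pattern length) combination."""
    return [(pred, target, n)
            for pred, target in product(TYPES, TYPES)
            for n in LENGTHS]

runs = study_runs()
print(len(runs))  # 16
```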
**Major Study #2**
I want the Major Study #2 to be exactly the same as #1 with the following exception: I don't want each record's query limited to its group#.
The output should be in comma-delimited format or in some format that I can import into my database for further cross-tabulation.
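As a rough idea of the comma-delimited output, here is a Python sketch using the standard `csv` module. The column names in `FIELDS` and the sample row values are made up for illustration only; the actual columns are open to discussion.

```python
import csv
import io

# Hypothetical column layout, one row per record per run.
FIELDS = ["item", "predictor", "target", "length", "pattern",
          "count_a_h", "count_s_z", "pct_a_h"]

def write_results(rows, out):
    """Write result rows (dicts keyed by FIELDS) as comma-delimited text."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Illustrative row only; the values are not real results.
sample = {"item": 138425, "predictor": "type1", "target": "type2",
          "length": 4, "pattern": "ZUBD",
          "count_a_h": 3, "count_s_z": 1, "pct_a_h": 75.0}

buf = io.StringIO()
write_results([sample], buf)
print(buf.getvalue())
```

In practice `out` would be a file opened with `newline=""` rather than an in-memory buffer.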
[NOTE: It is critical to understand that the predictions cannot use "future" data. So, for each record only the prior records should be considered.]
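One way to guarantee that no "future" data is used is a single pass in record order: read the tallies for the current record first, and only afterwards add that record's own outcome to the tallies. The Python sketch below illustrates this under several assumptions: records arrive in chronological order as dicts with the hypothetical keys `group`, `context`, `pattern`, and `next`; shorter patterns are the trailing letters; and the `use_group` flag switches between Study #1 and Study #2. It is not a finished implementation.

```python
from collections import defaultdict

SET_A_H = set("ABCDEFGH")  # any other letter is assumed to be in S-Z

def prior_only_tallies(records, length=5, use_group=True):
    """Yield (record, count_a_h, count_s_z) using earlier records only.

    records must be in chronological order. Each record is a dict with
    the hypothetical keys: group, context, pattern, next.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [A-H count, S-Z count]
    for rec in records:
        key = (rec["group"] if use_group else None,
               rec["context"],
               rec["pattern"][-length:])
        a_h, s_z = counts[key]
        # Report what history showed BEFORE this record is counted.
        yield rec, a_h, s_z
        # Only now add this record's outcome, so it informs later records.
        if rec.get("next"):
            counts[key][0 if rec["next"] in SET_A_H else 1] += 1

demo = [
    {"group": 1, "context": "x", "pattern": "AZUBD", "next": "E"},
    {"group": 1, "context": "x", "pattern": "AZUBD", "next": "T"},
    {"group": 2, "context": "x", "pattern": "AZUBD", "next": "A"},
    {"group": 1, "context": "x", "pattern": "AZUBD", "next": "B"},
]
by_group = [(a, s) for _, a, s in prior_only_tallies(demo)]
all_groups = [(a, s) for _, a, s in prior_only_tallies(demo, use_group=False)]
print(by_group)    # [(0, 0), (1, 0), (0, 0), (1, 1)]
print(all_groups)  # [(0, 0), (1, 0), (1, 1), (2, 1)]
```

With this approach the earliest records (252,295 minus 113,871 leaves 138,424) would pass through the loop to build up history, and output would be kept only for the last 113,871.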
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
4) Results must be verifiably accurate.
## Platform
I have a 1.33 GHz processor with 512 MB of RAM.
I run Windows XP with Service Pack 2.