I would like to convert one or more tree-structured text files (in basic HTML format) into JSON. The tree structure is expressed through indentation in the HTML, and every branch is also commented. The tree contains 720 branches, of which 20 are main branches; the rest are sub-branches. Every branch contains 0 to ~20 ids. It is vital to preserve the structure so that one can traverse down the tree. Each tree branch has a tag/label that I would like to link to two other files (each has the same identifier in one column, and all the information per id is in a single row). Where the identifiers match, I would like to get those rows into the JSON too.
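To make the conversion concrete, here is a minimal sketch of one way it could work. It rests on assumptions that are not confirmed above: that the HTML has already been reduced to plain lines, that nesting depth is two spaces per level, that each line is a label followed by its ids, and that the two other files have been loaded into a `Map` keyed by id. The names `parseIndentedTree` and `attachRows` are hypothetical.

```javascript
// Assumption: one branch per line, depth = leading spaces / indentWidth,
// and each line is "<label> <id> <id> ..." (zero or more ids).
function parseIndentedTree(text, indentWidth = 2) {
  const root = { label: "root", ids: [], children: [] };
  // Stack of open branches; the last entry is the current parent candidate.
  const stack = [{ depth: -1, node: root }];
  for (const rawLine of text.split("\n")) {
    if (rawLine.trim() === "") continue; // skip blank lines
    const indent = rawLine.length - rawLine.trimStart().length;
    const depth = Math.floor(indent / indentWidth);
    const [label, ...ids] = rawLine.trim().split(/\s+/);
    const node = { label, ids, children: [] };
    // Close every branch that is at the same depth or deeper.
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    stack[stack.length - 1].node.children.push(node);
    stack.push({ depth, node });
  }
  return root;
}

// Assumption: rowsById is a Map from id to the row object read from the
// two other files. Ids without a matching row are simply left out.
function attachRows(node, rowsById) {
  node.rows = node.ids
    .filter((id) => rowsById.has(id))
    .map((id) => rowsById.get(id));
  node.children.forEach((child) => attachRows(child, rowsById));
  return node;
}

const sampleText = "A id1 id2\n  A1 id3\nB";
const sampleRows = new Map([["id1", { id: "id1", author: "Smith" }]]);
const sampleTree = attachRows(parseIndentedTree(sampleText), sampleRows);
console.log(JSON.stringify(sampleTree, null, 2));
```

Keeping the parser and the join as separate steps means new entries in either input can be picked up by re-running the conversion, without editing the JSON by hand.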
These results need to be searchable. For example, I would like to look up everything about a certain node, filter data by position, or find everything by a certain author. In the future I also want to link to other resources. I don't know the best way to do this with JSON data, so I'm open to suggestions. The code should also be easy to maintain, as I will need to add entries and cross-references later on.
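As a rough illustration of what searching the JSON could look like, the sketch below walks a nested tree and filters the attached rows. The tree shape (`label`, `rows`, `children`) and the row fields `author` and `position` are assumptions chosen to mirror the examples above; the real column names may differ. `walk`, `findByLabel` and `findRows` are hypothetical helper names.

```javascript
// Depth-first traversal that also reports the path from the root,
// so every hit can be traced back down the tree.
function* walk(node, path = []) {
  const here = [...path, node.label];
  yield { node, path: here };
  for (const child of node.children || []) yield* walk(child, here);
}

// "Everything about a certain node": returns the node and its path, or null.
function findByLabel(root, label) {
  for (const entry of walk(root)) {
    if (entry.node.label === label) return entry;
  }
  return null;
}

// Generic row filter: the predicate decides what counts as a match.
function findRows(root, predicate) {
  const hits = [];
  for (const { node, path } of walk(root)) {
    for (const row of node.rows || []) {
      if (predicate(row)) hits.push({ path: path.join("/"), row });
    }
  }
  return hits;
}

// Small hand-written tree, for illustration only.
const exampleTree = {
  label: "root",
  rows: [],
  children: [
    {
      label: "A",
      rows: [{ id: "id1", author: "Smith", position: 3 }],
      children: [
        {
          label: "A1",
          rows: [
            { id: "id3", author: "Smith", position: 10 },
            { id: "id4", author: "Lee", position: 7 },
          ],
          children: [],
        },
      ],
    },
  ],
};

const bySmith = findRows(exampleTree, (row) => row.author === "Smith");
const beyondFive = findRows(exampleTree, (row) => row.position > 5);
console.log(bySmith.length, beyondFive.length);
```

With 720 branches a plain traversal like this should be fast enough; an index or a database would only become interesting if the data grows considerably.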
This would be a task for someone familiar with ontologies and with building searchable nomenclatures. Building SPARQL endpoints was mentioned earlier (triples can easily be created from the data) and I'm keen on the idea, but I will listen to good alternatives. The most important thing is to keep the logic sound when going down the tree; I would like a very thorough syntax covering all the elements. In addition, I would like to add geographical positions of the tree branches later on, so taking that into account would be beneficial.
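For the triples idea, a minimal sketch of how a nested tree might be serialized as N-Triples lines is given here. Several things are assumptions: the base IRI `http://example.org/tree/` is a placeholder, branch labels are taken to be unique across the whole tree, and the optional `geo` field with `lat` and `long` is a hypothetical shape for the positions to be added later. The predicates come from the RDFS, SKOS and W3C WGS84 vocabularies, which is only one possible modelling choice.

```javascript
const BASE = "http://example.org/tree/"; // placeholder base IRI (assumption)
const RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
const SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader";
const GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat";
const GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long";
const XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal";

const iri = (value) => `<${value}>`;

// Escape backslashes, quotes and newlines as N-Triples literals require.
function literal(value, datatype) {
  const escaped = String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
  return datatype ? `"${escaped}"^^<${datatype}>` : `"${escaped}"`;
}

// Emits one label triple per branch, one "broader" triple linking it to
// its parent, and lat/long triples when a position is present.
function toTriples(node, parentIri = null, out = []) {
  const subject = BASE + encodeURIComponent(node.label);
  out.push(`${iri(subject)} ${iri(RDFS_LABEL)} ${literal(node.label)} .`);
  if (parentIri) {
    out.push(`${iri(subject)} ${iri(SKOS_BROADER)} ${iri(parentIri)} .`);
  }
  if (node.geo) {
    out.push(`${iri(subject)} ${iri(GEO_LAT)} ${literal(node.geo.lat, XSD_DECIMAL)} .`);
    out.push(`${iri(subject)} ${iri(GEO_LONG)} ${literal(node.geo.long, XSD_DECIMAL)} .`);
  }
  for (const child of node.children || []) toTriples(child, subject, out);
  return out;
}

const triples = toTriples({
  label: "A",
  children: [{ label: "A1", geo: { lat: 1.5, long: 2 }, children: [] }],
});
console.log(triples.join("\n"));
```

The resulting lines could be loaded into any triple store that accepts N-Triples and then queried through its SPARQL endpoint; if labels turn out not to be unique, the subject IRI would need to be built from the full path instead.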
I would prefer Java, Perl and/or JavaScript for this project, but again, good alternatives are welcome. The main point is to get the job done!