ASV Toolbox
This is the mirrored homepage of the ASV Toolbox project, which has not been actively supported since 2008. For legacy reasons, version 1.0 of the ASV Toolbox remains available for download. It is written in Java (compiled with version 1.5) and distributed under the MIT license.
The program was developed at ASV Leipzig.
Introduction
ASV Toolbox is a modular collection of tools for the exploration of written language data. The tools work either on word lists or on text and solve several linguistic classification and clustering tasks. The topics covered include language detection, POS tagging, base form reduction, named entity recognition, and terminology extraction. On a more abstract level, the algorithms deal with various kinds of word similarity, using pattern-based and statistical approaches. The collection can be used to work on large real-world data sets as well as to study the underlying algorithms. The ASV Toolbox can work on plain text files and connect to a MySQL database. While it is especially designed to work with corpora of the Leipzig Corpora Collection, it can easily be adapted to other sources.
Installation
Download the zip file and unzip it into a directory of your choice.
ASV Toolbox modules and module resources (examples, documentation, languages, …)
Download the zip file and unzip it into the directory containing the ASV Toolbox home.
Windows users can simply use “extract here”; UNIX users should use “unzip -o <filename>.zip”.
If you download a module, you have to edit the file toolbox.start, which is located in the config folder of your ASV Toolbox home. Every module comes with its own copy of this file, named toolbox.start.modulename; after unzipping the module, this copy is located in the config folder. Copy the line from that copy into the toolbox.start file, on a new line. Example: if you want to include Genetomorph and ViterbiTagger, your toolbox.start file should look like this:
de.uni_leipzig.asv.toolbox.genetoMorph.GenetoMorph
de.uni_leipzig.asv.toolbox.viterbitagger.gui.ViterbiTagger
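The copy step described above can also be scripted. The following is a minimal sketch, not part of the ASV Toolbox itself: it assumes that every toolbox.start.modulename file contains one fully qualified class name per line, and the function name `register_modules` is hypothetical.

```python
from pathlib import Path


def register_modules(config_dir):
    """Append each module's start line to toolbox.start, skipping duplicates.

    Assumption: every toolbox.start.<modulename> file in config_dir holds
    one class name per line. Returns the class names that were newly added.
    """
    config = Path(config_dir)
    start_file = config / "toolbox.start"

    # Class names already registered (blank lines are ignored).
    existing = []
    if start_file.exists():
        text = start_file.read_text(encoding="utf-8")
        existing = [ln.strip() for ln in text.splitlines() if ln.strip()]

    added = []
    # The pattern needs the trailing dot, so toolbox.start itself is not matched.
    for module_file in sorted(config.glob("toolbox.start.*")):
        for line in module_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and line not in existing and line not in added:
                added.append(line)

    # Write the result back with one class name per line.
    start_file.write_text("\n".join(existing + added) + "\n", encoding="utf-8")
    return added
```

Calling `register_modules` with the path to the config folder of your ASV Toolbox home leaves existing entries untouched and only appends class names that are not listed yet, so running it again after installing another module is harmless.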
The complete ASV Toolbox package contains the following modules:
- Chinese Whispers: graph clustering tool
- Levenshtein: spell checking tool
- Baseforms: tool for base form reduction and compound noun splitting
- Pretree: tool for training pretrees and classifying with them
- TE: terminology extraction tool
- Pendulum: gazetteer bootstrapping tool (for Named Entity Recognition)
- Namerec: Named Entity Recognition system
- JLanI: language identification tool
- Viterbitagger: POS tagging tool
- Zipfel: tool for Zipf’s law
- AHC: agglomerative hierarchical clustering tool
- Genetomorph: finding morphological structure with a genetic algorithm
- Your Tool: template tool for your program
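To give an impression of the kind of word similarity these tools build on, here is a small sketch of the edit distance that the Levenshtein module is named after. It is a generic textbook dynamic-programming version, not the toolbox's own code; the function names `levenshtein` and `suggest`, and the default threshold of 2, are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or keep it)
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary entries within max_distance of word, closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in vocabulary)
    return [entry for distance, entry in scored if distance <= max_distance]
```

A simple spell checker built this way would call `suggest` with a misspelled word and a word list and offer the returned entries as corrections. Only two rows of the distance table are kept at any time, which keeps memory use proportional to the length of the second string.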
Version: 1.0
file format: zip
file size: 258MB
file link: ASV Toolbox.zip
alternative file link: ASVToolbox.zip
Citing ASV Toolbox
Chris Biemann, Uwe Quasthoff, Gerhard Heyer and Florian Holz (2008): ASV Toolbox: a Modular Collection of Language Exploration Tools. Proceedings of LREC-08, pp. 1760-1767, Marrakech, Morocco.