
ASV Toolbox

This is the mirrored homepage for the ASV Toolbox project, which has not been actively supported since 2008. For legacy reasons, version 1.0 of the ASV Toolbox remains available for download. It is written in Java (compiled with version 1.5) and distributed under the MIT license.
The program was developed at ASV Leipzig.

Introduction

ASV Toolbox is a modular collection of tools for the exploration of written language data. The tools work on either word lists or text and solve several linguistic classification and clustering tasks. The topics covered include language detection, POS tagging, base form reduction, named entity recognition, and terminology extraction. On a more abstract level, the algorithms deal with various kinds of word similarity, using pattern-based and statistical approaches. The collection can be used to work on large real-world data sets as well as to study the underlying algorithms. The ASV Toolbox can work on plain text files and connect to a MySQL database. While it is especially designed to work with corpora of the Leipzig Corpora Collection, it can easily be adapted to other sources.

Installation

Download the zip file and unzip it into a directory of your choice.
ASV Toolbox modules and module resources (examples, documentation, languages, …)

Download the zip file and unzip it into your ASV Toolbox home directory.
Windows users can simply use “extract here”; UNIX users should use “unzip -o <filename>.zip”.

If you download a module, you have to edit the file toolbox.start, which is located in the config folder of your ASV Toolbox home. Every module ships with its own copy of this file, named toolbox.start.modulename; after you unzip the module, this copy is also located in the config folder. Copy its line into the toolbox.start file, on a new line. Example: if you want to include Genetomorph and ViterbiTagger, your toolbox.start file should look like this:

de.uni_leipzig.asv.toolbox.genetoMorph.GenetoMorph

de.uni_leipzig.asv.toolbox.viterbitagger.gui.ViterbiTagger
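The manual copy step described above could also be scripted. The following is a minimal sketch, not part of the ASV Toolbox: the class name ToolboxStartMerger is hypothetical, the files are assumed to be UTF-8 text with one class name per line, and the code needs Java 7 or later (for java.nio.file), which is newer than the Java 1.5 the toolbox itself was compiled with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not shipped with the ASV Toolbox.
final class ToolboxStartMerger {

    /**
     * Appends every non-blank line of moduleFile to startFile unless
     * startFile already contains it. Returns the entries that were added.
     */
    static List<String> merge(Path startFile, Path moduleFile) throws IOException {
        // Read the current content (empty if toolbox.start does not exist yet).
        String content = Files.exists(startFile)
                ? new String(Files.readAllBytes(startFile), StandardCharsets.UTF_8)
                : "";
        List<String> known = new ArrayList<>();
        for (String line : content.split("\\r?\\n")) {
            known.add(line.trim());
        }

        // Collect module entries that are not listed yet.
        List<String> added = new ArrayList<>();
        for (String line : Files.readAllLines(moduleFile, StandardCharsets.UTF_8)) {
            String entry = line.trim();
            if (!entry.isEmpty() && !known.contains(entry)) {
                added.add(entry);
                known.add(entry);
            }
        }

        if (!added.isEmpty()) {
            StringBuilder out = new StringBuilder();
            // Make sure the first new entry starts on a new line.
            if (!content.isEmpty() && !content.endsWith("\n")) {
                out.append(System.lineSeparator());
            }
            for (String entry : added) {
                out.append(entry).append(System.lineSeparator());
            }
            Files.write(startFile, out.toString().getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        return added;
    }

    // Usage (assumed layout): java ToolboxStartMerger <toolbox home> <modulename>
    public static void main(String[] args) throws IOException {
        Path config = Paths.get(args[0], "config");
        List<String> added = merge(config.resolve("toolbox.start"),
                config.resolve("toolbox.start." + args[1]));
        System.out.println("Added entries: " + added);
    }
}
```

Running the helper twice for the same module leaves toolbox.start unchanged the second time, because entries that are already present are skipped.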

The complete ASV Toolbox package contains the following modules:

  • Chinese Whispers: graph clustering tool
  • Levenshtein: spell checking tool
  • Baseforms: base form reduction and compound noun splitting tool
  • Pretree: training and classification tool for pretrees
  • TE: terminology extraction tool
  • Pendulum: gazetteer bootstrapping tool (for Named Entity Recognition)
  • Namerec: Named Entity Recognition system
  • JLanI: language identification tool
  • Viterbitagger: POS tagging tool
  • Zipfel: tool for Zipf’s law
  • AHC: agglomerative hierarchical clustering tool
  • Genetomorph: finding morphological structure with a genetic algorithm
  • Your Tool: template tool for your program
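For readers who want a feel for the kind of word similarity these modules deal with, the sketch below computes the standard Levenshtein (edit) distance between two words, the measure the Levenshtein module is named after. It is a textbook dynamic-programming version written for illustration; the class name EditDistance is hypothetical, and nothing here is taken from the toolbox's own implementation.

```java
// Illustrative only: textbook Levenshtein distance, not the toolbox's code.
final class EditDistance {

    /** Returns the minimum number of single-character insertions,
     *  deletions, and substitutions needed to turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, EditDistance.levenshtein("kitten", "sitting") returns 3. A spell checker could use such a distance to rank dictionary words by closeness to a misspelled input.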

Version: 1.0
File format: zip
File size: 258 MB
File link: ASV Toolbox.zip

Alternative file link: ASVToolbox.zip

Citing ASV Toolbox

Chris Biemann, Uwe Quasthoff, Gerhard Heyer and Florian Holz (2008): ASV Toolbox: a Modular Collection of Language Exploration Tools. Proceedings of LREC-08, pp. 1760-1767, Marrakech, Morocco (pdf)