NOTE: The tags described here are entirely different from the "tags" used for marking up text in the html display mode. It is merely an unfortunate coincidence that the same term has traditionally been used for two different things. To the best of my knowledge, Elvis is the first project to use both the symbol-indexing tags described here, and the mark-up "tags" used in HTML.
Traditionally, each tag has three attributes: its name (generally the name of a function, or some other symbol from your program), the name of the source code file in which that function is defined, and the address of its line within that file. The tag name is used for selecting a particular tag.
Elvis supports a more sophisticated model. The extensions are intended to allow Elvis to handle C++, and similar languages, which allow different functions to have the same name. Since tag names are derived from function names, the tag name alone isn't sufficient to select a single tag. Elvis permits tags to have other attributes, which help it select the correct tag.
All of this is described in more detail below, in the following sections:
The first attribute is the tag's name. It is typically the name of a function, variable, or data type -- a name that you could guess by looking at the source code of your project. Traditionally, this attribute has been the sole means for selecting a tag, so tag names should ideally be unique. The lines of the tags file are sorted by this attribute.
The second attribute is the name of source code file in which the corresponding function (or whatever) is defined. If it isn't an absolute file name (relative to the root directory) then it should be relative to the directory where the tags file resides -- which isn't necessarily the current working directory.
The third attribute is the address of the line within that file, where the function (or whatever) is defined. This address can either be a line number, or a nomagic style of regular expression. If it is a regular expression, it must be bound by '/' or '?' characters, and it may contain tab characters. Typically, the entire source line is encoded as a regular expression by inserting "/^" onto the front, appending "$/" onto the end, and inserting a backslash character before each / or \ character within the line.
Elvis actually supports a superset of this format (by permitting extra attributes) but we'll start with the basics.
You will usually invoke ctags on all source files in the current directory via a command similar to this:
ctags *.c *.h
The ctags program can also generate other types of output. Be sure to look at its manual page to see the options.
ref selects tags just like Elvis, and then displays information about them. The simplest way to use it is to pass it the name of the tag you're interested in. The following example would display the definition of the "main" function:
ref main
There are some other options. One of the most useful is -a which instructs ref to display all selected tags. (Without -a it just displays one of the selected tags.) For example, if your current directory contains many programs, each with its own "main" function, then this would display the headers for all of them:
ref -a main
You can also use ref to generate an HTML document listing all tags, or just the ones that match some criteria. Here's an example which lists all tags as an HTML document:
ref -ha >tags.html
ref uses the same syntax for restrictions and sorting hints as Elvis' :tag command. This syntax will be described later. You should check the manual page for a list of options.
When Elvis is in visual command mode, you can
move the cursor onto a word and hit ^] to
perform a :tag
search on that current word, or
^T to perform a :pop
command.
If you have a mouse, then you can use the left button to double-click on a
word in the text, to have Elvis perform a :tag
search on that word.
Double-clicking the right button anywhere in the text will perform a
:pop
command.
The uppercase K command runs program on the word at the cursor position. The program is chosen by setting the keywordprg option. By default, it runs the ref program, so the word's definition is displayed temporarily at the bottom of the screen.
In a typical large project, you will have some directories which contain library functions, and some which contain the code for specific programs. With this arrangement, you would set TAGPATH to search the current directory followed by each of the library directories. Something like this...
setenv TAGPATH=tags:/usr/src/libproj/tags:/usr/src/libio/tags
The exact syntax depends on your command interpreter. And of course the exact directory names will depend on your project.
When your current directory is one which contains the source code for
some program, and you do a search for (as an example) "showitem", Elvis
would look for it first in that program's tags file, and if it isn't
found there then it'll look in each library's tags files until it does
find it.
The ref
program searches the same way.
Actually, Elvis uses an option named tags
to store the search path.
The default value of that option is taken from the TAGPATH environment variable,
though.
If you don't set TAGPATH (or the tags
option),
then Elvis will search only in the current directory.
The default path for ref
is a little more sophisticated.
That's because ref
is intended to be general reference utility for
all library functions, while Elvis' tags facility is mostly intended for
navigating through the source code of a single program.
Note to system administrators:
ref
can be so handy that I suggest you make a tags file for
the functions in your system's standard libraries.
If licensing restrictions prevent you from making the library source code
available to all users, then you should use ctags -r
to generate a "refs" file.
If you don't have access to the library source code yourself, then perhaps
you can make something useful from the lint libraries.
There are two tactics for solving the problem. The first tactic is to be more selective; i.e., use information other than just the tag name to select tags. This definitely helps, but it is an absolute impossibility to resolve all such ambiguities prior to run-time, so we also need a second tactic: collect all possible tags into a list, and use heuristics or explicit hints from the user to sort the list so the most likely alternative is tried first, the second most likely if the first was rejected, and so on down the list. Elvis uses both tactics.
In the tags file, Elvis permits tags to have extra attributes. Each attribute has a name and a value. The first three fields are named tagname, tagfile and tagaddress. Those names are implicit; the names don't appear in the tags file, only the values do.
If a tag has any extra attributes, they will be appended to the tag line. In order to allow the original vi/ex to read tags files which have additional attributes, a semicolon-doublequote character pair is appended to the tagaddress, before the first extra attribute. Due to an undocumented quirk of the original vi/ex, this will cause vi/ex to ignore the remainder of the line. The extra attributes will not adversely affect the behavior of the original vi/ex.
The extra attributes have explicit names. In the tags file, the extra attributes are generally given in the form <TAB>name:value. Different tags may have different extra attributes; many will have no extra attributes at all. The attributes may appear in a different sequence for each tag.
In a single tags file, Elvis supports up to 10 distinct attribute names -- the 3 implicit names for the standard fields, plus up to 7 explicit names for extra attributes. (This is a limitation of Elvis, not the enhanced tag format.)
The name can be any series of letters or digits. Lowercase letters are preferred.
The value can contain any character except NUL. Any backslash, tab, or newline characters should be stored as \\, \t, or \n, respectively.
If an extra attribute has a value but no name or colon, then the name is understood to be "kind".
The extra attributes are intended to describe the contexts in which the corresponding program symbol can appear. Typically the name is a type of lexical scope, and the value is the name of that scope; e.g., "function:init" for a tag which is only defined inside the init() function. Elvis can use these as hints to figure out which tags might make sense in the current context, and ignore those that don't. Although the extra attributes have no preset names, the following names are recommended:
Note that since the default attribute name is kind, a solitary letter can denote the tag's type (e.g, "f" for a function).
If the value is given as an empty string (just "file:"),
then it is understood to be the same as the tagfile field;
this special case was added partly for the sake of compactness, and partly
to provide an easy way handle tags
files that aren't in the
current directory.
The value of the tagfile field always relative to the
directory in which the tags file itself resides.
The ctags program has been hacked slightly to support some of these, but not all. Its new -h flag enables generation of the extra hint attributes; if you invoke ctags without any flags, then -h is one of the flags that it uses by default. For example, the usual command for generating tags for all source files in the current directory is...
ctags *.c *.cpp *.h
The current hacked-up ctags
distributed with elvis
will only generate file and class hints,
and even class isn't as effective as one might hope.
Some pseudo-tags may be inserted at the top of the tags file, to describe the characteristics of that particular tags file. These tags all begin with a "!_" so that even if the tags are sorted, the pseudo-tags will always appear at the top of the file. The pseudo-tags all use the old tags format, so they can be parsed (and then ignored) by older tag reading programs.
!_TAG_FILE_FORMAT 2 /supported features/ !_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted/The
!_TAG_FILE_FORMAT
pseudo-tag's tagfile
field is 2 for new-style tags, or 1 for old-style tags.
The !_TAG_FILE_SORTED
pseudo-tag's tagfile
field is 1 if sorted, or 0 if unsorted.
The tagaddress field is used simply as a comment in both tags.
If these tags are missing from a tags file, then the file is assumed to be
in the new format (which is still backwards compatible with the old format),
and sorted. If a tags file is unsorted then it must contain a
!_TAG_FILE_SORTED
field indicating that.
These may be followed by more pseudo-tags describing the ctags
program itself.
Elvis' version of ctags
produces the following information:
!_TAG_PROGRAM_AUTHOR Steve Kirkendall /kirkenda@cs.pdx.edu/ !_TAG_PROGRAM_NAME Elvis Ctags // !_TAG_PROGRAM_URL ftp://ftp.cs.pdx.edu/pub/elvis/README.html // !_TAG_PROGRAM_VERSION 2.2 //
The new tags file format also addresses another limitation of the old format:
the old format allows fields to be delimited with any whitespace.
This is a problem because space characters are becoming more common in
file names these days, so we occasionally need to put spaces into the
tagfile field.
To support this, the new format dictates that fields must be delimited by a single
tab character, not spaces.
This shouldn't cause any backward compatibility problems because traditionally
ctags
has always used tab as the delimiter.
Also, the interpretation of the tagaddress field has
been refined.
Traditionally, it has been defined as either a line number or a
nomagic
regular expression,
but it has actually been implemented in vi/ex to support any ex command
there.
Supporting any command could produce a security hole, so the new format
only supports addresses.
It supports more complex addresses though, because they can be useful
in some circumstances.
For example, the tag line for a "val
" field in a struct
named "item_s
" could look like...
val file.h /^struct item_s {$/;/^ int val;$/ struct:item_s... which would allow the editor to skip past any "int val;" definitions in other structs, to find the correct "int val;" in the
item_s
struct.
This form of tags file is also supported by Darren Hiebert's Exuberant ctags and by Vim, in addition to Elvis.
If you ever need to convert a new-style tags file back to the old style, you can do so via the ref utility. Run it like this:
ref -ta >oldtags
The arguments of the :tag
command are now
whitespace-delimited expressions of the following forms, to define a set
of restrictions that possible tags must meet to be selected:
The parser also allows you to add some sorting hints to the command line. These hints are added to the history that Elvis uses to guess which overloaded tag to list first.
All of these restriction expressions and the sorting hint expressions allow you to give multiple acceptable values. You can either give each value in a separate expression, or give a comma-delimited list of values to a single expression.
A null value string matches anything. So "struct:=" would accept any tag with a "struct" attribute, and reject those without it. This would be handy when you're trying to do tag lookup for a word which follows a '.' character -- you know it is a field name, but you don't know which struct type.
The :tag
command automatically adds a
file:filename restriction
(where filename is the name of the file being edited in the
current window)
to any tag search you request.
This causes it to ignore tags which are static to other files.
The :browse command doesn't do that.
See the Browsing section, below.
While searching for a tag, Elvis builds a list of tags which matched the restrictions. That list is sorted primarily by the tagname attribute's value, but when multiple tags have the same name, Elvis looks for the attributes of those tags in the lists of successes and failures, and uses the weights of any matches to compute the likelyhood that a particular tag is the one that the user really wants. The more likely tags are inserted into the list before any less likely tags with the same name.
Expressions of the form name:+value add a name/value pair to the success list, and expressions of the form name:-value add a name/value to the failure list. Name/value pairs are also added automatically in the following circumstances:
It should be stressed that the tag history has no effect on which tags are selected from the tags file. It only affects the order in which they're presented, if more than one tag meets your restrictions.
:tag
command keeps this list hidden, and moves the cursor to
the single most likely member of that list.
This is not always the best way to select a tag.
Elvis has a :browse command which performs a tag search, and then builds an HTML document from the list. The document shows all tags which matched your search criteria; the current window will then switch to this document. There is also a :sbrowse command which displays the same document in a new window.
The arguments to :browse
differ from :tag
in the
following ways:
:browse
does not automatically add any
restrictions.
(:tag
adds file:filename to each search.)
:browse
with no arguments, Elvis will
assume you wanted tagfile:filename,
where filename is the name of the file being edited in the
current window.
:browse
with a single argument, Elvis first
tries to interpret it as a restriction or sorting hint in the normal way.
But if that search yields no tags, Elvis may retry the search using your
argument as a file name (tagfile:argument), or
as a class name (class:/argument).
By default, :browse
only searches through the "tags" file
in the current directory. When invoked as :browse!
(with a "!"
suffix) it collects matching tags from all "tags" files as specified by the
tags option.
Here are some examples of :browse
commands.
:browse term
:browse
:browse foo.c
:browse tagname:=
:browse class:/DbItem
:browse DbItem
:browse class:/DbItem
Each matching tag in the generated document has a hypertext link to the
point in your source where the corresponding symbol is defined.
By following the hypertext link, you can go directly to the appropriate
point in your source code.
As usual, the tag stack can be used to :pop
back to the same browser
document, from which you may then proceed to a different tag, or :pop
back one more level to wherever the cursor was located before you gave the
:browse
command.
If you wish, you can define your own format for the browser document. Elvis searches through the elvispath for a file named "elvis.bro". If found, then blank lines in it will be used to delimit it into three sections:
The ref -ha restrictions... program generates a similar HTML document. It always interprets its arguments as restrictions, and the format of the HTML document can't be reconfigured. Those are the only differences.
When you give a :tag command,
Elvis evaluates the tagprg
option's value using the
simpler expression syntax.
Any instance of $1 in the value will be replaced with the
command-line arguments. Also, any text inside parentheses will be evaluated;
this gives you a way to access other options' values, so you can do things
like pass the value of the tags option to the
program so it knows which tags files to search through.
The resulting string is then executed, and its output is parsed as though
it was a tags file. All of the tags that it outputs are considered to be
matches, since using tagprg
disables the use of restrictions.
Elvis builds a list of the matches, and sorts them using the same history mechanism that is uses with restrictions. However, the "name:+value" and "name:-value" sorting hints are not detected in the arguments.
Once the list has been built, Elvis moves the cursor to the first match. You can step through all matches in the list by hitting ^] or by giving the :ta command with no arguments, as usual.
Note that the program's output should be in the standard tags file format. At a minimum, this means "tagname TAB filename TAB address". If you want to use a function searching program that uses a different format, you'll need to pipe its output through a custom-made filter that converts its output to the standard tags format.
One common technique is to use the :local
command in an alias, to set
tagprg
temporarily for a single search.
The :text alias in
lib/elvis.ali
is an example of this.