Message 2004-10-0093: Re: Phylogenetic Notation

Wed, 15 Sep 2004 00:31:29 -0700 (PDT)


Date: Wed, 15 Sep 2004 00:31:29 -0700 (PDT)
From: [unknown]
To: Mailing List - PhyloCode <phylocode@ouvaxa.cats.ohiou.edu>
Cc: David Marjanovic <david.marjanovic@gmx.at>
Subject: Re: Phylogenetic Notation

--- David Marjanovic <david.marjanovic@gmx.at> wrote:
>
> > And will definitions written in either of these always be unambiguous?
>
> If the registration database administrator is vigilant enough, probably they
> will... :-}

Not good enough, says I.

> > > I'll try to retrieve the system I proposed a month or so ago (it's
> > > simpler, uses non-ASCII characters but only such that occur in
> > > iso-8859-1 "Western European", and is not capable of expressing the more
> > > complex of your examples), to see if I could find something about your
> > > system to quibble about... :-)
> >
> > I'd be very interested to see that. Perhaps it can be instructive in
> > making the system more accessible.
>
> Here is it, updated from my post from June 15th (and another from the 18th
> that seemingly didn't get through). In fact, I think all characters are
> ASCII after all.
> -------------------------------------------
>
> A through G are taxa, M is an apomorphy. Parentheses indicate optional
> additions, such as more than two specifiers.
>
> Node-based:
> {A(, B, C...) + D}
> "{}" used instead of "Clade()" because it's shorter, already used on
> a few websites, language-free, and avoids confusion with the method to write
> a tree -- (A + (B + C))). It keeps definition and description apart. The
> identity to the brackets used for mathematical sets is a fortunate
> coincidence.

Or not-so-fortunate, in my eye. Because you've appropriated them for clades,
now they can't be used for other types of sets of organisms. The shorthand in
the current draft of PhyloCode does specify "clade", but I think the real issue
here is grammar, not vocabulary. Vocabulary (e.g., the word "clade") can be
defined rigorously and independently in any language. Grammar cannot, and that
is what I was trying to overcome by using mathematical notation.

> Stem-based:
> {A(, B, C...) # D(, E, F...)}
[snipped]
> Apomorphy-based (should those be allowed):
> {M @ A (+ B, C...)}
[snipped]

These (and node-based clades) are already provided with shorthand notations
(which are also ASCII-friendly) in the current draft of PhyloCode.

>         Hey, wait!!! Actually we don't need _any_ mention of "in" here. We
> could just write {M A (+ B, C...)}, couldn't we? :-)

I suppose ... somewhat discombobulating, though.

>         (The apomorphy itself would still have to be written in a language.
> Theoretically, this could be used as an argument to ban apomorphy-based
> definitions -- but if the apomorphy is well enough described and figured,
> _this_ shouldn't produce any problem in the real world.)

As mentioned before, I can't think of any way around this, either. A necessary
evil, unless you don't view apomorph-based clades as necessary.

> One kind of qualifying clause:
> {[...] \ G}
>         "\" is the mathematical "without" sign, and exists on every computer
> keyboard. Does not work for Art. 11.9 Example 1, but for Example 2:
> *Lepidosauriformes* = {*Lacerta agilis* + *Crocodylus niloticus* \ *Youngina
> capensis*}.

_C. niloticus_ and _Y. capensis_ should be switched there, right?
Didn't know about that usage of the "backslash".

>         (Should math be preferred, this could be "{*Lacerta agilis* +
> *Crocodylus niloticus*} \ {*Youngina capensis*}" instead; however, this can
> make it confusing to tell how many definitions there are or where it ends.)

I don't think that's a problem, since there is clearly an operator bridging the
two expressions.

> Another kind of qualifying clause:
> {[...] | [condition]}
>         "|" is the mathematical sign that is used in a similar way. Let's
> see... it works for Art. 11.9 Example 1: *Pinnipedia* = {*Otaria byronia*,
> *Odobenus rosmarus* + *Phoca vitulina* | flippers @ *Otaria byronia*,
> *Odobenus rosmarus*, *Phoca vitulina*}. More examples will need to be tested
> to see if this notation can become confusing.

"|" is usually translated orally to "such that" or "where". But it seems to me
what you really want here is a conditional, usually written as an arrow and
orally translated as "if X, then Y", e.g. {"flippers" @ _Otaria byronia_ +
_Odobenus rosmarus_ + _Phoca vitulina_} != Ø -> _Pinnipedia_ = {_Otaria
byronia_ + _Odobenus rosmarus_ + _Phoca vitulina_}

("!= Ø" should be read as "not equal to a null set")

>         (Another question is if this is needed at all, even if
> apomorphy-based definitions will be allowed. For example, despite the
> emphasis on the apomorphy, *Pinnipedia* is a crown-group here; it would be
> _the very same clade_ if it were defined {*Otaria*, *Odobenus* + *Phoca* > [your favorite terrestrial Carnivora]}.)

No, it could, in theory, still be a crown clade not including any other extant
carnivorans ("fissipeds") AND have an ancestor that did not possess flippers.

>         Several conditions could be separated with ";", for example.
>
> Stem-modified crown definition (Note 9.4.1):
> {¥ A # B}
>         ¥ is the symbol for "crown-group". Totally straightforward. It
> depicts a cladogram with a node that is marked by double underlining. =8-)
> =8-) =8-)

Hehe ... international traders might disagree.

> Disadvantage: Not available on German keyboards, at least.

or North American

> Advantage: Seems to be ASCII.
>         Perhaps this could be shortened to {A ¥ B} -- if this is not too
> confusing (A is the internal, B is the external specifier).
>         (I have only just noticed that such definitions, too, can
> self-destruct, namely if A is extinct; then there's a possibility that there
> is nothing alive that's closer to A than to B.)

Good point, although I don't think there's anything wrong with self-destructing
names. (Nor do you, judging from your abstract.)

> Apomorphy-modified crown definition:
> {¥ M @ A}
>
> Ancestor-based definition (like "*Homo sapiens* and all its descendants"):
> {A}
>         A is the ancestor. The format is straightforward because a species
> or specimen cannot by itself constitute a clade if it has any descendants.

Here's where I really dislike this notation, because it looks like "the set of
A".

>         Not applicable for Panbiota/Nominata/Nominanda*; its definition
> would have to be interpreted as apomorphy-based, {life @ *Homo sapiens*}
> respectively {life *Homo sapiens*}.

I've provided a method of notating the definition without apomorphs. You seem
to be equating "life" with ancestor-descendant relationships, which is probably
a good idea, but possibly presumptuous?

[snipped]
> And now the big test: Can I manage to express the definition of
> *Ichthyornis*?
> {*Ichthyornis dispar* # *Struthio camelus*, *Tinamus major*, *Vultur
> gryphus* | amphicoelous cervical vertebrae, [rest of the list] @
> *Ichthyornis dispar*}
>         I think this works. Does it?

Nope. We know that the characters appear in _I. dispar_; the question is how
far back they go. The actual prose definition is worded not so much as a
definition with a qualifying clause, but as an intersection of two clades.
Rendering this is not really possible in your notation or the shorthand
proposed in PhyloCode.

It seems to me there are not enough really good single ASCII character
approximations for expression of set notation. Boolean notation could get by to
some extent on the symbols used in C-based computer code:

&: "and"
|: "or"
~: "not"

But some of these conflict with other symbols ("&" can appear in citations, and
"|" is important in set notation, as discussed above).

Perhaps a better ASCII solution, then, would be words marked off by otherwise
unused characters, such as backslashes (\).

\member of\ (lower case epsilon)
\not member of\ (lower case epsilon with slash)
\for all\ (upside-down A)
\exists\ (backwards E)
\union\ (U-like curve)
\intersection\ (U-like curve upside-down)
\not\ (¬; it is ASCII, but in the interest of consistency....)
\and\ (angle pointing up)
\or\ (angle pointing down)
\subset of\ (c-like curve with line underneath)
\proper subset of\ (c-like curve)
\not subset of\ (c-like curve with slash)
\unequal\ (equals sign with slash)

And some can be approximated pretty well:
-> (right arrow: "if ... then ...")
<- (left arrow: reversed "if ... then ...")
<-> (double arrow: "if and only if")
Ø ("null set")
| (straight line: "such that" or "where")
' (prime tick)
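
To illustrate how the word-based operators could be turned back into the conventional symbols, here is a minimal Python sketch. It is only an assumption about how such a translation might be done: the SYMBOLS table and the render function are hypothetical names, and the choice of Unicode characters is my reading of the descriptions in the list above. It accepts either a backslash or a forward slash as the delimiter, as long as the same character appears on both sides of the word.

```python
import re

# Hypothetical lookup table: operator word -> conventional symbol.
# The symbols are an interpretation of the verbal descriptions in the list.
SYMBOLS = {
    "member of": "∈",
    "not member of": "∉",
    "for all": "∀",
    "exists": "∃",
    "union": "∪",
    "intersection": "∩",
    "not": "¬",
    "and": "∧",
    "or": "∨",
    "subset of": "⊆",
    "proper subset of": "⊂",
    "not subset of": "⊄",
    "unequal": "≠",
}

# Longest words first, so "not member of" is tried before "not".
_WORDS = "|".join(re.escape(w) for w in sorted(SYMBOLS, key=len, reverse=True))

# A word counts as an operator only when the same delimiter
# (backslash or forward slash) appears on both sides of it.
_TOKEN = re.compile(r"([\\/])(" + _WORDS + r")\1")


def render(expression: str) -> str:
    """Replace each delimited operator word with its symbol."""
    return _TOKEN.sub(lambda m: SYMBOLS[m.group(2)], expression)


print(render("x /member of/ Organisms /and/ x /not member of/ lineage(E')"))
```

Anything that is not a known word between matching delimiters is left alone, so a bare "\" used for "without" passes through unchanged.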

Giving this a try on the example definitions (which I've since also
incorporated PhyloCode's current proposed shorthand into):

_Pinnipedia_ = nodeClade(Specifiers)
<- Specifiers = {_Otaria byronia_ de Blainville 1820, _Odobenus rosmarus_
Linnaeus 1758, _Phoca vitulina_ Linnaeus 1758}
/and/ nodeClade(Specifiers) /subset of/ apomorphClade({"flippers"}, Specifiers)


_Lepidosauriformes_ = Content
<- Content = clade(_Lacerta agilis_ Linnaeus 1758 not _Youngina capensis_ Broom
1914)
/and/ Content /subset of/ Sauria
/and/ Sauria = clade(_Lacerta agilis_ Linnaeus 1758 and _Crocodylus niloticus_
Laurenti 1768)

_Halecostomi_ = clade(_Amia calva_ Linnaeus 1766, _Perca fluviatilis_ Linnaeus
1758 not _Lepisosteus osseus_ Linnaeus 1758)

_Dinosauria_ = clade(_Iguanodon bernissartensis_ Boulenger in van Beneden 1881
and _Megalosaurus bucklandi_ von Meyer 1832 and _Hylaeosaurus armatus_ Mantell
1833)

_Nominata_ = clade(firstAncestors({_Homo sapiens_ Linnaeus 1758}))

_Saurischia_ = clade(_Megalosaurus bucklandi_ von Meyer 1832 not _Iguanodon
bernissartensis_ Boulenger in van Beneden 1881)

_Panaves_ = panstemClade(_Aves_)
<- _Aves_ = clade(_Struthio camelus_ Linnaeus 1758 and _Tetrao major_ Gmelin
1789 and _Vultur gryphus_ Linnaeus 1758)

_Predentata_ = clade("predentary bone" in _Iguanodon bernissartensis_ Boulenger
in van Beneden 1881)

_Ichthyornis_ = apomorphClade(SelectedDiagnosticCharacters, Specifiers)
/intersection/ stemClade(Specifiers, AvesSpecifiers)
<- SelectedDiagnosticCharacters = {"cervical vertebrae: amphicoelous or
'biconcave'", "bicipital crest on humerus with pit-shaped fossa for muscular
attachment located directly at the distal end of the bicipital crest",
"dimensions of the ulna's dorsal condyle such that the length of the trochlear
surface along the posterior surface of the distal ulna is approximately equal
to the width of the trochlear surface taken across its distal end", "oval scar
located on the posteroventral surface of the distal radius, in the center of a
depression", "large tubercle developed close to the articular surface for the
first phalanx of the second digit where the deep tendinal groove for the m.
extensor digitorum communis ends"}
/and/ Specifiers = {_Ichthyornis dispar_ Marsh 1872b}
/and/ AvesSpecifiers = {_Struthio camelus_ Linnaeus 1758, _Tetrao major_ Gmelin
1789, _Vultur gryphus_ Linnaeus 1758}

_Ichthyornis dispar_ Marsh 1872b = species(YPM 1450)


And on some of the derived functions:

e /member of/ Organisms ->
ancestors(e) = {x /member of/ Organisms | x /member of/ parents(e) /or/
/exists/ y /member of/ parents(e): x /member of/ ancestors(y)}
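
The recursion in ancestors(e) can be sketched in Python as follows. This is only an illustration under stated assumptions: the PARENTS table is an invented toy pedigree, parents() is a stand-in for whatever would supply parentage in practice, and the pedigree is assumed to contain no cycles.

```python
# Hypothetical toy pedigree: each organism maps to the set of its parents.
PARENTS = {
    "child": {"mother", "father"},
    "mother": {"grandmother"},
    "father": set(),
    "grandmother": set(),
}


def parents(e):
    """Stand-in for parents(e); unknown organisms have no recorded parents."""
    return PARENTS.get(e, set())


def ancestors(e):
    """x is an ancestor of e if x is a parent of e,
    or if x is an ancestor of some parent y of e."""
    result = set(parents(e))
    for y in parents(e):
        result |= ancestors(y)
    return result


print(sorted(ancestors("child")))  # ['father', 'grandmother', 'mother']
```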

/for all/ x /member of/ S: x /member of/ Organisms /union/ Specimens /union/
Species ->
specifiedOrganisms(S) = {x /member of/ Organisms | /exists/ y /member of/ S: (y
/member of/ Organisms /and/ x = y) /or/ (y /member of/ Specimens /and/ x =
organism(y)) /or/ (y /member of/ Species /and/ x = organism(type(y)))}
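
One way to read specifiedOrganisms(S) is as a normalization step: whatever kind of specifier is given, it is reduced to an organism. The Python sketch below is an assumption-laden illustration of that reading. The three classes, their field names, and the description of the individual behind YPM 1450 are all hypothetical; only the three-way case split comes from the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    label: str


@dataclass(frozen=True)
class Specimen:
    catalog_number: str
    organism: Organism        # plays the role of organism(y)


@dataclass(frozen=True)
class Species:
    name: str
    type_specimen: Specimen   # plays the role of type(y)


def specified_organisms(specifiers):
    """Reduce a set of organisms, specimens and species to a set of organisms."""
    result = set()
    for y in specifiers:
        if isinstance(y, Organism):
            result.add(y)                         # x = y
        elif isinstance(y, Specimen):
            result.add(y.organism)                # x = organism(y)
        elif isinstance(y, Species):
            result.add(y.type_specimen.organism)  # x = organism(type(y))
        else:
            # The definition only applies when every member of S
            # is an organism, a specimen or a species.
            raise TypeError(f"not a valid specifier: {y!r}")
    return result


individual = Organism("individual preserved as YPM 1450")
holotype = Specimen("YPM 1450", individual)
i_dispar = Species("Ichthyornis dispar", holotype)

# All three specifiers point at the same organism.
print(specified_organisms({individual, holotype, i_dispar}) == {individual})
```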

S' = specifiedOrganisms(S) ->
commonAncestors(S) = {x /member of/ Organisms | /exists/ y /member of/ S': x
/member of/ ancestors(y)}

S' = specifiedOrganisms(S) /and/ S' /member of/ AncestralSets ->
ancestorClade(S) = S' /union/ {x /member of/ Organisms | /exists/ y /member of/
S': x /member of/ descendants(y)}
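
As a rough Python sketch of ancestorClade: the clade is the ancestral set itself plus everything descended from any of its members. The CHILDREN table is an invented toy genealogy, and the sketch simply assumes that the argument is already a set of organisms that qualifies as a member of AncestralSets, since that condition is not spelled out in this message.

```python
# Hypothetical toy genealogy: each organism maps to the set of its children.
CHILDREN = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": set(),
    "d": set(),
    "e": set(),
}


def descendants(y):
    """All organisms reachable from y by following child links."""
    result = set(CHILDREN.get(y, set()))
    for child in CHILDREN.get(y, set()):
        result |= descendants(child)
    return result


def ancestor_clade(s_prime):
    """S' united with the descendants of every member of S'.

    Assumes s_prime is already a set of organisms that is a valid
    ancestral set; that check is not implemented here.
    """
    clade = set(s_prime)
    for y in s_prime:
        clade |= descendants(y)
    return clade


print(sorted(ancestor_clade({"b"})))  # ['b', 'd']
print(sorted(ancestor_clade({"a"})))  # ['a', 'b', 'c', 'd']
```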

I' = specifiedOrganisms(I) /and/ E' = specifiedOrganisms(E) ->
stemClade(I, E) = ancestorClade({x /member of/ Organisms | x /member of/
commonAncestors(I') /and/ x /not member of/ lineage(E')})

Illegible? More legible than the mathematical symbols? Equally difficult? You be
the judge because I need some sleep.

=====
=====> T. Michael Keesey <http://dino.lm.com/contact>
=====> The Dinosauricon <http://dinosauricon.com>
=====> Instant Messenger <Ric Blayze>
=====

