[xmonad] Migrate from String to Data.Text [proposal]
Matt Walker
2015-12-18 17:12:17 UTC
Hi everyone,

I noticed that xmonad and xmonad-contrib both prefer to use String =
[Char] for their stringy data-types. This is probably a terrible idea. I
cite some sources here, then outline their arguments below.

http://www.alexeyshmalko.com/2015/haskell-string-types/
https://mail.haskell.org/pipermail/haskell-cafe/2014-June/114745.html

__String is Bad__

1) Char is a horribly inefficient representation of a character, being an
entire machine word in length (at least 32 bits). Actually, it's worse:
each Char takes up _two_ machine words in GHC, since it needs one to store
GC information in. See the slide in the first link for more details.
Data.Text stores the characters in compact arrays.

2) Lists are lazy, which makes their evaluation slower. You have to thunk
on each character, which is pretty silly most of the time. Normally you
want to read in at least _chunks_ of string all at once. Data.Text is
strict, but Data.Text.Lazy exists and is (as you would assume) lazy when
you need it.
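
To make the strict/lazy distinction in point 2 concrete, here is a minimal
sketch, assuming the text package is available. The names lazyPrefix,
lazyTextPrefix and strictText are made up for illustration only.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- A lazy String: only the demanded prefix of the infinite list is built,
-- one cons cell (and potentially one thunk) per character.
lazyPrefix :: String
lazyPrefix = take 5 (cycle "ab")

-- Lazy Text can also describe an infinite value, but it is produced in
-- chunks rather than character by character.
lazyTextPrefix :: TL.Text
lazyTextPrefix = TL.take 5 (TL.cycle (TL.pack "ab"))

-- Strict Text is materialised as a whole, so its input must be finite.
strictText :: T.Text
strictText = T.pack lazyPrefix
```

All three values hold the five characters "ababa". By contrast,
T.pack (cycle "ab") would not terminate, because strict Text has to build
the whole array before returning.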

The long and the short of it is that [Char] is a suboptimal choice to use
for anything except possibly short identifiers; Haskell (via GHC) is a
compiled language, and yet performs orders of magnitude worse than even
Perl and Python on text processing when using String. There is simply
no good reason to use String when Text exists.

__Alternatives__

The other alternative is ByteString. Although ByteString is a great type
for binary data, and specifically for data exchange protocols, it seems
that it would be inappropriate in this situation, since most (if not all)
of the instances being replaced hold actual textual data, which is what
Text is optimized for.
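
A small sketch of why the distinction matters, assuming the text and
bytestring packages: Text counts characters, while ByteString counts the
bytes of one particular encoding. The names lambda, charCount and byteCount
are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "\955" is the single character U+03BB (Greek small letter lambda),
-- followed here by an ASCII 'x'.
lambda :: T.Text
lambda = T.pack "\955x"

-- Text works in terms of characters: this is 2.
charCount :: Int
charCount = T.length lambda

-- ByteString works in terms of bytes. In UTF-8 the lambda takes two bytes
-- and the 'x' takes one, so this is 3.
byteCount :: Int
byteCount = BS.length (TE.encodeUtf8 lambda)
```

Window titles and workspace names are text in this sense, so operations on
them should not have to care about the encoding.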

__Migration Issues__

Assuming we can agree that Text > String, the main problem with switching
would be the pain of migration, and whether it would be worth it. I argue
it wouldn't be so bad, and is worth doing on principle alone.

The LANGUAGE pragma of OverloadedStrings allows you to use String literals
as Text literals, so that wouldn't be the main problem. The main issue is
changing all the interfaces so they accept Text instead of String, and how
this would impact existing user configs, and the xmonad-contrib archive.
Every time you use ++ you would have to replace it with <>, the Monoid
infix mappend operator. I doubt many people use : to build Strings, but in
those instances those would have to be changed too. Finally, pattern
matching on Strings like (x:xs) would break as well. All other functions
would require changing from their String/List counterpart to the Text one.
Since the names clash, one would have to import qualified as, for instance,
T and call T.intersperse or whatever. It would be a non-trivial
undertaking, but certainly doable.
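
The kinds of rewrite described above can be sketched as follows. This is
only an illustration, assuming the text package and a GHC recent enough
(8.4 or later) that <> is exported from the Prelude; greetS, capitalizeS,
greetT, capitalizeT and dashed are made-up functions, not anything from
xmonad or xmonad-contrib.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (toUpper)
import Data.Text (Text)
import qualified Data.Text as T

-- String versions, written the way much existing code is.
greetS :: String -> String
greetS name = "hello, " ++ name

capitalizeS :: String -> String
capitalizeS (x:xs) = toUpper x : xs
capitalizeS []     = []

-- Text versions. The literal needs no change thanks to OverloadedStrings,
-- but ++ becomes <> ...
greetT :: Text -> Text
greetT name = "hello, " <> name

-- ... and the (x:xs) pattern becomes T.uncons, with : replaced by T.cons.
capitalizeT :: Text -> Text
capitalizeT t = case T.uncons t of
  Just (x, xs) -> T.cons (toUpper x) xs
  Nothing      -> T.empty

-- List functions are swapped for their qualified Text counterparts.
dashed :: Text -> Text
dashed = T.intersperse '-'
```

Every function in a user config that touches a String taken from or passed
to the library would need the same treatment.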

__Other Breaking Changes__

Are there other niggling issues that exist in the codebase that would cause
breaking changes? Perhaps it would be a good idea to get a list of them
all and see if it's worth breaking backwards compatibility to fix them all
at once? I'm a purist when it comes to code, but I would like to hear what
other people think, and just how angry they would be with this change. I
have no idea as to what xmonad and xmonad-contrib's breaking changes policy
is.

Obviously I'm not proposing this change be undertaken for 0.13 -- I was
aiming for more 0.14 or later.

Let me know what you think.

Sincerely,
Matt
Brandon Allbery
2015-12-18 17:17:44 UTC
Post by Matt Walker
The long and the short of it is that [Char] is a suboptimal choice to use
for anything except possibly short identifiers
Almost all uses of String in xmonad are very short; if not, you're likely
doing something wrong. (The exception is the help text, which does not need
to be optimal and is simply output as is, in what amounts to the most
optimal use case for lists.) Meanwhile, the overhead of Text is significant
for very short strings such as xmonad uses. Use of Text in this case is a
pessimization.

It would be on you to demonstrate that switching to Text is a net gain.
--
brandon s allbery kf8nh sine nomine associates
***@gmail.com ***@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
Gwern Branwen
2015-12-18 17:18:21 UTC
Post by Matt Walker
The LANGUAGE pragma of OverloadedStrings allows you to use String literals
as Text literals, so that wouldn't be the main problem. The main issue is
changing all the interfaces so they accept Text instead of String, and how
this would impact existing user configs, and the xmonad-contrib archive.
Every time you use ++ you would have to replace it with <>, the Monoid infix
mappend operator. I doubt many people use : to build Strings, but in those
instances those would have to be changed too. Finally, pattern matching on
Strings like (x:xs) would break as well. All other functions would require
changing from their String/List counterpart to the Text one. Since the
names clash, one would have to import qualified as, for instance, T and call
T.intersperse or whatever. It would be a non-trivial undertaking, but
certainly doable.
I don't see a single benefit for the users to undergo this invasive
and painful upgrade, which is particularly harsh on the less
experienced Haskellers as it involves subtleties of types and an
unfamiliar Text type.

Neither of your two listed benefits is at all relevant to users:
Xmonad and all its extensions are not doing more than trivial amounts
of string manipulation, and Xmonad as a whole is not even a
performance bottleneck - X and the windows being displayed are the
usual slow parts.
--
gwern
http://www.gwern.net
Brent Yorgey
2015-12-18 17:39:34 UTC
I would also note that it's not as if xmonad developers are unaware of the
existence of Text. The fact is that the first version of xmonad was
released two years prior to the first release of text (2007 vs 2009). So
at the time there was simply no alternative to String. By the time text
became stable and widely accepted, xmonad-contrib was already quite large.

I think converting all of xmonad-contrib from String to Text would be a
much larger and more tedious undertaking than you seem to think. Even if
someone put in the effort to do that, it would indeed break pretty much
every user config ever, for little benefit. By and large, xmonad users
(some of whom do not even know very much Haskell) have come to expect
extreme stability from xmonad. (I have been running essentially the same
config unchanged for many, many years.) Forcing a bunch of people with
little Haskell experience to upgrade their configs from String to Data.Text
would probably result in many of them abandoning xmonad.

Personally, I definitely prefer doing the Right Thing over preserving
backwards compatibility (witness how many breaking changes we routinely
introduce with each new release of the diagrams library). But I just don't
think this makes sense for xmonad.

-Brent
_______________________________________________
xmonad mailing list
http://mail.haskell.org/cgi-bin/mailman/listinfo/xmonad
Matt Walker
2015-12-22 14:01:26 UTC
Hey there!

Having thought it over after listening to what you've said, you're all
probably right. It's just not worth breaking backwards compatibility for,
and I didn't properly consider the difficulties faced by people who are
unfamiliar with GHC and Haskell in general.

It's kind of a shame that OverloadedStrings can't be coerced into some sort
of DWIM mode, where it converts to and from strings within existing code
where needed. I've fallen into the hell of dealing with String, [Word8],
ByteString, and Text all within the same program before; it's not fun.
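
For what it's worth, here is a sketch of what OverloadedStrings does and
does not cover, assuming the text and bytestring packages. Only literals
are overloaded; every existing value still has to be converted by hand.
The function names are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- A literal is overloaded: the type signature makes it a Text.
title :: Text
title = "xmonad"

-- An existing String value is not converted automatically;
-- T.pack and T.unpack have to be written out.
fromLegacy :: String -> Text
fromLegacy = T.pack

toLegacy :: Text -> String
toLegacy = T.unpack

-- Text to ByteString goes through an explicit encoding (UTF-8 here).
toBytes :: Text -> BS.ByteString
toBytes = TE.encodeUtf8

-- ByteString to [Word8] is yet another explicit step.
toWords :: BS.ByteString -> [Word8]
toWords = BS.unpack
```

Getting from a String to a [Word8] is then toWords . toBytes . fromLegacy,
which is exactly the sort of plumbing a DWIM mode would hide.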

It's amazing that xmonad is almost 10 years old now!

I will focus my efforts on extending xmonad in the way I'd like, and not
worry so much about String. If there is a point where I need to do lots of
string processing within xmonad I can revisit this.

Thanks again for the insight.

Sincerely,

Matt
Brandon Allbery
2015-12-22 14:06:04 UTC
Post by Matt Walker
I will focus my efforts on extending xmonad in the way I'd like, and not
worry so much about String. If there is a point where I need to do lots of
string processing within xmonad I can revisit this.
In general, if you find yourself dealing with sufficiently large strings to
justify Text, it probably doesn't belong in the window manager. You really
want it to be small and fast and do as much as possible outside the WM;
otherwise you'll find opening / moving / etc. windows to be sluggish at
best, and possibly even see hangs because the WM is doing something else
instead of responding to window requests.
--
brandon s allbery kf8nh sine nomine associates
***@gmail.com ***@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net