Jam less (jless)
Less is one of the best text viewers. It is the successor of
more. It lets you scroll forward and backward, search,
and so on through multiple text files.
However, it does not support multibyte characters.
So I made a patch that lets it display text written in
multiple character sets using ISO 2022 code extension
techniques. The patch also supports code conversion
among Japanese encoding schemes: JIS X 0208, JIS X 0213,
SJIS, and UJIS.
Overview
Less is one of the best text viewers.
I enhanced it to read text that uses ISO 2022 code extension
techniques and multiple Japanese encoding schemes.
ISO 2022 describes code extension techniques that allow
a single text to combine more than 100 character sets.
These include ISO 646 IRV, ISO 646 UK,
ISO 646 US, ISO 646 Swedish, ISO 646 German, ISO 8859-1,
ISO 8859-2, ISO 8859-7 Greek, ISO 8859-6 Arabic,
ISO 8859-8 Hebrew, GB 2312-80 Chinese, JIS X 0208:1997
Japanese, KS C 5601-1987 Korean, CNS 11643, etc.
By supporting the ISO 2022 standard, jless is now
able to show all of them.
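As a rough illustration of how ISO 2022 switches character sets inside one text, the sketch below scans an ISO-2022-JP byte string for designation escape sequences. It uses Python's standard iso2022_jp codec, not jless itself. The function name list_designations and the small table are hypothetical, the table is far from complete, and the scan assumes 3-byte escapes only.

```python
ESC = b"\x1b"

# A few well-known 3-byte designation sequences used by ISO-2022-JP.
# This table is illustrative only; ISO 2022 defines many more, some longer.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

def list_designations(data):
    """Return the character sets designated in `data`, in order of appearance."""
    found = []
    i = 0
    while i < len(data):
        if data[i:i + 1] == ESC:
            # Assumption: every escape is 3 bytes long (true for the table above).
            found.append(DESIGNATIONS.get(data[i:i + 3], "unknown"))
            i += 3
        else:
            i += 1
    return found

encoded = "less 日本語".encode("iso2022_jp")
print(encoded)
print(list_designations(encoded))
```

The text starts in ASCII, an escape sequence designates JIS X 0208 before the Japanese characters, and a second escape returns to ASCII at the end. A terminal that does not understand these escapes cannot display the text, which is the limitation discussed next.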
However, what you can see depends on your terminal or
terminal emulator. If your terminal supports ISO 2022
and has all the character sets, you can see them all. If your
terminal has only a few character sets, you can see only
those.
Therefore, jless also contains a code conversion
mechanism that reduces the number of character
sets it needs in order to display text.
For example, it can convert JIS C 6226-1978,
JIS X 0208:1997, SJIS, and UJIS into JIS X 0213:2000.
A partial conversion from GB into JIS X 0213:2000 would
also be possible, but it is not yet implemented.
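To make the idea of code conversion concrete, here is a minimal sketch of how the same JIS X 0208 code points map to UJIS (better known as EUC-JP) and SJIS byte sequences. This is the commonly used arithmetic, not code taken from jless. The function names are hypothetical, and only the two-byte JIS X 0208 range is handled.

```python
def jis_to_ujis(j1, j2):
    """JIS X 0208 bytes (0x21..0x7E each) to UJIS/EUC-JP: set the high bit."""
    return bytes([j1 | 0x80, j2 | 0x80])

def jis_to_sjis(j1, j2):
    """JIS X 0208 bytes (0x21..0x7E each) to SJIS using the usual arithmetic."""
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 & 1:
        # Odd row: second byte lands in 0x40..0x9E, skipping 0x7F.
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)
    else:
        # Even row: second byte lands in 0x9F..0xFC.
        s2 = j2 + 0x7E
    return bytes([s1, s2])

def convert(jis_bytes, convert_pair):
    """Apply a per-character converter to a run of two-byte JIS characters."""
    out = bytearray()
    for k in range(0, len(jis_bytes), 2):
        out += convert_pair(jis_bytes[k], jis_bytes[k + 1])
    return bytes(out)

# Take the raw JIS X 0208 bytes by stripping the leading and trailing
# 3-byte escape sequences that Python's iso2022_jp codec emits.
jis = "日本語".encode("iso2022_jp")[3:-3]

print(convert(jis, jis_to_ujis).hex())
print(convert(jis, jis_to_sjis).hex())
```

Because all three forms carry the same code points, a viewer can normalize its input to one character set and ask the terminal for only that one.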
Here are the README for my enhancement and
its Japanese version.
If you are interested in ISO 2022, please look at my character encoding page.
jless currently supports these character
sets.
Note
Version Number
I would like to explain the name and version number.
The original less is called less-XXX, where XXX is its version number.
I call my patch less-XXX-isoYYY, where YYY is my patch's version number.
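As a small illustration of this naming scheme, the sketch below splits a patch name into the two version numbers. It assumes both XXX and YYY are plain decimal numbers, as in the releases listed on this page. The helper name parse_patch_name is hypothetical.

```python
import re

# Assumption: XXX and YYY are decimal digits only.
NAME_RE = re.compile(r"^less-(\d+)-iso(\d+)$")

def parse_patch_name(name):
    """Return (less_version, patch_version) for a name like less-XXX-isoYYY."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("not a less-XXX-isoYYY name: " + name)
    return int(m.group(1)), int(m.group(2))

print(parse_patch_name("less-382-iso262"))
```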
Copyright and future of this patch
I have sometimes submitted my patches and asked Mark to merge them
into the original tree. However, they have not been merged. He said
he may merge them in the future. I also think this code is too complicated
to understand without knowledge of ISO 2022, so I think it
will not be merged.
On the other hand, there is a copyright issue. The original less
was under a BSD-style license. It was then changed to the
GPL for a time. Currently, it uses both the GPL and the less license.
I personally don't want to publish my libraries under the GPL.
So my patch is under the BSD-style less license only.
News
- Released iso262 patch for less-382 on 24 Feb. 2006.
- Released iso261 patch for less-382 on 24 Feb. 2006.
- Released iso260 patch for less-382 on 18 Feb. 2006.
- Released iso259 patch for less-382 on 7 Feb. 2006.
- Released iso258 patch for less-382 on 4 Sep. 2005.
- Released iso254 patch for less-358 on 6 Dec. 2000.
Download
Latest Version
Old Versions
Other contributions.
- iso233 3/10/98
- Fixed typo and made multi.h.
- iso234 3/12/98
- Removed prewind_multi and pdone_multi because they depend on less.
Added init_multi and clear_multi in their place.
- iso235 3/13/98
- Add unify.c for chcmp_cs function.
- iso236 3/14/98
- Fixed MSB_ENABLE bugs.
- iso237 3/16/98
- Add unification among JIS X 0208, ASCII, Cyrillic and Greek.
- iso238 3/17/98
- Add NULLCS to represent a terminator.
Changed a character set for control characters to WRONGCS.
Add chunify_cs and chconvert_cs as external functions.
- iso239 3/20/98
- Fixed a bug in match() and added an assertion in chunify_cs().
- iso240 3/25/98
- Corrected all cmdbuf and cmdcs buffers' handling.
Fixed a control character handling bug.
Changed to remove padded codes from search pattern.
- iso241 4/2/98
- Fixed small bugs in search.c.
- iso242 5/18/98
- Fixed a buffering problem of search.
- iso243 7/1/98
- Add elimination of wrong characters for JIS C 6226-1978,
JIS X 0208-1983, and JIS X 0208:1990.
- iso244 7/2/98
- Add elimination of wrong characters for SJIS and UJIS.
- iso245 7/2/98
- Fix a bug about elimination for SJIS.
- iso246 8/8/98
- Add one locale for Win32, eliminate all MSB_ENABLE stuff
from unify.c, and fix eliminating table for JIS C 6226-1978.
- iso247 8/8/98
- Add -W option. Also changed where the mark is placed: multi.c
now calls the checking function and then marks wrong characters.
- iso248 8/12/98
- Fix a problem with outputting WRONGCS. Add checking table
for JIS X 0212:1990.
- iso249 10/29/00
- Joined with less-358. Fixed some bugs caused by the join.
- iso250 11/21/00
- Support JIS X 0213:2000. Added Cygwin support.
Thanks to nayuta-san.
- iso251 11/22/00
- Support SJIS and UJIS using JIS X 0213:2000.
- iso252 11/24/00
- Fixed a problem with outputting JIS X 0212:1990 using JIS style.
- iso253 12/2/00
- Fixed a problem with outputting SJIS. Thanks to nayuta-san.
Fixed an assertion problem in search.c. Thanks to SAKAKI
Kiyotake, Tanaka Akira, and Yuichi SATO.
- iso254 12/5/00
- Fixed a problem with outputting JIS X 0213:2000 plane 2 as SJIS.
Thanks to Shinya Hanataka.
- iso255 8/30/05
- Joined with less-378.
- iso256 8/30/05
- Joined with less-381.
- iso257 9/4/05
- Fixed problems caused by merge.
- Changed the buffering mechanism to track the exact POSITION through
code set conversion. This helped the highlighting routine and improved
the running speed of less.
- Changed to parse text from the beginning of the physical line when
less jumps into the middle of text. This fixed major problems
with stateful text like ISO-2022.
- Fixed JIS X 0213:2000 related problems. Thanks to Takeshi
WATANABE. Also fixed a problem reported by him: less no longer
splits a wrong multibyte character across lines when it does
not fit on the first line. Less moves the whole character to the
second line.
- iso258 9/4/05
- Joined with less-382.
- iso259 9/6/05
- Changed the algorithm that detects gaps when parsing the input stream.
This fixed a problem with long JIS/English text.
- Fixed '\r' problem.
- iso260 9/19/05
- Changed the algorithm handling input and output character sets.
jless now uses two variables: one represents the character sets
supported for the input stream, and the other represents the
encoding scheme for the output stream.
- Added support for JIS X 0213:2004.
- iso261 2/24/06
- Changed the put_wrongmark function to make it work with the new iso260
buffering semantics. Also applied a patch provided by Takuji.
Thanks to Takuji.
- iso262 2/24/06
- Removed the POSITION variable from the member variables of M_BUFDATA.
It was added to make the multibyte character buffering function
work better with less. However, it lowered the abstraction level
of the data structure (multi.h). This time, POSITION* is passed as an
additional argument to a few functions, which keeps the data structure
as simple as possible.
- This modification makes it possible to compile regex_cs-lwp9k.
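The iso257 entry above mentions parsing from the beginning of the physical line when jumping into the middle of stateful text. The sketch below shows why that matters, using Python's standard iso2022_jp codec rather than jless code. The function names are hypothetical, and it assumes the text is in the initial (ASCII) state at the start of each line, which holds for ISO-2022-JP but not for every ISO 2022 stream.

```python
import codecs

def decode_naive(data, offset):
    """Decode from `offset` as if it were the start of the text."""
    return data[offset:].decode("iso2022_jp", errors="replace")

def decode_from_line_start(data, offset):
    """Replay the current physical line up to `offset`, then decode the rest."""
    line_start = data.rfind(b"\n", 0, offset) + 1  # 0 when no newline precedes
    decoder = codecs.getincrementaldecoder("iso2022_jp")(errors="replace")
    # The decoded prefix is discarded; it is fed only to restore the shift state.
    decoder.decode(data[line_start:offset])
    return decoder.decode(data[offset:], final=True)

data = "first line\nabc 日本語 def\n".encode("iso2022_jp")
# Jump to just after the first kanji: 3 escape bytes plus one 2-byte character.
offset = data.index(b"\x1b$B") + 3 + 2

print(repr(decode_naive(data, offset)))
print(repr(decode_from_line_start(data, offset)))
```

The naive version misreads the remaining kanji bytes as ASCII because it has lost the escape sequence that designated JIS X 0208. Replaying from the line start recovers the correct state at the cost of re-reading at most one line.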
Mailing List
Subscribe to jless ML in English
Subscribe to jless ML in Japanese
Jam's welcome page --
Jam@pobox.com --
last modified
February 24 2006