utf8proc

A clean C library for processing UTF-8 Unicode data
git clone https://git.sinitax.com/juliastrings/utf8proc

commit 1c84d08b01c94278218085a57f5c83113455529b
parent df71da45dfbdf68bcc6fd656d1260d609c728ad7
Author: Steven G. Johnson <stevenj@alum.mit.edu>
Date:   Sun,  7 Dec 2014 21:29:34 -0500

README updates

Diffstat:
M LICENSE.md | 2 +-
M README.md | 36 ++++++++++++++++++------------------
2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/LICENSE.md b/LICENSE.md
@@ -7,7 +7,7 @@ whose copyright and license statements are reproduced below, all new
 work on the libmojibake library is licensed under the [MIT "expat"
 license](http://opensource.org/licenses/MIT):
 
-*Copyright &copy; 2014 by Steven G. Johnson.*
+*Copyright &copy; 2014 by Steven G. Johnson, Jiahao Chen, Tony Kelman, and other contributors listed in the git history.*
 
 Permission is hereby granted, free of charge, to any person obtaining a
 copy of this software and associated documentation files (the "Software"),
diff --git a/README.md b/README.md
@@ -1,28 +1,30 @@
 # libmojibake
 [![Build Status](https://travis-ci.org/JuliaLang/libmojibake.png)](https://travis-ci.org/JuliaLang/libmojibake)
 
-[libmojibake](https://github.com/JuliaLang/libmojibake) is
-a lightly updated fork of the [utf8proc
+[libmojibake](https://github.com/JuliaLang/libmojibake) is a
+development fork of the [utf8proc
 library](http://www.public-software-group.org/utf8proc) from Jan
 Behrens and the rest of the [Public Software
 Group](http://www.public-software-group.org/), who deserve *nearly all
 of the credit* for this package: a small, clean C library that
 provides Unicode normalization, case-folding, and other operations for
-data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).
+data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8). The
+main difference from utf8proc is that the Unicode support in
+libmojibake is more up-to-date (Unicode 7 vs. Unicode 5).
 
-The reason for this fork is that `utf8proc` is used for basic Unicode
+The reason for this fork is that utf8proc is used for basic Unicode
 support in the [Julia language](http://julialang.org/) and the Julia
 developers wanted Unicode 7 support and other features, but the Public
-Software Group is currently occupied with other projects. We hope
-that our fork can be merged back into the mainline `utf8proc` package
-before too long.
+Software Group is currently occupied with other projects. As we implement
+and test new features in libmojibake, we are contributing patches back
+to utf8proc with the hope that they can be merged upstream.
 
-(The original `utf8proc` package also includes Ruby and PostgreSQL plug-ins.
-We removed those from `libmojibake` in order to focus exclusively on the C
+(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
+We removed those from libmojibake in order to focus exclusively on the C
 library for the time being. We will strive to keep API changes to a minimum,
-so `libmojibake` should still be usable with the old plug-in code.)
+so libmojibake should still be usable with the old plug-in code.)
 
-Like `utf8proc`, the `libmojibake` package is licensed under the
+Like utf8proc, the libmojibake package is licensed under the
 free/open-source [MIT "expat"
 license](http://opensource.org/licenses/MIT) (plus certain Unicode
 data governed by the similarly permissive [Unicode data
@@ -39,9 +41,9 @@ The C library is found in this directory after successful compilation
 and is named `libmojibake.a` (for the static library) and
 `libmojibake.so` (for the dynamic library).
 
-The Unicode version being supported is 5.0.0.
-*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as
-version 5.0.0 had not been available at the time of implementation.
+The Unicode version being supported is 7.0.0. (Grapheme segmentation
+is currently based on version 4.1.0 of Unicode Standard Annex #29, but
+we hope to update this soon.)
 
 For Unicode normalizations, the following options are used:
 
@@ -58,12 +60,10 @@ strings, unless you want to allocate memory yourself.
 
 ## To Do ##
 
-* detect stable code points and process segments independently in order to save memory
-* do a quick check before normalizing strings to optimize speed
-* support stream processing
+See the Github [issues list](https://github.com/JuliaLang/libmojibake/issues).
 
 
 ## Contact ##
 
 Bug reports, feature requests, and other queries can be filed at
-the [libmojibake page on Github](https://github.com/JuliaLang/libmojibake/issues).
+the [libmojibake issues page on Github](https://github.com/JuliaLang/libmojibake/issues).