utf8proc

A clean C library for processing UTF-8 Unicode data
git clone https://git.sinitax.com/juliastrings/utf8proc

commit 0d7224a6d8a77e5eebf5e18bded742490f3b20fd
parent c0f2b512a055c667cb751ef4526ea744f2428826
Author: Steven G. Johnson <stevenj@mit.edu>
Date:   Tue, 15 Jul 2014 16:04:36 -0400

markdown and other cosmetic updates

Diffstat:
A .gitignore | 10 ++++++++++
D LICENSE    | 64 ----------------------------------------------------------------
A LICENSE.md | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M Makefile   | 41 +++--------------------------------------
D README     | 63 ---------------------------------------------------------------
A README.md  | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 174 insertions(+), 165 deletions(-)

diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,10 @@
+*.tar.gz
+*.exe
+*.dll
+*.do
+*.o
+*.so
+*.a
+*.dll
+*.dylib
+*.dSYM
diff --git a/LICENSE b/LICENSE
@@ -1,64 +0,0 @@
-
-Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany
-
-Permission is hereby granted, free of charge, to any person obtaining a
-copy of this software and associated documentation files (the "Software"),
-to deal in the Software without restriction, including without limitation
-the rights to use, copy, modify, merge, publish, distribute, sublicense,
-and/or sell copies of the Software, and to permit persons to whom the
-Software is furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-DEALINGS IN THE SOFTWARE.
-
-
-This software distribution contains derived data from a modified version of
-the Unicode data files. The following license applies to that data:
-
-COPYRIGHT AND PERMISSION NOTICE
-
-Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
-under the Terms of Use in http://www.unicode.org/copyright.html.
-
-Permission is hereby granted, free of charge, to any person obtaining a
-copy of the Unicode data files and any associated documentation (the "Data
-Files") or Unicode software and any associated documentation (the
-"Software") to deal in the Data Files or Software without restriction,
-including without limitation the rights to use, copy, modify, merge,
-publish, distribute, and/or sell copies of the Data Files or Software, and
-to permit persons to whom the Data Files or Software are furnished to do
-so, provided that (a) the above copyright notice(s) and this permission
-notice appear with all copies of the Data Files or Software, (b) both the
-above copyright notice(s) and this permission notice appear in associated
-documentation, and (c) there is clear notice in each modified Data File or
-in the Software as well as in the documentation associated with the Data
-File(s) or Software that the data or software has been modified.
-
-THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
-KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
-THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
-INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
-CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
-USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
-TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
-PERFORMANCE OF THE DATA FILES OR SOFTWARE.
-
-Except as contained in this notice, the name of a copyright holder shall
-not be used in advertising or otherwise to promote the sale, use or other
-dealings in these Data Files or Software without prior written
-authorization of the copyright holder.
-
-
-Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
-registered in some jurisdictions. All other trademarks and registered
-trademarks mentioned herein are the property of their respective owners.
-
diff --git a/LICENSE.md b/LICENSE.md
@@ -0,0 +1,93 @@
+== libutf8proc license ==
+
+**libutf8proc** is a lightly updated version of the **utf8proc**
+library by Jan Behrens and the rest of the Public Software Group, who
+deserve nearly all of the credit for this library. Like utf8proc,
+whose copyright and license statements are reproduced below, all new
+work on the libutf8proc library is licensed under the [MIT "expat"
+license](http://opensource.org/licenses/MIT):
+
+*Copyright &copy; 2014 by Steven G. Johnson.*
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the "Software"),
+to deal in the Software without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
+
+== Original utf8proc license ==
+
+*Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany*
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the "Software"),
+to deal in the Software without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
+
+== Unicode data license ==
+
+This software distribution contains derived data from a modified version of
+the Unicode data files. The following license applies to that data:
+
+**COPYRIGHT AND PERMISSION NOTICE**
+
+*Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
+under the Terms of Use in http://www.unicode.org/copyright.html.*
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of the Unicode data files and any associated documentation (the "Data
+Files") or Unicode software and any associated documentation (the
+"Software") to deal in the Data Files or Software without restriction,
+including without limitation the rights to use, copy, modify, merge,
+publish, distribute, and/or sell copies of the Data Files or Software, and
+to permit persons to whom the Data Files or Software are furnished to do
+so, provided that (a) the above copyright notice(s) and this permission
+notice appear with all copies of the Data Files or Software, (b) both the
+above copyright notice(s) and this permission notice appear in associated
+documentation, and (c) there is clear notice in each modified Data File or
+in the Software as well as in the documentation associated with the Data
+File(s) or Software that the data or software has been modified.
+
+THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
+KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
+THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
+INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
+CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
+USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
+TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+PERFORMANCE OF THE DATA FILES OR SOFTWARE.
+
+Except as contained in this notice, the name of a copyright holder shall
+not be used in advertising or otherwise to promote the sale, use or other
+dealings in these Data Files or Software without prior written
+authorization of the copyright holder.
+
+Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
+registered in some jurisdictions. All other trademarks and registered
+trademarks mentioned herein are the property of their respective owners.
diff --git a/Makefile b/Makefile
@@ -9,20 +9,12 @@ cc = $(CC) $(cflags)
 
 # meta targets
 
-c-library: libutf8proc.a libutf8proc.so
-
-ruby-library: ruby/utf8proc_native.so
-
-pgsql-library: pgsql/utf8proc_pgsql.so
+all: c-library
 
-all: c-library ruby-library ruby-gem pgsql-library
+c-library: libutf8proc.a libutf8proc.so
 
-clean::
+clean:
 	rm -f utf8proc.o libutf8proc.a libutf8proc.so
-	cd ruby/ && test -e Makefile && (make clean && rm -f Makefile) || true
-	rm -Rf ruby/gem/lib ruby/gem/ext
-	rm -f ruby/gem/utf8proc-*.gem
-	cd pgsql/ && make clean
 
 
 # real targets
@@ -39,30 +31,3 @@ libutf8proc.so: utf8proc.o
 
 libutf8proc.dylib: utf8proc.o
 	$(cc) -dynamiclib -o $@ $^ -install_name $(libdir)/$@
-
-ruby/Makefile: ruby/extconf.rb
-	cd ruby && ruby extconf.rb
-
-ruby/utf8proc_native.so: utf8proc.h utf8proc.c utf8proc_data.c \
-	ruby/utf8proc_native.c ruby/Makefile
-	cd ruby && make
-
-ruby/gem/lib/utf8proc.rb: ruby/utf8proc.rb
-	test -e ruby/gem/lib || mkdir ruby/gem/lib
-	cp ruby/utf8proc.rb ruby/gem/lib/
-
-ruby/gem/ext/extconf.rb: ruby/extconf.rb
-	test -e ruby/gem/ext || mkdir ruby/gem/ext
-	cp ruby/extconf.rb ruby/gem/ext/
-
-ruby/gem/ext/utf8proc_native.c: utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c
-	test -e ruby/gem/ext || mkdir ruby/gem/ext
-	cat utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | grep -v '#include "utf8proc.h"' | grep -v '#include "utf8proc_data.c"' | grep -v '#include "../utf8proc.c"' > ruby/gem/ext/utf8proc_native.c
-
-ruby-gem:: ruby/gem/lib/utf8proc.rb ruby/gem/ext/extconf.rb ruby/gem/ext/utf8proc_native.c
-	cd ruby/gem && gem build utf8proc.gemspec
-
-pgsql/utf8proc_pgsql.so: utf8proc.h utf8proc.c utf8proc_data.c \
-	pgsql/utf8proc_pgsql.c
-	cd pgsql && make
-
diff --git a/README b/README
@@ -1,63 +0,0 @@
-
-Please read the LICENSE file, which is shipping with this software.
-
-
-*** QUICK START ***
-
-For compilation of the C library call "make c-library", for compilation of
-the ruby library call "make ruby-library" and for compilation of the
-PostgreSQL extension call "make pgsql-library".
-
-For ruby you can also create a gem-file by calling "make ruby-gem".
-
-"make all" can be used to build everything, but both ruby and PostgreSQL
-installations are required in this case.
-
-
-*** GENERAL INFORMATION ***
-
-The C library is found in this directory after successful compilation and
-is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of
-the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
-subdirectory "ruby/". If you chose to create a gem-file it is placed in the
-"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so"
-and resides in the "pgsql/" directory.
-
-Both the ruby library and the PostgreSQL extension are built as stand-alone
-libraries and are therefore not dependent the dynamic version of the
-C library files, but this behaviour might change in future releases.
-
-The Unicode version being supported is 5.0.0.
-Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
-      version 5.0.0 had not been available at the time of implementation.
-
-For Unicode normalizations, the following options have to be used:
-Normalization Form C: STABLE, COMPOSE
-Normalization Form D: STABLE, DECOMPOSE
-Normalization Form KC: STABLE, COMPOSE, COMPAT
-Normalization Form KD: STABLE, DECOMPOSE, COMPAT
-
-
-*** C LIBRARY ***
-
-The documentation for the C library is found in the utf8proc.h header file.
-"utf8proc_map" is most likely function you will be using for mapping UTF-8
-strings, unless you want to allocate memory yourself.
-
-
-*** TODO ***
-
-- detect stable code points and process segments independently in order to
-  save memory
-- do a quick check before normalizing strings to optimize speed
-- support stream processing
-
-
-*** CONTACT ***
-
-If you find any bugs or experience difficulties in compiling this software,
-please contact us:
-
-Project page: http://www.public-software-group.org/utf8proc
-
-
diff --git a/README.md b/README.md
@@ -0,0 +1,68 @@
+== libutf8proc ==
+
+The [libutf8proc package](https://github.com/JuliaLang/libutf8proc) is
+a lightly updated fork of the [utf8proc
+library](http://www.public-software-group.org/utf8proc) from Jan
+Behrens and the rest of the [Public Software
+Group](http://www.public-software-group.org/), who deserve *nearly all
+of the credit* for this package: a small, clean C library that
+provides Unicode normalization, case-folding, and other operations for
+data in the [UTF-8 encoding](http://en.wikipedia.org/wiki/UTF-8).
+
+The reason for this fork is that utf8proc is used for basic Unicode
+support in the [Julia language](http://julialang.org/) and the Julia
+developers wanted Unicode 7 support and other features, but the
+Public Software Group currently does not seem to have the resources
+necessary to update utf8proc. We hope that the fork can be merged
+back into the mainline utf8proc package before too long.
+
+(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
+We removed those from libutf8proc in order to focus exclusively on the C
+library for the time being. We will strive to keep API changes to a minimum,
+so libutf8proc should still be usable with the old plug-in code.)
+
+Like utf8proc, the libutf8proc package is licensed under the
+free/open-source [MIT "expat"
+license](http://opensource.org/licenses/MIT) (plus certain Unicode
+data governed by the similarly permissive [Unicode data
+license](http://www.unicode.org/copyright.html#Exhibit1)); please see
+the included `LICENSE.md` file for more detailed information.
+
+=== Quick Start ===
+
+For compilation of the C library run `make`.
+
+=== General Information ===
+
+The C library is found in this directory after successful compilation
+and is named `libutf8proc.a` (for the static library) and
+`libutf8proc.so` (for the dynamic library).
+
+The Unicode version being supported is 5.0.0.
+*Note:* Version 4.1.0 of Unicode Standard Annex #29 was used, as
+version 5.0.0 had not been available at the time of implementation.
+
+For Unicode normalizations, the following options are used:
+
+* Normalization Form C: `STABLE`, COMPOSE`
+* Normalization Form D: `STABLE`, `DECOMPOSE`
+* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
+* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`
+
+=== C Library ===
+
+The documentation for the C library is found in the `utf8proc.h` header file.
+`utf8proc_map` is function you will most likely be using for mapping UTF-8
+strings, unless you want to allocate memory yourself.
+
+=== To Do ===
+
+* detect stable code points and process segments independently in order to save memory
+* do a quick check before normalizing strings to optimize speed
+* support stream processing
+
+=== Contact ===
+
+Bug reports, feature requests, and other queries can be filed at
+the [libutf8proc page on Github](https://github.com/JuliaLang/libutf8proc).
+