# utf8proc
[![CI](https://github.com/JuliaStrings/utf8proc/actions/workflows/build-ci.yml/badge.svg)](https://github.com/JuliaStrings/utf8proc/actions/workflows/build-ci.yml)
[![AppVeyor status](https://ci.appveyor.com/api/projects/status/ivaa0v6ikxrmm5r6?svg=true)](https://ci.appveyor.com/project/StevenGJohnson/utf8proc)

[utf8proc](http://juliastrings.github.io/utf8proc/) is a small, clean C
library that provides Unicode normalization, case-folding, and other
operations for data in the [UTF-8
encoding](http://en.wikipedia.org/wiki/UTF-8).  It was [initially
developed](http://www.public-software-group.org/utf8proc) by Jan
Behrens and the rest of the [Public Software
Group](http://www.public-software-group.org/), who deserve *nearly all
of the credit* for this package.  With the blessing of the Public
Software Group, the [Julia developers](http://julialang.org/) have
taken over development of utf8proc, since the original developers have
moved to other projects.

(utf8proc is used for basic Unicode
support in the [Julia language](http://julialang.org/), and the Julia
developers became involved because they wanted to add Unicode 7 support and other features.)

(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
We removed those from utf8proc in order to focus exclusively on the C
library for the time being, but plan to add them back in or release them as separate packages.)

The utf8proc package is licensed under the
free/open-source [MIT "expat"
license](http://opensource.org/licenses/MIT) (plus certain Unicode
data governed by the similarly permissive [Unicode data
license](http://www.unicode.org/copyright.html#Exhibit1)); please see
the included `LICENSE.md` file for more detailed information.

## Quick Start

Typical users should download a [utf8proc release](http://juliastrings.github.io/utf8proc/releases/) rather than cloning directly from GitHub.

To compile the C library, run `make`.  You can also install the library and header file with `make install` (by default into `/usr/local/lib` and `/usr/local/include`, but this can be changed by `make prefix=/some/dir`).  `make check` runs some tests, and `make clean` deletes all of the generated files.

Alternatively, you can compile with `cmake`, e.g. by
```sh
mkdir build
cmake -S . -B build
cmake --build build
```

### Using other compilers
The included `Makefile` supports GNU/Linux flavors and macOS with `gcc`-like compilers; Windows users will typically use `cmake`.

For other Unix-like systems and other compilers, you may need to pass modified settings to `make` in order to use the correct compilation flags for building shared libraries on your system.

For HP-UX with HP's `aCC` compiler and GNU Make (installed as `gmake`), you can compile with
```
gmake CC=/opt/aCC/bin/aCC CFLAGS="+O2" PICFLAG="+z" C99FLAG="-Ae" WCFLAGS="+w" LDFLAG_SHARED="-b" SOFLAG="-Wl,+h"
```
To run `gmake install` you will need GNU coreutils for the `install` command, and you may want to pass `prefix=/opt libdir=/opt/lib/hpux32` or similar to change the installation location.

## General Information

The C library is found in this directory after successful compilation
and is named `libutf8proc.a` (for the static library) and
`libutf8proc.so` (for the dynamic library).

The Unicode version supported is 15.1.0.
     63
     64For Unicode normalizations, the following options are used:
     65
     66* Normalization Form C:  `STABLE`, `COMPOSE`
     67* Normalization Form D:  `STABLE`, `DECOMPOSE`
     68* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`
     69* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`

## C Library

The documentation for the C library is found in the `utf8proc.h` header file.
`utf8proc_map` is the function you will most likely be using for mapping UTF-8
strings, unless you want to allocate memory yourself.

## To Do

See the GitHub [issues list](https://github.com/JuliaLang/utf8proc/issues).

## Contact

Bug reports, feature requests, and other queries can be filed at
the [utf8proc issues page on GitHub](https://github.com/JuliaLang/utf8proc/issues).

## See also

An independent Lua translation of this library, [lua-mojibake](https://github.com/differentprogramming/lua-mojibake), is also available.