utf8proc

A clean C library for processing UTF-8 Unicode data

# utf8proc release history #

## Version 2.9.0 ##

2023-10-20

- Unicode 15.1 support ([#253]).

## Version 2.8.0 ##

2022-10-30

- Unicode 15 support ([#247]).

## Version 2.7.0 ##

2021-12-16

- Unicode 14 support ([#233]).

- Support `GNUInstallDirs` in CMake build ([#159]).

- `cmake` build now installs `pkg-config` file ([#224]).

- Various build and portability improvements.

## Version 2.6.1 ##

2020-12-15

- Bugfix in `utf8proc_grapheme_break_stateful` for `NULL` state argument, which
  also broke `utf8proc_grapheme_break`.

## Version 2.6 ##

2020-11-23

- New `utf8proc_islower` and `utf8proc_isupper` functions ([#196]).

- Bugfix for manual calls to `grapheme_break_extended` for initial characters ([#205]).

- Various build and portability improvements.

## Version 2.5 ##

2020-03-27

- Unicode 13 support ([#179]).

- No longer report zero width for category Sk ([#167]).

- `cmake` support improvements ([#173]).

## Version 2.4 ##

2019-05-10

- Unicode 12.1 support ([#156]).

- New `-DUTF8PROC_INSTALL=No` option for `cmake` builds to disable installation ([#152]).

- Better `make` support for HP-UX ([#154]).

- Fixed incorrect `UTF8PROC_VERSION_MINOR` version number in header and bumped shared-library version.

## Version 2.3 ##

2019-03-30

- Unicode 12 support ([#148]).

- New function `utf8proc_unicode_version` to return the supported Unicode version ([#151]).

- Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide
  characters have width 2, and all other printable characters have width 1 ([#150]).

- Fix `CHARBOUND` option for `utf8proc_map` to preserve U+FFFE and U+FFFF non-characters ([#149]).

- Various build-system improvements ([#141], [#142], [#147]).

## Version 2.2 ##

2018-07-24

- Unicode 11 support ([#132] and [#140]).

- `utf8proc_NFKC_Casefold` convenience function for `NFKC_Casefold`
  normalization ([#133]).

- `UTF8PROC_STRIPNA` option to strip unassigned codepoints ([#133]).

- Support building static libraries on Windows (callers need to
  `#define UTF8PROC_STATIC`) ([#123]).

- `cmake` fix to avoid defining `UTF8PROC_EXPORTS` globally ([#121]).

- `toupper` of ß (U+00DF) now yields ẞ (U+1E9E) ([#134]), similar to musl;
  case-folding still yields the standard "ss" mapping.

- `utf8proc_charwidth` now returns `1` for U+00AD (soft hyphen) and
  for unassigned/PUA codepoints ([#135]).

## Version 2.1.1 ##

2018-04-27

- Fixed composition bug ([#128]).

- Minor build fixes ([#94], [#99], [#113], [#125]).

## Version 2.1 ##

2016-12-26:

- New functions `utf8proc_map_custom` and `utf8proc_decompose_custom`
  to allow user-supplied transformations of codepoints, in conjunction
  with other transformations ([#89]).

- New function `utf8proc_normalize_utf32` to apply normalizations
  directly to UTF-32 data (not just UTF-8) ([#88]).

- Fixed stack overflow that could occur due to incorrect definition
  of `UINT16_MAX` with some compilers ([#84]).

- Fixed conflict with `stdbool.h` in Visual Studio ([#90]).

- Updated font metrics to use Unifont 9.0.04.

## Version 2.0.2 ##

2016-07-27:

- Move `-Wmissing-prototypes` warning flag from `Makefile` to `.travis.yml`
  since MSVC does not understand this flag and it is occasionally useful to
  build using MSVC through the `Makefile` ([#79]).

- Use a different variable name for a nested loop in `bench/bench.c`, and
  declare it in a C89 way rather than inside the `for` to avoid "error:
  'for' loop initial declarations are only allowed in C99 mode" ([#80]).

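For readers unfamiliar with the C89 restriction mentioned above, this generic sketch (not the code from `bench/bench.c`) shows the pattern: loop counters are declared at the top of the block, and the nested loop uses its own variable name. The function `sum_matrix` is a made-up example.

```c
#include <stddef.h>

/* Made-up example: sum a rows-by-cols matrix stored in row-major order. */
static long sum_matrix(const int *m, size_t rows, size_t cols)
{
    size_t i, j;    /* C89 style: declared before any statement */
    long total = 0;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {    /* distinct name for the inner loop */
            total += m[i * cols + j];
        }
    }
    return total;
}
```

Writing `for (size_t i = 0; ...)` instead is what triggers the quoted error on compilers running in C89 mode.
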
## Version 2.0.1 ##

2016-07-13:

- Bug fix in `utf8proc_grapheme_break_stateful` ([#77]).

- Tests now use versioned Unicode files, so they will no longer
  break when a new version of Unicode is released ([#78]).

## Version 2.0 ##

2016-07-13:

- Updated for Unicode 9.0 ([#70]).

- New `utf8proc_grapheme_break_stateful` to handle the complicated
  grapheme-breaking rules in Unicode 9.  The old `utf8proc_grapheme_break`
  is still provided, but may incorrectly identify grapheme breaks
  in some Unicode-9 sequences.

- Smaller Unicode tables ([#62], [#68]).  This required changes
  in the `utf8proc_property_t` structure, which breaks backward
  compatibility if you access this `struct` directly.  The
  functions in the API remain backward-compatible, however.

- Buffer overrun fix ([#66]).

## Version 1.3.1 ##

2015-11-02:

- Do not export symbol for internal function `unsafe_encode_char()` ([#55]).

- Install relative symbolic links for shared libraries ([#58]).

- Enable and fix compiler warnings ([#55], [#58]).

- Add missing files to `make clean` ([#58]).

## Version 1.3 ##

2015-07-06:

- Updated for Unicode 8.0 ([#45]).

- New `utf8proc_tolower` and `utf8proc_toupper` functions, portable
  replacements for `towlower` and `towupper` in the C library ([#40]).

- Don't treat Unicode "non-characters" as invalid, and improved
  validity checking in general ([#35]).

- Prefix all typedefs with `utf8proc_`, e.g. `utf8proc_int32_t`,
  to avoid collisions with other libraries ([#32]).

- Rename `DLLEXPORT` to `UTF8PROC_DLLEXPORT` to prevent collisions.

- Fix build breakage in the benchmark routines.

- More fine-grained Makefile variables (`PICFLAG` etcetera), so that
  compilation flags can be selectively overridden, and in particular
  so that `CFLAGS` can be changed without accidentally eliminating
  necessary flags like `-fPIC` and `-std=c99` ([#43]).

- Updated character-width tables based on Unifont 8.0.01 ([#51]) and
  the Unicode 8 character categories ([#47]).

## Version 1.2 ##

2015-03-28:

- Updated for Unicode 7.0 ([#6]).

- New function `utf8proc_grapheme_break(c1,c2)` that returns whether
  there is a grapheme break between `c1` and `c2` ([#20]).

- New function `utf8proc_charwidth(c)` that returns the number of
  column-positions that should be required for `c`; essentially a
  portable replacement for `wcwidth(c)` ([#27]).

- New function `utf8proc_category(c)` that returns the Unicode
  category of `c` (as one of the constants `UTF8PROC_CATEGORY_xx`).
  Also, a function `utf8proc_category_string(c)` that returns the Unicode
  category of `c` as a two-character string.

- `cmake` script `CMakeLists.txt`, in addition to `Makefile`, for
  easier compilation on Windows ([#28]).

- Various `Makefile` improvements: a `make check` target to perform
  tests ([#13]), `make install`, a rule to automate updating the Unicode
  tables, etcetera.

- The shared library is now versioned (e.g. has a soname on GNU/Linux) ([#24]).

- C++/MSVC compatibility ([#17]).

- Most `#define`d constants are now `enum`s ([#29]).

- New preprocessor constants `UTF8PROC_VERSION_MAJOR`,
  `UTF8PROC_VERSION_MINOR`, and `UTF8PROC_VERSION_PATCH` for compile-time
  detection of the API version.

- Doxygen-formatted documentation ([#29]).

- The Ruby and PostgreSQL plugins have been removed due to lack of testing ([#22]).

## Version 1.1.6 ##

2013-11-27:

- PostgreSQL 9.2 and 9.3 compatibility (lowercase `c` language name)

## Version 1.1.5 ##

2009-08-20:

- Use `RSTRING_PTR()` and `RSTRING_LEN()` instead of `RSTRING()->ptr` and
  `RSTRING()->len` for ruby1.9 compatibility (and `#define` them, if not
  existent)

2009-10-02:

- Patches for compatibility with Microsoft Visual Studio

2009-10-08:

- Fixes to make utf8proc usable in C++ programs

2009-10-16:

## Version 1.1.4 ##

2009-06-14:

- replaced C++ style comments for compatibility reasons
- added typecasts to suppress compiler warnings
- removed redundant source files for ruby-gemfile generation

2009-08-19:

- Changed copyright notice for Public Software Group e. V.
- Minor changes in the `README` file

## Version 1.1.3 ##

2008-10-04:

- Added a function `utf8proc_version` returning a string containing the version
  number of the library.
- Included a target `libutf8proc.dylib` for MacOSX.

2009-05-01:

- PostgreSQL 8.3 compatibility (use of `SET_VARSIZE` macro)

## Version 1.1.2 ##

2007-07-25:

- Fixed a serious bug in the data file generator, which caused characters
  to be treated incorrectly when stripping default ignorable characters or
  calculating grapheme cluster boundaries.

## Version 1.1.1 ##

2007-06-25:

- Added a new PostgreSQL function `unistrip`, which behaves like `unifold`,
  but also removes all character marks (e.g. accents).

2007-07-22:

- Changed license from BSD to MIT style.
- Added a new function `utf8proc_codepoint_valid` to the C library.
- Changed compiler flags in `Makefile` from `-g -O0` to `-O2`
- The ruby script, which was used to build the `utf8proc_data.c` file, is now
  included in the distribution.

## Version 1.0.3 ##

2007-03-16:

- Fixed a bug in the ruby library, which caused an error when splitting an
  empty string at grapheme cluster boundaries (method `String#utf8chars`).

## Version 1.0.2 ##

2006-09-21:

- included a check in `Integer#utf8`, which raises an exception if the given
  code point is too high to be valid (this check was previously missing)

2006-12-26:

- added support for PostgreSQL version 8.2

## Version 1.0.1 ##

2006-09-20:

- included a gem file for the ruby version of the library

Release of version 1.0.1

## Version 1.0 ##

2006-09-17:

- added the `LUMP` option, which lumps certain characters together (see `lump.md`) (also used for the PostgreSQL `unifold` function)
- added the `STRIPMARK` option, which strips marking characters (or marks of composed characters)
- deprecated ruby method `String#char_ary` in favour of `String#utf8chars`

## Version 0.3 ##

2006-07-18:

- changed normalization from NFC to NFKC for postgresql unifold function

2006-08-04:

- added support to mark the beginning of a grapheme cluster with 0xFF (option: `CHARBOUND`)
- added the ruby method `String#chars`, which returns an array of UTF-8 encoded grapheme clusters
- added `NLF2LF` transformation in postgresql `unifold` function
- added the `DECOMPOSE` option; if you use neither `COMPOSE` nor `DECOMPOSE`, no normalization will be performed (different from previous versions)
- using integer constants rather than C-strings for character properties
- fixed (hopefully) a problem with the ruby library on Mac OS X, which occurred when compiler optimization was switched on

## Version 0.2 ##

2006-06-05:

- changed behaviour of PostgreSQL function to return NULL in case of invalid input, rather than raising an exceptional condition
- improved efficiency of PostgreSQL function (no transformation to C string is done)

2006-06-20:

- added -fpic compiler flag in Makefile
- fixed bug in the C code for the ruby library (usage of non-existent function)

## Version 0.1 ##

2006-06-02: initial release of version 0.1

<!--- generated by NEWS-update.jl: -->

[#6]: https://github.com/JuliaStrings/utf8proc/issues/6
[#13]: https://github.com/JuliaStrings/utf8proc/issues/13
[#17]: https://github.com/JuliaStrings/utf8proc/issues/17
[#20]: https://github.com/JuliaStrings/utf8proc/issues/20
[#22]: https://github.com/JuliaStrings/utf8proc/issues/22
[#24]: https://github.com/JuliaStrings/utf8proc/issues/24
[#27]: https://github.com/JuliaStrings/utf8proc/issues/27
[#28]: https://github.com/JuliaStrings/utf8proc/issues/28
[#29]: https://github.com/JuliaStrings/utf8proc/issues/29
[#32]: https://github.com/JuliaStrings/utf8proc/issues/32
[#35]: https://github.com/JuliaStrings/utf8proc/issues/35
[#40]: https://github.com/JuliaStrings/utf8proc/issues/40
[#43]: https://github.com/JuliaStrings/utf8proc/issues/43
[#45]: https://github.com/JuliaStrings/utf8proc/issues/45
[#47]: https://github.com/JuliaStrings/utf8proc/issues/47
[#51]: https://github.com/JuliaStrings/utf8proc/issues/51
[#55]: https://github.com/JuliaStrings/utf8proc/issues/55
[#58]: https://github.com/JuliaStrings/utf8proc/issues/58
[#62]: https://github.com/JuliaStrings/utf8proc/issues/62
[#66]: https://github.com/JuliaStrings/utf8proc/issues/66
[#68]: https://github.com/JuliaStrings/utf8proc/issues/68
[#70]: https://github.com/JuliaStrings/utf8proc/issues/70
[#77]: https://github.com/JuliaStrings/utf8proc/issues/77
[#78]: https://github.com/JuliaStrings/utf8proc/issues/78
[#79]: https://github.com/JuliaStrings/utf8proc/issues/79
[#80]: https://github.com/JuliaStrings/utf8proc/issues/80
[#84]: https://github.com/JuliaStrings/utf8proc/issues/84
[#88]: https://github.com/JuliaStrings/utf8proc/issues/88
[#89]: https://github.com/JuliaStrings/utf8proc/issues/89
[#90]: https://github.com/JuliaStrings/utf8proc/issues/90
[#94]: https://github.com/JuliaStrings/utf8proc/issues/94
[#99]: https://github.com/JuliaStrings/utf8proc/issues/99
[#113]: https://github.com/JuliaStrings/utf8proc/issues/113
[#121]: https://github.com/JuliaStrings/utf8proc/issues/121
[#123]: https://github.com/JuliaStrings/utf8proc/issues/123
[#125]: https://github.com/JuliaStrings/utf8proc/issues/125
[#128]: https://github.com/JuliaStrings/utf8proc/issues/128
[#132]: https://github.com/JuliaStrings/utf8proc/issues/132
[#133]: https://github.com/JuliaStrings/utf8proc/issues/133
[#134]: https://github.com/JuliaStrings/utf8proc/issues/134
[#135]: https://github.com/JuliaStrings/utf8proc/issues/135
[#140]: https://github.com/JuliaStrings/utf8proc/issues/140
[#141]: https://github.com/JuliaStrings/utf8proc/issues/141
[#142]: https://github.com/JuliaStrings/utf8proc/issues/142
[#147]: https://github.com/JuliaStrings/utf8proc/issues/147
[#148]: https://github.com/JuliaStrings/utf8proc/issues/148
[#149]: https://github.com/JuliaStrings/utf8proc/issues/149
[#150]: https://github.com/JuliaStrings/utf8proc/issues/150
[#151]: https://github.com/JuliaStrings/utf8proc/issues/151
[#152]: https://github.com/JuliaStrings/utf8proc/issues/152
[#154]: https://github.com/JuliaStrings/utf8proc/issues/154
[#156]: https://github.com/JuliaStrings/utf8proc/issues/156
[#159]: https://github.com/JuliaStrings/utf8proc/issues/159
[#167]: https://github.com/JuliaStrings/utf8proc/issues/167
[#173]: https://github.com/JuliaStrings/utf8proc/issues/173
[#179]: https://github.com/JuliaStrings/utf8proc/issues/179
[#196]: https://github.com/JuliaStrings/utf8proc/issues/196
[#205]: https://github.com/JuliaStrings/utf8proc/issues/205
[#224]: https://github.com/JuliaStrings/utf8proc/issues/224
[#233]: https://github.com/JuliaStrings/utf8proc/issues/233
[#247]: https://github.com/JuliaStrings/utf8proc/issues/247
[#253]: https://github.com/JuliaStrings/utf8proc/issues/253