- Type: Design proposal
- Authors: Ilya Gorbunov, Mikhail Zarechenskiy
- Contributors: Andrey Breslav, Roman Elizarov, Nikolay Igotti
- Status: Under consideration
- Prototype: Implemented in Kotlin 1.3-M1
- Related issues: KT-191
- Discussion: KEEP-135
Provide support in the compiler and the standard library in order to introduce types for unsigned integers and make them first-class citizen in the language.
Currently it's hard or even impossible to use hexadecimal literal constants that result in overflow of the corresponding
signed types. That overflow causes the constant to become more wider than expected (e.g. 0xFFFF_FFFE
is Long
)
or even impossible to express in Kotlin: 0x8000_0000_0000_0000
Colors
An example is specifying color as 32-bit AARRGGBB value:
fun takesColor(color: Int)
takesColor(0xFFCC00CC) // doesn't compile, requires explicit .toInt() conversion
This proposal doesn't ease the usage of such API, but at least it becomes possible to author Kotlin API with unsigned types as following:
fun takesColor(color: UInt) = takesColor(color.toInt())
takesColor(0xFFCC00CCu)
Byte arrays initialized in code
Currently creating a byte array with content specified in code looks extremely verbose in Kotlin:
val byteOrderMarkUtf8 = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())
With the introduction of unsigned bytes and byte arrays this can be rewritten more compact:
val byteOrderMarkUtf8 = ubyteArrayOf(0xEFu, 0xBBu, 0xBFu)
There are tricks that make it possible to implement algorithms based on unsigned arithmetic with signed integers (see Unsigned int considered harmful for Java), but one has to be extremely careful when dealing with such tricks and remember which variable represents which type actually in code.
For an unsigned value represented by a signed integer special functions like
divideUnsigned
, remainderUnsigned
, toUnsignedString
has to be called instead of the standard ones.
It is very fragile and error prone especially when signed and unsigned values both are used.
When one provides an external declaration for some native platform API (either C API or JS IDL declarations KT-13541) that declaration can contain unsigned types natively. Unsigned types in Kotlin would allow to represent such declarations without unwittingly altering their semantics by substituting unsigned integers with signed ones.
We propose to introduce 4 types to represent unsigned integers:
kotlin.UByte
: an unsigned 8-bit integer, ranges from 0 to 255kotlin.UShort
: an unsigned 16-bit integer, ranges from 0 to 65535kotlin.UInt
: an unsigned 32-bit integer, ranges from 0 to 2^32 - 1kotlin.ULong
: an unsigned 64-bit integer, ranges from 0 to 2^64 - 1
Same as for primitives, each of unsigned type will have corresponding type that represents array:
kotlin.UByteArray
: an array of unsigned byteskotlin.UShortArray
: an array of unsigned shortskotlin.UIntArray
: an array of unsigned intskotlin.ULongArray
: an array of unsigned longs
To iterate through a range of unsigned values there will be range and progression types of unsigned ints and longs:
kotlin.ranges.UIntRange
: an iterable range of unsigned intskotlin.ranges.UIntProgression
: an iterable progression of unsigned intskotlin.ranges.ULongRange
: an iterable range of unsigned longskotlin.ranges.ULongProgression
: an iterable progression of unsigned longs
The unsigned types are to be released in Kotlin 1.3 as an experimental feature. This means we do not give compatibility guaranties for the API and language features related to unsigned types.
Their usage without an opt-in will produce a compiler warning about their experimentality.
The opt-in can be given in two ways:
- either annotate the code element that uses unsigned types with the
@UseExperimental(ExperimentalUnsignedTypes::class)
annotation - or specify
-Xuse-experimental=kotlin.ExperimentalUnsignedTypes
compiler option
If you develop a library and want to use unsigned types, we recommend to propagate the experimentality to the parts of your API
that depend on the unsigned types with the @ExperimentalUnsignedTypes
annotation:
// a function that exposes unsigned types in signature
@ExperimentalUnsignedTypes
fun upTo(limit: UInt): UInt {
}
// a function that uses unsigned types in its body
@ExperimentalUnsignedTypes
fun usesUnsignedUnderTheCover(): Boolean {
return upTo(10u) < 5u
}
Unsigned types support operations similar to their signed counterparts. These are:
- equality:
equals
andhashCode
- comparison:
compareTo
- arithmetic operators:
plus
,minus
,div
,rem
- increment and decrement operators:
inc
,dec
- bitwise operations:
and
,or
,xor
,inv
- bit shifts:
shl
,shr
- range creation:
rangeTo
operator - narrowing and widening conversions:
toUByte
,toUShort
,toUInt
,toULong
- unsigned->signed conversions:
toByte
,toShort
,toInt
,toLong
- signed->unsigned conversions:
toUByte
,toUShort
,toUInt
,toULong
extensions onByte
/Short
/Int
/Long
- number->string conversions:
toString
member function andtoString(radix)
extensions - string->number conversions:
String.toUInt/ULong/UByte/UShort()
extensions with optionalradix
and non throwingOrNull
-variant
Arithmetic operations on UByte
and UShort
work similar to the signed ones: first they widen their operands to unsigned ints
then perform the operation and return UInt
as a result.
This leads to the same compound assignment problem KT-7907 as with Byte
and Short
types:
val bytes: UByteArray = ...
bytes[i] += 0x80u // doesn't compile because the return type `UInt` doesn't fit back into `UByte`
We've decided to favor the consistency with the signed types and not try to solve this problem individually for the unsigned types. We hope to approach the problem later uniformly both for the signed and unsigned types.
Comparison and arithmetic operations are overloaded for each combination of unsigned type operands. The narrower operand is extended to the width of the other one and that type is the type of the result.
Arithmetic and comparison operations that mix signed and unsigned operands are not provided. The semantics of such operators is unclear.
In some languages there is unary minus operator on unsigned types which extends them to a wider signed type and then negates. We're going to omit this operator, requiring an explicit conversion to a signed type before negating.
Unary plus operator is not provided for the lack of use cases.
Bitwise operations are provided for operands of the same width only, same as in the signed counterparts.
Bitwise operators for UByte
and UShort
are provided, but they are in question, because for the signed counterparts they exist
as experimental extensions as of Kotlin 1.2.
Bit shifts are provided only for UInt
and ULong
, for the more narrow types, both for signed and unsigned, they are under consideration.
There's an open question how to call the operation that shifts an unsigned integer right.
Widening conversion of one unsigned type to a wider one, for example UInt.toULong()
is done by extending it with zero bits.
Narrowing conversion to a narrower unsigned type like UInt.toUByte()
is just a bit pattern truncation.
A conversion between signed and unsigned types of the same with to each other is done by reinterpreting the bit pattern of
a number as signed or unsigned. Therefore UInt.MAX_VALUE.toInt()
will turn into -1
.
Narrowing conversion between signed and unsigned types is done by reinterpreting
the bit pattern of a number truncated to the given width as signed or unsigned.
For example 511u.toByte()
will turn into -1
signed byte value.
Widening conversion of unsigned type to a wider signed type, for example UInt.toLong()
is done by extending it with zero bits,
so the resulting signed value is always non-negative.
Widening conversion of signed type to a wider unsigned type is done by signed extension to a wider signed type first and then reinterpreting it as unsigned type of the same width.
For each unsigned integer type there will be a specialized array: UIntArray
, ULongArray
etc.
Storage
These arrays must provide the compactness of storage same as their signed counterparts.
Equality contract
The arrays have the identity equality contact like the signed arrays rather than the structural equality like lists.
Therefore they do not override equals
and hashCode
implementations, defaulting to the ones inherited from Any
.
toString
implementation is in question, whether it should or shouldn't render array element values.
Implemented interfaces
Unlike the signed counterparts these arrays can implement interfaces. We find it advantageous to implement
the Collection<T>
interface and have a variety of collection extension functions available on arrays of unsigned integers
without having to provide specializations from them.
We also have considered if the arrays should implement List
interface, because they do support indexed access, but found that
infeasible due to conflicting equality contract of the List
.
Conversions between signed and unsigned arrays
It should be possible to convert an array of signed integers to an array of unsigned ones of the same width and vice versa by copying values to a new array and reinterpreting them as the desired type:
fun UByteArray.toByteArray(): ByteArray
fun ByteArray.toUByteArray(): UByteArray
Also it might be advantageous to provide functions that reinterpret an entire array as signed or unsigned one:
fun UByteArray.asByteArray(): ByteArray
fun ByteArray.asUByteArray(): UByteArray
so that the returned array is a view on the original array with a different signedness, but this is possible only if the specific implementation of unsigned arrays is chosen.
A progression (UIntProgression
or ULongProgression
) of unsigned values has unsigned start
and endInclusive
properties and a signed step
of the same width.
The sign of the step is used to represent the direction of a progression: either it's ascending or descending.
Therefore it isn't possible to create a progression that iterates from 0
to UInt.MAX_VALUE
in one step.
A range is a special case of a progression with the step 1.
Implementation of unsigned types heavily depends on inline classes feature. Namely, each unsigned class is actually an inline class, which uses the signed counterpart value as a storage.
Number
is an abstract class in Kotlin, and inline classes are not allowed to extend other classes, therefore unsigned types
can't be inherited from Number
class.
We haven't found compelling use cases to circumvent this limitation specially for the unsigned types.
Each unsigned class has its own wrapper class, which is used for autoboxing operations, see this section for more details. Basically, rules for boxing are the same as for primitives. Example:
val a: UInt? = 3.toUInt() // Boxing
val b: Comparable<*> = 0.toULong() // Boxing
val c: List<UInt> = listOf(1.toUInt(), 2.toUInt()) // Boxing
TODO: List operations intrinsified by the compiler
Here we list changes in the language that should be supported in the compiler specifically for the unsigned types.
In order to simplify the usage of unsigned integers we introduce unsigned literals. An unsigned literal is an expression of the following form:
{decimal literal}
{unsigned integer suffix}
{hex literal}
{unsigned integer suffix}
{binary literal}
{unsigned integer suffix}
Where {unsigned integer suffix}
can be one the following values: u
, U
, uL
and UL
.
Note that the order of [uU]
and L
characters matters, one cannot use unsigned literal 42LU
.
Semantically, type of an unsigned literal 42u
depend on an expected unsigned type or its supertype.
If there is no applicable expected type for an unsigned literal, then expression of such type will be approximated to UInt
if possible, or to ULong
,
depending on a value of the unsigned literal. Unsigned literal with suffix uL
or UL
always represents value of ULong
type.
Example:
val a1 = 42u // UInt
val a2 = 0xFFFF_FFFF_FFFFu // ULong, it's 281474976710655
val b: UByte = 1u // OK
val c: UShort = 1u // OK
val d: ULong = 1u // OK
val l1 = 1UL // ULong
val l2 = 2uL // ULong
val e1: Int = 42u // ERROR, type mismatch
Unsigned integer literals can represent numbers that are larger than any number representable with signed integer literals,
therefore the compiler should support parsing of unsigned number literals that are larger than Long.MAX_VALUE
:
val a1 = 9223372036854775807 // OK, it's Long
val a2 = 9223372036854775808 // Error, out of range
val u1 = 9223372036854775808u // OK, it's ULong
It might be useful to use values of unsigned types inside const vals
and annotations:
const val MAX: UByte = 0xFFu
const val OTHER = 40u + 2u
const val ERROR = -1
const val U_ERROR = ERROR.toUInt()
annotation class ExpectedErrorCode(val error: UInt)
To make it possible, expression of unsigned type should be evaluated at compile-time to a concrete value. Thus, we are going to tune
constant evaluator for basic operations on unsigned types in order to support const vals
and annotations with unsigned parameters.
Note that vararg
parameters of inline class types are forbidden, because it's not clear how to associate the type from vararg
parameter
with the underlying array type, see this section for the details.
However, since we provide a specialized array for each unsigned integer type, we can associate
types from vararg
with the corresponding array types:
fun uints(vararg u: UInt): UIntArray = u
fun ubytes(vararg u: UByte): UByteArray = u
fun ushorts(vararg u: UShort): UShortArray = u
fun ulongs(vararg u: ULong): ULongArray = u
Should we allow assignment of usual signed literals to unsigned types?
fun takeUByte(b: UByte) {}
takeUByte(0xFF)
If yes, then how signed and unsigned types are related to each other?
Should we support some kind of @Unsigned
annotation to load types from Java as unsigned ones?
public class Foo {
void test(@Unsigned int x) {} // takes UInt from Kotlin point of view
}
How to call the operation that shifts an unsigned integer right:
shr
, because it's clear that the shift is always unsigned and we don't needu
prefix to distinguish it.ushr
, because the shift is always unsigned and we want to emphasize that.- both of above, because they do the same and having both of them will ease the migration from signed types.
These operations are experimental for Byte
and Short
: we haven't decided yet on their contract.