Skip to content

Commit

Permalink
Merge branch 'nightly' into newbranch
Browse files Browse the repository at this point in the history
  • Loading branch information
sarfarazsiddiquii committed Apr 21, 2024
2 parents c72738a + 6af6934 commit df44cc8
Show file tree
Hide file tree
Showing 5 changed files with 128 additions and 5 deletions.
25 changes: 25 additions & 0 deletions proposals/byte-as-uint8.md
@@ -0,0 +1,25 @@
# Standardise the representation of byte sequence as a sequence of unsigned 8 bit integers

At this point in time, a sequence of bytes is often represented as a sequence of signed 8 bit integers in Mojo standard library.
Most noticeable example is the underlying data of string types `String`, `StringLiteral`, `StringRef` and `InlinedString`, but also APIs like for example the hash function `fn hash(bytes: DTypePointer[DType.int8], n: Int) -> Int:`.

# Motivation

Logically a byte is an integer value between `0` and `255`. Lots of algorithms make use of arithmetics ground by this assumption.
A signed 8 bit integer on the contrary represents values between `-128` and `127`. This introduces very subtle bugs, when an algorithm written for unsigned 8 bit integer is used on a signed 8 bit integer.

Another motivation for this change is that Mojo aims to be familiar to Python users. Those Python users are familiar with the `bytes` class, which itself is working with values between `0` and `255`, not values between `-128` and `127`.

## Examples:

### Division:
A value `-4` represented as `Int8` has the same bit pattern as value `252` represented as `UInt8`.
`-4 // 4` equals to `-1` (`bx11111111`), where `252 // 4` equals to `63` (`bx00111111`) as we can see the bit patterns are different.

### Bit shift:
Values `-1` and `255` have the same bit pattern as `Int8` and `UInt8` `bx11111111` but `-1 >> 1` results in `-1` (same bit pattern), where `255 >> 1` results in `127` (`bx01111111`)

# Proposal

A text based search for `DTypePointer[DType.int8]` and `Pointer[Int8]` on current open-sourced standard library revealed 29 results for `Pointer[Int8]` and 78 results for `DTypePointer[DType.int8]`.
Replacing `DTypePointer[DType.int8]` with `DTypePointer[DType.uint8]` and `Pointer[Int8]` with `Pointer[UInt8]` on case by case bases is a substantial refactoring effort, but it will prevent a certain class of logical bugs (see https://github.com/modularml/mojo/pull/2098). As it is a breaking change in sense of API design, it is sensible to do the refactoring as soon as possible.
12 changes: 12 additions & 0 deletions stdlib/src/builtin/value.mojo
Expand Up @@ -109,3 +109,15 @@ trait CollectionElement(Copyable, Movable):
"""

pass


trait StringableCollectionElement(CollectionElement, Stringable):
"""The StringableCollectionElement trait denotes a trait composition
of the `CollectionElement` and `Stringable` traits.
This is useful to have as a named entity since Mojo does not
currently support anonymous trait compositions to constrain
on `CollectionElement & Stringable` in the parameter.
"""

pass
61 changes: 61 additions & 0 deletions stdlib/src/collections/dict.mojo
Expand Up @@ -32,6 +32,7 @@ value types must always be Movable so we can resize the dictionary as it grows.
See the `Dict` docs for more details.
"""
from memory import UnsafePointer
from builtin.value import StringableCollectionElement

from .optional import Optional

Expand All @@ -45,6 +46,10 @@ trait KeyElement(CollectionElement, Hashable, EqualityComparable):
pass


trait StringableKeyElement(KeyElement, Stringable):
pass


@value
struct _DictEntryIter[
K: KeyElement,
Expand Down Expand Up @@ -499,6 +504,62 @@ struct Dict[K: KeyElement, V: CollectionElement](
"""
return len(self).__bool__()

@staticmethod
fn __str__[
T: StringableKeyElement, U: StringableCollectionElement
](self: Dict[T, U]) -> String:
"""Returns a string representation of a `Dict`.
Note that since we can't condition methods on a trait yet,
the way to call this method is a bit special. Here is an example below:
```mojo
var my_dict = Dict[Int, Float64]()
my_dict[1] = 1.1
my_dict[2] = 2.2
dict_as_string = __type_of(my_dict).__str__(my_dict)
print(dict_as_string)
# prints "{1: 1.1, 2: 2.2}"
```
When the compiler supports conditional methods, then a simple `str(my_dict)` will
be enough.
Args:
self: The Dict to represent as a string.
Parameters:
T: The type of the keys in the Dict. Must implement the
traits `Stringable` and `KeyElement`.
U: The type of the values in the Dict. Must implement the
traits `Stringable` and `CollectionElement`.
Returns:
A string representation of the Dict.
"""
var minimum_capacity = self._minimum_size_of_string_representation()
var result = String(List[Int8](capacity=minimum_capacity))
result += "{"

var i = 0
for key_value in self.items():
result += str(key_value[].key) + ": " + str(key_value[].value)
if i < len(self) - 1:
result += ", "
i += 1
result += "}"
return result

fn _minimum_size_of_string_representation(self) -> Int:
# we do a rough estimation of the minimum number of chars that we'll see
# in the string representation, we assume that str(key) and str(value)
# will be both at least one char.
return (
2 # '{' and '}'
+ len(self) * 6 # str(key), str(value) ": " and ", "
- 2 # remove the last ", "
)

fn find(self, key: K) -> Optional[V]:
"""Find a value in the dictionary by key.
Expand Down
7 changes: 2 additions & 5 deletions stdlib/src/collections/list.mojo
Expand Up @@ -22,6 +22,7 @@ from collections import List

from memory.unsafe_pointer import *
from memory import Reference, UnsafePointer
from builtin.value import StringableCollectionElement

# ===----------------------------------------------------------------------===#
# Utilties
Expand Down Expand Up @@ -85,10 +86,6 @@ struct _ListIter[
return self.index


trait StringableCollectionElement(Stringable, CollectionElement):
pass


struct List[T: CollectionElement](CollectionElement, Sized, Boolable):
"""The `List` type is a dynamically-allocated list.
Expand Down Expand Up @@ -582,7 +579,7 @@ struct List[T: CollectionElement](CollectionElement, Sized, Boolable):
```mojo
var my_list = List[Int](1, 2, 3)
print(__type_of_(my_list).__str__(my_list))
print(__type_of(my_list).__str__(my_list))
```
When the compiler supports conditional methods, then a simple `str(my_list)` will
Expand Down
28 changes: 28 additions & 0 deletions stdlib/test/collections/test_dict.mojo
Expand Up @@ -62,6 +62,32 @@ def test_big_dict():
assert_equal(2000, len(dict))


def test_dict_string_representation_string_int():
var some_dict = Dict[String, Int]()
some_dict["a"] = 1
some_dict["b"] = 2
dict_as_string = __type_of(some_dict).__str__(some_dict)
assert_true(
some_dict._minimum_size_of_string_representation()
<= len(dict_as_string)
)
assert_equal(dict_as_string, "{a: 1, b: 2}")


def test_dict_string_representation_int_int():
var some_dict = Dict[Int, Int]()
some_dict[3] = 1
some_dict[4] = 2
some_dict[5] = 3
some_dict[6] = 4
dict_as_string = __type_of(some_dict).__str__(some_dict)
# one char per key and value, we should have the minimum size of string possible
assert_equal(
some_dict._minimum_size_of_string_representation(), len(dict_as_string)
)
assert_equal(dict_as_string, "{3: 1, 4: 2, 5: 3, 6: 4}")


def test_compact():
var dict = Dict[String, Int]()
for i in range(20):
Expand Down Expand Up @@ -335,5 +361,7 @@ def test_owned_kwargs_dict():

def main():
test_dict()
test_dict_string_representation_string_int()
test_dict_string_representation_int_int()
test_owned_kwargs_dict()
test_bool_conversion()

0 comments on commit df44cc8

Please sign in to comment.