
uTensor leaks memory #162

Open
janjongboom opened this issue Jun 22, 2019 · 4 comments

janjongboom commented Jun 22, 2019

I have a simple neural network with two hidden layers. Every time I run a classification through uTensor, I leak ~250 bytes of memory.

Here's how to log memory:

  1. Set the following macro (an example mbed_app.json is shown after the snippet below):

    MBED_HEAP_STATS_ENABLED=1
    
  2. Use the following snippet:

    #include "mbed_mem_trace.h"
    
    void print_memory_info() {
        mbed_stats_heap_t heap_stats;
        mbed_stats_heap_get(&heap_stats);
        printf("Heap size: %lu / %lu bytes (max: %lu)\r\n", heap_stats.current_size, heap_stats.reserved_size, heap_stats.max_size);
    }
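
For reference, one way to define that macro in an Mbed OS project is via mbed_app.json (assuming a standard Mbed build; adjust to your own setup):

    {
        "macros": ["MBED_HEAP_STATS_ENABLED=1"]
    }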

Then I'm invoking uTensor like this:

print_memory_info();

float window[186] = {
    0, 0, 0, 8, -17, 0, 0, 22, -6, 4, 75, -25, -30, 0, 0, 31, 0, 0, -31, 0, 0, 0, 0, 0, 0, 16, 0, 0, -16, 0, 29, 0, 16, -1, -71, 16, 0, 1, 26, -3, 70, 5, 0, 0, 0, -25, 0, 22, 0, 0, 9, 0, 0, 24, 18, 0, 8, 4, 0, 0, -5, 0, 32, 0, 0, 0, -18, 0, 32, 0, 0, 18, 0, -67, 13, 0, 1, 0, 0, -4, 17, 0, -2, -17, 0, 0, 0, 0, 2, 0, -74, 0, 0, -23, -7, 29, 0, -20, 67, 17, 19, -78, 6, -3, 1, 0, 0, 0, -6, -16, -3, -17, 0, -16, 17, 0, 0, 11, 0, -11, -1, 17, -21, 0, 0, 0, -4, -17, 0, 1, 0, 0, 7, 25, -8, 0, 72, -23, 0, 0, -8, -66, -70, 7, 0, 0, 0, 66, -4, 1, 17, 6, -6, 0, 68, -9, -17, -80, -16, 0, -17, 20, 23, 17, -21, 0, 0, 0, -23, 6, -13, 0, 2, -17, 21, 70, -16, -21, -65, -16, 0, 65, -7, 29, 17, -25
};

// my board hardfaults if I delete this later, so I guess the memory is managed by uTensor
// this is not good practice, I'd say
RamTensor<float> *input_x = new RamTensor<float>({ 1, 186 });
float *buff = (float*)input_x->write(0, 0);
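// copy the feature window into the input tensor's buffer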
for (int ix = 0; ix < 186; ix++) {
    buff[ix] = window[ix];
}

// in new block to force destruction of the Context before measuring again
{
    Context ctx;
    get_trained_ctx(ctx, input_x);
    ctx.eval();

    S_TENSOR pred_tensor = ctx.get("y_pred/Softmax:0");  // getting a reference to the output tensor

    uint32_t output_neurons = pred_tensor->getShape()[1];

    printf("Predictions:\n");
    const float* ptr_pred = pred_tensor->read<float>(0, 0);
    for (uint32_t ix = 0; ix < output_neurons; ix++) {
        printf("%lu: %f\n", ix, *(ptr_pred + ix));
    }
}

print_memory_info();

Note that if I delete the RamTensor after running, this will hardfault, so I assume that memory is managed by uTensor.
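
For completeness, this is the pattern that hardfaults (a sketch based on the description above, not verified against uTensor internals):

// after the scoped inference block above
delete input_x;   // hardfaults, presumably because uTensor already owns or has freed this memory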

This leaks memory:

Heap size: 16721 / 66280 bytes (max: 16721)
Predictions:
0: 0.000000
1: 0.000000
2: 1.000000
Heap size: 17061 / 66232 bytes (max: 77479)

My model can be found here: https://github.com/janjongboom/utensor-test/tree/fixed


janjongboom commented Jun 22, 2019

Here's the overview of active memory on the heap (generated via mbed-find-dangling-ptrs):

Extracting symbols from BUILD/DISCO_L475VG_IOT01A/GCC_ARM-DEBUG/firmware.elf
Extracting symbols OK
Free for untracked pointer 0x10004900
Free for untracked pointer 0x100048e8

Found 7 dangling pointers

-------------------------------------------------- 0x804e787
7 dangling pointers (total: 208 bytes): [ 0x10004e38 (28), 0x10004e60 (24), 0x10004e88 (28), 0x10004eb0 (28), 0x10004ed8 (28), 0x10004f00 (36), 0x10005288 (36) ]
     804e76e:   2800            cmp     r0, #0
     804e770:   d1ed            bne.n   804e74e <_Balloc+0xa>
     804e772:   2000            movs    r0, #0
     804e774:   bd70            pop     {r4, r5, r6, pc}
     804e776:   2101            movs    r1, #1
     804e778:   fa01 f604       lsl.w   r6, r1, r4
     804e77c:   1d72            adds    r2, r6, #5
     804e77e:   4628            mov     r0, r5
     804e780:   0092            lsls    r2, r2, #2
>>>  804e782:   f7c7 fa07       bl      8015b94 <__wrap__calloc_r>
     804e788:   d0f3            beq.n   804e772 <_Balloc+0x2e>
     804e78a:   6044            str     r4, [r0, #4]
     804e78c:   6086            str     r6, [r0, #8]
     804e78e:   e7e4            b.n     804e75a <_Balloc+0x16>
    
    0804e790 <_Bfree>:
     804e790:   b131            cbz     r1, 804e7a0 <_Bfree+0x10>
     804e792:   6cc3            ldr     r3, [r0, #76]   ; 0x4c
     804e794:   684a            ldr     r2, [r1, #4]
     804e796:   f853 0022       ldr.w   r0, [r3, r2, lsl #2]

I think these are related to the prediction tensor, because the allocations show up late in my log (not 100% in order, as the mem tracer uses RawSerial and the application uses normal Serial):

Predictions:
0: 0.000000
1#c:0x10004da8;0x804e76b-4;33
#c:0x10004e38;0x804e787-1;28
#c:0x10004e60;0x804e787-1;24
#c:0x10004e88;0x804e787-1;28
#c:0x10004eb0;0x804e787-1;28
#c:0x10004ed8;0x804e787-1;28
#c:0x10004f00;0x804e787-1;36
#c:0x10005288;0x804e787-1;36

And they are never freed. That accounts for 208 of the 340 bytes by which the heap grew between the two readings; the remaining 132 bytes are not accounted for, and the mem tracer can't find them either. Maybe it doesn't work so well with shared pointers?

@neil-tan

@mbartling Can you please share your memory debugging setup/info here as well?

@neil-tan

@janjongboom
By wrapping the entire inference code in a scope ({ after line 15 and } after line 70), there is no indication of a memory leak:

Hello world

Heap size: 5216 / 238032 bytes
Created WrappedRamTensor
Heap size: 5288 / 238032 bytes
Got trained context
Heap size: 5668 / 238032 bytes
Got pred_tensor
Heap size: 5668 / 238032 bytes
Called ctx_eval
Heap size: 5384 / 238032 bytes
Size: Predictions:
00 00 00 00 00 00 00 00 
Heap size: 5216 / 238032 bytes

@Knight-X @mbartling
I suspect that what may have appeared to be a memory leak was caused by main() holding on to the output tensor's shared pointer. It had not yet gone out of scope when the last print_memory_info() was called. This was originally by design, so that the application can hold on to the inference result independently of the state of the context class.

However, in my current attempt to remove shared pointers, the application will be responsible for deleting the output tensor. Re-running the context will overwrite the output tensor. A heads-up.
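
For reference, a minimal sketch of the scoping described above, reusing the example from the first comment (the input tensor, context, and output shared pointer all live inside one block, so everything is released before the second heap reading):

print_memory_info();

{
    RamTensor<float> *input_x = new RamTensor<float>({ 1, 186 });
    float *buff = (float*)input_x->write(0, 0);
    // ... fill buff with the feature window ...

    Context ctx;
    get_trained_ctx(ctx, input_x);
    ctx.eval();

    S_TENSOR pred_tensor = ctx.get("y_pred/Softmax:0");
    // ... read the predictions here ...
}   // pred_tensor (a shared pointer) and the context are released here

print_memory_info();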


janjongboom commented Aug 28, 2019

@neil-tan My feeling for this API would be that uTensor should not handle input/output memory. The application passes in the input layer, with memory owned by the application, and it also passes in memory for the output layer (again owned by the application). Add some runtime guarantees on the types to ensure people don't pass in the wrong values. I also believe the API will be much cleaner this way.

E.g.:

float input[33] = { 0 };
float output[4] = { 0 };

Context ctx;
get_trained_ctx(ctx, input, 33, output, 4);
ctx.eval();
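
A slightly fuller sketch of this hypothetical API, using the dimensions from the model above (186 inputs, 3 output neurons); the get_trained_ctx signature shown here is the proposal, not the current uTensor API:

float input[186] = { /* feature window, owned by the application */ };
float output[3]  = { 0 };

Context ctx;
// hypothetical signature: the application owns both buffers, uTensor only reads/writes them
get_trained_ctx(ctx, input, 186, output, 3);
ctx.eval();

printf("Predictions:\n");
for (uint32_t ix = 0; ix < 3; ix++) {
    printf("%lu: %f\n", ix, output[ix]);
}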
