Context refactoring #181
FYI, this will not compile for more than one type T unless the typename args are all different types. You can get around it by appending an optional type at the end.
|
I didn't have a lot of time, but here's a refactored draft proposal: |
@mbartling I don't get it. |
It currently compiles and works with the code below. But, I can dig deeper:

class Factory {
public:
    Factory() { printf("factory created\r\n"); }
    template<class T, class T2, typename... Args>
    T* create(T2&& n, Args&&... args) {
        printf("create with integer: %d\r\n", std::forward<T2>(n));
        return new T(std::forward<Args>(args)...);
    }
}; |
@dboyliao |
Oh, I see. |
An updated draft, see minor changes in the code snippet.
The updated proposal here addresses these points:
Most modifications are around the code relating to:

typedef utensor::string OpName;
typedef std::unordered_map<TName, Tensor*> TensorNamePtrMap;
struct TensorRecord {
public:
unsigned char ref_count = 0;
Tensor* t = nullptr;
//bool keep_alive = false;
TensorRecord() = default; //required by unordered_map::operator[]
TensorRecord(Tensor* _t, unsigned char _ref_count) :
ref_count(_ref_count), t(_t) {}
};
class Context {
private:
std::unordered_map<TName, TensorRecord> tTable;
//TODO: - Op data manager
// - Profiler
// - Allocator
public:
static TensorAllocator tensor_allocator;
Context(TensorAllocator t_alloc) { tensor_allocator = t_alloc; } //static member cannot go in the member-initializer list
//TODO: modifying this to support pooling
Tensor* add(Tensor* t, TName _name, unsigned char _ref_count = 1) {
t->setName(_name);
tTable[_name] = TensorRecord(t, _ref_count);
return t;
}
//Tensor lookup interface
//non-existing tensor: returns Tensor*& but Tensor* is null
Tensor*& operator[](TName name) {
//TODO: define behavior for tensor-not-found
Tensor*& t = tTable[name].t;
return t;
};
void invoke(Operator *op, OpName _name = ""); //persistent op exists in heap
//intermediate ops exist on the stack
void invoke(Operator &op, OpName _name = "") {
//trigger registered actions based on _name here
op.compute();
}
//This tensor removal function is meant to be called by code-gen directly
void rm(TName t_name) {
//TODO: check for t_name's existence
delete tTable[t_name].t;
tTable.erase(t_name);
}
//decrease ref count of used tensors and perform deletion
//NT: Template based function worries me, as one copy of the function may be generated per op use
template <class T>
void gc(T &t_struct) {
for (unsigned char i = 0; i < (sizeof(T) / sizeof(Tensor*)); i++) {
TName t_name = ((Tensor**) &t_struct)[i]->name; //t_struct holds only Tensor* members, so view it as an array of Tensor*
unsigned char c = tTable[t_name].ref_count - 1;
if(c == 0) { //c is unsigned, so it can never be < 0
rm(t_name);
} else {
tTable[t_name].ref_count = c;
}
}
}
};
//Code for operators
class Operator : public uTensor {
public:
virtual void compute() = 0;
};
template <class T1, class T2, class TOut>
class MatMulOp : public Operator {
public:
struct {
Tensor* input;
Tensor* dim;
} inputs;
struct {
Tensor* output;
} outputs;
MatMulOp() {
//similar to TFLM's prepare function
}
virtual void compute() override {
MatMul2<T1, T2, TOut>(inputs.input, inputs.dim,
outputs.output);
}
};
//// Example for Generated Code
//Old
{
RamTensor<float>* out_tensor;
out_tensor = new RamTensor<float>({ 1 });
ctx.add(out_tensor, "MatMul_eightbit/x__port__0/min:0", 1);
ctx.push(new MinOp(),
{ "MatMul_eightbit/x__port__0/reshape:0", "MatMul_eightbit/x__port__0/reduction_dims:0" },
{ "MatMul_eightbit/x__port__0/min:0" });
ctx.eval();
}
//New
{
//Keeping one-off operators on the stack
//Stateful operators can be allocated on the heap or a more global scope
MinOp op; //not MinOp op(); which declares a function (most vexing parse)
op.inputs.input = ctx["MatMul_eightbit/x__port__0/reshape:0"];
op.inputs.dim = ctx["MatMul_eightbit/x__port__0/reduction_dims:0"];
//ctx.add() registers Tensor* with the context and returns the same Tensor*
op.outputs.output = ctx.add(new RamTensor<float>({ 1 }), "MatMul_eightbit/x__port__0/min:0", 1);
//setting a breakpoint here should take you (almost) straight to the kernel
ctx.invoke(op, "op_name_from_code_gen"); //with or without supplying the name
//logic for reference-counting clean up
ctx.gc(op.inputs); //or, use code-gen to delete all the input tensors by calling rm()
} |
@neil-tan "While tensor-handle would enable more flexibility in terms of runtime memory management and act as a layer of defense while programming with raw-pointers, developers and users are unlikely to work with pointers directly with the aid of code-gen. Offline memory planning should be adequate for most use-cases. With the updated proposal, tensor-handle can be added in the future if necessary." False, literally all of our operators currently do something like |
@mbartling To reiterate, the raw tensor-pointers are exposed only within the uTensor code-base itself. Application developers and the majority of uTensor users will not be interacting with tensors directly (upcoming user-api changes) nor dealing with operators. The contributors and core-developers will have to deal with operators and tensor pointers, but they need to know how to work with raw pointers anyway. |
Assumptions:
Consequence: independent model evaluations maintain deterministic operation and therefore have a dataflow graph that is plannable at codegen time.
Consequence: the runtime allocator should operate on a best-effort basis. That is, it should first attempt to use the offline memory plan, since it can do this very quickly. However, it should also handle error/edge cases such as fragmentation: either resolve them internally or notify the user application of the error, and allow the arena to grow, shrink, or simply fault on exceptions. This naturally leads to Tensor handles. A Tensor Handle has the following properties:
Example:
Tensor a = RomTensor<uint8_t>(a_data, a_shape);
Tensor b = RomTensor<uint8_t>(b_data, b_shape);
Tensor c = RamTensor<uint8_t>(max(a_shape, b_shape));
// Assume this is a 4x4x4 tensor
c = Add<uint8_t>(a, b);
Tensor c2 = c; // c2 is bound to the same data as c, deep copy must happen explicitly
// Read interface
std::cout << c(1, 2, 3) << std::endl;
// write interface
c(1,2,3) = 10;
std::cout << c2(1,2,3) << std::endl; // outputs 10;
// Check to see if tensor is still accessible
if(c) // if not null
foo(); |
Point #3 is taken: branching and non-deterministic simultaneous graph-evaluation calls. With a runtime allocator, it is possible that tensors will be shuffled around in memory in response to system state. There are two ways to facilitate this:
**Tensor-lookups**
@mbartling We should keep the syntax simple. I prefer not explicitly making pass-by-reference a requirement for tensor-handles. How would you implement this? |
"We should keep the syntax simple. I prefer not explicitly making pass-by-reference a requirement for tensor-handles. How would you implement this? Tensor c2 = c;" |
@mbartling I tried to monkey around with
|
Can you give me some requirements for user interface here? I much prefer we only have one allocation of the Tensor handle with shared references. It makes cleanup 10000x easier. |
I think the idea is that an update to a tensor ptr is also propagated to the other tensors which hold references to it. |
One common theme around the syntax here for both examples: I think what @Knight-X is suggesting is having references to pointers. That might work? @mbartling, there are cases to be made for runtime allocation (multiple inferences and a shared tensor-arena). Though dynamically moving tensors after they are allocated may not be an immediate requirement. My view is that tensor-handles are ok, given that we can meet the requirements mentioned above. Otherwise, I think the complexity outweighs the benefit at this point. |
@neil-tan TensorHandles are references to pointers XD As for the That's easy as long as we differentiate between a tensor-handle copy and an "original". The easiest solution here is to maintain a union ptr to either a
// Tensors also appear on the same heap as the Tensor metadata. This way we can move tensors around and delete them without affecting user code
//template <typename Allocator=utensor::DefaultTensorMetaDataAllocator>
enum class TensorDataFieldType {
TensorType, TensorInterfaceType
};
class Tensor {
private:
TensorDataFieldType type;
union {
utensor::TensorInterface* tensor_interface_ptr;
const Tensor* tensor_ptr; //a reference cannot be a union member, so store a pointer
} _reference;
public:
Tensor(const Tensor& that) {
type = TensorDataFieldType::TensorType;
_reference.tensor_ptr = &that;
} // Assume we have the assignment op as well
utensor::TensorInterface* operator->() const {
if(type == TensorDataFieldType::TensorType){
return _reference.tensor_ptr->operator->(); // Returns a TensorInterface*
} else {
return _reference.tensor_interface_ptr; // Also a TensorInterface*
}
}
Tensor(utensor::TensorInterface* ptr) : type(TensorDataFieldType::TensorInterfaceType) {
_reference.tensor_interface_ptr = ptr; //union members cannot use member.field syntax in an init list
Context::DefaultTensorMetaDataAllocator::bind(this, ptr);
}
// Add some bits to make the interface nicer to the user
// Force everything to be on the utensor allocator
void* operator new(size_t sz) { // Have to delegate this size from tensors somehow + sizeof(Tensor)
void* p = Context::DefaultTensorMetaDataAllocator::allocate(sz);
return p;
}
void operator delete(void* p) {
Context::DefaultTensorMetaDataAllocator::deallocate(p);
}
// KEY BIT
friend class utensor::AllocatorInterface;
}; |
Summarizing an offline discussion I had with @mbartling:
|
Draft for context changes
Things to consider:
Code snippet below: