added specializations to linear unary non-reducing ops: copy and add ops now use existing CUDA functions;
ShiftNode backprop functional for non-packed case/no boundary state
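
A minimal host-side sketch of what the copy/add specialization could look like, assuming the linear unary op has the form out = beta * out + alpha * in. The names LinearUnaryCopy and Path are hypothetical and not taken from the code base, and std::copy_n and the plain loops only stand in for the existing CUDA functions mentioned above:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical marker for which code path handled the call.
enum class Path { Copy, Add, Generic };

// Sketch of a linear, non-reducing unary op: out = beta * out + alpha * in.
// Assumption: the "copy" and "add" cases can be handed to existing routines
// instead of going through the generic element-wise path.
Path LinearUnaryCopy(float beta, const std::vector<float>& in, float alpha,
                     std::vector<float>& out)
{
    const std::size_t n = std::min(in.size(), out.size());

    if (beta == 0.0f && alpha == 1.0f) {
        // Pure copy: stands in for an existing device-to-device copy.
        std::copy_n(in.begin(), n, out.begin());
        return Path::Copy;
    }
    if (beta == 1.0f) {
        // Accumulate: stands in for an existing axpy-style add (out += alpha * in).
        for (std::size_t i = 0; i < n; ++i)
            out[i] += alpha * in[i];
        return Path::Add;
    }
    // Generic fallback for every other alpha/beta combination.
    for (std::size_t i = 0; i < n; ++i)
        out[i] = beta * out[i] + alpha * in[i];
    return Path::Generic;
}
```

The returned Path exists only to make the dispatch observable in the sketch; a real implementation would not need it.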