Python Line Profilers using Decorator Pattern

You can use any of the following decorators to profile your functions line by line.  The first one(Profiler1) is a decorator as a class and the second one(Profiler2) is a decorator as a function example. What each one does is as follows: They take a function to run as an argument, they run the function with its parameters and the cProfile module analyzes that function as it runs. After profiler finishes analyzing, it prints out the results ordered by the cumulative time passed during execution of that function. To run the profiler on a function just write @ClassLineProfiler or @fun_line_profiler above the function definition you want to analyze. See example below.

Code with sample usage:

import cProfile
import pstats
import StringIO
import numpy as np


# Profiler1
class ClassLineProfiler(object):
    def __init__(self, fn):
        self.fn = fn


    def __call__(self, *args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        retVal = self.fn(*args, **kwargs)
        pr.disable()
        s = StringIO.StringIO()
        sortBy = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortBy)
        ps.print_stats()
        print(s.getvalue())
        return retVal


# Profiler2
def fun_line_profiler(fn):
    def wrapper(*args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        retVal = fn(*args, **kwargs)
        pr.disable()
        s = StringIO.StringIO()
        sortBy = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortBy)
        ps.print_stats()
        print(s.getvalue())
        return retVal

    return wrapper


# Linear Forward Pass
def forward(x, w):
    return x * w


# Loss function
def loss(x, y, w):
    y_pred = forward(x, w)
    return (y_pred - y) ** 2

# @fun_line_profiler
@ClassLineProfiler
def run_linear_neural_network():
    x_data = np.array([1.0, 2.0, 3.0])
    y_data = np.array([2.0, 4.0, 6.0])
    N = x_data.size + 1

    for w in np.arange(0.0, 4.1, 0.1):
        print("w=", w)
        l_sum = 0
        for x_val, y_val in zip(x_data, y_data):
            y_pred_val = forward(x_val, w)
            loss_val = loss(x_val, y_val, w)
            l_sum += loss_val
            print("x:{}, y:{}, y_pred:{}, loss:{}".format(
                            x_val, y_val, y_pred_val, loss_val))
            print("MSE=", l_sum / N)

run_linear_neural_network()




Faster RCNN PyTorch Download, Train and Test on COCO 2014 dataset

1) Get the files from Ruotian Luo's github repository.
     git clone https://github.com/ruotianluo/pytorch-faster-rcnn.git
2) Get into the faster rcnn directory
     cd pytorch-faster-rcnn/
3) Determine your achitecture. My GPU model is nVidia Tesla P100 and  so the corresponding architecture according to this website is sm_60. The architecture will be provided to set the Nvidia Cuda Compiler's(nvcc) -arch flag in the 4th step below.
4) Compile and Build RoI Pooling module
     cd lib/layer_utils/roi_pooling/src/cuda
     nvcc -c -o roi_pooling_kernel.cu.o roi_pooling_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_60
     cd ../../
     python build.py

     cd ../../../
5) Compile and Build Non-maximum Suppression(NMS) module
     cd lib/nms/src/cuda
     nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_60
     cd ../../
     python build.py
     cd ../../

 6) Install python COCO API under data folder.
     cd data
     git clone https://github.com/pdollar/coco.git
     cd coco/PythonAPI
     make
     cd ../../..

7) (OPTIONAL) In case your COCO data set(images and annotation) have to reside under a different directory you can create a symbolic link to it under data folder.
#move all files under coco (obtained after 6th step above) to shared coco folder
     mv data/coco/* /YOURSHAREDDATASETS/coco
#create symbolic link to that coco folder
     cd data
     rm -rf coco
     ln -s
/YOURSHAREDDATASETS/coco coco
8) Download proposals and annotation json files from here
9) After you downloaded annotations, place them under coco/annotations folder. The coco folder structure should look like below. To download default COCO images and annotations please check here.

coco/
        annotations/
                             instances_minival2014.json
                             instances_valminusminival2014.json
                             ...(default coco json files)
        images/
                    train2014/...
                    val2014/...
                    test2014/...
        PythonAPI/...
10) Create imagenet_weights folder under data. A pre-trained model on ImageNet dataset will reside here.
     mkdir -p data/imagenet_weights
11) Download the pre-trained ResNet model(the resnet101-caffe one) model from here into the imagenet_weights folder and rename it.
    cd data/imagenet_weights
    mv resnet101-caffe.pth res101.pth
     cd ../..   

12)Run training script
     GPU_ID=0
     DATASET=coco
     MODEL=res101 
     ./experiments/scripts/train_faster_rcnn.sh $GPU_ID $DATASET $MODEL
13) Test your model upon successful trainig.
#change the iteration number in test_faster_rcnn.sh to
#the number of iterations you made. I made 460000 iterations the default is 490000. 
     GPU_ID=0
     DATASET=coco
     MODEL=res101 
     ./experiments/scripts/test_faster_rcnn.sh $GPU_ID $DATASET $MODEL




Install NVIDIA Driver on Ubuntu 16.04 LTS

1) Log out from GUI

2) Open a terminal (tty) by using the Ctrl + Alt + F6 key combinations.

3) Login to the system by typing your username and password

4) Stop X-server:
    sudo service lightdm stop

5) Disable nouveau driver
    - Create the following file /etc/modprobe.d/blacklist-nouveau.conf 
    - Inside the file write those two lines :
     blacklist nouveau 
     options nouveau modeset=0

6) Update the kernel
    sudo update-initramfs -u

7) Download latest compatible NVIDIA driver from NVIDIA web site( http://www.nvidia.com/Download/index.aspx ) My driver was NVIDIA-Linux-x86_64-384.59.run

8) To install the NVIDIA driver, make sure that you have the right version of the gcc chosen. For my NVIDIA driver, I needed to choose gcc 5.0. To achieve this use
    sudo update-alternatives --config gcc 

9) Uninstall previously installed NVIDIA drivers
    sudo apt-get purge nvidia-*
    sudo apt autoremove

10) Run installation file.
      sudo sh NVIDIA-Linux-x86_64-384.59.run

11) Check if you succesfully installed the driver. The following command should print your NVIDIA driver version ang GPU Memory Usage.
     nvidia-smi

 12) Start X-server
     sudo service lightdm start

Knapsack Problem

The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a mass and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.
The problem often arises in resource allocation where there are financial constraints and is studied in fields such as combinatorics, computer science, complexity theory, cryptography and applied mathematics. (http://en.wikipedia.org/wiki/Knapsack_problem)

So the goal here is to program the following model. 

Maximize    \qquad \sum_{i=1}^n v_ix_i

subject to   \qquad \sum_{i=1}^n w_ix_i \leqslant W, \quad \quad x_i \in \{0,1\}

We are going to use dynamic programming technique to code the problem in python.
Input Format
A knapsack input contains n + 1 lines. The first line contains two integers, the first is the number of items in the problem, n. The second number is the capacity of the knapsack, W. The remaining
lines present the data for each of the items. Each line, i ∈ 0 . . . n − 1 contains two integers, the
item’s value v i followed by its weight w i .
Input File content Format
n W
v_0 w_0
v_1 w_1
...
v_n-1 w_n-1
The output contains a knapsack solution and is made of two lines. The first line contains two values
obj and opt. obj is the total value of the items selected to go into the knapsack (i.e. the objective
value). opt should be 1 if your algorithm proved optimality and 0 otherwise. The next line is a list
of n 0/1-values, one for each of the x i variables. This line encodes the solution.
the solver.py file contains the code that implements the dynamic programming way of knapsack problem. solver.py
#!/usr/bin/python
# -*- coding: utf-8 -*-

from collections import namedtuple
Item = namedtuple("Item", ['index', 'value', 'weight'])

def solve_it(input_data):
    # parse the input
    lines = input_data.split('\n')
    firstLine = lines[0].split()
    item_count = int(firstLine[0])
    capacity = int(firstLine[1])
    items = []
    for i in range(1, item_count+1):
        line = lines[i]
        parts = line.split()
        items.append(Item(i-1, int(parts[0]), int(parts[1])))

    value = 0
    weight = 0
    taken = [0]*len(items)
    #create  [2  by capacity+1] dimensioned capacity matrix
    #use dynamic programming
    n_rows=capacity+1  
    n_cols=item_count+1
    cap_matrix = [[0]*(2) for i in range(n_rows)] 
    for col_item in range(1,n_cols): 
        for row in range(1,n_rows):
            if items[col_item-1].weight <= row:
                #check if not taking current item yields in a better value
                cap_matrix[row][1] = max(cap_matrix[row-items[col_item-1].weight][0] + items[col_item-1].value, cap_matrix[row][0])
            else: 
                #our item doesnt fit yet get the value of previous item
                cap_matrix[row][1] = cap_matrix[row][0]
        f = open("helper.txt", "w")
        f.write(mylist)   
    
    #set the optimum value
    value = cap_matrix[n_rows-1][n_cols-1]
    #print the capacity matrix
    #print('\n'.join([''.join(['{:4}'.format(item) for item in row]) 
    #  for row in cap_matrix]))
    
    #lets traceback to see which items are taken 
    n_cols = n_cols-1
    n_rows = n_rows-1
    while(n_rows>0 and n_cols>0):
        if(cap_matrix[n_rows][n_cols] != cap_matrix[n_rows][n_cols-1]):
            n_cols = n_cols-1
            taken[n_cols] =1   #item selected
            n_rows = n_rows - items[n_cols].weight #decrease capacity by items weight
        else:
            n_cols = n_cols-1
            taken[n_cols] =0    
    # prepare the solution in the specified output format
    output_data = str(value) + ' ' + str(0) + '\n'
    output_data += ' '.join(map(str, taken))
    return output_data


import sys
if __name__ == '__main__':
    if len(sys.argv) > 1:
        file_location = sys.argv[1].strip()
        input_data_file = open(file_location, 'r')
        input_data = ''.join(input_data_file.readlines())
        input_data_file.close()
        print solve_it(input_data)
    else:
        print 'This test requires an input file.  Please select one from the data directory. (i.e. python solver.py ./data/ks_4_0)'

Simple Graph File

The directef graph file (dgraph.py) content:
#!/usr/bin/python
from collections import namedtuple
import Queue
import priority_dict

Edge = namedtuple("Edge", ['fromm','to', 'weight','color'])

""" directed graph class"""
class dgraph:
    def __init__(self):
        # empty underlying dictionary
        self.g = {}
        
    """ adds an edge from fromN to toN. if fromN or toN doesn't
        exist then they are created """
    def addEdge(self,fromN,toN,w=0,color='white'):
        #check if fromN Node is not available
        if not self.g.has_key(fromN): 
            self.g[fromN]=[]
        #check if toN Node is not available
        if not self.g.has_key(toN): 
            self.g[toN]=[]
        self.g[fromN].append(Edge(fromN,toN,w,color))
        
    """ adds a new node to the graph if it is not already there"""
    def addNode(self, N):
        if not self.g.has_key(N): 
            self.g[N]=[]
            
    """ returns the list of edges from node N"""
    def edgesOf(self, N):
        if self.g.has_key(N):
            edges=[]
            for edge in self.g[N]:
                edges.append(edge.to)
            return edges
        return None
    
    """ dyes edge from node fromN to node toN with the given color"""
    def setEdgeColor(self,fromN,toN,color):
        for edge in (self.g[fromN]):
            if edge.to == toN:
                self.g[fromN].remove(edge)
                self.g[fromN].append(Edge(fromN,toN,edge.weight,color))
                
    """ gets teh color of the edge from fromN node to toN node"""
    def edgeColor(self,fromN,toN):
        for edge in self.g[fromN]:
            if edge.to == toN:
                return edge.color
            
    """ returns nodes of the graph as a list"""
    def nodes(self):
        return self.g.keys()

    """ returns number of nodes in te graph"""
    def numOfNodes(self):
        return len(self.g.keys())

    """ returns number of edges in the graph"""
    def numOfEdges(self):
        num=0
        for key in self.g.keys():
            num = num + len(self.g[key])
        return num

    """ returns number of edges node N have """
    def numOfEdgesNode(self,N):
        return len(self.g[N])

    """ returns True if there is an edge from sourceN node to toN node
    False otherwise"""
    def hasEdge(self,fromN, toN):
        for edge in self.g[fromN]:
            if edge.to == toN:
                return True
        return False

    """ returns the weight from fromN node to toN node,
                None if there is no edge between them"""
    def weight(self,fromN,toN):
        for edge in self.g[fromN]:
            if edge.to == toN:
                return edge.weight
        return None

    def readFileContent(self,file_name):
        file_location = file_name.strip()
        input_data_file = open(file_location, 'r')
        input_data = ''.join(input_data_file.readlines())
        input_data_file.close()
        return input_data
            
    """ implements dijkstra's shortest path algorithm """
    def shortestPathDijkstra(self,fromN,toN=None):
        dist ={} #define distance dictionary
        prev ={} #define previous node dictionary
        S =[] #list of shortest path
        Q = priority_dict.priority_dict()
        #initialize distances and predecessors
        for node in self.g.keys():
            if node != fromN :
                dist[node]=float('inf')
            else:
                dist[node] =0
            prev[node]=None
            Q.setdefault(node,dist[node])

        while not Q.empty():
            u = Q.pop_smallest()
            if u == toN:
                break
            if dist[u] == float('inf'):
                break
            #traverse each neighbour of u
            for edge in self.g[u]:
                alt = dist[u] + self.weight(u,edge.to)
                if alt < dist[edge.to]:
                    dist[edge.to] = alt
                    prev[edge.to] = u
                    Q.update([(edge.to,alt)])

        while prev[toN] != None:
            S.append(toN)
            toN = prev[toN]
        S.append(toN)
        return  S
                    
if __name__ == '__main__':
    x = dgraph()
    x.addEdge('1','2',w=6)
    x.addEdge('1','3',w=2)
    x.addEdge('1','4',w=16)
    x.addEdge('2','4',w=5)
    x.addEdge('2','5',w=4)
    x.addEdge('3','2',w=7)
    x.addEdge('3','5',w=3)
    x.addEdge('3','6',w=8)
    x.addEdge('4','7',w=3)
    x.addEdge('5','4',w=4)
    x.addEdge('5','7',w=10)
    x.addEdge('6','7',w=1)
    
    """x.addNode('Yalova')
    x.addEdge('Adana','Yalova',weight=300)
    x.addEdge('Yalova','Istanbul',weight=200,color='blue')
    #x.addEdge('Yalova','Adana',600)
    print x.edgesOf('Yalova')
    print x.edgesOf('Adana')
    print x.edgeColor('Yalova','Istanbul')
    x.setEdgeColor('Adana','Istanbul','red')
    print x.edgeColor('Adana','Istanbul')
    print x.numOfNodes(),x.nodes(),x.numOfEdges(),x.numOfEdgesNode('Yalova')
    print x.hasEdge('Adana','Istanbul')
    print x.hasEdge('Adana','Adana')"""
    print x.shortestPathDijkstra('1','7')

The following file is taken from
http://code.activestate.com/recipes/522995-priority-dict-a-priority-queue-with-updatable-prio/
which has the implementation of a priority queue with updatable priorities.

priority_dict.py
from heapq import heapify, heappush, heappop
 
class priority_dict(dict):
    """Dictionary that can be used as a priority queue.
 
    Keys of the dictionary are items to be put into the queue, and values
    are their respective priorities. All dictionary methods work as expected.
    The advantage over a standard heapq-based priority queue is
    that priorities of items can be efficiently updated (amortized O(1))
    using code as 'thedict[item] = new_priority.'
 
    The 'smallest' method can be used to return the object with lowest
    priority, and 'pop_smallest' also removes it.
 
    The 'sorted_iter' method provides a destructive sorted iterator.
    """
     
    def __init__(self, *args, **kwargs):
        super(priority_dict, self).__init__(*args, **kwargs)
        self._rebuild_heap()
 
    def _rebuild_heap(self):
        self._heap = [(v, k) for k, v in self.iteritems()]
        heapify(self._heap)
 
    def smallest(self):
        """Return the item with the lowest priority.
 
        Raises IndexError if the object is empty.
        """
         
        heap = self._heap
        v, k = heap[0]
        while k not in self or self[k] != v:
            heappop(heap)
            v, k = heap[0]
        return k
 
    def pop_smallest(self):
        """Return the item with the lowest priority and remove it.
 
        Raises IndexError if the object is empty.
        """
         
        heap = self._heap
        v, k = heappop(heap)
        while k not in self or self[k] != v:
            v, k = heappop(heap)
        del self[k]
        return k
 
    def __setitem__(self, key, val):
        # We are not going to remove the previous value from the heap,
        # since this would have a cost O(n).
         
        super(priority_dict, self).__setitem__(key, val)
         
        if len(self._heap) < 2 * len(self):
            heappush(self._heap, (val, key))
        else:
            # When the heap grows larger than 2 * len(self), we rebuild it
            # from scratch to avoid wasting too much memory.
            self._rebuild_heap()
 
    def setdefault(self, key, val):
        if key not in self:
            self[key] = val
            return val
        return self[key]
 
    def update(self, *args, **kwargs):
        # Reimplementing dict.update is tricky -- see e.g.
        # http://mail.python.org/pipermail/python-ideas/2007-May/000744.html
        # We just rebuild the heap from scratch after passing to super.
         
        super(priority_dict, self).update(*args, **kwargs)
        self._rebuild_heap()
 
    def sorted_iter(self):
        """Sorted iterator of the priority dictionary items.
 
        Beware: this will destroy elements as they are returned.
        """
         
        while self:
            yield self.pop_smallest()
    def empty(self):
        return (len(self._heap) == 0)

Generic Circular Buffer

This is the generic version of the Circular Buffer I have posted before. You can find the non-generic version of the circular buffer here


The header file:
CircularBuffer.hpp

#ifndef MCP_CIRCULAR_BUFFER_GENERIC_H_
#define MCP_CIRCULAR_BUFFER_GENERIC_H_

template<class T>
class CircularBuffer {
public:
  // Creates a buffer with 'slots' slots.
  explicit CircularBuffer(int slots);
  // Destructor.
  ~CircularBuffer();
  // Writes 'value' to the next available slot. It may overwrite
  // values that were not yet read out of the buffer.
  void write(const T & value);
  // Returns the next value available for reading, in the order they
  // were written, and marks slot as read. 
  // If the buffer is empty returns -1.
  void read(T* val);
  // Removes all the elements from the buffer.
  void clear();
  // returns true if the Circular buffer is empty, false otherwise
  bool isEmpty();

private:
  //array of integers
  T* data_;
  // the size of the buffer
  int  num_of_slots_;
  //index to read the next integer from buffer
  int  read_index_;
  //index to write a new integer to buffer
  int  write_index_;
  // Non-copyable, non-assignable.
  CircularBuffer(CircularBuffer&);
  CircularBuffer& operator=(const CircularBuffer&);
};

template <class T>
CircularBuffer<T>::CircularBuffer(int slots) {
  if (slots <= 0) {
    num_of_slots_ = 10; /*pre-assigned value */
  } else {
      num_of_slots_ = slots;
  }
  clear();
}

template <class T>
CircularBuffer<T>::~CircularBuffer() {
  delete[] data_;
}

template <class T>
void CircularBuffer<T>::write(const T & value) {
  data_[write_index_] = value;
  if (read_index_ == -1) {
    //if buffer is empty, set the read index to the
    //current write index. because that will be the first
    //slot to be read later.
    read_index_ = write_index_;
  }
  write_index_ = (write_index_ + 1) % num_of_slots_;
}

template <class T>
void CircularBuffer<T>::read(T* val) {
  if (read_index_ != -1) {  // if buffer is not empty
    *val = data_[read_index_];
    read_index_ = (read_index_ + 1) % num_of_slots_;
    if (read_index_ == write_index_) {
      /*all available data is read, now buffer is empty*/
      read_index_ = -1;
    }
  }
}

template <class T>
void CircularBuffer<T>::clear() {
  read_index_ = -1; /* buffer empty */
  write_index_ = 0; /* first time writing to that buffer*/
  delete[] data_;
  data_ = new T[num_of_slots_]; /* allocate space for buffer */
}

template <class T>
bool CircularBuffer<T>::isEmpty() {
  return read_index_ == -1;
}

#endif  // MCP_CIRCULAR_BUFFER_GENERIC_H_

Sample main file main.cpp

#include <iostream>
#include "include/CircularBuffer.hpp"

using namespace std;

int main()
{
    CircularBuffer<unsigned int> cb(5);
    unsigned int val;
    for (unsigned int i=1;i<=15;i++)
        cb.write(i);
    while(!cb.isEmpty())
    {
        cb.read(&val);
        cout << val << endl;
    }
    return 0;
}

Circular Buffer C++

circular buffer, also known as cyclic buffer or ring buffer is a special data structure that uses a single, fixed-size buffer as if it were connected end-to-end. That is, if you reach writing to the last available slot in the buffer, the next slot to write into is the first slot of the buffer. You can find the gereric version here


The following code is a sample implementation of that structure. You can freely use them if you need to. The code standard I am trying to follow is Google's C++ Style.


The header file:
circular_buffer.hpp

#ifndef MCP_CIRCULAR_BUFFER_H_
#define MCP_CIRCULAR_BUFFER_H_

namespace base {
// This is a fixed-size, circular buffer of integer.
// The class is not thread-safe,
class CircularBuffer {
public:

  // Creates a buffer with 'slots' slots.
  explicit CircularBuffer(int slots);
  // Destructor.
  ~CircularBuffer();
  // Writes 'value' to the next available slot. It may overwrite
  // values that were not yet read out of the buffer.
  void write(int value);
  // Returns the next value available for reading, in the order they
  // were written, and marks slot as read. If the buffer is empty returns -1.
  int read();
  // Removes all the elements from the buffer.
  void clear();

private:
  //array of integers
  int* data_;
  // the size of the buffer
  int  num_of_slots_;
  //index to read the next integer from buffer
  int  read_index_;
  //index to write a new integer to buffer
  int  write_index_;

  // Non-copyable, non-assignable.
  CircularBuffer(CircularBuffer&);
  CircularBuffer& operator=(const CircularBuffer&);
};
} // namespace base
#endif  // MCP_CIRCULAR_BUFFER_H_

circular_buffer.cpp

#include "circular_buffer.hpp"

namespace base {

CircularBuffer::CircularBuffer(int slots) {
  if (slots <= 0) {
    num_of_slots_ = 10; /*pre-assigned value */
  } else {
      num_of_slots_ = slots;
  }
  clear();
}

CircularBuffer::~CircularBuffer() {
  delete[] data_;
}

void CircularBuffer::write(int value) {
  data_[write_index_] = value;
  if (read_index_ == -1) {
    //if buffer is empty, set the read index to the 
 //current write index. because that will be the first 
 //slot to be read later.
    read_index_ = write_index_; 
  }
  write_index_ = (write_index_ + 1) % num_of_slots_; 
}

int CircularBuffer::read() {
  if (read_index_ == -1) {
    // buffer empty
    return -1;
  }
  int ret_val = data_[read_index_];
  read_index_ = (read_index_ + 1) % num_of_slots_;
  if (read_index_ == write_index_) {
    /*all available data is read, now buffer is empty*/
    read_index_ = -1;
  }

  return ret_val;  
}

void CircularBuffer::clear() {
  read_index_ = -1; /* buffer empty */
  write_index_ = 0; /* first time writing to that buffer*/
  delete [] data_;
  data_ = new int[num_of_slots_]; /* allocate space for buffer */
}

}  // namespace base

Python Line Profilers using Decorator Pattern

You can use any of the following decorators to profile your functions line by line.  The first one(Profiler1) is a decorator as a class and...