http://compute.dtu.dk/~abll//Rants by ABLL2014-11-10T15:42:58+01:00Anders Boesen Lindbo Larsenhttp://compute.dtu.dk/~abll/http://compute.dtu.dk/~abll/blog/cudarrayIntroducing CUDArray2014-11-08T00:00:00+01:00<p>Neural networks and deep learning are booming (still).
Quite a few software frameworks have appeared over the last year, though none that allow high-level Python/NumPy programming with fast underlying array operations.
In this post, I present my attempt at making the two ends meet.
<a href="http://github.com/andersbll/cudarray">CUDArray</a> is a CUDA-accelerated subset of the NumPy library with support for neural networks as its primary goal.</p>
<p>A couple of weeks ago, I tried implementing a handful of NumPy functions using CUDA which turned out to be pretty fun (I just had to fill in the blanks for NumPy's fine interface).
With my appetite whetted, I started developing CUDArray and a deep learning library, <a href="http://github.com/andersbll/deeppy">deeppy</a>, on top of it.
I plan on elaborating on deeppy in another post - in the meantime, I encourage you to look at the <a href="http://github.com/andersbll/deeppy/tree/master/examples">examples</a>.</p>
<p>The behavior of CUDArray resembles that of <a href="http://github.com/cudamat/cudamat">CUDAMat</a>/<a href="http://www.cs.toronto.edu/%7Etijmen/gnumpy.html">Gnumpy</a> to a large degree and you might ask why I didn't just build on top of these libraries.
Most notably, CUDAMat uses <a href="https://docs.python.org/3/library/ctypes.html">ctypes</a> whereas CUDArray uses <a href="http://cython.org/">Cython</a> to wrap C/C++ code.</p>
<p>I should warn you that CUDArray is still in its infancy and that there is a lot of work to be done to mature and extend the framework.
However, I find CUDArray/deeppy a very promising approach to deep learning considering the few weeks it took to code.</p>
<p>Have a look at the <a href="http://github.com/andersbll/cudarray">source</a> and be sure to check out the <a href="/%7Eabll/pubs/larsen2014cudarray.pdf">technical report</a> since it's the only documentation I have written so far!</p>
http://compute.dtu.dk/~abll/blog/fft_based_cnnFFT-based convolutional neural networks2014-07-14T00:00:00+02:00<p>A few weeks ago, I decided to implement my own convolution operations for the GPU.
My motivation was the need for an implementation that could be easily modified.
Unfortunately, most implementations available online are either slow or a big mess code-wise:</p>
<ul>
<li><a href="http://code.google.com/p/cuda-convnet/">cuda_convnet</a>: Very fast thanks to the highly tuned CUDA code. The convolution functions alone (<a href="http://code.google.com/p/cuda-convnet/source/browse/trunk/src/cudaconv2/weight_acts.cu">1</a>, <a href="http://code.google.com/p/cuda-convnet/source/browse/trunk/src/cudaconv2/filter_acts.cu">2</a>, <a href="http://code.google.com/p/cuda-convnet/source/browse/trunk/src/cudaconv2/img_acts.cu">3</a>) span several thousand lines of code. Impossible to edit for anyone besides the original author.</li>
<li><a href="http://www.deeplearning.net/software/theano/">Theano</a>: Slower than cuda_convnet and still a bit messy, relying on shared memory and other CUDA tricks.</li>
<li><a href="http://caffe.berkeleyvision.org/">Caffe</a>: In my experience a factor of 3 slower than cuda_convnet (the authors state otherwise). Their implementation is nice and simple consisting of <a href="http://www.mathworks.se/help/images/ref/im2col.html">im2col</a> operations and cuBLAS matrix multiplications.</li>
</ul>
<p>I played around with all three above and even tried to do my own vanilla CUDA implementation (Big mistake! The performance was ~8 times slower than cuda_convnet).</p>
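<p>For reference, the im2col-plus-matrix-multiplication idea mentioned for Caffe can be sketched in a few lines of NumPy. This is only an illustration of the technique, not Caffe's code: the function names <code>im2col</code> and <code>conv_im2col</code> and the array layouts are my own assumptions, and the loops stand in for what would be a GPU kernel followed by a cuBLAS call.</p>

```python
import numpy as np


def im2col(img, fh, fw):
    """Unfold a (channels, height, width) image into a matrix whose columns
    are the flattened fh x fw patches at every 'valid' filter position."""
    c, h, w = img.shape
    out_h, out_w = h - fh + 1, w - fw + 1
    cols = np.empty((c * fh * fw, out_h * out_w), dtype=img.dtype)
    col = 0
    for y in range(out_h):
        for x in range(out_w):
            cols[:, col] = img[:, y:y + fh, x:x + fw].ravel()
            col += 1
    return cols


def conv_im2col(img, filters):
    """Cross-correlate img (c, h, w) with filters (f, c, fh, fw).

    The filters are not flipped, which is the usual convention in CNN code.
    Returns an array of shape (f, h - fh + 1, w - fw + 1).
    """
    f, c, fh, fw = filters.shape
    cols = im2col(img, fh, fw)
    # All filter responses at all positions in a single matrix multiplication.
    out = filters.reshape(f, -1) @ cols
    return out.reshape(f, img.shape[1] - fh + 1, img.shape[2] - fw + 1)
```

<p>The price of this simplicity is memory: every pixel is copied once per filter position that covers it.</p>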
<p>I then discovered the paper <a href="http://arxiv.org/abs/1312.5851">Fast Training of Convolutional Networks through FFTs</a>, which was quite an interesting read (if only I had found it earlier!).
FFT-based convolutions had crossed my mind before, but I suspected the filter sizes were too small for the convolutions to be worthwhile in Fourier domain.
As it turns out, FFT-based convolutions are quite competitive; mainly for the following reasons:</p>
<ul>
<li>The Fourier transformations of filters can be reused as the filters are convolved with multiple images in a mini-batch.</li>
<li>The Fourier transformations of the output gradients can be reused when back propagating gradients to both filters and input images.</li>
<li>Summation over input channels can be performed in the Fourier domain, such that inverse Fourier transformations are only required once per output channel per image.</li>
<li>Efficient, batched FFT implementations are available using <a href="http://docs.nvidia.com/cuda/pdf/CUFFT_Library.pdf">cuFFT</a>.</li>
<li>The point-wise products in the Fourier domain (summed over input channels) can be implemented as batched matrix multiplications using <a href="http://docs.nvidia.com/cuda/pdf/CUBLAS_Library.pdf">cuBLAS</a>.</li>
</ul>
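<p>The points above can be sketched in NumPy as follows. This is a minimal illustration, not my CUDA implementation: <code>np.fft</code> stands in for cuFFT, <code>np.einsum</code> stands in for the batched cuBLAS multiplications, and the function name <code>fft_conv</code> and the (batch, channel, height, width) layout are assumptions made for the example.</p>

```python
import numpy as np


def fft_conv(imgs, filters):
    """'Valid' convolution of imgs (b, c, h, w) with filters (f, c, fh, fw).

    Returns an array of shape (b, f, h - fh + 1, w - fw + 1).
    """
    b, c, h, w = imgs.shape
    f, _, fh, fw = filters.shape
    # Transform everything once. The filter transforms (zero-padded to the
    # image size) are shared by all images in the mini-batch.
    imgs_f = np.fft.rfft2(imgs, s=(h, w))
    filters_f = np.fft.rfft2(filters, s=(h, w))
    # Point-wise products summed over input channels: for each frequency this
    # is a (b, c) x (c, f) matrix product.
    out_f = np.einsum('bchw,fchw->bfhw', imgs_f, filters_f)
    # Only one inverse transform per output channel per image.
    out = np.fft.irfft2(out_f, s=(h, w))
    # The FFT computes a circular convolution; it agrees with the linear one
    # wherever the filter does not wrap around the image border.
    return out[:, :, fh - 1:, fw - 1:]
```

<p>Note that this computes a true convolution (flipped filters); a cross-correlation would need a complex conjugate on one of the transforms and a different crop.</p>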
<p>I also discovered that Sander Dieleman had experimented with <a href="http://benanne.github.io/2014/05/12/fft-convolutions-in-theano.html">FFT convolutions for Theano</a>.
Unfortunately, his implementation does not currently include back propagation of gradients.
Moreover, it is written in high-level Theano, which I suspect is not flexible enough for an efficient implementation.</p>
<h2 id="my-implementation">My implementation</h2>
<p>After the above failed attempts at doing my own convolutions, the FFT approach was a refreshing angle.
It took some time to figure out the functioning of the <a href="http://docs.nvidia.com/cuda/cufft/#function-cufftplanmany">batched</a> cuFFT operations with <a href="http://docs.nvidia.com/cuda/cufft/#advanced-data-layout">advanced data layout</a> (which, btw., I'd prefer any day over fiddling with indexing errors in ordinary convolutions!).</p>
<p>I now have a working implementation with the following highlights:</p>
<ul>
<li>Supports back-propagation of gradients for both input images and filters.</li>
<li>Supports 0-padding of image borders.</li>
<li>No limitations on filter sizes / number of channels. Though, one should aim for image dimensions whose only prime factors are 2, 3, 5, and 7, as cuFFT is fastest for such sizes.</li>
<li>Contains (almost) no GPU architecture specific fine-tuning. This is taken care of by cuBLAS/cuFFT.</li>
<li>Relatively simple implementation (under <a href="http://github.com/andersbll/theano_ops/blob/master/theano_ops/abll/src/abll/conv_bc01_fft.cu">350 lines</a> at the time of writing).</li>
</ul>
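<p>Since cuFFT is tuned for sizes that factor into 2, 3, 5 and 7, one option is to pad each image dimension up to the nearest such size. The helper below is a small sketch of how that size could be found; the names <code>is_smooth</code> and <code>next_fast_size</code> are hypothetical and not part of my implementation.</p>

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """Return True if n has no prime factors other than those in primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_fast_size(n):
    """Smallest integer >= n whose only prime factors are 2, 3, 5 and 7."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    while not is_smooth(n):
        n += 1
    return n
```

<p>For example, an image of width 97 would be padded to 98 (2 x 7 x 7), and a width of 64 is left as it is.</p>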
<p>The implementation is still WIP but looks promising in terms of speed.
It even comes with a crude Theano wrapper.
Benchmarks will follow as soon as the Theano integration is done.
I have yet to figure out how to properly handle buffers and reusing FFTs in back propagation functions.</p>
<p><a href="http://github.com/andersbll/theano_ops">Check it out today!</a></p>
http://compute.dtu.dk/~abll/blog/simple_cnnA simple implementation of convolutional neural networks2014-05-22T00:00:00+02:00<p>I was recently asked for a simple implementation of a convolutional neural network (CNN).
The purpose was to let GPU-savvy programmers understand the problem by inspecting the code, and to serve as a reference for their optimized implementation.
This request reignited a frustration from when I myself started looking into CNNs a couple of months ago: There are no easily read implementations available!
Most CNN implementations are either highly optimized GPU code or contain only barebone operations in a non-modular code structure.
In either case, the code is hard to read and the back-propagation algorithm is difficult to recognize.</p>
<p>As I failed to find anything usable online and, more likely, because I'm a computer scientist at heart, I ended up coding my own toy CNN from scratch!
The top priority was simplicity - so Python/NumPy was a given.
For the performance critical operations (convolution and max-pooling), I had to use Cython to get tolerable speed.
The implementation is <a href="https://github.com/andersbll/nnet">available on my Github</a>.
I have even included some <a href="https://github.com/andersbll/nnet/blob/master/examples">usage examples</a> that should work right out of the box - just a <code>git clone</code> away!</p>
<p><strong>Note:</strong> My CNN implementation is not in any way competitive with more mature libraries in terms of features and speed.
The code is only meant as a readable example of feed-forward neural networks.</p>
<h2 id="implementation-tips">Implementation tips</h2>
<p>Here is a list of lessons, some of which were learned in a pretty time-consuming way.</p>
<ol>
<li>Check your back-propagated gradients for correctness using finite-difference calculation! You might as well think this into your program design from the start as this is the only sane way to verify the correctness of your implementation. And yes, bugs are inevitable.</li>
<li>Make the network modular at the layer-level. This is not new if you have studied other popular implementations like <a href="http://code.google.com/p/cuda-convnet/">cuda-convnet</a> or <a href="http://eblearn.sourceforge.net/">EBLearn</a>.
Each layer should then implement a method for forward propagating the input and a method for back propagating the gradients. </li>
<li>Standard library <a href="http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.signal.convolve2d.html">convolution operations</a> are not suitable for CNNs. Typically, you work on a batch of images per gradient update.
Each image in this batch contains multiple channels and you convolve each image with a 3D filter to get one output channel.
If you perform 3D convolution, you are restricted to a 'valid' convolution because the filter should not move along the channel axis.
If you perform 2D convolution, you can perform 'valid', 'same' or 'full' convolutions as you wish. However, you must then perform many separate convolutions, which is not good in terms of efficiency.
Moreover, I have yet to figure out how one would calculate the gradients of the weights from standard convolution operations.</li>
<li>If you reach the edge of what NumPy is good for and it starts getting complicated, use Cython!
I spent a lot of time implementing max-pooling with <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html">striding tricks</a> to allow for sliding windows.
It was a mess in terms of readability.
In comparison, <a href="https://github.com/andersbll/nnet/blob/master/nnet/convnet/pool.pyx">a couple of nested for loops</a> in Cython are both easier to read and faster.</li>
</ol>
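<p>To make tips 1, 2 and 4 concrete, here is a minimal sketch of a max-pooling layer written with plain nested loops, together with a finite-difference gradient check. It is not the code from my repository: the class name <code>MaxPool</code>, the method names <code>fprop</code>/<code>bprop</code> and the helper <code>check_grad</code> are assumptions made for the example, and pure Python loops stand in for the Cython version.</p>

```python
import numpy as np


class MaxPool:
    """Non-overlapping max-pooling on (batch, channels, height, width) input."""

    def __init__(self, size):
        self.size = size

    def fprop(self, x):
        b, c, h, w = x.shape
        s = self.size
        out = np.empty((b, c, h // s, w // s))
        # Remember where each maximum came from for the backward pass.
        self.argmax = np.empty(out.shape + (2,), dtype=int)
        self.in_shape = x.shape
        for n in range(b):
            for ch in range(c):
                for y in range(h // s):
                    for xx in range(w // s):
                        win = x[n, ch, y * s:(y + 1) * s, xx * s:(xx + 1) * s]
                        i, j = np.unravel_index(np.argmax(win), win.shape)
                        out[n, ch, y, xx] = win[i, j]
                        self.argmax[n, ch, y, xx] = (y * s + i, xx * s + j)
        return out

    def bprop(self, grad_out):
        # Route each output gradient back to the input that was the maximum.
        grad_in = np.zeros(self.in_shape)
        for idx in np.ndindex(*grad_out.shape):
            i, j = self.argmax[idx]
            grad_in[idx[0], idx[1], i, j] += grad_out[idx]
        return grad_in


def check_grad(layer, x, eps=1e-6):
    """Largest absolute difference between bprop and central finite
    differences for the scalar loss sum(fprop(x) * r), with r fixed."""
    rng = np.random.default_rng(0)
    r = rng.standard_normal(layer.fprop(x).shape)
    analytic = layer.bprop(r)
    numeric = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        plus = np.sum(layer.fprop(x) * r)
        x[idx] = orig - eps
        minus = np.sum(layer.fprop(x) * r)
        x[idx] = orig  # restore the input
        numeric[idx] = (plus - minus) / (2 * eps)
    return np.max(np.abs(analytic - numeric))
```

<p>Because every layer exposes the same two methods, the same check can be pointed at any other layer; a large returned difference means that <code>bprop</code> disagrees with the numerical gradient.</p>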
http://compute.dtu.dk/~abll/blog/hello_worldHello brave new world2014-04-25T00:00:00+02:00<p>After months of rumination, I have finally converged to a good personal website solution.
As it turned out, the toughest part was to choose the right frameworks!
At first, I was reluctant to use the popular <a href="http://jekyllrb.com">Jekyll</a> (I'm not a Ruby programmer) and leaned towards the Python-based <a href="http://docs.getpelican.com/">Pelican</a> instead.
However, Jekyll and its surrounding ecosystem are more mature and I have not run into limitations like I did with my initial attempt using Pelican.</p>
<p>Some highlights of this website are:</p>
<ul>
<li>Static site generation.
This allows me to edit plain text files with <a href="http://en.wikipedia.org/wiki/Markdown">Markdown syntax</a> and have them transformed to static HTML that doesn't require fancy server-side logic.</li>
<li><a href="http://en.wikipedia.org/wiki/Responsive_web_design">Responsive design</a> using <a href="http://getbootstrap.com/">Bootstrap</a>.
Bootstrap also provides a useful library of sane building blocks that allow an HTML/JavaScript illiterate like me to produce cross-platform websites.</li>
<li>Good separation between layout and content.</li>
<li>Easy integration with BibTeX using <a href="http://github.com/inukshuk/jekyll-scholar">Jekyll-Scholar</a>.</li>
<li>I can deploy changes and new content to the website with a simple <code>make deploy</code> command.</li>
</ul>
<p>The source of my website is <a href="http://github.com/andersbll/website">on Github</a> - feel free to grab whatever you might find useful. Also, you should consider adding plenty of <a href="http://html9responsiveboilerstrapjs.com/">HTML9</a> pizzazz!</p>
<h2 id="purpose-of-this-blog">Purpose of this blog</h2>
<p>I have included a blog on my website as an attempt to exercise my communication skills.
While academia is still focused on the traditional publication pipeline through peer-review, there are plenty of good web-based publication alternatives nowadays.
A blog is a practical way to publish work and findings in an informal manner.
Another motivation behind this blog is the chance to dump the academic by-products that would never get published otherwise (e.g. presentations, notes, tutorials, mediocre results).
I suspect these by-products can be put online without too much work - and they might even be useful for others.</p>
<p>So far, this is all cheap talk on my part as I have yet to fill up this website!
My intentions are to write on a monthly basis (a lower bound).
At least, now that I have a website, I have one less excuse for keeping my work to myself.</p>