A lot of progress has been made over the four weeks this update covers, but I will try to keep it a concise and example-centric blog post.
The four weeks were spent building my modular neural network, as planned. It took a bit longer than expected, partly because of a busy schedule and partly because of all the debugging it took to make the process work. However, the standard structure I created beforehand made the build much easier to handle, and my pre-existing calculation code was a reference I could come back to whenever I was stuck.
For the first week (March 18), I mostly worked on the setup phase of the neural network. This is where the “structure” of the network is determined and a new set of parameters is created for each layer. It wasn’t too hard a job since it mostly followed the planning I had done before starting, but a few bugs did come through.
For example, one bug happened because I made the setup calculate the dimensions of the network at each layer, but I did not consider that a fully connected layer cannot simply take a 3D input. So I realized I needed to restrict the structure of the network so that there has to be a flattening layer between the convolution and fully connected layers.
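To show the idea (this is a hypothetical sketch I wrote for this post, not my actual setup code, and the layer spec format is invented), the setup can walk the layers, compute each output shape, and complain if a 3D shape reaches a fully connected layer without a flatten in between:

```python
import numpy as np

# Hypothetical sketch of the dimension check (the spec format and names
# are made up for this post, not my actual setup code).
def infer_shapes(input_shape, layer_specs):
    shape = input_shape
    shapes = [shape]
    for kind, params in layer_specs:
        if kind == "conv":
            k, n_filters = params["kernel"], params["filters"]
            h, w, _ = shape
            shape = (h - k + 1, w - k + 1, n_filters)  # valid convolution
        elif kind == "pool":
            s = params["size"]
            h, w, c = shape
            shape = (h // s, w // s, c)
        elif kind == "flatten":
            shape = (int(np.prod(shape)),)
        elif kind == "fc":
            if len(shape) != 1:  # the bug: a 3D shape reached a fully connected layer
                raise ValueError("add a flattening layer before the fully connected layer")
            shape = (params["units"],)
        shapes.append(shape)
    return shapes
```

Starting from an input shape like (28, 28, 1), a conv → pool → fc stack without a flatten in between now fails loudly at setup instead of crashing somewhere in the middle of a pass.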
Other than that, most of the writing and debugging was done in about 3~4 hours. I also wrote the code to save the network and to ask the user for the specifics of each layer. However, China Cup made it hard to make much more progress beyond a bit of work on the forward pass.
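For the saving part, the gist is something like this (a minimal sketch; the file layout and key names are made up for illustration, not my real format):

```python
import numpy as np

# Minimal sketch of saving/loading; the "layer{i}_{name}" key scheme is
# just for illustration, not my real format.
def save_network(path, layers):  # layers: list of dicts like {"weights": ..., "biases": ...}
    arrays = {}
    for i, layer in enumerate(layers):
        for name, arr in layer.items():
            arrays[f"layer{i}_{name}"] = arr
    np.savez(path, **arrays)  # e.g. path = "network.npz"

def load_network(path, n_layers):
    data = np.load(path)
    return [{key.split("_", 1)[1]: data[key]
             for key in data.files if key.startswith(f"layer{i}_")}
            for i in range(n_layers)]
```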
In the second week, the forward pass of the network was written. The easier layers came first, such as the fully connected, squish, and output layers; then the convolution and pooling layers.
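Roughly, the convolution and pooling passes look like this (a simplified single-channel sketch with no stride or padding, so not the exact code):

```python
import numpy as np

# Simplified single-channel valid convolution: no stride, no padding.
def conv_forward(image, kernel):
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the window by the kernel elementwise and sum
            output[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return output

# Max pooling over non-overlapping size x size windows.
def pool_forward(feature_map, size=2):
    out_h = feature_map.shape[0] // size
    out_w = feature_map.shape[1] // size
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * size:(i + 1) * size, j * size:(j + 1) * size]
            output[i, j] = np.max(window)
    return output
```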
One habit I tried to work into my code in this project was naming my variables before writing the actual calculations. This was especially a problem in my earlier fully connected network, where anonymous references to array elements made the code harder to read and harder to debug. So I laid out the variables and output arrays first, before I started to calculate the values, as in the sketch below.
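For instance, the fully connected and squish passes read much better with named intermediates (simplified, and assuming the “squish” here is a sigmoid):

```python
import numpy as np

# Named intermediates instead of anonymous array indexing.
def fc_forward(inputs, weights, biases):
    weighted_sums = weights @ inputs + biases  # one value per output neuron
    return weighted_sums

def squish_forward(weighted_sums):
    activations = 1.0 / (1.0 + np.exp(-weighted_sums))  # assuming a sigmoid squish
    return activations
```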
I also ran into a problem slicing arrays with NumPy. I expected chained indexing like arr[3:5][3:5] to take a slice of a 2D array, but it took a good half hour of reading documentation and searching tutorials to find out it is actually formatted like arr[3:5, 3:5], which looks less confusing on paper but is much more confusing once variables are involved.
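Concretely, the difference that cost me the half hour:

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)

# Chained indexing slices rows 3-4 first, then tries to take rows 3-4
# of that 2-row result -- which don't exist, so you get an empty array:
wrong = arr[3:5][3:5]    # shape (0, 6), not a 2x2 window

# Comma slicing takes rows 3-4 and columns 3-4 in a single step:
right = arr[3:5, 3:5]    # shape (2, 2)
```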
Anyway, after that was over with, I went through each function to check that it was correct and did some trial runs with pre-formatted parameters. All of this took significantly longer, around 6~7 hours of total work, but was an interesting challenge. The idea of functions serving as building blocks for other functions, which in turn build the program, also pushed me to make my code more efficient. I tried small things such as pre-calculating values that are used multiple times and using functions to save on memory.
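One example of that kind of pre-calculation (illustrative, and again assuming a sigmoid squish): the sigmoid’s derivative can be written in terms of the activation already computed in the forward pass, so the backward pass never has to call np.exp again.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigma'(z) = sigma(z) * (1 - sigma(z)), so the backward pass can reuse
# the activation from the forward pass instead of recomputing np.exp.
def sigmoid_derivative(activation):
    return activation * (1.0 - activation)
```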
In the third and fourth weeks, the focus shifted to the backpropagation and learning parts of the network. This phase fell over the break, so I was able to invest a lot of time into it, which was really needed.
While coding the backpropagation, I realized a few pieces of data were missing from my forward pass stage, fixed a couple of bugs, and then tried to run the network with learning enabled. The network wasn’t doing much better than random chance; in fact, it was returning the same value every time, so it was converging onto one output.
As seen in the time lapse, I had problems with array shapes, slicing, and array multiplication and addition, so I looked back at the code, tried things out, and used my testing file to check how features behave. And that was just the outright errors; the learning part still had problems. After hours of staring at the code (which I didn’t time-lapse because it is literally just scrolling through the code), I found some mistakes, like not dividing the accumulated changes by the batch size and inconsistencies between the forward and backward passes.
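The batch mistake in sketch form (a simplified illustration, not my actual training loop): the accumulated changes have to be divided by the batch size, otherwise the effective step grows with the batch.

```python
import numpy as np

# Simplified weight update: sum per-example gradients, then average.
def apply_batch_update(weights, per_example_grads, learning_rate):
    total_grad = np.sum(per_example_grads, axis=0)
    average_grad = total_grad / len(per_example_grads)  # the division I had forgotten
    return weights - learning_rate * average_grad
```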
Now I am pretty sure the network is working correctly (hopefully) and I am ready to move on. I am still deciding what to do first: either build a small app where someone can write a digit on the screen and the network detects which number it is, or start immediately on the neural network “tutorial”. Since I have a limited amount of time, I think I will just start the tutorial so I can schedule myself to finish more than 80% of it by the end.
The main challenge for me in this explanation program is deciding who I want my audience to be and what I want them to have learned by the end. I definitely want the audience to be wide and the program to be accessible to most people, but I don’t want to limit myself to restating everything everyone else has already said. I also want the audience to gain insight and develop new questions by the end of the process. So I’m trying my best to come up with ideas and goals for this project. I have talked with Mr. Beatty, and he gave me some insight into possible approaches to explaining some of the topics, where I could focus, and some packages I could use. I will try my best to continue this thought process and document it for my blogs and for myself.