It’s good to go back to basics once in a while

I like being randomly given small, simple programming challenges, because after dealing with NBA Stats Tracker for so long, I can find myself straying away from the simplest approaches in favor of something more complicated, that looks “up to par” with the rest of my coding as of late. (Yeah, talking big… I know.)

That false sense of “I’m really good right now” can make you forget of how simple some things are, and the simple ways with which they can be approached. Those simple ways tend to have applications in much more complex problems, at times, so keeping them at the back of your head should be important. “It can’t be that easy” is a thought that makes quite a lot of people mess up what should be the simplest of answers.

I remember during the Parallel Programming course, me and some friends had completed a team project and were to be examined on it orally. Due to my recent surgery, however, I couldn’t be in Patras, so I asked the graduate student to be examined via Skype, either at the same time, or on my own. So, in the end, we were all examined together, albeit I was in Athens with a headset on and couldn’t see anybody, and my friends were at the student’s office. He starts with some difficult questions about why we implemented this this way, how we’re avoiding deadlock, how we’re synchronizing the threads and why. Well, not difficult, but they require an understanding that, in most cases, only the person that picked the particular implementation has, unless the pick was the result of discussion between the group. However, after those questions, he goes ahead and asks, “So if I have 4 threads, how many items does each one have to process in this case?” We’re talking about an example where we’re trying to partition a 10×10 matrix to be processed by 4 threads. Do you see how easy this question is to answer? If you’ve already thought “of course, 25”, you’re right. It’s an easy question, with an easy answer. But after all those more difficult questions, we weren’t expecting a question to have a simple 100 by 4 division as an answer. So all of us were awkwardly silent for a few seconds, contemplating what the answer is, when someone (can’t remember now) hesitantly answered 25. The graduate student chuckled at our hesitation, and moved on.

That’s why you should never be prejudiced about the difficulty of a problem. Always keep some simple tools that you learned during your first semesters at the back of your head. Always consider the simplest of solutions and answers as a possible one, before moving to something more difficult.

Case in point, I was asked by a user at the NLSC forums to help him design an algorithm that find the mode (most popular item/s) of a set of integers. Really simple, right? I thought so too. And then it took me 10 minutes to figure out the solution, because I had completely forgotten how to think small, and simple.

So the answer depends on the kind of integers in the set, and how efficient you want to be with speed and storage. If the set is one of continuous numbers, you just have to know the minimum and range of the set in order to be efficient with your structures. If you iterate through the set and find them out, then you just need an array of length equal to the set’s range, where the first item of that array will correspond to the occurrences of the set’s minimum, the second item is (min+1)’s occurrences, and so on.

int min = set[0];
int max = set[0];
for (int i = 1; i < set.Length; i++)
{
    if (min > set[i])
    {
        min = set[i];
        range = max - min;
    }
    else if (max < set[i])
    {
        max = set[i];
        range = max - min;
    }
}

int[] occurences = new int[range];

for (int i = 0; i < range; i++)
{
    occurences[i] = 0;
}

int max = 0;
for (int i = 0; i < set.Length; i++)
{
    int j = set[i] - min;
    occurences[j]++;
    if (occurences[j] > max)
    {
        max = occurences[j]; // or max++;
    }
}

for (int i = 0; i < range; i++)
{
    if (occurences[i] == max)
    {
        print("%d is a mode of the set.", numbers[i])
    }
}

Of course, this is only useful if the elements of the set are continuous. If they aren’t, you’re wasting space on items that will never have an occurrence count of more than 0. In that case, you’re better off using a dynamic structure. A linked list of structs that have the number and its occurrences, or even better, a C++ map or a C# dictionary.

Here’s an example of how today’s languages make things so much easier:

Dictionary<int, int> occurences = new Dictionary<int, int>();
HashSet modes = new HashSet();
int[] set = ...

int max = 0;
for (int i = 0; i < set.Length; i++) {    if (occurences.ContainsKey(set[i]))    {       occurences(set[i])++;    }    else    {       occurences.Add(set[i], 1);    }    if (occurences(set[i]) >= max)
   {
      if (occurences(set[i]) > max)
	  {
        max = occurences(set[i]);
		modes.Clear();
	  }
	  modes.Add(set[i]);
   }
}

foreach (int i in modes)
{
	Console.WriteLine(i + " is a mode of the set.");
}

I could go into how this helped me realize how spoiled I was by LINQ and how I was forgetting to do the simplest of things (like keeping a max while creating a set, and keeping a set of just those max instances instead of wasting space keeping them all only to limit them later to the ones I’d use), but I won’t waste more of your time. I’m just reminding you that once in a while, it’s nice to go back to basics. Maybe you’ll see things in your code that are overly complex for no reason. Even better, maybe you’ll (re)learn to code better.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s