System design and development by Kevin Duraj

The Graph Traversal Pattern

A graph is a structure composed of a set of vertices (i.e., nodes, dots) connected to one another by a set of edges (i.e., links, lines). The concept of a graph has been around since the late 19th century; however, only in recent decades has there been a strong resurgence in both theoretical and applied graph research in mathematics, physics, and computer science. In applied computing, since the late 1960s, the interlinked table structure of the relational database has been the predominant information storage and retrieval model. With the growth of graph/network-based data and the need to process such data efficiently, new data management systems have been developed. In contrast to the index-intensive, set-theoretic operations of relational databases, graph databases make use of index-free, local traversals. This article discusses the graph traversal pattern and its use in computing.

References:
Marko A. Rodriguez, Peter Neubauer: The Graph Traversal Pattern
http://arxiv.org/abs/1004.1001
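To make "index-free, local traversal" concrete, here is a minimal Python sketch. It assumes an in-memory graph in which each vertex object keeps direct references to its adjacent vertices; the `Vertex` class and `traverse` function are hypothetical names, not taken from the paper. A breadth-first walk then moves from vertex to vertex by following those references, with no global index lookup per hop.

```python
from collections import deque


class Vertex:
    """A vertex that stores direct references to its out-neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []  # adjacent Vertex objects, not ids to look up in an index

    def link(self, other):
        self.out.append(other)


def traverse(start, max_depth):
    """Breadth-first walk from `start`, following only local references.

    Returns the names of vertices reachable within `max_depth` hops,
    in the order they were first visited.
    """
    seen = {id(start)}
    order = [start.name]
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor in vertex.out:  # each hop is a pointer dereference
            if id(neighbor) not in seen:
                seen.add(id(neighbor))
                order.append(neighbor.name)
                queue.append((neighbor, depth + 1))
    return order


if __name__ == "__main__":
    a, b, c, d = (Vertex(n) for n in "abcd")
    a.link(b)
    a.link(c)
    b.link(d)
    c.link(d)
    print(traverse(a, 1))  # ['a', 'b', 'c']
    print(traverse(a, 2))  # ['a', 'b', 'c', 'd']
```

The cost of each step depends only on the current vertex's neighbors, not on the total size of the graph, which is the property the paragraph contrasts with index-based joins.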

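Changing the default text editor to Vim

sudo update-alternatives --config editor

At the prompt, type the selection number of the Vim entry (3 below); the * marks the currently selected editor.
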
  Selection    Path                Priority   Status
------------------------------------------------------------
  0            /bin/nano            40        auto mode
  1            /bin/ed             -100       manual mode
  2            /bin/nano            40        manual mode
* 3            /usr/bin/vim.basic   30        manual mode
  4            /usr/bin/vim.tiny    10        manual mode

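Installing a new hard drive

# identify the new disk (assumed below to be /dev/sdb)
sudo lshw -C disk

# create one partition: n (new), p (primary), 1 (partition number 1-4),
# accept the default first/last sectors, then w (write the table and exit)
sudo fdisk /dev/sdb

# format the partition as ext4 with 4096-byte blocks
sudo mkfs.ext4 -b 4096 /dev/sdb1

# create a mount point and mount the partition
sudo mkdir /data2
sudo mount /dev/sdb1 /data2

# unmount it when needed
sudo umount /data2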

Reference: https://help.ubuntu.com/community/InstallingANewHardDrive

Bash Escape and Replace String

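#!/bin/bash
# How to escape the slashes in a path and use it as a sed replacement string

BASE='/home/kevin/data'

# escape every "/" as "\/", producing \/home\/kevin\/data
ESCAPED=$(printf '%s\n' "$BASE" | sed -e 's/\//\\\//g')
echo "ESCAPED=$ESCAPED"

# replace the first "path" on each line of data_file.txt with the escaped path
# (alternatively, a different delimiter avoids escaping: sed -i "s|path|$BASE|" data_file.txt)
sed -i "s/path/$ESCAPED/" data_file.txt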
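For comparison, the same substitution can be sketched in Python, where replacing text is a plain string operation and the path needs no slash escaping. This assumes, as the sed command does, that the placeholder is the literal word `path` and that only its first occurrence on each line is replaced; `escape_slashes`, `replace_first_per_line`, and `rewrite_file` are hypothetical helper names.

```python
BASE = "/home/kevin/data"


def escape_slashes(s):
    """Return s with every '/' written as a backslash-slash pair, like the sed escape."""
    return s.replace("/", "\\/")


def replace_first_per_line(text, old, new):
    """Replace the first occurrence of `old` on each line, like sed 's/old/new/'."""
    return "".join(line.replace(old, new, 1) for line in text.splitlines(keepends=True))


def rewrite_file(filename, old="path", new=BASE):
    """Edit the file in place, like sed -i; `new` is used verbatim, no escaping."""
    with open(filename, encoding="utf-8") as f:
        text = f.read()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(replace_first_per_line(text, old, new))


if __name__ == "__main__":
    print(escape_slashes(BASE))  # \/home\/kevin\/data
```

Escaping is only needed when the path is pasted into a sed expression that uses "/" as its delimiter; a string API sidesteps the problem.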

Map Reduce Algorithms

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning.

References:
Jimmy Lin, Chris Dyer: Data-Intensive Text Processing with MapReduce
http://www.umiacs.umd.edu/~jimmylin/book.html
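The programming model can be sketched in a single Python process: a mapper emits key-value pairs, a shuffle step groups the values by key, and a reducer combines each group. This toy word count only illustrates the data flow; it assumes in-memory input and leaves out everything the execution framework is responsible for (partitioning across machines, scheduling, fault tolerance). The function names are illustrative, not the API of Hadoop or any other framework.

```python
from collections import defaultdict
from itertools import chain


def mapper(doc_id, text):
    """Emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum all partial counts emitted for one word."""
    yield word, sum(counts)


def map_reduce(documents, map_fn, reduce_fn):
    """Run map, group by key (the "shuffle"), then reduce, all in memory."""
    groups = defaultdict(list)
    mapped = chain.from_iterable(map_fn(k, v) for k, v in documents.items())
    for key, value in mapped:
        groups[key].append(value)
    results = {}
    # Hadoop, for example, presents keys to each reducer in sorted order.
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "The fox"}
    print(map_reduce(docs, mapper, reducer))
```

Because the mapper and reducer only see one record or one key group at a time, a real framework can run many copies of each in parallel; that independence is what the abstraction buys.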

Power law

A power law is a special kind of mathematical relationship between two quantities. When the number or frequency of an object or event varies as a power of some attribute of that object (e.g., its size), the number or frequency is said to follow a power law. For instance, the number of cities having a certain population size is found to vary as a power of the size of the population, and hence follows a power law. The distributions of a wide variety of natural and man-made phenomena follow a power law, including frequencies of words in most languages, frequencies of family names, sizes of craters on the moon and of solar flares, the sizes of power outages, earthquakes, and wars, the popularity of books and music, and many other quantities.

Reference: http://en.wikipedia.org/wiki/Power_law
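A power law can be written y = C * x^(-alpha) for constants C > 0 and alpha. Taking logarithms gives log y = log C - alpha * log x, a straight line on a log-log plot, so a quick rough estimate of alpha is the negated slope of a least-squares line through the points (log x, log y). The sketch below applies that to noise-free synthetic data; `fit_power_law` is a hypothetical helper. On real, noisy distributions this log-log regression can give biased estimates, and maximum-likelihood estimators are usually preferred.

```python
import math


def fit_power_law(xs, ys):
    """Estimate (C, alpha) in y = C * x**(-alpha) by least squares in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx                   # slope of log y against log x
    intercept = mean_y - slope * mean_x  # equals log C
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data with C = 1000 and alpha = 2.
    xs = [1, 2, 4, 8, 16, 32]
    ys = [1000 * x ** -2 for x in xs]
    C, alpha = fit_power_law(xs, ys)
    print(round(C, 6), round(alpha, 6))
```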

Caching nameserver lookups

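sudo apt-get install dnsmasq

# /etc/dnsmasq.conf: answer queries on the loopback address only
sudo vi /etc/dnsmasq.conf
listen-address=127.0.0.1

# /etc/dhcp3/dhclient.conf: keep the local cache first in the nameserver list
# (dhclient rewrites /etc/resolv.conf when it renews a lease)
sudo vi /etc/dhcp3/dhclient.conf
prepend domain-name-servers 127.0.0.1;

# /etc/resolv.conf: ask the local cache first, then the ISP's nameserver
sudo vi /etc/resolv.conf
search yourisp.com
nameserver 127.0.0.1
nameserver xxx.xxx.xxx.xxx

sudo /etc/init.d/dnsmasq restart

# run twice: the second ";; Query time" should drop sharply (answered from the cache)
dig google.com

-= REMOVE =-
sudo apt-get remove --purge dnsmasq
sudo apt-get clean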

Reference: http://embraceubuntu.com/2006/08/02/local-dns-cache-for-faster-browsing/
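What dnsmasq adds in this setup is a local cache: a repeated lookup is answered from memory until the record's time-to-live (TTL) expires, instead of going to the upstream nameserver again. The Python sketch below models only that idea and is not how dnsmasq is implemented. The `TTLCache` class, the `upstream` resolver callable, and the injectable clock are hypothetical stand-ins, chosen so the example runs without network access.

```python
import time


class TTLCache:
    """Remember resolver answers until each answer's TTL (in seconds) expires."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream  # callable: name -> (address, ttl_seconds)
        self.clock = clock
        self.entries = {}  # name -> (address, expiry_time)
        self.upstream_queries = 0

    def resolve(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # cache hit: no upstream query
        address, ttl = self.upstream(name)  # miss or expired entry
        self.upstream_queries += 1
        self.entries[name] = (address, now + ttl)
        return address


if __name__ == "__main__":
    fake_now = [0.0]

    def fake_upstream(name):
        return "192.0.2.1", 300  # documentation-range address, 5-minute TTL

    cache = TTLCache(fake_upstream, clock=lambda: fake_now[0])
    cache.resolve("google.com")
    cache.resolve("google.com")  # answered from the cache
    fake_now[0] = 301.0
    cache.resolve("google.com")  # TTL expired, asks upstream again
    print(cache.upstream_queries)
```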

Hash of Hashes

#!/usr/bin/perl
use strict;
use warnings;
#--------------------------------------------------------#
make_hash2hash();
#--------------------------------------------------------#
# Build a hash of hashes: each outer key holds a reference
# to an inner hash, created automatically on first use
# (autovivification).
sub make_hash2hash
{
  my $hash_ref = {};

  $hash_ref->{first_one}->{second_two}   = "one";
  $hash_ref->{first_one}->{second_three} = "two";
  $hash_ref->{first_one}->{second_four}  = "three";
  $hash_ref->{first_two}->{second_five}  = "four";
  $hash_ref->{first_two}->{second_six}   = "five";

  print_hashes( $hash_ref );
}
#--------------------------------------------------------#
# Print each outer key, then its inner keys and values.
# Keys are sorted because Perl hash order is not defined.
sub print_hashes
{
    my $hash_ref = shift;

    for my $key1 ( sort keys %$hash_ref )
    {
        print "\n$key1\n";

        for my $key2 ( sort keys %{ $hash_ref->{$key1} } )
        {
            print "   \\_ $key2 = $hash_ref->{$key1}{$key2}\n";
        }
    }
}
#--------------------------------------------------------#
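The same structure in Python is a dict of dicts. Perl creates the inner hash automatically on first assignment (autovivification); a rough Python analogue is `collections.defaultdict(dict)`, sketched below with the same keys and values. The `make_nested` and `print_nested` function names are illustrative.

```python
from collections import defaultdict


def make_nested():
    """Build a dict of dicts; a missing outer key gets a fresh inner dict."""
    nested = defaultdict(dict)
    nested["first_one"]["second_two"] = "one"
    nested["first_one"]["second_three"] = "two"
    nested["first_one"]["second_four"] = "three"
    nested["first_two"]["second_five"] = "four"
    nested["first_two"]["second_six"] = "five"
    return nested


def print_nested(nested):
    """Print outer keys, then their inner keys and values, in sorted order."""
    lines = []
    for key1 in sorted(nested):
        lines.append(key1)
        for key2 in sorted(nested[key1]):
            lines.append(f"   \\_ {key2} = {nested[key1][key2]}")
    print("\n".join(lines))
    return lines


if __name__ == "__main__":
    print_nested(make_nested())
```

Unlike Perl, `defaultdict(dict)` autovivifies only one level deep; a third level would need its own default factory.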

Subversion Commands

#-----------------------------------------------------------------------#
# How to merge a working branch into trunk (working copy: svn co path/trunk)
#-----------------------------------------------------------------------#
1.) cd path/trunk
2.) svn merge -r min_rev:max_rev path/work_branch .
3.) svn commit -m "Merging from path/work_branch"
#-----------------------------------------------------------------------#
# How to roll back to a previous revision (here r10374) with a reverse merge
#-----------------------------------------------------------------------#
1.) cd path/working_branch
2.) svn update
3.) svn merge -r HEAD:10374 .
4.) svn commit -m "Rolling back to r10374"

Subversion Web Browser

SVN::Web - Subversion repository web frontend

Reference: http://search.cpan.org/dist/SVN-Web/