Friday, August 1, 2014

Tips for using a VPN on a Mac in China: route domestic and foreign IPs separately, and auto-restart the VPN

Route domestic and foreign IPs automatically

When you're online in China, a VPN is an indispensable tool. But if all traffic goes through the VPN, some domestic sites become unbearably slow and some stop working altogether. Being lazy, I naturally wanted a simple, convenient setup that doesn't require toggling the VPN by hand.

The following tutorial applies to Mac OS only; I'm on 10.9, but other versions should work in theory.

First, you need a VPN that supports PPTP, which is pretty much standard for VPN providers. (OpenVPN works too; see here.)

When configuring the VPN in System Preferences, check the "Send all traffic over VPN connection" option, then connect.

Go to https://github.com/fivesheep/chnroutes and choose Download ZIP, then double-click the downloaded ZIP to unpack it.

Next, open a terminal:


$ cd <unzipped-folder>             #enter the unzipped ZIP folder
$ python chnroutes.py -p mac       #generate the domestic/foreign route filter files; this produces ip-up and ip-down
$ sudo cp ip-up ip-down /etc/ppp   #copy the two generated files to /etc/ppp
$ sudo chmod a+x /etc/ppp/ip-up /etc/ppp/ip-down   #make the files executable

Then restart the VPN. Done.

To test it, open Youku in a browser and play a copyrighted movie. If it plays normally, you're all set; otherwise it will warn that playback is restricted to mainland China.

If some domestic sites become unreachable, try the following command and then reconnect the VPN.
$ sudo sh /etc/ppp/ip-down

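How does this work? chnroutes.py downloads APNIC's delegated address list, keeps the IPv4 blocks assigned to CN, and writes them into ip-up/ip-down as route commands that keep domestic traffic on your original gateway. A rough Python sketch of the idea (the real script also emits the matching ip-down entries and handles more edge cases):

import urllib2

# Print a route line for every IPv4 block APNIC lists as assigned to China.
# Sketch only -- chnroutes.py itself does considerably more.
APNIC_URL = "http://ftp.apnic.net/apnic/stats/apnic/delegated-apnic-latest"

def prefix_len(host_count):
    # APNIC reports a host count; convert it to a CIDR prefix length.
    bits = 0
    while (1 << bits) < host_count:
        bits += 1
    return 32 - bits

for line in urllib2.urlopen(APNIC_URL):
    fields = line.split('|')
    if len(fields) > 4 and fields[1] == 'CN' and fields[2] == 'ipv4':
        print "route add %s/%d ${OLDGW}" % (fields[3], prefix_len(int(fields[4])))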

Auto-restart the VPN

The VPN connection isn't very stable, and reconnecting after every drop is a chore; I also want it to connect automatically after login. Here is a way to auto-connect/reconnect the VPN.

Open AppleScript Editor (Applications > Utilities) and enter the following code in the editor:

on idle
    tell application "System Events"
        tell current location of network preferences
            set myConnection to the service "Your VPN Name"
            if myConnection is not null then
                if current configuration of myConnection is not connected then
                    connect myConnection
                end if
            end if
        end tell
        return 120
    end tell
end idle
Replace "Your VPN Name" with the name of your own VPN service, tick the "Stay Open" checkbox, and save. Drag the saved app into the Dock, right-click it, and choose Options -> Open at Login. Log out and back in. Done.

The script above checks the VPN connection every 2 minutes; change 120 to another value (in seconds) to adjust the interval.

Ad

If you want a cheap, decent VPN, give 风驰网游 a try: ¥99/year, and I'm using it myself.

Wednesday, May 21, 2014

Decode challenges from the Python Challenge

There is a fun website for Python beginners: the Python Challenge. Below is some of the code I used to solve the problems.

0. Compute 2 to the power of 38.
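The answer is a single expression:

print 2**38   # 274877906944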

1. Decode the string by replacing every letter with the letter two positions after it; note that 'y' wraps to 'a' and 'z' wraps to 'b'.
text = "g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle. sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj."
result = ''
for c in text:
    if 'a' <= c <= 'z':
        result += chr((ord(c) - ord('a') + 2) % 26 + ord('a'))
    elif 'A' <= c <= 'Z':
        result += chr((ord(c) - ord('A') + 2) % 26 + ord('A'))
    else:
        result += c
print result
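The decoded message itself recommends string.maketrans(), which replaces the loop with a translation table:

import string

# Map each lowercase letter to the one two positions later, wrapping around.
shifted = string.ascii_lowercase[2:] + string.ascii_lowercase[:2]
table = string.maketrans(string.ascii_lowercase, shifted)
print text.translate(table)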

2. Find the hidden word in the string by counting character frequencies:
text = '.. text from html source ..'
counts = {}
for ch in text:
    if ch in counts:
        counts[ch] += 1
    else:
        counts[ch] = 1

print counts  # a,e,i,l,q,u,t,y each appear once

for ch in text:
    if ch in 'aeilquty':
        print ch  # spells 'equality'
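An equivalent version uses collections.Counter (Python 2.7+); since the frequency dump above shows that each hidden letter occurs exactly once, we can keep just the letters with count 1:

import collections

text = '.. text from html source ..'
counts = collections.Counter(text)
print ''.join(ch for ch in text if counts[ch] == 1 and ch.isalpha())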

3. Find the hidden word using a regex: one lowercase letter guarded by exactly three uppercase letters on each side.


import re

text = '.. text from html source ..'

# One lowercase letter surrounded by three uppercase letters on each side;
# the flanking [a-z] anchors rule out runs of four or more capitals.
pattern = "[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]"

result = ""
for s in re.findall(pattern, text):
    result += s[4]  # the middle lowercase letter
print result


4. Send HTTP requests and follow the chain of "next nothing" values in the responses:

import urllib2
import re

url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
code = "12345"
divided = False
while True:
    text = urllib2.urlopen(url + code).read()
    print text

    m = re.search("the next nothing is ([0-9]+)", text)
    if m:
        code = m.group(1)
        divided = False
    elif not divided:
        # One response says "Yes. Divide by two and keep going."
        code = str(int(code) / 2)
        divided = True
    else:
        # No next nothing and dividing didn't help: this is the last page.
        break

5. 'Peak Hell' sounds like 'pickle'. Unpickle the data; each row is a list of (character, run-length) tuples, so expanding and printing them row by row reveals the result:

from __future__ import print_function
import pickle
import urllib2

text = urllib2.urlopen("http://www.pythonchallenge.com/pc/def/banner.p").read()
result = pickle.loads(text)

for row in result:
    for ch, count in row:
        print(ch * count, end='')
    print()

Monday, April 21, 2014

A list of solutions to solve the jar hell

Jar hell is the situation where different, incompatible versions of a library exist on your classpath simultaneously. For example, your project depends on library A version 2.0, and you also need a third-party library B which depends on library A version 1.0; then both version 1.0 and version 2.0 of library A end up on your classpath, which causes problems because the JVM doesn't know which version to load where.

I just ran into this problem in a recent project, so I did a lot of research on it. In theory, though, there is no single best way to resolve it, so I list all the possible solutions here:

1. Avoid

First of all, consider this problem before you proceed. Find all the dependencies your libraries will pull in, and make sure their versions are compatible. It will save you a lot of trouble!
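If you build with Maven, mvn dependency:tree prints every transitive dependency along with the version that will actually be used, which makes conflicts easy to spot before they bite:

$ mvn dependency:tree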

2. Find a transition version that works for both sides

If it has already happened, consider finding a transition version that works for both sides. Libraries change gradually, so you may be lucky enough to find one.

3. Use Jarjar

If your project doesn't involve any shared interface between the incompatible versions, you should definitely try Jarjar, a jar-wrapping tool that can rename the embedded class files. It works like a charm and can do a lot with a bit of imagination. There's a blog post about it.
However, there is one limitation you should know about:
If the incompatible versions of the library expose different interfaces, and those interfaces are used in your project, then you are out of luck: even though the interfaces have the same name, they now live in different packages (Jarjar changed the package name), so they are different types, and you can't assign instances across them directly.
There is a solution even for this case: write adapter classes that wrap one interface in the other. It works in some situations.
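For reference, a Jarjar rules file is just lines of the form "rule <pattern> <result>", where @1 refers back to the text matched by the wildcard. A hypothetical rule that moves the old version of library A into its own namespace:

rule com.liba.** shaded.v1.com.liba.@1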

4. Modify library source code

If you have access to the library's source code, you can also change the incompatible code yourself and wrap it as a customized jar that fulfills your need. This works well when the difference between versions is small.

5. Access Rules & the Maven solution

I didn't dig into this too much, but it seems workable in some situations. Access Rules (see this article) can forbid certain classes from being used. Maven has similar mechanisms, such as dependency exclusions.
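For example, a Maven dependency exclusion can stop library B from dragging its own copy of library A onto the classpath, leaving only the version you declare (coordinates hypothetical):

<dependency>
  <groupId>com.example</groupId>
  <artifactId>library-b</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>com.example</groupId>
      <artifactId>library-a</artifactId>
    </exclusion>
  </exclusions>
</dependency>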

Monday, February 10, 2014

Mac OS Commands that make your life easier

Access the clipboard (pasteboard)

pbcopy & pbpaste
echo 'hello world!' | pbcopy       #copy to the pasteboard
pbpaste                            #print the pasteboard contents

Power management settings

pmset
pmset sleepnow #sleep the machine immediately

Network

Find which process is using a particular port

sudo lsof -i -P | grep :[PORT_NUM]

Process

Find a process by PID
ps aux | grep [PID]
Kill a process by PID
kill [PID]

Friday, December 20, 2013

How to manually convert Non-RDF(CSV, txt) to RDF, insert into Virtuoso Triple-store and query the data using SPARQL

WHY I DID THIS

I was doing some research on how to use OpenLink Virtuoso in my own project. First I wanted to try something basic, so I started with the Triple-store and SPARQL queries. This article covers how to transform a Non-RDF file into RDF, insert the RDF into the Virtuoso Triple-store, and then query the result using SPARQL.

Brief introduction

Transform Non-RDF to RDF

There are a lot of tools that can transform Non-RDF into RDF. I will use the GRefine RDF Extension. Here is a tutorial video.

What is Virtuoso

OpenLink Virtuoso is an ambitious piece of software: it provides web, file, and database server functionality alongside native XML storage and universal data access middleware. And, yes, it includes a Triple-store implementation for storing RDF.

Let's focus on the Virtuoso triple-store.

The Virtuoso triple-store is built atop a traditional RDBMS (see the implementation here). Triples are stored in a table called RDF_QUAD (see the table below). Every RDF document is decomposed into triples, which are inserted into the RDF_QUAD table.

RDF_QUAD table

Column Name | Data Type | Description
G           | IRI_ID    | Graph - Primary Key
S           | IRI_ID    | Subject - Primary Key
P           | IRI_ID    | Predicate - Primary Key
O           | ANY       | Object - Primary Key
The triple-store uses Graphs to group triples. Every Graph is identified by an IRI, similar to a URI, so when you insert data into the triple-store you must specify the IRI of the target Graph. For example, the triple <s> <p> <o> loaded into graph <g> becomes one RDF_QUAD row with G=<g>, S=<s>, P=<p>, and O=<o>.

To search the triples in a Graph, you first give the IRI of the Graph, then use Virtuoso SPARQL (which Virtuoso implements atop SQL) to query the triple-store.

My experiment

My goal is to combine air quality data with disease data using the Virtuoso triple-store engine. I found the data sources at AirNow and the CDC Database.

Step1: Convert Non-RDF to RDF


First I imported the data into the GRefine RDF Extension. Of course, you should trim irrelevant data before importing. After this step, you should see results similar to those below.

 
Then I started to create the RDF skeleton: select RDF -> Edit RDF Skeleton. If you don't know how to do this, watch the tutorial video above.

You can see the RDF skeleton graph below; the result should look like this:

Next, export the RDF file: choose Export -> RDF as XML or Export -> RDF as Turtle. Done!

Step 2: Insert RDF into Virtuoso Triple-store

Before starting, you should install Virtuoso; I recommend the Virtuoso Open-Source Edition. Try building from source to get the most stable version (which was 6.1.8 when this blog was written).

Then connect to Virtuoso using the command line tool isql. The default isql location should be /usr/local/bin/isql.

If you want to insert TTL (Turtle) files, use the command below:
SQL> DB.DBA.TTLP_MT (file_to_string_output ('tmp/users.ttl'), '', 'http://mytest.com');
If you want to insert RDF/XML, use the command below:
SQL> DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('tmp/Kingsley_Idehen.rdf'), '', 'http://mytest.com');
For more details about RDF insertion, see here.
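To sanity-check the load without leaving isql, you can run SPARQL directly by prefixing the query with the SPARQL keyword (using the same graph IRI as above):

SQL> SPARQL SELECT * FROM <http://mytest.com> WHERE { ?s ?p ?o } LIMIT 10;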

Step3: Query data with SPARQL

First, open the Virtuoso web UI and log into the system as dba. Choose the Linked Data tab, then open the SPARQL subtab.
Fill in the Default Graph IRI with the one you used when inserting your RDF.

Fill in the SPARQL query in the Query section. If you have no idea what SPARQL is or how to use it, read this quick start guide.

Here is the result I got:


And this is the SPARQL I used:



PREFIX aa: <http://airnow.gov/>
PREFIX dd: <http://wonder.cdc.gov/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX place: <http://purl.org/ontology/places#>
SELECT ?year ?month ?city ?state ?cause ?death ?O3 ?PM
WHERE {

?deathURI rdf:value ?death;
  dc:date ?date1;
  geo:location ?loc1;
  rdf:type 'Disease of the nervous system';
  rdf:type ?cause.

?date1 time:year ?year;
  time:month ?month.

?loc1 place:State ?state.

?O3URI rdf:value ?O3;
  geo:location ?loc2;
  dc:date ?date2;
  rdf:type 'O3'.

OPTIONAL{
?PMURI rdf:value ?PM;
  geo:location ?loc2;
  dc:date ?date2;
  rdf:type 'PM2.5'.
}

?loc2 place:State ?state;
  place:City ?city.

?date2 time:year ?year;
  time:month ?month.
}
ORDER BY ?year ?month

Wednesday, November 13, 2013

Red Hat/CentOS commonly used command line collection

Yum

yum install <package-name>         #install package
yum remove <package-name>          #remove package
yum erase <package-name>           #delete package
yum search <keyword>               #search for packages matching <keyword>
yum repolist                       #show repository list
yum update                         #update installed packages

A good site for searching repos: Linux packages search

How to add a yum repository manually

The yum repository directory is /etc/yum.repos.d/, so if you have a yum .repo file, just copy it into that directory.
Some repositories are packaged as rpm files, such as PostgreSQL's; in that case, download the rpm file and install it.
See Also:  Adding, Enabling, and Disabling a Yum Repository

RPM

rpm -i <package-name>.rpm          #install rpm package
rpm -qa | grep <package>           #check if installed

PS

ps                                 #show process
ps -l                              #long format: uid, pid, ppid, flags, cpu, pri, nice, vsz, rss, wchan, state, addr, tty, time, cmd
ps aux | grep <keyword>            #show details of process


Kill

kill [signal] PID                  #kill the process with the given PID; '-9' (SIGKILL) is most commonly used
kill -l                            #display all the [signal]s
kill `ps | grep <keyword> | awk '{print $1}'`  #kill process by <keyword>

TOP

top                                #display all processes
top -u <user>                      #display <user>'s processes
top -p <pid>                       #display the process with the given <pid>

service

service <service-name> status      #show service status (running or stopped)
service <service-name> start       #start service
service <service-name> stop        #stop service

hostname

hostname                           #show hostname
hostname --fqdn                    #show the fully qualified domain name (FQDN)
hostname NEW_NAME                  #change the hostname temporarily

See also:  How to Change the Hostname of a Linux System

restart the network

/etc/init.d/network restart        #Red Hat specific; may not work on other Linux distributions


Add third-party repositories

EPEL: see here

User

Add user
userdel <username>                     #delete user
useradd -G <group-name> <username>     #create a user and add it to supplementary group <group-name>
useradd -g <group-name> <username>     #create a user with <group-name> as its primary group
usermod -a -G <group-name> <username>  #add an existing user to an existing group
usermod -g <group-name> <username>     #change a user's primary group to <group-name>
passwd <username>                      #set a password for the user
id <username>                          #show user information


tar

tar -zxvf <file-name>.tar.gz         #extract tar.gz file

Netstat

netstat -a   #list all TCP/UDP ports
netstat -at  #list TCP connections
netstat -au  #list UDP connections
netstat -l   #list all listening ports

Thursday, October 24, 2013

Step by step: deploying Storm-yarn on HDP 2.0 using the Hortonworks Sandbox

Contents

  1. Install HDP 2.0 using the Sandbox;
  2. Prepare for the Storm-yarn deployment;
  3. Set up Storm on your cluster.
All the steps are based on Mac OS 10.8; Windows & Linux work as well, with some tiny differences.

Install HDP 2.0 using the Sandbox

What you need

VMware Fusion installed on your Mac, and the Hortonworks HDP 2.0 Sandbox appliance file (both are used in the steps below).

Steps

1. Open VMware Fusion.

2. Click File -> Import.

3. The file browser opens. Select the appropriate Sandbox appliance file, then click Open.

4. Import Library opens. Unless you have specific needs, the default values are fine. Click Import.

5. The appliance is imported. A console window opens and the VM shows up in the Virtual Machine Library.

6. Start the VM. When the Sandbox has finished starting up, the console displays the login instructions. Press fn+control+option+f5 to open the command line.

7. Use a browser on your host machine to open the URL displayed on the console. You will see the index page. HDP 2.0 is now set up.

Prepare for the Storm-yarn deployment

(To distinguish between commands on the local Mac and in the VM, Mac OS prompts start with '$' and VM prompts start with '#'.)

1. Open the Terminal on your Mac.

2. ssh from your Mac into the VM:

$ ssh root@<Your VM IP>


The default password is hadoop.

3. Disable SELinux using the command:

# setenforce 0

4. Edit the SELinux configuration file:

# vi /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled

5. Stop the iptables firewall and disable it.

# service iptables stop
# chkconfig iptables off

6. Install the wget package

# yum -y install wget

7. Get the repo for Ambari and copy it to /etc/yum.repos.d

# wget http://public-repo-1.hortonworks.com/ambari-beta/centos6/1.x/beta/ambari.repo
# cp ambari.repo /etc/yum.repos.d

8. Install the Oracle Java 7 development environment.
  1. Download the JDK from the Oracle website to local Mac storage; choose the Linux x64 rpm package.
  2. Copy the downloaded rpm into the VM; say it was downloaded to ~/Download:
     $ scp ~/Download/jdk-7u<version>-linux-x64.rpm root@<Your VM IP>:/tmp
  3. Install the rpm:
     # rpm -ivh jdk-7u<version>-linux-x64.rpm
  4. Set JAVA_HOME & PATH:
     # vi ~/.bash_profile
     Insert these lines before export PATH:
     JAVA_HOME=/usr/java/jdk1.7.0_45/
     export JAVA_HOME
     PATH=$PATH:$HOME/bin
     PATH=$JAVA_HOME/bin:$PATH
  5. Test your installation; if it says java 1.6 instead of 1.7, check the previous steps again:
     # java -version
9. Install ntpd, start service and sync time

# yum -y install ntp
# service ntpd start

10. Run the Ambari server setup

# ambari-server setup -s -j /usr/java/jdk1.7<version>/

11. Start Ambari server  & agent

# ambari-server start
# ambari-agent start

12. Install Maven 3.1.1

# wget http://mirror.symnds.com/software/Apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
# tar -zxvf apache-maven-3.1.1-bin.tar.gz
# mkdir -p /usr/lib/maven
# mv apache-maven-3.1.1 /usr/lib/maven
# vi ~/.bash_profile

Add this line before export PATH:

PATH=$PATH:/usr/lib/maven/apache-maven-3.1.1/bin

13. Get a copy of the repository for Storm on YARN from GitHub

# wget https://github.com/anfeng/storm-yarn/archive/master.zip
# unzip master.zip

14. Edit the pom.xml repos and Hadoop version to point at Hortonworks.

# cd storm-yarn-master
# vi pom.xml

Uncomment some lines and comment out others, as shown below:

Set up Storm on your cluster

1. Create a work folder to hold working files for Storm. Let's say '~/workspace/storm'

# mkdir -p ~/workspace/storm

2. Copy storm.zip to work folder. Go to your work folder and unzip storm.zip.

# cp lib/storm.zip ~/workspace/storm
# cd ~/workspace/storm
# unzip storm.zip

3. Add the storm-0.9.0-wip21 and storm-yarn-master bin folders to the PATH

# vi ~/.bash_profile

Add this line before export PATH

PATH=$PATH:$HOME/workspace/storm/storm-0.9.0-wip21/bin:$HOME/storm-yarn-master/bin

4. Add root user to hdfs group

# usermod -G hdfs root

5. Add storm.zip to hdfs /lib/storm/0.9.0-wip[*]/storm.zip

# sudo -u hdfs hadoop fs -put ~/storm-yarn-master/lib/storm.zip /lib/storm/0.9.0-wip[*]/storm.zip

You may encounter some permission problems; try this:


# cp ~/storm-yarn-master/lib/storm.zip /tmp/storm.zip
# chown hdfs:hdfs /tmp/storm.zip
# sudo -u hdfs hadoop fs -put /tmp/storm.zip /lib/storm/0.9.0-wip[*]/storm.zip

6. Start Maven in the storm-yarn-master folder.

# cd storm-yarn-master
# mvn package


7. Start Storm

# storm-yarn launch

You may encounter permission issues when launching storm; try step 4 in this section to add root to the hdfs group.

8. Get the storm config using the YARN application id. (The application id looks like application_<numbers>_<numbers>.)

# yarn application -list

9. Store the storm.yaml file in the .storm directory so the storm command can find it when submitting jobs.

# storm-yarn getStormConfig -appId <application id>  -output ~/.storm/storm.yaml


10. Try running two of the sample topologies:

Word Count:

# storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.WordCountTopology

Exclamation:

# storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.ExclamationTopology 

11. Storm UI monitoring tool

Still not working yet... to be continued.