Hi all,
I have created another blog on my own server. Hopefully this will give me more flexibility in terms of design and plugins. Here is the link to the new server:
http://findnwrite.com/musings
Filed under: General | Leave a Comment »
I just discovered a simple way to use Ruby gems (or Ruby libraries) in your mapper or reducer script even if you don’t have administrative rights. Below is a quick explanation of how to do it. One of the parameters of Hadoop streaming is “-cacheArchive”. It lets you specify the path of an archive on HDFS and the name of a symbolic link to create for it. You can read more about it over here. To use Ruby gems, we need four simple steps:
Step 1: Zip the gem source code
Download the source code of a gem and zip it. Let’s assume you want to use the awesome geokit gem. At the top level of the geokit gem there is one file (geokit.rb) and one folder (geokit). Use the following command on Mac OS X to create a zip file:
$> zip -r geokit.zip geokit.rb geokit
Note: use the -r parameter to include subfolders recursively.
Step 2: Upload the zip file to Hadoop’s distributed file system
$> hadoop dfs -copyFromLocal geokit.zip lib/
Step 3: Tell Hadoop about the zip file
In your Hadoop streaming script, use the cacheArchive option to specify the location of the gem archive and the name of its symbolic link (the part after the #). Below is just an example of a Hadoop streaming script. Note that Hadoop unzips the archive before running the mapper script, so the files inside the zip are available to our Ruby mapper.
#!/bin/bash
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/.../hadoop-0.20.1-dev-streaming.jar \
  -input <input_file> \
  -output <output_file> \
  -mapper "ruby Mapper.rb" \
  -file code/Mapper.rb \
  -cacheArchive hdfs://machine-name:port-number/user/user_name/lib/geokit.zip#geokitgem
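As a small illustration of how the -cacheArchive value is put together, the sketch below splits such a URI in Ruby. The part before the # is the archive's location on HDFS, and the fragment after the # is the symbolic link name that shows up in the task's working directory. The host `namenode`, port `9000` and user `alice` are made-up placeholders, not values from the setup above.

```ruby
require 'uri'

# Hypothetical values standing in for machine-name, port-number and user_name.
archive = URI.parse('hdfs://namenode:9000/user/alice/lib/geokit.zip#geokitgem')

hdfs_path    = archive.path      # where the zip lives on HDFS
symlink_name = archive.fragment  # directory name the mapper will see

puts "archive on HDFS: #{hdfs_path}"
puts "symlink in task dir: #{symlink_name}"
```

The symlink name is the only part the mapper script needs to know about, which is why Mapper.rb refers to 'geokitgem/' rather than to the zip file.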
Step 4: Tell ruby mapper/reducer about the gem
Now add the symbolic link to Ruby’s library path in Mapper.rb as follows:
#file: Mapper.rb
$: << 'geokitgem/'
require 'rubygems'
require 'geokit'
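To see why this works without Hadoop, here is a minimal local sketch. It creates a temporary directory that plays the role of the geokitgem symbolic link, drops a tiny library file into it, and then loads it the same way Mapper.rb does. The file `tinylib.rb` and the module `TinyLib` are hypothetical stand-ins for geokit.rb; `$LOAD_PATH` is simply the long name for `$:`.

```ruby
require 'tmpdir'

# Build a throwaway directory that plays the role of the 'geokitgem' symlink.
# 'tinylib.rb' and TinyLib are hypothetical stand-ins for geokit.rb.
Dir.mktmpdir do |dir|
  gem_dir = File.join(dir, 'geokitgem')
  Dir.mkdir(gem_dir)
  File.write(File.join(gem_dir, 'tinylib.rb'), <<~RUBY)
    module TinyLib
      def self.greet
        'loaded from archive'
      end
    end
  RUBY

  # Same move as in Mapper.rb: put the unpacked directory on the load path...
  $LOAD_PATH << gem_dir
  # ...after which a plain require finds files inside it.
  require 'tinylib'
  puts TinyLib.greet
end
```

Nothing here needs administrative rights, which is the point of the approach: the library is found through the load path rather than through a system-wide gem installation.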
That’s it.
Filed under: Hadoop, Programming, Ruby | Tagged: Gem, Hadoop, Ruby, Streaming | 2 Comments »
Tip 1: Importing DBF Files
The shp2pgsql command is mainly used to import shapefiles. However, it comes with an optional parameter, -n, that imports only the dbf file. You might need to install PostGIS to get the shp2pgsql command.
/> shp2pgsql -n filename > outfile.sql
/> psql -h hostname -U username -d database -f outfile.sql
The above two commands can be further shortened into a single command
> shp2pgsql -n filename tableName dbName | psql -d dbName
Tip 2: Calculating Area
Use the transform function (st_transform) to project the geometry onto a spatial reference system. I recommend using a projection that preserves area. Then use the st_area function. The unit of the result depends on the unit of the projected spatial reference system; most likely it will be meters (so square meters for area). An excellent resource for finding SRID numbers is http://spatialreference.org/
/> select st_area(st_transform(the_geom, 3035)) from <table>
Tip 3: Counting number of words
Unlike char_length, which returns the number of characters in a string, there is no function to count the number of words. However, you can nest two functions to get the number of words.
/> select array_upper(regexp_split_to_array('this is trial. it should return 7', E'\\s'), 1);
Tip 4: Fixing “Operations on mixed Geometries” Error
Check out my previous post
Tip 5: Fixing “Ring Self-Intersection” Error
Use the ST_Simplify or ST_SimplifyPreserveTopology functions to repair invalid geometries so that there are no self-intersections. I recommend trying ST_SimplifyPreserveTopology first and then ST_Simplify, as shown below.
> update <table> set the_geom = ST_SimplifyPreserveTopology(the_geom, 1) where ST_IsValid(the_geom) = false
> update <table> set the_geom = ST_Simplify(the_geom, 1) where ST_IsValid(the_geom) = false

Filed under: Database, postgis, Postgres, Tips | Leave a Comment »
PostGIS usually raises “operations on mixed geometries” when you use topological functions (such as st_intersects, st_within, etc.). There can be many different reasons for this error. Below are the steps that I found useful in resolving it.
1. Validate Geometries: Make sure all the geometries are valid.
psql> select *, ST_IsValidReason(<geometry_column>) from <table> where ST_IsValid(<geometry_column>) = false
2. Simplify Invalid Geometries: Skip this step if you didn’t find any invalid geometries in the above step. If you did, use the ST_Simplify or ST_SimplifyPreserveTopology functions to fix them.
psql> update <table> set <geometry_column> = ST_Simplify(<geometry_column>, <tolerance>) where ST_IsValid(<geometry_column>) = false
3. Check SRID: Make sure that all the geometries use the same projection system. You can list the different SRID values with the following query.
psql> select ST_SRID(<geometry_column>) as srid, count(*) from <table> group by srid
4. Update SRID: Skip this step if you found only one SRID value in the above step. If not, use ST_SetSRID to assign an SRID to the geometries (this only relabels them) or ST_Transform to reproject them from one projection system to another.
psql> update <table> set <geometry_column> = ST_SetSRID(<geometry_column>, <SRID_Value>) where ST_SRID(<geometry_column>) <> <SRID_Value>
5. Check Geometry_Columns Table: PostGIS maintains basic information about every geometry column of every table in a separate table known as “Geometry_Columns”. Make sure that there is an entry for your table in the “Geometry_Columns” table.
psql> select * from geometry_columns where f_table_name = '<table>'
6. Insert geometry information in Geometry_Columns Table: If you didn’t find an entry for the geometry column in step 5, add it to the geometry_columns table manually.
psql> INSERT INTO geometry_columns(f_table_catalog, f_table_schema, f_table_name, f_geometry_column, coord_dimension, srid, "type") SELECT '', '<schema>', '<table>', '<geometry_column>', ST_CoordDim(<geometry_column>), ST_SRID(<geometry_column>), GeometryType(<geometry_column>) FROM <schema>.<table> where <geometry_column> is not null LIMIT 1
7. Use UpdateGeometrySRID function: If you performed step 6, this step is optional. Use UpdateGeometrySRID to set the SRID value for the whole column.
psql> select UpdateGeometrySRID('<schema>', '<table>', '<geometry_column>', <SRID_Value>)
8. Now try your original query. Most likely you will be able to get it working.
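If you run this checklist often, the read-only checks (steps 1, 3 and 5) can be generated for a given table. The following is only a sketch, assuming plain unquoted table and column names; the helper `diagnostic_queries` and the example names `parcels` and `the_geom` are hypothetical, and the script only prints SQL text for you to paste into psql.

```ruby
# Build the read-only diagnostic queries from steps 1, 3 and 5 for one table.
# The table and column names passed in below are hypothetical examples.
def diagnostic_queries(table, column)
  # Refuse anything that is not a plain identifier, since the names are
  # interpolated directly into SQL text.
  [table, column].each do |name|
    unless name =~ /\A[a-z_][a-z0-9_]*\z/i
      raise ArgumentError, "unsafe identifier: #{name}"
    end
  end

  {
    invalid_geometries: "select *, ST_IsValidReason(#{column}) from #{table} " \
                        "where ST_IsValid(#{column}) = false",
    srid_counts: "select ST_SRID(#{column}) as srid, count(*) " \
                 "from #{table} group by srid",
    geometry_columns: "select * from geometry_columns " \
                      "where f_table_name = '#{table}'"
  }
end

diagnostic_queries('parcels', 'the_geom').each do |step, sql|
  puts "-- #{step}"
  puts "#{sql};"
end
```

The update statements from steps 2, 4, 6 and 7 are left out on purpose, since they change data and are better run by hand after looking at the results of the checks.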

Filed under: Database, postgis, Postgres, Tips | 1 Comment »
Automation
Shell scripting is a powerful way to automate things. However, writing a shell script to automate interactive shell commands can be tricky. For instance, every week I download log files from the server to my local machine using the scp command. Although I had written a shell script for this, it wasn’t really automated: scp requires a password, which cannot be passed as a parameter and has to be entered interactively. To overcome this specific issue (entering a password for scp), you can use an RSA key as suggested here. However, this solves only the password problem. Also, for a newbie like me who entered the Mac world only recently, it seemed rather involved. I was looking for an approach that was more generic, simpler and less geeky.
Luckily, today I came across another solution that satisfies all of the above constraints. However, the simplicity of this second approach comes at the price of being less secure. As explained here, the second solution is to use the “expect” command. For instance, below is sample code that shows how to use the expect command to script an interactive command.
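#!/usr/bin/expect -f
# Wait indefinitely, so that a long transfer is not cut off by expect's
# default 10-second timeout.
set timeout -1
# Start scp; it will prompt for the password.
spawn scp user@hostname.com:/path_to_file/filename.tar /tmp/
# Answer anything that looks like a password prompt, then keep waiting
# until scp finishes.
expect {
    -re ".*sword.*" {
        exp_send "password\r"
        exp_continue
    }
}
exit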
In order to get the above code to work, however, make sure you have the following things done correctly:
The expect command, as used above, is not limited to sending passwords; it can send any other input as well. See the original post for an example. This makes the expect command much more generic than using an RSA key. However, as you might have noticed, you have to type the password explicitly in the file. Anyone who can read your script (for example, someone with system administrator privileges) can easily open it and see the password. Hence, as I said before, the simplicity of the “expect” command comes at the cost of security. Nevertheless, I am quite thrilled to have found this and am having fun with my new, real automation script.
Enjoy
Filed under: General | 3 Comments »