Robelle Tech: Loading Data into PostgreSQL

August 2002

Get the most out of your Robelle tools

Loading Data into PostgreSQL

By Aaron Holmes

There comes a time in every programmer’s life when he or she is faced with the problem of how to store and retrieve data on a larger scale and a simple flat file will not suffice. Being a Web developer and open source advocate, I have always turned to mySQL for my DBMS needs. However, I had been hearing more and more good things about PostgreSQL and had finally decided to give it a try to see what all the hype was about.

PostgreSQL is an Object-Relational database management system that supports almost all SQL constructs, including subselects, transactions, and user-defined types and functions. You can download, use, and modify the software for free, provided you include the copyright notice. Also, it is able to run under many OS types, include Unix variants and Windows machines under the Cygwin environment.

For my testing purposes, I used an older AMD Duron 800-Mhz system and installed FreeBSD 4.5 Stable on it. I then compiled Apache 1.3.26 with PHP 4.2.1 and PostgreSQL 7.2.1. PostgreSQL itself was very easy to install - a simple CVSup updated of the linux ports directory allowing me to compile a FreeBSD compatible version of the newest PostgreSQL source files.

To obtain the source files and help documentation for PostgreSQL you can point your browser to www.postgresql.org and select the mirror site closest to you.

With the software compiled and running, I created a new database. This was handled by the included createdb command, which accepts the syntax createdb databasename.

I needed some test data for my new database, so I decided to use some old data I had laying around from an address book program I used to use, which was stored in a comma delimited text file (Robelle’s Suprtool creates such files easily from TurboIMAGE data on the HP 3000).

Now seemed like a good time to build a new table within the database. To do this I use the psql command, which is also included with PostgreSQL. To run psql you type psql mydbname from the command prompt. Psql is the interactive terminal program that allows you to interactively enter, edit, and execute SQL commands directly.

The actual code to create a table within psql is as follows:

CREATE TABLE addressbook (

firstname varchar(25),

lastname varchar(25),

email varchar(20),

phone int(11),

age int(2),

);

I built the table to mimic that of the data I was using; included were a couple int columns and several varchar columns of varying size. The design of the table was overall very simple and resembled many of the tables I have created in mySQL for online use.

The next step was to populate the table with the data from the text file. This was easy to do by using the COPY command, which loads large amounts of data (either character or Binary) from flat-text files. You tell COPY which file to load by specifying FROM ‘/directory/file’. By default, COPY uses a tab (“\t”) character as a delimiter between fields; I had to change this behavior by including USING DELIMETERS ‘,’

The COPY command, however, has a few potential pitfalls. Such as if you don’t have enough columns in the file, you will get an error, but if you have too many columns you will get a warning only and the extra columns are ignored. Also remember that COPY is executed as a transaction, meaning that a single error in the data causes an undo of the entire import operation.

As always, it is good practice to read over the intricacies of the COPY command in the PostgreSQL help docs, which are online at www.postgresql.org/idocs

It is entirely possible to access, modify and remove the data from your database completely within the psql tool, but I often find it much more desirable to use PHP access the database via the Web. PHP, when compiled with the -with pgsql directive, comes with a vast library of functions for interacting with PostgreSQL (much the same as with mySQL).

Overall, the migration to PostgreSQL went smoothly without any unwarranted surprises. Of course this would not have been possible without the help of a few resources. Kudos to the PostgreSQL team for supplying the online documentation files on their Website. I also used two books to help me: “PostgreSQL Developer’s Handbook” by Evald Geschwinde and Hans-Jurgen Schonig, Published by Sams, 2002; “Beginning Databases with PostgreSQL” by Richard Stones and Neil Matthew, Published by Wrox Press, 2001.

PostgreSQL Technical Details

Recently we explored the process of transforming TurboIMAGE data into formats that are acceptable to Oracle. If you haven’t read that article (see robelle tech in the June, 2002 NewsWire), this would be a good time to read it, since it covers the basics of MPE/IMAGE datatypes, which are important regardless of why you are transforming your data. It’s online at robelle.com/tips/oracle-datatypes.html The datatypes in PostgreSQL look a lot like Oracle datatypes.

Character Datatypes

For character data, the most common datatype is VARCHAR(n) with a specified maximum number of characters. If all data values will be the same length, you can use CHAR(n) instead. For unlimited length text fields, use TEXT, which does not have a declared maximum size. Although the type text is not in the SQL standard, many other RDBMS packages have it as well. The longest possible character string that can be stored is about 1 GB.

There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type (CHAR(n)).

Multi-byte Characters such as UNICODE are supported, but you may have to recompile the database (no problem, since the source code is freely available).

Numeric Datatypes:

Smallint, int2. An integer of 2 bytes. Use for TurboIMAGE I1 or J1 or K1 fields.

Integer, int4. An integer of 4 bytes. Use for I2, J2 or K2 fields. This is the default datatype for integer fields.

Bigint, int8. An integer of 8 bytes (equivalent to I4 “quad” integers on the HP 3000). Use for TurboIMAGE I4 or J4 fields. Oracle does not have this datatype, but SQL Server does.

The bigint type may not function correctly on all platforms, since it relies on compiler support for eight-byte integers. On a machine without such support, bigint acts the same as integer (but still takes up eight bytes of storage). However, the PostgreSQL team is not aware of any reasonable platform where this is actually the case.

You may want to use type Integer for indeces, rather than Smallint or Bigint. Otherwise you must escape constants to avoid type casting errors. See the user manual for details.

Real. A 4 byte floating point field for approximate numeric values. Use for an E2 in TurboIMAGE.

Double Precision. An 8-byte floating point field for approximate numeric values.

Use for an E4 in TurboIMAGE.

Numeric (p,s) - precise numbers with possible decimal places. Similar to the Numeric datatype in Oracle and SQL Server, except that there is no limit to the precision!

You specify a precision and scale for the values. Precision is the maximum total digits in the values. Scale is the number of places to the right of the decimal. The maximum number of digits that can be placed to the left of the decimal is precision minus scale. For example, DEC(7,2) means the same as S9(5)V9(2) in COBOL. NUMERIC in PostgreSQL is similar to NUMERIC in Oracle, although it does not have the odd “negative scale factors” of Oracle (scale factor-3 in Oracle actually multiplies the value by 1000!).

Numeric is suitable for storing monetary values and others where exactness is required. However, Numeric is much slower than Real or Double Precision.

There is a Money datatype in PostgreSQL currently, but it has been deprecated because it is US-oriented. They suggest using Numeric instead.

Other Datatypes

Bit. String of bits.

Bytea. For storing binary strings, i.e., values without a character set or collation sequence attached. Similar to the BINARY LARGE OBJECT or BLOB in SQL 99.

Boolean. Value can be either True or False.

Date. Accepts dates in almost any format and has commands to specify how to handle ambiguous cases. The default and preferred format is the ISO-8601 format, as in 1999-01-08.

There are also datatypes for storing geometric information. And, users may add new types to PostgreSQL using the CREATE TYPE command.

Precompilers

As with SQL Server and unlike Oracle, PostgreSQL does not come with a precompiler for COBOL or FORTRAN. However, you may be able to use an SQL compliant precompiler from your compiler supplier, and there is a full-featured implemented of ODBC.

PostgreSQL does provide precompilers/interfaces for C, Perl, PHP, Java, Python and TcL/TK.

To learn more about migrating data to new environments, visit the Robelle Migration Web Center at suprtool.com/move.