Database Subsetting ToolSourceForge.net Logo
java.net Member Button: 90px wide
    Home     Tutorial     Documentation     Download        

 

 

Tutorial (using Command Line Interface)

Prerequisites

  • Java JRE 1.5 or above (download)
  • R-DBMS with JDBC-driver
  • jailer_1.0.zip


Step 1. Setup Jailer

Unpack jailer.zip:

$unzip jailer_0.9.6.zip
$cd jailer
$ll
drwxr-xr-x  3 wisser users  136 2007-06-01 10:12 jailer.sh
-rw-r--r--  1 wisser users 1175 2007-06-01 10:10 build.xml
drwxr-xr-x  3 wisser users  136 2007-06-01 10:10 config
drwxr-xr-x  3 wisser users  344 2007-06-01 10:10 datamodel
drwxr-xr-x  3 wisser users  184 2007-06-01 10:10 example
drwxr-xr-x  3 wisser users  160 2007-06-01 10:10 extractionmodel
drwxr-xr-x  3 wisser users  272 2007-06-01 10:11 lib
drwxr-xr-x  3 wisser users  112 2007-06-01 10:10 restrictionmodel
drwxr-xr-x  4 wisser users  160 2007-06-01 10:10 script
drwxr-xr-x  4 wisser users   96 2007-06-01 10:10 src

Register the JDBC-Driver. Edit the file jailer.sh and put the JDBC-Driver into the class-path:

jailer.sh
JDBCLIB=path to the JDBC-Driver-jar
LIB=lib
CP=.:out:out/jailer.jar

# JDBC-driver
CP=$CP:$JDBCLIB/JDBC-Driver-jar

# configuration files in the config directory
CP=$CP:config

# the libraries
CP=$CP:$LIB/junit.jar
CP=$CP:$LIB/commons-logging.jar
CP=$CP:$LIB/log4j.jar
CP=$CP:$LIB/args4j.jar
CP=$CP:$LIB/spring.jar
CP=$CP:$LIB/jailer.jar

java -cp $CP net.sf.jailer.Jailer $@

Step 2. Setup the Database

Create a new schema and execute script/scott-tiger.sql. Make sure that the script runs successfully.

$db2 connect to wisser user scott using tiger
$db2 -tvf script/scott-tiger.sql

Step 3. Building the data model

Jailer needs to know all the tables and all associations between them, so we must tell him. Tables are defined in datamodel/table.csv,associations in datamodel/association.csv.

Fortunately most of the model definitions can be generated automatically by analyzing the relational database.

$ sh jailer.sh build-model com.ibm.db2.jcc.DB2Driver jdbc:db2://localhost:50001/wisser scott tiger
Jailer 0.9.5

Building data model.
See 'export.log' for more information.

$ cat export.log
2007-06-01 15:51:30,308 [main] INFO   - find tables with net.sf.jailer.modelbuilder.JDBCMetaDataBasedModelElementFinder@16fe0f4
2007-06-01 15:51:32,115 [main] INFO   - file 'datamodel/model-builder-table.csv' written
2007-06-01 15:51:32,175 [main] INFO   - find associations with net.sf.jailer.modelbuilder.JDBCMetaDataBasedModelElementFinder@16fe0f4
2007-06-01 15:51:32,298 [main] INFO   - find associations with DEPARTMENT
2007-06-01 15:51:32,323 [main] INFO   - find associations with SALARYGRADE
2007-06-01 15:51:32,328 [main] INFO   - find associations with EMPLOYEE
2007-06-01 15:51:32,369 [main] INFO   - file 'datamodel/model-builder-association.csv' written


Jailer finds the following tables and associations: 2

datamodel/model-builder-table.csv
# Name;      upsert; primary key;    ; author
DEPARTMENT;  N;      DEPTNO INTEGER; ;IBM DB2 JDBC Driver;
EMPLOYEE;    N;      EMPNO INTEGER;  ;IBM DB2 JDBC Driver;
SALARYGRADE; N;      GRADE INTEGER;LOSAL INTEGER;HISAL INTEGER;   ;IBM DB2 JDBC Driver;

datamodel/model-builder-association.csv
# Table A;    Table B;  first-insert; cardinality (opt); join-condition; name (opt); author
DEPARTMENT; EMPLOYEE; A; 1:n; A.DEPTNO=B.DEPTNO;   ; IBM DB2 JDBC Driver;
EMPLOYEE;   EMPLOYEE; A; 1:n; A.EMPNO=B.BOSS;      ; IBM DB2 JDBC Driver;


Copy the definitions into the files datamodel/table.csv and datamodel/association.csv.
One association is still missing: depending on his salary an employee is classified into a salary grade.
Add this definition manually.

datamodel/association.csv
# Table A;    Table B;  first-insert; cardinality (opt); join-condition; name (opt); author
DEPARTMENT; EMPLOYEE; A; 1:n; A.DEPTNO=B.DEPTNO;   ; IBM DB2 JDBC Driver;
EMPLOYEE;   EMPLOYEE; A; 1:n; A.EMPNO=B.BOSS;      ; IBM DB2 JDBC Driver;
EMPLOYEE;   SALARYGRADE; ; n:1; A.SALARY BETWEEN B.LOSAL AND B.HISAL; ; Wisser

Note that Jailer now knows more about the data model than the DBMS.

Step 4.  Examine the data model

1
Let's see what Jailer knows now about the model.

$ sh jailer.sh print-datamodel
DEPARTMENT (DEPTNO INTEGER NOT NULL)

  has dependent:
    EMPLOYEE                       1:n on A.DEPTNO=B.DEPTNO

EMPLOYEE (EMPNO INTEGER NOT NULL)

  depends on:
    DEPARTMENT                     n:1 on B.DEPTNO=A.DEPTNO
    EMPLOYEE                       n:1 on B.EMPNO=A.BOSS

  has dependent:
    EMPLOYEE                       1:n on A.EMPNO=B.BOSS

  is associated with:
    SALARYGRADE                    n:1 on A.SALARY BETWEEN B.LOSAL AND B.HISAL

SALARYGRADE (GRADE INTEGER NOT NULL, LOSAL INTEGER NOT NULL, HISAL INTEGER NOT NULL)

  is associated with:
    EMPLOYEE                       1:n on B.SALARY BETWEEN A.LOSAL AND A.HISAL


tables in dependent-cycle: EMPLOYEE

excluding following tables from component-analysis:  {  }

1 components:
 { DEPARTMENT, EMPLOYEE, SALARYGRADE }

Note that each association is listed twice. While associations are undirected, restrictions on them are directed. We will see later for what restrictions are good for and how to define them.

Step 5. Prepare the DB for exports

Jailer uses some tables for collecting entities inside the data base. The structure of these tables depends on the data-model, so we have to create the tables after building the data-model files. (You can easily re-create the tables after any model-changes)

$ sh jailer.sh create-ddl > jailer-ddl.sql
$ db2 -tvf jailer-ddl.sql


Step 6. Export evil Scott (unrestricted)

Now export the employee named Scott. To do that we need an extraction-model. Create a file named extractionmodel/scott.csv.

extractionmodel/scott.csv
# the employee named 'SCOTT' and all associated entities

# subject;          condition;               limit;       restrictions
EMPLOYEE;           NAME='SCOTT';                 ;


This extraction model describes a set of entities containing (the) employee(s) named 'SCOTT', entities associated with these employees, entities associated with these entities and so forth.

Export this set:

$ sh jailer.sh export -e scott.sql extractionmodel/scott.csv com.ibm.db2.jcc.DB2Driver jdbc:db2://localhost:50001/wisser scott tiger
Jailer 0.9.5

exporting 'extractionmodel/scott.csv' to 'scott.sql'
See 'export.log' for more information.
writing file 'scott.sql'...
file 'scott.sql' written.

$ cat export.log
2007-06-04 13:27:25,123 [main] INFO   - exporting EMPLOYEE Where NAME='SCOTT'
...
2007-06-04 13:27:25,249 [main] INFO   - export statistic:              22
2007-06-04 13:27:25,299 [main] INFO   - DEPARTMENT                     3
2007-06-04 13:27:25,299 [main] INFO   - EMPLOYEE                       14
2007-06-04 13:27:25,299 [main] INFO   - SALARYGRADE                    5
...


A file scott.sql is created containing Insert-statements for Scott, for his evil boss, for the president and for scott's department and salary-grade.

But why are there also statements for all other employees? (Bad luck for innocent James!)

Let Jailer explain why:

$ sh jailer.sh export -e scott.sql -explain extractionmodel/scott.csv com.ibm.db2.jcc.DB2Driver jdbc:db2://localhost:50001/wisser scott tiger > /dev/null
$ cat explain.log
EMPLOYEE(7788) --3--> SALARYGRADE(4, 2001, 3000) --6--> EMPLOYEE(7698) --1--> DEPARTMENT(30).
EMPLOYEE(7788) --3--> SALARYGRADE(4, 2001, 3000) --6--> EMPLOYEE(7698) --2--> EMPLOYEE(7521).
EMPLOYEE(7788) --3--> SALARYGRADE(4, 2001, 3000) --6--> EMPLOYEE(7698) --2--> EMPLOYEE(7844) --3--> SALARYGRADE(3, 1401, 2000).
EMPLOYEE(7788) --4--> EMPLOYEE(7566) --4--> EMPLOYEE(7839) --1--> DEPARTMENT(10).
EMPLOYEE(7788) --4--> EMPLOYEE(7566) --4--> EMPLOYEE(7839) --3--> SALARYGRADE(5, 3001, 9999).
EMPLOYEE(7788) --1--> DEPARTMENT(20) --5--> EMPLOYEE(7369).
EMPLOYEE(7788) --3--> SALARYGRADE(4, 2001, 3000) --6--> EMPLOYEE(7698) --2--> EMPLOYEE(7654).
EMPLOYEE(7788) --2--> EMPLOYEE(7876) --3--> SALARYGRADE(1, 700, 1200) --6--> EMPLOYEE(7900).
EMPLOYEE(7788) --3--> SALARYGRADE(4, 2001, 3000) --6--> EMPLOYEE(7782) --2--> EMPLOYEE(7934) --3--> SALARYGRADE(2, 1201, 1400).
EMPLOYEE(7788) --3--> SALARYGRADE(4, 2001, 3000) --6--> EMPLOYEE(7698) --2--> EMPLOYEE(7499).
EMPLOYEE(7788) --1--> DEPARTMENT(20) --5--> EMPLOYEE(7902).

#    1 EMPLOYEE                 -> DEPARTMENT                     n:1 on B.DEPTNO=A.DEPTNO
#    2 EMPLOYEE                 -> EMPLOYEE(SUBORDINATES)         1:n on A.EMPNO=B.BOSS
#    3 EMPLOYEE                 -> SALARYGRADE                    n:1 on A.SALARY BETWEEN B.LOSAL AND B.HISAL
#    4 EMPLOYEE                 -> EMPLOYEE(inverse-SUBORDINATES) n:1 on B.EMPNO=A.BOSS
#    5 DEPARTMENT               -> EMPLOYEE                       1:n on A.DEPTNO=B.DEPTNO
#    6 SALARYGRADE              -> EMPLOYEE                       1:n on B.SALARY BETWEEN A.LOSAL AND A.HISAL

Adams is Scotts subordinate and James and Adams are both classified in the same salary-grade.

Step 7. Export evil Scott (restricted)

If we export an employee we must export his boss and department too! Otherwise the set of exported entities would not be consistent (due to the foreign key constraints). No constraint prevents us from excluding the salary-grade from export, but we should'nt do that becauses the resulting set would also be inconsistent.

To exclude subordinates, 'same department'-members and 'same salary-grade'-employees, we must restrict some associations. To do so, define a restriction-model:

restrictionmodel/no-subordinates.csv
# from A (or association name); to B;       restriction-condition
SUBORDINATE;                  ;           ignore
DEPARTMENT;                   EMPLOYEE;   ignore
SALARYGRADE;                  EMPLOYEE;   ignore


The restriction-condition is an extension of the associations join-condition (expressed in SQL-syntax) for one direction of an association.
"ignore" stands for an unsatisfiable condition.

Note that the association between DEPARTMENT and EMPLOYEE is restricted in that direction by designating the source and destination table. It's obviously not possible to restrict reflexive associations the same way, so we have to give the 'subordinate of'-association a name.

datamodel/association.csv
# Table A;    Table B;  first-insert; cardinality (opt); join-condition; name (opt); author
DEPARTMENT; EMPLOYEE; A; 1:n; A.DEPTNO=B.DEPTNO;   ; IBM DB2 JDBC Driver;
EMPLOYEE;   EMPLOYEE; A; 1:n; A.EMPNO=B.BOSS;
SUBORDINATE; IBM DB2 JDBC Driver;
EMPLOYEE;   SALARYGRADE; ; n:1; A.SALARY BETWEEN B.LOSAL AND B.HISAL; ; Wisser


You can examine the restrictions the same way you examined the data model:

$ sh jailer.sh print-datamodel restrictionmodel/no-subordinates.csv
restricted by: [restrictionmodel/no-subordinates.csv]
DEPARTMENT (DEPTNO INTEGER NOT NULL)

  ignored:
    EMPLOYEE                       1:n on A.DEPTNO=B.DEPTNO

EMPLOYEE (EMPNO INTEGER NOT NULL)

  depends on:
    DEPARTMENT                     n:1 on B.DEPTNO=A.DEPTNO
    EMPLOYEE(inverse-SUBORDINATE)  n:1 on B.EMPNO=A.BOSS

  is associated with:
    SALARYGRADE                    n:1 on A.SALARY BETWEEN B.LOSAL AND B.HISAL

  ignored:
    EMPLOYEE(SUBORDINATE)          1:n on A.EMPNO=B.BOSS

SALARYGRADE (GRADE INTEGER NOT NULL, LOSAL INTEGER NOT NULL, HISAL INTEGER NOT NULL)

  ignored:
    EMPLOYEE                       1:n on B.SALARY BETWEEN A.LOSAL AND A.HISAL


tables in dependent-cycle: EMPLOYEE

excluding following tables from component-analysis:  { DEPARTMENT, SALARYGRADE }

1 components:
 { EMPLOYEE }


A restriction-model is part of the extraction-model. Create a new extraction-model:

extractionmodel/scott-without-subordinates.csv
# the employee named 'SCOTT' and all associated entities

# subject;          condition;               limit;       restrictions
EMPLOYEE;           NAME='SCOTT';                 ;      
no-subordinates.csv


and look what Jailer extracts now:

$ sh jailer.sh export -e scott.sql extractionmodel/scott-without-subordinates.csv com.ibm.db2.jcc.DB2Driver jdbc:db2://localhost:50001/wisser xbcsetup 1234 > /dev/null
$ cat scott.sql
-- generated by Jailer at Mon Jun 04 15:08:15 CEST 2007 from wisser@u19
--
-- extraction model:  EMPLOYEE where NAME='SCOTT' (extractionmodel/scott-without-subordinates.csv)
-- database URL:      jdbc:db2://localhost:50001/wisser
-- database user:     scott
-- exported entities: 7
--     DEPARTMENT                     2
--     EMPLOYEE                       3
--     SALARYGRADE                    2



Insert into SALARYGRADE(GRADE, LOSAL, HISAL)
  values (4, 2001, 3000),
         (5, 3001, 9999);
Insert into DEPARTMENT(DEPTNO, NAME, LOCATION)
  values (20, 'RESEARCH', 'DALLAS'),
         (10, 'ACCOUNTING', 'NEW YORK');
Insert into EMPLOYEE(EMPNO, NAME, JOB, BOSS, HIREDATE, SALARY, COMM, DEPTNO)
  values (7839, 'KING', 'PRESIDENT', null, '1981-11-17', 5000.00, null, 10);
Insert into EMPLOYEE(EMPNO, NAME, JOB, BOSS, HIREDATE, SALARY, COMM, DEPTNO)
  values (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975.00, null, 20);
Insert into EMPLOYEE(EMPNO, NAME, JOB, BOSS, HIREDATE, SALARY, COMM, DEPTNO)
  values (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000.00, null, 20);

Freedom for the innocent!

Step 8. Delete Scott (unsuccessful)

It is also possible to create DML-scripts for deletion of exported entities:

$ sh jailer.sh export -e scott.sql -d delete-scott.sql extractionmodel/scott-without-subordinates.csv com.ibm.db2.jcc.DB2Driver jdbc:db2://localhost:50001/wisser scott tiger > /dev/null
$ cat delete-scott.sql
-- generated by Jailer at Tue Jun 05 10:20:40 CEST 2007 from wisser@u19
--
-- extraction model:  EMPLOYEE where NAME='SCOTT' (extractionmodel/scott-without-subordinates.csv)
-- database URL:      jdbc:db2://localhost:50001/wisser
-- database user:     scott
-- exported entities: 7
--     DEPARTMENT                     2
--     EMPLOYEE                       3
--     SALARYGRADE                    2
--
-- Tabu-tables:  {  }
--
-- entities to delete: 0


Jailer has exported 7 entities but didn't delete anything! That's because deleting Scott but not Scotts subordinate (who is not in the set defined by the extraction-model!) would violate the integrity of the data base.

Step 9. Delete Scott

In order to delete Scott, me must delete his subordinate too. To do so, relax the restriction on the SUBORDINATE-association:

restrictionmodel/no-subordinates.csv
# from A (or association name); to B;       restriction-condition
SUBORDINATE;                  ;           A.NAME='SCOTT'
DEPARTMENT;                   EMPLOYEE;   ignore
SALARYGRADE;                  EMPLOYEE;   ignore

and repeat the exportation:

$ sh jailer.sh export -e scott.sql -d delete-scott.sql extractionmodel/scott-without-subordinates.csv com.ibm.db2.jcc.DB2Driver jdbc:db2://localhost:50001/wisser scott tiger > /dev/null
$ cat delete-scott.sql
-- generated by Jailer at Tue Jun 05 10:50:03 CEST 2007 from wisser@u19
--
-- extraction model:  EMPLOYEE where NAME='SCOTT' (extractionmodel/scott-without-subordinates.csv)
-- database URL:      jdbc:db2://localhost:50001/wisser
-- database user:     scott
-- exported entities: 9
--     DEPARTMENT                     2
--     EMPLOYEE                       4
--     SALARYGRADE                    3
--
-- Tabu-tables:  {  }
--
-- entities to delete: 2
--      EMPLOYEE                       2 (-2)



Delete from EMPLOYEE Where EMPNO in (7876);
Delete from EMPLOYEE Where EMPNO in (7788);

The file delete-scott.sql contains Delete-statements for Scott and Adams.
Note that scott.sql now contains Insert-statements for Adams and his salary-grade too.



1) Since 0.9.3 it is also possible to render the data model as HTML. Clickhere to see how the tutorial's model would look like in HTML.
2) Since 2.0 all associations in datamodel/model-builder-association.csv have names.