Channel: filter – Oracle DBA – Tips and Techniques

Oracle Goldengate Tutorial 8 – Filtering and Mapping data

Oracle GoldenGate not only provides us a replication solution that is Oracle version independent as well as platform independent, but we can also use it to do data transformation and data manipulation between the source and the target.

So we can use GoldenGate when the source and target databases differ in table structure, as well as an ETL tool in a data warehouse type environment.

We will discuss below two examples to demonstrate this feature – column mapping and filtering of data.

In example 1, we will filter the records that are extracted on the source and applied on the target. Only rows in the MYEMP table where the JOB column value matches 'MANAGER' will be considered for extraction.

In example 2, we will deal with a case where the table structure is different between the source database and the target database and see how column mapping is performed in such cases.

Example 1

Initial load of all rows which match the filter from source to target. The target database MYEMP table will only be populated with rows from the source MYEMP table where the filter criteria of JOB='MANAGER' is met.

On Source

GGSCI (redhat346.localdomain) 4> add extract myload1, sourceistable
EXTRACT added.

GGSCI (redhat346.localdomain) 5> edit params myload1

EXTRACT myload1
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST devu007, MGRPORT 7809
RMTTASK replicat, GROUP myload1
TABLE scott.myemp, FILTER (@STRFIND (job, "MANAGER") > 0);
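
Note that @STRFIND returns the position of a substring (or 0 when it is not found), so this filter passes any row whose JOB value contains MANAGER, not only rows where it is exactly MANAGER. The following is a minimal Python sketch of that difference. The helpers strfind and streq are hypothetical names that only model the behaviour of @STRFIND and @STREQ, they are not GoldenGate APIs, and the job value 'SENIOR MANAGER' is made up for illustration.

```python
def strfind(value: str, target: str) -> int:
    """Model of @STRFIND: 1-based position of target in value, 0 if absent."""
    return value.find(target) + 1


def streq(value: str, target: str) -> int:
    """Model of @STREQ: 1 if the two strings are equal, 0 otherwise."""
    return 1 if value == target else 0


# Hypothetical JOB values; only MANAGER and SALESMAN appear in the article.
jobs = ["MANAGER", "SALESMAN", "SENIOR MANAGER"]

# FILTER (@STRFIND (job, "MANAGER") > 0) keeps every value containing MANAGER.
contains = [job for job in jobs if strfind(job, "MANAGER") > 0]

# FILTER (@STREQ (job, "MANAGER")) would keep only the exact value.
equals = [job for job in jobs if streq(job, "MANAGER")]

print(contains)
print(equals)
```

If an exact match is what you want, a filter based on @STREQ may be the closer fit. For the MYEMP data used here both approaches select the same rows.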

On Target

GGSCI (devu007) 2> add replicat myload1, specialrun
REPLICAT added.

GGSCI (devu007) 3> edit params myload1

"/u01/oracle/software/goldengate/dirprm/myload1.prm" [New file]
REPLICAT myload1
USERID ggs_owner, PASSWORD ggs_owner
ASSUMETARGETDEFS
MAP scott.myemp, TARGET sh.myemp;

On Source – start the initial load extract

GGSCI (redhat346.localdomain) 6> start extract myload1

Sending START request to MANAGER …
EXTRACT MYLOAD1 starting

On SOURCE

SQL> select count(*) from myemp;

  COUNT(*)
----------
        14

SQL> select count(*) from myemp where job='MANAGER';

  COUNT(*)
----------
         9

On TARGET

SQL> select count(*) from myemp where job='MANAGER';

  COUNT(*)
----------
         9

Create an online change extract and replicat group using a Filter

GGSCI (redhat346.localdomain) 10> add extract myload2, tranlog, begin now
EXTRACT added.

GGSCI (redhat346.localdomain) 11> add rmttrail /u01/oracle/software/goldengate/dirdat/bb, extract myload2
RMTTRAIL added.

GGSCI (redhat346.localdomain) 11> edit params myload2

EXTRACT myload2
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST 10.53.200.225, MGRPORT 7809
RMTTRAIL /u01/oracle/software/goldengate/dirdat/bb
TABLE scott.myemp, FILTER (@STRFIND (job, "MANAGER") > 0);

On Target

GGSCI (devu007) 2> add replicat myload2, exttrail /u01/oracle/software/goldengate/dirdat/bb
REPLICAT added.

GGSCI (devu007) 3> edit params myload2

"/u01/oracle/software/goldengate/dirprm/myload2.prm" [New file]
REPLICAT myload2
ASSUMETARGETDEFS
USERID ggs_owner, PASSWORD ggs_owner
MAP scott.myemp, TARGET sh.myemp;

On Source – start the online extract group

GGSCI (redhat346.localdomain) 13> start extract myload2

Sending START request to MANAGER …
EXTRACT MYLOAD2 starting

GGSCI (redhat346.localdomain) 14> info extract myload2

EXTRACT MYLOAD2 Last Started 2010-02-23 11:04 Status RUNNING
Checkpoint Lag 00:27:39 (updated 00:00:08 ago)
Log Read Checkpoint Oracle Redo Logs
2010-02-23 10:36:51 Seqno 214, RBA 103988

On Target

GGSCI (devu007) 4> start replicat myload2

Sending START request to MANAGER …
REPLICAT MYLOAD2 starting

GGSCI (devu007) 5> info replicat myload2

REPLICAT MYLOAD2 Last Started 2010-02-23 11:05 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/bb000000
First Record RBA 989

On Source we now insert two rows into the MYEMP table – one which has the JOB value of ‘MANAGER’ and the other row which has the job value of ‘SALESMAN’


On SOURCE

SQL> INSERT INTO MYEMP
  2  (empno,ename,job,sal)
  3  VALUES
  4  (1234,'GAVIN','MANAGER',10000);

1 row created.

SQL> commit;

Commit complete.

SQL> INSERT INTO MYEMP
  2  (empno,ename,job,sal)
  3  VALUES
  4  (1235,'BOB','SALESMAN',1000);

1 row created.

SQL> commit;

Commit complete.

SQL> select count(*) from myemp;

  COUNT(*)
----------
        16

SQL> select count(*) from myemp where job='MANAGER';

  COUNT(*)
----------
        10

On the target, even though two rows have been inserted into the source MYEMP table, only one row is inserted into the target MYEMP table, because the filter only passes rows where the JOB value matches 'MANAGER'.

SQL> select count(*) from myemp;

  COUNT(*)
----------
        10

Example 2 – source and target table differ in column structure

In the source MYEMP table we have a column named SAL whereas on the target, the same MYEMP table has the column defined as SALARY.

Create a definitions file on the source using the DEFGEN utility and then copy that definitions file to the target system.

GGSCI (redhat346.localdomain) > EDIT PARAMS defgen

DEFSFILE /u01/oracle/ggs/dirsql/myemp.sql
USERID ggs_owner, PASSWORD ggs_owner
TABLE scott.myemp;

[oracle@redhat346 ggs]$ ./defgen paramfile /u01/oracle/ggs/dirprm/defgen.prm

***********************************************************************
Oracle GoldenGate Table Definition Generator for Oracle
Version 10.4.0.19 Build 002
Linux, x64, 64bit (optimized), Oracle 11 on Sep 18 2009 00:09:13

Copyright (C) 1995, 2009, Oracle and/or its affiliates. All rights reserved.

Starting at 2010-02-23 11:22:17
***********************************************************************

Operating System Version:
Linux
Version #1 SMP Wed Dec 17 11:41:38 EST 2008, Release 2.6.18-128.el5
Node: redhat346.localdomain
Machine: x86_64
                     soft limit   hard limit
Address Space Size :  unlimited    unlimited
Heap Size          :  unlimited    unlimited
File Size          :  unlimited    unlimited
CPU Time           :  unlimited    unlimited

Process id: 14175

***********************************************************************
** Running with the following parameters **
***********************************************************************
DEFSFILE /u01/oracle/ggs/dirsql/myemp.sql
USERID ggs_owner, PASSWORD *********
TABLE scott.myemp;
Retrieving definition for SCOTT.MYEMP

Definitions generated for 1 tables in /u01/oracle/ggs/dirsql/myemp.sql

If we try to run the replicat process on the target without copying the definitions file, we will see the error shown below. It arises because the column names differ between the source and target tables and GoldenGate is not able to resolve the difference on its own.

2010-02-23 11:31:07 GGS WARNING 218 Aborted grouped transaction on 'SH.MYEMP', Database error 904 (ORA-00904: "SAL": invalid identifier).

2010-02-23 11:31:07 GGS WARNING 218 SQL error 904 mapping SCOTT.MYEMP to SH.MYEMP OCI Error ORA-00904: "SAL": invalid identifier (status = 904), SQL .

We then ftp the definitions file from the source to the target system, in this case to the dirsql directory under the top-level GoldenGate software directory.

We now edit the original replicat parameter file and replace the parameter ASSUMETARGETDEFS with SOURCEDEFS, which provides GoldenGate with the location of the definitions file.

The other parameter we add is COLMAP, which tells GoldenGate how to map source columns to target columns. The USEDEFAULTS keyword maps all columns that have the same name in both tables automatically. The only column that differs is SAL on the source, which we map explicitly to the SALARY column on the target.

REPLICAT myload2
SOURCEDEFS /u01/oracle/software/goldengate/dirsql/myemp.sql
USERID ggs_owner, PASSWORD ggs_owner
MAP scott.myemp, TARGET sh.myemp,
COLMAP (usedefaults,
salary = sal);
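
To make the effect of this COLMAP clause concrete, here is a small Python sketch of the mapping logic as described above: explicit mappings take priority and the remaining target columns are filled from source columns of the same name. The function colmap is a hypothetical model, not GoldenGate code, and setting an unmatched target column to None is an assumption of this sketch. The row values are taken from the insert used earlier in this post.

```python
def colmap(source_row: dict, target_columns: list, explicit: dict) -> dict:
    """Model of COLMAP (USEDEFAULTS, target_col = source_col).

    explicit maps a target column name to the source column that feeds it.
    """
    target_row = {}
    for col in target_columns:
        if col in explicit:
            # Explicit mapping, e.g. salary = sal
            target_row[col] = source_row[explicit[col]]
        elif col in source_row:
            # USEDEFAULTS: same-named columns are mapped automatically
            target_row[col] = source_row[col]
        else:
            # Assumption of this sketch: no source value is available
            target_row[col] = None
    return target_row


source = {"EMPNO": 1234, "ENAME": "GAVIN", "JOB": "MANAGER", "SAL": 10000}
target = colmap(source, ["EMPNO", "ENAME", "JOB", "SALARY"], {"SALARY": "SAL"})
print(target)
```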

We now start the original replicat process myload2, which had abended because of the column mismatch after the table modification. With the parameter change in place, the process now runs without any error.

GGSCI (devu007) 2> info replicat myload2

REPLICAT MYLOAD2 Last Started 2010-02-23 11:05 Status ABENDED
Checkpoint Lag 00:00:03 (updated 00:11:44 ago)
Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/bb000000
2010-02-23 11:31:03.999504 RBA 1225

GGSCI (devu007) 3> start replicat myload2

Sending START request to MANAGER …
REPLICAT MYLOAD2 starting

GGSCI (devu007) 4> info replicat myload2

REPLICAT MYLOAD2 Last Started 2010-02-23 11:43 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/bb000000
2010-02-23 11:31:03.999504 RBA 1461

Coming Next! – Monitoring the GoldenGate environment …..


GoldenGate – using FILTER, COMPUTE and SQLEXEC commands

Some time back I had posted a note on column mapping and data transformation using GoldenGate.

Here are some more examples of column mapping and data manipulation using keywords like SQLPREDICATE, COMPUTE and FILTER. I will also introduce another powerful GoldenGate command called SQLEXEC, which we will discuss in detail at a later date.

SQLPREDICATE

Enables us to provide a WHERE clause to select rows for an initial load. This will be included in the Extract parameter file as part of the TABLE clause as shown below.

The GoldenGate reference guide has this to say ….

“SQLPREDICATE is a better selection method for initial loads than the WHERE or FILTER options. It is much faster because it affects the SQL statement directly and does not require GoldenGate to fetch all records before filtering them, like those other options do.”

TABLE ggs_owner.emp_details, SQLPREDICATE "where ename='Gavin'";
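
The difference the reference guide describes can be illustrated outside GoldenGate. The Python sketch below uses an in-memory SQLite table as a stand-in for the source database (an assumption made purely for illustration, with sample rows borrowed from the EMP output later in this post). It contrasts fetching every row and filtering afterwards, which is how FILTER and WHERE are described in the quote, with putting the predicate into the SQL statement itself, which is what SQLPREDICATE does.

```python
import sqlite3

# In-memory stand-in for the source table; not an Oracle connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (empno INTEGER, ename TEXT)")
conn.executemany(
    "INSERT INTO emp_details VALUES (?, ?)",
    [(1001, "Gavin"), (1002, "Mark"), (1003, "John")],
)

# FILTER/WHERE style: every row is fetched, then non-matching rows are dropped.
all_rows = conn.execute("SELECT empno, ename FROM emp_details").fetchall()
filtered_late = [row for row in all_rows if row[1] == "Gavin"]

# SQLPREDICATE style: the predicate is part of the SQL, so the database
# returns only the matching rows.
filtered_early = conn.execute(
    "SELECT empno, ename FROM emp_details WHERE ename = 'Gavin'"
).fetchall()

print(len(all_rows), filtered_late, filtered_early)
conn.close()
```

Both approaches produce the same result set. The second one simply avoids transferring rows that will be thrown away, which matters for a large initial load.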

We can also filter on the Replicat side, selecting only a subset of the data which has been extracted, by using the WHERE clause as part of the Replicat parameter file as shown below.

MAP ggs_owner.emp_details, TARGET ggs_owner.emp_details, WHERE (ename="Gavin");

FILTER

The FILTER clause offers more functionality than the WHERE clause because it can use any of GoldenGate's column conversion functions to filter data, whereas the WHERE clause accepts only basic WHERE operators.

For example, we can use standard arithmetic operators like '+', '-', '/', '*' or comparison operators like '>', '<', '=' as well as GoldenGate functions like @COMPUTE, @DATE, @STRFIND, @STRNUM etc.

For example, we can use the @STRFIND function in the Extract parameter file to extract only those records from the table that contain a particular string value, as shown below.

TABLE ggs_owner.emp_details, FILTER (@STRFIND (ename, "Gavin") > 0);

COMPUTE

In this example we will use the GoldenGate function @COMPUTE to derive the values for a column in a table based on values in some other column in the same table.

We will also see how column mapping is performed on the target side where the target table EMP has an additional column COMM which is not present in the source table. We will derive the values for the COMM column using an arithmetic calculation where COMM is the SAL value plus 10%.

Remember we have to first create a definitions file using the defgen command as the source and target tables differ in structure.

In this case we will generate the definitions file on the target GoldenGate environment as the target table has the additional column COMM which is not present in the source EMP table.

edit params defgen

DEFSFILE /home/oracle/goldengate/dirsql/emp.sql
USERID ggs_owner, PASSWORD ggs_owner
TABLE ggs_owner.emp;

We then run this on the Target goldengate location

[oracle@linux02 goldengate]$ ./defgen paramfile /home/oracle/goldengate/dirprm/defgen.prm

The replicat parameter file will have the following contents. Note the combination of COLMAP and @COMPUTE: one tells GoldenGate how to map the difference in the table structure and the other performs a computation on the SAL column to derive data for the COMM column. Remember, the USEDEFAULTS keyword means that all the columns other than COMM are identically matched on both source and target tables.

REPLICAT rep1
USERID ggs_owner, PASSWORD *********
SOURCEDEFS /home/oracle/goldengate/dirsql/emp.sql
MAP ggs_owner.emp_details, TARGET ggs_owner.emp_details,
COLMAP (usedefaults,
comm = @COMPUTE (sal + sal * .10));

After running the extract on the source, we will see that the EMP table has been populated on the target database and the column COMM has been derived as well from the SAL column.

SQL> select * from emp;

     EMPNO ENAME                    DEPTNO        SAL       COMM
---------- -------------------- ---------- ---------- ----------
      1001 Gavin                        10       1000       1100
      1002 Mark                         20       2000       2200
      1003 John                         30       3000       3300
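
As a quick check of the arithmetic, the expression sal + sal * .10 can be reproduced in Python against the SAL values in the output above. This is only a sketch of the calculation, not of how GoldenGate evaluates @COMPUTE internally. The function name comm_from_sal is made up for this example, and Decimal is used to avoid binary floating-point rounding.

```python
from decimal import Decimal


def comm_from_sal(sal: Decimal) -> Decimal:
    """Same arithmetic as the expression sal + sal * .10 in the COLMAP clause."""
    return sal + sal * Decimal("0.10")


# SAL values taken from the query output above.
for sal in (Decimal(1000), Decimal(2000), Decimal(3000)):
    print(sal, comm_from_sal(sal))
```

The derived values agree with the COMM column shown for Gavin, Mark and John.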

SQLEXEC

SQLEXEC can be used as part of the Extract or Replicat process to make database calls. This enables GoldenGate to use the native SQL of the database to execute SQL queries, database commands, and stored procedures and functions.

For example, as part of a large batch data load process, we would like to drop the indexes first and then rebuild them after the data load is complete. The replicat parameter file below drops and rebuilds an index using native SQL commands.

REPLICAT rep1
USERID ggs_owner, PASSWORD ggs_owner
ASSUMETARGETDEFS
sqlexec "drop index loc_ind";
MAP ggs_owner.emp_details, TARGET ggs_owner.emp_details, WHERE (location="Sydney");
sqlexec "create index loc_ind on emp_details(location)";

Using GoldenGate EVENTACTIONS to customize processing

Oracle GoldenGate has an event marker system which enables the GoldenGate processes to perform a defined action when a specific event, recorded in the trail file, occurs.

The event record is the record that triggers the event action. It is specified using the FILTER or WHERE clause in the TABLE statement of an Extract parameter file or the MAP statement of a Replicat parameter file. It can also be specified using a SQLEXEC query or a stored procedure.

In the same Extract or Replicat parameter file in which we specified the event record, we use the EVENTACTIONS keyword to specify what action is to be taken by the process.

EVENTACTIONS accepts keywords such as IGNORE, DISCARD, ABORT, STOP, SHELL, TRACE and LOG, which denote the action to take once the specified record criteria have been met.

Please refer to the chapter 'Customizing Oracle GoldenGate Processing' (page 276) of the Oracle GoldenGate Windows and UNIX Administrator's Guide.

In this simple example we will see a test case where we are taking an export of the replicated table on the target server after data processing has completed on the source server.

We have a job status table, and a record is inserted into that table to denote that the processing is now complete. At that point we can take a backup of the table, which we do using Data Pump.

This is the Extract parameter file

EXTRACT ext1
USERID idit_prd, PASSWORD idit_prd
RMTHOST insodb02, MGRPORT 7809
RMTTRAIL ./dirdat/cc
TABLE idit_prd.myobjects ;
TABLE idit_prd.ops_job_status ;

This is the Replicat parameter file

REPLICAT rep1
SETENV (NLS_LANG="AMERICAN_AMERICA.WE8ISO8859P1")
SETENV (ORACLE_SID=GGDB2)
ASSUMETARGETDEFS
USERID idit_prd,PASSWORD idit_prd
MAP idit_prd.myobjects, TARGET idit_prd.myobjects;
MAP idit_prd.ops_job_status, TARGET idit_prd.ops_job_status , FILTER (@STREQ (STATUS, "PROCESSING COMPLETE" )), EVENTACTIONS ( IGNORE TRANS , STOP, SHELL "/home/oracle/exp.sh" );

To explain this simply …

We have an event table called OPS_JOB_STATUS.

We have a FILTER clause which specifies the criteria for the event: look for the string "PROCESSING COMPLETE" in the STATUS column of the OPS_JOB_STATUS table.

When this event occurs, the EVENTACTIONS clause specifies what to do, which is:

  • Ignore the transaction and do not replicate that insert into the OPS_JOB_STATUS table on the target side.
  • Stop the replicat process.
  • Run the UNIX shell script exp.sh (on the target server, since the replicat process runs on the target).

    This is the content of the exp.sh shell script:

    #!/bin/ksh
    /opt/oracle/product/server/10.2.0.4.5/bin/expdp idit_prd/idit_prd parfile=/home/oracle/exp.par

    and the Data Pump parfile exp.par contents are :

    tables=MYOBJECTS
    directory=dumpdir
    logfile=dumpdir:exp.log
    dumpfile=myobjects.dmp

    Okay, so now on the target we see that the replicat process rep1 is running fine.

    GGSCI (insodb02) 12>  !
    info replicat rep1
     
    REPLICAT   REP1      Last Started 2011-04-01 13:28   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
    Log Read Checkpoint  File ./dirdat/cc000019
                         2011-04-01 13:06:05.059982  RBA 931
    

    Now let us assume that some month end processing has been completed on the source server and these records have now been replicated to the target database.

    We would like to take an export of the target table now as a kind of backup.

    We insert a row into the OPS_JOB_STATUS table

    SQL> insert into ops_job_status
      2  values
      3  (sysdate,'PROCESSING COMPLETE');
     
    1 row created.
     
    SQL> commit;
     
    Commit complete.
    

    Let us now see what has happened to the Replicat process.

    We see that it has stopped and, as required, the shell script has fired and executed a Data Pump job to export the table.

    We can see that the export dumpfile myobjects.dmp has been created in the required Data Pump export directory.

    GGSCI (insodb02) 13> !
    info replicat rep1
     
    REPLICAT   REP1      Last Started 2011-04-01 13:28   Status STOPPED
    Checkpoint Lag       00:00:08 (updated 00:00:04 ago)
    Log Read Checkpoint  File ./dirdat/cc000019
                         2011-04-01 13:38:15.017514  RBA 1107
     
     
    oracle@insodb02:/u01/oracle > ls -lrt
     
    -rw-r--r--   1 oracle     dba           1051 Apr  1 13:38 exp.log
    -rw-r-----   1 oracle     dba        5009408 Apr  1 13:38 myobjects.dmp
    
