Site Logo

IzzySoft


Das inoffizielle Android-HandbuchDas inoffizielle Android-Handbuch
Buy at Amazon for EUR 16,99
Das inoffizielle Android-SystemhandbuchDas inoffizielle Android-Systemhandbuch
Buy at Amazon for EUR 7,00
Die besten Android-Apps: Android-Systemtools. Fotografie & Freizeit. Büro-Tools, Schule und StudiumDie besten Android-Apps: Android-Systemtools. Fotografie & Freizeit. Büro-Tools, Schule und Studium
 
As of 2019-04-24 09:06
prices & availability might be subject to change.

Installing Oracle 10g GridControl

Monitoring databases and servers is an important part of the administrator. Especially for Oracle databases and their internals the question comes up, Why not to take a tool from the vendor itself? And if so, DBControl is shipped with the database – do we really need the Enterprise Manager Grid control? We will deal with these questions here as well as with the Installation itself.

What's this article about

As title and introduction already tell, this article is about Oracle 10g Grid Control, the "why" (in short) and the installation process. The latter will use an already existing database for the repository rather than a new one created by the GridControl installer (the "why so" will also be discussed). We again use a Linux server (RHEL / CentOS).

What this article is NOT about

Yeah, again the same disclaimer: We won't talk about Windoze specific stuff here talking about RedHAT, not RedMOND). Moreover, we will not talk about specifics concerning the "Installing GridControl with a new database" option (you will see why).

Contents

Why GridControl?

Oracle 10g databases ship with DBConsole – a tool somehow similiar to GridControl, which does not necessarily require an Enterprise license. So next to the license question (you may decide for the Enterprise Edition for other reasons, as partitioning or other Enterprise features you need), why not use DBControl?

There may be many reasons – we only want to pick two of them here. First: With DBControl, you have one (Web) interface per database - and have to check them all separately. With GridControl, you have one (Web) interface to monitor them all together – including a summary page for all services configured, not only databases but other components as well. Second, GridControl has more restrict security policies than DBControl, including file permissions the latter one does not monitor (if you don't want these, you can change or turn off what you don't like). Having more than just two or three databases, the first point alone may already make the decision – an Enterprise license provided.

First critics on GridControl

Downloading GridControl from the Oracle server issues the first critic: The latest version at the time of this article is 10.2.0.4 – the corresponding archives take 1GB to download, and the archive only contains patches and the hint, you need a previous version to apply them to. Downloading the previous version, 10.2.0.3, ends at the same point: 1GB and only patches. 10.2.0.2 finally contains the installer. While the idea behind it may be providing one solution for all (if you already installed a previous version, you may patch it instead of re-installing), this is frustrating for new installations plus, in this case, leads to additional problems that could have been avoided. And in time pressure, this may end with a 10.2.0.2 GridControl installed since the patches were postponed until "sometimes in the future" (and nobody knows when this will be), maybe leaving even security holes. I cannot call this "customer friendly" – for a software with prices like that, I'ld rather expect a full installer for the latest version.

Second – and here comes the answer on the question Why not using the default installation with a new database? – the whole thing ships with a database release 1 (!), i.e. 10.1.0.4! This version is more than 2 years old, today we already have 10.2.0.4 available. Why should one install some old version like that?

By the way, the database makes up about one third of the package size of 1.5GB in the distribution archive for GridControl 10.2.0.2. Assumed the same for the patches to 10.2.0.4, this is almost 1GB for nothing: If I don't have any database, I probably don't need the GridControl, do I? And if I have one, I probably use some more up-to-date version. Third option, if I want to monitor my Oracle 9i farm, I either can use a 9i instance for the repository (they explicitely allow 9.2.0.6 or above) or, if I want to use a 10g and not yet have it, I won't install some outdated version. Hm – so where's the option justifying the 10gR1 database in this package?

Caution message

GridControl is a powerful tool for the administrator. But while alerts put their fingers on possibel security lacks, they are not always clear and sometimes even ambiguous: If you receive an alert about an open port 1521, it is quite obvious that you should not close that port but rather define an exception to that rule, since you need that listener to contact the database. But if you receive an alert about certain "Privileges Granted to Public", it may look clear to revoke some of them – but the impact is not that clear, as you will find out later (this one is discussed later in this article). In this example, you could render your database at least partly unusable, since other Oracle packages blindly trust on those permissions as usually being granted, and do not grant them explicitly during their install (in this case, Oracle Text could and should have granted those to the CTXSYS user). So here is some potential left to Oracle for improvements: First to not take things granted they alert about in other places, and second to make the alerts (and/or their help texts) more discriptive and distinguishable.

Prerequisites

General

As already said on the first page, we assume you already have installed a 10g database. While the repository database may be on a different host (which is, by the way, recommended by Oracle), you may also have it running on the same host. However: The host for the GridControl installation must meet the same requirements as for the database itself, plus some more. Differences/additions will be described below:

Additional hardware requirements

Additional Software requirements

Additionally to the database requirements, add the packages

Remark: Oracle also mentiones the control-center package as a must-have requirement. This would resolve in half of Gnome being installed, including Evolution packages and all that stuff. I felt this contra-productive and left that requirement alone – with no consequences so far. Which is not a guarantee, however! So to be on the "safe side", you may better follow the Oracle advice. Since the installer did not feel anything missing (and they make a lot of checks), I personally feel on the safe side without it.

Environment

Additionally to the database requirements, GridControl needs the operating system user nobody – which is usually already there. you can check that by locating the user in /etc/passwd, or simply run id nobody which should return the user_id.

As usual, there will be one or more script to be run as root during the installation (for our article, two scripts). So make sure you have root access on the machine you want to install GridControl to.

Database

Permissions

The curious things first, I dropped into with a failing install: Though DBControl before alerted that you should never grant public execution to the utl_* packages, the installation will only be successful if you did – and will alert the same again once it's running. Don't ask me what idea stands behind that, but before installing GridControl, you have to

sqlplus> GRANT EXECUTE ON utl_smtp TO public;
sqlplus> GRANT EXECUTE ON utl_tcp TO public;
sqlplus> GRANT EXECUTE ON utl_file TO public

Once you have GridControl up and running, don't forget to revoke those again – and grant the same rights to sysman instead.

Configuration

Your redo logs should have a size of 100M. Also, you should have the package dbms_shared_pool installed – if not, do so by executing

sqlplus> @?/rdbms/admin/dbmspool.sql

Furthermore, you should compare your init.ora settings to meet the following settings:

job_queue_processes     >= 10
db_block_size            = 8192
timed_statistics         = TRUE
statistics_level         = TYPICAL
open_cursors            >= 300
session_cached_cursors  >= 200
aq_tm_processes         >= 1
undo_management          = AUTO
undo_retention          >= 10800
processes               >= 150
log_buffer               = 1 M
pga_aggregate_target    >= 256 M
sga_target              >= 512 M
_b_tree_bitmap_plans     = FALSE

Miscellaneous

Installing GridControl Software

Call the installer

To invoke the installer, simply change into the directory where you unpacked the downloaded archive, and execute ./runInstaller. Follow the steps through the installer as described e.g. here (very detailed, including all the pre-requisites etc.). You do not need to read that page – the link is just mentioned "in case of". The process is quite intuitive.

Installation process

On the first page of the installer, chose "Enterprise Manager 10g Grid Control Using an Existing Database", click the "Next" button, and specify the database details on the second page. Also on the second page, click the "Prefill Tablespace Locations" button once – this will read the correct location of your databases data files plus, as a side-effect, makes sure your entered credentials are correct (since otherwise the installer cannot read these data out of the target database). Check the database file names, and click the "Next" button again.

On the next page, you can configure an EMail address (where GridControl sends its alert messages to) plus SMTP server – you should enter these data. Everything else on this page is optional: If you want GridControl to "phone home", you can enter your MetaLink credentials here plus, if necessary, the proxy settings. And then click "Next" again.

Now come the passwords. With GridControl, this is the first time the stronger policies take effect: The given password must be at least 5 characters, starting with a letter and including at least one digit.

Next comes a summary screen, and the the installation itself. Hopefully our preparations where good, and everything runs smoothely. I assume so here – well, leaving you alone in case of additional problems (two of them we already got out of the way with our preparations), sorry for that.

Applying the Patches

Preparations

Download and unpack

No need to say you first must download the patches from Oracle - you probably already have. Unpack the archive into an empty directory, and delete all but the p3731593_10204_LINUX.zip – you won't ever need those files: The 373596.zip and 3822442.zip are for already deployed agents (and we did not yet deploy any), and the p4329444*.zip files are for patching the included 10.1 database we did not use – so you see, again more than 50% waste …

Now unpack the p3731593_10204_LINUX.zip, and it will create another directory called 3731593 holding the patch installation files.

Stop all EM services

Next we need to stop all Enterprise Manager services. This can be done using the following commands:

$EM_HOME/bin/emctl stop oms
$EM_HOME/opmn/bin/opmnctl stopall
$AGENT_HOME/bin/emctl stop agent

where $EM_HOME is the ORACLE_HOME of the Oracle Management Services (a directory usually called "oms10g" somewhere below your ORACLE_BASE – parallel to your usual ORACLE_HOME or, as in our case, parallel to that in em102/oms10g), and $AGENT_HOME at the same place, but usually called agent10g.

Apply the patches

Change to the directory created by the last unpack operation above, and run the installer: ./runInstaller. Now first select the Oracle Home of the OMS and click "Next". Check that it selected the correct database (where your repository resides), and enter the password for the SYS user. Approve all questions to your requirements, and wait the patch to finish (this may take a while – some 15 minutes or the like). Click "Finish".

Start the installer again, and this time select the Oracle Home of the agent. Less questions this time (no database connection required), but almost the same procedure. Click "Finish" once the patches have been applied, and you are done.

Restart the services

Now it's time to start the services again:

$EM_HOME/bin/emctl start oms
$EM_HOME/opmn/bin/opmnctl startproc ias-component=WebCache
$AGENT_HOME/bin/emctl start agent

Instead of the start command, you may want to issue status – most probably these services are already started by the patch utility. Except for the WebCache, which simply gives an no processes matched this request error – no idea why, since these startup informations are taken directly from the Grid manual.

Postprocessing and Remarks

OMS

Services

Right after the installation (and the patch process), the installer starts up a bunch of processes – some of them you will normally not need. For sure you will need the OMS itself and the agent – but not necessary all of the IAS components. Running <OMS HOME>/bin/opmnctl status -l gives you a list of all IAS components and their current state. The following components are the basic requirements:

  1. OC4J_EM
  2. OC4J_EMPROV
  3. HTTP_SERVER

These processes you should make sure to have started when your server boots – all others are optional. Also make sure you keep above order:

$OMS_HOME/bin/emctl start oms
$OMS_HOME/opmn/bin/opmnctl startproc process-type=OC4J_EM
$OMS_HOME/opmn/bin/opmnctl startproc process-type=OC4J_EMPROV
$OMS_HOME/opmn/bin/opmnctl startproc process-type=HTTP_Server
$AGENT_HOME/bin/emctl start agent

In case you wonder where the OC4J_EM and HTTP_SERVER are: They already implicitly got started by emctl start oms. Now, stopping these services goes as follows:

$OMS_HOME/bin/emctl stop oms
$OMS_HOME/opmn/bin/opmnctl stopall
$AGENT_HOME/bin/emctl stop agent

Port security

While with the above steps we already closed some ports unnecessarily open, there are some more we may or may not want to be used: The unsecure HTTP ports. We cannot close them directly – but we can make them "unusable" by forbidding their use. This can be done in the HTTP servers configuration, i.e. $OMS_HOME/sysman/config/http_em.conf (resp. http_em.conf.template, which will overwrite the first every time we secure/lock/unlock the OMS, see Metalink Note 452290.1). First check which are the non-secure HTTP ports used (<OMS HOME>/bin/opmnctl status -l), and then find the right sections in these two files. Here comes one example:

<VirtualHost *:4889>
...
  <Location /em/console>
    order deny,allow
    Deny from all
  <Location>

As you may already have guessed, the keywords are Deny from all – meaning that nobody is allowed to access this host anymore. But be carefully: this also disables the agents possibility to communicate via unsecured HTTP. So check here first and decide then whether you can do so.

Remote Agents

Probably you will have at least one database on a remote host you want to monitor using GridControl. Or another machine where you want to monitor the host itself. Or both, or more. However: You will need an Agent to be installed there (and running). Besides copying the installation files there manually, and click through the installer again, you have two alternatives:

Before we roll out

Alright, you installed everything, patched everything, configured everything - what more? Don't ask me why, but if you want to install the agents on remote hosts as described below, at this point there would only be the 10.2.0.1 agent available! Although you patched OMS as well as the agent, this created the 10.2.0.4 directory, but only to add the script file and a plain text file there, missing all the bunch of files needed for the installation on the remote host.

What does it mean – but an additional download of another 340MB for the Complete 10.2.0.4 agent install files from Oracle? Once you downloaded these files, unzip them into the <OEM Home>/sysman/agent_download/10.2.0.4.0 directory (at least the linux directory and its contents you need there). After that, the install will still roll out the 10.2.0.1 agent, but together with its patches, and will apply those automatically on the rollout. Why they don't simply put the latest version there, is another question to the Oracle …

Push Install

The Push Install requires a working SSH setup: The Management server must be able to connect to the hosts where the agents should be installed, plus to transfer and install the software there and finally start it. If you have not yet set up SSH for this, you can do so using the <OMS ORACLE_HOME>/sysman/prov/resources/scripts/sshUserSetup.sh script. It further requires a username and password (I did not check whether this should be for the OS – if you did not make the oracle user a login user, you probably cannot use the Push Install).

If these conditions are met, you can open GridControl in your web browser (e.g. https://<servername>:1159/em), click the Deployments tab, hit the Install Agents link there, and follow the instructions. This way you can deploy the agents to multiple machines at once – provided they share the same credentials.

Pull Install

Other than the Push Install, this does not require the SSH setup - but wget must be available. Logged in as the oracle user on the machine the agent should be installed, issue the following commands:

oracle@remotehost> wget http://<emhost>:4889/agent_download/10.2.0.4.0/linux/agentDownload.linux
oracle@remotehost> chmod +x agentDownload.linux
oracle@remotehost> ./agentDownload.linux -m <emhost> -b <ORACLE_BASE> -r 4889

ORACLE_BASE specifies the directory below which the agent home will be created. So if you e.g. specify /home/oracle/em102, it will create /home/local/em102/agent102 as the AGENT_HOME directory and install the agent software here. The agent will be started right after the installation process completed. With the 10.2.0.4 version of the script, you must provide the port with the -r option, or the script will simply break with some error messages.

Ah, well: <emhost> is the host running the management server, i.e. the one we just made the big installation on the pages before.

Securing the agents

The agents transport a lot of secure and sensitive information via the net, as e.g. passwords. So you probably don't want them to use an unsecured protocol such as HTTP, do you? At least not as long there is an alternative – and there is: HTTPS. So first let's check what they are using:

$AGENT_HOME/bin/emctl status agent

Watch the output for the Agent URL and Repository URL lines. If the values here already start with "https://", everything is fine (usually they do so at least after a Pull Install). Otherwise, better secure the agent:

$AGENT_HOME/bin/emctl secure agent

Once you have done so for all agents, you may optionally tell the OMS not to permit any unsecure connections anymore:

$OMS_HOME/bin/emctl secure lock

At this stage the agents communication is completely secured – no passwords will be transfered by them anymore uncrypted. But what about the users of the GridControl – they may still use the unsecured HTTP (accidentally, of course)? This now can be suppressed as described above.

Unavailable Services

Having done this all, it's time to check our console and see if everything works. Open your web browser and point it to the management server, e.g. at https://<emhost>:1159/em and have a look.

First thing you will notice (only if you decided for the minimal services described above, and you already stopped and restarted the services as described above) are 3 services marked down. A problem? Not really. The agent of course discovered all services installed – and now expects them to run. But we decided we don't need them all. The agent does not yet know that – so we need to tell him.

In order to do so, first click on the "Targets" tab. Then, in the line just below the tab bar, select "Application Servers". Expand the tree displayed – you will find our 3 services marked down here. Do not remove them all! This would have some implications, i.e. other services we want to monitor would be removed as well. Tell it a "counting problem": 1+1=3 here (2 services are down – yes, right. That means, not all services in the group EnterpriseManager are available – so the group is marked down and added to the list – makes 3). So what we are doing now is: Selecting the WebCache by checking the radio box next to it, click on "Remove", and confirm it. After returning to the list, we do the same with the EnterpriseManager*_home item (only remove the leafs here, never a tree).

Going back to the main page (click the "Home" tab), GridControl now still reports one service being down. Don't mind, give it a few minutes to update. Continue with the next steps below – and when you are finished, it should have updated and listing that service as being "Up" again.

Discover the Databases

Now we check for the other things. Yeah, it already discovered a lot – but probably no database! For each of the databases, it simply says they are in "unknown state" because there have been some "metric errors". A quite misleading information: "Credential errors" would match better. The problem is that we did not yet configure a database user for the console – so it simply assumes the dbsnmp user (which is somehow correct) without any password (which should be not correct – that is, if your databases are shown fine, you have a security problem!).

Normal Databases

To fix this, we go to the databases home page (e.g. select "Database-Instance" in the search forms select box, send the search request, and then click on the link with the database name). Once there, click on the "Monitoring Configuration" link (usually at the bottom of the page in the "Related Links" section), fill in the correct password. Click on the "Test Connection" button to make sure you put the right credentials, and click the "Next" button to apply the changes.

DataGuard Databases

If the database is part of a DataGuard configuration (i.e. a standby database or the corresponding primary), you must configure a SYSDBA user here, e.g. SYS – since otherwise a login attempt will simply result in an ORA-01033: insufficient privileges error (the database is not open but mounted when in recovery state). Though this would not apply to the primary instance, on a switchover the primary becomes the new standby, so you should be prepared. To do so, first open the drop-down box stating "NORMAL", and select "SYSDBA". Now you can edit the user field, and put e.g. "SYS" here. Add the correct password to the password field, then click the "Test connection" and finally the "Next" button as above.

Policies

When a policy is violated, GridControl will issue an alert. Well, that's what we expect. But what we certainly not expect are e.g. the following alerts:

Just to mention some of them. So you probably want either to define some exceptions to those rules, or simply disable them completely. Both is possible with a few mouse clicks – you can do this either for each database separately, or by using templates. Here it is described on a per-host base:

On the GridControl home page you click on the link (number) following "critical" or "warning" in the "All targets - Alerts" section. On the following page, click the link (number) in the column "Number of Violations next to the policy you want to edit. On the next page, you'll find a single link in the section "Related Links" to edit that policy. To disable it completely, simply use the first drop-down box, set the value to "Disabled", and submit. For exception rules, click the "Add Object" button in the "Excluded Objects" section.

Candidates to turn off completely

Here are a few example candidates you may wish to turn off completely:

Policy Description
Permissions for files in the Oracle Home directory This will always find something – including the tnsnames.ora. It requires all directories and files in the $ORACLE_HOME to belong to the OS-user oracle and one of the oracle OS-groups, and (expect for the $ORACLE_HOME/bin directory) to have absolutely no permissions to somebody else (others). As for the tnsnames.ora this could be crucial for some applications.
Use of SQL92 Security features You rarely will need that. What stands behind it is simply to make queries like DELETE FROM table WHERE … fail when the executor has no SELECT permission. Hell, in that case it is very unlikely that the same executor should have DELETE permission! And if so, the DBA should ask what that should be for …
Use of Non-Standard Initialization Parameters On the repository database, this will also warn on the _b_tree_bitmap_plans=FALSE parameter, which Oracle explicitly lists as requirement. Recommendation: Make an exclude rule for this one.
Installation von JAccelerator (NCOMP) Not very descriptive help in no Oracle documentation I found; they simply argue "it's c00l and makes Java faster" (what we probably already guessed from the name) – but nowhere I found what that has to do with the database, whether it is a requirement or a nice-to-have, and to what Java code it applies after all. The Oracle Application Server Release Notes state, JAccelerator is not needed (literally: "installed") with OracleAS, and "You can ignore this error message." (i.e. the ORA-29558: JAccelerator (NCOMP) not installed.). Concluding from this, the database itself (which in this case holds the repository) does not need it. Googling for more, there are statements in the DBForum.COM that the JAccelerator is only needed to compile YOUR OWN STUFF, since all Java shipped with the database is already compiled natively. On Forums.Oracle.COM we find a short hint that it is "speeding up the Java execution in the database".
Except for the first statement from the AS Release Notes, we cannot say anything about "official statements". But the conclusion seems to be: As long as YOU don't use YOUR OWN Java code in the database, you will not need JAccelerator and thus can simply disable this policy.
Public Execute Privileges There are several policies concerning execute privileges for the PUBLIC user: Execute Privileges on %1 To Public (with %1 being one of UTL_FILE, DBMS_LOB and DBMS_JOB), and Restricted Privilege to Execute %1 (with %1 being one of UTL_TCP, UTL_SMTP and UTL_HTTP). These alerts will come up if you did not change the defaults concerning these public privileges. While there is some sense behind these, you should not blindly trust those alerts and revoke the permissions – this may lead you in trouble (at least if you have Oracle Text installed, for example, which relies on some of these implicit grants and will be broken when you revoke them). If you are concerned about these permissions and think about revoking them, read Metalink note 247093.1 first and think twice. In many cases, you may simply prefer to disable these policies.
OC4J Password indirection If you don't know what OC4J needs passwords for, you probably can turn this policy off. If you are not sure, check those files mentioned (and Metalink note 312648.1" to know where to look). Probably this file still uses our well-known "scott/tiger" account we never created – so turn the policy off. But if you need it, follow the advices in the Metalink note just mentioned.

Ports to exclude

For the Open Ports rule, you may want to define the following exceptions:

Port Description
22 SSH
1521 Oracle Listener
3872 EM Agent
1159 GridControl secure HTTP

Well Known Accounts

Probably this comes up with the OUTLN account. So all you have to do here is to log in to the database as SYSDBA user, and execute:

sqlplus> ALTER USER outln PASSWORD EXPIRE;
sqlplus> ALTER USER outln ACCOUNT LOCK;

Miscellaneous Rules and Metrics

There are some more metrics and rules you may wish to adjust. We cannot mention them all here (since at least some of them are very subjective), but a few common things should be listed at least:

Policy/Metric Description
Current Open Cursors Count The default of 1.200 set here will not be enough in most cases, so you probably always will have an alert for this. Do not confuse this value with the open cursors per session – we talk about open cursors in the entire database (a sum of all sessions). Depending on the session count, this number may be much higher. No idea how Oracle calculates the default value of 1.200 (simple calculation: The default "small" database is configured with 50 processes, default for open_cursors is 50 – which results in a maximum of 2.750 possible open cursors (processes * 1.1 * open_cursors) – so the alert is already triggered at <50% of a small databases maximum). In a Metalink forum I read a value of 6.000 is not unusual. So if you encounter alerts for this, changing the metrics treshold up to this should be no problem (probably even more is not, so you could take above calculation with your settings, and set the "warning" to 80% of that value).
Tablespace Space Used (%) Generally, the values here are OK – but if you are using the automatic undo management, having your undo tablespace properly sized but its datafiles with autoextend off, you will be annoyed by many alerts for it – since Oracle always keeps as much undo as possible. So better add a separate rule for your undo tablespace, setting its tresholds to 99%.
Database Time Spent Waiting (%) Here are some useless alerts for Wait Class Other: these are in 99% simple idle waits, nothing to worry about. So to prevent those "false alerts", since you cannot simply turn this section off, either increase the Warning Treshold from 30 to something >70, the Number of Occurences to something >10, or both.

Permissions

One thing you will note very soon is a contradiction between the catalog scripts you run on installation and upgrade of a database – and the policies GridControl ships with: The catalog scripts will grant a bunch of permissions to public, which GridControl then will alert for. If you simply revoke them, this will break GridControl itself – amongst others: They rely on PUBLIC being granted those permissions. This is a known "feature" of Oracle for years. Since a solution is not to be expected that soon (there is a "feature request" pending for this, which may make it into one of the next releases), we have to find a work-around meanwhile.

The easiest one is to turn these policies off. But if we'd do that in all cases, we can save us from installing GridControl: There wouldn't be that much things left. So we better figure out how to solve the grants. Before touching these, make sure to read the Metalink article 247093.1, since this may have some bad side-effects!

So first we check who is relying on the grants alerted:

-- some (optional) output formatting
COLUMN package FOR a10
COLUMN owner FOR a10
COLUMN objects FOR 999
COLUMN references FOR a10
COLUMN type FOR a10
SET feedback off
SET verify off

SELECT referenced_name package, owner, count(name) objects,referenced_owner references, referenced_type type
  FROM dba_dependencies
 WHERE referenced_owner IN ('SYS','PUBLIC')
   AND referenced_type IN ('PACKAGE','SYNONYM')
   AND referenced_name IN ('UTL_FILE','UTL_SMTP','UTL_TCP','UTL_HTTP','DBMS_JOB','DBMS_LOB')
   AND owner <> 'SYS' AND owner <> 'PUBLIC'
 GROUP BY referenced_name, owner, referenced_owner, referenced_type
 ORDER BY referenced_name, owner, referenced_owner, referenced_type;

SET feedback on
SET verify on

The result probably looks like:

DBMS_JOB   DBSNMP           1 PUBLIC     SYNONYM
DBMS_JOB   SYSMAN           5 PUBLIC     SYNONYM
UTL_FILE   ORACLE_OCM       1 PUBLIC     SYNONYM
UTL_FILE   SYSMAN           1 PUBLIC     SYNONYM

Which means: We have to grant the execute permissions on DBMS_JOB to DBSNMP and SYSMAN, plus execute on UTL_FILE to SYSMAN and ORACLE_OCM before revoking them from PUBLIC (please adjust this according to your real output!). And it means, the other packages are not affected, and we can safely revoke the execute permissions on them from PUBLIC:

-- Grant necessary permissions explicitly
GRANT EXECUTE ON dbms_job TO DBSNMP;
GRANT EXECUTE ON dbms_job TO SYSMAN;
GRANT EXECUTE ON utl_file TO ORACLE_OCM;
GRANT EXECUTE ON utl_file TO SYSMAN;

-- Remove unsafe permissions from public
REVOKE EXECUTE ON dbms_job FROM PUBLIC;
REVOKE EXECUTE ON dbms_lob FROM PUBLIC;
REVOKE EXECUTE ON utl_tcp FROM PUBLIC;
REVOKE EXECUTE ON utl_smtp FROM PUBLIC;
REVOKE EXECUTE ON utl_http FROM PUBLIC;
REVOKE EXECUTE ON utl_file FROM PUBLIC;

-- "Repair" packages we possibly broke temporarily
@?/rdbms/admin/utlrp.sql

The last command executes a script checking for invalid objects allover the database and trying to "repair" them. Maybe you have to run this script again a while later …

Warning: Due to complex nestings, revoking above mentioned privileges may have serious side-effects. In some cases, even many objects of the SYS user have been invalidated on the following day (again and again, even after fixing this with utlrp.sql) – though the objects we revoked PUBLIC access from belong to this user himself! Another troublemaker is CTXSYS: Here we had cases where some packages did not even re-compile after granting all the permissions again. This may force you to re-run the entire catalog process in order to re-validate the objects.

So if following this procedure, you find some objects popping up as invalid (most likely package bodies), and cannot validate them again using the @?/rdbms/admin/utlrp.sql script, and even an explicit ALTER PACKAGE <package_name> compile fails (most likely with a message like "malformed or corrupted wrapped unit"), here's what you can do: Check for the broken objects create script (in most cases this is a *.plb file) and run this in the users context to re-create the object. This solved the issue here in most cases.

And here comes an example:

COLUMN owner FOR a15
COLUMN object_name FOR a30
COLUMN status FOR a10
COLUMN object_type FOR a15
SELECT owner,object_name,status,object_type
  FROM dba_objects
 WHERE status<>'VALID';

This could show up:

CTXSYS     DRVDOC                         INVALID   PACKAGE BODY

Usually, you would try to ALTER PACKAGE ctxsys.drvdoc COMPILE BODY – but in our case, this failed ("compilation errors"). The same for @?/rdbms/admin/utlrp.sql – the object remains unimpressed. By the "owner" we know this object belongs to CTX – so we will find the associated CREATE Statement in $ORACLE_HOME/ctx/admin/*: grep -i "package body drvdoc" $ORACLE_HOME/ctx/admin/* points us to the file drvdoc.plb. So at the SQL prompt, we bravely type:

ALTER SESSION SET current_schema ctx;
@?/ctx/admin/drvdoc.plb

And we will see the object valid again. At least in most cases.

Policy Templates

It might be quite annoying to repeat each of this step for each database/host again and again – but we don't have to. Once we fugured out which rules we need to adjust, we can create a template for this to apply to a list of targets. To do so, first click on the Setup link on the top of the tabs in the upper left of the page. On the next page, select Monitoring Templates from the link-list on the left of the page. The table in the center of the next page is probably still empty – so click on the Create button to create a new template. Now select a target you already configured accordingly, and you can chose the policies which need to be modified to be part of the template. You also can modify the rules in this template to be more generic, or whatever you need.

Once you've setup one or more templates, you can apply them to one or multiple targets using the Apply button and selecting the target(s).

Troubleshooting

There are several common problems arising during or shortly after the installation of GridControl. Here we will list some of them, and how they may be solved.

Installation and Konfiguration

Unknown target: generic_mom_managed_host

Problem: The management home page lists an unknown target called generic_mom_managed_host, the status is pending.

Cause: Well, they really install everything … MOM stands for Microsoft Operations Management – and you probably don't use that stuff. This connector enables OEM to receive alerts from MOM. I.e., if you don't have MOM running somewhere, the connector cannot connect.

Solution: What you don't need you can simply kick off:

  1. Click on the "unknown" link
  2. Click on "generic_mom_managed_host"
  3. Click on "Connector Setup"
  4. Select "Microsoft Operations Manager Connector"
  5. Click on "Delete"
  6. Click on "Yes"

Error on pulling the 10.2.0.4 agent

Problem: You want to install the agent on some remote host using the Pull method. After downloading the agentDownload.* script and executing it, you simply get an error 404: Not Found.

Cause: Looking into the <OMS Home>/sysman/agent_download/10.2.0.4 directory you will notice that the files there are indeed missing – so the script is not able to download them. Though you applied the patches to both OMS and the agent, it only added an empty directory here with the script plus one text file.

Solution: If you ran into this problem, you missed the paragraph on an additional download here, so please go back and check ;)

It looks like Oracle wants us to roll out the old 10.2.0.2 agent and then patch it on every machine - not a very practical way! Or, as the doc/ReleaseNotes_EMGC_10.2.0.4.html file mentions in the chapter Post Installation Tasks, you overlooked to download the Clone Support Files from Metalink. Nice that they mention this – but they don't tell where to find them. Searching for a place on Metalink to download them from only returns suitable results, if you know exactly where and how to search:

Tab "Patches and Updates", follow the link *"Advanced Search" here. Chose Enterprise Manager Grid Control (emgrid) as "Product", set "Patch Type" to Any, enter CLONE SUPPORT FILES FOR% in the "Description" field. Now also set "Priority" to Any, and submit the form – to find out that the only available patch for this issue applies to the cloning of Oracle 9i databases only – so if you (no longer) have such, you don't need this patch …

Re-Install an Agent

Problem: You have an agent installed on some machine, but want to replace it.

Cause: Yes, there could be some: e.g. you want to replace an older version of the agent with a newer one, but don't want to patch it, or the prior installation broke or failed, or you put it into the wrong directory (according to the local policies) …

Solution: Do NOT try to simply install the new agent on top or next to the existing one!!! This will either fail if using the same directory, or you will run into trouble with alerts like Number of Duplicate Targets exceeded the critical threshold. Instead, follow below rules:

  1. Make sure you removed all targets (which watched by the affected agent) from the OMS repository
  2. Make sure the (old) agent is stopped
  3. If the agent is still reported "Down" in GridControl even after waiting for a reasonal time (i.e. up to 15 minutes), and you find no way to remove it there: Login to the repository database as SYSMAN, and run exec mgmt_admin.cleanup_agent('<myserver>:<port>') (see Metalink Note 454081.1). Admitted: This is the "Die Hard" way finally getting rid of it ;)
  4. On the target host, login as oracle, change to the directory $AGENT_HOME/oui/bin and execute the runInstaller script. Delete the current agent using the installer.
  5. Now you can install the new version, as described before
  6. Grrr … and now we have to re-configure the "new" targets again! With some luck, you created templates before, which you now only have to apply …

Alerts and Policies

Grid reports the Listener to be down - but it is running

Problem: Sometimes GridControl reports your Listener is down. Shocked as you are, you log in to the machine and check with lsnrctl status – and this command reports: Everything fine, I'm up and running, and I even know all databases etc. I should know.

Cause: Who knows? It was reported OK by GridControl immediately after installation. Then you possibly restarted either the listener, the OMS (e.g. for patching) or both - and the problem was there.

Solution: There may be some. If you search on Metalink, you'll find some SRs on this issue – but none solved. Check Google, and you'll find some more. Advices vary and even include manual changes to configuration files. Before you fiddle around with those, try a simple approach:

In GridControl, select "Agents" from the drop-down search list and submit. Then click on the associated agent and list its targets. Search for the listener, and remove it. Manually add it again with the correct data – and it will probably work.

PLS-00201: identifier 'MGMT_CNTR_TT.CLEAN_UP_OLD_TICKET_RECS' must be declared

Problem: Soon after applying the patch, above error message appears in the alert log – execution of some job failed due to this.

Cause: The patch submitted the job using the wrong user (SYS instead of SYSMAN). But even with the right owner, it raises another error (see Metalink Note 416352.1" for details).

Solution: Above note writes about a couple of bugs filed for this issue with GridControl 10.2.0.3 already – obviously they are still not fixed with 10.2.0.4. The workaround recommended by above note is to simply remove the job using

SQL> exec dbms_job.remove(<job_id>)

Alright: a non-existing job can no longer fail …

Alerts: Warning and errors because a Standby Database is not open

Problem: There is a bunch of warnings and errors for a standby database, tracing back to (or even explicitely mentioning) an ORA-01219: database not open: queries allowed on fixed tables/views only error.

Cause: Bug in the agent (see e.g. MetaLink note 403223.1), which should have been fixed for 10.2.0.4 according to that note – but still is not.

Solution: Good question. There is a patch available on MetaLink (see bug 6442193 and/or search for the patch with the same number) to be applied on top of the 10.2.0.4 agent. But even after doing so (plus removing and re-discovering the standby database as mentioned in above note), the errors persist.

After not receiving any usable answer from Metalink for about 2 months, I found the solution as a side-effect from another task: If you enabled the Data Guard Broker, these errors disappear! To do so:

  1. Open the GridControl web page
  2. Click on the "Targets" tab
  3. In the sub-navigation, select "Databases"
  4. Click on the primary instance belonging to the standby in question
  5. In the section "High Availability", click on the link Primary
  6. If asked for login credentials, provide the necessary data for a SYSDBA account (e.g. SYS)
  7. In the "Overview" section, click on the link Use Data Guard Broker
  8. Follow the instructions to setup Data Guard

As a side-effect, doing a SwitchOver or FailOver now is only a matter of a few mouse clicks, since the Data Guard Broker cares for everything else …

Even though Metalink claims they have a patch solving this issue (Patch #6442193 for agent 10.2.0.4 - once applied, you should remove and rediscover the standby database in Grid Control in order to get the existing alerts cleared), this patch did not do anything for me – I even got more ORA-01219 errors reported. As far as I can tell, configuring the Data Guard Broker seems to be the only almost-solution for now. It still leaves the two warnings about the database being MOUNTED (instead of open) – and all the other alerts maybe re-appear after a restart …

Trouble with the Open Ports policy

Problem: Initially there where 5 open ports reported. Since those should be open (usually, it's the Oracle ports like listener, agent, etc. which are needed to be open), they were added to the exclusion list. Now there are no more open ports reported – but there are for sure more ports open on that host!

Cause: By default, this policy has set a maximum of 5 entries, including the exception list – so it will never report more than 5 ports. If you already have 5 entries in the exclusion list, that's it – you will never get an open ports alert anymore, regardless what.

Solution: If you need this policy to report more, you must increase the maximum allowed entries for it. This can be done as follows:

Click on the "Jobs" tab. From the "Create Job" drop-down, select Security Policy Configuration. Click the "Go" button. Now give the job some name (and optionally description). Leave "Target Type" to Host, and use the "Add" button to add the target hosts where you want to change the policy. Click the "Parameters" tab then. In the "Security Policy" drop-down, select Open Ports, and in the "Max Row Count" input put the number of ports you want to have reported (put All to remove the limit completely). Next, select the "Credentials" tab. If you did not yet setup preferred credentials for the hosts you want to change that policy on, check the "Override Preferred Credentials" radio button, and enter a valid user/password combination. Note: It will NOT work to use the root login and sudo to oracle (the job will fail, claiming there's no sudo available even if it is) – but the job will succeed with the root user alone (without sudo to anything). Finally, press the "Submit" button.

Once the job was successfully executed on the target hosts, you will get more open ports alerted as soon as the policy is evaluated next on these.

Target Alert: Error - Missing Properties

Problem: In the target alerts, we find one or more errors reported with the message Missing Properties: […]. The error type is Collection Failure.

Cause: The dynamic properties are not being computed correctly (see Metalink Bug 5283003).

Solution: Metalink offers a solution in note 422847.1. In short: On the target host, issue the command

$AGENT_HOME/bin/emctl reload agent dynamicproperties <target>:<type>
$AGENT_HOME/bin/emctl clearstate agent
$AGENT_HOME/bin/emctl upload

(<target> is the name of the affected target, as reported in the target column of the error report; <type> is the target type, e.g. host for a host, or oracle_database for an oracle database). According to the note, this should clear the alert. Then give it 5-10 minutes, and check the alert list – these alerts should have been cleared. For us, it helped on the second try; so you may need to give it another chance …

Massive collection errors with ORA-00600

Problem: In the target alerts, you suddenly find a bunch of collection errors, the details show some ORA-00600 [17059] on the repository database. About one per minute, throwing partially huge trace files.

Cause: Your repository database is probably Oracle 10.2.0.3 on Windows or Linux, and you hit the bug 5705795.

Solution: Congratulations! You just won the lottery for another patch set. Please confirm with Metalink support first (file a Service Request) just to make sure that this fix applies to your environment. If so, they will tell you to download patch #5705795 and apply it against your repository database (yepp – this time we do not patch the Grid software, but the database software).

Remark: Despite of the patch set #5705795 being applied, the error may still persist. In this case, it should help to upgrade the database to version 10.2.0.4 – but if that's the permanent solution, I cannot (yet) tell …

Alerts: "User SYS logged on from …"

Problem: If you want to monitor a dataguard environment, you probably configured these databases to use the SYS user. Now you have a permanent alert in warnings for these database, telling you that SYS connected from the database host – no wonder, because this is the agent itself! So what to do about this?

I opened a Service Request at Metalink for this issue. They point to the $ORACLE_HOME/rdbms/admin/catsnmp.sql file to check the permissions given to the DBSNMP monitor user (see also Metalink Note 315340.1), which is basically what is described above (plus GRANT EXECUTE ON DBMS_SERVER_ALERT directly to the user – which does not change anything).

Cause: As you may already have guessed, the agent simply reads all sessions from v$sessions. Since it usually holds more than one connection to the database, it does not even try to exclude itself from the result set.

Solution: There will probably never be a fix from Oracle for this issue. Quoting the Metalink co-ordinator for the responsible team: "This is a configuration issue. no statement need to be changed in the xml/xmlp file. Just disable this policy on the databases that you would like to monitor under the SYS account." So we have to care ourselves: The following work-around solves the issue – but note that this is not officially supported by Oracle:

Repeat these steps for all affected hosts, and the agents should no longer alert themselves.

2018-12-16