Channel: All Data Protector Practitioners Forum posts

Re: Restore problem


First, I would advise you to reduce your 'global' variable

 

SmMaIdleTimeout =800

 

back down to the default of 140 minutes.

 

I would also recommend setting

 

SmDaIdleTimeout=120

 

Save the files, stop and restart DP

 

I disagree with a previous poster: this is NOT a media agent timeout. The error message clearly states "the file system Restore DA". From what I can make out from the pictures, your Disk Agent 'repsrv' appears to be running HPUX.

 

In every Disk Agent or Media Agent timeout, I recommend 2 things:

 

   - Check the server involved for any hanging processes.  On the server 'repsrv', run the command

             ps -ef | grep omni

     If you see any hanging processes, then kill them

 

  -  Whether you do or not, I recommend that you implement KeepAlive on the disk agent 'repsrv'.  Most of your timeout issues are resolved by implementing KeepAlive
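
The process check in the first item above can be sketched as a small shell function. This is only an illustration: the function name is made up here, and the filter simply matches the string "omni" in the `ps -ef` output, exactly as the grep above does. The `[o]mni` pattern is a common trick to stop the filter from matching its own command line.

```shell
# Filter `ps -ef`-style output (read from stdin) down to Data Protector
# ("omni") processes. The header line is kept so the columns stay readable.
# list_omni_procs is a hypothetical helper name, not a Data Protector command.
list_omni_procs() {
    # [o]mni matches "omni" but not the literal text "[o]mni" that appears
    # in the awk command line itself.
    awk 'NR == 1 || /[o]mni/'
}

# Usage on 'repsrv' (not executed here):
#   ps -ef | list_omni_procs
# Any process that is clearly left over from a finished or failed session
# can then be stopped with:  kill <PID>
```

Check the start time and parent process in the listing before killing anything, so that agents belonging to a session that is still running are left alone.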

 

                ==================

 

Many problems encountered with Data Protector can be helped by using the KeepAlive functionality. These include Disk Agent and Media Agent timeouts, and backups that seem to ‘Complete’ but never actually finish. You may even see Session Statistics printed but not get the popup when the session is completed.

The keepalive parameter just turns on the system mechanism for keeping a network connection alive for all the connections we make

The keepalive packets are part of the TCP protocol and are transparent to the program opening or using the connection. They are handled by the system and are not seen by the application. The keepalive interval normally defaults to 7200 seconds, or 2 hours.

In certain circumstances, it is important that Gateways using TCP/IP socket communications periodically send TCP messages even if they have nothing to say. These null messages are called “keepalive” packets, and they help inform networking infrastructure that the endpoints are still there, connected, and expect the TCP/IP socket to stay connected even though, at the moment, they don’t have any data to exchange.

From a HPUX Operating System perspective, check the link

http://www.filibeto.org/unix/hp-ux/lib/kernel/perftun/tcp_ip-performance-wp-c02020743.pdf pg 16

On the server where you are seeing the problem, edit the file

/opt/omni/.omnirc

The ‘dot’ is part of the filename.

This file does not exist by default. If you have this file, you can edit it, and, if not, you can create it using the HPUX ‘touch’ command

touch /opt/omni/.omnirc

Either way, add this line to the file:

OB2IPCKEEPALIVE=1

This activates the KeepAlive mechanism.

After making the changes to the ‘.omnirc’ file, save it, making sure that it has no extension such as ‘txt’ or ‘TMPL’.
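
The create-and-edit steps above could be scripted roughly as follows. This is a minimal sketch: the function name is hypothetical, the path defaults to the /opt/omni/.omnirc location given above, and the function only appends the line when no OB2IPCKEEPALIVE entry exists yet, so it can be run more than once.

```shell
# Add OB2IPCKEEPALIVE=1 to an omnirc file unless an entry is already there.
# enable_keepalive is a hypothetical helper name; the optional argument lets
# you point it at a different file, for example a scratch copy.
enable_keepalive() {
    omnirc="${1:-/opt/omni/.omnirc}"
    # Create the file if it does not exist yet (it is not there by default).
    touch "$omnirc" || return 1
    if grep -q '^OB2IPCKEEPALIVE=' "$omnirc"; then
        echo "OB2IPCKEEPALIVE already present in $omnirc"
    else
        echo 'OB2IPCKEEPALIVE=1' >> "$omnirc"
    fi
}

# Usage on the affected server (not executed here):
#   enable_keepalive
```

If an existing entry is reported, check its value by hand, since the sketch does not overwrite a line that is already there.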

You can add this file on any server which is showing a problem, either a Disk Agent or a Media Agent. Generally speaking, the changes do not have to be made on the Cell Manager, unless it is also a Media Agent having this issue

After adding the KeepAlive switch to the ‘.omnirc’ file, you need to set the kernel parameter to change the default value of 7200 seconds.

From the command prompt on the HPUX server, “ndd” can be used to get or set the kernel parameter for the keepalive packet interval:

ndd -get /dev/tcp tcp_keepalive_interval

will report the interval in milliseconds. To change this, run the command

ndd -set /dev/tcp tcp_keepalive_interval [milliseconds]

So, for example, to set this to 15 minutes,

ndd -set /dev/tcp tcp_keepalive_interval 900000

and run this again to be sure the parameter is set correctly:

ndd -get /dev/tcp tcp_keepalive_interval

15 minutes is a reasonable value. The key is to set this value low, because most firewalls and similar devices time out idle connections after an hour or so. Having the interval set to 15 or even 30 minutes should not hurt anything.
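
Because ndd takes the interval in milliseconds, it is easy to get the number of zeros wrong. The sketch below does the conversion and prints the matching ndd command without running it. Both function names are hypothetical helpers, not part of HPUX or Data Protector.

```shell
# Convert an interval in minutes to the milliseconds value that
# tcp_keepalive_interval expects (minutes * 60 seconds * 1000 ms).
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Print (but do not run) the ndd command for an interval given in minutes.
keepalive_cmd() {
    echo "ndd -set /dev/tcp tcp_keepalive_interval $(minutes_to_ms "$1")"
}

# Usage (prints the command only):
#   keepalive_cmd 15
```

For 15 minutes this prints the same command as shown above, with a value of 900000.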

You may have seen some reference to the ‘omnirc’ environment values

OB2IPCKEEPALIVETIME
OB2IPCKEEPALIVEINTERVAL


It needs to be emphasized that OB2IPCKEEPALIVETIME and OB2IPCKEEPALIVEINTERVAL are used on Windows servers only, and have no effect on HPUX or Linux servers

If you want to add this line to the ‘.omnirc’ file and set the kernel parameters proactively, that is, without having seen the problem, this is an acceptable way to head off a possible future problem.

Up to this point, we have seen no negative effects to setting KeepAlive on HPUX servers

 

      =================

 

You may also want to do this on other servers in your cell as a pro-active step

 

 

