Class ThrottledFetcher
- java.lang.Object
-
- org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher
-
public class ThrottledFetcher extends java.lang.Object
This class uses httpclient to fetch content from web servers. It additionally controls the fetch rate in two ways: first, by limiting the overall bandwidth used per server, and second, by limiting the number of simultaneous open connections per server. Because these limits are long-lived, an instance of this class would very probably need a lifetime consistent with them, and be static.
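The two controls described above can be illustrated with a small sketch. This is not the ManifoldCF implementation: the class `PerServerLimiterSketch` and its methods are hypothetical names, and the sketch assumes one semaphore per server for the connection limit and a plain bytes-per-second figure for the bandwidth limit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; not part of ThrottledFetcher.
class PerServerLimiterSketch {
  // Static so the limits outlive any single fetch.
  private static final Map<String, Semaphore> PERMITS = new ConcurrentHashMap<>();

  // Try to reserve one of maxConnections slots for the given server.
  // The limit is fixed by the first call made for that server.
  static boolean tryOpen(String server, int maxConnections) {
    Semaphore slots = PERMITS.computeIfAbsent(server, k -> new Semaphore(maxConnections));
    return slots.tryAcquire();
  }

  // Give the slot back once the connection is closed.
  static void close(String server) {
    Semaphore slots = PERMITS.get(server);
    if (slots != null) {
      slots.release();
    }
  }

  // Minimum time, in milliseconds, that a transfer of byteCount bytes
  // should take in order to stay within maxBytesPerSecond.
  static long minimumMillisFor(long byteCount, double maxBytesPerSecond) {
    return (long) Math.ceil(byteCount * 1000.0 / maxBytesPerSecond);
  }
}
```

A caller that finishes a read sooner than `minimumMillisFor` reports would sleep for the difference before reading again, which is one simple way to hold a per-server bandwidth ceiling.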
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | ThrottledFetcher.ConnectionPool | Each connection pool has identical connections we can draw on.
protected static class | ThrottledFetcher.ConnectionPoolKey | Connection pool key
protected static class | ThrottledFetcher.ExecuteMethodThread | This thread does the actual socket communication with the server.
protected static class | ThrottledFetcher.LaxBrowserCompatSpecProvider | Class to create a cookie spec.
protected static class | ThrottledFetcher.OurBasicCookieStore |
protected static class | ThrottledFetcher.PoolException | Pool exception class
protected static class | ThrottledFetcher.ThrottledConnection | Throttled connections.
protected static class | ThrottledFetcher.ThrottledInputstream | This class throttles an input stream based on the specified byte rate parameters.
protected static class | ThrottledFetcher.WaitException | Wait exception class
-
Field Summary
Fields
Modifier and Type | Field | Description
static java.lang.String | _rcsid |
protected static java.util.Map<ThrottledFetcher.ConnectionPoolKey,ThrottledFetcher.ConnectionPool> | connectionPools | Connection pools.
protected static long | idleTimeout | Idle timeout
protected static int | READ_CHUNK_LENGTH | The read chunk length
protected static boolean | recordEverything | This flag determines whether we record everything to the disk, as a means of doing a web snapshot
protected static long | TIME_15MIN |
protected static long | TIME_1DAY |
protected static long | TIME_2HRS |
protected static long | TIME_5MIN |
protected static long | TIME_6HRS |
protected static java.lang.String | webThrottleGroupType | Web throttle group type
-
Method Summary
Modifier and Type | Method | Description
static void | flushIdleConnections(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext) | Flush connections that have timed out from inactivity.
static IThrottledConnection | getConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, java.lang.String throttleGroupName, java.lang.String protocol, java.lang.String server, int port, PageCredentials authentication, org.apache.manifoldcf.connectorcommon.interfaces.IKeystoreManager trustStore, org.apache.manifoldcf.connectorcommon.interfaces.IThrottleSpec throttleDescription, java.lang.String[] binNames, int connectionLimit, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, int socketTimeoutMilliseconds, int connectionTimeoutMilliseconds, org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities) | Obtain a connection to specified protocol, server, and port.
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
webThrottleGroupType
protected static final java.lang.String webThrottleGroupType
Web throttle group type.
- See Also:
- Constant Field Values
-
idleTimeout
protected static final long idleTimeout
Idle timeout.
- See Also:
- Constant Field Values
-
recordEverything
protected static final boolean recordEverything
This flag determines whether we record everything to the disk, as a means of doing a web snapshot.
- See Also:
- Constant Field Values
-
TIME_2HRS
protected static final long TIME_2HRS
- See Also:
- Constant Field Values
-
TIME_5MIN
protected static final long TIME_5MIN
- See Also:
- Constant Field Values
-
TIME_15MIN
protected static final long TIME_15MIN
- See Also:
- Constant Field Values
-
TIME_6HRS
protected static final long TIME_6HRS
- See Also:
- Constant Field Values
-
TIME_1DAY
protected static final long TIME_1DAY
- See Also:
- Constant Field Values
-
READ_CHUNK_LENGTH
protected static final int READ_CHUNK_LENGTH
The read chunk length.
- See Also:
- Constant Field Values
-
connectionPools
protected static final java.util.Map<ThrottledFetcher.ConnectionPoolKey,ThrottledFetcher.ConnectionPool> connectionPools
Connection pools. This is a static hash of the connection pools in existence. Each connection pool represents a set of identical connections.
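A static map of pools keyed by a composite key can be sketched as follows. The fields of the real `ConnectionPoolKey` are not shown on this page, so the key below (protocol, server, port) is an assumption, and `PoolKeySketch`, `PoolSketch`, and `PoolRegistrySketch` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key; the real ConnectionPoolKey may hold other fields.
class PoolKeySketch {
  private final String protocol;
  private final String server;
  private final int port;

  PoolKeySketch(String protocol, String server, int port) {
    this.protocol = protocol;
    this.server = server;
    this.port = port;
  }

  // Two keys are equal when every field matches, so that "identical
  // connections" always land in the same pool.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof PoolKeySketch)) return false;
    PoolKeySketch other = (PoolKeySketch) o;
    return port == other.port
        && protocol.equals(other.protocol)
        && server.equals(other.server);
  }

  @Override
  public int hashCode() {
    return Objects.hash(protocol, server, port);
  }
}

// Stand-in for a pool of interchangeable connections.
class PoolSketch {
}

class PoolRegistrySketch {
  // Static hash of all pools in existence, guarded by its own monitor.
  private static final Map<PoolKeySketch, PoolSketch> POOLS = new HashMap<>();

  // Look up the pool for a key, creating it on first use.
  static PoolSketch poolFor(String protocol, String server, int port) {
    PoolKeySketch key = new PoolKeySketch(protocol, server, port);
    synchronized (POOLS) {
      return POOLS.computeIfAbsent(key, k -> new PoolSketch());
    }
  }
}
```

Because the key implements `equals` and `hashCode` over all of its fields, repeated requests for the same protocol, server, and port reuse one pool, while a change in any field yields a separate pool.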
-
-
Method Detail
-
getConnection
public static IThrottledConnection getConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, java.lang.String throttleGroupName, java.lang.String protocol, java.lang.String server, int port, PageCredentials authentication, org.apache.manifoldcf.connectorcommon.interfaces.IKeystoreManager trustStore, org.apache.manifoldcf.connectorcommon.interfaces.IThrottleSpec throttleDescription, java.lang.String[] binNames, int connectionLimit, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, int socketTimeoutMilliseconds, int connectionTimeoutMilliseconds, org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Obtain a connection to the specified protocol, server, and port. The protocol is taken into account because connection setup for some protocols (e.g. https) is extensive, and distinguishing connections by protocol should avoid repeating that setup.
- Parameters:
protocol - is the protocol, e.g. "http".
server - is the server IP address, e.g. "10.32.65.1".
port - is the port to connect to, e.g. 80. Pass -1 if the default port for the protocol is desired.
authentication - is the page credentials object to use for the fetch. If null, no credentials are available.
trustStore - is the current trust store in effect for the fetch.
binNames - is the set of bins, in order, that should be used for throttling this connection. Note that the bin names for a given IP address and port MUST be the same for every connection! This must be enforced by whatever builds the bins, which must do so given an IP address and port.
throttleDescription - is the description of all the throttling that should take place.
connectionLimit - is the maximum number of connections permitted.
- Returns:
- an IThrottledConnection object that can be used to fetch from the port.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
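The port parameter accepts -1 to request the protocol's default port. How ThrottledFetcher resolves that value internally is not shown on this page; the sketch below is only one plausible reading, using the standard defaults of 80 for http and 443 for https. `DefaultPortSketch` and `effectivePort` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper; not part of ThrottledFetcher.
class DefaultPortSketch {
  // Return the explicit port, or the protocol's standard default when port is -1.
  static int effectivePort(String protocol, int port) {
    if (port != -1) {
      return port;
    }
    switch (protocol.toLowerCase(Locale.ROOT)) {
      case "http":
        return 80;
      case "https":
        return 443;
      default:
        throw new IllegalArgumentException("No default port known for protocol: " + protocol);
    }
  }
}
```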
-
flushIdleConnections
public static void flushIdleConnections(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Flush connections that have timed out from inactivity.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
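Flushing by inactivity generally means comparing each connection's last-use time against an idle timeout. The sketch below shows that idea in isolation. It is not the ManifoldCF code: `IdleFlushSketch` is a hypothetical class, connections are represented by string identifiers, and the timeout is passed in because the value of `idleTimeout` is not given on this page.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical illustration only; not part of ThrottledFetcher.
class IdleFlushSketch {
  // Last-use timestamp, in milliseconds, for each tracked connection.
  private final Map<String, Long> lastUsed = new HashMap<>();
  private final long idleTimeoutMillis;

  IdleFlushSketch(long idleTimeoutMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
  }

  // Record that a connection was used at the given time.
  void touch(String connectionId, long nowMillis) {
    lastUsed.put(connectionId, nowMillis);
  }

  // Drop every connection idle for at least the timeout; return how many were dropped.
  int flushIdle(long nowMillis) {
    int removed = 0;
    Iterator<Map.Entry<String, Long>> it = lastUsed.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  // Number of connections still tracked.
  int size() {
    return lastUsed.size();
  }
}
```

Passing the current time as an argument keeps the sketch deterministic; a real caller would typically supply `System.currentTimeMillis()`.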
-
-