Class ThrottledFetcher.ThrottledConnection

  • All Implemented Interfaces:
    IThrottledConnection
    Enclosing class:
    ThrottledFetcher

    protected static class ThrottledFetcher.ThrottledConnection
    extends java.lang.Object
    implements IThrottledConnection
    A throttled connection. Each instance records the bins to which it belongs, the underlying open connection, and the last time the connection was used.
    • Field Detail

      • fetchThrottler

        protected final org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler fetchThrottler
        Fetch throttler
      • protocol

        protected final java.lang.String protocol
        Protocol
      • server

        protected final java.lang.String server
        Server
      • port

        protected final int port
        Port
      • authentication

        protected final PageCredentials authentication
        Authentication
      • expireTime

        protected long expireTime
        The time at which the connection will expire. Only valid while the connection is in the pool.
      • connManager

        protected org.apache.http.conn.HttpClientConnectionManager connManager
        The HTTP connection manager. Its pool has a size of 1.
      • httpClient

        protected org.apache.http.client.HttpClient httpClient
        The http client object.
      • fetchMethod

        protected org.apache.http.client.methods.HttpRequestBase fetchMethod
        The HTTP request method object for the current fetch.
      • throwable

        protected java.lang.Throwable throwable
        The throwable recorded for the current fetch, if any.
      • myUrl

        protected java.lang.String myUrl
        The current URL being fetched
      • statusCode

        protected int statusCode
        The status code fetched, if any
      • fetchType

        protected java.lang.String fetchType
        The kind of fetch we are doing
      • fetchCounter

        protected long fetchCounter
        The number of bytes counted so far in the current fetch.
      • startFetchTime

        protected long startFetchTime
        The start time of the current fetch.
      • lastFetchCookies

        protected LoginCookies lastFetchCookies
        The cookies from the last fetch
      • proxyHost

        protected final java.lang.String proxyHost
        Proxy host
      • proxyPort

        protected final int proxyPort
        Proxy port
      • proxyAuthDomain

        protected final java.lang.String proxyAuthDomain
        Proxy auth domain
      • proxyAuthUsername

        protected final java.lang.String proxyAuthUsername
        Proxy auth user name
      • proxyAuthPassword

        protected final java.lang.String proxyAuthPassword
        Proxy auth password
      • httpsSocketFactory

        protected final javax.net.ssl.SSLSocketFactory httpsSocketFactory
        The SSL socket factory used for HTTPS connections.
      • socketTimeoutMilliseconds

        protected final int socketTimeoutMilliseconds
        Socket timeout milliseconds
      • connectionTimeoutMilliseconds

        protected final int connectionTimeoutMilliseconds
        Connection timeout milliseconds
      • threadStarted

        protected boolean threadStarted
        Set to true once the thread has been started.
      • abortCheck

        protected AbortChecker abortCheck
        Abort checker
    • Constructor Detail

      • ThrottledConnection

        public ThrottledConnection​(ThrottledFetcher.ConnectionPool myPool,
                                   org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler fetchThrottler,
                                   java.lang.String protocol,
                                   java.lang.String server,
                                   int port,
                                   PageCredentials authentication,
                                   javax.net.ssl.SSLSocketFactory httpsSocketFactory,
                                   java.lang.String proxyHost,
                                   int proxyPort,
                                   java.lang.String proxyAuthDomain,
                                   java.lang.String proxyAuthUsername,
                                   java.lang.String proxyAuthPassword,
                                   int socketTimeoutMilliseconds,
                                   int connectionTimeoutMilliseconds)
        Constructor. Create a connection with a specific server and port, and register it as active against all bins.
    • Method Detail

      • hasExpired

        public boolean hasExpired​(long currentTime)
        Check whether the connection has expired.
        Specified by:
        hasExpired in interface IThrottledConnection
        Parameters:
        currentTime - is the current time to use to judge if a connection has expired.
        Returns:
        true if the connection has expired, and should be closed.
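        The expiry check can be pictured with the minimal sketch below. It assumes that a connection counts as expired once currentTime reaches a stored expiry time; the class ExpirySketch and that exact comparison are illustrative assumptions, not the ManifoldCF implementation.

```java
// Hypothetical stand-in for a pooled connection; not the ManifoldCF class.
class ExpirySketch {
    // Time (in milliseconds) at which the pooled connection expires.
    private final long expireTime;

    ExpirySketch(long expireTime) {
        this.expireTime = expireTime;
    }

    // Assumed rule: the connection is expired once the supplied time reaches expireTime.
    boolean hasExpired(long currentTime) {
        return currentTime >= expireTime;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        ExpirySketch conn = new ExpirySketch(now + 60_000L);
        System.out.println(conn.hasExpired(now));            // still valid
        System.out.println(conn.hasExpired(now + 120_000L)); // past the expiry time
    }
}
```

        Passing the current time in as a parameter lets a pool sweep many connections against one consistent timestamp.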
      • logFetchCount

        public void logFetchCount​(int count)
        Log the fetch of a number of bytes, from within a stream.
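        One way a stream could report byte counts as they are read is sketched below. The wrapper class CountingStreamSketch and its ByteCounter callback are hypothetical names standing in for whatever stream calls logFetchCount; this page does not describe how the real stream is wired up.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper stream; not part of ManifoldCF.
class CountingStreamSketch extends FilterInputStream {
    // Hypothetical callback standing in for logFetchCount(int).
    interface ByteCounter {
        void logFetchCount(int count);
    }

    private final ByteCounter counter;

    CountingStreamSketch(InputStream in, ByteCounter counter) {
        super(in);
        this.counter = counter;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            counter.logFetchCount(1); // one byte was delivered
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            counter.logFetchCount(n); // n bytes were delivered
        }
        return n;
    }

    // Reads the whole array through the wrapper and returns the total reported.
    static long countBytes(byte[] data) {
        long[] total = {0L};
        try (InputStream in = new CountingStreamSketch(
                new ByteArrayInputStream(data), n -> total[0] += n)) {
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // keep reading; the wrapper reports each chunk
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(countBytes(new byte[2500]));
    }
}
```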
      • beginFetch

        public void beginFetch​(java.lang.String fetchType)
                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                               org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Begin the fetch process.
        Specified by:
        beginFetch in interface IThrottledConnection
        Parameters:
        fetchType - is a short descriptive string describing the kind of fetch being requested. This is used solely for logging purposes.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • executeFetch

        public void executeFetch​(java.lang.String urlPath,
                                 java.lang.String userAgent,
                                 java.lang.String from,
                                 boolean redirectOK,
                                 java.lang.String host,
                                 FormData formData,
                                 LoginCookies loginCookies)
                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                 org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Execute the fetch and get the return code. This method uses the standard logging mechanism to keep track of the fetch attempt. It signals outcomes as follows: it throws ServiceInterruption if a dynamic error occurs, throws ManifoldCFException if a fatal error occurs, and throws nothing if a standard protocol error occurs. For proxies and similar intermediaries, this fetch request is intended to handle whatever redirections are needed to support them.
        Specified by:
        executeFetch in interface IThrottledConnection
        Parameters:
        urlPath - is the path part of the url, e.g. "/robots.txt"
        userAgent - is the value of the userAgent header to use.
        from - is the value of the from header to use.
        redirectOK - should be set to true if you want redirects to be automatically followed.
        host - is the value to use as the "Host" header, or null to use the default.
        formData - describes additional form arguments and how to fetch the page.
        loginCookies - describes the cookies that should be in effect for this page fetch.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
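        The three outcomes described for executeFetch suggest a calling pattern along the lines of the sketch below. Every type in it is a hypothetical stand-in: TransientFetchException plays the role of ServiceInterruption, FatalFetchException plays the role of ManifoldCFException, and Connection is a reduced version of the methods on this page. The call order is inferred from the method descriptions, not taken from ManifoldCF code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only; none of these types are ManifoldCF classes.
class FetchLifecycleSketch {
    // Stand-in for ServiceInterruption: a dynamic problem, worth retrying later.
    static class TransientFetchException extends Exception {
        TransientFetchException(String msg) { super(msg); }
    }

    // Stand-in for ManifoldCFException: a fatal problem.
    static class FatalFetchException extends Exception {
        FatalFetchException(String msg) { super(msg); }
    }

    // Reduced, hypothetical version of the connection contract.
    interface Connection {
        void beginFetch(String fetchType) throws TransientFetchException, FatalFetchException;
        void executeFetch(String urlPath) throws TransientFetchException, FatalFetchException;
        int getResponseCode() throws TransientFetchException, FatalFetchException;
        void doneFetch();
        void close();
    }

    // Fake connection that records calls and answers with a fixed status code.
    static class FakeConnection implements Connection {
        final List<String> calls = new ArrayList<>();
        private final int status;
        FakeConnection(int status) { this.status = status; }
        public void beginFetch(String fetchType) { calls.add("beginFetch"); }
        public void executeFetch(String urlPath) { calls.add("executeFetch"); }
        public int getResponseCode() { calls.add("getResponseCode"); return status; }
        public void doneFetch() { calls.add("doneFetch"); }
        public void close() { calls.add("close"); }
    }

    // Runs one fetch and describes the outcome as a string.
    static String fetchOnce(Connection conn, String urlPath) {
        try {
            conn.beginFetch("fetch");
            try {
                conn.executeFetch(urlPath);
                // A protocol-level error throws nothing; it shows up in the code.
                int code = conn.getResponseCode();
                return code == 200 ? "ok" : "protocol error " + code;
            } finally {
                conn.doneFetch(); // always log the attempt once it has begun
            }
        } catch (TransientFetchException e) {
            return "retry later: " + e.getMessage();
        } catch (FatalFetchException e) {
            return "abort: " + e.getMessage();
        } finally {
            conn.close(); // always hand the connection back
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchOnce(new FakeConnection(404), "/robots.txt"));
    }
}
```

        Keeping doneFetch and close in finally blocks means the attempt is logged and the connection released whichever of the three outcomes occurs.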
      • getResponseCode

        public int getResponseCode()
                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                   org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the http response code.
        Specified by:
        getResponseCode in interface IThrottledConnection
        Returns:
        the response code. This is either an HTTP response code or one of the connector's own non-HTTP status codes.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getLastFetchCookies

        public LoginCookies getLastFetchCookies()
                                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the last fetch cookies.
        Specified by:
        getLastFetchCookies in interface IThrottledConnection
        Returns:
        the cookies now in effect from the last fetch.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getResponseHeaders

        public java.util.Map<java.lang.String,​java.util.List<java.lang.String>> getResponseHeaders()
                                                                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                                                                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the response headers.
        Specified by:
        getResponseHeaders in interface IThrottledConnection
        Returns:
        a map keyed by header name containing a list of values.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
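        A map keyed by header name with a list of values per key can be built and queried as in the sketch below. The helper names are hypothetical, and the case-insensitive lookup is an assumption made for the example, since this page does not say how header names are cased in the returned map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for a multi-valued header map; not part of ManifoldCF.
class HeaderMapSketch {
    // Adds one header line, keeping earlier values for the same name.
    static void add(Map<String, List<String>> headers, String name, String value) {
        headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns the first value whose header name matches ignoring case, or null.
    static String firstValue(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey().equalsIgnoreCase(name) && !e.getValue().isEmpty()) {
                return e.getValue().get(0);
            }
        }
        return null;
    }

    // Builds a small sample map with one repeated header.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        add(headers, "Content-Type", "text/html");
        add(headers, "Set-Cookie", "a=1");
        add(headers, "Set-Cookie", "b=2");
        return headers;
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = sample();
        System.out.println(headers.get("Set-Cookie"));
        System.out.println(firstValue(headers, "content-type"));
    }
}
```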
      • getResponseHeader

        public java.lang.String getResponseHeader​(java.lang.String headerName)
                                           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                  org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get a specified response header, if it exists.
        Specified by:
        getResponseHeader in interface IThrottledConnection
        Parameters:
        headerName - is the name of the header.
        Returns:
        the header value, or null if it doesn't exist.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
      • getResponseBodyStream

        public java.io.InputStream getResponseBodyStream()
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the response input stream. It is the responsibility of the caller to close this stream when done.
        Specified by:
        getResponseBodyStream in interface IThrottledConnection
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
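        Because the caller owns the returned stream, a try-with-resources block is a convenient way to guarantee it is closed. The sketch below uses an in-memory stream in place of a real response body; BodyStreamSketch and TrackedStream are hypothetical names used only for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of ManifoldCF.
class BodyStreamSketch {
    // In-memory stream that remembers whether close() was called.
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads the body fully and closes the stream, even if reading fails.
    static byte[] readAndClose(InputStream body) {
        try (InputStream in = body) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackedStream body = new TrackedStream(new byte[] {1, 2, 3});
        System.out.println(readAndClose(body).length);
        System.out.println(body.closed);
    }
}
```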
      • getLimitedResponseBody

        public java.lang.String getLimitedResponseBody​(int maxSize,
                                                       java.lang.String encoding)
                                                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                       org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Get the response body as a string, limited in size according to maxSize and decoded using the given encoding.
        Specified by:
        getLimitedResponseBody in interface IThrottledConnection
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
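        A size-limited read of a body stream might look like the sketch below. It assumes that maxSize counts decoded characters, which this page does not confirm; the class and method names are hypothetical and the code is not the ManifoldCF implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of ManifoldCF.
class LimitedBodySketch {
    // Reads at most maxSize characters (assumed unit) using the named encoding.
    static String readLimited(InputStream body, int maxSize, String encoding) {
        try (Reader reader = new InputStreamReader(body, encoding)) {
            char[] buf = new char[maxSize];
            int total = 0;
            while (total < maxSize) {
                int n = reader.read(buf, total, maxSize - total);
                if (n == -1) {
                    break; // body ended before the limit was reached
                }
                total += n;
            }
            return new String(buf, 0, total);
        } catch (IOException e) {
            // Also covers an unsupported encoding name.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLimited(new ByteArrayInputStream(data), 5, "UTF-8"));
    }
}
```

        Capping the read protects the caller from holding an arbitrarily large page in memory when only a prefix is needed.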
      • noteInterrupted

        public void noteInterrupted​(java.lang.Throwable e)
        Record that the fetch on this connection was interrupted, along with the throwable that interrupted it.
        Specified by:
        noteInterrupted in interface IThrottledConnection
      • doneFetch

        public void doneFetch​(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities)
                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Done with the fetch. Call this when the fetch has been completed. A log entry will be generated describing what was done.
        Specified by:
        doneFetch in interface IThrottledConnection
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • close

        public void close()
        Close the connection. Call this to return the connection to its pool.
        Specified by:
        close in interface IThrottledConnection
      • handleHTTPException

        protected void handleHTTPException​(org.apache.http.HttpException e,
                                           java.lang.String activity)
                                    throws org.apache.manifoldcf.agents.interfaces.ServiceInterruption,
                                           org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Throws:
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
      • handleIOException

        protected void handleIOException​(java.io.IOException e,
                                         java.lang.String activity)
                                  throws org.apache.manifoldcf.agents.interfaces.ServiceInterruption,
                                         org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Throws:
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        org.apache.manifoldcf.core.interfaces.ManifoldCFException