HttpClient vs HtmlUnit - java

I know that HtmlUnit simulates a browser, while HttpClient doesn't.
In HtmlUnit, when a page is loaded and it contains JavaScript, will the script be executed? If the script sets a cookie, will the cookie be set in HtmlUnit's browser and be accessible from Java code?
Is there anything that can be done using HttpClient but not using HtmlUnit? In HtmlUnit, can we start with a POST request and modify any part of the HTTP request, including the method, URI, HTTP version, headers, and body?
What are the advantages of HttpClient over HtmlUnit?

HttpClient is a lower-level library for sending HTTP requests and retrieving responses.
HtmlUnit is at a higher level and internally uses HttpClient to make HTTP requests, but it also handles JavaScript (through Rhino and an internal DOM implementation), XPath (through Xalan), CSS (through CSSParser), malformed HTML (through NekoHtml), WebSockets (through Jetty), etc.
You can modify outgoing requests and incoming responses in HtmlUnit with something like:
new WebConnectionWrapper(webClient) {
    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        WebResponse response = super.getResponse(request);
        if (request.getUrl().toExternalForm().contains("my_url")) {
            String content = response.getContentAsString("UTF-8");
            // change content
            WebResponseData data = new WebResponseData(content.getBytes("UTF-8"),
                    response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
            response = new WebResponse(data, request, response.getLoadTime());
        }
        return response;
    }
};
as hinted here.
You can change the HttpClient that HtmlUnit uses by overriding HttpWebConnection.createHttpClient().
You can make a POST request with:
WebRequest webRequest = new WebRequest(url, HttpMethod.POST);
HtmlPage page = webClient.getPage(webRequest);

Related

Why do I get a 421 Misdirected Request error with HtmlUnit and not with curl, OKHttp and standard browsers?

I have a HtmlUnit "script" that has been running for months, but started to fail about a week ago. Some changes were most probably made on the backend of the web site targeted by my script, but I don't have all the details unfortunately.
Basically the script throws an exception on the first attempt to access the main page. The code looks like this:
try (final WebClient webClient = new WebClient())
{
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getCookieManager().setCookiesEnabled(true);
    HtmlPage loginPage = webClient.getPage("https://example.com/");
}
The exception thrown:
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 421 Misdirected Request for https://example.com/
I am able to open that same page with curl using this command:
curl -L https://example.com
The same request also works without error when I use the OkHttp client.
Since I could not find a solution with HtmlUnit, I tried to obtain the redirect URL using HttpClient:
private static Optional<String> getRedirectUrl() throws Exception
{
    final HttpClient client = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_2)
        .followRedirects(HttpClient.Redirect.NEVER)
        .build();
    final HttpResponse<String> response = client.send(HttpRequest.newBuilder(new URI(EXAMPLE_SITE)).GET().build(), HttpResponse.BodyHandlers.ofString());
    return response.headers().firstValue("location");
}
But then I get a 403 Forbidden error when I attempt to access this redirected url using HtmlUnit (it works if I open the redirected URL in a standard browser).
I am wondering if there are any limitations with HtmlUnit that I am not aware of?

how to create client TGT with java cxf

I'm new to the Java REST CXF client. I will make various requests to a remote server, but first I need to create a Ticket Granting Ticket (TGT). I looked through various sources but could not find a solution. The server's requirements for the request that creates a TGT are as follows:
Content-Type: text as parameter, application/x-www-form-urlencoded as value
username
password
I can create a TGT when I send this request with Postman, using a URL like the example below. But when I send the request with the code below, the response is null. Could you help me with the solution?
The example URL that I make a request with POST method using Postman: https://test.service.com/v1/tickets?format=text&username=user&password=pass
List<Object> providers = new ArrayList<Object>();
providers.add(new JacksonJsonProvider());
WebClient client = WebClient.create("https://test.service.com/v1/tickets?format=text&username=user&password=pass", providers);
Response response = client.getResponse();
You need to do a POST, but you did not specify what your payload looks like.
Your RequestDTO and ResponseDTO have to have getters/setters.
An example using the JAX-RS 2.0 Client API:
Client client = ClientBuilder.newBuilder().register(new JacksonJsonProvider()).build();
// WebTarget is immutable: queryParam() returns a new target, so the calls must be chained
WebTarget target = client.target("https://test.service.com/v1/tickets")
    .queryParam("format", "text")
    .queryParam("username", "username")
    .queryParam("password", "password");
Response response = target.request().accept(MediaType.APPLICATION_FORM_URLENCODED).post(Entity.entity(yourPostDTO,
    MediaType.APPLICATION_JSON));
YourResponseDTO responseDTO = response.readEntity(YourResponseDTO.class);
int status = response.getStatus();
It can also help to copy the POST request from Postman as a cURL command and compare it with your own request. Perhaps Postman adds extra or different headers?
Documentation: https://cxf.apache.org/docs/jax-rs-client-api.html#JAX-RSClientAPI-JAX-RS2.0andCXFspecificAPI
Similar Stackoverflow: Is there a way to configure the ClientBuilder POST request that would enable it to receive both a return code AND a JSON object?
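For comparison, the request the question describes (a POST with Content-Type application/x-www-form-urlencoded that carries username and password) can be sketched with the JDK's own java.net.http classes instead of CXF. This is only an illustration under assumptions: the class name TicketRequest and its helper methods are made up, the URL is the example one from the question, and the question does not say whether the server wants the credentials in the body or in the query string.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class TicketRequest {

    // Encodes key/value pairs as an application/x-www-form-urlencoded body.
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST request for a TGT.
    // Assumption: credentials travel in the form body.
    static HttpRequest buildTicketRequest(String ticketsUrl, String username, String password) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("username", username);
        fields.put("password", password);
        return HttpRequest.newBuilder(URI.create(ticketsUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildTicketRequest(
                "https://test.service.com/v1/tickets?format=text", "user", "pass");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending it would be a call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); if the server answers with plain text, the ticket would then be in response.body().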

How to follow-through on HTTP 303 status code when using HttpClient in Java 11 and later?

When using the java.net.http.HttpClient classes in Java 11 and later, how does one tell the client to follow through an HTTP 303 to get to the redirected page?
Here is an example. Wikipedia provides a REST URL for getting the summary of a random page of their content. That URL redirects to the URL of the randomly-chosen page. When running this code, I see the 303 when calling HttpResponse#toString. But I do not know how to tell the client class to follow along to the new URL.
HttpClient client = HttpClient.newHttpClient();
HttpRequest request =
    HttpRequest
        .newBuilder()
        .uri( URI.create( "https://en.wikipedia.org/api/rest_v1/page/random/summary" ) )
        .build();
try
{
    HttpResponse < String > response = client.send( request , HttpResponse.BodyHandlers.ofString() );
    System.out.println( "response = " + response ); // ⬅️ We can see the `303` status code.
    String body = response.body();
    System.out.println( "body = " + body );
}
catch ( IOException e )
{
    e.printStackTrace();
}
catch ( InterruptedException e )
{
    e.printStackTrace();
}
When run:
response = (GET https://en.wikipedia.org/api/rest_v1/page/random/summary) 303
body =
Problem
You're using HttpClient#newHttpClient(). The documentation of that method states:
Returns a new HttpClient with default settings.
Equivalent to newBuilder().build().
The default settings include: the "GET" request method, a preference of HTTP/2, a redirection policy of NEVER [emphasis added], the default proxy selector, and the default SSL context.
As emphasized, you are creating an HttpClient with a redirection policy of NEVER.
Solution
There are at least two solutions to your problem.
Automatically Follow Redirects
If you want to automatically follow redirects then you need to use HttpClient#newBuilder() (instead of #newHttpClient()) which allows you to configure the to-be-built client. Specifically, you need to call HttpClient.Builder#followRedirects(HttpClient.Redirect) with an appropriate redirect policy before building the client. For example:
HttpClient client =
    HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects
        .build();
The different redirect policies are specified by the HttpClient.Redirect enum:
Defines the automatic redirection policy.
The automatic redirection policy is checked whenever a 3XX response code is received. If redirection does not happen automatically, then the response, containing the 3XX response code, is returned, where it can be handled manually.
There are three constants: ALWAYS, NEVER, and NORMAL. The meaning of the first two is obvious from their names. The last one, NORMAL, behaves just like ALWAYS except it won't redirect from https URLs to http URLs.
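One way to confirm which policy a client ended up with is to read it back through HttpClient#followRedirects(). The sketch below is only an illustration; the class name RedirectPolicyCheck and the helper clientWith are made up for the example.

```java
import java.net.http.HttpClient;

public class RedirectPolicyCheck {

    // Builds a client with the given redirect policy.
    static HttpClient clientWith(HttpClient.Redirect policy) {
        return HttpClient.newBuilder().followRedirects(policy).build();
    }

    public static void main(String[] args) {
        // Default client: the documented default policy applies.
        System.out.println(HttpClient.newHttpClient().followRedirects()); // prints NEVER
        // Configured client: reports the policy passed to the builder.
        System.out.println(clientWith(HttpClient.Redirect.NORMAL).followRedirects()); // prints NORMAL
    }
}
```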
Manually Follow Redirects
As noted in the documentation of HttpClient.Redirect you could instead manually follow a redirect. I'm not well versed in HTTP and how to properly handle all responses so I won't give an example here. But I believe, at a minimum, this requires you:
1. Check the status code of the response.
2. If the code indicates a redirect, grab the new URI from the response headers.
3. If the new URI is relative then resolve it against the request URI.
4. Send a new request.
5. Repeat 1-4 as needed.
Obviously configuring the HttpClient to automatically follow redirects is much easier (and less error-prone), but this approach would give you more control.
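The manual steps listed above could look roughly like the following. This is a minimal sketch and not a complete redirect handler: it only deals with GET requests, treats 301, 302, 303, 307 and 308 as redirects, and stops after a caller-chosen number of hops. The class name ManualRedirects, the method names, and the maxHops parameter are made up for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

public class ManualRedirects {

    // Step 1: status codes that are treated as redirects in this sketch.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Step 3: resolve a possibly relative Location against the request URI.
    static URI resolveLocation(URI requestUri, String location) {
        return requestUri.resolve(location);
    }

    // Steps 1-5 for GET requests only; maxHops guards against redirect loops.
    static HttpResponse<String> getFollowing(HttpClient client, URI start, int maxHops)
            throws IOException, InterruptedException {
        URI current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpRequest request = HttpRequest.newBuilder(current).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Step 2: the new URI is carried in the Location header.
            Optional<String> location = response.headers().firstValue("Location");
            if (!isRedirect(response.statusCode()) || location.isEmpty()) {
                return response; // not a redirect, or nothing to follow
            }
            current = resolveLocation(current, location.get());
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

The client passed in should keep the default NEVER policy, for example getFollowing(HttpClient.newHttpClient(), URI.create("https://en.wikipedia.org/api/rest_v1/page/random/summary"), 5), so that each 3XX response is actually handed back to this loop.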
Below is code in which I call another API from my REST API in Java.
Note that I am using Java 17. Following redirects this way resolves the 303 status code.
@GetMapping(value = "url/api/url")
private String methodName() throws IOException, InterruptedException {
    var url = "api/url/"; // remote api url which you want to call
    System.out.println(url);
    var request = HttpRequest.newBuilder().GET().uri(URI.create(url)).setHeader("access-token-key", "accessTokenValue").build();
    System.out.println(request);
    // NORMAL follows redirects, except from https URLs to http URLs
    var client = HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build();
    System.out.println(client);
    var response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response);
    System.out.println(response.body());
    return response.body();
}

Apache HttpClient custom dynamic header on every request

I am writing a Java client for a protected API service using Apache's HttpClient. Is it possible to add a dynamic header to each request automatically, instead of having to add the header on every HttpGet or HttpPost instance? The header value depends on the request URL and the request method (GET or POST), so I cannot simply add it to the default request headers when building the HttpClient. Thanks
Use a custom request interceptor:
CloseableHttpClient client = CachingHttpClients.custom()
    .addInterceptorLast((HttpRequestInterceptor) (request, context) -> {
        String method = request.getRequestLine().getMethod();
        String requestUri = request.getRequestLine().getUri();
        request.addHeader("x-my-header", doSomethingClever(method, requestUri));
    })
    .build();
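The answer leaves doSomethingClever undefined. Purely as an illustration of what such a per-request value might be, the sketch below computes an HMAC-SHA256 signature over the method and URI using only JDK classes. The class name RequestSigner, the shared secret, and the "METHOD, newline, URI" layout are all assumptions; nothing in the question says the API expects this scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class RequestSigner {

    private final byte[] secret; // hypothetical shared secret

    RequestSigner(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for doSomethingClever(method, requestUri):
    // hex-encoded HMAC-SHA256 over "METHOD\nURI".
    String doSomethingClever(String method, String requestUri) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] digest = mac.doFinal(
                    (method + "\n" + requestUri).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        RequestSigner signer = new RequestSigner("demo-secret");
        System.out.println(signer.doSomethingClever("GET", "/v1/items?page=2"));
    }
}
```

Because the value is derived from the method and URI of each request, it has to be computed inside the interceptor rather than set once as a default header.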

Posting to a form based on what HttpGet returns with Apache's HttpClient

I'm posting data to a website form using Apache's HttpClient class. The form is retrieved using the following lines of code:
HttpGet get = new HttpGet(url);
HttpResponse response = client.execute(get);
The website that I'm retrieving the form from requires authentication to access the form. If the request isn't authenticated, the website redirects the request to a login form page that will subsequently redirect back to the original page on successful authentication.
I want to cleanly detect whether or not the GET request returns the login page or the desired form page so that I can either POST login data or form data. The only way I can think of to do this is by reading from the content InputStream of the entity of the response and parsing each line. But that seems somewhat convoluted. I haven't worked with the Apache HttpComponents api before so I'm not sure if this would be the only and best way to accomplish what I want to accomplish.
EDIT: To clarify question, I'm asking if there is a set way to handle forms with Apache's HttpClient. I somewhat know how to achieve what I'm looking to do, but it looks very ugly and I'm hoping there is an easier and faster way to achieve it. For example, if there was some way to do the following:
HttpGet get = new HttpGet(url);
HttpResponse response = client.execute(get);
if (parseElements(response.getEntity()).hasFormWithId("login")) {
    // post authentication data
} else {
    // post actual form data
}
Because of my inexperience with Apache's HttpClient api, I'm not sure if what I'm looking for in the API is too abstract for the intent of the API.
You can modify the behavior of the HttpClient by setting its parameters:
DefaultHttpClient client = new DefaultHttpClient();
client.getParams().setBooleanParameter(ClientPNames.HANDLE_REDIRECTS, false);
This disables automatic redirect handling, so the redirect to the login page is returned to your code as a 3xx response instead of being followed.
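With automatic redirects off, the decision the question asks about (login page or form page?) can be made from the status code and the Location header, without parsing any HTML. The sketch below is library-agnostic and only shows that decision. The class name LoginRedirectCheck, the method needsLogin, and the "/login" path are assumptions, since the question does not say where the site redirects to.

```java
import java.net.URI;

public class LoginRedirectCheck {

    // True when the response is a redirect whose target path starts with loginPath.
    // status and location would come from the HTTP response (status line and
    // Location header); location may be null when the header is absent.
    static boolean needsLogin(int status, String location, URI requestUri, String loginPath) {
        boolean redirect = status >= 300 && status < 400;
        if (!redirect || location == null) {
            return false;
        }
        // Location may be relative, so resolve it against the request URI first.
        String targetPath = requestUri.resolve(location).getPath();
        return targetPath != null && targetPath.startsWith(loginPath);
    }

    public static void main(String[] args) {
        URI form = URI.create("https://example.com/account/form");
        System.out.println(needsLogin(302, "/login?next=/account/form", form, "/login")); // prints true
        System.out.println(needsLogin(200, null, form, "/login")); // prints false
    }
}
```

With Apache HttpClient 4.x the two inputs would be response.getStatusLine().getStatusCode() and the value of response.getFirstHeader("Location"), which is null when the header is missing.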
See also:
Automatic redirect handling
HTTP Authentication
DefaultHttpClient API
