Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use own copy of PercentCodec for URI path encoding #1109

Merged
merged 6 commits into from
Jul 31, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ This section is for maintaining a changelog for all breaking changes for the cli
### Dependencies

### Changed
- Changed URL path encoding to own implementation adapted from Apache HTTP Client 5's ([#1109](https://github.com/opensearch-project/opensearch-java/pull/1109))

### Deprecated

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,6 @@ public static RuntimeException noPathTemplateFound(String what) {
}

public static void pathEncode(String src, StringBuilder dest) {
dest.append(PathEncoder.encode(src));
PathEncoder.encode(dest, src);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -8,59 +8,37 @@

package org.opensearch.client.util;

/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;

public class PathEncoder {
private final static String HTTP_CLIENT4_UTILS_CLASS = "org.apache.http.client.utils.URLEncodedUtils";
private final static String HTTP_CLIENT5_UTILS_CLASS = "org.apache.hc.core5.net.URLEncodedUtils";
private final static MethodHandle FORMAT_SEGMENTS_MH;

static {
Class<?> clazz = null;
try {
// Try Apache HttpClient5 first since this is a default one
clazz = Class.forName(HTTP_CLIENT5_UTILS_CLASS);
} catch (final ClassNotFoundException ex) {
try {
// Fallback to Apache HttpClient4
clazz = Class.forName(HTTP_CLIENT4_UTILS_CLASS);
} catch (final ClassNotFoundException ex1) {
clazz = null;
}
/**
* Percent encoding codec that matches Apache HTTP Client 4's path segment encoding.
*/
@Deprecated
public static final PercentCodec APACHE_HTTP_CLIENT_4_EQUIV_CODEC = PercentCodec.RFC3986_PATHSAFE;
/**
* Percent encoding codec that matches Apache HTTP Client 5's path segment encoding.
*/
public static final PercentCodec APACHE_HTTP_CLIENT_5_EQUIV_CODEC = PercentCodec.RFC3986_UNRESERVED;

public static final PercentCodec DEFAULT_CODEC = APACHE_HTTP_CLIENT_5_EQUIV_CODEC;

private static PercentCodec codec;
Copy link
Collaborator

@reta reta Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we leave an option to change the value from anywhere:

Suggested change
private static PercentCodec codec;
private static volatile PercentCodec codec = DEFAULT_CODEC;


public static PercentCodec getCodec() {
if (codec == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just initialize the codec with DEFAULT_CODEC value

codec = DEFAULT_CODEC;
}
return codec;
}

if (clazz == null) {
throw new IllegalStateException(
"Either '" + HTTP_CLIENT5_UTILS_CLASS + "' or '" + HTTP_CLIENT4_UTILS_CLASS + "' is required by not found on classpath"
);
}
public static void setCodec(PercentCodec codec) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it package private for now

Suggested change
public static void setCodec(PercentCodec codec) {
static void setCodec(PercentCodec codec) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to let users call it for backwards compat?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to let users call it for backwards compat?

May be not directly? As part of client transport builder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation. It's perfectly reasonable that in my app I construct two different clients using different transports and could unwittingly override the setting. Which ideally it would be completely tied to the transport instance, but that's not backportable to 2.x.

Copy link
Collaborator

@reta reta Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation.

Thanks @Xtansia , I understand that. Consider the alternative - asking users to call this method directly. Personally, I don't think we are giving the right tool out - I am having hard time even to explain in documentation what are the consequences of letting to change the path encoder arbitrarily, from anywhere, anytime.

It's perfectly reasonable that in my app I construct two different clients using different transports and could unwittingly override the setting.

We have all the instruments to prevent that from happening: fe, use set-once semantics.

Copy link
Member

@dblock dblock Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation.

It sounds like we all agree that the ultimate API to recommend users is a builder like so.

builder = ApacheHttpClient5TransportBuilder.builder(hosts)
  .encoder(PercentCodec.RFC3986_UNRESERVED)
  .decoder(PercentCodec.RFC3986_UNRESERVED)
  .build()

If we implement this then we don't need a global public setting, do we? It's a convenience that causes a lot of side-effects, maybe we better force users to explicitly set it for every client they create?

If you feel strongly about it, I am fine with both PathEncoder.default(Encoding.HTTP_CLIENT_V5_EQUIV) global default and an env setting, but I don't know why we'd want to add yet another level of indirection or even the global default out of "convenience" for users which will end up actually having side effects in their entire codebase.

I do think that just having a setting org.opensearch.path.encoding is not sufficient, I'd want a programmatic way of calling PathEncoder.default(...) - users reporting the bug they are experiencing originally probably want to add a line of code to their apps, not a setting (they might not have any).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement this then we don't need a global public setting, do we? It's a convenience that causes a lot of side-effects, maybe we better force users to explicitly set it for every client they create?

That would be ideal but the problem comes from usage of the path encoder: it is referenced statically from (literally) every single request class: SimpleEndpoint.pathEncode(...);. That's the showstopper that @Xtansia tries to solve with static PathEncoder configuration .

So if we do follow the builder pattern

builder = ApacheHttpClient5TransportBuilder.builder(hosts)
  .pathEncoder(PercentCodec.RFC3986_UNRESERVED)
  .build()

The pathEncoder would set the static path encoder behind the scene (and I think this is 100% fine). I hardly imagine the case when users would want clients with same JVM / app but different path encoders.

PS: I don't think we need decoder for path but that's just a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand. I am strongly opposed to an instance .pathEncoder that sets a global default. That's definitely not expected! We should either fix the static calls or I'm fine with a global setting.

Copy link
Collaborator

@reta reta Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely not expected! We should either fix the static calls or I'm fine with a global setting.

My mental model breaks for static call (sorry), but global setting (system property) is looking acceptable, thanks @Xtansia

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm good with this.

PathEncoder.codec = codec;
}

try {
FORMAT_SEGMENTS_MH = MethodHandles.lookup()
.findStatic(clazz, "formatSegments", MethodType.methodType(String.class, Iterable.class, Charset.class));
} catch (final NoSuchMethodException | IllegalAccessException ex) {
throw new IllegalStateException("Unable to find 'formatSegments' method in " + clazz + " class");
}
public static String encode(String pathSegment) {
return getCodec().encode(pathSegment);
}

public static String encode(String uri) {
try {
return ((String) FORMAT_SEGMENTS_MH.invoke(Collections.singletonList(uri), StandardCharsets.UTF_8)).substring(1);
} catch (final Throwable ex) {
throw new RuntimeException("Unable to encode URI: " + uri, ex);
}
public static void encode(StringBuilder dest, CharSequence pathSegment) {
getCodec().encode(dest, pathSegment);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.client.util;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
* Percent-encoding.
* <p>
* Adapted from Apache HttpComponents HttpCore v5's <a href="https://github.com/apache/httpcomponents-core/blob/e009a923eefe79cf3593efbb0c18a3525ae63669/httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java">PercentCodec.java</a>
* </p>
*/
public class PercentCodec {
private static class Chars {
private final BitSet set = new BitSet(256);

public void add(char... chars) {
for (char c : chars) {
set.set(c);
}
}

public void addRange(char start, char end) {
set.set(start, end + 1);
}

public void add(Chars set) {
this.set.or(set.set);
}

public boolean contains(int c) {
return set.get(c);
}
}

private static final Chars RFC3986_GEN_DELIMS_CHARS = new Chars() {
{
add(':', '/', '?', '#', '[', ']', '@');
}
};
private static final Chars RFC3986_SUB_DELIMS_CHARS = new Chars() {
{
add('!', '$', '&', '\'', '(', ')', '*', '+', ',', ';', '=');
}
};
private static final Chars RFC3986_UNRESERVED_CHARS = new Chars() {
{
addRange('a', 'z');
addRange('A', 'Z');
addRange('0', '9');
add('-', '.', '_', '~');
}
};
private static final Chars RFC3986_PATH_NO_COLON_CHARS = new Chars() {
{
add(RFC3986_UNRESERVED_CHARS);
add(RFC3986_SUB_DELIMS_CHARS);
add('@');
}
};
private static final Chars RFC3986_PATH_CHARS = new Chars() {
{
add(RFC3986_PATH_NO_COLON_CHARS);
add(':');
}
};
private static final Chars RFC3986_URIC_CHARS = new Chars() {
{
add(RFC3986_SUB_DELIMS_CHARS);
add(RFC3986_UNRESERVED_CHARS);
}
};

private static final Chars RFC5987_UNRESERVED_CHARS = new Chars() {
{
addRange('a', 'z');
addRange('A', 'Z');
addRange('0', '9');
// Additional characters as per RFC 5987 attr-char
add('!', '#', '$', '&', '+', '-', '.', '^', '_', '`', '|', '~');
}
};

private static final int RADIX = 16;

private static void encode(
final StringBuilder buf,
final CharSequence content,
final Charset charset,
final Chars safeChars,
final boolean blankAsPlus
) {
if (content == null) {
return;
}
final CharBuffer cb = CharBuffer.wrap(content);
final ByteBuffer bb = (charset != null ? charset : StandardCharsets.UTF_8).encode(cb);
while (bb.hasRemaining()) {
final int b = bb.get() & 0xff;
if (safeChars.contains(b)) {
buf.append((char) b);
} else if (blankAsPlus && b == ' ') {
buf.append("+");
} else {
buf.append("%");
final char hex1 = Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, RADIX));
final char hex2 = Character.toUpperCase(Character.forDigit(b & 0xF, RADIX));
buf.append(hex1);
buf.append(hex2);
}
}
}

private static String decode(final CharSequence content, final Charset charset, final boolean plusAsBlank) {
if (content == null) {
return null;
}
final ByteBuffer bb = ByteBuffer.allocate(content.length());
final CharBuffer cb = CharBuffer.wrap(content);
while (cb.hasRemaining()) {
final char c = cb.get();
if (c == '%' && cb.remaining() >= 2) {
final char uc = cb.get();
final char lc = cb.get();
final int u = Character.digit(uc, RADIX);
final int l = Character.digit(lc, RADIX);
if (u != -1 && l != -1) {
bb.put((byte) ((u << 4) + l));
} else {
bb.put((byte) '%');
bb.put((byte) uc);
bb.put((byte) lc);
}
} else if (plusAsBlank && c == '+') {
bb.put((byte) ' ');
} else {
bb.put((byte) c);
}
}
bb.flip();
return (charset != null ? charset : StandardCharsets.UTF_8).decode(bb).toString();
}

public static final PercentCodec RFC3986_UNRESERVED = new PercentCodec(RFC3986_UNRESERVED_CHARS);
public static final PercentCodec RFC3986_PATHSAFE = new PercentCodec(RFC3986_PATH_CHARS);
public static final PercentCodec RFC5987_UNRESERVED = new PercentCodec(RFC5987_UNRESERVED_CHARS);

private final Chars unreserved;

private PercentCodec(final Chars unreserved) {
this.unreserved = unreserved;
}

public void encode(final StringBuilder buf, final CharSequence content) {
encode(buf, content, StandardCharsets.UTF_8, unreserved, false);
}

public String encode(final CharSequence content) {
if (content == null) {
return null;
}
final StringBuilder buf = new StringBuilder();
encode(buf, content, StandardCharsets.UTF_8, unreserved, false);
return buf.toString();
}

public String decode(final CharSequence content) {
return decode(content, StandardCharsets.UTF_8, false);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -57,19 +57,10 @@ public void testArrayPathParameter() {
assertEquals("/a/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));

req = RefreshRequest.of(b -> b.index("a", "b"));
if (isHttpClient5Present()) {
assertEquals("/a%2Cb/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));

} else {
assertEquals("/a,b/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));
}
assertEquals("/a%2Cb/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));

req = RefreshRequest.of(b -> b.index("a", "b", "c"));
if (isHttpClient5Present()) {
assertEquals("/a%2Cb%2Cc/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));
} else {
assertEquals("/a,b,c/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));
}
assertEquals("/a%2Cb%2Cc/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));
}

@Test
Expand All @@ -80,11 +71,7 @@ public void testPathEncoding() {
assertEquals("/a%2Fb/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));

req = RefreshRequest.of(b -> b.index("a/b", "c/d"));
if (isHttpClient5Present()) {
assertEquals("/a%2Fb%2Cc%2Fd/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));
} else {
assertEquals("/a%2Fb,c%2Fd/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));
}
assertEquals("/a%2Fb%2Cc%2Fd/_refresh", RefreshRequest._ENDPOINT.requestUrl(req));

}

Expand All @@ -103,13 +90,4 @@ public void testArrayQueryParameter() {
req = RefreshRequest.of(b -> b.expandWildcards(ExpandWildcard.All, ExpandWildcard.Closed));
assertEquals("all,closed", RefreshRequest._ENDPOINT.queryParameters(req).get("expand_wildcards"));
}

private static boolean isHttpClient5Present() {
try {
Class.forName("org.apache.hc.core5.net.URLEncodedUtils");
return true;
} catch (ClassNotFoundException e) {
return false;
}
}
}

This file was deleted.

Loading
Loading