-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use own copy of PercentCodec for URI path encoding #1109
Changes from 3 commits
b5901c9
7fcb24d
2820ff8
75f9059
8430323
f436a3d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -8,59 +8,37 @@ | |||||
|
||||||
package org.opensearch.client.util; | ||||||
|
||||||
/* | ||||||
* SPDX-License-Identifier: Apache-2.0 | ||||||
* | ||||||
* The OpenSearch Contributors require contributions made to | ||||||
* this file be licensed under the Apache-2.0 license or a | ||||||
* compatible open source license. | ||||||
*/ | ||||||
|
||||||
import java.lang.invoke.MethodHandle; | ||||||
import java.lang.invoke.MethodHandles; | ||||||
import java.lang.invoke.MethodType; | ||||||
import java.nio.charset.Charset; | ||||||
import java.nio.charset.StandardCharsets; | ||||||
import java.util.Collections; | ||||||
|
||||||
public class PathEncoder { | ||||||
private final static String HTTP_CLIENT4_UTILS_CLASS = "org.apache.http.client.utils.URLEncodedUtils"; | ||||||
private final static String HTTP_CLIENT5_UTILS_CLASS = "org.apache.hc.core5.net.URLEncodedUtils"; | ||||||
private final static MethodHandle FORMAT_SEGMENTS_MH; | ||||||
|
||||||
static { | ||||||
Class<?> clazz = null; | ||||||
try { | ||||||
// Try Apache HttpClient5 first since this is a default one | ||||||
clazz = Class.forName(HTTP_CLIENT5_UTILS_CLASS); | ||||||
} catch (final ClassNotFoundException ex) { | ||||||
try { | ||||||
// Fallback to Apache HttpClient4 | ||||||
clazz = Class.forName(HTTP_CLIENT4_UTILS_CLASS); | ||||||
} catch (final ClassNotFoundException ex1) { | ||||||
clazz = null; | ||||||
} | ||||||
/** | ||||||
* Percent encoding codec that matches Apache HTTP Client 4's path segment encoding. | ||||||
*/ | ||||||
@Deprecated | ||||||
public static final PercentCodec APACHE_HTTP_CLIENT_4_EQUIV_CODEC = PercentCodec.RFC3986_PATHSAFE; | ||||||
/** | ||||||
* Percent encoding codec that matches Apache HTTP Client 5's path segment encoding. | ||||||
*/ | ||||||
public static final PercentCodec APACHE_HTTP_CLIENT_5_EQUIV_CODEC = PercentCodec.RFC3986_UNRESERVED; | ||||||
|
||||||
public static final PercentCodec DEFAULT_CODEC = APACHE_HTTP_CLIENT_5_EQUIV_CODEC; | ||||||
|
||||||
private static PercentCodec codec; | ||||||
|
||||||
public static PercentCodec getCodec() { | ||||||
if (codec == null) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could just initialize the codec with |
||||||
codec = DEFAULT_CODEC; | ||||||
} | ||||||
return codec; | ||||||
} | ||||||
|
||||||
if (clazz == null) { | ||||||
throw new IllegalStateException( | ||||||
"Either '" + HTTP_CLIENT5_UTILS_CLASS + "' or '" + HTTP_CLIENT4_UTILS_CLASS + "' is required by not found on classpath" | ||||||
); | ||||||
} | ||||||
public static void setCodec(PercentCodec codec) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please make it package private for now
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't we want to let users call it for backwards compat? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
May be not directly? As part of client transport builder? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @reta The problem with calling it as part of the transport builder is you're then hiding the static/global nature of it behind an instance creation. It's perfectly reasonable that in my app I construct two different clients using different transports and could unwittingly override the setting. Which ideally it would be completely tied to the transport instance, but that's not backportable to 2.x. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Thanks @Xtansia , I understand that. Consider the alternative - asking users to call this method directly. Personally, I don't think we are giving the right tool out - I am having hard time even to explain in documentation what are the consequences of letting to change the path encoder arbitrarily, from anywhere, anytime.
We have all the instruments to prevent that from happening: fe, use set-once semantics. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It sounds like we all agree that the ultimate API to recommend users is a builder like so.
If we implement this then we don't need a global public setting, do we? It's a convenience that causes a lot of side-effects, maybe we better force users to explicitly set it for every client they create? If you feel strongly about it, I am fine with both I do think that just having a setting There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That would be ideal but the problem comes from usage of the path encoder: it is referenced statically from (literally) every single request class: So if we do follow the builder pattern
The PS: I don't think we need decoder for path but that's just a comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I understand. I am strongly opposed to an instance There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
My mental model breaks for static call (sorry), but global setting (system property) is looking acceptable, thanks @Xtansia There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I'm good with this. |
||||||
PathEncoder.codec = codec; | ||||||
} | ||||||
|
||||||
try { | ||||||
FORMAT_SEGMENTS_MH = MethodHandles.lookup() | ||||||
.findStatic(clazz, "formatSegments", MethodType.methodType(String.class, Iterable.class, Charset.class)); | ||||||
} catch (final NoSuchMethodException | IllegalAccessException ex) { | ||||||
throw new IllegalStateException("Unable to find 'formatSegments' method in " + clazz + " class"); | ||||||
} | ||||||
public static String encode(String pathSegment) { | ||||||
return getCodec().encode(pathSegment); | ||||||
} | ||||||
|
||||||
public static String encode(String uri) { | ||||||
try { | ||||||
return ((String) FORMAT_SEGMENTS_MH.invoke(Collections.singletonList(uri), StandardCharsets.UTF_8)).substring(1); | ||||||
} catch (final Throwable ex) { | ||||||
throw new RuntimeException("Unable to encode URI: " + uri, ex); | ||||||
} | ||||||
public static void encode(StringBuilder dest, CharSequence pathSegment) { | ||||||
getCodec().encode(dest, pathSegment); | ||||||
} | ||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
/* | ||
* SPDX-License-Identifier: Apache-2.0 | ||
* | ||
* The OpenSearch Contributors require contributions made to | ||
* this file be licensed under the Apache-2.0 license or a | ||
* compatible open source license. | ||
*/ | ||
|
||
package org.opensearch.client.util; | ||
|
||
import java.nio.ByteBuffer; | ||
import java.nio.CharBuffer; | ||
import java.nio.charset.Charset; | ||
import java.nio.charset.StandardCharsets; | ||
import java.util.BitSet; | ||
|
||
/** | ||
* Percent-encoding. | ||
* <p> | ||
* Adapted from Apache HttpComponents HttpCore v5's <a href="https://github.com/apache/httpcomponents-core/blob/e009a923eefe79cf3593efbb0c18a3525ae63669/httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java">PercentCodec.java</a> | ||
* </p> | ||
*/ | ||
public class PercentCodec { | ||
private static class Chars { | ||
private final BitSet set = new BitSet(256); | ||
|
||
public void add(char... chars) { | ||
for (char c : chars) { | ||
set.set(c); | ||
} | ||
} | ||
|
||
public void addRange(char start, char end) { | ||
set.set(start, end + 1); | ||
} | ||
|
||
public void add(Chars set) { | ||
this.set.or(set.set); | ||
} | ||
|
||
public boolean contains(int c) { | ||
return set.get(c); | ||
} | ||
} | ||
|
||
private static final Chars RFC3986_GEN_DELIMS_CHARS = new Chars() { | ||
{ | ||
add(':', '/', '?', '#', '[', ']', '@'); | ||
} | ||
}; | ||
private static final Chars RFC3986_SUB_DELIMS_CHARS = new Chars() { | ||
{ | ||
add('!', '$', '&', '\'', '(', ')', '*', '+', ',', ';', '='); | ||
} | ||
}; | ||
private static final Chars RFC3986_UNRESERVED_CHARS = new Chars() { | ||
{ | ||
addRange('a', 'z'); | ||
addRange('A', 'Z'); | ||
addRange('0', '9'); | ||
add('-', '.', '_', '~'); | ||
} | ||
}; | ||
private static final Chars RFC3986_PATH_NO_COLON_CHARS = new Chars() { | ||
{ | ||
add(RFC3986_UNRESERVED_CHARS); | ||
add(RFC3986_SUB_DELIMS_CHARS); | ||
add('@'); | ||
} | ||
}; | ||
private static final Chars RFC3986_PATH_CHARS = new Chars() { | ||
{ | ||
add(RFC3986_PATH_NO_COLON_CHARS); | ||
add(':'); | ||
} | ||
}; | ||
private static final Chars RFC3986_URIC_CHARS = new Chars() { | ||
{ | ||
add(RFC3986_SUB_DELIMS_CHARS); | ||
add(RFC3986_UNRESERVED_CHARS); | ||
} | ||
}; | ||
|
||
private static final Chars RFC5987_UNRESERVED_CHARS = new Chars() { | ||
{ | ||
addRange('a', 'z'); | ||
addRange('A', 'Z'); | ||
addRange('0', '9'); | ||
// Additional characters as per RFC 5987 attr-char | ||
add('!', '#', '$', '&', '+', '-', '.', '^', '_', '`', '|', '~'); | ||
} | ||
}; | ||
|
||
private static final int RADIX = 16; | ||
|
||
private static void encode( | ||
final StringBuilder buf, | ||
final CharSequence content, | ||
final Charset charset, | ||
final Chars safeChars, | ||
final boolean blankAsPlus | ||
) { | ||
if (content == null) { | ||
return; | ||
} | ||
final CharBuffer cb = CharBuffer.wrap(content); | ||
final ByteBuffer bb = (charset != null ? charset : StandardCharsets.UTF_8).encode(cb); | ||
while (bb.hasRemaining()) { | ||
final int b = bb.get() & 0xff; | ||
if (safeChars.contains(b)) { | ||
buf.append((char) b); | ||
} else if (blankAsPlus && b == ' ') { | ||
buf.append("+"); | ||
} else { | ||
buf.append("%"); | ||
final char hex1 = Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, RADIX)); | ||
final char hex2 = Character.toUpperCase(Character.forDigit(b & 0xF, RADIX)); | ||
buf.append(hex1); | ||
buf.append(hex2); | ||
} | ||
} | ||
} | ||
|
||
private static String decode(final CharSequence content, final Charset charset, final boolean plusAsBlank) { | ||
if (content == null) { | ||
return null; | ||
} | ||
final ByteBuffer bb = ByteBuffer.allocate(content.length()); | ||
final CharBuffer cb = CharBuffer.wrap(content); | ||
while (cb.hasRemaining()) { | ||
final char c = cb.get(); | ||
if (c == '%' && cb.remaining() >= 2) { | ||
final char uc = cb.get(); | ||
final char lc = cb.get(); | ||
final int u = Character.digit(uc, RADIX); | ||
final int l = Character.digit(lc, RADIX); | ||
if (u != -1 && l != -1) { | ||
bb.put((byte) ((u << 4) + l)); | ||
} else { | ||
bb.put((byte) '%'); | ||
bb.put((byte) uc); | ||
bb.put((byte) lc); | ||
} | ||
} else if (plusAsBlank && c == '+') { | ||
bb.put((byte) ' '); | ||
} else { | ||
bb.put((byte) c); | ||
} | ||
} | ||
bb.flip(); | ||
return (charset != null ? charset : StandardCharsets.UTF_8).decode(bb).toString(); | ||
} | ||
|
||
public static final PercentCodec RFC3986_UNRESERVED = new PercentCodec(RFC3986_UNRESERVED_CHARS); | ||
public static final PercentCodec RFC3986_PATHSAFE = new PercentCodec(RFC3986_PATH_CHARS); | ||
public static final PercentCodec RFC5987_UNRESERVED = new PercentCodec(RFC5987_UNRESERVED_CHARS); | ||
|
||
private final Chars unreserved; | ||
|
||
private PercentCodec(final Chars unreserved) { | ||
this.unreserved = unreserved; | ||
} | ||
|
||
public void encode(final StringBuilder buf, final CharSequence content) { | ||
encode(buf, content, StandardCharsets.UTF_8, unreserved, false); | ||
} | ||
|
||
public String encode(final CharSequence content) { | ||
if (content == null) { | ||
return null; | ||
} | ||
final StringBuilder buf = new StringBuilder(); | ||
encode(buf, content, StandardCharsets.UTF_8, unreserved, false); | ||
return buf.toString(); | ||
} | ||
|
||
public String decode(final CharSequence content) { | ||
return decode(content, StandardCharsets.UTF_8, false); | ||
} | ||
} |
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we leave an option to change the value from anywhere: