summaryrefslogtreecommitdiffhomepage
diff options
context:
space:
mode:
authorAlejandro Colomar <alx@nginx.com>2023-03-31 14:08:41 +0200
committerAlejandro Colomar <alx@nginx.com>2023-04-12 11:50:56 +0200
commitfcff55acb61be0b9818075e8fb9d3cc80dd97b31 (patch)
tree25cc758ebe0643b278ee032039bd368625c5609c
parentbfd9a0a30a856aa3ac85d6d722fff41a8645a322 (diff)
downloadunit-fcff55acb61be0b9818075e8fb9d3cc80dd97b31.tar.gz
unit-fcff55acb61be0b9818075e8fb9d3cc80dd97b31.tar.bz2
HTTP: optimizing $request_line.
Don't reconstruct a new string for the $request_line from the parsed method, target, and HTTP version, but rather keep a pointer to the original memory where the request line was received. This will be necessary for implementing URI rewrites, since we want to log the original request line, and not one constructed from the rewritten target. This implementation changes behavior (only for invalid requests) in the following way: Previous behavior was to log as many tokens from the request line as were parsed validly, thus: Request -> access log ; error log "GET / HTTP/1.1" -> "GET / HTTP/1.1" OK ; = "GET / HTTP/1.1" -> "GET / HTTP/1.1" [1] ; = "GET / HTTP/2.1" -> "GET / HTTP/2.1" OK ; = "GET / HTTP/1." -> "GET / HTTP/1." [2] ; "GET / HTTP/1. [null]" "GET / food" -> "GET / food" [2] ; "GET / food [null]" "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] ; = "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] ; = "GET food HTTP/1.1" -> "GET" ; "GET [null] [null]" "OPTIONS * HTTP/1.1" -> "OPTIONS" [3] ; "OPTIONS [null] [null]" "FOOBAR baz HTTP/1.1"-> "FOOBAR" ; "FOOBAR [null] [null]" "FOOBAR / HTTP/1.1" -> "FOOBAR / HTTP/1.1" ; = "get / HTTP/1.1" -> "-" ; " [null] [null]" "" -> "-" ; " [null] [null]" This behavior was rather inconsistent. We have several options to go forward with this patch: - NGINX behavior. Log the entire request line, up to '\r' | '\n', even if it was invalid. This is the most informative alternative. However, RFC-complying requests will probably not send invalid requests. This information would be interesting to users where debugging requests constructed manually via netcat(1) or a similar tool, or maybe for debugging a client, are important. It might be interesting to support this in the future if our users are interested; for now, since this approach requires looping over invalid requests twice, that's an overhead that we better avoid. - Previous Unit behavior This is relatively fast (almost as fast as the next alternative, the one we chose), but the implementation is ugly, in that we need to perform the same operation in many places around the code. If we want performance, probably the next alternative is better; if we want to be informative, then the first one is better (maybe in combination with the third one too). - Chosen behavior Only logging request lines when the request is valid. For any invalid request, or even unsupported ones, the request line will be logged as "-". Thus: Request -> access log [4] "GET / HTTP/1.1" -> "GET / HTTP/1.1" OK "GET / HTTP/1.1" -> "GET / HTTP/1.1" [1] "GET / HTTP/2.1" -> "-" [3] "GET / HTTP/1." -> "-" "GET / food" -> "-" "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] "GET food HTTP/1.1" -> "-" "OPTIONS * HTTP/1.1" -> "-" "FOOBAR baz HTTP/1.1"-> "-" "FOOBAR / HTTP/1.1" -> "FOOBAR / HTTP/1.1" "get / HTTP/1.1" -> "-" "" -> "-" This is less informative than previous behavior, but considering how inconsistent it was, and that RFC-complying agents will probably not send us such requests, we're ready to lose that information in the log. This is of course the fastest and simplest implementation we can get. We've chosen to implement this alternative in this patch. Since we modified the behavior, this patch also changes the affected tests. [1]: Multiple successive spaces as a token delimiter is allowed by the RFC, but it is discouraged, and considered a security risk. It is currently supported by Unit, but we will probably drop support for it in the future. [2]: Unit currently supports spaces in the request-target. This is a violation of the relevant RFC (linked below), and will be fixed in the future, and consider those targets as invalid, returning a 400 (Bad Request), and thus the log lines with the previous inconsistent behavior would be changed. [3]: Not yet supported. [4]: In the error log, regarding the "log_routes" conditional logging of the request line, we only need to log the request line if it was valid. It doesn't make sense to log "" or "-" in case that the request was invalid, since this is only useful for understanding decisions of the router. In this case, the access log is more appropriate, which shows that the request was invalid, and a 400 was returned. When the request line is valid, it is printed in the error log exactly as in the access log. Link: <https://datatracker.ietf.org/doc/html/rfc9112#section-3> Suggested-by: Liam Crilly <liam@nginx.com> Reviewed-by: Zhidao Hong <z.hong@f5.com> Cc: Timo Stark <t.stark@nginx.com> Cc: Andrei Zeliankou <zelenkov@nginx.com> Cc: Andrew Clayton <a.clayton@nginx.com> Cc: Artem Konev <a.konev@f5.com> Signed-off-by: Alejandro Colomar <alx@nginx.com>
Diffstat (limited to '')
-rw-r--r--src/nxt_h1proto.c14
-rw-r--r--src/nxt_http.h1
-rw-r--r--src/nxt_http_parse.c14
-rw-r--r--src/nxt_http_parse.h1
-rw-r--r--src/nxt_http_variables.c31
-rw-r--r--test/test_access_log.py8
6 files changed, 24 insertions, 45 deletions
diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
index 40b10af0..df1f82f9 100644
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -577,6 +577,15 @@ nxt_h1p_conn_request_header_parse(nxt_task_t *task, void *obj, void *data)
*/
h1p->keepalive = (h1p->parser.version.s.minor != '0');
+ r->request_line.start = h1p->parser.method.start;
+ r->request_line.length = h1p->parser.request_line_end
+ - r->request_line.start;
+
+ if (nxt_slow_path(r->log_route)) {
+ nxt_log(task, NXT_LOG_NOTICE, "http request line \"%V\"",
+ &r->request_line);
+ }
+
ret = nxt_h1p_header_process(task, h1p, r);
if (nxt_fast_path(ret == NXT_OK)) {
@@ -654,11 +663,6 @@ nxt_h1p_header_process(nxt_task_t *task, nxt_h1proto_t *h1p,
r->path = &h1p->parser.path;
r->args = &h1p->parser.args;
- if (nxt_slow_path(r->log_route)) {
- nxt_log(task, NXT_LOG_NOTICE, "http request line \"%V %V %V\"",
- r->method, &r->target, &r->version);
- }
-
r->fields = h1p->parser.fields;
ret = nxt_http_fields_process(r->fields, &nxt_h1p_fields_hash, r);
diff --git a/src/nxt_http.h b/src/nxt_http.h
index ffd3f601..d7bbaf02 100644
--- a/src/nxt_http.h
+++ b/src/nxt_http.h
@@ -144,6 +144,7 @@ struct nxt_http_request_s {
nxt_str_t host;
nxt_str_t server_name;
+ nxt_str_t request_line;
nxt_str_t target;
nxt_str_t version;
nxt_str_t *method;
diff --git a/src/nxt_http_parse.c b/src/nxt_http_parse.c
index f39d8f67..439993df 100644
--- a/src/nxt_http_parse.c
+++ b/src/nxt_http_parse.c
@@ -417,23 +417,25 @@ space_after_target:
{
rp->version.ui64 = ver.ui64;
- if (nxt_fast_path(p[9] == '\r')) {
- p += 10;
+ p += 9;
+ if (nxt_fast_path(*p == '\r')) {
- if (nxt_slow_path(p == end)) {
+ if (nxt_slow_path(p + 1 == end)) {
return NXT_AGAIN;
}
- if (nxt_slow_path(*p != '\n')) {
+ if (nxt_slow_path(p[1] != '\n')) {
return NXT_HTTP_PARSE_INVALID;
}
- *pos = p + 1;
+ *pos = p + 2;
} else {
- *pos = p + 10;
+ *pos = p + 1;
}
+ rp->request_line_end = p;
+
if (rp->complex_target != 0
#if 0
|| rp->quoted_target != 0
diff --git a/src/nxt_http_parse.h b/src/nxt_http_parse.h
index 2b714464..88b10675 100644
--- a/src/nxt_http_parse.h
+++ b/src/nxt_http_parse.h
@@ -41,6 +41,7 @@ struct nxt_http_request_parse_s {
u_char *target_start;
u_char *target_end;
+ u_char *request_line_end;
nxt_str_t path;
nxt_str_t args;
diff --git a/src/nxt_http_variables.c b/src/nxt_http_variables.c
index fa0244db..b73d9151 100644
--- a/src/nxt_http_variables.c
+++ b/src/nxt_http_variables.c
@@ -273,40 +273,11 @@ static nxt_int_t
nxt_http_var_request_line(nxt_task_t *task, nxt_str_t *str, void *ctx,
uint16_t field)
{
- size_t length;
- u_char *p, *start;
nxt_http_request_t *r;
r = ctx;
- length = r->method->length + 1 + r->target.length + 1 + r->version.length;
-
- start = nxt_mp_nget(r->mem_pool, length);
- if (nxt_slow_path(start == NULL)) {
- return NXT_ERROR;
- }
-
- p = start;
-
- if (r->method->length != 0) {
- p = nxt_cpymem(p, r->method->start, r->method->length);
-
- if (r->target.length != 0) {
- *p++ = ' ';
- p = nxt_cpymem(p, r->target.start, r->target.length);
-
- if (r->version.length != 0) {
- *p++ = ' ';
- p = nxt_cpymem(p, r->version.start, r->version.length);
- }
- }
-
- } else {
- *p++ = '-';
- }
-
- str->start = start;
- str->length = p - start;
+ *str = r->request_line;
return NXT_OK;
}
diff --git a/test/test_access_log.py b/test/test_access_log.py
index ffaae0b1..c29638a3 100644
--- a/test/test_access_log.py
+++ b/test/test_access_log.py
@@ -174,7 +174,7 @@ Connection: close
time.sleep(1)
assert (
- self.wait_for_record(r'"GE" 400 0 "-" "-"') is not None
+ self.wait_for_record(r'"-" 400 0 "-" "-"') is not None
), 'partial'
def test_access_log_partial_2(self):
@@ -185,7 +185,7 @@ Connection: close
self.http(b"""GET /\n""", raw=True)
assert (
- self.wait_for_record(r'"GET /" 400 \d+ "-" "-"') is not None
+ self.wait_for_record(r'"-" 400 \d+ "-" "-"') is not None
), 'partial 2'
def test_access_log_partial_3(self):
@@ -198,7 +198,7 @@ Connection: close
time.sleep(1)
assert (
- self.wait_for_record(r'"GET /" 400 0 "-" "-"') is not None
+ self.wait_for_record(r'"-" 400 0 "-" "-"') is not None
), 'partial 3'
def test_access_log_partial_4(self):
@@ -211,7 +211,7 @@ Connection: close
time.sleep(1)
assert (
- self.wait_for_record(r'"GET / HTTP/1.1" 400 0 "-" "-"') is not None
+ self.wait_for_record(r'"-" 400 0 "-" "-"') is not None
), 'partial 4'
@pytest.mark.skip('not yet')