Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ER] Line number for 'stream did not contain valid UTF-8' errors #76869

Closed
leonardo-m opened this issue Sep 18, 2020 · 6 comments · Fixed by #135557
Closed

[ER] Line number for 'stream did not contain valid UTF-8' errors #76869

leonardo-m opened this issue Sep 18, 2020 · 6 comments · Fixed by #135557
Labels
A-diagnostics Area: Messages for errors, warnings, and lints D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@leonardo-m
Copy link

If by mistake I write code as this:

fn main() {
    println!("Hello ’world");
}

All I get from rustc is:

error: couldn't read ...\test.rs: stream did not contain valid UTF-8

But I'd like to receive a line number where such invalidity happens.

Using: rustc 1.48.0-nightly f3c923a13 2020-09-17 x86_64-pc-windows-gnu
@tesuji
Copy link
Contributor

tesuji commented Sep 18, 2020

If rustc read source file with fs::read_to_string, that's maybe hard to report which line number contains invalid UTF-8.
Just my guess.

@jyn514 jyn514 added A-diagnostics Area: Messages for errors, warnings, and lints D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 18, 2020
@leonardo-m
Copy link
Author

To spot the problem I've used a Python script like:

for i, line in enumerate(open("mycode.rs")):
    if any(ord(c) >= 127 for c in line):
        print(i)

@SagnikHajra
Copy link

Also faced the same issue. I was using VS code.
>>> for i, line in enumerate(open("hello.rs")): ... print(line) ... ÿþfn main() {

No idea how did I end up with "ÿþ". Created a new file and it worked.

@Kevpopper
Copy link

I got same issue but i just changed .rs to .txt

@ms0503
Copy link

ms0503 commented Dec 24, 2024

"ÿþ" is U+00FF and U+00FE in Unicode.
These are byte order mark (BOM) in little-endian UTF-16.

If you write your source code in UTF-8 without BOM, it should work.

estebank added a commit to estebank/rust that referenced this issue Jan 15, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this issue Jan 15, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
@estebank
Copy link
Contributor

#135557 will provide more information for these errors. For the case where the first byte of the "first" (entry) file is incorrect, we don't get a proper span to point at, so we customize the output to include the byte position in the message itself. It's not ideal, but it is better than what we had. I also encountered multiple places in our test suite that expect rust files to be valid utf-8 or crash 😄

estebank added a commit to estebank/rust that referenced this issue Jan 18, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this issue Jan 18, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this issue Jan 22, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

CC rust-lang#76869, rust-lang#135557.
jieyouxu added a commit to jieyouxu/rust that referenced this issue Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
@bors bors closed this as completed in 57dd42d Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants