Skip to content

Breaking change with string.IndexOf(string) from .NET Core 3.0 -> .NET 5.0 #43736

@jbogard

Description

@jbogard
Contributor

Description

I'm extending a package to support .NET 5.0 and ran into a breaking change. Given the console application:

using System;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var actual = "Detail of supported commands\n============\n## Documentation produced for DelegateDecompiler, version 0.28.0 on Thursday, 22 October 2020 16:03\n\r\nThis file documents what linq commands **DelegateDecompiler** supports when\r\nworking with [Entity Framework Core](https://docs.microsoft.com/en-us/ef/core/) (EF).\r\nEF has one of the best implementations for converting Linq `IQueryable<>` commands into database\r\naccess commands, in EF's case T-SQL. Therefore it is a good candidate for using in our tests.\r\n\r\nThis documentation was produced by compaired direct EF Linq queries against the same query implemented\r\nas a DelegateDecompiler's `Computed` properties. This produces a Supported/Not Supported flag\r\non each command type tested. Tests are groups and ordered to try and make finding things\r\neasier.\r\n\r\nSo, if you want to use DelegateDecompiler and are not sure whether the linq command\r\nyou want to use will work then clone this project and write your own tests.\r\n(See [How to add a test](HowToAddMoreTests.md) documentation on how to do this). \r\nIf there is a problem then please fork the repository and add your own tests. \r\nThat will make it much easier to diagnose your issue.\r\n\r\n*Note: The test suite has only recently been set up and has only a handful of tests at the moment.\r\nMore will appear as we move forward.*\r\n\r\n\r\n### Group: Unit Test Group\n#### [My Unit Test1](../TestGroup01UnitTestGroup/Test01MyUnitTest1):\n- Supported\n  * Good1 (line 1)\n  * Good2 (line 2)\n\r\n#### [My Unit Test2](../TestGroup01UnitTestGroup/Test01MyUnitTest2):\n- Supported\n  * Good1 (line 1)\n  * Good2 (line 2)\n\r\n\r\n\nThe End\n";

            var expected = "\n#### [My Unit Test2](";

            Console.WriteLine($"actual.Contains(expected): {actual.Contains(expected)}");
            Console.WriteLine($"actual.IndexOf(expected): {actual.IndexOf(expected)}");
        }
    }
}

I get different results based on the runtime from .NET Core 3.0 -> .NET 5.0:

.NET Core 3.0:

actual.Contains(expected): True
actual.IndexOf(expected): 1475

.NET 5.0:

actual.Contains(expected): True
actual.IndexOf(expected): -1

Configuration

Windows 10 Pro Build 19041 x64
.NET Core 3.1.9
.NET 5.0.0-rc.2.20475.5

Regression?

Yes, it worked through .NET Core 3.1.9

Activity

danmoseley

danmoseley commented on Oct 22, 2020

@danmoseley
Member
added this to the 5.0.0 milestone on Oct 22, 2020
added
questionAnswer questions and provide assistance, not an issue with source code or documentation.
and removed
untriagedNew issue has not been triaged by the area owner
on Oct 22, 2020
ghost
tarekgh

tarekgh commented on Oct 22, 2020

@tarekgh
Member

This is by design as in .NET 5.0 we have switched using ICU instead of NLS. You can look at https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu for more details.

You have the option to use the config switch System.Globalization.UseNls to go back to old behavior but we don't recommend doing that as ICU is more correct and moving forward using ICU will give consistency across OS's.

tarekgh

tarekgh commented on Oct 22, 2020

@tarekgh
Member

forgot to say, if you want IndexOf behave as Contains, you should use Ordinal comparisons at that time.

actual.IndexOf(expected, StringComparison.Ordinal)
safern

safern commented on Oct 22, 2020

@safern
Member

You have the option to use the config switch System.Globalization.UseNls to go back to old behavior but we don't recommend doing that as ICU is more correct and moving forward using ICU will give consistency across OS's.

Yeah, if you run this code in Unix targeting netcoreapp3.1 you will see this same behavior:

 santifdezm  DESKTOP-1J7TFMI  ~  experimental  indexof  $   /home/santifdezm/experimental/indexof/bin/Debug/netcoreapp3.1/linux-x64/publish/indexof
actual.Contains(expected): True
actual.IndexOf(expected): -1

and as @tarekgh with Ordinal comparison it returns the expected result.

 santifdezm  DESKTOP-1J7TFMI  ~  experimental  indexof  $  /home/santifdezm/experimental/indexof/bin/Debug/net5.0/linux-x64/publish/indexof
actual.Contains(expected): True
actual.IndexOf(expected): 1475
ericstj

ericstj commented on Oct 22, 2020

@ericstj
Member

I think this is failing because the mix of \r\n and \n in the source string. If I replace all instances of \r\n with \n it works. Same is true if I make everything \r\n. It's just the mix that is resulting in different results from ICU's comparison.

GrabYourPitchforks

GrabYourPitchforks commented on Oct 22, 2020

@GrabYourPitchforks
Member

Issue as reported on Twitter was that within the 5.0 runtime, repeated calls to string.IndexOf with the same inputs was giving different results on each call.

https://twitter.com/jbogard/status/1319381273585061890?s=21

Edit: Above was a misunderstanding.

pinkli

pinkli commented on Oct 29, 2020

@pinkli

A change to honor grapheme clusters doesn't bother me, as long as the impact is well documented. A change that causes closely related members of the family of string methods to behave inconsistently from one another does.

Someone who is doing informal string munging, indifferent to the obscurities of characters, grapheme clusters, or locale, would take it as given that if str.Contains(whatever) succeeds, there is no need to inspect the result from str.IndexOf(whatever) because we were just told it is in there and therefore can be found. It doesn't matter which second parameter they don't care to know about is the default because defaulting is sure to behave the same across methods, freeing them from needing to study all the subtleties to use them at all.

Inconsistencies like this produce a language that can only be used successfully by experts and alienate the folks coming out of code camp. Don't raise the bar to entry this way.

Yes, this totally expressed my concerns. As a typical Chinese developer, we rarely put StringComparison or CultureInfo explicitly when calling string related methods in our application codes, and it just works. We defintely don't expect a different behaviour between IndexOf and Contains!
.net 5.0
image
.net core 3.1
image
.net framework
image

twinter-amosfivesix

twinter-amosfivesix commented on Oct 29, 2020

@twinter-amosfivesix

I agree with @lupestro. The inconsistence in method behavior is very concerning. If you are going to have long standing methods act both differently and inconsistently there will be much sadness.

Perhaps a key point here is that the two methods have always been inconsistent. They did not suddenly become inconsistent in .NET 5.0. If I'm following things correctly, IndexOf has always used current culture comparison, Contains has always used ordinal comparison. Granted .NET 5.0 adds more inconsistency. But the mistake here was in the original API design that allowed this inconsistency.

safern

safern commented on Oct 29, 2020

@safern
Member

If I'm following things correctly, IndexOf has always used ordinal comparison, Contains has always used current culture comparison. Granted .NET 5.0 adds more inconsistency. But the mistake here was in the original API design that allowed this inconsistency.

That is correct but it is the other way around, IndexOf(string) uses current culture, IndexOf(char) uses Ordinal and Contains uses ordinal.

GrabYourPitchforks

GrabYourPitchforks commented on Oct 29, 2020

@GrabYourPitchforks
Member

I'll elaborate briefly on the IndexOf vs. Contains differences that others alluded to recently.

IndexOf(string) has always assumed CurrentCulture comparison, and Contains(string) has always assumed Ordinal comparison. This discrepancy existed way back in .NET Framework. It is not a new discrepancy introduced in .NET 5. For example, on .NET Framework (which uses Windows's NLS facility), the ligature "æ" and the two-char string "ae" compare as equal under a linguistic comparer. This results in the following discrepancy:

// Sample on .NET Framework, showing Contains & IndexOf returning inconsistent results
Console.WriteLine("encyclopædia".Contains("ae")); // prints 'False'
Console.WriteLine("encyclopædie".IndexOf("ae")); // prints '8' (my machine is set to en-US)

This discrepancy has existed for over a decade. It's documented and there is guidance regarding it. Is it a misdesign? Perhaps. We're discussing over at #43956 ways to make the ecosystem healthier moving forward.

For this specific issue, what we're really asking ourselves is: "Given that the discrepancy has existed forever, how far apart are these two methods allowed to deviate in practice before it harms the larger ecosystem?" I think we're still trying to define where that threshold should be. Reports like this are extremely helpful because they give us insight into people use these APIs in practice and what expectations customers have regarding their behavior. As stewards of the framework, we need to consider not just the technical documentation, but also the manner in which real-world apps consume these APIs.

This comment isn't really intended to defend any particular point of view. My intent is to clarify some misconceptions that I've seen and to help explain how we've framed the problem.

ufcpp

ufcpp commented on Oct 30, 2020

@ufcpp
Contributor

Even if InvariantCulture specified, is it correct behavior that \n doesn't match in the ICU version?

image
image

ufcpp

ufcpp commented on Oct 30, 2020

@ufcpp
Contributor

Maybe, the following code displays 5 on Linux and -1 on Windows if we use git and its default settings (autocrlf=true)?

using System;

var s = @"hello
world";
Console.WriteLine(s.IndexOf("\n", StringComparison.InvariantCulture));

120 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-System.GlobalizationquestionAnswer questions and provide assistance, not an issue with source code or documentation.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      Participants

      @pr0vieh@ForNeVeR@jbogard@fschmied@svick

      Issue actions

        Breaking change with string.IndexOf(string) from .NET Core 3.0 -> .NET 5.0 · Issue #43736 · dotnet/runtime