Discussion:
RFR: 8058255: Native jbyte Atomic::cmpxchg for supported x86 platforms
Erik Österlund
2014-09-11 21:48:43 UTC
Permalink
Hi,

These changes aim at replacing the awkward old jbyte Atomic::cmpxchg implementation for all the supported x86 platforms. It previously emulated the behaviour of cmpxchgb using a loop of cmpxchgl and some dynamic alignment of the destination address.

This code is called by remembered sets to manipulate card entries.

The implementation has now been replaced with a bunch of assembly, appropriate for all platforms. Yes, for windows too.

Implementations include:
bsd x86/x86_64: inline asm
linux x86/x86_64: inline asm
solaris x86/x86_64: .il files
windows x86_64 without GNU source: stubGenerator and manual code emission and hence including new Assembler::cmpxchgb support
Windows x86 + x86_64 with GNU source: inline asm

Bug: https://bugs.openjdk.java.net/browse/JDK-8058255

Webrev: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev/

Improvements can be made for other architectures can as well, but this should be a good start.

/Erik
David Holmes
2014-09-12 01:40:33 UTC
Permalink
Hi Erik,

Can we pause and give some more thought to a clean mechanism for
allowing a shared implementation if desired with the ability to override
if desired. I really do not like to see CPU specific ifdefs being added
to shared code. (And I would also not like to see all platforms being
forced to reimplement this natively).

I'm not saying we will find a simple solution, but it would be nice if
we could get a few folk to think about it before proceeding with the
ifdefs :)


Thanks,
David


On 12/09/2014 7:48 AM, Erik ?sterlund wrote:
> Hi,
>
> These changes aim at replacing the awkward old jbyte Atomic::cmpxchg implementation for all the supported x86 platforms. It previously emulated the behaviour of cmpxchgb using a loop of cmpxchgl and some dynamic alignment of the destination address.
>
> This code is called by remembered sets to manipulate card entries.
>
> The implementation has now been replaced with a bunch of assembly, appropriate for all platforms. Yes, for windows too.
>
> Implementations include:
> bsd x86/x86_64: inline asm
> linux x86/x86_64: inline asm
> solaris x86/x86_64: .il files
> windows x86_64 without GNU source: stubGenerator and manual code emission and hence including new Assembler::cmpxchgb support
> Windows x86 + x86_64 with GNU source: inline asm
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8058255
>
> Webrev: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev/
>
> Improvements can be made for other architectures can as well, but this should be a good start.
>
> /Erik
>
Andrew Haley
2014-09-12 08:46:51 UTC
Permalink
On 12/09/14 02:40, David Holmes wrote:
> Can we pause and give some more thought to a clean mechanism for
> allowing a shared implementation if desired with the ability to override
> if desired. I really do not like to see CPU specific ifdefs being added
> to shared code. (And I would also not like to see all platforms being
> forced to reimplement this natively).

Indeed. Could we put the code for things like cmpxchg in an abstract
class and override them?

Also, this doesn't look right:

+ // Support fori jbyte Atomic::cmpxchg(jbyte exchange_value,
+ // volatile jbyte *dest,
+ // jbyte compare_value)
+ // An additional bool (os::is_MP()) is passed as the last argument.
+ .inline _Atomic_cmpxchg_byte,4
+ movb 8(%esp), %al // compare_value
+ movb 0(%esp), %cl // exchange_value
+ movl 4(%esp), %edx // dest
+ cmp $0, 12(%esp) // MP test
+ jne 1f
+ cmpxchgb %cl, (%edx)
+ jmp 2f
+1: lock
+ cmpxchgl %cl, (%edx)
+2:
+ .end

What is this cmpxchgl for?

Andrew.
Erik Österlund
2014-09-12 09:14:49 UTC
Permalink
Sure, if you want a cleaner more OOP fix then I'm fine with slowing down. I won't be able to respond until wednesday anyway. :(

And yes that cmpxhgl should be cmpxhgb thanks for spotting that!

/Erik

> On 12 sep 2014, at 09:47, "Andrew Haley" <aph at redhat.com> wrote:
>
>> On 12/09/14 02:40, David Holmes wrote:
>> Can we pause and give some more thought to a clean mechanism for
>> allowing a shared implementation if desired with the ability to override
>> if desired. I really do not like to see CPU specific ifdefs being added
>> to shared code. (And I would also not like to see all platforms being
>> forced to reimplement this natively).
>
> Indeed. Could we put the code for things like cmpxchg in an abstract
> class and override them?
>
> Also, this doesn't look right:
>
> + // Support fori jbyte Atomic::cmpxchg(jbyte exchange_value,
> + // volatile jbyte *dest,
> + // jbyte compare_value)
> + // An additional bool (os::is_MP()) is passed as the last argument.
> + .inline _Atomic_cmpxchg_byte,4
> + movb 8(%esp), %al // compare_value
> + movb 0(%esp), %cl // exchange_value
> + movl 4(%esp), %edx // dest
> + cmp $0, 12(%esp) // MP test
> + jne 1f
> + cmpxchgb %cl, (%edx)
> + jmp 2f
> +1: lock
> + cmpxchgl %cl, (%edx)
> +2:
> + .end
>
> What is this cmpxchgl for?
>
> Andrew.
Erik Österlund
2014-09-17 11:13:04 UTC
Permalink
I am back! Did you guys have time to do some thinking? I see three different solutions:

1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere is-a AtomicAbstract
Using the CRTP (Curiously Recurring Template Pattern) for C++, this could be done without a virtual call where we want inlining.

2. Similar except with the SFINAE idiom (Substitution Failure Is Not An Error) for C++, to pick the right overload based on statically determined constraints.
E.g. define if Atomic::has_general_byte_CAS and based on whether this is defined or not, pick the general or specific overload variant of the CAS member function.

3. Simply make the current CAS a normal function which is called from billions of new inline method definitions that we have to create for every single architecture.

What do we prefer here? Does anyone else have a better idea? Also, should I start a new thread or is it okay to post it here?

/Erik


On 12 Sep 2014, at 03:40, David Holmes <david.holmes at oracle.com> wrote:

> Hi Erik,
>
> Can we pause and give some more thought to a clean mechanism for allowing a shared implementation if desired with the ability to override if desired. I really do not like to see CPU specific ifdefs being added to shared code. (And I would also not like to see all platforms being forced to reimplement this natively).
>
> I'm not saying we will find a simple solution, but it would be nice if we could get a few folk to think about it before proceeding with the ifdefs :)
>
>
> Thanks,
> David
>
>
> On 12/09/2014 7:48 AM, Erik ?sterlund wrote:
>> Hi,
>>
>> These changes aim at replacing the awkward old jbyte Atomic::cmpxchg implementation for all the supported x86 platforms. It previously emulated the behaviour of cmpxchgb using a loop of cmpxchgl and some dynamic alignment of the destination address.
>>
>> This code is called by remembered sets to manipulate card entries.
>>
>> The implementation has now been replaced with a bunch of assembly, appropriate for all platforms. Yes, for windows too.
>>
>> Implementations include:
>> bsd x86/x86_64: inline asm
>> linux x86/x86_64: inline asm
>> solaris x86/x86_64: .il files
>> windows x86_64 without GNU source: stubGenerator and manual code emission and hence including new Assembler::cmpxchgb support
>> Windows x86 + x86_64 with GNU source: inline asm
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8058255
>>
>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev/
>>
>> Improvements can be made for other architectures can as well, but this should be a good start.
>>
>> /Erik
>>
David Holmes
2014-09-17 12:28:03 UTC
Permalink
On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
> I am back! Did you guys have time to do some thinking? I see three different solutions:
>
> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere is-a AtomicAbstract
> Using the CRTP (Curiously Recurring Template Pattern) for C++, this could be done without a virtual call where we want inlining.

I would prefer this approach (here and elsewhere) but it is not a
short-term option.

> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not An Error) for C++, to pick the right overload based on statically determined constraints.
> E.g. define if Atomic::has_general_byte_CAS and based on whether this is defined or not, pick the general or specific overload variant of the CAS member function.

Not sure what this one is but it sounds like a manual virtual dispatch -
which seems not a good solution.

> 3. Simply make the current CAS a normal function which is called from billions of new inline method definitions that we have to create for every single architecture.

I think the simple version of 3 is just move cmpxchg(jbtye) out of the
shared code and define for each platform - there aren't that many and it
is consistent with many of the other variants.

> What do we prefer here? Does anyone else have a better idea? Also, should I start a new thread or is it okay to post it here?

Continuing this thread is fine by me.

I think short-term the simple version of 3 is preferable.

Thanks,
David

> /Erik
>
>
> On 12 Sep 2014, at 03:40, David Holmes <david.holmes at oracle.com> wrote:
>
>> Hi Erik,
>>
>> Can we pause and give some more thought to a clean mechanism for allowing a shared implementation if desired with the ability to override if desired. I really do not like to see CPU specific ifdefs being added to shared code. (And I would also not like to see all platforms being forced to reimplement this natively).
>>
>> I'm not saying we will find a simple solution, but it would be nice if we could get a few folk to think about it before proceeding with the ifdefs :)
>>
>>
>> Thanks,
>> David
>>
>>
>> On 12/09/2014 7:48 AM, Erik ?sterlund wrote:
>>> Hi,
>>>
>>> These changes aim at replacing the awkward old jbyte Atomic::cmpxchg implementation for all the supported x86 platforms. It previously emulated the behaviour of cmpxchgb using a loop of cmpxchgl and some dynamic alignment of the destination address.
>>>
>>> This code is called by remembered sets to manipulate card entries.
>>>
>>> The implementation has now been replaced with a bunch of assembly, appropriate for all platforms. Yes, for windows too.
>>>
>>> Implementations include:
>>> bsd x86/x86_64: inline asm
>>> linux x86/x86_64: inline asm
>>> solaris x86/x86_64: .il files
>>> windows x86_64 without GNU source: stubGenerator and manual code emission and hence including new Assembler::cmpxchgb support
>>> Windows x86 + x86_64 with GNU source: inline asm
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8058255
>>>
>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev/
>>>
>>> Improvements can be made for other architectures can as well, but this should be a good start.
>>>
>>> /Erik
>>>
>
Erik Österlund
2014-09-20 21:53:11 UTC
Permalink
I don't want to break the ported builds so I made a special variant of the third option where the x86 inline methods also have a #define next to them.
The atomic.inline.hpp file then simply defines an inline Atomic::cmpxchg calling the general solution if there is no #define for an override.

That way it's using variant three, but there is no need to write overrides for every platform port (which are sometimes in other repos) there is to simply run the default member function.
We can add them one by one instead. :)

Hope this seems satisfactory.

Thanks,
Erik

On 17 Sep 2014, at 14:28, David Holmes <david.holmes at oracle.com> wrote:

> On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
>> I am back! Did you guys have time to do some thinking? I see three different solutions:
>>
>> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere is-a AtomicAbstract
>> Using the CRTP (Curiously Recurring Template Pattern) for C++, this could be done without a virtual call where we want inlining.
>
> I would prefer this approach (here and elsewhere) but it is not a short-term option.
>
>> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not An Error) for C++, to pick the right overload based on statically determined constraints.
>> E.g. define if Atomic::has_general_byte_CAS and based on whether this is defined or not, pick the general or specific overload variant of the CAS member function.
>
> Not sure what this one is but it sounds like a manual virtual dispatch - which seems not a good solution.
>
>> 3. Simply make the current CAS a normal function which is called from billions of new inline method definitions that we have to create for every single architecture.
>
> I think the simple version of 3 is just move cmpxchg(jbtye) out of the shared code and define for each platform - there aren't that many and it is consistent with many of the other variants.
>
>> What do we prefer here? Does anyone else have a better idea? Also, should I start a new thread or is it okay to post it here?
>
> Continuing this thread is fine by me.
>
> I think short-term the simple version of 3 is preferable.
>
> Thanks,
> David
Erik Österlund
2014-10-01 18:32:41 UTC
Permalink
Hi,

There is now a new webrev with my jbyte cmpxchg changes here: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev.01/

The change from last time is:
1. Fixed an issue where cmpxchgl was incorrectly emitted instead of cmpxchgb for one of the configurations as pointed out by Andrew Haley, cheers for finding that
2. Changed the "design" around a bit to make it easy to override the cmpxchgb for different platforms without having #if defined(PLATFORM_X) || defined(PLATFORMY) etc. in the headers.. If not explicitly overridden, the old general solution be used instead like normal and this should hence not break any of the different ports to other architectures not included in the main hotspot project.

Jesper ran it through jprt and it was all green. Thanks for checking that!

Would be nice if I could get some reviews. :)

Thanks,
Erik


On 20 Sep 2014, at 23:53, Erik ?sterlund <erik.osterlund at lnu.se<mailto:erik.osterlund at lnu.se>> wrote:

I don't want to break the ported builds so I made a special variant of the third option where the x86 inline methods also have a #define next to them.
The atomic.inline.hpp file then simply defines an inline Atomic::cmpxchg calling the general solution if there is no #define for an override.

That way it's using variant three, but there is no need to write overrides for every platform port (which are sometimes in other repos) there is to simply run the default member function.
We can add them one by one instead. :)

Hope this seems satisfactory.

Thanks,
Erik

On 17 Sep 2014, at 14:28, David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:

On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
I am back! Did you guys have time to do some thinking? I see three different solutions:

1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere is-a AtomicAbstract
Using the CRTP (Curiously Recurring Template Pattern) for C++, this could be done without a virtual call where we want inlining.

I would prefer this approach (here and elsewhere) but it is not a short-term option.

2. Similar except with the SFINAE idiom (Substitution Failure Is Not An Error) for C++, to pick the right overload based on statically determined constraints.
E.g. define if Atomic::has_general_byte_CAS and based on whether this is defined or not, pick the general or specific overload variant of the CAS member function.

Not sure what this one is but it sounds like a manual virtual dispatch - which seems not a good solution.

3. Simply make the current CAS a normal function which is called from billions of new inline method definitions that we have to create for every single architecture.

I think the simple version of 3 is just move cmpxchg(jbtye) out of the shared code and define for each platform - there aren't that many and it is consistent with many of the other variants.

What do we prefer here? Does anyone else have a better idea? Also, should I start a new thread or is it okay to post it here?

Continuing this thread is fine by me.

I think short-term the simple version of 3 is preferable.

Thanks,
David
David Holmes
2014-10-01 19:41:27 UTC
Permalink
Hi Erik,

Currently at JavaOne and then travelling back to Australia tomorrow. It
will be Tuesday (Holiday Monday) before I can take another look at this.

Sorry.

David

On 2/10/2014 4:32 AM, Erik ?sterlund wrote:
> Hi,
>
> There is now a new webrev with my jbyte cmpxchg changes here: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev.01/
>
> The change from last time is:
> 1. Fixed an issue where cmpxchgl was incorrectly emitted instead of cmpxchgb for one of the configurations as pointed out by Andrew Haley, cheers for finding that
> 2. Changed the "design" around a bit to make it easy to override the cmpxchgb for different platforms without having #if defined(PLATFORM_X) || defined(PLATFORMY) etc. in the headers.. If not explicitly overridden, the old general solution be used instead like normal and this should hence not break any of the different ports to other architectures not included in the main hotspot project.
>
> Jesper ran it through jprt and it was all green. Thanks for checking that!
>
> Would be nice if I could get some reviews. :)
>
> Thanks,
> Erik
>
>
> On 20 Sep 2014, at 23:53, Erik ?sterlund <erik.osterlund at lnu.se<mailto:erik.osterlund at lnu.se>> wrote:
>
> I don't want to break the ported builds so I made a special variant of the third option where the x86 inline methods also have a #define next to them.
> The atomic.inline.hpp file then simply defines an inline Atomic::cmpxchg calling the general solution if there is no #define for an override.
>
> That way it's using variant three, but there is no need to write overrides for every platform port (which are sometimes in other repos) there is to simply run the default member function.
> We can add them one by one instead. :)
>
> Hope this seems satisfactory.
>
> Thanks,
> Erik
>
> On 17 Sep 2014, at 14:28, David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:
>
> On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
> I am back! Did you guys have time to do some thinking? I see three different solutions:
>
> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere is-a AtomicAbstract
> Using the CRTP (Curiously Recurring Template Pattern) for C++, this could be done without a virtual call where we want inlining.
>
> I would prefer this approach (here and elsewhere) but it is not a short-term option.
>
> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not An Error) for C++, to pick the right overload based on statically determined constraints.
> E.g. define if Atomic::has_general_byte_CAS and based on whether this is defined or not, pick the general or specific overload variant of the CAS member function.
>
> Not sure what this one is but it sounds like a manual virtual dispatch - which seems not a good solution.
>
> 3. Simply make the current CAS a normal function which is called from billions of new inline method definitions that we have to create for every single architecture.
>
> I think the simple version of 3 is just move cmpxchg(jbtye) out of the shared code and define for each platform - there aren't that many and it is consistent with many of the other variants.
>
> What do we prefer here? Does anyone else have a better idea? Also, should I start a new thread or is it okay to post it here?
>
> Continuing this thread is fine by me.
>
> I think short-term the simple version of 3 is preferable.
>
> Thanks,
> David
>
Erik Österlund
2014-10-03 00:33:14 UTC
Permalink
G'Day David,

No problem mate!
Rage on at JavaOne, hope it's a ripper and talk to you when you get back to Australia mate. ;)

Thanks,
/Erik

> On 1 okt 2014, at 21:41, David Holmes <david.holmes at oracle.com> wrote:
>
> Hi Erik,
>
> Currently at JavaOne and then travelling back to Australia tomorrow. It will be Tuesday (Holiday Monday) before I can take another look at this.
>
> Sorry.
>
> David
>
>> On 2/10/2014 4:32 AM, Erik ?sterlund wrote:
>> Hi,
>>
>> There is now a new webrev with my jbyte cmpxchg changes here: http://cr.openjdk.java.net/~jwilhelm/8058255/webrev.01/
>>
>> The change from last time is:
>> 1. Fixed an issue where cmpxchgl was incorrectly emitted instead of cmpxchgb for one of the configurations as pointed out by Andrew Haley, cheers for finding that
>> 2. Changed the "design" around a bit to make it easy to override the cmpxchgb for different platforms without having #if defined(PLATFORM_X) || defined(PLATFORMY) etc. in the headers.. If not explicitly overridden, the old general solution be used instead like normal and this should hence not break any of the different ports to other architectures not included in the main hotspot project.
>>
>> Jesper ran it through jprt and it was all green. Thanks for checking that!
>>
>> Would be nice if I could get some reviews. :)
>>
>> Thanks,
>> Erik
>>
>>
>> On 20 Sep 2014, at 23:53, Erik ?sterlund <erik.osterlund at lnu.se<mailto:erik.osterlund at lnu.se>> wrote:
>>
>> I don't want to break the ported builds so I made a special variant of the third option where the x86 inline methods also have a #define next to them.
>> The atomic.inline.hpp file then simply defines an inline Atomic::cmpxchg calling the general solution if there is no #define for an override.
>>
>> That way it's using variant three, but there is no need to write overrides for every platform port (which are sometimes in other repos) there is to simply run the default member function.
>> We can add them one by one instead. :)
>>
>> Hope this seems satisfactory.
>>
>> Thanks,
>> Erik
>>
>> On 17 Sep 2014, at 14:28, David Holmes <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:
>>
>> On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
>> I am back! Did you guys have time to do some thinking? I see three different solutions:
>>
>> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere is-a AtomicAbstract
>> Using the CRTP (Curiously Recurring Template Pattern) for C++, this could be done without a virtual call where we want inlining.
>>
>> I would prefer this approach (here and elsewhere) but it is not a short-term option.
>>
>> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not An Error) for C++, to pick the right overload based on statically determined constraints.
>> E.g. define if Atomic::has_general_byte_CAS and based on whether this is defined or not, pick the general or specific overload variant of the CAS member function.
>>
>> Not sure what this one is but it sounds like a manual virtual dispatch - which seems not a good solution.
>>
>> 3. Simply make the current CAS a normal function which is called from billions of new inline method definitions that we have to create for every single architecture.
>>
>> I think the simple version of 3 is just move cmpxchg(jbtye) out of the shared code and define for each platform - there aren't that many and it is consistent with many of the other variants.
>>
>> What do we prefer here? Does anyone else have a better idea? Also, should I start a new thread or is it okay to post it here?
>>
>> Continuing this thread is fine by me.
>>
>> I think short-term the simple version of 3 is preferable.
>>
>> Thanks,
>> David
>>
David Holmes
2014-10-09 04:40:40 UTC
Permalink
Hi Erik,

Sorry for the further delay on this.

Not sure I like VM_HAS_SPECIALIZED_BYTE_CMPXCHG. Would like to hear
opinions from others. (I also can't confirm the correctness of the
assembler code nor the changes to the .il files, so other's need to
weigh in here regardless.)

With this approach does every arch need an entry in
stubGenerator_<arch>.ccp of the form:

StubRoutines::_atomic_cmpxchg_byte_entry = <something>;

? If we have to touch the code for each arch then maybe simply
duplicating what is presently the shared implementation is a better way
to go?

In src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp you have:

126 #define VM_HAS_SPECIALIZED_BYTE_CMPXCHG

and

220 #define VM_HAS_SPECIALIZED_BYTE_CMPXCHG

Thanks,
David
-----

On 2/10/2014 5:41 AM, David Holmes wrote:
> Hi Erik,
>
> Currently at JavaOne and then travelling back to Australia tomorrow. It
> will be Tuesday (Holiday Monday) before I can take another look at this.
>
> Sorry.
>
> David
>
> On 2/10/2014 4:32 AM, Erik ?sterlund wrote:
>> Hi,
>>
>> There is now a new webrev with my jbyte cmpxchg changes here:
>> http://cr.openjdk.java.net/~jwilhelm/8058255/webrev.01/
>>
>> The change from last time is:
>> 1. Fixed an issue where cmpxchgl was incorrectly emitted instead of
>> cmpxchgb for one of the configurations as pointed out by Andrew Haley,
>> cheers for finding that
>> 2. Changed the "design" around a bit to make it easy to override the
>> cmpxchgb for different platforms without having #if
>> defined(PLATFORM_X) || defined(PLATFORMY) etc. in the headers.. If not
>> explicitly overridden, the old general solution be used instead like
>> normal and this should hence not break any of the different ports to
>> other architectures not included in the main hotspot project.
>>
>> Jesper ran it through jprt and it was all green. Thanks for checking
>> that!
>>
>> Would be nice if I could get some reviews. :)
>>
>> Thanks,
>> Erik
>>
>>
>> On 20 Sep 2014, at 23:53, Erik ?sterlund
>> <erik.osterlund at lnu.se<mailto:erik.osterlund at lnu.se>> wrote:
>>
>> I don't want to break the ported builds so I made a special variant of
>> the third option where the x86 inline methods also have a #define next
>> to them.
>> The atomic.inline.hpp file then simply defines an inline
>> Atomic::cmpxchg calling the general solution if there is no #define
>> for an override.
>>
>> That way it's using variant three, but there is no need to write
>> overrides for every platform port (which are sometimes in other repos)
>> there is to simply run the default member function.
>> We can add them one by one instead. :)
>>
>> Hope this seems satisfactory.
>>
>> Thanks,
>> Erik
>>
>> On 17 Sep 2014, at 14:28, David Holmes
>> <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:
>>
>> On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
>> I am back! Did you guys have time to do some thinking? I see three
>> different solutions:
>>
>> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere
>> is-a AtomicAbstract
>> Using the CRTP (Curiously Recurring Template Pattern) for C++, this
>> could be done without a virtual call where we want inlining.
>>
>> I would prefer this approach (here and elsewhere) but it is not a
>> short-term option.
>>
>> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not
>> An Error) for C++, to pick the right overload based on statically
>> determined constraints.
>> E.g. define if Atomic::has_general_byte_CAS and based on whether this
>> is defined or not, pick the general or specific overload variant of
>> the CAS member function.
>>
>> Not sure what this one is but it sounds like a manual virtual dispatch
>> - which seems not a good solution.
>>
>> 3. Simply make the current CAS a normal function which is called from
>> billions of new inline method definitions that we have to create for
>> every single architecture.
>>
>> I think the simple version of 3 is just move cmpxchg(jbtye) out of the
>> shared code and define for each platform - there aren't that many and
>> it is consistent with many of the other variants.
>>
>> What do we prefer here? Does anyone else have a better idea? Also,
>> should I start a new thread or is it okay to post it here?
>>
>> Continuing this thread is fine by me.
>>
>> I think short-term the simple version of 3 is preferable.
>>
>> Thanks,
>> David
>>
Erik Österlund
2014-10-09 09:13:04 UTC
Permalink
Hi David,

On 09 Oct 2014, at 06:40, David Holmes <david.holmes at oracle.com> wrote:

> Hi Erik,
>
> Sorry for the further delay on this.

No problem.

> Not sure I like VM_HAS_SPECIALIZED_BYTE_CMPXCHG. Would like to hear opinions from others. (I also can't confirm the correctness of the assembler code nor the changes to the .il files, so other's need to weigh in here regardless.)

Okay. I did this in order not to break the code for maintainers of ports outside of our repo. Did not want a performance optimization for x86 to be intrusive and break the code of all others. I'm open to suggestions if anyone else has a better solution to this (that does not need virtual calls). And if you want me to ruthlessly just define a jbyte Atomic::cmpxchg inline member function that forwards to the old solution for the platforms in our repo (and breaking the code for all others outside), I can do that too.

> With this approach does every arch need an entry in stubGenerator_<arch>.ccp of the form:
>
> StubRoutines::_atomic_cmpxchg_byte_entry = <something>;
>
> ? If we have to touch the code for each arch then maybe simply duplicating what is presently the shared implementation is a better way to go?

This is currently used only by very awkward configurations that have no compiler support to generate the machine code I want. Instead it is generated at runtime while bootstrapping the jbyte Atomic::cmpxchg member function.

Other platforms need not care about defining such entry as it is not being used by them. Unless they actually want to use it of course. :)

> In src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp you have:
>
> 126 #define VM_HAS_SPECIALIZED_BYTE_CMPXCHG
>
> and
>
> 220 #define VM_HAS_SPECIALIZED_BYTE_CMPXCHG

I put the #define next to the Atomic::cmpxchg definitions, and Windows has two such definitions because of different possible configurations, that's why. One of the configurations generates a stub at runtime, the other one has compiler support. In case a configuration for Windows was added with no support for jbyte Atomic::cmpxchg, we would otherwise run into a bit of trouble. It won't be #defined twice for the same reason the member function is not defined twice.

Thank you for having a look at this David! :)

/Erik

>
> Thanks,
> David
> -----
>
> On 2/10/2014 5:41 AM, David Holmes wrote:
>> Hi Erik,
>>
>> Currently at JavaOne and then travelling back to Australia tomorrow. It
>> will be Tuesday (Holiday Monday) before I can take another look at this.
>>
>> Sorry.
>>
>> David
>>
>> On 2/10/2014 4:32 AM, Erik ?sterlund wrote:
>>> Hi,
>>>
>>> There is now a new webrev with my jbyte cmpxchg changes here:
>>> http://cr.openjdk.java.net/~jwilhelm/8058255/webrev.01/
>>>
>>> The change from last time is:
>>> 1. Fixed an issue where cmpxchgl was incorrectly emitted instead of
>>> cmpxchgb for one of the configurations as pointed out by Andrew Haley,
>>> cheers for finding that
>>> 2. Changed the "design" around a bit to make it easy to override the
>>> cmpxchgb for different platforms without having #if
>>> defined(PLATFORM_X) || defined(PLATFORMY) etc. in the headers.. If not
>>> explicitly overridden, the old general solution be used instead like
>>> normal and this should hence not break any of the different ports to
>>> other architectures not included in the main hotspot project.
>>>
>>> Jesper ran it through jprt and it was all green. Thanks for checking
>>> that!
>>>
>>> Would be nice if I could get some reviews. :)
>>>
>>> Thanks,
>>> Erik
>>>
>>>
>>> On 20 Sep 2014, at 23:53, Erik ?sterlund
>>> <erik.osterlund at lnu.se<mailto:erik.osterlund at lnu.se>> wrote:
>>>
>>> I don't want to break the ported builds so I made a special variant of
>>> the third option where the x86 inline methods also have a #define next
>>> to them.
>>> The atomic.inline.hpp file then simply defines an inline
>>> Atomic::cmpxchg calling the general solution if there is no #define
>>> for an override.
>>>
>>> That way it's using variant three, but there is no need to write
>>> overrides for every platform port (which are sometimes in other repos)
>>> there is to simply run the default member function.
>>> We can add them one by one instead. :)
>>>
>>> Hope this seems satisfactory.
>>>
>>> Thanks,
>>> Erik
>>>
>>> On 17 Sep 2014, at 14:28, David Holmes
>>> <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:
>>>
>>> On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
>>> I am back! Did you guys have time to do some thinking? I see three
>>> different solutions:
>>>
>>> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere
>>> is-a AtomicAbstract
>>> Using the CRTP (Curiously Recurring Template Pattern) for C++, this
>>> could be done without a virtual call where we want inlining.
>>>
>>> I would prefer this approach (here and elsewhere) but it is not a
>>> short-term option.
>>>
>>> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not
>>> An Error) for C++, to pick the right overload based on statically
>>> determined constraints.
>>> E.g. define if Atomic::has_general_byte_CAS and based on whether this
>>> is defined or not, pick the general or specific overload variant of
>>> the CAS member function.
>>>
>>> Not sure what this one is but it sounds like a manual virtual dispatch
>>> - which seems not a good solution.
>>>
>>> 3. Simply make the current CAS a normal function which is called from
>>> billions of new inline method definitions that we have to create for
>>> every single architecture.
>>>
>>> I think the simple version of 3 is just move cmpxchg(jbtye) out of the
>>> shared code and define for each platform - there aren't that many and
>>> it is consistent with many of the other variants.
>>>
>>> What do we prefer here? Does anyone else have a better idea? Also,
>>> should I start a new thread or is it okay to post it here?
>>>
>>> Continuing this thread is fine by me.
>>>
>>> I think short-term the simple version of 3 is preferable.
>>>
>>> Thanks,
>>> David
>>>
David Holmes
2014-10-09 11:25:35 UTC
Permalink
On 9/10/2014 7:13 PM, Erik ?sterlund wrote:
> Hi David,
>
> On 09 Oct 2014, at 06:40, David Holmes <david.holmes at oracle.com> wrote:
>
>> Hi Erik,
>>
>> Sorry for the further delay on this.
>
> No problem.
>
>> Not sure I like VM_HAS_SPECIALIZED_BYTE_CMPXCHG. Would like to hear opinions from others. (I also can't confirm the correctness of the assembler code nor the changes to the .il files, so other's need to weigh in here regardless.)
>
> Okay. I did this in order not to break the code for maintainers of ports outside of our repo. Did not want a performance optimization for x86 to be intrusive and break the code of all others. I'm open to suggestions if anyone else has a better solution to this (that does not need virtual calls). And if you want me to ruthlessly just define a jbyte Atomic::cmpxchg inline member function that forwards to the old solution for the platforms in our repo (and breaking the code for all others outside), I can do that too.
>
>> With this approach does every arch need an entry in stubGenerator_<arch>.ccp of the form:
>>
>> StubRoutines::_atomic_cmpxchg_byte_entry = <something>;
>>
>> ? If we have to touch the code for each arch then maybe simply duplicating what is presently the shared implementation is a better way to go?
>
> This is currently used only by very awkward configurations that have no compiler support to generate the machine code I want. Instead it is generated at runtime while bootstrapping the jbyte Atomic::cmpxchg member function.

Some platforms need to dynamically adjust the runtime code based on the
actual version of the platform in use.

> Other platforms need not care about defining such entry as it is not being used by them. Unless they actually want to use it of course. :)

So I'm not sure if the answer to my question is yes, no, or maybe :)

>> In src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp you have:
>>
>> 126 #define VM_HAS_SPECIALIZED_BYTE_CMPXCHG
>>
>> and
>>
>> 220 #define VM_HAS_SPECIALIZED_BYTE_CMPXCHG
>
> I put the #define next to the Atomic::cmpxchg definitions, and Windows has two such definitions because of different possible configurations, that's why. One of the configurations generates a stub at runtime, the other one has compiler support. In case a configuration for Windows was added with no support for jbyte Atomic::cmpxchg, we would otherwise run into a bit of trouble. It won't be #defined twice for the same reason the member function is not defined twice.

I looked for, but missed the #ifdef AMD64. Must admit I don't understand
why the 32-bit and 64-bit windows code is different for these atomic
functions.

David

> Thank you for having a look at this David! :)
>
> /Erik
>
>>
>> Thanks,
>> David
>> -----
>>
>> On 2/10/2014 5:41 AM, David Holmes wrote:
>>> Hi Erik,
>>>
>>> Currently at JavaOne and then travelling back to Australia tomorrow. It
>>> will be Tuesday (Holiday Monday) before I can take another look at this.
>>>
>>> Sorry.
>>>
>>> David
>>>
>>> On 2/10/2014 4:32 AM, Erik ?sterlund wrote:
>>>> Hi,
>>>>
>>>> There is now a new webrev with my jbyte cmpxchg changes here:
>>>> http://cr.openjdk.java.net/~jwilhelm/8058255/webrev.01/
>>>>
>>>> The change from last time is:
>>>> 1. Fixed an issue where cmpxchgl was incorrectly emitted instead of
>>>> cmpxchgb for one of the configurations as pointed out by Andrew Haley,
>>>> cheers for finding that
>>>> 2. Changed the "design" around a bit to make it easy to override the
>>>> cmpxchgb for different platforms without having #if
>>>> defined(PLATFORM_X) || defined(PLATFORMY) etc. in the headers.. If not
>>>> explicitly overridden, the old general solution be used instead like
>>>> normal and this should hence not break any of the different ports to
>>>> other architectures not included in the main hotspot project.
>>>>
>>>> Jesper ran it through jprt and it was all green. Thanks for checking
>>>> that!
>>>>
>>>> Would be nice if I could get some reviews. :)
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>> On 20 Sep 2014, at 23:53, Erik ?sterlund
>>>> <erik.osterlund at lnu.se<mailto:erik.osterlund at lnu.se>> wrote:
>>>>
>>>> I don't want to break the ported builds so I made a special variant of
>>>> the third option where the x86 inline methods also have a #define next
>>>> to them.
>>>> The atomic.inline.hpp file then simply defines an inline
>>>> Atomic::cmpxchg calling the general solution if there is no #define
>>>> for an override.
>>>>
>>>> That way it's using variant three, but there is no need to write
>>>> overrides for every platform port (which are sometimes in other repos)
>>>> there is to simply run the default member function.
>>>> We can add them one by one instead. :)
>>>>
>>>> Hope this seems satisfactory.
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>> On 17 Sep 2014, at 14:28, David Holmes
>>>> <david.holmes at oracle.com<mailto:david.holmes at oracle.com>> wrote:
>>>>
>>>> On 17/09/2014 9:13 PM, Erik ?sterlund wrote:
>>>> I am back! Did you guys have time to do some thinking? I see three
>>>> different solutions:
>>>>
>>>> 1. Good old class inheritance! Class Atomic is-a Atomic_YourArchHere
>>>> is-a AtomicAbstract
>>>> Using the CRTP (Curiously Recurring Template Pattern) for C++, this
>>>> could be done without a virtual call where we want inlining.
>>>>
>>>> I would prefer this approach (here and elsewhere) but it is not a
>>>> short-term option.
>>>>
>>>> 2. Similar except with the SFINAE idiom (Substitution Failure Is Not
>>>> An Error) for C++, to pick the right overload based on statically
>>>> determined constraints.
>>>> E.g. define if Atomic::has_general_byte_CAS and based on whether this
>>>> is defined or not, pick the general or specific overload variant of
>>>> the CAS member function.
>>>>
>>>> Not sure what this one is but it sounds like a manual virtual dispatch
>>>> - which seems not a good solution.
>>>>
>>>> 3. Simply make the current CAS a normal function which is called from
>>>> billions of new inline method definitions that we have to create for
>>>> every single architecture.
>>>>
>>>> I think the simple version of 3 is just move cmpxchg(jbtye) out of the
>>>> shared code and define for each platform - there aren't that many and
>>>> it is consistent with many of the other variants.
>>>>
>>>> What do we prefer here? Does anyone else have a better idea? Also,
>>>> should I start a new thread or is it okay to post it here?
>>>>
>>>> Continuing this thread is fine by me.
>>>>
>>>> I think short-term the simple version of 3 is preferable.
>>>>
>>>> Thanks,
>>>> David
>>>>
>
Loading...