Planetside Software Forums

General => Open Discussion => Topic started by: PabloMack on July 31, 2019, 03:45:01 pm

Title: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on July 31, 2019, 03:45:01 pm
While writing an assembler and linker for the AMD64, I learned that the instruction set can only directly reach across images up to 2 GBytes. The PE32+ file format, which defines Microsoft's EXE format, has the same limitation. I just ran TG4 to see how big the image is, and it shows about 0.756 GBytes. I am running another application that shows 1.721 GBytes. This is almost knocking on the door of being too large for this architecture. What is going to happen if the TG application grows by a factor of 3? How is this image size limit going to be handled? Maybe it is planned obsolescence?
[attach=1]
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: Matt on August 02, 2019, 01:39:10 am
Are you sure that is reporting anything to do with the instruction set? I think that's just the overall RAM use including data, right? The code portion of Terragen cannot be more than a few dozen MB, but I don't know how much scattering occurs at run time.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 02, 2019, 03:51:18 am
A strange topic. Address space, or what an application or process can access, is in memory. But this memory isn't just physical (and shouldn't be on modern OSes). The 2GB limitation is a remnant of older 32bit OSes, and that's where Large Address Awareness comes in with 64bit applications to allow up to 4GB of address space.

This is also just address space (addressable at a time) and not cached assets where you can see memory allocated above 4GB.

A good example of this is my inability to create appropriate caching (as well as pure optimization of dirty code) for a shader exporter I am writing, which is eating up all my memory (literally gigabytes of memory for a single shader at 12k) lol

Large memory address aware applications like games allow them to load large chunks of open world assets and stream them from cache too, dramatically speeding up loading times rather than pulling from the HDD.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 02, 2019, 11:48:25 am
Also I am not sure what you mean by images? Executables are not images. Images use sectors and headers because originally they were snapshots of actual HDDs. IMG, IMA, etc. have this info so they can be mountable as a medium.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 02, 2019, 01:20:25 pm
Quote from: WASasquatch on August 02, 2019, 11:48:25 am
Also I am not sure what you mean by images? Executables are not images. Images use sectors and headers cause originally they were snapshots of actual HDDs. IMG, IMA, etc have this info so they can be mountable as a medium.


I am using Microsoft terminology. The PE32+ format is what is used for AMD64. I think it is called
PE32+ instead of PE64 because it only supports 32-bit images but was upgraded from PE32 so
that it can load them into a 64-bit address space. However, the specification is complicated and
documentation is not well written in my opinion. So it may be that loading an executable that uses
a list of sections instead of an image may make it possible to load programs much larger than
what will fit into a 32-bit section of address space.  I can't say.

What I understand an image to be is a section of address space that can be reached using 32-bit
offsets. Since these are signed (plus or minus) either direction will only reach 2 GBytes (not 4). In
other words, a reference can reach up to 2GB forward or 2GB backward. If the reference was at the
beginning of the image, it can only reach 2GB forward. Reaching backward will not be used because
the reference being at the beginning of the image means that there is nothing to reach back there.
Same goes for a reference being at the end of an image. The reference will only reach backward
because there is nothing forward to reach because the reference is at the end of the image.
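
The reach described above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `reachable` is made up for illustration, and it ignores the detail that real displacements are measured from the end of the referencing instruction.

```python
# Range of a signed 32-bit offset: -2 GiB .. +2 GiB - 1.
DISP32_MIN = -(2**31)
DISP32_MAX = 2**31 - 1
GIB = 2**30

def reachable(ref_addr, target_addr):
    """True if target_addr lies within a signed 32-bit offset of ref_addr.

    Hypothetical helper for illustration only.
    """
    return DISP32_MIN <= target_addr - ref_addr <= DISP32_MAX

# A reference at the start of an image can only look forward.
print(reachable(0, 2 * GIB - 1))  # last byte below the 2 GiB mark
print(reachable(0, 2 * GIB))      # one byte too far forward
# A reference at the 2 GiB mark can still look back to the start.
print(reachable(2 * GIB, 0))
```

The asymmetry by one byte (2 GiB backward but only 2 GiB minus one forward) is just the usual two's-complement range.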

The image includes the program plus the statically defined data which includes globals and constants.
Dynamic memory (which is called the heap in C/C++) is another matter and that is implemented using
RAM that is not part of the image. So it may be that the TG memory usage that I see not only includes
the image but also dynamic memory which will grow and shrink depending on what it is doing. So I don't
think TG has to worry about its executable becoming too large because the image is certainly a lot smaller
than 2GBytes. The TGD.EXE file shows to be 488KB. That plus any DLLs that are used are probably a
good approximation of what TG uses statically. Sorry if I alarmed anyone.

Here is a picture I took out of a document that describes the PE32 file format. It uses the term "image"
in two places. The PE32+ is similar but it has some fields extended to 64-bit to support a large address
space. But images are still limited to 2GBytes.
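
As a rough illustration of which fields were widened, the sketch below reads three header fields with Python's `struct` module. The offsets follow the published PE/COFF layout as I understand it (optional-header magic `0x10B` for PE32 and `0x20B` for PE32+, `ImageBase` widened to 64 bits, `SizeOfImage` still a 32-bit field). The functions `describe_pe` and `make_fake_pe32plus` and the header blob they operate on are hypothetical, built only so the example runs without a real EXE.

```python
import struct

PE32_MAGIC = 0x10B
PE32PLUS_MAGIC = 0x20B

def describe_pe(data):
    """Read magic, ImageBase and SizeOfImage from PE header bytes."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    opt = e_lfanew + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == PE32PLUS_MAGIC:
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)  # 64-bit field
    elif magic == PE32_MAGIC:
        (image_base,) = struct.unpack_from("<I", data, opt + 28)  # 32-bit field
    else:
        raise ValueError("unknown optional header magic")
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)  # 32-bit in both
    return {"magic": magic, "image_base": image_base,
            "size_of_image": size_of_image}

def make_fake_pe32plus(image_base, size_of_image):
    """Build a minimal, made-up header blob for demonstration only."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)  # e_lfanew points at the signature
    buf[0x80:0x84] = b"PE\0\0"
    opt = 0x80 + 4 + 20
    struct.pack_into("<H", buf, opt, PE32PLUS_MAGIC)
    struct.pack_into("<Q", buf, opt + 24, image_base)
    struct.pack_into("<I", buf, opt + 56, size_of_image)
    return bytes(buf)

info = describe_pe(make_fake_pe32plus(0x140000000, 0x7A000))
print(hex(info["magic"]), hex(info["image_base"]), hex(info["size_of_image"]))
```

To inspect a real file, the same `describe_pe` call could be given the bytes read from the EXE. The values used here are arbitrary.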

The reason why Windows32 needs 4GBytes is so that it can look at system space at the same time
as it looks at user space, each being 2GBytes in size.

[attach=1]
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 02, 2019, 03:18:05 pm
I see what you mean about images now. I spend too much time in Linux. Lol

Seems PE+/PE32+ is a .NET PE 64bit extension specification introduced for Windows CE. Didn't AMD introduce the 64bit architecture first, and Intel followed suit with the modified NetBurst? Not sure about the hearts of the CPUs these days, but during the Athlon battle days, AMD64 was considered "true 64bit", where AMD64 came from R&D and Intel's was a modification, and likely RE.

What you are saying doesn't seem to be true, there are many programs I shouldn't be running on AMD64, let alone my KVMs... :O

From researching the "2 Gigabyte Problem" which gave us PAE, there is no mention of AMD64 as any problem. When the 4GT flag is set in the BIOS, a 64bit executable is Large Address Aware and can use "3GB" (seems odd) -- while a 32bit process that has a Large Address Aware flag can use 4GB, up from 2GB.

I can't actually find anything about AMD64 limited to 2GB, and PE32+ relates to .NET PE and a 64bit implementation.

A good game example for this problem is the Morrowind Graphics Overhaul. It required 4GB of address space to run, and uses the Large Address Aware patch to give the Morrowind executable the LAA flag, which then lets it use 4GB of address space. Also probably my favorite game of all time. Just throwing that out there...
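
For what it's worth, the LAA flag that such patchers flip is, as far as I know, a single bit (`IMAGE_FILE_LARGE_ADDRESS_AWARE`, `0x0020`) in the `Characteristics` field of the COFF header. Below is a minimal Python sketch of testing and setting it. The helper names and the header blob are invented for the example, and this is not necessarily how any particular patch tool works.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field

def _characteristics_offset(data):
    """Locate the 16-bit Characteristics field (hypothetical helper)."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if bytes(data[e_lfanew:e_lfanew + 4]) != b"PE\0\0":
        raise ValueError("not a PE file")
    return e_lfanew + 4 + 18  # signature, then 18 bytes into the COFF header

def is_laa(data):
    """True if the large-address-aware bit is set."""
    (flags,) = struct.unpack_from("<H", data, _characteristics_offset(data))
    return bool(flags & IMAGE_FILE_LARGE_ADDRESS_AWARE)

def set_laa(data):
    """Set the large-address-aware bit in place on a bytearray."""
    off = _characteristics_offset(data)
    (flags,) = struct.unpack_from("<H", data, off)
    struct.pack_into("<H", data, off, flags | IMAGE_FILE_LARGE_ADDRESS_AWARE)

# Made-up header blob so the example runs without a real executable.
fake = bytearray(0x100)
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\0\0"

before = is_laa(fake)
set_laa(fake)
after = is_laa(fake)
print(before, after)
```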
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 02, 2019, 03:38:06 pm
Also what you are saying just confuses me, as it would mean AMD64, a breakthrough, is really no different than x86? Utilizing some sort of hack that even LAA 32bit processes don't even need to do. I'm just confused.  :o
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 02, 2019, 05:12:11 pm
Quote from: Matt on August 02, 2019, 01:39:10 am
I think that's just the overall RAM use including data, right? The code portion of Terragen cannot be more than a few dozen Mb, but I don't know how much scattering occurs at run time.


I think that is all correct.

Quote from: WASasquatch on August 02, 2019, 03:38:06 pm
Also what you are saying just confuses me as it would mean AMD64, a breakthrough is really no different than x86? Utilizing some sort of hack that even LAA 32bit processes don't even need to do. I'm just confused.  :o


Well...I'm confused too so we're in the same boat. The x86 is a big mess and I don't think anyone who knows this architecture doubts that.

When the x86-32 became x86-64 (AMD64), the main thing that happened was that it now had a 64-bit address space. That is not a trivial improvement. That is a big improvement. But its addressing modes changed very little. Programs that are linked as one image, still only have a choice between 8- or 32-bit (signed) offsets. That's all. You can't build an image that is so large that references within the image can't be resolved. This is what caps the maximum size of your program to 2 G-Bytes. Also, keep in mind that program space pointed to by the code segment can be in a different 32-bit space from the data space pointed to by the data segment. Initialized data is part of the image. So Code Space plus Data Space together can be up to 4GBytes.

Building an image is a pre-load (link time) process. This means that all references within itself must be resolved and there are no lingering references that know anything about anything outside of its 32-bit address space. I don't think this includes DLLs because they don't link until needed (that's what's meant by Dynamic, which is the 'D' in DLL). But this does not say that you can't do post-load time (i.e. run time) address calculations to reach more than that. This is RAM and it is uninitialized and it is outside the image. It will still show up as memory usage and this part of memory can become very large. When a program like TG loads such things as texture maps and geometries, these are not part of its image. But they are part of your process's memory. So when you see program usage in the Task Manager, it doesn't break down usage by image/segment and dynamically allocated memory. It just shows you the total.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 02, 2019, 06:44:25 pm
Quote from: PabloMack on August 02, 2019, 05:12:11 pm
When the x86-32 became x86-64 (AMD64), the main thing that happened was that it now had a 64-bit address space. That is not a trivial improvement. That is a big improvement. But its addressing modes changed very little. Programs that are linked as one image, still only have a choice between 8- or 32-bit (signed) offsets. That's all. You can't build an image that is so large that references within the image can't be resolved. This is what caps the maximum size of your program to 2 G-Bytes. Also, keep in mind that program space pointed to by the code segment can be in a different 32-bit space from the data space pointed to by the data segment. Initialized data is part of the image. So Code Space plus Data Space together can be up to 4GBytes.

Building an image is a pre-load (link time) process. This means that all references within itself must be resolved and there are no lingering references that know anything about anything outside of its 32-bit address space. I don't think this includes DLL's because they don't link until needed (That's what's meant by Dynamic which is the 'D' in DLL). But this does not say that you can't do post-load time (i.e. run time) address calculations to reach more than that. This is RAM and it is uninitialized and it is outside the image. It will still show up as memory usage and this part of memory can become very huge. When a program like TG loads such things as texture maps and geometries, these are not part of its image. But they are part of your process's memory. So when you see program usage in the Task Manager, it doesn't break down usage by image/segment and dynamically allocated memory. It just shows you the total.


That sorta defeats the purpose of the logic, and again, what LAA does, making THAT process 4GB aware. Which I guess the 3GB (mentioned above) actually comes from a 32-bit process (and OS) with the /3GB switch enabled (https://docs.microsoft.com/en-us/previous-versions/tn-archive/bb124810(v=exchg.65)).

I did find this interesting Microsoft blog post by ASP.NET Debugging regarding processes. And I do see a limitation imposed on .NET applications.


"2800 MB if using a 4 GB process or more if more RAM (around 70% of RAM + Pagefile)"

Further going on to say

"Keep in mind that although a .NET process can grow this large, if the process is multiple GB in size, it can become very difficult for the Garage Collector to keep up with the memory as Generation 2 will become very large.  I'll talk about the generations more in an upcoming post."

https://blogs.msdn.microsoft.com/tom/2008/04/10/chat-question-memory-limits-for-32-bit-and-64-bit-processes/

I assume that 4 GB Process refers to a LAA process. It does, though, mention that it can go above this like a normal 64bit process if you have the RAM available. It does look like the offset for that is pretty crazy at 70% plus Pagefile to match (I already do this from the get-go for best Windows performance. Had many convos here about that).

In the end though, this seems to be a limitation of .NET, not AMD64/Intel64.

This is actually enlightening information about the stability of some .NET programs that grow in memory.

Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 02, 2019, 07:49:01 pm
I'm pretty much talking about hardware. There are software systems layered over the hardware for gaming and such that can manage additional memory beyond the 4GByte image. These software systems have their own switches, parameters and metrics to do their own work. A lot of this kind of thing is hidden in their software layers. But the bottom line is, ultimately, software can only do what the hardware can do. What is hidden in the software layers is out of sight. The IBM AS/400 got to be very good at hiding how it did things by this layering upon layers. There is nothing to prevent a program from loading additional executable code to many places in that huge 64-bit address space to make a program that goes far beyond a single image. But they do this at additional cost by making more calls to the OS and doing things like link-loading other files and using "manual" address calculations to get to those other places in memory. This kind of work-around is common in the Microsoft world. Remember memory extenders in DOS? It's very nasty stuff. It seems it's déjà vu all over again.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 02, 2019, 10:35:08 pm
Quote from: PabloMack on August 02, 2019, 07:49:01 pm
I'm pretty much talking about hardware. There are software systems layered over the hardware for gaming and such that can manage additional memory beyond the 4GByte image. These software systems have their own switches, parameters and metrics to do their own work. A lot of this kind of thing is hidden in their software layers. But the bottom line is, ultimately, software can only do what the hardware can do. What is hidden in the software layers is out of sight. The IBM AS/400 got to be very good at hiding how it did things by this layering upon layers. There is nothing to prevent a program from loading additional executable code to many places in that huge 64-bit address space to make a program that goes far beyond a single image. But they do this at additional cost by making more calls to the OS and doing things like link-loading other files and using "manual" address calculations to get to those other places in memory. This kind of work-around is common in the Microsoft world. Remember memory extenders in DOS? It's very nasty stuff. It seems it's déjà vu all over again.


Hardware is what I am talking about. The hardware switches for 3gb and 4gb allow SINGLE 3gb/4gb address calls.

Nowhere can I find that 64bit processes are limited to 2gb draws, for AMD64 or otherwise. The only limitation is in .NET. You mentioned PE32+, but it's an extension for Windows CE, a different breed from WinXP or otherwise.

The limitations you describe seem to fall in line with .NET memory limitations exclusively.

Also, you can explore sub processes of processes to see what is all linked. Expand the process in task manager or view it in process hacker.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 03, 2019, 11:37:57 am
Quote from: WASasquatch on August 02, 2019, 10:35:08 pm
Hardware is what I am talking about. The hardware switches for 3gb and 4gb allow SINGLE 3gb/4gb address calls.


The document you showed me was old and pre-AMD64. What you are talking about deals with how a 32-bit operating system can use a 32-bit address space (in a pre-AMD64 system) to map in system and user space. I believe the way both Windows and Linux work is that when the OS is in Supervisor Mode, it can see both user space and system space. This makes it easier to process requests because it can directly see what the user mode sees while simultaneously seeing its own resources.

In a system call, the user passes an address to the supervisor in a request for some service to be provided. The user will always pass an address that has meaning in its own user space because it doesn't have direct access to supervisor space. If a user tries to access the space that is mapped for use by the supervisor, then it will cause a fault and the process will be aborted. This is because the MMU will not allow the processor to make the access that is protected for system use only. But when in supervisor mode, the supervisor can make accesses to both its own space and user's space. In the first implementation of NT, 2GBytes of address space were used for mapping in system resources while the other 2GBytes were used to map in user memory. Later on when many user mode programs were needing access to more memory, the partitioning was changed to give more to the user and less for use by the operating system. In order to provide a user mode process more than 2GB of address space, a special setup was used where the system will make-do with only 1GByte for itself, leaving 3GBytes for use for user programs. This is not so much a "hardware switch". It is the way the operating system configures the MMU (which is hardware) for simultaneous use by the system and the user.
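
The two partitionings described above come down to simple arithmetic on a 32-bit address space. A small Python sketch, where the function name `describe_split` is invented for illustration:

```python
GIB = 2**30
ADDRESS_SPACE = 2**32  # everything a 32-bit pointer can name

def describe_split(user_bytes):
    """Return (last user address, first system address, system size in GiB)."""
    last_user = user_bytes - 1
    first_system = user_bytes
    system_gib = (ADDRESS_SPACE - user_bytes) // GIB
    return hex(last_user), hex(first_system), system_gib

default_split = describe_split(2 * GIB)   # 2 GB user / 2 GB system
three_gb_split = describe_split(3 * GIB)  # 3 GB user / 1 GB system
print(default_split)   # ('0x7fffffff', '0x80000000', 2)
print(three_gb_split)  # ('0xbfffffff', '0xc0000000', 1)
```

Either way the total stays at 4 GBytes. Only the boundary that the operating system programs into the MMU moves.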

Quote from: WASasquatch on August 02, 2019, 10:35:08 pm
No where can I find that 64bit processes are limited to 2gb draws, for AMD64 or otherwise. The only limitation is in .NET. like you mentioned PE32+  but it's a extension for Windows CE, a different breed to WinXP or otherwise.


Let me also say that "processes" are not hardware but are software entities created and managed by the operating system.

What do you mean by a "draw"? Also, I've never used .NET but it has to run on the same hardware as everything else. Same applies to Windows-CE. That is just software. They use the same processors as regular Windows. But it uses the hardware differently by configuring memory management differently.

Quote from: WASasquatch on August 02, 2019, 10:35:08 pm
Also, you can explore sub processes of processes to see what is all linked. Expand the process in task manager or view it in process hacker.


This may be the case. But still, if different sub-processes are running within the same "process" and together they can grow to be very much larger than 4GB, then in effect, each sub-process has its own "image" so we are talking about multiple images now and multiple 4GB images can reside within the same 64-bit address space at the same time. This is something that Windows-32 or Linux-32 could not do because all they had was a 32-bit address space. But still, an instruction that is executing within one of the images can't directly reach a resource that is located within another image using one of the standard addressing modes that is defined in the Mod/RM byte of the instruction. The offsets are only 8 or 32-bits and there are no 64-bit offsets in any of the addressing modes. But you can load a 64-bit value into a register and then use it as an address but this is not a 64-bit addressing mode.
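
To make the distinction concrete, here is a hedged Python sketch that emits machine code for "load the 8 bytes at `target` into RAX". If the target is within a signed 32-bit displacement of the next instruction, a single RIP-relative `mov` suffices. Otherwise it falls back to loading the full 64-bit address into a register and dereferencing that. The function name `load_qword` and the addresses are made up, and the encodings are the ones I believe to be correct for these two forms.

```python
import struct

def load_qword(instr_addr, target):
    """Machine code that loads the 8 bytes at `target` into RAX.

    Hypothetical helper for illustration only.
    """
    disp = target - (instr_addr + 7)  # measured from the next instruction
    if -(2**31) <= disp <= 2**31 - 1:
        # mov rax, [rip+disp32]  ->  48 8B 05 <disp32>
        return b"\x48\x8b\x05" + struct.pack("<i", disp)
    # mov rax, imm64 ; mov rax, [rax]  ->  48 B8 <imm64> ; 48 8B 00
    return b"\x48\xb8" + struct.pack("<Q", target) + b"\x48\x8b\x00"

near = load_qword(0x140001000, 0x140002000)     # same image: 7 bytes
far = load_qword(0x140001000, 0x7FF000000000)   # distant image: 13 bytes
print(near.hex())
print(far.hex())
```

The second form works anywhere in the address space, but the address is now data computed at run time rather than an offset the linker resolved inside one addressing mode.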

It would be like one guy says "You can't kill anyone with an unloaded gun". Then to prove that he can, he takes the empty gun and hits someone over the head with it and kills him. So you can kill someone with an empty gun. You just can't shoot someone with it to kill him. These conversations are all about how you say things and what you mean by saying them. I think in some ways we are defining our words differently. For example, the kind of "linking" you are talking about may be different than what I meant by the word. Sometimes in computer science, the same words are used for very different things. When they talk about "linked-lists" they are not talking about what a "linker" does. These are talking about completely different things.

So it is no wonder that we sometimes disagree because we don't realize we are talking about different things in the first place or define our words to mean different things. To me a "hardware switch" is a flip-flop where you write a "1" into it and the way it controls other hardware is changed. To you, it might be like a command-line "switch" using terminology used in command line interpreters that somehow configure things behind the scenes so that the software works differently somehow.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 03, 2019, 12:36:24 pm
Quote from: PabloMack on August 03, 2019, 11:37:57 am
Quote from: WASasquatch on August 02, 2019, 10:35:08 pm
Hardware is what I am talking about. The hardware switches for 3gb and 4gb allow SINGLE 3gb/4gb address calls.


The document you showed me was old and pre-AMD64. What you are talking about deals with how a 32-bit operating system can use a 32-bit address space (in a pre-AMD64 system) to map in system and user space. I believe the way both Windows and Linux work is that when the OS is in Supervisor Mode, it can see both user space and system space. This makes it easier to process requests because it can directly see what the user mode sees while simultaneously seeing its own resources.

In a system call, the user passes an address to the supervisor in a request for some service to be provided. The user will always pass an address that has meaning in its own user space because it doesn't have direct access to supervisor space. If a user tries to access the space that is mapped for use by the supervisor, then it will cause a fault and the process will be aborted. This is because the MMU will not allow the processor to make the access that is protected for system use only. But when in supervisor mode, the supervisor can make accesses to both its own space and user's space. In the first implementation of NT, 2GBytes of address space were used for mapping in system resources while the other 2GBytes were used to map in user memory. Later on when many user mode programs were needing access to more memory, the partitioning was changed to give more to the user and less for use by the operating system. In order to provide a user mode process more than 2GB of address space, a special setup was used where the system will make-do with only 1GByte for itself, leaving 3GBytes for use for user programs. This is not so much a "hardware switch". It is the way the operating system configures the MMU (which is hardware) for simultaneous use by the system and the user.

Quote from: WASasquatch on August 02, 2019, 10:35:08 pm
No where can I find that 64bit processes are limited to 2gb draws, for AMD64 or otherwise. The only limitation is in .NET. like you mentioned PE32+  but it's a extension for Windows CE, a different breed to WinXP or otherwise.


Let me also say that "processes" are not hardware but are software entities created and managed by the operating system.

What do you mean by a "draw"? Also, I've never used .NET but it has to run on the same hardware as everything else. Same applies to Windows-CE. That is just software. They use the same processors as regular Windows. But it uses the hardware differently by configuring memory management differently.

Quote from: WASasquatch on August 02, 2019, 10:35:08 pm
Also, you can explore sub processes of processes to see what is all linked. Expand the process in task manager or view it in process hacker.


This may be the case. But still, if different sub-processes are running within the same "process" and together they can grow to be very much larger than 4GB, then in effect, each sub-process has its own "image" so we are talking about multiple images now and multiple 4GB images can reside within the same 64-bit address space at the same time. This is something that Windows-32 or Linux-32 could not do because all they had was a 32-bit address space. But still, an instruction that is executing within one of the images can't directly reach a resource that is located within another image using one of the standard addressing modes that is defined in the Mod/RM byte of the instruction. The offsets are only 8 or 32-bits and there are no 64-bit offsets in any of the addressing modes. But you can load a 64-bit value into a register and then use it as an address but this is not a 64-bit addressing mode.

It would be like one guy says "You can't kill anyone with an unloaded gun". Then to prove that he can, he takes the empty gun and hits someone over the head with it and kills him. So you can kill someone with an empty gun. You just can't shoot someone with it to kill him. These conversations are all about how you say things and what you mean by saying them. I think in some ways we are defining our words differently. For example, the kind of "linking" you are talking about may be different than what I meant by the word. Sometimes in computer science, the same words are used for very different things. When they talk about "linked-lists" they are not talking about what a "linker" does. These are talking about completely different things.

So it is no wonder that we sometimes disagree because we don't realize we are talking about different things in the first place or define our words to mean different things. To me a "hardware switch" is a flip-flop where you write a "1" into it and the way it controls other hardware is changed. To you, it might be like a command-line "switch" using terminology used in command line interpreters that somehow configure things behind the scenes so that the software works differently somehow.


Uhmm. AMD64 was released 4 years earlier in 2004 (which is why the article is about 64bit and 32bit OSes; and R&D finished in 2001). I've had AMD64 since 2004 (with the new 2005 model HP Media Center, as the AMD64 allowed unrivaled performance at the time). I'm not an Intel guy. Haven't had one I wanted since my Toshiba in '97 with 233 MHz MMX. The other article is maintained, as of 2014.

And yes, processes are not hardware, but their hardware-specific flags are for specific use of hardware functionality...

Can you provide any concrete evidence of this limitation outside of .NET software?
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 03, 2019, 01:54:37 pm
Quote from: WASasquatch on August 03, 2019, 12:36:24 pm
Can you provide any concrete evidence of this limitation outside of .NET software?


This is taken from the AMD64 specification, Volume 3, in the discussion of the MOV instruction,
which is the only instruction that can even do a 64-bit access. And these instructions can only
do them using register RAX. So any memory access using any of the other instructions or any
other register is limited to a 32-bit reach.

This is the quote:
"Opcodes A0-A3, in 64-bit mode, are the only cases that support a 64-bit offset value.
(In all other cases, offsets and displacements are a maximum of 32 bits.) The B8 through
BF (B8 +rq) opcodes, in 64-bit mode, are the only cases that support a 64-bit immediate value
(in all other cases, immediate values are a maximum of 32 bits)."

The reason why code with only 32-bit offsets can operate anywhere within 64-bit address
space is because these addresses are added to the bases of the segments that are associated
with the registers doing the access. These 64-bit bases are managed by the operating
system and are not directly handled by the application.

The following are the instruction encodings as taken from the same section about MOV in volume 3.
Keep in mind that they don't all apply to 64-bit mode:

Instruction             Opcode  Description
MOV AL, moffset8        A0      Move 8-bit data at a specified memory offset to the AL register.
MOV AX, moffset16       A1      Move 16-bit data at a specified memory offset to the AX register.
MOV EAX, moffset32      A1      Move 32-bit data at a specified memory offset to the EAX register.
MOV RAX, moffset64      A1      Move 64-bit data at a specified memory offset to the RAX register.
MOV moffset8, AL        A2      Move the contents of the AL register to an 8-bit memory offset.
MOV moffset16, AX       A3      Move the contents of the AX register to a 16-bit memory offset.
MOV moffset32, EAX      A3      Move the contents of the EAX register to a 32-bit memory offset.
MOV moffset64, RAX      A3      Move the contents of the RAX register to a 64-bit memory offset.
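
For anyone who wants to see what the two 64-bit rows look like as bytes, here is a small Python sketch. It assumes the usual 64-bit-mode encoding of a REX.W prefix (`48`), then the opcode (`A1` or `A3`), then the 8-byte offset in little-endian order. The function names and the address are made up for the example.

```python
import struct

REX_W = b"\x48"  # selects the 64-bit operand size in 64-bit mode

def mov_rax_from_moffset64(addr):
    """Encode MOV RAX, moffset64 (opcode A1 followed by an 8-byte offset)."""
    return REX_W + b"\xa1" + struct.pack("<Q", addr)

def mov_moffset64_from_rax(addr):
    """Encode MOV moffset64, RAX (opcode A3 followed by an 8-byte offset)."""
    return REX_W + b"\xa3" + struct.pack("<Q", addr)

load = mov_rax_from_moffset64(0x1122334455667788)
store = mov_moffset64_from_rax(0x1122334455667788)
print(load.hex())   # 48a18877665544332211
print(store.hex())  # 48a38877665544332211
```

Each instruction comes to 10 bytes, most of which is the absolute address itself.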
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 03, 2019, 03:45:31 pm
And how does this relate to AMD64 2GB limit of address space? I'm still confused what you're explicitly talking about. What you started this topic about doesn't seem to be an issue anywhere in development space that I can find besides .NET, which is not an AMD64 related issue.

The problem you introduce seems to nullify the point of LAA

For reference, here is Volume 3 (pp. 226-227): https://www.amd.com/system/files/TechDocs/24594.pdf
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 03, 2019, 09:11:14 pm
Quote from: WASasquatch on August 03, 2019, 03:45:31 pm
And how does this relate to AMD64 2GB limit of address space?

Are you familiar with WOW, what it means and what it is for? (and I don't mean World of Warcraft)
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 03, 2019, 10:01:04 pm
Quote from: PabloMack on August 03, 2019, 09:11:14 pm
Quote from: WASasquatch on August 03, 2019, 03:45:31 pm
And how does this relate to AMD64 2GB limit of address space?

Are you familiar with WOW, what it means and what it is for? (and I don't mean World of Warcraft)


You mean WoW64? I don't know of any WOW. I don't follow. WoW64 is a layer for 32bit processes on a 64bit OS.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 04, 2019, 08:31:07 am
Quote from: WASasquatch on August 03, 2019, 10:01:04 pm
You mean WoW64? I don't know of any WOW. I don't follow. WoW64 is layer for 32bit processes on a 64bit OS.

Yes. That's what I mean. Thank you for the correction. This is probably something you already understand. It is a way for a 64-bit OS to provide an environment that appears to be identical to the only one that a 32-bit OS can provide for a program to run. A 32-bit process can't see outside of this address space because that is all there is. The supervisor that manages such a process, though, runs in a 64-bit mode so that, from the perspective of the running process, it appears to be the same as though it were using a 32-bit OS.

When the same program is compiled and linked to run in a 64-bit address space, its segment bases can place it anywhere within a much larger address space. According to the video that I posted earlier about OpenVMS, only 48 bits of the 64 bits are supported. But this is still a much larger address space than the 32-bit environment provides. And even though it is a 64-bit process, the instruction set of the AMD64 is still limited to the same set of addressing modes, though there are a very few more. But what is available to one that is running in 64-bit mode is not enough to make it a full 64-bit program within a 64-bit process. The standard addressing modes available to any program using the AMD64 instruction set still limit it to only see within the 32-bit window where it resides.

It is like taking your house that is on the ground floor, and hoisting it up into a high-rise so that it is turned into a condominium. Many of such houses can now share a large address space. When you walk around your own house, it is still the same size, even though it is no longer sitting on the ground by itself. But just because your house is sharing a larger address space with other houses, it doesn't make your house any larger on its own. Any one person within his house still only has the same reach when using the addressing that can be used by the linker. Beyond this, though, there are a few computational tools to reach anywhere within the full 64-bit address space. But this must be done after it is linked and loaded (i.e. during run-time). It would be like a guy who can go visit someone else's condominium by using the stair wells. When he does, he can no longer reach the things that are in his house because he didn't grow longer arms. But he can now reach the things in the neighbor's house. With a true 64-bit program, the guy would be able to reach the kitchen cupboards in his neighbor's house to borrow a cup of sugar without leaving his own condominium.
The whole condo high-rise is like the 64-bit process, but the size of his house and length of his arms are like the 32-bit program running inside it.
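
To put rough numbers on the analogy, here is a minimal Python sketch of the arithmetic. It assumes the "arm length" is the sign-extended 32-bit displacement that most AMD64 addressing modes encode, and that the "high-rise" is the 48-bit virtual address space mentioned above. The helper name `signed_range` is made up for illustration.

```python
# Illustrative arithmetic only; the names below are not taken from the AMD64 manual.
GIB = 2**30
TIB = 2**40

def signed_range(bits):
    """Return (lowest, highest) value of a two's-complement field of `bits` bits."""
    return -(2**(bits - 1)), 2**(bits - 1) - 1

# Assumed "arm length": a 32-bit signed displacement.
disp32_low, disp32_high = signed_range(32)

# Assumed "high-rise": 48 implemented virtual-address bits.
virtual_bytes = 2**48

print(f"disp32 reach in bytes: {disp32_low} .. {disp32_high}")
print(f"48-bit virtual address space: {virtual_bytes // TIB} TiB")
print(f"2 GiB windows that fit in it: {virtual_bytes // (2 * GIB)}")
```

This only shows the size of the window relative to the whole space. It says nothing about addresses computed at run time in a 64-bit register, which are the "computational tools" described above.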

So I guess I am saying that, on an AMD64/Intel64 machine, there is no such thing as a 64-bit program, only 32-bit programs that run as part of a 64-bit process. Can you accept that statement?
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 12:42:29 pm
I'm still confused. Your explanations seem reasonable, but they fall apart outside the forum, where none of these issues are even recorded. I still cannot find anything on this limitation, and a single 64bit process (program) can address 4GB of address space (which, btw, is virtual before physical, which is why it's crucial to really match your physical, despite arguments here) on AMD64/Intel64. This is just how it is, so there must be something you're not understanding correctly.

Again, the LAA hack, even for 32bit programs, allows this same SINGLE process to use up to 4GB of addressable VIRTUAL memory (RAM cache is different).

The only thing I found regarding a limitation on size due to "48bits" is PAE and 32bit processes. PAE is riddled with problems and should be avoided and buried.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 04, 2019, 02:49:28 pm
WASasquatch, I mean no disrespect. If you were an avid AMD64 programmer at the assembly level, you could look at what I showed you in the AMD64 specification and you would easily come to the same conclusion I did. It is like a mathematician who doesn't need to ask someone else for an equation or find it in a table in a book somewhere. He knows enough to derive the equation from scratch by knowing the rules. Almost nothing happens in Windows or Linux without the CPU executing instructions. And so it is by understanding the instructions and their addressing modes you can surmise their limitations. I doubt that many people on gaming forums understand enough to do this for themselves. They need to search the web to see how many people conclude about some issue by consensus. This is not understanding.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 02:56:14 pm
Quote from: PabloMack on August 04, 2019, 02:49:28 pm
WASasquatch, I mean no disrespect. If you were an avid AMD64 programmer at the assembly level, you could look at what I showed you in the AMD64 specification and you would easily come to the same conclusion I did. It is like a mathematician who doesn't need to ask someone else for an equation or find it in a table in a book somewhere. He knows enough to derive the equation from scratch by knowing the rules. Almost nothing happens in Windows or Linux without the CPU executing instructions. And so it is by understanding the instructions and their addressing modes you can surmise their limitations. I doubt that many people on gaming forums understand enough to do this for themselves. They need to search the web to see how many people conclude about some issue by consensus. This is not understanding.


I mean no disrespect either, and I think you're misunderstanding the specifications in some fashion, as this isn't discussed ANYWHERE else, and you'd think programmers would be talking about it when building their software to run universally on 64bit [Windows] systems.

Beyond your posts, now indexed on Google, I can't find anything from developers.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 03:03:53 pm
Which brings us back to your original post and Terragen and stuff, and how it really relates.

Also still confused by calling AMD64 64bit really 64 when it's kinda common knowledge AMD64 is a true specification over Intel, where it's an adaptation spec.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 04, 2019, 03:34:56 pm
Quote from: WASasquatch on August 04, 2019, 03:03:53 pm
Also still confused by calling AMD64 64bit really 64 when it's kinda common knowledge AMD64 is a true specification over Intel, where it's an adaptation spec.

Can you rephrase this for me? It is not clear what you mean.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 03:41:12 pm
You said AMD64 is not really 64bit, but a hack, but AMD64, through R&D, has always been considered a true implementation of the 64bit specification, whereas Intel modified NetBurst as an adaptation to the market. For example, like I mentioned earlier, PAE with 64bit for 32bit processes is a mess from Intel.

Additionally, the specifications you linked in Volume 3 shows 3 dedicated 64bit ops? with the exception of the others.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 03:56:08 pm
I'm curious, you're an assembly developer? Couldn't you just demonstrate this 2GB limitation with some 4GB asset in a single process calling for a 4GB call and attempting to cache it safely in RAM?
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 04, 2019, 03:58:12 pm
I never used the word "hack". It's not like me.

Take a look at the following, which is taken directly out of Microsoft's PE32/PE32+ specification. They extended PE32 so that programs could be run in a 64-bit address space. Notice that on the line "SizeOfCode", the size is 4 and is unchanged from PE32. So tell me how many bits are in 4 bytes and what the address range (i.e. "limit") would be for an offset of this size.
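
For anyone who wants to check the field width themselves, here is a hedged Python sketch. It assumes the first four fields of the PE optional header are Magic (2 bytes), MajorLinkerVersion (1), MinorLinkerVersion (1) and SizeOfCode (4), little-endian, with 0x20B as the PE32+ magic. The sample bytes are synthetic, not read from a real EXE, and the linker version and code size values are arbitrary.

```python
import struct

# Assumed layout of the start of the optional header (little-endian, no padding):
# Magic (H), MajorLinkerVersion (B), MinorLinkerVersion (B), SizeOfCode (I)
HEADER_PREFIX = struct.Struct("<HBBI")
PE32_PLUS_MAGIC = 0x20B

# Synthetic example header bytes; the values 14, 0 and 0x00300000 are arbitrary.
sample = HEADER_PREFIX.pack(PE32_PLUS_MAGIC, 14, 0, 0x00300000)

magic, major, minor, size_of_code = HEADER_PREFIX.unpack(sample)

field_bytes = struct.calcsize("<I")     # width of the SizeOfCode field in bytes
field_bits = field_bytes * 8            # same width in bits
max_size_of_code = 2**field_bits - 1    # largest unsigned value the field can hold

print(f"magic=0x{magic:X} SizeOfCode=0x{size_of_code:X}")
print(f"field width: {field_bytes} bytes = {field_bits} bits")
print(f"largest value: {max_size_of_code} bytes")
```

The sketch treats the field as unsigned. If a tool reads the same 4 bytes as a signed offset, the positive range is halved.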
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 04:20:07 pm
I'll reiterate some past posts.


For example, like I mentioned before, with Morrowind, the LAA extension allows 4K gameplay with hundreds of mods, making calls above 2GB, which is the vanilla limitation. So if these calls were limited to 2GB on a single process, it wouldn't really matter and Morrowind would be fine with all mods and calls going through virtual memory -- but if you run Morrowind without the executable modified with the flag, it will crash due to calls larger than it can handle.

Which is a great example of the same process running, even 32bit, Large Address Aware, able to bypass this as a single process.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 04, 2019, 04:24:22 pm
Just answer my questions and think for yourself.

1. How many bits are in 4 bytes?
2. What would be the address range (i.e. "limit") for an offset of this size?

Title: Re: Address space Limitations of the AMD64/Intel64
Post by: WAS on August 04, 2019, 04:29:48 pm
Quote from: PabloMack on August 04, 2019, 04:24:22 pm
Just answer my questions and think for yourself.

1. How many bits are in 4 bytes?
2. What would be the address range (i.e. "limit") for an offset of this size?



Again, I'm not really interested in specifications you may be interpreting poorly, and that mean nothing to me, but actual working code (literally for over a decade in a lot of instances) that makes these single process calls, and these single process calls even documented in specification too.

You came into the topic entirely misunderstanding Windows Task Manager, for one, which is very disconcerting, and have yet to even explain the ramifications clearly, let alone demonstrate it as an assembly programmer (?). There is plenty you can use to demonstrate it, and you can easily fabricate a dummy asset to use with Windows/Linux in a single command. I'm sure you could practically demonstrate a real world 2GB limitation over the established 3GB/4GB extensions (and prove it without a doubt).

At this point I'm not really even interested in the conversation.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 04, 2019, 04:57:15 pm
I don't know why you won't answer two very simple questions so I'm not interested in the conversation either.

What makes a programmer and an end user fundamentally different is that a programmer has to be able to understand what does not yet exist. In order to do this, he has to understand what the capabilities are for the tools he is going to use to build it. So specs are a lot more important to me than to you because without them, I am blind and what I want to make will never be made. I can't just try this and try that. I will never get very far if I depend on that. As an end user, seeing a working program is believing. The proof that something can be done is seeing someone who has already done it. But you will not understand how it was done so you just have to guess, take someone's word for it or try to get a consensus among all of the end user observers.  That kind of evidence is worth very little to me so that's why I have just brushed it off.

In your Terragen work, you don't just copy what someone else has done. You want to make images of things that have never been seen before. At least you can see that much.

On a positive note, I have learned a couple of things in all of this. In the past I have used Task Manager mostly for killing programs that were hung in infinite loops and for other reasons, like seeing what is running. But now I realize that the Memory (Private Working Set) figure accounts for all of the memory used by the process, not just the image. I would say that "entirely misunderstanding Windows Task Manager" is an overstatement.

I provided you with the "evidence" that you asked for, and you then said you were not interested in it. It took me a while to realize that evidence was not what you wanted. You just wanted corroboration and I can't corroborate my own findings. We do define our words differently.
Title: Re: Address space Limitations of the AMD64/Intel64
Post by: PabloMack on August 07, 2019, 05:24:49 pm
Quote from: WASasquatch on August 04, 2019, 03:56:08 pm
I'm curious, you're an assembly developer? Couldn't you just demonstrate this 2GB limitation with some 4GB asset in a single process calling for a 4GB call and attempting to cache it safely in RAM?
The program you suggest won't even get far enough to run because the image is so large that it will fail to link. But, just for grins, I decided to do a test just for you.  I created a test assembly program and put in 1000 instances of a line that creates 4MB of data (each):

      jmp Here
      db 1024*1024*4 dup('w') ;1000 times
      ...
Here:

The program ostensibly tries to jump over this large block to make sure that the offset will be a span that is that large. But just as I suspected, it failed to link because the image is too large. So it will not run because it will not load and it will not load because it will not link.
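
The sizes in that test can be checked with a few lines of arithmetic. This is only a sketch in Python: it assumes the `jmp` is encoded with a signed 32-bit relative offset (rel32) and compares the span against that and against an unsigned 4-byte size field. It does not reproduce the linker's actual check.

```python
GIB = 2**30

block_bytes = 1024 * 1024 * 4      # one "db 1024*1024*4 dup('w')" line
block_count = 1000                 # number of such lines in the test
span = block_bytes * block_count   # bytes the jmp has to skip over

rel32_max = 2**31 - 1              # largest forward offset of a signed 32-bit field (assumed jmp encoding)
u32_max = 2**32 - 1                # largest value of an unsigned 4-byte size field

print(f"span: {span} bytes ({span / GIB:.2f} GiB)")
print("fits in a signed 32-bit offset:", span <= rel32_max)
print("fits in an unsigned 4-byte field:", span <= u32_max)
```

Under these assumptions the span is well past the signed 32-bit reach, so a jump over the whole block cannot be encoded as a single relative offset, whatever else the linker objects to.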