struct_implementation

Goal

Store struct member data in hardware RAM modules.

For example, in Dhrystone with the union removed: %struct.record = type { %struct.record *, i32, i32, i32, [31 x i8], i32, [31 x i8], i8, i8 }

Contains: i8, i32, arrays of i8 and a struct pointer (treated as i32). The members total 84 bytes, not counting alignment padding.

Alternatives

1. Smallest member addressable RAM

84 x i8 (84 bytes)

Pros:

  • Writing does not require a read

Cons:

  • Reading or writing an i32 takes 4 cycles (one per byte)
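
The trade-off for this alternative can be illustrated with a small software model. This is only a sketch: `ByteRam` and its cycle counter are hypothetical names, and the model assumes a single-port RAM that performs one access per cycle and stores multi-byte values in little-endian order.

```python
class ByteRam:
    """Hypothetical model of a byte-addressable, single-port RAM."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.cycles = 0  # assumption: one RAM access costs one cycle

    def write8(self, addr, value):
        self.mem[addr] = value & 0xFF
        self.cycles += 1

    def read8(self, addr):
        self.cycles += 1
        return self.mem[addr]

    def write32(self, addr, value):
        # An i32 is split into four byte writes (little-endian assumed).
        for i in range(4):
            self.write8(addr + i, value >> (8 * i))

    def read32(self, addr):
        # An i32 is assembled from four byte reads.
        return sum(self.read8(addr + i) << (8 * i) for i in range(4))


ram = ByteRam(84)
ram.write8(83, 0x61)        # i8 store: one cycle, no prior read
ram.write32(4, 0x12345678)  # i32 store: four cycles
value = ram.read32(4)       # i32 load: four cycles
```

In this model the byte store never touches neighbouring data, while every i32 access costs four RAM accesses.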

2. Largest member addressable RAM

21 x i32 (84 bytes)

Pros:

  • Reading any member takes one cycle

Cons:

  • Writing an i8 requires reading the word first (read-modify-write) so that the other bytes in the word are not overwritten
  • Extra space may be required
  • May read unnecessary data, but this data may be used as a cache
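
The read-before-write cost for i8 can be sketched as follows. `WordRam` is a hypothetical model, and it assumes a 32-bit-wide RAM without byte-enable signals and little-endian byte order within a word; a RAM with byte enables would not need the extra read.

```python
class WordRam:
    """Hypothetical model of a 32-bit-wide RAM without byte enables."""

    def __init__(self, words):
        self.mem = [0] * words
        self.reads = 0
        self.writes = 0

    def read_word(self, index):
        self.reads += 1
        return self.mem[index]

    def write_word(self, index, value):
        self.writes += 1
        self.mem[index] = value & 0xFFFFFFFF

    def read8(self, byte_addr):
        # One word read, then select the byte (little-endian assumed).
        word = self.read_word(byte_addr // 4)
        return (word >> (8 * (byte_addr % 4))) & 0xFF

    def write8(self, byte_addr, value):
        # Read-modify-write: fetch the word, replace one byte, write back.
        shift = 8 * (byte_addr % 4)
        old = self.read_word(byte_addr // 4)
        new = (old & ~(0xFF << shift)) | ((value & 0xFF) << shift)
        self.write_word(byte_addr // 4, new)


ram = WordRam(21)
ram.write_word(20, 0xAABBCCDD)  # i32 store: one write, no read
ram.write8(83, 0x61)            # i8 store: one read plus one write
```

After the byte store, word 20 holds 0x61BBCCDD: only the top byte changed, at the cost of one additional read.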

3. Entirely addressable RAM

1 x i672 (84 bytes)

Pros:

  • Struct copying is fast

Cons:

  • Writing a single member always requires reading the whole struct first so that the other members are not overwritten

4. Split into groups by addressability

5 x i32, 64 x i8 (84 bytes)

Pros:

  • Reading and writing any member takes one cycle
  • Different blocks can be read/written in parallel

Cons:

  • Code complexity
    • Need to keep track of each member's RAM module and offset
    • Must manage each block separately
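
The bookkeeping this alternative needs can be sketched as a table from each member to a RAM and an offset within it. The function name and the tuple encoding of the struct are hypothetical; the struct pointer is treated as an i32, as in the Goal section.

```python
# Members of %struct.record as (element_bits, element_count).
RECORD = [(32, 1), (32, 1), (32, 1), (32, 1), (8, 31),
          (32, 1), (8, 31), (8, 1), (8, 1)]


def group_by_width(members):
    """Return a (ram_width, offset) pair per member and the depth of each RAM."""
    depth = {}      # RAM width in bits -> number of elements allocated so far
    placement = []  # one (ram_width, offset) entry per member, in order
    for bits, count in members:
        offset = depth.get(bits, 0)
        placement.append((bits, offset))
        depth[bits] = offset + count
    return placement, depth


placement, depth = group_by_width(RECORD)
# depth     -> {32: 5, 8: 64}, matching "5 x i32, 64 x i8"
# placement -> [(32, 0), (32, 1), (32, 2), (32, 3), (8, 0),
#               (32, 4), (8, 31), (8, 62), (8, 63)]
```

Each member access then needs a lookup in `placement`, which is the per-member tracking listed under Cons.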

5. One RAM per member

i32, i32, i32, i32, i32, 31xi8, 31xi8, i8, i8 (84 bytes)

Pros:

  • Reading and writing any member takes one cycle
  • Simple to code

Cons:

  • Too many RAMs, allocation restrictions

6. Word-addressable RAM

21 x i32 (84 bytes)

Pros:

  • Structs are already word-aligned, so this makes memcpy, memset much easier
  • Easy to implement

Cons:

  • May need to read before writing (for members smaller than a word)
  • May waste space

7. Word-addressable RAM x2 (if word-address % 2 = 0, store in ram 1, otherwise store in ram 2)

11 x i32 + 10 x i32 (84 bytes)

Pros:

  • Structs are already word-aligned, so this makes memcpy, memset easier
  • memcpy and memset can be performed at twice the speed, copying to both rams in the same cycle
  • With two RAMs, i64 can be read/written in one cycle

Cons:

  • Pointer dereferencing is more complicated for i64, since it must load/store from both RAMs at the same time
  • For other types, pointer dereferencing must pick which RAM to read/write
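
The bank selection for this alternative can be sketched as below. `bank_and_row` and `DualWordRam` are hypothetical names; bank 0 stands for "ram 1" and bank 1 for "ram 2", and an i64 is assumed to be stored low word first.

```python
def bank_and_row(byte_addr):
    """Map a word-aligned byte address to (bank, row)."""
    word = byte_addr // 4
    return word % 2, word // 2  # even words -> bank 0, odd words -> bank 1


class DualWordRam:
    """Hypothetical model of two word-addressable RAMs, interleaved by word."""

    def __init__(self, words):
        # Even word addresses go to bank 0, odd ones to bank 1.
        self.banks = [[0] * ((words + 1) // 2), [0] * (words // 2)]

    def store32(self, byte_addr, value):
        bank, row = bank_and_row(byte_addr)
        self.banks[bank][row] = value & 0xFFFFFFFF

    def load32(self, byte_addr):
        bank, row = bank_and_row(byte_addr)
        return self.banks[bank][row]

    def store64(self, byte_addr, value):
        # Consecutive words always fall in different banks, so in hardware
        # both halves could be written in the same cycle.
        self.store32(byte_addr, value)
        self.store32(byte_addr + 4, value >> 32)

    def load64(self, byte_addr):
        return self.load32(byte_addr) | (self.load32(byte_addr + 4) << 32)


ram = DualWordRam(21)               # 11 words in bank 0, 10 in bank 1
ram.store64(8, 0x1122334455667788)  # word 2 -> bank 0, word 3 -> bank 1
```

The same property is what would let memcpy and memset process two words per cycle: any two adjacent words live in different banks.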

8. Smallest member addressable RAM x (largest member size / smallest member size)

21 x i8, 21 x i8, 21 x i8, 21 x i8 (84 bytes)

Pros:

  • All data types can be read/written in one cycle
  • Similar storage geometry to default word alignment
  • memcpy and memset can be performed faster, depending on the number of RAMs

Cons:

  • memcpy and memset will have to be modified to work
  • Pointer dereferencing must pick which RAMs to read/write
  • Uses more RAMs
  • More complex to program
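
A minimal sketch of this layout, assuming four byte-wide RAMs ("lanes") selected by the low two address bits and little-endian byte order. `LaneRam` is a hypothetical name, and the loops stand in for accesses that hardware would issue to all lanes in parallel.

```python
LANES = 4  # largest member size / smallest member size = 4 bytes / 1 byte


class LaneRam:
    """Hypothetical model of LANES byte-wide RAMs, interleaved by byte address."""

    def __init__(self, size_bytes):
        rows = (size_bytes + LANES - 1) // LANES
        self.lanes = [bytearray(rows) for _ in range(LANES)]

    def store(self, byte_addr, value, nbytes):
        # Each byte of the value lands in a different lane, so up to LANES
        # bytes could be written in a single cycle.
        for i in range(nbytes):
            addr = byte_addr + i
            self.lanes[addr % LANES][addr // LANES] = (value >> (8 * i)) & 0xFF

    def load(self, byte_addr, nbytes):
        value = 0
        for i in range(nbytes):
            addr = byte_addr + i
            value |= self.lanes[addr % LANES][addr // LANES] << (8 * i)
        return value


ram = LaneRam(84)            # four lanes of 21 bytes each
ram.store(4, 0x12345678, 4)  # i32: one byte per lane, row 1
ram.store(83, 0x61, 1)       # i8: lane 3, row 20, no read needed
```

An i8 store touches only its own lane, so no read-modify-write is needed, while an aligned i32 uses the same row in all four lanes.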

Other

Arrays of Structs:

  • Treat like multi-dimensional arrays

Arrays of Structs, Structs of Arrays, Structs of Structs…:

  • Must be done recursively or with a stack
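
A recursive traversal of nested aggregates might look like the sketch below. The type encoding (a string for a scalar, a tuple for an array or struct) is a hypothetical representation chosen for illustration, not the one used by LLVM.

```python
def flatten(ty):
    """Yield the scalar leaves of a nested type in memory order.

    A type is a scalar name such as "i8", ("array", count, element_type),
    or ("struct", [member_types]).
    """
    if isinstance(ty, str):
        yield ty
    elif ty[0] == "array":
        _, count, elem = ty
        for _ in range(count):
            yield from flatten(elem)
    elif ty[0] == "struct":
        for member in ty[1]:
            yield from flatten(member)
    else:
        raise ValueError("unknown type: %r" % (ty,))


inner = ("struct", ["i8", "i32"])
outer = ("struct", [("array", 2, inner), ("array", 3, "i8")])
leaves = list(flatten(outer))
# leaves -> ["i8", "i32", "i8", "i32", "i8", "i8", "i8"]
```

The same traversal could be written with an explicit stack instead of recursion.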

Struct Alignment:

  • Structs are declared as align 8 by llvm-gcc, but appear to be aligned to 4 bytes internally, so a char and an int together take up 8 bytes (3 unused). The sequence char, int, char takes up 12 bytes, whereas char, char, int takes up 8.
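
The three sizes above can be reproduced with Python's ctypes module, which follows the host C compiler's layout rules. This is a check against the host ABI rather than llvm-gcc itself, and it assumes the host aligns a 32-bit int to 4 bytes; the class names are hypothetical.

```python
import ctypes


class CharInt(ctypes.Structure):
    _fields_ = [("c", ctypes.c_char), ("i", ctypes.c_int32)]


class CharIntChar(ctypes.Structure):
    _fields_ = [("c1", ctypes.c_char), ("i", ctypes.c_int32),
                ("c2", ctypes.c_char)]


class CharCharInt(ctypes.Structure):
    _fields_ = [("c1", ctypes.c_char), ("c2", ctypes.c_char),
                ("i", ctypes.c_int32)]


sizes = {
    "char, int": ctypes.sizeof(CharInt),
    "char, int, char": ctypes.sizeof(CharIntChar),
    "char, char, int": ctypes.sizeof(CharCharInt),
}
# Offsets show where the padding goes in the 12-byte case.
offsets = (CharIntChar.c1.offset, CharIntChar.i.offset, CharIntChar.c2.offset)
```

In the 12-byte case the int is pushed to offset 4 and the trailing char is followed by 3 bytes of padding, so the struct size stays a multiple of 4.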

Instructions To Implement

alloca: alloca %struct.conglomerate, align 8

getelementptr: getelementptr inbounds %struct.conglomerate* %r, i32 0, i32 3

memcpy: call void @llvm.memcpy.i32(i8* %r22, i8* %r1, i32 124, i32 8)

memset: call void @llvm.memset.i32(i8* %r1, i8 0, i32 124, i32 8)

load: load i32* %5, align 4

store: store i8 97, i8* %r1, align 8

bitcast: bitcast %struct.record* %1 to i8*

ptrtoint: %78 = ptrtoint %struct.record* %77 to i32

alloca

Find the space required for the struct and determine its alignment size
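
One way to compute the space and alignment is sketched below. `layout` is a hypothetical helper that assumes each member is aligned to its element size, capped at 4 bytes (the internal alignment noted under Struct Alignment), and that the total size is rounded up to the struct's alignment.

```python
def layout(members, max_align=4):
    """Compute (offsets, size, align) for members given as (element_bytes, count)."""
    offsets, offset, align = [], 0, 1
    for elem_bytes, count in members:
        a = min(elem_bytes, max_align)
        offset = (offset + a - 1) // a * a  # round up to member alignment
        offsets.append(offset)
        offset += elem_bytes * count
        align = max(align, a)
    size = (offset + align - 1) // align * align  # round up to struct alignment
    return offsets, size, align


# %struct.record with the struct pointer treated as 4 bytes.
RECORD = [(4, 1), (4, 1), (4, 1), (4, 1), (1, 31),
          (4, 1), (1, 31), (1, 1), (1, 1)]
offsets, size, align = layout(RECORD)
# offsets -> [0, 4, 8, 12, 16, 48, 52, 83, 84], size -> 88, align -> 4
```

Under these assumptions the record occupies 88 bytes rather than the 84 bytes of raw member data, because one padding byte follows the first [31 x i8] and three more follow the final i8.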

getelementptr

Make sure the returned pointer is aligned to the same size as the struct

memcpy/memset

Perform memcpy/memset in units of the struct alignment

load

May take more than one cycle, depending on size

store

May need a prior read, and may need more than one cycle to store, depending on size

bitcast

May not need to modify

ptrtoint

May not need to modify

struct_implementation.txt · Last modified: 2010/12/15 15:53 (external edit)