postgresql - Does its performance critical to define low cardinality column as int and not text -


i have column 4 options. column define text. table big table 100 millions of record , keep going. table use report table. index on table - provider_id,date,enum_field.

i wonder if should change enum_filed text int , how performance critical.

using postgres 9.1

table: provider_report: id bigserial not null, provider_id bigint, date timestamp without time zone, enum_field character varying, ....

index: provider_id,date,enum_field

tl;dr version: worrying not worth time.

long version:

there enum type in postgres:

create type myenum enum('foo', 'bar'); 

there pros , cons related using vs varchar or integer field. pros imho.

in terms of size, it's stored oid, int32 type. makes smaller varchar populated typical values (e.g. 'draft', 'published', 'pending', 'completed', whatever enum about), , same size int type. if you've few values, smallint / int16 admittedly smaller. of performance change come there (smaller vs larger field, i.e. negligible).

validation possible in each case, through built-in catalog lookup enum, or check constraint or foreign key varchar or int. of performance change come there, , it'll not worth time either.

another benefit of enum type, is ordered. in above example, 'foo'::myenum < 'bar'::myenum', making possible order enumcol. achieve same using varchar or int, you'll need separate table sortidx column or something... in case, enum can yield enormous benefit if ever want order enum's values. brings (imho) gotcha, related how enum type stored in catalog...

internally, each enum's value carries oid, , latter stored as is within table. it's technically int32. when create enum type, values stored in correct order within catalog. in above example, 'foo' have oid lower 'bar'. makes efficient postgres order enum's value, since amounts sorting int32 values.

when alter enum, however, may end in situation change order. instance, imagine alter above enum in such way myenum ('foo', 'baz', 'bar'). reasons tied efficiency, postgres not assign new oid existing values , rewrite tables use them, let alone invalidate cached query plans use them. instead, populate separate field in the pg_catalog, make yield correct sort order. point forward, ordering enum field requires lookup, de facto amounts joining table separate values table carries sortidx field -- varchar or int if ever wanted sort them.

this fine , acceptable. occasionally, it's not. when not there solution: alter tables enum type, , change values varchar. locate , adjust functions , triggers make use of do. drop type entirely, , recreate fresh oid values. , alter tables were, , readjust functions , triggers. not trivial, feasible.


Comments