Find string entry in another string list

kolonuk

Member
Hi All,

I haven't posted for a while, so thought I'd tax your brains a little today (as well as test my luck!)...

I have two comma separated strings that I need to compare. Assume one string is a list of item composition attributes (ie, wood, plastic, metal, glass, etc). The other list is a list of attributes I want to search for (items containing glass AND plastic).

The result I would like is for it to work out if both "glass" and "plastic" are in the list, but not only just those two. For example, to find matches with "plastic,glass":

Code:
Item 1    "plastic"   FALSE
Item 2   "wood"   FALSE
Item 3   "metal,glass,paper,plastic"   TRUE
Item 4   "glass,plastic"   TRUE

Obviously, this can simply be done with separate tables in a FOR EACH statement, a simple DO loop, or some combination of multiple LOOKUP, MATCHES, or CAN-DO statements, but for reasons of complexity can't be used in that way. Also, this will sit inside multiple FOR EACH and FIND WHERE clauses, so a user defined function doesn't work (which would be a really nice answer!)

MATCHES logically comes close, but with it not using REGEX, I can't think of a way it can be used in this case. And I know it will screw any indexes, but other obscure search requirements already do that.

I can explain how my code works, but it would take a long post of a handful of long paragraphs, so I haven't; suffice to say it's not urgent, and I'm not actually expecting something like that to exist, but just thought I'd ask...

Thanks in advance.
 

TheMadDBA

Active Member
Change the commas to a space and define a word index on that field... use CONTAINS.

Otherwise you are pretty much out of luck for indexing anyways.
 

Cringer

ProgressTalk.com Moderator
Staff member
Can't you just stash the fields into temp tables dynamically and then join them?
 

kolonuk

Member
@TheMadDBA - I'm just reading through the Progress documentation about CONTAINS, and it seems like that will actually do the job - Thanks!. We've never used it before, and I can immediately think of a few places where it might come in handy and maybe speed things up. I've just had a look at word-indexes, and get how to create them, but have no idea what impact it will have on the DB - does it make the indexes absolutely huge, does it effect anything else? (don't answer, it's rhetorical, I am about to test on our dev DB!)

@Cringer - I'm not quite sure what you mean. I could load both the item table and attribute table into one temp table, but wont that take ages to fill? I guess that the lookup would be faster, but again not sure how I would match my comma separated string to another comma separated string...
 

GregTomkins

Active Member
1) Your post implies that you're going to read 100% of the relevant DB records no matter what due to "other obscure search requirements", and that may be OK if you know the # of records involved will always be reasonable compared to the time available to process the query. If this is the case, then a simple pair of nested DO loops and a 'found' variable seems pretty trivial.

2) If this is not true, then a word index (possibly along with a DO loop or two) seems like it would do the trick. We have never used these and I'd be a little nervous of doing so, but I'm sure they can be made to work.
 

TheMadDBA

Active Member
As always your mileage may vary but I have had great luck with word indexes for troublesome/free form searches. Index size/overhead is of course relative to how much data you have and how often it changes. But you should use word indexes as a last resort for most things since the overhead is larger than traditional indexes.

Also read up on the documentation and look out for characters that Progress considers word breaks or special in some way and adjust your data as needed.
 

GregTomkins

Active Member
Just out of curiosity ... are Progress word indices updated real-time, same as regular indices?

You'd think that would be a given ... but I ask because back in my MS-SQL days, this was not true ... they had word indices but they had to be loaded independent of regular INSERT operations, and updating them was kind of like an index build in Progress. They've probably changed this by now.
 

TomBascom

Curmudgeon
A tidbit of advice -- create word indexes in a dedicated storage area. There are certain index rebuild utility features that cannot be used if a word index is in the same area as a "normal" index.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
A tidbit of advice -- create word indexes in a dedicated storage area. There are certain index rebuild utility features that cannot be used if a word index is in the same area as a "normal" index.

I believe the constraint is on the location of the table with the word index, rather than the word index itself. Rules for multi-threading of data scan/key build phase of idxbuild via -datascanthreads:
  • —Index rebuild run “with sort” option;
  • —No index being rebuilt exists in table area being scanned;
  • —No word indexes are being rebuilt for table data being scanned;
  • —Data area being scanned is a Type II storage area.
So tables that have word indexes should be segregated to their own areas, away from other tables.
 

TomBascom

Curmudgeon
Rob is right. It is the *data* area that must be distinct.

The details from kbase 000030594:

-datascanthreads <n>: the number of concurrent threads used to scan table data associated with the indexes being rebuilt. When specified, -datascanthreads invokes a multi-threaded data scan that spawns <n> threads for the data scan phase. The data being scanned, must meet all of the following criteria for a multi-threaded data scan to occur:
  • The data area being scanned must be a Type II area
  • No index within the data area being scanned is also being rebuilt
  • No index associated with the data area being scanned is a word index
  • Index rebuild must be run with the "sort" option, that is answering "y" when asked if you have enough room for sorting.
If any requirement is not met, the data scan is performed using the original single threaded mechanism. If not specified, the -datascanthreads value defaults to 0, also indicating the original single threaded mechanism.

You can confirm that the data scan phase is multi-threaded by observing the following messages during the data scan phase of index rebuild:
Processing area 689 : (11463)
Start 4 threads for the area. (14536)
Area 689: Multi-threaded record scan (Type II) complete. Elapsed time: 6.740
 

kolonuk

Member
Thanks all for the replies. I've been doing some experimentation, and it turns out you can't use word indexes/CONTAINS in FIND statements...

Also, you can only create an index on the search field, so doesn't include any other fields I need.

Back to the drawing board.
 

TheMadDBA

Active Member
You just have to use FOR nstead of FIND... an added benefit of using FOR is being able to use multiple indexes for certain queries/indexes.

A lot of times when I had to use a word index I also just put the key fields in the same field with a prefix... like ACCT1234 so I could find all entries for Acct# 1234 very quickly.
 

TomBascom

Curmudgeon
Or you could OPEN QUERY and use GET...

"mutliple indexes" often sounds like it would be good but, in practice, it doesn't usually work out. It is definitely not a panacea and you most definitely should not dump all of your composite indexes and replace them with lost of single-component indexes (some people though that might be a good idea 15 or 20 years ago and it still comes up now and then...)
 
Top