REBOL

REBOL3 - Profiling (Rebol code optimisation and algorithm comparisons. [web-public])

Return to Index Page
Most recent messages (300 max) are listed first.

#UserMessageDate
459Maximyeah, its much easier to read. I only use first/second in R2 because their faster.27-Jan-11 21:03
458Steevewhen I have the choice, indeed27-Jan-11 21:00
457SteeveI don't use first, second, third .... anymore in R3.27-Jan-11 20:59
456Maximbtw the first version is exaclty the same speed as in R2, on my system.27-Jan-11 20:07
455Maximso, claims that the path access is faster in R3 seems to be true across the board27-Jan-11 20:07
454Maximjust tried it... A100

in R3 they *are* roughly equivalent :-)

in fact in R3 the second version is ~ 10% FASTER !

27-Jan-11 20:05
453Henrikthat's very interesting for DRAW blocks. I wonder if R3 has the same issue.27-Jan-11 19:59
452Maximin my mind it would have been roughly equivalent.27-Jan-11 19:56
451Maximall path access is slow...but I woudn't have thought that using a series function and multiplying both values in the pair would be twice as fast!27-Jan-11 19:56
450Henrikso, pair refinements are slow?27-Jan-11 19:54
449Maximsome things are stunningly non-obvious when it comes time to optimize REBOL

>> a: 1x1 b: 10x10 s: now/precise loop 1000000 [second b - a] difference now/precise s == 0:00:00.5

>> a: 1x1 b: 10x10 s: now/precise loop 1000000 [b/y - a/y] difference now/precise s == 0:00:00.969

27-Jan-11 19:50
448SteeveAs you wish Brianh4-Jun-10 23:22
447BrianHSteeve, could you submit your CureCode #574 DP function to DevBase? Or can I? Good work, and it uses well features introduced to R3 since DP was originally written. We would do well to have this function :)4-Jun-10 21:56
446Gabriele(btw, i guess Linux is doing timezone conversion in the time system calls, while osx is not, and that would explain the huge difference in speed.)3-Jun-10 9:30
445GabrieleMax: scratch your theory, this is Linux running on an iMac 24"3-Jun-10 9:29
444Gabriele>> time-block [] 0,05 == 1.11503601074219E-8 >> time-block [] 0,05 == 1.11885070800781E-8 >> time-block [] 0,05 == 1.11618041992188E-8 >> time-block [] 0,05 == 1.12886428833008E-83-Jun-10 9:26
443Gabriele>> time-block [now/precise] 0,05 == 2.43203125E-5 >> time-block [now/precise] 0,05 == 1.526171875E-5 >> time-block [now/precise] 0,05 == 2.44140625E-5 >> time-block [now/precise] 0,05 == 2.42890625E-5 >> time-block [now/precise] 0,05 == 2.4203125E-5 >> time-block [now/precise] 0,05 == 2.4265625E-53-Jun-10 9:25
442Davide>> time-block [now/precise] 0,05 == 2.328125E-62-Jun-10 15:27
441PeterWoodFrom a quick browse of the Virtual Box User Manual, a VM uses the host systems "time stamp counter" by default. There are a number of options to change the "guest clock".

It's section 9.10 in the Virtual Box User Manual if you want to take a look.

2-Jun-10 15:07
440Ladislavfunny2-Jun-10 14:46
439Ladislavso, the time granularity of XP running in VirtualBox looks smaller, than when running natively2-Jun-10 14:46
438Ladislav>> time-block [now/precise] 0,05 == 2.11906433105469E-62-Jun-10 14:45
437Ladislav>> time-block [] 0,05 == 7.97957181930542E-92-Jun-10 14:44
436LadislavPeter, I obtain the same value when running XP in the VirtualBox:

>> include %timblk.r == 9.93798449612403E-3

2-Jun-10 14:43
435MaximIts also possible that MS compilers are better at optimisation on intel CPUs than Apple.2-Jun-10 14:11
434PeterWoodMax: As I understand, there is at least a 10 per cent performance penalty for running Windows under a VM rather than directly (using Bootcamp).2-Jun-10 14:09
433PeterWoodI was less surprised that XP(VM) performed time-block [ ] 0,05 about 1.6 times faster than under OS X (Native) as I suspect that Carl is more likely to have better optimised Rebol on Windows than other systems.2-Jun-10 14:08
432Maxim(i.e. not using a VM)2-Jun-10 14:07
431Maximso... does an XP running on a OSX VM run faster than the same on a PC board? ;-)2-Jun-10 14:07
430Maximcould be that the motherboard and I/O chips are better integrated on OSX which means any access to hardware has less latency.2-Jun-10 14:06
429PeterWoodThat certainly doesn't suggest that OS X system calls are faster than Windows ones.2-Jun-10 14:04
428PeterWoodWhat initially surprised me was that there did not appear to be a significant difference bewteen OS X (native) and XP(VM) with time-block[now/precise] 0,052-Jun-10 14:03
427Ladislavyou do not have to run the script, it is the

tick-time: time-tick 0.05

expression

2-Jun-10 13:58
426PeterWoodI've just run the timblk script another 5 times each time the result is ~9.937E-32-Jun-10 13:57
425Ladislavhmm, surprising, normally you would obtain 15.5 milliseconds as the first value...2-Jun-10 13:54
424PeterWoodXP SP32-Jun-10 13:53
423LadislavNevertheless, according to the results, your XP is not SP2 or later, I would say2-Jun-10 13:53
422PeterWoodThe times for time-block [] and [now/precise] are quite consistent. The time for timblk.r improved to around 9.93E-3. Windows may still have been completing its start-up when I ran the very first test.2-Jun-10 13:52
421LadislavI think, that I read somewhere, that system clocks in virtual machines may be problematic...2-Jun-10 13:50
420PeterWoodThat could well be the case - let me run them again2-Jun-10 13:49
419Ladislavhmm, this looks, that the "system clock" are quite inexact in your VirtualBox, I guess2-Jun-10 13:47
418PeterWoodOut of interest, I ran the same tests in Windows/XP running under Virtual Box:

>> do http://www.fm.tul.cz/~ladislav/rebol/timblk.r == 1.00330578512397E-2 >> time-block [now/precise] 0,05 == 7.59124755859375E-7 >> time-block [ ] 0,05 == 7.57351517677307E-9

2-Jun-10 13:43
417LadislavNevertheless, taking into account, that the clock resolution is better, it looks like the OSX actually does more (useful) work than Windows 72-Jun-10 13:38
416Maximchitty = shitty... hehhe2-Jun-10 13:34
415Maximmy mac mini is a super chitty system HW, but its still faster at editing video than PCs four times as capable.2-Jun-10 13:34
414MaximI wouldn't be surprised that just about all system calls are faster on OSX2-Jun-10 13:33
413Ladislavsure2-Jun-10 13:33
412PeterWoodCouldn't that be influenced by the underlying system calls2-Jun-10 13:33
411LadislavYes, interesting, so, even though your processor is a bit slower than mine, the NOW/PRECISE evaluation is significantly faster2-Jun-10 13:32
410PeterWood>> time-block [ ] 0,05 == 1.251220703125E-82-Jun-10 13:31
409Ladislavso, I should definitely have it patented ;-)2-Jun-10 13:28
408Ladislav:-)2-Jun-10 13:28
407Maximok makes sense.2-Jun-10 13:26
406Ladislavbecause we have two measurements, the previous one using two times less cycles, i.e. possibly having twice the influence of the resolution2-Jun-10 13:25
405MaximI'm curious as to why the resolution is multiplied by 3?2-Jun-10 13:24
404Ladislavadds the relative error possibly caused by clock resolution to the figure to make sure we achieve the desired accuracy2-Jun-10 13:21
403Maximwhat does this do in the time-block while condition?

(3 * tick-time / time)

2-Jun-10 13:20
402Ladislav, so, it looks, that the time the system spends in NOW/PRECISE is nonnegligible2-Jun-10 13:16
401LadislavYes, certainly faster2-Jun-10 13:15
400Rebolek>> time-block [] 0.05 == 1.45263671875E-82-Jun-10 13:14
399LadislavMy system is Windows 7 Business 64-bit, AMD Athlon X2 250 at 3GHz2-Jun-10 13:14
398Ladislavfinding out it is so obvious, I should probably patent the idea, shouldn't I?2-Jun-10 13:11
397Ladislav(the "trick" is pretty simple, everybody knows how to do that, check the source)2-Jun-10 13:10
396LadislavMax: "how so?" - I suppose, that you do see, that the TIME-BLOCK function is able to measure evaluation time even if it is shorter than the system clock resolution?2-Jun-10 13:09
395LadislavPeter, how about this one?

>> time-block [] 0,05 == 7.80820846557617E-9

2-Jun-10 13:07
394LadislavOf course, the evaluation time of NOW/TIME may be operating system dependent, but it is still interpretation time (of the above expression)2-Jun-10 13:05
393Maximhow so?2-Jun-10 13:04
392Ladislavwrong2-Jun-10 13:04
391Maximif OSX has ten times the resolution, it will give you the impression that its ten times faster.2-Jun-10 13:04
390Ladislavso what?2-Jun-10 13:03
389Maximand how do you measure the execution? by accessing some form of timing or time information.2-Jun-10 13:03
388Ladislavthis does not have anything in common with resolution, this is about interpretation speed, Max2-Jun-10 13:02
387MaximI wouldn't be surprised if OSX had an order of magnitude better resolution in all things related to time and its measurement.2-Jun-10 13:02
386PeterWood>> time-block [now/precise] 0,05 == 7.1484375E-7

OSX, Core2Duo@2.4GHz

2-Jun-10 13:00
385Ladislavyes, your values correspond well to each other, the loop body calls now/precise and uses a comparison and IF2-Jun-10 12:56
384Rebolek>> do http://www.fm.tul.cz/~ladislav/rebol/timblk.r connecting to: www.fm.tul.cz Script: "Time-block" (1-Jun-2010/13:40:09+2:00) Warning: clock tick too short, the result is the loop body execution time! == 1.78988326848249E-62-Jun-10 12:55
383Rebolek>> time-block [now/precise] 0,05 == 9.0625E-7 OSX, Core2Duo@2Ghz2-Jun-10 12:55
382Ladislavsorry, disregard the last three lines above, please (wrong edit)2-Jun-10 12:51
381LadislavI tell you what is so surprising about the results:

Since the result is the loop body execution time, it looks, that the loop body in Peter's computer works ten times faster. Gabriele, Davide, and Peter, could you please post here your result of the following expression?:

>> time-block [now/precise] 0,05 == 2.3651123046875E-6

it looks that your processor works faster, Peter: your result is the loop body execution time, which is about ten times shorter, than the loop body execution time Ga

2-Jun-10 12:50
380PeterWoodSo Gabriele's processor is faster than mine.2-Jun-10 10:23
379GabrieleCPU[-Dual core Intel Core2 Duo E8135 (SMP) clocked at 2660.000 Mhz-] Kernel[-2.6.28-18-generic x86_64-] Up[-20 min-] Mem[-808.7/3615.9MB-] HDD[-640.1GB(9.1% used)-] Procs[-171-] Client[-Shell-] inxi[-1.1.13-]2-Jun-10 9:30
378PeterWoodThis is the result of speed.r on my machine under Rebol 2.7.5

Console: 0:00:00.150098 - 3372 KC/S Processor: 0:00:00.249708 - 3460 RHz (REBOL-Hertz) Memory: 0:00:00.468639 - 101 MB/S Disk/File: 0:00:00.37127 - 82 MB/S

2-Jun-10 8:11
377PeterWoodThis entry at Wikipedia seems to suggest that is it OS dependent - http://en.wikipedia.org/wiki/System_time#Operating_systems2-Jun-10 8:09
376RebolekIsn't it OS-dependent? BSD&OSX value seems to be about 2E-6, while Linux is around 2E-5 .2-Jun-10 6:28
375LadislavWell, Peter, that is possible, but it looks, that your processor is about ten times faster than the one Gabriele or Andreas use2-Jun-10 5:22
374DavideOpenBSD.i386

REBOL/Core 2.7.7.9.4 (5-Apr-2010) >> do http://www.fm.tul.cz/~ladislav/rebol/timblk.r connecting to: www.fm.tul.cz Script: "Time-block" (1-Jun-2010/13:40:09+2:00) Warning: clock tick too short, the result is the loop body execution time! == 2.94573643410853E-6

1-Jun-10 14:11
373PeterWoodLadislav - It's a 2.5 year old MacBookPro with a 2.4GHz Core 2 Duo - I think there are much faster processors around these days.1-Jun-10 12:44
372Gabrielethis is Linux Mint 7 64bit1-Jun-10 9:53
371GabrieleREBOL/Core 2.7.6.4.2 (14-Mar-2008) >> do http://www.fm.tul.cz/~ladislav/rebol/timblk.r connecting to: www.fm.tul.cz Script: "Time-block" (31-May-2010/11:01:46+2:00) Warning: clock tick too short, the result is the loop body execution time! == 2.48837209302326E-51-Jun-10 9:51
370Ladislavbut, generally, it looks, that Windows may well be the only operating system currently, that still has a detectable resolution time of operating system clock1-Jun-10 5:13
369LadislavPeter - hmm, better resolution, than 2 microseconds, that is quite fast machine1-Jun-10 5:08
368IzkataScript: "Time-block" (31-May-2010/11:01:46+2:00) Warning: clock tick too short, the result is the loop body execution time! == 3.37364341085271E-5

Rebol 2.7.6 on Ubuntu Hardy 64-bit

1-Jun-10 1:26
367PeterWoodScript: "Time-block" (31-May-2010/11:01:46+2:00) Warning: clock tick too short, the result is the loop body execution time! == 2.09338521400778E-6

Rebol 2.7.5 on Mac OS X 10.64 bit

31-May-10 23:04
366Ladislavso, we can say, that the clock resolution in Linux is better than 55 microseconds31-May-10 22:35
365Ladislavnevertheless, it means, that you may well use this value as the clock granularity for the profiling purposes, since the actual granularity is even better31-May-10 22:33
364Ladislavthanks, Andreas, it does not mean, that the result is useless, actually, it just means, that the resolution is finer than Rebol is able to measure31-May-10 22:31
363Maximhence the use of word "timing" ;-P

adding timing resolution within the "passage of time" resolution would be arguably sufficient for me even if it meant we are ~0.1 ms off, as long as we get a constant increment of milliseconds (or better) to use.

right now the time resolution is sooo poor (on windows) its often frustrating.

31-May-10 22:05
362Andreas>> do http://www.fm.tul.cz/~ladislav/rebol/timblk.r Warning: clock tick too short, the result is the loop body execution time! == 5.53774319066148E-5

R2 2.7.7 on Linux, 64b (even though I guess the warning means the result is useless :)

31-May-10 22:05
361Ladislavyes, exactly31-May-10 21:59
360Maximwell there are different precisions ;-)

resolution and measure. what we are measuring here is the resolution. no?

31-May-10 21:59
359Ladislav"precise timing" - that is exactly the opposite meaning than what I am using. Example:

clocks usually display time in seconds, using second as the smallest unit. That does not mean, clocks are imprecise, they may actually be more precise, than a counter counting every microsecond, e.g., but with a 0.1 % error, meaning, that in one day, such a counter may be e.g. one and a half minute slow in one day

31-May-10 21:56
358Maximeven in R2, when you look at an event! object, the timing counter within is much more precise than now/precise... which is why I can use mouse move events to check time elapsed much more precisely than the mediocre 'time event generating, OS timer which /view is using.31-May-10 21:55
357MaximWhat I think too...

but if R3 can get precise timing info, it could arguably add this precise elapsed period since last *time* check and add it to the result when now/precise is called in a script.

just an idea... thinking out loud.

31-May-10 21:50
356Ladislav"...its strange that they do not also apply..." - nothing strange, it is a different kind of counter, not the one used to get the system time (that might mean, that it actually is less precise, e.g.)31-May-10 21:46
355Maximwrt "get the time"... I meant to say:

"get timing information"

31-May-10 21:44
354MaximAlthough I can only guess, I think the issue lies in that the actual os calls being used do not provide greater precision.

In R3 there are other means to get the time which do provide much greater precision, so its strange that they do not also apply to now/precise.

31-May-10 21:43
353Ladislavbut, certainly, it is possible to update the time every time the operating system is asked to provide the time. That way, the time can be updated as frequently, as the program is able to ask. I guess, that in Linux it works that way, any results?31-May-10 21:40
352Ladislavwell, this is not about "time calculation", it is about frequency of time update31-May-10 21:37
351Maximor whatever lib the core uses to calculate time31-May-10 21:36
350Maximit could have been that they use different timer libs.31-May-10 21:35
349Ladislav"The same result with R2 and R3" - of course, this value is a property of the operating system31-May-10 21:35
348Ladislavhmm, I just checked a Vista Business 64 bit (not updated for quite some time), and it used 15.5 milliseconds too...31-May-10 21:34
347SteeveThis test shows well why some graphical demos work bad on some systems even if the computers are faster than mine (Vista with a tiny celeron).31-May-10 18:41
346SteeveBtw, got the same results with R2 and R331-May-10 18:23
345GreggSorry, yes, SP2.31-May-10 18:18
344Ladislavaha, OK, the difference is still just 1.12%, i.e. it is in the tolerance31-May-10 18:16
343Paulmine is on 32 bit Vista running AMD.31-May-10 18:14
342Ladislavhmm, Paul's value is quite strange, taking into account, that Steeve measured it to be 1 ms31-May-10 18:13
341Ladislavthanks, I assume, Gregg, that your value is either for XP x64 SP2 or SP3?31-May-10 18:11
340GreggXP x64 == 1.54678899082569E-231-May-10 17:54
339Paul9.8876404494382E-4 Vista31-May-10 14:58
338Oldessorry, it's already in the file., And it's XP SP331-May-10 14:18
337OldesR2, XP ---> 1.54857142857143E-231-May-10 14:17
336LadislavHi all, I adjusted the http://www.fm.tul.cz/~ladislav/rebol/timblk.r file to reflect Steeve's measurement, but would like to obtain more results (results for other operating systems). If you have the results that are not already in the file, please, let me know.31-May-10 9:52
335Ladislavthanks, Steeve, that looks like 1 millisecond, taking into account the given 5% precision27-May-10 22:13
334SteeveVista --> 0.00099082568807339527-May-10 19:59
333LadislavAs opposed to that, I propose a more precise http://www.fm.tul.cz/~ladislav/rebol/timblk.r27-May-10 19:46
332OldesWhat do you mean by random values? I can see some differencies, but not so random. The function I use for timing is: tm: func [count [integer!] code [block!] /local t][t: now/time/precise loop count code probe now/time/precise - t]27-May-10 15:26
331PeterWoodI would be very suspicious if the timings where the same. All machines these days are running multiple processes so your script's access to the processor will differ each time you run it.

Rebol itslef may well be inconsistent as the garbage collector will run at differnet points in you tests.

27-May-10 14:02
330Terryone odd thing i find with timing in general is why do i get random values each time i run a script? shouldn't it always be the same?27-May-10 13:58
329LadislavWell, since your timings did not detect, that hash intersects are *much* faster than block intersects, this is a proof, that your timings do have very little in common with the speed of intersect27-May-10 13:54
328TerryMy timings were for a script, where intersecting was one part. Whether intersecting block or hash made no noticeable difference.27-May-10 13:47
327OldesI agree: >> b1: make hash! [1 2 3 4] b2: make hash! [2 3 4 5 6] >> tm 100000 [intersect b1 b2] == 0:00:00.313 >> b1: make block! [1 2 3 4] b2: make block! [2 3 4 5 6] >> tm 100000 [intersect b1 b2] == 0:00:00.76627-May-10 10:14
326LadislavTerry's informations are quite inexact. Intersection of hashes is *much* faster, than intersection of blocks, exactly as I expected.27-May-10 9:58
325TerryIn my comparisons with Redis, I was getting 7200 GETS (pulling the value of a key) per second from a DB with 100,000 key/value pairs.. With this rebol db, i can do 1,000,000 GETS in 0.64 seconds, from a db with 1,000,000 key/value pairs.27-May-10 6:03
324SteeveThere should be a ratio where the iterative method is faster than intersect, but some tests have to be done. ratio = (lenght? bigest-block) / (length? smallest-block)27-May-10 5:08
323Terryin that same 1 million objects, i have an object with fname = "Bob" .. i can pull that out in 0.0014 secs.27-May-10 5:06
322Terryi have a block with 1 million sub blocks representing objects, and they each have a random 'age value between 1 - 100.. i can pull the indexes of all the blocks where age = 42 in 0.7 seconds.. that's the slowest query i have.27-May-10 4:57
321Ladislavnevertheless, hashing can certainly be spared, so, doing Intersect "by hand" can be faster, in my opinion27-May-10 4:51
320Ladislav"the time is the same" - then, it looks, that the Intersect native is not "clever enough", doing an unnecessary operation.27-May-10 4:49
319Terryhowever, I think I've finally created a rocket db27-May-10 4:48
318Terryi just tried some comparisons and the time is the same27-May-10 4:47
317Ladislavbut, I did not do any speed comparisons, so it is up to you, Terry27-May-10 4:46
316LadislavWhen using Intersect on blocks, you should take into account, that their contents are hashed first. Therefore, using hashes instead of blocks, the hashing operation itself may be spared.27-May-10 4:44
315Ashleyintersect is a native so the only way you can get faster performance is if you know something about the blocks contents. For example, if both blocks contain sorted integers *and* one block is smaller than the other *and* the bigger block contains the first value of the smaller then the following code *may* be faster than intersect:

a: [2 3 4] b: [1 2 3 4 5] x: last b b: find b first a r: copy [] foreach i a [ all [i > x break] all [find b i append r i] ] b: first b

27-May-10 1:43
314Ladislavmy suggestion is to try it on hashes26-May-10 22:12
313Ladislav(if you want speed)26-May-10 22:07
312LadislavYou should not use Intersect on blocks26-May-10 22:07
311TerryAny thoughts on speeding up intersect when comparing large blocks?26-May-10 17:15
310Terry(I prefer the term "things" over "objects" as triples don't usually contain methods, just properties)22-May-10 18:51
309TerryThe problem with the other method was the overall cost of looking up properties .. if all you needed was the single property, (ie all things that are age = 75) then that was fine, but usually you want more info. So if you found 5000 things that match the query, you'd need to run the whole process 5000 times to get all their email addresses.

It just makes more practical sense to keep things as things, along with all of their properties.

22-May-10 18:50
308TerrySteeve, back to your method with some changes.. Some early benchmarks on this new method 3,000,000 objects

subjects: [ [age 42 fname "Steeve"][ fname "Tweety" age 75] [url "http://rebol.com"] ... ]

3,000,000 key / value pairs that index the subjects block

i1: [ "Steeve" 1 "Tweety" 2 "Rebol" 3 ... ]

then..

theindex: select i1 "Rebol" out: pick subjects theindex

I can do 1,000,000 of these in 1.3 seconds or one 769, 230 / s (older box with 2gb ram)

then finish the other two triples index (preds vals)

i2: [age [ 1 2] fname [1 2] url [ 3]]

i3 should have it's values converted to md5s (as suggested)

i3: [ #{12CE1E73A3EABF1623970A0C53B9CA1F} [3] ]

22-May-10 18:43
307Terryhttp://sociopathways.files.wordpress.com/2009/12/hash-browns.jpg21-May-10 15:33
306GrahamYou could make a complete hash of it.21-May-10 4:01
305TerryIt's becoming apparent that these large datasets need to be reduced when it comes to FINDing strings

imagine: ["one" "two" "three" "four" "five"... "one hundred million"]

should be

o: ["one" "one hundred million"] t; ["two" "three"] f: ["four" "five"]

or even

fo: ["four" "fox" "food"] fi: ["five" "fifty" "fiver niner"]

pull out the first twh chars and there's your search block

21-May-10 2:49
304Maximforeach is alway slower than find/part so whatever algorithm you use, replace it with that in its loop and it will be even better.21-May-10 0:11
303TerryIt ties in with my old AtomDB developments, but uses some of the methods we went over.. AtomDB used foreach to crawl.. fine for smaller datasets but too slow on large data20-May-10 23:02
302TerryOK, take all the noise above and throw it away.. i have the perfect solution (how's that for a tease?)20-May-10 22:59
301TerryThere's a fundamental flaw with this.. let say i find 3000 customers, but now i want to pull their email property foreach customers customer [ (pseudo) intersect the customer with i1 and email i2) once is fine.. but 3000 loops is really slow

probably need to store the properties as objects

data: [ "bob" [ email "bob@example.com" lname "Jones"] ...]

then continue to index email, lname etc

20-May-10 19:26
300Terrypulling a single unique email address (out of 429) took 0.013 so the time seems somewhat fixed around 0.12 seconds regardless of the density20-May-10 18:44
299Terrycrawling with foreach to get the same result is .09 or about 9 times slower20-May-10 18:37
298Terryoops.. decimal error.

.012 , not .0012

20-May-10 18:35
297TerryWithin that data (real world) i have 3,733 cable triples ("dfuflfi33s", "isa" "cable") ("dfuflfi33s" isa a unique 10 byte ID that i stick on everything)

Using the code posted earlier, i can pull a final block of indexes for each of those cables in 0.0012 seconds

20-May-10 18:26
296Terry1 GB RAM = 2,143,219 triples and their indexes (approx)20-May-10 18:20
295Terry(Rebol 2)20-May-10 18:17
294TerryMax: Indexing 244,327 triples took 4:20 on a rather snappy windows box with 6Gb RAM DB is consuming 114 Mb Ram20-May-10 18:16
293Terry**Izkata beats his chest, challenging all comers ** :)20-May-10 18:01
292Izkatayou're*20-May-10 17:58
291IzkataWell, if your doing it that way, "append/only blk reduce [a b c]" should be faster - no call to 'array

May as well also have "blk: make block! (length? orig) / 3" - don't remember how much of a speedup that was in your previous tests, though

20-May-10 17:57
290TerryIzkata.. here's another option.. not sure how efficient it is..

orig: ["one" "two" "three" "four" "five" "six"] blk: [] foreach [ a b c ] orig [ d: array/initial [1] reduce [a b c] append blk d]

20-May-10 17:45
289Terryprobably a faster way, but I'll let the gurus beat their chests over it :)20-May-10 17:35
288Terryindexing shouldn't take too long.. one foreach pass against the data20-May-10 17:34
287TerryMax, building the indexing functions now20-May-10 17:33
286SteeveLadislav, no prob with a map!20-May-10 16:46
285Maximterry, any benchmark results on a dense search with a few million records?

also: how long is generating the index on that data

and how much RAM is your triple index consuming?

20-May-10 16:19
284LadislavJust BTW, I doubt, that INTERSECT is the fastest way how to intersect two hashes, might need some profiling too.20-May-10 7:25
283TerryIt makes more sense for larger values.. nice when you can use a single symbol to represent a 1GB binary

Anyway, thanks everyone for the help.

20-May-10 7:24
282TerryIn my old AtomDB i made in Rebol some time ago, I had an index for each value.. ie: the number "42" was stored only once, and used the index with the inference..

values: [ "42" "November" "Baseball"..]

You could have a million birthdays recorded in "November", with a single element

billbdaymonth: 2 frankbdaymonth: 2 frankage: 1 the answer to life the universe and everything: 1

20-May-10 7:20
281Ladislav...but you are right, that there is always a way around20-May-10 7:14
280Terryer.. wait, there's no ambiguation.. http://en.wikipedia.org/wiki/Single_Source_of_Truth20-May-10 7:12
279Terryyeah.. that could remove some ambiguation20-May-10 7:11
278LadislavYes, sure, as I said, it is practical mainly in case, you want to have simple and fast triple delete operation20-May-10 7:09
277TerryI have thought of a fourth ID for a triple before.. but it represents the triple itself. .. rarely needed it though. .there's always a way around20-May-10 7:08
276TerryI'd tend to null it out instead, and reuse it later20-May-10 7:07
275Ladislavactually, that would be costly20-May-10 7:06
274TerryIf you remove a triple, you'd need to reindex everything, no?20-May-10 7:06
273Ladislav"how..." - this way, you will have a triple ID that does not change even if other triples are removed. That is not true for your index.20-May-10 7:05
272Terryhow does the 4th make it more practical?20-May-10 7:03
271Ladislav"Why not just use the index of the first part of the triple?" - yes, can be done that way, but if you need to remove the triples, you find out, that it may be practical20-May-10 7:03
270Terryso to determine what Sylvester is..

s: i1/("Sylvester") >> [4] p: i2/("isa") >> [1 4]

do an intesect to get "4" then the value = index ( 4) + 2

20-May-10 7:02
269Terry"To make it practical for other operations too, the triple store should be defined so, that:

1) every triple is stored as a fourtuple, the additional field being a unique triple ID..."

Why not just use the index of the first part of the triple?

ie: ["tweety" "isa" "canary" "Sylvester" "isa" "cat"...]

so 1 and 4 are the two triples( i1) and the next two are i2 and i3

i1: [ "Tweety" [1] "Sylvester" [4][ i2: ["isa" [1 4] i3: ["canary" [ 1] "cat" [4]]

20-May-10 6:59
268Terrythat's all the data a logic for a medium sized company stored on a floppy.20-May-10 6:49
267TerryAnd, this data is tight.. the 250,000 real world triples (manages a multi-million dollar electrical equipment rental biz, including 1500 contacts, invoicing, and 10,000s of items) is only 9mb... zips to 1.3mb20-May-10 6:36
266Terryanother advantage of a flat triple store is for simple key/value lookups ie: if you have a html element <div id="20984"> where the contents of the div is "pick ie 20984" doesn't get any faster than that20-May-10 6:25
265Terryi1 indexes / 3 is always decimal .33333 i2 indexes / 3 is always decimal .6666 i3 indexes / 3 is integer20-May-10 6:16
264Terrywith maybe the exception of pattern matching?20-May-10 6:14
263Terryyeah, i thought of that, but once the inference is there, not sure there's a need20-May-10 6:14
262Steeve... with find20-May-10 6:12
261SteeveBesides you don't need to store triples as blocks (they can be flattened) in the triple store, then you'll just need to do some math to calculate the index of a triple. The advantage is that you could still use the triple store do some slow searchings.20-May-10 6:12
260IzkataIt is? I've only used it for relatively small data, so never had a problem..20-May-10 6:12
259TerryI can hear my HD revving up20-May-10 6:11
258Terryjust realizing that now :)20-May-10 6:10
257Ladislavhmm, the Stagger function defined this way is O(N ** 2), i.e. not advisable for large data20-May-10 6:10
256IzkataAFAIK, R2 doesn't have a built in function for it (at least, I never ran across it)20-May-10 6:08
255Terrycool, thanks20-May-10 6:08
254IzkataFor R2, I wrote a function called Stagger for that: Stagger: func [L I /local][ forall L [ L/1: copy/part L I remove/part next L (I - 1) ] return L ] >> Stagger ["2c3cukne98" "femaleend" "E1016 Camlok" "2c3cukne98" "isa" "cable" "2c3cukne98" "uid" "8558"] 3 == [["2c3cukne98" "femaleend" "E1016 Camlok"] ["2c3cukne98" "isa" "cable"] ["2c3cukne98" "uid" "8558"]]20-May-10 6:07
253LadislavTo make it practical for other operations too, the triple store should be defined so, that:

1) every triple is stored as a fourtuple, the additional field being a unique triple ID 2) create the main index using the triple ID, 3) create three field indices searchable by by the field contents, and "inside" also by the main ID

This way, the triple storage will be moderately fast for triple removal as well

20-May-10 6:06
252Steevemap-each20-May-10 6:01
251Terryremold make-block reduce enlarge.. or somethin.. ?20-May-10 6:01
250Terrymy rebol is so rusty.. how do i turn ["2c3cukne98" "femaleend" "E1016 Camlok" "2c3cukne98" "isa" "cable" "2c3cukne98" "uid" "8558"... ]

into [["2c3cukne98" "femaleend" "E1016 Camlok"][ "2c3cukne98" "isa" "cable" ]["2c3cukne98" "uid" "8558"]...

20-May-10 6:00
249Terryyeah.. im just building some benchmarks with real data now20-May-10 5:58
248TerryThis finds all "Steve isa... "

If i wanted to find all "Canaries"..

p: i2/("isa") v: i3/("Canary")

result intersect p v

20-May-10 5:58
247Steeveand i1,i2,i3 should be of type hash! or map!20-May-10 5:57
246SteeveYeah !!! You got it :)20-May-10 5:56
245Terryie: [["Steeve" "isa" "person"]["Steeve" "age" "42"]["Tweety" "isa" "Canary"]["Tweety" "age" "75"]["Chirp" "isa" "Canary"]] i1: ["Steeve" [1 2] "Tweety" [3 4] "Chirp" [5]] i2: ["isa" [1 3 5] "age" [2 4]] i3: ["person" [1] "42" [2] "Canary" [3 5]]

rocket: func [it][ st: now/precise

while[it > 0][ s: i1/("Steeve") p: i2/("isa")

res1: intersect s p it: it - 1 ] prin join " -> " difference now/precise st ]

20-May-10 5:55
244TerryHere's what I did..20-May-10 5:55
243SteeveYeah !!! :)20-May-10 5:52
242LadislavMax, Steeve proposed the datastructure so, that the above feach operations become elementary ("instantaneous").20-May-10 5:51
241SteeveI would use map! (or hash!) as indexes and a key would contain one or several triples (inside a block) verb/"age": [ index indexn ...]20-May-10 5:49
240Maximsteeve, REBOL doesn't support path with strings, and furthermore, it would only return the first index, if you used it within a paren.

so I'd really like you to give a small snippet of code with your idea, I am curious about your method... cause I don't see *how* what you say works.

19-May-10 23:49
239TerryI knew someone would comback with a function.. might as well have been you Gregg. I like PHP's array_merge()19-May-10 23:32
238GreggTerry, I think INTERSECT is fine the way it is, and it's easy to wrap if you want.

fold: func [ series [series!] fn [any-function!] /local value ][ value: pick series 1 foreach item next series [value: fn value item] ]

intersect-all: func [ series [block!] "A block of series values to intersect" ][ fold series :intersect ]

19-May-10 21:46
237TerryWorks great Steeve. I haven't benchmarked it yet, but given the whole thing can sit in hashes or maps! , i would say the performance should be spectacular19-May-10 20:36
236Terryie: intersect [ blk1 blk2 blk3.. ]19-May-10 20:06
235TerryShouldn't Intersect take an block of blocks?19-May-10 20:04
234Pekrwhy :-)19-May-10 19:53
233TerryShould have listened to my mother and became a lawyer.19-May-10 19:53
232Terrynope.. i got it again :)19-May-10 19:52
231Terryhmm, i thought i got it. :( now I'm lost in block hell19-May-10 19:43
230SteeveAnd my comment... "it's remember me the IDE Plex(obsydian) in the nineties. it used widely the concept of triples (tuples) to modelize applications and databases."19-May-10 19:35
229Steevehttp://www.rebol.net/cgi-bin/r3blog.r?view=016119-May-10 19:35
228Terryi remember that19-May-10 19:29
227PekrSenteces from Lazysoft was another product of such kind ...19-May-10 19:29
226Pekryeah, associative stuff :-)19-May-10 19:29
225SunandaRon Everett presented a database that did much of what you want at DevCon2007. The live discussion is here: http://www.rebol.org/aga-display-posts.r?offset=0&post=r3wp500x1919 The video of the presentation may be on qtask.19-May-10 19:28
224Terryi get it.19-May-10 19:27
223SteeveI just think you don't get it, but I know my explanation sucks, sorry.19-May-10 19:27
222Steevethen intersect the 2 blocks19-May-10 19:23
221Steevewith index3/"75", you'll get those with 75 as the target19-May-10 19:22
220Steeveby doing, index2/"age" you'll get all the triple having the verb "age"19-May-10 19:22
219Terryso now i want to query, say , everything that is "age" "75" (like above)19-May-10 19:18
218Steeve(added In index1) "Tweety"= index (added In index2) "Isa"= index (added In index3) "Canary"= index19-May-10 19:16
217Steeve(added In index1) "Tweety"= index (added In index2) "Isa"= index (added In index3) "Canary"= index19-May-10 19:16
216Terry( only in Rebol does ie: actually mean ie: )19-May-10 19:16
215Terry>> ie: ["Tweety" "isa" "canary"] == ["Tweety" "isa" "canary"] >> index? find ie "Tweety" == 1

Great.. now what

19-May-10 19:15
214SteevePerhaps the explanations are not clear, but it's pretty clear in my head ;-)19-May-10 19:14
213Terryyeah, but ...19-May-10 19:13
212Steevethat's all...19-May-10 19:11
211SteeveAnd i use this index as the value for the 3 indexes, and i create the keys+value "Tweety": index "Isa": index "Canary": index19-May-10 19:10
210SteeveThe, I recovers its index from the block (actually it's the last one)19-May-10 19:08
209SteeveFirst i add the triple in the triples store (a simple block)19-May-10 19:07
208TerryI have a system working now that's fast enough.. but I'm a speed junkie.. there must be a BEST way (not better... BEST)19-May-10 19:07
207Terry"Tweety" "age" "75" "Steeve" "isa" "Rebol" "Steeve" "age" "unknown"19-May-10 19:07
206SteeveOk, I give it to you...19-May-10 19:07
205SteeveIt's the problem, I think Terry can't decide :-)19-May-10 19:06
204Terryok.. here's an example..

take this simple rdf triple "Tweety" "isa" "Canary" How would create 3 indexes to manage it and 10,000,000 like it?

19-May-10 19:06
203Steeve*current trial19-May-10 19:05
202SunandaGot to decide what is more important: -- time to build data structure -- time to update it (add/remove on the fly) -- time to search it And build data structures optimized to your priorities. There is no one true solution, just the best match for the situation at hand.19-May-10 19:04
201Steeveyour actual trial to make a lookup with foreach or find+loop is insanly slow by comparison19-May-10 19:03
200Steevebut it will be 100 or 1000 times faster, then to access the data using an index.19-May-10 19:02
199Terrythe other issue is the time it takes to build the checksum vs brute force19-May-10 18:58
198Terryfair enough.. not running bank accounts with this thing19-May-10 18:57
197Maximyou can negate collisions by building two checksums out of different properties of you data and merging them.19-May-10 18:57
196Steevewith an md5 checksum ??? don't be silly :-)19-May-10 18:56
195Terryyeah, but then you risk collisions19-May-10 18:55
194SteeveI already said to you to compute a checksum to build keys from large data, it's built-in in Rebol19-May-10 18:54
193Terry1 GB values as keys don't work very well.19-May-10 18:54
192Terryie: a value might be a large binary ..19-May-10 18:53
191Terryyeah Steeve, im scratching out notes on that now.. it's not quite as simple as it sounds19-May-10 18:53
190Terryi know .. keys can be integers that are indexes of values in map! or hash.19-May-10 18:52
189SteeveWhere's the dilema ? you just have to maintain 3 indexes at the same time (for triples), there isn't any other choice if you looking for speed on readings.19-May-10 18:52
188Maximterry, index? is not a procedure within rebol .. its the same as length? its a stored value which is simply looked up when you call index?

nothing will be as fast as index? its the "getting to" index which consumes cycles

19-May-10 18:51
187TerryI WILL NOT STOP TILL I HAVE A FAST AND SIMPLE TRIPLE STORE! (sleep is my enemy)19-May-10 18:49
186Terryrdf is to xml what War and Peace is to Cat in the Hat -- Triples are working even with Maxim's code above (just not in hashes for more than a query with a single value).. but i crave the speed of index? against large datasets.19-May-10 18:48
185Terryyeah, I've looked a a few19-May-10 18:40
184Andreasor have a look at cassandra and/or monetdb (w/o knowing anything about your intended usage)19-May-10 18:40
183Andreasyou could have a look at the various dedicated triplestores available (even though many of them have a semweb/rdf/... background).19-May-10 18:39
182Andreasgenerall speaking :) ?19-May-10 18:38
181Terrymy dilema is indexing triples in a key/value world19-May-10 18:36
180Maximah yes.... ususally a hash will have to skip over elements which return the same hash key.

so if your table has a few thousand similar items, you aren't benifiting from the hashing... and its quite possible that looking up a hash itself is actually longer when it has to skip over and over (comparing data on top of the hash).

though one could argue, that the speeds should be a bit slower than using a block, not this slower... possibly related to the implementation itself.

19-May-10 17:36
179Ladislav...then you are most probably out of luck trying to use hash! for indexing purposes19-May-10 17:32
178Ladislavhmm, what if the hash is optimized for unique elements?19-May-10 17:31
177Maximabove I said "its rehashing its content... which is strange."

that is a guess... it should say:

its *might* be rehashing its content... which *would be* strange.

19-May-10 17:15
176Terry495 matches against the same 732981 hash takes only .00319-May-10 17:13
175Maximthe example creates several sets of data with different organizations and it compares all of them amongst each other.

so with that script, you should be able to do all the analysis you need.

19-May-10 17:08
174Terry3270 matches19-May-10 17:07
173Maximthe example shows the problem very well.19-May-10 17:07
172Maxim(and a revision to ultimate-find, just after it)19-May-10 17:06
171Ladislavaha, you are using real-world data. OK, then, you should tell me how many matches you see19-May-10 17:06
170Maximladislav, there is a script earlier in this discussion which has a complete working example.19-May-10 17:06
169Terryit's maxim's ultimate-find above ( and im using real world data)19-May-10 17:06
168Maximthe only thing that I'm thinking is that when the hash index changes, its rehashing its content... which is strange.19-May-10 17:05
167Ladislavor, do you use the real-world data?19-May-10 17:05
166Ladislavthat is interesting, can you post your data generator?19-May-10 17:05
165Terryyeah19-May-10 17:05
164Ladislav".033 s, and same run against has is 0.6" - do you mean 0.6s, ie. roughly 18 times slower?19-May-10 17:04
163Terry(against has = against hash)19-May-10 17:02
162TerryYeah, i've tried some actual data finding 3270 matches out of a hash that is 732981 in length.. when it's block the search takes .033 s, and same run against has is 0.6

but if the matches are just a few, hash is 1000x faster

19-May-10 17:02
161LadislavI think, that it is quite natural. You should probably generate some random data having (approximately) similar properties as what you intend to process and try some variant approaches to really find out, which one is best for the task. Do you know, that it is possible to index just a specific record field, i.e. you don't need to make a hash containing all the data from the database?19-May-10 16:53
160TerryLooking at Maxim's ulitmate-find (above, monday 11:32) , does anyone have an idea why when dealing with hash! , the more matches it finds, the slower it gets?19-May-10 16:44

Return to Index Page