Detecting and extracting a section of text and calculating

ScottM · Post by **ScottM** » Sun Nov 07, 2010 10:35 am

Hello everyone,
I am new to RevMedia and creating my first program. I am finding it difficult to find the appropriate commands to search a line of a text document for a predefined piece of text, read a set number of characters after it, and then do a calculation. Take the following line:

<Sample21131>IX=21131 TC=147 IN=1 TT=L16:03:12:00 TB=00000000 RF=2 RW=1 RA=1 HW=0 IS=PAL OS=625 LA=114 LP=229 DN=0 DL=0 MH=0 DW=1 DA=1 CP=1 UP=153 UM=79 UA=121 VP=164 VM=104 VA=127 NR=1 MD=1 A1=1282 A2=1296 P1=2981 P2=3016 S1=0 S2=0 A3=0 A4=0 P3=0 P4=0 S3=1 S4=1 SV=1 CC=0</Sample21131>

This is what I would like to do within RevMedia:
After scanning the document for <Sample####> (the sample number is incremental within the file; in this case it's 21131), I need to find the text TT=L within the line and extract the number following it (16:03:12:00), then strip out the : between the numbers and perform a calculation on the number.

I am not having much success trying to learn how to do this so hopefully someone can help me out.

Kind regards,
Scott

jmburnod · Post by **jmburnod** » Sun Nov 07, 2010 11:41 am

Hi Scott,
Welcome
i'm a poor frenchy and i'm very happy when i can answer before the english staff

if i understand what you want, you can try this stack

The script of btn "searchMyTime" :

Code: Select all

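on mouseUp
   put fld "fMydata" into bufT
   put "TT=L" into bufS -- marker that precedes the timecode
   put the num of chars of bufS into nbc
   put offset(bufS,bufT) into UnOff
   if UnOff = 0 then exit mouseUp -- marker not found, nothing to extract
   put UnOff+nbc into depPosTime -- first char after the marker
   put depPosTime-1 into EndPosTime -- gives an empty range if no valid char follows
   put "0123456789:" into TheCharOK
   repeat with i = depPosTime to depPosTime+100
      put char i of bufT into UnC
      -- stop at the first char that is not a digit or a colon (or at the end of the text)
      if UnC is not empty and UnC is in TheCharOK then
         put i into EndPosTime
      else
         exit repeat
      end if
   end repeat
   put char depPosTime to EndPosTime of bufT into MyTime
   put MyTime into fld "fMyTime"
end mouseUp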

All the best

Jean-Marc
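
The handler above stops once the timecode is in the field. For the calculation step in the original question, one option is to turn the `HH:MM:SS:FF` string into a single frame count, so that later comparisons become plain integer arithmetic. A minimal sketch, assuming 25 frames per second (suggested by `IS=PAL` in the sample line) and a timecode that is always 11 characters long; `extractTimecode` and `timecodeToFrames` are hypothetical names, not part of either posted stack:

```livecode
-- Returns the 11 characters after "TT=L" (e.g. "16:03:12:00"), or empty if absent.
function extractTimecode pLine
   put offset("TT=L", pLine) into tStart
   if tStart = 0 then return empty
   return char (tStart + 4) to (tStart + 14) of pLine
end function

-- Converts "HH:MM:SS:FF" into a total number of frames.
-- pFps defaults to 25, an assumption based on the PAL flag in the data.
function timecodeToFrames pTimecode, pFps
   if pFps is empty then put 25 into pFps
   set the itemDelimiter to ":"
   if the number of items of pTimecode is not 4 then return empty
   put item 1 of pTimecode into tHours
   put item 2 of pTimecode into tMinutes
   put item 3 of pTimecode into tSeconds
   put item 4 of pTimecode into tFrames
   return ((tHours * 60 + tMinutes) * 60 + tSeconds) * pFps + tFrames
end function
```

Under these assumptions, `timecodeToFrames("16:03:12:00")` works out to ((16 × 60 + 3) × 60 + 12) × 25 + 0 = 1444800 frames.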

bn · Post by bn » Sun Nov 07, 2010 1:08 pm

Hi Scott,

welcome to the forum.

I made a little stack with a lot of comments that should give you some ideas.

Look up in the dictionary what you don't know, or ask here.
The stack has the .livecode suffix. If you change that to .rev, or open the file from within RevMedia, it will work.

best regards

Bernd

Edit: I just saw that Jean-Marc has posted a script. Two is better than one.

ScottM · Post by **ScottM** » Mon Nov 08, 2010 2:25 am

Thank you, Jean-Marc and Bernd; your help is much appreciated.

I used various forms of BASIC about ten years ago and have only just started to get back into developing software. I need to get my head around the scripting language and syntax of RevMedia. The manual helps, but I find learning from others is a lot quicker, so you may find me posting frequently as I continue to learn.

Kind Regards,
Scott

ScottM · Post by **ScottM** » Mon Nov 08, 2010 11:02 am

Hi Bernd and Jean-Marc,

I have another problem. The files I need to extract the information from average 90 MB in size and are XML with an average of 28000 lines of text. Bernd, your example is great, but when I copy and paste a complete XML file into the 'origData' field, RevMedia hangs when trying to extract the values.

I have managed to let the user load the whole file into a field, but what I would like to do is read a line of text from the file and repeatedly extract the TT value until the end of the file is reached, eliminating the need to load all the information into memory. Is this possible?

Kind regards,
Scott
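
Reading a file one line at a time can be done with `open file`, `read from file ... for 1 line` and `close file`. A minimal sketch, assuming each sample sits on its own line and `pFilePath` is a full path obtained elsewhere (for example with `answer file`); the function name `timecodesFromFile` is hypothetical:

```livecode
-- Returns one timecode per line, read from the file without loading it whole.
function timecodesFromFile pFilePath
   open file pFilePath for text read
   if the result is not empty then return empty -- file could not be opened
   put empty into tTimecodes
   repeat forever
      read from file pFilePath for 1 line
      put the result into tStatus -- "eof" once the end of the file is reached
      put it into tLine
      put offset("TT=L", tLine) into tStart
      if tStart > 0 then
         put char (tStart + 4) to (tStart + 14) of tLine & return after tTimecodes
      end if
      if tStatus is not empty then exit repeat -- "eof" or a read error
   end repeat
   close file pFilePath
   delete the last char of tTimecodes -- drop the trailing return
   return tTimecodes
end function
```

The status is checked after the line is processed because the final read can return both the last line and "eof" at the same time.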

bn · Post by bn » Mon Nov 08, 2010 11:11 am

Scott,
could you post a couple of lines representative of the data?

You can access the file line by line, but this is slow. Even the repeat structure I used in the example is slow. There is another one, 'repeat for each line aLine in tData', that is a lot faster. And I guess you could load 90 MB into RAM.

Once we figure out why the script hangs I will post a faster handler; the one I posted is easier to understand. You would also have to indicate in what form you want the extracted data: do you want a list of it?
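
The faster form reads the whole file into a variable once and then walks it with `repeat for each line`, collecting results in a variable instead of a field. A minimal sketch, assuming the file fits in memory and each sample is on its own line; `timecodesFromData` is a hypothetical name, and `tLine` plays the role of `aLine` above:

```livecode
-- Returns one timecode per line, extracted from a file loaded in one go.
function timecodesFromData pFilePath
   put URL ("file:" & pFilePath) into tData -- loads the whole file into memory
   put empty into tTimecodes
   repeat for each line tLine in tData
      put offset("TT=L", tLine) into tStart
      if tStart = 0 then next repeat -- no timecode on this line
      put char (tStart + 4) to (tStart + 14) of tLine & return after tTimecodes
   end repeat
   delete the last char of tTimecodes -- drop the trailing return
   return tTimecodes
end function
```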

regards

Bernd

ScottM · Post by **ScottM** » Sat Nov 13, 2010 7:12 am

Hi,

Sorry for the late reply. The XML is large, so here is the link to download the zipped file from YouSendIt. The link will be valid for 1 week.

https://www.yousendit.com/download/dkly ... VG52Wmc9PQ

As for the data, I would like to query the time code of each sample within the file and check whether it is sequential. If it is not, that would be considered a break in time. I would also like to count any duplicate time codes. The result would be the number of time code breaks and duplicates, if any, within the XML, plus a generated text document listing the break and duplicate points.

I am currently teaching myself about chunks in RevMedia. Is this the right path to take to do the above?

Kind regards,
Scott

bn · Post by bn » Mon Nov 15, 2010 12:18 pm

Scott,

I downloaded your data and made a small stack to extract the occurrences of multiple samples with the same timecode. Have a look and see the comments in the script. It works on your test data but does little error checking, so if your data does not always have exactly the same structure you might have to adapt it or add error checks.

As for the increment in the timecode: there is some inconsistency in your data I don't understand:

<Sample8829>IX=8829 TC=133 IN=1 TT=L15:54:59:23 TB=00000000
<Sample8830>IX=8830 TC=134 IN=1 TT=L15:54:59:24 TB=00000000
<Sample8831>IX=8831 TC=135 IN=1 TT=L15:54:00:00 TB=00000000
<Sample8832>IX=8832 TC=136 IN=1 TT=L15:55:00:01 TB=00000000
<Sample8833>IX=8833 TC=137 IN=1 TT=L15:55:00:02 TB=00000000
<Sample8834>IX=8834 TC=138 IN=1 TT=L15:55:00:03 TB=00000000

If you look at the data you will notice that the increment from seconds to minutes is not consistent for the first entry of the new minute (15:55): it takes the old minute and resets the seconds and frames to zero (15:54:00:00) instead of giving 15:55:00:00.
I am not sure if that is the way you want your data or if this is an error in the time stamping. This happens for all the increments in minutes and also affects the increment in hours:

<Sample16329>IX=16329 TC=209 IN=1 TT=L15:59:59:23 TB=00000000
<Sample16330>IX=16330 TC=210 IN=1 TT=L15:59:59:24 TB=00000000
<Sample16331>IX=16331 TC=211 IN=1 TT=L15:59:00:00 TB=00000000
<Sample16332>IX=16332 TC=212 IN=1 TT=L16:00:00:01 TB=00000000
<Sample16333>IX=16333 TC=213 IN=1 TT=L16:00:00:02 TB=00000000

That is one of the reasons why I did not attempt to look for the breaks: I am not sure about this. It would be flagged as an error, but I don't know if it is supposed to be this way. What happens if you have more than 24 hours?
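
For reference, the timecode expected after a given one can be computed by adding one frame to the total frame count and converting back. A minimal sketch, assuming 25 frames per second (the frame field in the data runs from 00 to 24) and a wrap to 00:00:00:00 after 23:59:59:24, which is a guess because the thread does not say how the recorder handles midnight; `nextTimecode` is a hypothetical name:

```livecode
-- Returns the timecode one frame after pTimecode ("HH:MM:SS:FF").
function nextTimecode pTimecode
   put 25 into tFps -- assumption: PAL, frames numbered 00 to 24
   set the itemDelimiter to ":"
   put item 1 of pTimecode into tHours
   put item 2 of pTimecode into tMinutes
   put item 3 of pTimecode into tSeconds
   put item 4 of pTimecode into tFrames
   -- total frames since 00:00:00:00, plus the one-frame step
   put ((tHours * 60 + tMinutes) * 60 + tSeconds) * tFps + tFrames + 1 into tTotal
   put tTotal mod (24 * 60 * 60 * tFps) into tTotal -- assumed wrap at 24 hours
   -- split the total back into its four fields
   put tTotal mod tFps into tFrames
   put tTotal div tFps into tTotal
   put tTotal mod 60 into tSeconds
   put tTotal div 60 into tTotal
   put tTotal mod 60 into tMinutes
   put tTotal div 60 into tHours
   return format("%02d:%02d:%02d:%02d", tHours, tMinutes, tSeconds, tFrames)
end function
```

Under these assumptions, `nextTimecode("15:54:59:24")` gives 15:55:00:00, which is the value Sample8831 would need in order to be strictly sequential.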

regards

Bernd

ScottM · Post by **ScottM** » Mon Nov 15, 2010 2:09 pm

Wow Bernd,

Thank you, this is very much appreciated. I am amazed how quickly it loaded and produced the list. I have written a similar program in thinBasic (it took me one day) that checks for duplicates and time code errors. It is very, very slow. It would have taken me a very long time to work this out and do it in RevMedia, though I am learning. I have thinBasic create a text file with the results structured as follows.

271591 = 14:18:30:00
272091 = 14:18:00:00
272268 = 14:19:07:01 (Duplicate TC)
272269 = 14:19:07:03
273591 = 14:19:50:00
274091 = 14:20:10:00
274309 = 14:20:28:17 (Duplicate TC)
274310 = 14:20:28:19
274841 = 14:20:40:00
275091 = 14:20:50:00
275341 = 14:21:00:00
276591 = 14:21:00:00
277027 = 14:22:17:10 (Duplicate TC)
277028 = 14:22:17:12
277824 = 14:22:49:07 (Duplicate TC)
277825 = 14:22:49:09
277841 = 14:22:40:00
278091 = 14:22:50:00
278341 = 14:23:00:00
278841 = 14:23:20:00
279591 = 14:23:50:00
280091 = 14:24:10:00
280261 = 14:24:26:19 (Duplicate TC)
280262 = 14:24:26:21
280341 = 14:24:20:00
280479 = 14:24:35:12 (Duplicate TC)
280480 = 14:24:35:14
280868 = 14:24:51:01 (Duplicate TC)
280869 = 14:24:51:03
281032 = 14:24:57:15 (Duplicate TC)
281033 = 14:24:57:17
281091 = 14:24:00:00
281158 = 14:25:02:16 (Duplicate TC)
281159 = 14:25:02:16 (Duplicate TC)
281160 = 14:25:02:16 (Duplicate TC)
281161 = 14:25:02:16 (Duplicate TC)
281162 = 14:25:02:15
281163 = 14:25:02:15 (Duplicate TC)
281164 = 14:25:02:15 (Duplicate TC)
281165 = 14:25:02:15 (Duplicate TC)
281166 = 14:25:02:15 (Duplicate TC)
281167 = 14:25:02:15 (Duplicate TC)

TC Break Count = 739
Duplicate TC Count = 289
Total TC Issues = 1028

The first number is the sample number taken from the XML. The second number is the time code; the list contains time code breaks (any time code that is not sequential, like what you are seeing) and duplicates. The text file then ends with the number of time code breaks, the number of duplicates, and the total number of time code issues.

I am grateful for your support, Bernd, and am learning very fast from your efforts. I was really amazed how quickly RevMedia performed the task. I will be purchasing the complete version of LiveCode by the end of the week.
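
A report in this layout could be produced in LiveCode along the following lines. This is a minimal sketch, not the logic of the thinBasic program or of Bernd's stack: it assumes 25 frames per second, input lines of the form `sampleNumber HH:MM:SS:FF` (space-separated), and no special handling of a wrap past midnight. It treats a timecode equal to the previous one as a duplicate and any other step that is not exactly one frame as a break. `timecodeReport` is a hypothetical name:

```livecode
-- Builds a break/duplicate report from lines like "21131 16:03:12:00".
function timecodeReport pSamples
   put empty into tReport
   put 0 into tBreaks
   put 0 into tDuplicates
   put empty into tPrevFrames
   set the itemDelimiter to ":"
   repeat for each line tEntry in pSamples
      put word 1 of tEntry into tSample
      put word 2 of tEntry into tTimecode
      -- total frames since 00:00:00:00 (25 fps is an assumption)
      put item 1 of tTimecode * 60 + item 2 of tTimecode into tMinutes
      put tMinutes * 60 + item 3 of tTimecode into tSeconds
      put tSeconds * 25 + item 4 of tTimecode into tFrames
      if tPrevFrames is not empty then
         if tFrames = tPrevFrames then
            add 1 to tDuplicates
            put tSample && "=" && tTimecode && "(Duplicate TC)" & return after tReport
         else if tFrames <> tPrevFrames + 1 then
            add 1 to tBreaks
            put tSample && "=" && tTimecode & return after tReport
         end if
      end if
      put tFrames into tPrevFrames
   end repeat
   put return & "TC Break Count =" && tBreaks & return after tReport
   put "Duplicate TC Count =" && tDuplicates & return after tReport
   put "Total TC Issues =" && (tBreaks + tDuplicates) after tReport
   return tReport
end function
```

The returned text can be written to disk with `put timecodeReport(tSamples) into URL ("file:" & tOutputPath)`, where `tSamples` and `tOutputPath` are placeholders for the prepared sample list and the chosen output path.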

Kind Regards,
Scott

bn · Post by bn » Thu Nov 18, 2010 10:31 pm

Scott,

I added the calculation of the time code. The format is not what you want but it puts everything into the field.
No guarantees, though. I did test the code against a small sample with known problems, but it is up to you to make sure it is working correctly
and to adapt the format to your liking.

But with the comments and reading up on LiveCode you should fairly soon be able to do that.

Good luck, and don't hesitate to ask when you get stuck.

regards

Bernd

Edit: Oh, don't be disappointed, it now takes 2.5 seconds to take your 90 MB, 300,000-plus lines apart.

ScottM · Post by **ScottM** » Thu Nov 25, 2010 9:25 am

Lol Bernd, 2.5 seconds, so sloooow

Thank you for your help, I have learnt a lot in a short time.

Kind Regards,
Scott

LiveCode Forums.
