Resolved by WBS00001!

Reading Through an RSS File to Get Specific Data

I have been able to get and save the attached file to my hard drive. I can read it, but I cannot figure out how to pull out just the data I want. In this case it is a stock quote for VZ; I want to pull out just the symbol and the stock price.

I would appreciate any examples of how to accomplish that.

Thank You


#17  

No, I don't think I need all the indexes (although someone else might want them all). I would like the DJIA, NASDAQ, and NYA. Let me try attaching one more time.

NewData.zip

#18  

While doing the revisions I found more characters that the script language doesn't like, so I was forced to revise the HTMLScrub program yet again. I'm afraid you will have to overwrite the existing one with the one I have attached to this post. At least this time it is compatible with the previous script. I tested it with many sites and ARC accepted all the modified results with no problem, so perhaps this will be the last revision for that reason.

Also, I have attached the new script for the new web site. It contains extensive revisions, so you may want to just use it as-is and change the appropriate lines again. These are:


#The full path and name of the saved web page file.
$VZFile = "C:\VZ.txt"

#The full path and name of the scrubbed (modified) copy of that file.
$ModVZFile = "C:\ModVZ.txt"

#The full path and name of the HTML scrubber program.
$ScrubProg = "F:\HTML_ScrubD7\HTMLScrub.exe"

Change the paths to where the files are on your computer.

There is a new variable:


$IndexesToGet = "DJIA,COMP,NYA"   #Note: COMP is NASDAQ

It contains the indexes you mentioned. If you want to add more, just add them to the list with a comma between each one (for example, "DJIA,COMP,NYA,SPX", if SPX turns out to be the name the site uses for the S&P 500). The names placed in the list will, however, need to be the ones used in the actual web site code, as in the example shown (COMP is NASDAQ). They also need to be in the same case as in the web code; it looks like they are all uppercase.

Basically, the script works by looping through the $IndexesToGet string, extracting each name in turn (plus a prefix specific to the area of code we are looking for) and searching for that name in the modified web code. Once a name is found, the lines which follow it are short and simple to pull the data from. The outer repeatwhile loop extracts the names of the indexes to find, while the inner repeatwhile loop actually finds each one and processes the data. A rough sketch of that flow follows.
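
To make the flow concrete, here is the same outer/inner loop idea sketched in Python rather than in the ARC script language. It is not the attached script: the prefix text, the file path, and the assumption that the quote data sits on the line right after the index name are placeholders.

# Sketch only -- illustrates the outer/inner loop idea, not the actual ARC script.
indexes_to_get = "DJIA,COMP,NYA"       # COMP is NASDAQ
mod_file = r"C:\ModVZ.txt"             # the scrubbed copy of the web code
prefix = "SOME_PREFIX_"                # placeholder for the real search prefix

with open(mod_file, encoding="utf-8", errors="ignore") as f:
    lines = f.read().splitlines()

# Outer loop: take each index name from the list in turn.
for name in indexes_to_get.split(","):
    target = prefix + name             # must match the case used in the web code
    # Inner loop: scan the scrubbed code for that name, then read the short
    # lines that follow it to pick up the quote data.
    for pos, line in enumerate(lines):
        if target in line and pos + 1 < len(lines):
            value = lines[pos + 1].strip()   # placeholder: data assumed on the next line
            print(name + " = " + value)
            break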

Let me know if you have problems or don't understand something.

HTMLScrubFiles.zip

#19  

I used the script in the zip file and modified all the location variables, but HTMLScrub is not creating the ModVZ file. I searched my entire hard drive just to be sure I didn't misplace it, but it is not anywhere. Thoughts?

#20  

Never mind, figured it out. I removed the config file, added it back, and it all works. Will test and see if any issues come up.

#22  

I was running some tests on the Yahoo web site before posting a response and found out it is nearly a megabyte in size. That makes for a huge file to process. In addition, it has tons of characters which must be scrubbed before the Script Language will accept it. It takes 2 to 3 minutes to scrub the whole thing, depending on the speed of the computer used. In this case particularly, that is unnecessary since we only need a small portion of the code at the beginning. By comparison, the Marketwatch site was only about 70K in total size.

I decided the easiest way to address this problem would be to place a limit on the file size processed. This is done by setting a number in the HTMLScrub.ini file. Currently this number is set to 100K, but it can be changed to whatever is desired by editing the ini file. What it does is take up to 100K bytes of whatever web site is used; if you need it to take in more, just change the number. Keep in mind, though, that the more it has to take in, the longer it will take to scrub it for use by ARC.
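
If it helps to picture what the limit does, here is a small sketch of the idea in Python (this is not the HTMLScrub source code; the path is just the example file from earlier in the thread):

# Sketch only -- shows the effect of the size limit, not the actual HTMLScrub.exe code.
MAX_BYTES = 100 * 1024                 # the 100K default, taken from the number in HTMLScrub.ini

with open(r"C:\VZ.txt", "rb") as f:    # the saved web page
    raw = f.read(MAX_BYTES)            # read at most the first 100K bytes of it

# The character scrubbing then runs on 'raw' only, so a page of nearly a
# megabyte, like the Yahoo one, is cut down before any cleaning starts.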

When you run the new HTMLScrub.exe program for the first time, it will automatically add those two lines to the ini file so you don't have to do anything in that regard.

I will address your question in the next post.

The new HTMLScrub.exe file is attached to this post.

HTMLScrub.zip

#23  

I have a question before I use the new HTMLScrub file: will it cause a problem with the older version? The older version is working like a champ, and I have tweaked the script so it does exactly what I want. Let me know your thoughts before I download and install.

#24  

It should be fine. The web site that relates to the current script is well under 100K in size. Of course, it's always wise to save the older executable file just in case, but I can't think of anything this change would affect in what you have now.