Resolved by WBS00001!

Reading Through an RSS File to Get Specific Data

I have been able to get and save the attached file to my hard drive. I can read it, but I cannot figure out how to pull out just the data I want. In this case it is a stock quote for VZ; I want to pull out just the symbol and the stock price.

I would appreciate any examples of how to accomplish that.

Thank You


#17  

No, I don't think I need all the indexes (although someone else might want them all). I would like the DJIA, NASDAQ, and NYA. Let me try attaching one more time.

NewData.zip

#18  

While doing the revisions I found more characters that the script language doesn't like, so I was forced to revise the HTMLScrub program yet again. I'm afraid you will have to overwrite the existing one with the one I have attached to this post. At least this time it is compatible with the previous script. I tested it with many sites and ARC accepted all the modified results with no problem, so perhaps this will be the last revision for that reason.

Also, I have attached the new script for the new web site. It contains extensive revisions, so you may want to just use it as-is and change the appropriate lines again. These are:


#The full path and name of the saved web page file.
$VZFile = "C:\VZ.txt"

#The full path and name of the scrubbed (modified) copy of that file.
$ModVZFile = "C:\ModVZ.txt"

#The full path and name of the HTML scrubber program.
$ScrubProg = "F:\HTML_ScrubD7\HTMLScrub.exe"

Change the paths to where the files are on your computer.

There is a new variable:


$IndexesToGet = "DJIA,COMP,NYA"   #Note: COMP is NASDAQ

It contains the indexes you mentioned. If you want to add more, just add them to the list with a comma between each one (for example, "DJIA,COMP,NYA,SPX", if SPX turns out to be the name the site uses for the S&P 500). The names placed in the list will, however, need to be the ones used in the actual web site code, as in the example shown (COMP is NASDAQ). They also need to be in the same case as in the web code; it looks like they are all uppercase.

Basically, the script works by looping through the $IndexesToGet string, extracting each name in turn (plus a prefix specific to the area of code we are looking for) and searching for that name in the modified web code. Once a name is found, the lines which follow it are short and simple to pull the data from. The outer repeatwhile loop extracts the names of the indexes to find, while the inner repeatwhile loop actually finds each one and processes the data. A rough sketch of that flow follows.
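
To make the flow concrete, here is the same outer/inner loop idea sketched in Python rather than in the ARC script language. It is not the attached script: the prefix text, the file path, and the assumption that the quote data sits on the line right after the index name are placeholders.

# Sketch only -- illustrates the outer/inner loop idea, not the actual ARC script.
indexes_to_get = "DJIA,COMP,NYA"       # COMP is NASDAQ
mod_file = r"C:\ModVZ.txt"             # the scrubbed copy of the web code
prefix = "SOME_PREFIX_"                # placeholder for the real search prefix

with open(mod_file, encoding="utf-8", errors="ignore") as f:
    lines = f.read().splitlines()

# Outer loop: take each index name from the list in turn.
for name in indexes_to_get.split(","):
    target = prefix + name             # must match the case used in the web code
    # Inner loop: scan the scrubbed code for that name, then read the short
    # lines that follow it to pick up the quote data.
    for pos, line in enumerate(lines):
        if target in line and pos + 1 < len(lines):
            value = lines[pos + 1].strip()   # placeholder: data assumed on the next line
            print(name + " = " + value)
            break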

Let me know if you have problems or don't understand something.

HTMLScrubFiles.zip

#19  

I used the script in the zip file and modified all the location variables, but HTMLScrub is not creating the ModVZ file. I searched my entire hard drive just to be sure I didn't misplace it, but it is not anywhere. Thoughts?

#20  

Never mind, figured it out. I removed the config file, added it back, and it all works. Will test and see if any issues come up.

#22  

I was running some tests on the Yahoo web site before posting a response and found out it is nearly a megabyte in size. That makes for a huge file to process. In addition, it has tons of characters which must be scrubbed before the Script Language will accept it. It takes 2 to 3 minutes to scrub the whole thing, depending on the speed of the computer used. In this case particularly, that is unnecessary since we only need a small portion of the code at the beginning. By comparison, the Marketwatch site was only about 70K in total size.

I decided the easiest way to address this problem would be to place a limit on the file size processed. This is done by setting a number in the HTMLScrub.ini file. Currently this number is set to 100K, but it can be changed to whatever is desired by editing the ini file. What it does is take up to 100K bytes of whatever web site is used; if you need it to take in more, just change the number. Keep in mind, though, that the more it has to take in, the longer it will take to scrub it for use by ARC.
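
If it helps to picture what the limit does, here is a small sketch of the idea in Python (this is not the HTMLScrub source code; the path is just the example file from earlier in the thread):

# Sketch only -- shows the effect of the size limit, not the actual HTMLScrub.exe code.
MAX_BYTES = 100 * 1024                 # the 100K default, taken from the number in HTMLScrub.ini

with open(r"C:\VZ.txt", "rb") as f:    # the saved web page
    raw = f.read(MAX_BYTES)            # read at most the first 100K bytes of it

# The character scrubbing then runs on 'raw' only, so a page of nearly a
# megabyte, like the Yahoo one, is cut down before any cleaning starts.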

When you run the new HTMLScrub.exe program for the first time, it will automatically add those two lines to the ini file so you don't have to do anything in that regard.

I will address your question in the next post.

The new HTMLScrub.exe file is attached to this post.

HTMLScrub.zip

#23  

I have a question before I use the new HTMLScrub file: will it cause a problem with the older version? The older version is working like a champ, and I have tweaked the script so it does exactly what I want. Let me know your thoughts before I download and install.

#24  

It should be fine. The web site that relates to the current script is well under 100K in size. Of course, it's always wise to save the older executable file just in case, but I can't think of anything this change would affect in what you have now.