Reading Through An RSS File To Get Specific Data

PRO

dbeard

USA

Asked Sep 2015 — Edited Nov 2016

Resolved by WBS00001!


I have been able to get the attached file and save it to my hard drive. I can read it, but I cannot figure out how to pull out just the data I want. In this case it is a stock quote for VZ, and I want to pull out only the symbol and the stock price.

I would appreciate any examples of how to accomplish that.

Thank You



WBS00001

USA

#9 Sep 2015

I have a solution for you. It's not great, but it's workable. I have attached a zip file to this post. It contains two files: HTMLScrub.exe and HTMLScrub.ini. Put them in whatever directory you like. I have also uploaded a project, called "HTML Reader", with a script for reading the data from the website that you used before to get the quote data. In it you will find a script called "Test". This script reads a modified file derived from the website download.

The script first calls the website and downloads the data, as you did before. It then calls the HTMLScrub.exe program, which modifies the file to make it ready for reading by the script and then closes itself automatically. It only takes a couple of seconds to do its thing.

The rest of the script reads the modified file line by line, looking for each piece of data and reading it out as soon as it is found.

At the moment it says the data to the PC via SayWait commands. To make it go to the EZB4 you will need to change those SayWaits to SayEzbWait.

As you go through the script you will see places where you can modify the text to say whatever you wish for each category.

Before any of this will work, however, you will need to put in the path to where the VZ.txt file should go. You will need to do this in two places.

One is in the HTMLScrub.ini file. Open it with Notepad and you will see a line which currently says "FilePath=C:\VZ.txt". Change the path to wherever you want the VZ.txt file to go. I believe you had it going to "C:\Users\Public\Documents" before. That will be fine here too. You can change the name of the file to whatever you wish as well. So you could change "C:\VZ.txt" to "C:\Users\Public\Documents\VZ.txt". Then save the file.

That's all that is needed as far as the HTMLScrub.exe program is concerned. It will modify the file and create an output file which will have the prefix "Mod" attached to it. In this case it would be ModVZ.txt. It will be the ModVZ.txt file that the script will use.
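The naming rule for the output file can be sketched in a few lines. This is only an illustration in Python (not EZ-Script, and not the actual HTMLScrub.exe code); the function name mod_output_path is hypothetical, and it assumes the output file lands in the same directory as the input, as the ModVZ.txt example suggests.

```python
import ntpath  # Windows-style path handling; behaves the same on any OS


def mod_output_path(file_path, prefix="Mod"):
    """Return file_path with prefix added to the front of the file name."""
    folder, name = ntpath.split(file_path)
    return ntpath.join(folder, prefix + name)


# The two paths used in this thread
print(mod_output_path(r"C:\VZ.txt"))
print(mod_output_path(r"C:\Users\Public\Documents\VZ.txt"))
```

Whatever path goes into the FilePath= line, the matching "Mod" path is the one the script has to be pointed at.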

Then you need to modify the file paths in the script to match. The script contains the following lines:


#Enter the full path and name of the files to be used
$VZFile ="C:\VZ.txt"
$ModVZFile ="C:\ModVZ.txt"

#The full path and name of the HTML scrubber program.
$ScrubProg ="F:\HTML_ScrubD7\HTMLScrub.exe"

You will need to change each of these paths to whatever you will use. If you change the name of the file, that will have to be reflected in the script as well.

For instance you would do the following if you put C:\Users\Public\Documents\VZ.txt in the HTMLScrub.ini file:


$VZFile ="C:\VZ.txt"
$ModVZFile ="C:\ModVZ.txt"

#Would be changed to:

$VZFile ="C:\Users\Public\Documents\VZ.txt"
$ModVZFile ="C:\Users\Public\Documents\ModVZ.txt"

The $ScrubProg ="F:\HTML_ScrubD7\HTMLScrub.exe" line would be changed to wherever you put the HTMLScrub.exe and HTMLScrub.ini files. If you put them in "C:\Program Files\HTML_Scrub", for example, then that is what you would put in the script, as in:


$ScrubProg ="C:\Program Files\HTML_Scrub\HTMLScrub.exe"

And that's it. If all went well, when you run the script it should get the latest data from the website and then say what the values are.

Let me know if you have trouble and/or questions as to how it all works.

HTMLScrubber.zip

WBS00001

USA

#10 Sep 2015

Just to let you know, I revised the project (HTML Reader) to make the code more efficient and uploaded it again.

EDIT: I forgot to mention that the HTMLScrub.exe file also had to be modified to go along with the revised project file. I have attached it to this post. Please unzip the executable and put it in the directory where you placed the other one. You might want to save the old HTMLScrub.exe file just in case: rename it to something else and leave it in the directory, then copy the new one to that directory. Do the same with the HTML Reader project file: rename the current one you have and download the new one. So, basically, new project, new executable; old project, old executable.

If you use the new one, you will have to change the settings in the Test script as well, as you did with the old one. There is a new setting in the new project file that makes it easier to switch between sending the speech to the PC and sending it to the robot. At the top are the lines:


$Silence =0
$SendSpeechToPC =1
$SendSpeechToRobot =2
$SendSpeechTo =$SendSpeechToPC #Default. Change as needed

Just set $SendSpeechTo to whatever you wish to do.
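For anyone curious how a single setting like this can steer the output, here is a rough sketch in Python rather than EZ-Script. It is not the code from the HTML Reader project: the function speech_command is hypothetical, and the mapping to SayWait and SayEzbWait is assumed from the commands mentioned earlier in this thread.

```python
# Same three values as the settings at the top of the script
SILENCE = 0
SEND_SPEECH_TO_PC = 1
SEND_SPEECH_TO_ROBOT = 2


def speech_command(send_speech_to):
    """Return the name of the speech command to use, or None for silence."""
    if send_speech_to == SEND_SPEECH_TO_PC:
        return "SayWait"      # speak through the PC
    if send_speech_to == SEND_SPEECH_TO_ROBOT:
        return "SayEzbWait"   # speak through the robot
    return None               # SILENCE or any unknown value: say nothing


print(speech_command(SEND_SPEECH_TO_PC))
```

Keeping the choice in one variable means the destination is changed in one place instead of editing every speech line.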

HTMLScrub.zip

dbeard

PRO

USA

#11 Sep 2015

Back from Labor Day weekend, will give it a try.

dbeard

PRO

USA

#12 Sep 2015

It all works, with the exception of the Percent Change line. It reads some of the control characters before it gets to the data, but I may be able to figure it out by editing the script. Thanks for all your hard work; hopefully this will be useful for others also.

dbeard

PRO

USA

#13 Sep 2015

Another question. I am using another file, and I can locate the string I am looking for but not the value. When counting lines, do I exclude the line I found the string on, or do I include it?

WBS00001

USA

#14 Sep 2015

Sorry you're having problems with the script. I'll take the second question first. Yes, you include the line the string is on in the count.
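To make the counting rule concrete, here is a small sketch in Python (not EZ-Script, and not the code from the project). The function name value_after_label and the sample lines are made up for illustration; the only thing taken from the answer above is that the line containing the string counts as line 1.

```python
def value_after_label(lines, label, count):
    """Find the first line containing label, then return the line reached by
    counting count lines, with the label line itself counted as line 1."""
    for i, line in enumerate(lines):
        if label in line:
            target = i + count - 1  # count=1 is the label line itself
            return lines[target] if target < len(lines) else None
    return None  # label never found


# Hypothetical file contents, only to show the counting
sample = ["Some header", "Last Trade", "", "45.10"]
print(value_after_label(sample, "Last Trade", 3))
```

In the sample, the value sits two lines below the label, so the count is 3.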

What is happening in the Percent Change line is that they changed the color in the HTML code from red to green, and I was keying in on the phrase 'red>', which is now 'green>'. So, to get it working again for the moment, you can change that part in the script. Obviously that is not a good long-term solution, since it will change again. I was afraid of this and had a backup (read 'more complicated') plan to overcome it.

In that regard, I am working out a different method for getting to the desired data. The data are bounded by a '>' and '<' symbol pair, or by a '>' and the '&nbsp' phrase (&nbsp = non-breaking space). There is one which has just '&nbsp' phrases bounding it.

The problem is there may be multiple such symbols and phrases in the incoming lines of text. So the method I will employ will allow for multiple characters or phrases to be sent to the line processing routine (:ExtractVal) so that the method can take the text away in chunks until it gets to the right one. Sort of like Pacman taking bites of the text until it gets to the data you want. This will be a better long term solution since it uses the basic structure of the line instead of any specific text in it.

To do that you would have to count the '>' or the '&nbsp' phrases (or whatever may be used in the future) ahead of the one that is just in front of the data to be spoken. Then, what is left of the line is processed as before. In this case the line looks like this:


<TD nowrap align=right width=50%><font color=green>2.37%< font>&nbsp

#Here, we want to use the '>' symbol
#After the first bite is taken out of it, the result will be:
<font color=green>2.37%< font>&nbsp

The next IndexOf for the '>' symbol that is done on the line will now find the '>' just ahead of the 2.37% value we are looking for.

To do this, I will change the $StartStr lines to include multiple symbols or phrases, separated by the vertical bar character ( | ). It is usually found on the same key as the backslash ( \ ). In this case it would be:


$StartStr =">|>"

#In other cases it could be like this:
$StartStr =">|>|>" #Two '>' symbols ahead of the one we want.

In other cases there are no extra symbols ahead of the one we want like this one:


<TD nowrap>&nbsp As of: 8 Sep 2015 16:00:00 EDT&nbsp

Here we key in on a &nbsp phrase and there is only one ahead of the data, so the $StartStr will simply be:


$StartStr ="&nbsp"

No separation characters needed.
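Until the revised script is posted, the "bites" idea can be sketched roughly like this. It is written in Python rather than EZ-Script and is not the actual :ExtractVal routine; the function name extract_val and the end_str parameter are assumptions made for the sketch, while the sample lines and the $StartStr values are the ones shown above.

```python
def extract_val(line, start_str, end_str):
    """Bite off the text up to and including each start marker in turn,
    then return whatever precedes end_str. Returns None if a marker is missing."""
    rest = line
    for marker in start_str.split("|"):   # markers are separated by '|'
        pos = rest.find(marker)
        if pos == -1:
            return None
        rest = rest[pos + len(marker):]   # take one bite
    end = rest.find(end_str)
    if end == -1:
        return None
    return rest[:end].strip()


pct_line = "<TD nowrap align=right width=50%><font color=green>2.37%< font>&nbsp"
date_line = "<TD nowrap>&nbsp As of: 8 Sep 2015 16:00:00 EDT&nbsp"

print(extract_val(pct_line, ">|>", "<"))           # one extra '>' ahead of the data
print(extract_val(date_line, "&nbsp", "&nbsp"))    # no extra markers
```

Because it relies only on the order of the markers, the same call keeps working whether the font color in the line is red or green. One limitation of this sketch is that the '|' character itself cannot be used as a marker.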

I should have this ready sometime today and will simply put the revised script code directly in the post for you to copy and paste into your script, along with instructions.

dbeard

PRO

USA

#15 Sep 2015

Before you put too much work into this: I started thinking about it, and I think it would be better to get a status on the overall market than on individual stocks. So I found a website that provides data on market indexes. It is http://www.marketwatch.com/tools/marketsummary/indices/indices.asp?indexid=1&groupid=37.

I have attached the two files in a zip.

Let me know what you think.

WBS00001

USA

#16 Sep 2015

I'm afraid you didn't attach the files to the post. Nonetheless, the same principles will apply regardless of what website is used, so the same code can be used.

What is it you would like to have read out from the site? If it is all the indexes, that can be done without much modification, perhaps by putting in a loop that reads the text from the website index by index, for code efficiency. If it is specific indexes, that can be done by creating a string containing all the appropriate index names.

dbeard
