Home
The Present
Military Service
Blair Family History
Blair Family Tree
Blair Database
Blair Publications
Blair Photo Album
O'Hara Family History
McLuckie's
Gilfillan Families
Other Blair Families
Old Parish Records
Blair Links
WebRing
Guest Book
Contact Us

Cleaning the Text Files to Import into a Database and/or Spreadsheet

[OPRs] Home ] Searching and Saving the Indexes ] [ Cleaning Up the Files ] Converting to Database ] Converting to Spreadsheet ]

Several people have asked how I convert the LDS-FHC OPRs text files to a database. Unfortunately there is no simple conversion process that I know of. This section covers cleaning the BIRTH text to a format suitable to import into a database or a spreadsheet.

The LDS-FHC OPR text files are a "fixed length" text file. Each record (individual) is 3 lines long. Each record contains the following fields: Name and Sex, Event, Date, Batch (Ba) Number, Father’s Name, Place (the event occurred), Source (So) Call Number, Mother’s Name, Additional Information, and Printout (PR) Number. Each of these fields is a fixed length. For example the Name and Sex field is 40 characters long and starts at position 10. (Don’t ask me what positions 1 thru 9 are for, I haven’t a clue.)

A typical LDS-FHC OPRs text file looks something like below when viewed in wordPad WITHOUT wrapping text or a word processor set for a LEGAL (14 x 8.5) paper and Courier Font. Some of the spaces have been removed to it fit on the page. For best results, view with the screen maximized. The numbers along left side are provided for reference only.

1.     Scottish Church Records - Version 2.00                                                                               \
2.     25 JUN 1998                                        HOLDING FILE ENTRIES                                       Page 1 \
3.     =====================================================================================================================\
4.                                                                                    Additional            Batch and Source\
5.     Names (Sex)                         Event  Date/Place                          Information           Information     \
6.     ---------------------------------------------------------------------------------------------------------------------\
7.     (continued) 
       Isabella BLAIR (F)....................  C: 23 Sep 1828                                               Ba: C112725     \
         Father: William BLAIR                    Arbroath, Angus, Scotland                                 So: 993332      \
         Mother: Anne FAIRWEATHER                                                                                           \
\
       Isabella Millar BLAIR (F).............  B: 22 Jul 1828                                               Ba: C113224     \
         Father: William BLAIR                    Tealing, Angus, Scotland                                  So: 993509      \
         Mother: Isabella BUCHAN                                                                                            \
\
       Isabell BLAIR (F).....................  C: 7 Feb 1830                                                Ba: C112828     \
         Father: Thomas BLAIR                     Dundee, Angus, Scotland                                   So: 993402      \
         Mother: Isabell MCINTOSH                                                                           Pr: 6900646     \
\
       Isabella BLAIR (F)....................  C: 22 Aug 1831                                               Ba: C113014     \
         Father: George BLAIR                     Liff Benvie And Invergowrie, Angus,                       So: 993484      \
         Mother: Helen FINN                       Scotland                                                                  \
\
       Isabella BLAIR (F)....................  C: 18 Oct 1835                                               Ba: C114406     \
         Father: Alexander BLAIR                  Kinglassie, Fife, Scotland                                So: 1040196     \
         Mother: Catherine LUMSDEN                                                                          Pr: 6900783     \
\
       Isabella BLAIR (F)....................  B: 22 Nov 1841                                               Ba: C112754     \
         Father: William BLAIR                    Brechin, Angus, Scotland            FR1976                So: 993413      \
         Mother: Mary MCINTYRE                                                                                              \
\
       Isabell Ness BLAIR (F)................  C: 28 Mar 1841                                               Ba: C112829     \
         Father: James BLAIR                      Dundee, Angus, Scotland             FR5268                So: 993408      \
         Mother: Margaret NESS                                                                              Pr: 6900646     \
\
       Isabella Buchan BLAIR (F).............  C: 14 Dec 1849                                               Ba: C112829     \
         Father: Francis BLAIR                    Dundee, Angus, Scotland             FR5457                So: 993408      \
         Mother: Jean MURRAY                                                                                Pr: 6900646     \
\
       Isabella BLAIR (F)....................  C: 11 Nov 1849                                               Ba: C114455     \
         Father: Alexander BLAIR                  Leuchars, Fife, Scotland            FR474                 So: 1040189     \
         Mother: Catherine NESS                                                                                             \
\
       Isabella Thomson BLAIR (F)............  C: 9 Aug 1852                                                Ba: C113194     \
         Father: Francis BLAIR                    Saint Vigeans, Angus, Scotland                            So: 993506      \
         Mother: Helen MCKINLAY                                                                                             \
\
       Isaballa BLAIR (F)....................  C: 8 Mar 1853                                                Ba: C114157     \
         Father: David BLAIR                      Ceres, Fife, Scotland               FR1166                So: 1040155     \
         Mother: Emilia HONEYMAN                                                                            Pr: 6900759     \
\
8.     =====================================================================================================================\
9.              Events:  B=Birth  C=Christening  M=Marriage  S=Miscellaneous\
10        Batch/Source:  Ba=Batch  So=Source  Pr=Printout\
11.    Special Symbols:  * Film contains no additional information.   ^\'08- Some information was estimated or altered.\
12.                      @ Names and relationships of others stated in source.   # Additional relatives listed in source.\
13.                         + Additional sources for batch.\
14.    =====================================================================================================================\
15.    (continued)
                      

To do my conversion I use Windows 95 and MS Office 97 (MS Word 97 and Excel 97). I also use Lotus 123 version 5. I know that Lotus WordPro 97 can also be used instead of MS Word 97 to clean up the FHC text files. Some of you have asked about using MS Works – I have NO idea if MS Works will do the conversion, as I have NEVER used it. I also have NO idea about using DOS word processors or any other word processors.

There are three major stumbling blocks to converting the text file to a database. First is the extraneous information at the beginning and end of each page of the text file. All you really want is the information beginning with the name and ending with the PR number. Somehow you have to get rid of all the other junk.

The second problem is the <CR>s at the ends of lines containing the "Real" information. These records are fixed length text and should only have a <CR> after the PR number.

The third problem is that even after you remove the extraneous information and <CR>s you still have to do a lot of fine tuning with the remaining information.

The name field looks like this: James BLAIR (M)........................ To use it in a database you should delete all the ......., separate the (M) from James BLAIR and probably split the name into first and last.

Father and Mothers name should also be split into first and last.

I think you see the pattern.

Before you begin working on any text file MAKE SURE YOU MAKE A COPY OF IT FIRST. DO NOT WORK ON THE ORIGINAL TEXT FILE. The procedures below are complicated and it is easy to make a mistake. After opening a file I always do a "save as" text with a new file name before working on the file.

The first thing I do is open the text file with MS Word 97. On the menu bar click on "File" and "Page Setup". Set the margins, gutter, header and footer to 0. Set the paper size to legal landscape (14 x 8.5). Press OK. Ignore the warning message about the margin size (Press Ignore).

Move the cursor to the very top of the page. I use the "Find and Replace" function to delete unwanted lines. On the menu bar click on EDIT and REPLACE. Put you cursor in the "Find What" Box and type ^m (<shift 6> and then m). Make sure the "Replace With" box is empty. Click "Replace All". This will remove all the manual page breaks inserted between the individual screens you saved.

Starting at the top of the page move your cursor to line 1. " Scottish Church Records - Version 2.00".
        a. Press <Home>
        b. Holding down the <Shift> key press <End> and highlight the entire line
        c. press <CTRL><C> at the same time.
        d. Press <Home>
        e. On the menu bar click on EDIT and REPLACE
        f. Put you cursor in the "Find What" Box and press <CTRL><V> at the same time.
        g. Press ^p (<shift> 6 and p) --- THIS IS IMPORTANT ---
        h. Click "Replace All"
        i You'll see the following message "Word has completed its search of the document and has made xx replacements"
        j. Click on OK
        k. Click on Close

Move the cursor down to line 2., the line with the date.
        Repeat steps a thru g.
        Move your cursor just to left of the "1" in "Page 1".
        Hold down the <Shift> key and press the right arrow. This should highlight the "1"
        Press <^> (<shift> 6) and then <?>
        Repeat steps h thru l above.
        This removes all lines with pages 1 thru 9

On the menu bar click on EDIT and FIND
        In the "Find What" Box type in Page 10 and click "find next"
        Page 10 should be highlighted. Close the "Find" box and press <Home>
        Repeat steps b. thru g. above
        Move your cursor just to left of the "0" in "Page 10".
        Hold down the <Shift> key and press the right arrow. This should highlight the "0"
        Press <^> (<shift> 6) and then <?>
        Repeat steps h thru l above.
        This removes all lines with pages 10 thru 19 (if there are that many pages)

Press <Ctrl><Home> to take you back to the top of the document

Move the cursor down to line 3 and repeat step a. thru k.

Repeat steps a. thru k. for lines 4 thru 10 and lines 12 thru 15. Skip line 11 for the moment. When you have removed the lines 1 thru 16 (except 11) you will have a line with the word "(continue)" on it. Move your cursor to this line and repeat steps a. thru k.

Line 11 is a a bit tricky because it contains "special characters". Move your cursor to line 12. Repeat steps a. thru g. In the "Find What" Box move your cursor to the right of the "^". Hold down the <shift> key and press the right arrow once. This should highlight the space between the "^" and the "-". Press the "^" (<shift>6) twice and then the "?".

It should look like this: ^^^?-.

Repeat steps h thru k above.

You should have now deleted all the unwanted lines of text between the records. Move the cursor to the bottom of the file. You should see a couple of paragraph markers and 4 lines of text. Move the cursor to the first paragraph marker below the last record. Holding the <shift> key press the down arrow until it will go no further. Press <Delete>. All the unwanted text should now be gone.

Press <Ctrl><Home> at the same time. This will move the cursor to the very top of the file. Hold done the <shift> key and press <End>. This should highlight a series of blank spaces and a paragraph marker. Press <Delete>.

You now have a "Clean" document except for unwanted paragraph markers <CRs>. With your cursor at the very top of the document, open the "Replace" Box. In "Find What" Box type ^p^p^p. In the "Replace With" Box type ^p^p. Press "Replace All". When the Replace is complete, close the "Replace" box move the cursor to the top of the document and repeat this Find and Replace. This time it should say 0 replacements. If it doesn't repeat this step until it does say 0 replacements. Close the "Replace" Box.

Move the cursor to the top of the document. Open the "Replace" Box. In "Find What" Box type ^p^p. In the "Replace With" Box type ^p^t^p. Press "Replace All". Close the "Replace" Box.

Move the cursor to the top of the document. Open the "Replace" Box. In "Find What" Box type ^p. In the "Replace With" Box delete everything in it. This box must be empty. Press "Replace All". Close the "Replace" Box.

Move the cursor to the top of the document. Open the "Replace" Box. In "Find What" Box type ^t. In the "Replace With" Box type ^p. Press "Replace All". Close the "Replace" Box.

Move the cursor to the top of the document. Open the "Replace" Box. In "Find What" Box type ^p^p. In the "Replace With" Box type ^p. Press "Replace All". Close the "Replace" Box.

You should now have a TOTALLY CLEAN document. To check, change the Page Setting - Paper Size to 22 x 8.5. Move the cursor to the top of the page. >From the menu bar select EDIT - SELECT ALL. This will highlight the entire document. From the menu bar select FORMAT - FONT. Set the Font to "Courier New" and Font Size to 6. Review the document. Every record should be on a single line. All the fields should line up under each other. If this is happens - CONGRATULATIONS - you now have a file ready to import into Excel 97. Save the clean file as a Text File. I usually give the final file a new name. If everything does not line up you may be able to make manual adjustments to individual lines depending on how badly they align.

A couple of important notes. I use "< >" to indicate specific keys. Just press the keys. ALWAYS make sure you unhighlight any text you have copied from the document into the "Replace" Box before you do a replace. That is why I say press <Home> after pressing <Ctrl><C>. When replacing lines, ALWAYS make sure the "Find What" Box ends with "^p". ALWAYS make sure you save the clean file.

The steps above are very detailed and intended for those who have minimum experience using MS Word 97. Experienced users will realize there are several short cuts which can be taken. The biggest short cut is NOT closing the "Replace" Box every time. You can simple click outside the "Replace" Box when you have finished doing the replace. The box will remain on the screen. This can save a fair amount of time, particularly when doing the final replacements of the ^ps and ^ts. You'll find during these operations that the cursor is already at the top of the document.

Cleaning a text using the above procedure will take about 30 minutes. The size of the file has little bearing on how long the operation takes. I often combine several files into one large file before converting.

These procedures have been thoroughly tested using MS Office 97. As far as I know they will also work using MS Word 6.0 for Windows 3.1. These procedures will also work using Lotus WordPro with some modification. First when you open the file in WordPro the font may need to be changed to "Courier New" and Font Size to 10. Second I don't know of a way to do Find and Replace of a Page Break. You will have to manually delete all the page breaks first. Third use ^r instead of ^p. And finally on removing line 12 - find the "^" in the "Find What" Box. Place the cursor to the left (or right) of it and type ^. You do not have to add the ^?.

Please don't ask about using other programs because I DON"T KNOW. Other word processors with Find and Replace functions may do just fine if they support replacing long lines. Many WP unfortunately will only support a limited number of characters and many will not support replacing special characters such as <Tabs> and <CRs>.

[OPRs] Home ] Searching and Saving the Indexes ] [ Cleaning Up the Files ] Converting to Database ] Converting to Spreadsheet ]

 

Go to Origins.net - Trace your Origins online

 

 

 

 

 

E-Mail comments or question to John A. Blair
©
November 1, 2002, Blairgenealogy.com
This Site was Designed and is Maintained by
Datamation

 


God Bless America and its Allies