The End all of Text Manipulators

When you want to make a new module for theWord (or any similar Bible program), you find the original text in very different states. Sometimes it is HTML on a web page; at other times it is a PDF that needs to be converted to Microsoft Word format before it is easy to use. In many cases, there are formatting problems in the text. I will go through some of these problems here, and more in the future. The point is that you need a text editor with search and replace. Almost any will work, but one stands out above the rest.

Introducing OpenOffice

Why do I have Microsoft Word installed on my computer but use OpenOffice (a free office suite)? Because although you can do these things in Microsoft Word, it is not easy. The folks at OpenOffice have made it so easy that it is ridiculous.

What Is a Text Manipulator?

What we want to do is take any text, analyze its formatting problems, and then correct them into a standard text format.

Example 1 – Extra empty paragraphs between the text-filled paragraphs

bla bla bla

bla bla bla

bla bla bla

This is a simple problem. You just put the cursor on the empty line and press Delete or Backspace. That is fine for most people. But when you want to prepare a text and publish it, you don't want to spend 10 hours doing this. I have had texts 500+ pages long with this problem. Am I going to go through and delete these one by one? No, I need a macro.

You can do this in many text editors (though not ALL of them), so I made a Microsoft Word macro. It was long and complicated (I don't remember exactly, but around 40 lines of code). Then I went bebopping along using my macro and stumbled across a case that didn't work. I took time and studied it: the paragraph mark had a space in front of it. Just seeing and discerning what the problem is can be difficult, because these text editors are not all that good at showing special characters.

Then I fixed the macro, and again it failed: somebody had put a space and a tab before the paragraph mark. Why do people do that? Probably because they don't know how to add spacing below a paragraph. But this is how you find the works and texts you want, and I simply have to deal with it. I am not going to try to correct everybody's word-processing skills, and some of the people who posted these works have since passed away, so it is a moot issue.

With hidden characters (formatting marks) shown in OpenOffice, I can see what is going on.
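To make the idea concrete, here is a small Python sketch of what "showing hidden characters" does: it swaps invisible characters for visible symbols so a stray space or tab before a paragraph mark stands out. This is only an illustration, not OpenOffice code; the name `show_hidden` is hypothetical, a newline stands in for the paragraph mark, and the symbols are similar to, but not guaranteed to match, the ones OpenOffice draws.

```python
# Hypothetical illustration: map invisible characters to visible symbols,
# roughly like a word processor's formatting-marks view.
# Assumption: "\n" stands in for the paragraph mark.
MARKS = {
    " ": "·",     # space
    "\t": "→",    # tab
    "\n": "¶\n",  # paragraph mark (keep the line break so lines still wrap)
}

def show_hidden(text: str) -> str:
    """Return text with spaces, tabs and paragraph marks made visible."""
    return "".join(MARKS.get(ch, ch) for ch in text)

print(show_hidden("bla bla \t\n\nbla"))
```

With the marks visible, the first line reads `bla·bla·→¶`, so the space and tab hiding in front of the paragraph mark are obvious.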

But knowing the problem is not fixing the problem.

The tab issue is very rare (though the text that had it was a book outline with 50 chapters), so I have not made a special adaptation of the macro for it. Instead, I made two macros in OpenOffice: one is “Quick”, and the other is “Full”, for when there are spaces before the paragraph mark.

Introducing Alternative Search and Replace

The reason I use OpenOffice, hands down, is this plugin, which OpenOffice makes room for. From what I know, Microsoft Word doesn't even let you use a third-party plugin like this; if the feature isn't built into the app, sorry. And even when it is built in, using some of these things in Microsoft Word is very difficult.

Here is the Quick macro.

Let me get into the details and interpret this code for you. [Name] is what shows up in the macro list as this macro's name. [all] does nothing by itself; it is just the naming convention, meaning the macro works on all the text and not just the selected text. By contrast, I use [Sel.] for macros that work only on a selection, such as converting paragraphs to linefeeds.

[Name] Text [all] Delete empty Paragraphs Quick

[Find]^$
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

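Each line that starts with a bracketed keyword is a specific instruction in the macro.

[Find] is what is searched for; in this case, ^$ is the code for an empty paragraph.

[Replace] is what each match is replaced with. Here it is empty, so every empty paragraph is simply deleted.

[Parameters] sets the options: MsgOff turns messages off, and Regular makes the search use regular expressions (without it, the search is literal).

[Command] ReplaceAll replaces all occurrences.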
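For readers who know a programming language, here is a rough Python equivalent of the Quick macro, using the standard `re` module instead of Alt Search. It is a sketch under stated assumptions, not the plugin's implementation: each line of the string stands in for one paragraph, a newline stands in for the paragraph mark, and the helper names `replace_all` and `delete_empty_paragraphs_quick` are hypothetical. The `regular` flag mimics the Regular parameter.

```python
import re

def replace_all(text: str, find: str, replace: str, regular: bool = True) -> str:
    """Hypothetical stand-in for [Command] ReplaceAll over the whole text ([all]).

    regular=True mimics the Regular parameter; regular=False makes the
    search literal by escaping every regex metacharacter in `find`.
    """
    if regular:
        pattern, repl = find, replace
    else:
        # Literal mode: escape the search text, and use a function so any
        # backslashes in the replacement are inserted as-is.
        pattern, repl = re.escape(find), (lambda m: replace)
    # MULTILINE lets ^ and $ match at the start/end of every line
    # (each line standing in for one paragraph).
    return re.sub(pattern, repl, text, flags=re.MULTILINE)

def delete_empty_paragraphs_quick(text: str) -> str:
    # "^\n" is a paragraph mark at the very start of a line, i.e. an empty
    # paragraph; replacing it with nothing deletes that paragraph.
    return replace_all(text, r"^\n", "")
```

For example, `delete_empty_paragraphs_quick("bla bla bla\n\nbla bla bla")` leaves the two text paragraphs next to each other. But a "blank" line that actually holds a single space is not empty, so this Quick version leaves it alone, which is exactly the exception described earlier.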

Now the Full macro:

[Name] Text [all] Delete empty Paragraphs Full

[Find] \p
[Replace]\p
[Parameters] MsgOff Regular
[Command] ReplaceAll

[Find] \p
[Replace]\p
[Parameters] MsgOff Regular
[Command] ReplaceAll

[Find] \p
[Replace]\p
[Parameters] MsgOff Regular
[Command] ReplaceAll

[Find]\n\p
[Replace]\p
[Parameters] MsgOff Regular
[Command] ReplaceAll

;You need to have the ^$ last

[Find] $
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

[Find] $
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

[Find] $
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

[Find] $
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

[Find] $
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

[Find]\p{2,}
[Replace]\p\p
[Parameters] MsgOff Regular 
[Command] ReplaceAll
; The above code comes with Alt Search for replacing empty paragraphs

[Find]^$
[Replace]
[Parameters] MsgOff Regular 
[Command] ReplaceAll

As you can see, I do quite a bit of preliminary cleanup (removing spaces and linefeeds in front of paragraph marks) before the macro does the actual work. This is because I have run across exceptions, and although this macro takes longer to run, it is more thorough. When the Quick one doesn't quite do it, I run this one. When there are only 2 or 3 occurrences, this hardly matters, and I just fix them by hand. But when I have 100+ pages to work through, this is gold.
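The same ordering idea can be sketched in Python: a batch is just a list of find/replace steps run in order, and the empty-paragraph step has to come last, because a paragraph holding only spaces is not empty until the spaces are gone. This is a hypothetical stand-in, not Alt Search itself: `FULL_STEPS` and `run_batch` are illustrative names, a newline plays the paragraph mark, and it covers only spaces and tabs (not linefeeds) before the mark. One swapped-in choice: the pattern `[ \t]+$` removes a whole run of spaces and tabs in a single pass, where the macro above repeats ` $` several times.

```python
import re

# Hypothetical Python stand-in for an Alt Search batch: each step is a
# (find, replace) pair applied as a regex "ReplaceAll", in order.
# Assumption: "\n" plays the role of the paragraph mark.
FULL_STEPS = [
    # Remove any run of spaces/tabs just before a paragraph mark.
    (r"[ \t]+$", ""),
    # Delete empty paragraphs. Must run last: a paragraph holding only
    # spaces is not empty until the step above has run.
    (r"^\n", ""),
]

def run_batch(text: str, steps) -> str:
    """Apply each (find, replace) step to the whole text, in order."""
    for find, replace in steps:
        text = re.sub(find, replace, text, flags=re.MULTILINE)
    return text

sample = "bla bla bla \n \n\t\nbla bla bla"
print(run_batch(sample, FULL_STEPS))
```

Running the same two steps in reverse order leaves the whitespace-only paragraphs behind as empty ones, which is why the ^$ step goes last.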

Example 2 – Text that has paragraph breaks where linefeeds would be better, or vice versa

Since I mentioned paragraph-to-linefeed conversion, let me include the code here.

[Name] Text [Sel.] linefeed selected paragraphs
; search paragraph end $, replace with linefeed \n
[Find]$
[Replace]\n
[Parameters] MsgOff Regular CurrSelection 
[Command] ReplaceAll

Note that a semicolon at the beginning of the line is a comment.

You select a number of paragraphs, run this macro, and it converts them. I have found this especially good for poetry: I find how many lines are in the stanza (4 or 5, or even 3), select them, and run the macro. It makes very quick work of a laborious job.
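As a language-neutral picture of what this conversion does to a stanza, here is a small Python sketch. It is not the macro: the document is modeled as a list of paragraphs, so a newline inside one list item is a linefeed (manual line break) rather than a paragraph mark, and the `start`/`end` indices play the role of the selection. Both function names are hypothetical; the second one handles the "vice versa" direction.

```python
def paragraphs_to_linefeeds(paragraphs: list[str], start: int, end: int) -> list[str]:
    """Merge the selected paragraphs[start:end] into one paragraph whose
    lines are separated by linefeeds (hypothetical helper)."""
    if end - start < 2:
        return list(paragraphs)  # fewer than two paragraphs: nothing to merge
    stanza = "\n".join(paragraphs[start:end])
    return paragraphs[:start] + [stanza] + paragraphs[end:]

def linefeeds_to_paragraphs(paragraphs: list[str]) -> list[str]:
    """The reverse: give every linefeed-separated line its own paragraph."""
    return [line for p in paragraphs for line in p.split("\n")]

poem = ["Line one", "Line two", "Line three", "Line four", "Next stanza"]
merged = paragraphs_to_linefeeds(poem, 0, 4)  # "select" a 4-line stanza
```

Here `merged` holds the first four lines as a single paragraph, followed by "Next stanza"; `linefeeds_to_paragraphs(merged)` splits them back into the original five paragraphs.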

The Alt Search and Replace Interface in OpenOffice

You can also use it outside batch mode, in the regular dialog, and it is very powerful there too.