Tags: ascii, astring, characters, peeps, perl, printable, programming, regex, values
Regex to remove non printable characters
5,978 words with 9 Comments; published: Wed, 30 Apr 2008 18:45:00 GMT
Hi peeps,
I'd like to remove all characters with ascii values > 127 from a
string...that's to say i'd like to remove non printable chars...
is the following fine?
my $input =~ s/[^ -~]+//g;
thanks ever so much!
http://perl.itags.org/q_perl_69825.html
All Comments
- 9 Comments

- At 2007-12-21 09:54PM, "Larry" wrote:
> Hi peeps,
> I'd like to remove all characters with ascii values > 127 from a
> string...that's to say i'd like to remove non printable chars...
You might want:
$string =~ s/\P{IsPrint}//g;
See perldoc perlre
Glenn Jackman
"You can only be young once. But you can always be immature." -- Dave Barry
#1; Wed, 30 Apr 2008 18:46:00 GMT

- On Sat, 22 Dec 2007 03:54:33 +0100, Larry <dontmewithme.perl.itags.org.got.it> wrote:
> I'd like to remove all characters with ascii values > 127 from a
ASCII is a 7-bit encoding system where the eighth bit is sometimes used as
a parity bit. There are no ASCII characters > 127, therefore your request
doesn't make sense.
>string...that's to say i'd like to remove non printable chars...
In case you are not talking about ASCII but about e.g. Windows-1252 or
ISO-Latin-x or any of the dozen other code pages that share the lower 128
characters with ASCII then please be advised that the vast majority of
those characters > 127 _ARE_ printable, at least in your typical commonly
used code pages.
The non-printable characters can be found in the lower part from 0x00 to
0x1F, no matter if ASCII or Windows-1252 or ISO-Latin-x or many, many
others.
Therefore your request makes even less sense. Maybe you want to clarify
first what you are talking about?
>is the following fine?
>my $input =~ s/[^ -~]+//g;
Once the stray my is dropped, that will remove every character outside the
range from space (0x20) to tilde (0x7E): the control characters, DEL, and
everything above 127. That is neither exactly non-printable nor exactly
non-ASCII.
jue
#2; Wed, 30 Apr 2008 18:47:00 GMT
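
To make the distinction drawn in the post above concrete, here is a minimal sketch that treats "non-ASCII" and "non-printable" as two separate filters. The sample string and the variable names are made up for illustration, and the string is assumed to be a plain 8-bit (Latin-1 style) byte string.

```perl
use strict;
use warnings;

# Hypothetical sample: plain text, a Latin-1 e-acute (0xE9) and a tab (0x09).
my $sample = "caf\xE9\tbar";

# Filter 1: remove everything outside 7-bit ASCII (anything above 0x7F).
(my $ascii_only = $sample) =~ s/[^\x00-\x7F]+//g;

# Filter 2: remove ASCII control characters (0x00-0x1F and DEL, 0x7F).
(my $no_controls = $sample) =~ s/[\x00-\x1F\x7F]+//g;
```

The first filter keeps the tab but drops the accented letter; the second does the opposite, which is why the two requests cannot be treated as one.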

- Larry wrote:
> I'd like to remove all characters with ascii values > 127 from a
> string
$input =~ s/[^[:ascii:]]+//g;
>...that's to say i'd like to remove non printable chars...
$input =~ s/[^[:print:]]+//g;
> is the following fine?
> my $input =~ s/[^ -~]+//g;
my() creates a new variable with no contents so there is nothing for the
substitution operator to remove.
$ perl -wle'my $input =~ s/[^ -~]+//g;'
Use of uninitialized value in substitution (s///) at -e line 1.
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
#3; Wed, 30 Apr 2008 18:48:00 GMT
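
As a small illustration of the my() point above, the variable can be declared and filled first, and the substitution applied afterwards. This is only a sketch: the sample data is invented, and the /a modifier (available from Perl 5.14) is an addition here that keeps [:print:] restricted to the ASCII range 0x20-0x7E.

```perl
use strict;
use warnings;

# Declare and assign first; the substitution needs existing content to work on.
my $input = "abc\x01\x02def\x7F";

# Keep a copy so the original poster's character class can be compared.
my $copy = $input;

# POSIX class, limited to ASCII by /a (requires Perl 5.14 or later).
$input =~ s/[^[:print:]]+//ga;

# The original class "space through tilde" covers the same ASCII range.
$copy =~ s/[^ -~]+//g;
```

On this sample both substitutions leave only the letters, since every other character lies outside the space-to-tilde range.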

- In article <fe0bj.9527$wy2.5863.perl.itags.org.edtnps90>,
"John W. Krahn" <dummy.perl.itags.org.example.com> wrote:
> $input =~ s/[^[:ascii:]]+//g;
>
> $input =~ s/[^[:print:]]+//g;
is this fine?
$input =~ tr/\x80-\xFF//d;
#4; Wed, 30 Apr 2008 18:49:00 GMT

- Larry schreef:
> John W. Krahn:
> [remove non printable chars]
> is this fine?
> $input =~ tr/\x80-\xFF//d;
No. How about chr(0x00)..chr(0x1F)?
And characters > "\x{FF}"?
Affijn, Ruud
"Gewoon is een tijger."
#5; Wed, 30 Apr 2008 18:50:00 GMT

- On Sat, 22 Dec 2007 05:53:18 +0100, Larry <dontmewithme.perl.itags.org.got.it> wrote:
>In article <fe0bj.9527$wy2.5863.perl.itags.org.edtnps90>,
> "John W. Krahn" <dummy.perl.itags.org.example.com> wrote:
>
>is this fine?
>$input =~ tr/\x80-\xFF//d;
Depends what you are looking for (you still didn't clarify).
It will remove non-ASCII characters in the typical 8-bit encodings.
It will _NOT_ remove non-printable characters.
Maybe you should make up your mind and let us know _which_ of these two
you are actually trying to do.
jue
#6; Wed, 30 Apr 2008 18:51:00 GMT
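
The objections in the last two replies can be sketched with tr/// itself. The sample string below is hypothetical; it mixes a control character, a character in 0x80-0xFF and one above 0xFF, and it assumes a decoded character string rather than raw bytes.

```perl
use strict;
use warnings;

# Hypothetical sample: bell (0x07), e-acute (0xE9) and U+263A between letters.
my $text = "a\x{07}b\x{E9}c\x{263A}d";

# The proposed tr only covers 0x80-0xFF: the bell and U+263A survive.
(my $narrow = $text) =~ tr/\x80-\xFF//d;

# c complements the list, d deletes: drop everything NOT in 0x00-0x7F.
(my $ascii = $text) =~ tr/\x00-\x7F//cd;

# Separately drop the ASCII control characters and DEL.
(my $printable = $ascii) =~ tr/\x00-\x1F\x7F//d;
```

Keeping the two deletions as separate steps makes it explicit which of the two goals, ASCII-only or printable-only, each line serves.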

- Larry wrote:
> Hi peeps,
> I'd like to remove all characters with ascii values > 127 from
> a string...that's to say i'd like to remove non printable chars...
> is the following fine?
> my $input =~ s/[^ -~]+//g;
> thanks ever so much!
Maybe this will do it:
$input =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]//g;
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)
Please reply to <petr AT practisoft DOT cz>
#7; Wed, 30 Apr 2008 18:52:00 GMT

- On Sat, 22 Dec 2007 05:53:18 +0100
Larry <dontmewithme.perl.itags.org.got.it> wrote:
> In article <fe0bj.9527$wy2.5863.perl.itags.org.edtnps90>,
> "John W. Krahn" <dummy.perl.itags.org.example.com> wrote:
>
> is this fine?
> $input =~ tr/\x80-\xFF//d;
Your subject line says you want a regex. The tr/// operator doesn't use
regular expressions.
John
#8; Wed, 30 Apr 2008 18:53:00 GMT

- "John W. Krahn" <krahnj.perl.itags.org.telus.net> wrote:
>Larry <dontmewithme.perl.itags.org.got.it> wrote:
>Your subject line says you want a regex. The tr/// operator doesn't use
>regular expressions.
Good point. However, if we are splitting hairs, then let's be accurate:
regular expressions match a string, but by themselves they never remove
anything, which is what the OP asked for. Taken literally, the OP's
question is therefore nonsensical in the first place.
And he still didn't tell us if he wanted to remove non-ASCII or
non-printable, two very different categories which have no relationship with
each other whatsoever.
jue
#9; Wed, 30 Apr 2008 18:54:00 GMT