Regex to remove non printable characters

9 comments; published: Wed, 30 Apr 2008 18:45:00 GMT

Hi peeps,

I'd like to remove all characters with ASCII values > 127 from a
string... that's to say, I'd like to remove non-printable chars...

is the following fine?

my $input =~ s/[^ -~]+//g;

thanks ever so much!

All Comments

  • 9 Comments
    • At 2007-12-21 09:54PM, "Larry" wrote:

      > Hi peeps,

      > I'd like to remove all characters with ascii values > 127 from a

      > string...that's to say i'd like to remove non printable chars...

      You might want:

      $string =~ s/\P{IsPrint}//g;

      See perldoc perlre
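A minimal sketch of how the `\P{IsPrint}` substitution above behaves. The helper name `strip_unprintable` and the sample string are made up for illustration; the sketch assumes the string holds characters (code points) rather than undecoded bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): delete every character
# that does NOT have the Unicode "Print" property.
sub strip_unprintable {
    my ($string) = @_;
    $string =~ s/\P{IsPrint}//g;
    return $string;
}

# Tab, BEL and newline are control characters and are removed;
# the accented letter and the symbol are printable and are kept.
my $cleaned = strip_unprintable("tab\there\x{07} caf\x{E9} \x{263A}\n");
```

Note that this keeps printable characters above 127, so it answers the "non-printable" reading of the question, not the "non-ASCII" one.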

      Glenn Jackman

      "You can only be young once. But you can always be immature." -- Dave Barry

      #1; Wed, 30 Apr 2008 18:46:00 GMT
    • On Sat, 22 Dec 2007 03:54:33 +0100, Larry <dontmewithme.perl.itags.org.got.it> wrote:

      > I'd like to remove all characters with ascii values > 127 from a

      ASCII is a 7-bit encoding system where sometimes the eighth bit is used as
      a parity bit. There are no ASCII characters > 127, therefore your request
      doesn't make sense.

      >string...that's to say i'd like to remove non printable chars...

      In case you are not talking about ASCII but about e.g. Windows-1252 or

      ISO-Latin-x or any of the dozen other code pages that share the lower 128

      characters with ASCII then please be advised that the vast majority of

      those characters > 127 _ARE_ printable, at least in your typical commonly

      used code pages.

      The non-printable characters are mostly found in the lower part, from 0x00
      to 0x1F (plus DEL at 0x7F), no matter if ASCII or Windows-1252 or
      ISO-Latin-x or many, many others.
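As a rough illustration of the code-page point, the sketch below decodes a single byte with the core Encode module and asks whether the resulting character is printable. The helper name `is_printable_byte` is hypothetical, and the encoding names are the ones Encode is assumed to accept (`iso-8859-1`, `cp1252`).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode one byte in the given encoding and
# report whether the resulting character has the "Print" property.
sub is_printable_byte {
    my ($encoding, $byte) = @_;
    my $char = decode($encoding, chr($byte));
    return $char =~ /\A\p{IsPrint}\z/ ? 1 : 0;
}

my $e_acute = is_printable_byte('iso-8859-1', 0xE9);  # e with acute accent
my $c1_ctrl = is_printable_byte('iso-8859-1', 0x85);  # a C1 control in ISO-8859-1
my $euro    = is_printable_byte('cp1252',     0x80);  # euro sign in Windows-1252
```

Under these assumptions the accented letter and the euro sign count as printable, while ISO-8859-1 reserves 0x80 to 0x9F for control characters, so "above 127" and "non-printable" are clearly not the same set.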

      Therefore your request makes even less sense. Maybe you want to clarify

      first what you are talking about?

      >is the following fine?

      >my $input =~ s/[^ -~]+//g;

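      Leaving the my() aside: the class [^ -~] matches every character outside
      the range from space (0x20) to tilde (0x7E), so this removes the control
      characters, DEL and everything above 127, and keeps only printable ASCII.
      That is neither exactly "non-ASCII" nor exactly "non-printable", so you
      still need to say which of the two you mean.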
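A quick way to check what the bracket expression `[^ -~]` really covers is to run it on a string that mixes the categories. This is only a sketch; the helper name `keep_printable_ascii` and the sample input are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: " -~" is the range from space (0x20) to tilde (0x7E),
# so the negated class removes everything outside printable ASCII.
sub keep_printable_ascii {
    my ($string) = @_;
    $string =~ s/[^ -~]+//g;
    return $string;
}

# Lower-case letters, space, tilde and "!" lie inside the range and survive;
# tab (0x09), DEL (0x7F) and e-acute (0xE9) lie outside it and are removed.
my $result = keep_printable_ascii("abc xyz~\t\x{7F}\x{E9}!");
```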

      jue

      #2; Wed, 30 Apr 2008 18:47:00 GMT
    • Larry wrote:

      > I'd like to remove all characters with ascii values > 127 from a

      > string

      $input =~ s/[^[:ascii:]]+//g;

      >...that's to say i'd like to remove non printable chars...

      $input =~ s/[^[:print:]]+//g;
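The two substitutions above answer two different questions, which a side-by-side sketch makes visible. The helper names `strip_non_ascii` and `strip_non_print` are hypothetical. How `[:print:]` treats characters above 127 depends on whether Unicode rules are in effect for the string, so the sample for it sticks to ASCII input.

```perl
use strict;
use warnings;

# Hypothetical helper: remove code points above 127.
# Control characters such as tab are ASCII and therefore stay.
sub strip_non_ascii {
    my ($string) = @_;
    $string =~ s/[^[:ascii:]]+//g;
    return $string;
}

# Hypothetical helper: remove characters that are not printable.
# Tab, BEL and newline are control characters and therefore go.
sub strip_non_print {
    my ($string) = @_;
    $string =~ s/[^[:print:]]+//g;
    return $string;
}

my $ascii_only = strip_non_ascii("a\tb\x{E9}c");      # tab kept, e-acute removed
my $printable  = strip_non_print("a\tb\x{07}c\n");    # tab, BEL, newline removed
```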

      > is the following fine?

      > my $input =~ s/[^ -~]+//g;

      my() creates a new variable with no contents so there is nothing for the

      substitution operator to remove.

      $ perl -wle'my $input =~ s/[^ -~]+//g;'

      Use of uninitialized value in substitution (s///) at -e line 1.
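One way around the warning is to give the variable a value before substituting. The sketch below shows two common idioms, assuming the original pattern is kept as is. The helper names are made up, and the `/r` modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: copy first, then modify the copy in place.
sub clean_copy {
    my ($raw) = @_;
    (my $clean = $raw) =~ s/[^ -~]+//g;
    return $clean;
}

# Hypothetical helper: the /r modifier (Perl 5.14+) returns the
# modified copy and leaves the original string untouched.
sub clean_with_r {
    my ($raw) = @_;
    return $raw =~ s/[^ -~]+//gr;
}

my $input   = "caf\x{E9} au lait\x{00}";
my $cleaned = clean_copy($input);   # $input itself is unchanged
```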

      John

      --

      Perl isn't a toolbox, but a small machine shop where you

      can special-order certain sorts of tools at low cost and

      in short order. -- Larry Wall

      #3; Wed, 30 Apr 2008 18:48:00 GMT
    • In article <fe0bj.9527$wy2.5863.perl.itags.org.edtnps90>,

      "John W. Krahn" <dummy.perl.itags.org.example.com> wrote:

      > $input =~ s/[^[:ascii:]]+//g;

      >

      > $input =~ s/[^[:print:]]+//g;

      is this fine?

      $input =~ tr/\x80-\xFF//d;

      #4; Wed, 30 Apr 2008 18:49:00 GMT
    • Larry schreef:

      > John W. Krahn:

      > [remove non printable chars]

      > is this fine?

      > $input =~ tr/\x80-\xFF//d;

      No. How about chr(0x00)..chr(0x1F)?

      And characters > "\x{FF}"?
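Both objections can be reproduced with a short sketch. The helper names are hypothetical, and the second helper is only one possible alternative (it keeps printable ASCII and nothing else), not something proposed in the thread.

```perl
use strict;
use warnings;

# The tr/// from the question: deletes only code points 0x80 to 0xFF.
sub strip_high_bytes {
    my ($string) = @_;
    $string =~ tr/\x80-\xFF//d;
    return $string;
}

# Possible alternative: /c complements the list and /d deletes, so
# everything outside space (0x20) to tilde (0x7E) is removed.
sub keep_ascii_printable_tr {
    my ($string) = @_;
    $string =~ tr/\x20-\x7E//cd;
    return $string;
}

my $sample = "a\x{01}b\x{E9}c\x{263A}";
my $first  = strip_high_bytes($sample);         # control char and U+263A remain
my $second = keep_ascii_printable_tr($sample);  # only "abc" remains
```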

      Affijn, Ruud

      "Gewoon is een tijger."

      #5; Wed, 30 Apr 2008 18:50:00 GMT
    • On Sat, 22 Dec 2007 05:53:18 +0100, Larry <dontmewithme.perl.itags.org.got.it> wrote:

      >In article <fe0bj.9527$wy2.5863.perl.itags.org.edtnps90>,

      > "John W. Krahn" <dummy.perl.itags.org.example.com> wrote:

      >

      >is this fine?

      >$input =~ tr/\x80-\xFF//d;

      Depends what you are looking for (you still didn't clarify).

      It will remove non-ASCII characters in the typical 8-bit encodings.

      It will _NOT_ remove non-printable characters.

      Maybe you should make up your mind and let us know _which_ of these two

      you are actually trying to do.

      jue

      #6; Wed, 30 Apr 2008 18:51:00 GMT
    • Larry wrote:

      > Hi peeps,

      > I'd like to remove all characters with ascii values > 127 from

      > a string...that's to say i'd like to remove non printable chars...

      > is the following fine?

      > my $input =~ s/[^ -~]+//g;

      > thanks ever so much!

      Maybe this will do it:

      $input =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]//g;
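To see which characters that class leaves alone, here is a small sketch applied to an already initialized string. The helper name `strip_petr_class` is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the class above. The ranges skip
# 0x0A (LF) and 0x0D (CR), and they do not include 0x7F (DEL).
sub strip_petr_class {
    my ($string) = @_;
    $string =~ s/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]//g;
    return $string;
}

# Tab and e-acute are removed; CR, LF and DEL pass through.
my $result = strip_petr_class("a\tb\r\nc\x{7F}\x{E9}");
```

So line endings are preserved on purpose, while DEL and any character above 0xFF would still get through.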

      Petr Vileta, Czech republic

      (My server rejects all messages from Yahoo and Hotmail. Send me your

      mail from another non-spammer site please.)

      Please reply to <petr AT practisoft DOT cz>

      #7; Wed, 30 Apr 2008 18:52:00 GMT
    • On Sat, 22 Dec 2007 05:53:18 +0100

      Larry <dontmewithme.perl.itags.org.got.it> wrote:

      > In article <fe0bj.9527$wy2.5863.perl.itags.org.edtnps90>,

      > "John W. Krahn" <dummy.perl.itags.org.example.com> wrote:

      >

      > is this fine?

      > $input =~ tr/\x80-\xFF//d;

      Your subject line says you want a regex. The tr/// operator doesn't use
      regular expressions.

      John

      --

      Perl isn't a toolbox, but a small machine shop where you

      can special-order certain sorts of tools at low cost and

      in short order. -- Larry Wall

      #8; Wed, 30 Apr 2008 18:53:00 GMT
    • "John W. Krahn" <krahnj.perl.itags.org.telus.net> wrote:

      >Larry <dontmewithme.perl.itags.org.got.it> wrote:

      >Your subject line says you want a regex. The tr/// operator doesn't use
      >regular expressions.

      Good point. However, if you are splitting hairs, then let's be accurate:

      Regular expressions match a string but they never remove anything as
      requested by the OP. Therefore, taking the OP's question literally is
      nonsensical in the first place.

      And he still didn't tell us if he wanted to remove non-ASCII or

      non-printable, two very different categories which have no relationship with

      each other whatsoever.

      jue

      #9; Wed, 30 Apr 2008 18:54:00 GMT