勉強日記

チラ裏

LPIC あずき本v5.0 ch3 3.3 テキスト処理フィルタ

www.shoeisha.co.jp


テキスト処理フィルタ

テキストフィルタコマンド

cat

  • ファイル内容を標準出力
  • ファイル名省略時、標準入力を標準出力
  • -n
    • 行番号表示
ls -l | cat -n
     1  total 64
     2  -rw-r--r--   1 root root 12076 Dec  5 01:37 anaconda-post.log
     3  lrwxrwxrwx   1 root root     7 Dec  5 01:36 bin -> usr/bin
     4  drwxr-xr-x   5 root root   360 Feb 25 02:56 dev
     5  drwxr-xr-x   1 root root  4096 Feb 25 02:56 etc
     6  -rw-r--r--   1 root root    10 Feb 25 03:20 hoge.txt
     7  drwxr-xr-x   2 root root  4096 Apr 11  2018 home
     8  lrwxrwxrwx   1 root root     7 Dec  5 01:36 lib -> usr/lib
     9  lrwxrwxrwx   1 root root     9 Dec  5 01:36 lib64 -> usr/lib64
    10  -rw-r--r--   1 root root  1094 Feb 25 03:31 ls_log
    11  drwxr-xr-x   2 root root  4096 Apr 11  2018 media
    12  drwxr-xr-x   2 root root  4096 Apr 11  2018 mnt
    13  drwxr-xr-x   2 root root  4096 Apr 11  2018 opt
    14  dr-xr-xr-x 174 root root     0 Feb 25 02:56 proc
    15  dr-xr-x---   1 root root  4096 Feb 25 03:02 root
    16  drwxr-xr-x  11 root root  4096 Dec  5 01:37 run
    17  lrwxrwxrwx   1 root root     8 Dec  5 01:36 sbin -> usr/sbin
    18  drwxr-xr-x   2 root root  4096 Apr 11  2018 srv
    19  dr-xr-xr-x  13 root root     0 Feb 25 03:16 sys
    20  drwxrwxrwt   1 root root  4096 Feb 25 03:20 tmp
    21  drwxr-xr-x  13 root root  4096 Dec  5 01:36 usr
    22  drwxr-xr-x  18 root root  4096 Dec  5 01:36 var

nl

  • ファイル内容を行番号付与して標準出力
    • number of line的な意味かな
  • ファイル名省略時、標準入力
ls -n | nl
     1  total 64
     2  -rw-r--r--   1 0 0 12076 Dec  5 01:37 anaconda-post.log
     3  lrwxrwxrwx   1 0 0     7 Dec  5 01:36 bin -> usr/bin
     4  drwxr-xr-x   5 0 0   360 Feb 25 02:56 dev
     5  drwxr-xr-x   1 0 0  4096 Feb 25 02:56 etc
     6  -rw-r--r--   1 0 0    10 Feb 25 03:20 hoge.txt
     7  drwxr-xr-x   2 0 0  4096 Apr 11  2018 home
     8  lrwxrwxrwx   1 0 0     7 Dec  5 01:36 lib -> usr/lib
     9  lrwxrwxrwx   1 0 0     9 Dec  5 01:36 lib64 -> usr/lib64
    10  -rw-r--r--   1 0 0  1094 Feb 25 03:31 ls_log
    11  drwxr-xr-x   2 0 0  4096 Apr 11  2018 media
    12  drwxr-xr-x   2 0 0  4096 Apr 11  2018 mnt
    13  drwxr-xr-x   2 0 0  4096 Apr 11  2018 opt
    14  dr-xr-xr-x 175 0 0     0 Feb 25 02:56 proc
    15  dr-xr-x---   1 0 0  4096 Feb 25 03:02 root
    16  drwxr-xr-x  11 0 0  4096 Dec  5 01:37 run
    17  lrwxrwxrwx   1 0 0     8 Dec  5 01:36 sbin -> usr/sbin
    18  drwxr-xr-x   2 0 0  4096 Apr 11  2018 srv
    19  dr-xr-xr-x  13 0 0     0 Feb 25 03:16 sys
    20  drwxrwxrwt   1 0 0  4096 Feb 25 03:20 tmp
    21  drwxr-xr-x  13 0 0  4096 Dec  5 01:36 usr
    22  drwxr-xr-x  18 0 0  4096 Dec  5 01:36 var

オプション

  • -h STYLE, --header-numbering=STYLE
    • ヘッダに番号付与
    • STYLE
      • a
        • 全行に番号付与
      • n
        • 番号付与しない
      • t
        • 空でない行にのみ番号振る
  • -b STYLE, --body-numbering=STYLE
    • ボディに番号付与
  • -f STYLE, --footer-numbering=STYLE
    • フッタに番号付与
  • -n FORMAT, --number-format=FORMAT
nl --help | nl -b t -n rz
000001  Usage: nl [OPTION]... [FILE]...
000002  Write each FILE to standard output, with line numbers added.
000003  With no FILE, or when FILE is -, read standard input.

000004  Mandatory arguments to long options are mandatory for short options too.
000005    -b, --body-numbering=STYLE      use STYLE for numbering body lines
000006    -d, --section-delimiter=CC      use CC for separating logical pages
000007    -f, --footer-numbering=STYLE    use STYLE for numbering footer lines
000008    -h, --header-numbering=STYLE    use STYLE for numbering header lines
000009    -i, --line-increment=NUMBER     line number increment at each line
000010    -l, --join-blank-lines=NUMBER   group of NUMBER empty lines counted as one
000011    -n, --number-format=FORMAT      insert line numbers according to FORMAT
000012    -p, --no-renumber               do not reset line numbers at logical pages
000013    -s, --number-separator=STRING   add STRING after (possible) line number
000014    -v, --starting-line-number=NUMBER  first line number on each logical page
000015    -w, --number-width=NUMBER       use NUMBER columns for line numbers
000016        --help     display this help and exit
000017        --version  output version information and exit

000018  By default, selects -v1 -i1 -l1 -sTAB -w6 -nrn -hn -bt -fn.  CC are
000019  two delimiter characters for separating logical pages, a missing
000020  second character implies :.  Type \\ for \.  STYLE is one of:

000021    a         number all lines
000022    t         number only nonempty lines
000023    n         number no lines
000024    pBRE      number only lines that contain a match for the basic regular
000025              expression, BRE

000026  FORMAT is one of:

000027    ln   left justified, no leading zeros
000028    rn   right justified, no leading zeros
000029    rz   right justified, leading zeros


000030  GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
000031  Report nl translation bugs to <http://translationproject.org/team/>
000032  For complete documentation, run: info coreutils 'nl invocation'

od

  • バイナリファイルの内容を8進や16進で表示
    • デフォルト8進
    • デフォルト2バイト区切り
man od
...
       -t, --format=TYPE
              select output format or formats
  • -t <format>, --format=<format>
    • フォーマット指定
      • <format>:=(o|x)(1|2|4|8)
      • <format>:=c
  • -c
    • ASCII文字
    • -t cに同じ
  • -o
    • 8進(デフォルト)
      • 2ビット、3ビット、3ビットの3桁
    • -t o2に同じ
  • -x
    • 16進
    • -t x2に同じ
  • octal
od /etc/localtime
0000000 055124 063151 000062 000000 000000 000000 000000 000000
0000020 000000 000000 000000 002000 000000 002000 000000 000000
0000040 000000 004400 000000 002000 000000 006000 000200 000000
0000060 037327 070002 166727 170131 174330 070372 146731 170073
0000100 003733 170000 126733 170035 163334 170342 106335 170377
0000120 000403 000402 000402 000402 000002 101400 000003 000000
0000140 106000 000640 000004 077000 000220 000010 077000 000220
0000160 046010 052115 045000 052104 045000 052123 000000 000000
0000200 000001 000000 052001 064532 031146 000000 000000 000000
0000220 000000 000000 000000 000000 000000 000000 000004 000000
0000240 000004 000000 000000 000000 000012 000000 000004 000000
0000260 174014 000000 000000 000000 177400 177777 062777 122302
0000300 177560 177777 153777 001076 177560 177777 153777 054755
0000320 177760 177777 154377 175370 177560 177777 154777 035715
0000340 177760 177777 155777 000007 177760 177777 155777 016655
0000360 177760 177777 156377 161346 177760 177777 156777 177614
0000400 000360 000403 000402 000402 000402 000002 101400 000003
0000420 000000 106000 000640 000004 077000 000220 000010 077000
0000440 000220 046010 052115 045000 052104 045000 052123 000000
0000460 000000 000001 000000 005001 051512 026524 005071
0000476
  • hex
od -x /etc/localtime 
0000000 5a54 6669 0032 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0400 0000 0400 0000 0000
0000040 0000 0900 0000 0400 0000 0c00 0080 0000
0000060 3ed7 7002 edd7 f059 f8d8 70fa cdd9 f03b
0000100 07db f000 addb f01d e6dc f0e2 8cdd f0ff
0000120 0103 0102 0102 0102 0002 8300 0003 0000
0000140 8c00 01a0 0004 7e00 0090 0008 7e00 0090
0000160 4c08 544d 4a00 5444 4a00 5453 0000 0000
0000200 0001 0000 5401 695a 3266 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0004 0000
0000240 0004 0000 0000 0000 000a 0000 0004 0000
0000260 f80c 0000 0000 0000 ff00 ffff 65ff a4c2
0000300 ff70 ffff d7ff 023e ff70 ffff d7ff 59ed
0000320 fff0 ffff d8ff faf8 ff70 ffff d9ff 3bcd
0000340 fff0 ffff dbff 0007 fff0 ffff dbff 1dad
0000360 fff0 ffff dcff e2e6 fff0 ffff ddff ff8c
0000400 00f0 0103 0102 0102 0102 0002 8300 0003
0000420 0000 8c00 01a0 0004 7e00 0090 0008 7e00
0000440 0090 4c08 544d 4a00 5444 4a00 5453 0000
0000460 0000 0001 0000 0a01 534a 2d54 0a39
0000476
  • ASCII
od -t c /etc/localtime 
0000000   T   Z   i   f   2  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0 004  \0  \0  \0 004  \0  \0  \0  \0
0000040  \0  \0  \0  \t  \0  \0  \0 004  \0  \0  \0  \f 200  \0  \0  \0
0000060 327   > 002   p 327 355   Y 360 330 370 372   p 331 315   ; 360
0000100 333  \a  \0 360 333 255 035 360 334 346 342 360 335 214 377 360
0000120 003 001 002 001 002 001 002 001 002  \0  \0 203 003  \0  \0  \0
0000140  \0 214 240 001 004  \0  \0   ~ 220  \0  \b  \0  \0   ~ 220  \0
0000160  \b   L   M   T  \0   J   D   T  \0   J   S   T  \0  \0  \0  \0
0000200 001  \0  \0  \0 001   T   Z   i   f   2  \0  \0  \0  \0  \0  \0
0000220  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0  \0
0000240 004  \0  \0  \0  \0  \0  \0  \0  \n  \0  \0  \0 004  \0  \0  \0
0000260  \f 370  \0  \0  \0  \0  \0  \0  \0 377 377 377 377   e 302 244
0000300   p 377 377 377 377 327   > 002   p 377 377 377 377 327 355   Y
0000320 360 377 377 377 377 330 370 372   p 377 377 377 377 331 315   ;
0000340 360 377 377 377 377 333  \a  \0 360 377 377 377 377 333 255 035
0000360 360 377 377 377 377 334 346 342 360 377 377 377 377 335 214 377
0000400 360  \0 003 001 002 001 002 001 002 001 002  \0  \0 203 003  \0
0000420  \0  \0  \0 214 240 001 004  \0  \0   ~ 220  \0  \b  \0  \0   ~
0000440 220  \0  \b   L   M   T  \0   J   D   T  \0   J   S   T  \0  \0
0000460  \0  \0 001  \0  \0  \0 001  \n   J   S   T   -   9  \n
0000476
  • ファイル先頭表示
man head
...
       -c, --bytes=[-]NUM
              print the first NUM bytes  of  each  file;  with  the
              leading '-', print all but the last NUM bytes of each
              file

       -n, --lines=[-]NUM
              print the first NUM lines instead of  the  first  10;
              with  the  leading  '-',  print  all but the last NUM
              lines of each file
...
  • -<NUM>, -n [-]<NUM>, --lines=[-]<NUM>
    • 行数指定
      • <NUM>は正の整数
      • 非推奨
    • -nもしくは--lines-をつけて数値指定すると、末尾から指定の行数除いたものを出力
      • tail -n <NUM>と相補的な結果になる
    • 指定なきとき10行
  • -c <NUM>, --bytes=<NUM>
    • バイト数指定
    • KBとか使えるっぽい
       NUM may have a multiplier suffix: b 512, kB 1000, K 1024, MB
       1000*1000, M 1024*1024, GB 1000*1000*1000, G 1024*1024*1024,
       and so on for T, P, E, Z, Y.
  • ぜんぶ
cat /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file
# and files in /etc/cron.d. These files also have username fields,
# that none of the other crontabs do.

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user  command
17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
#
  • 先頭5行
head -n 5 /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file
# and files in /etc/cron.d. These files also have username fields,
# that none of the other crontabs do.
  • 末尾5行以外
head -n -5 /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab you don't have to run the `crontab'
# command to install the new version when you edit this file
# and files in /etc/cron.d. These files also have username fields,
# that none of the other crontabs do.

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user  command
  • 先頭64バイト
head -c 64 /etc/crontab
# /etc/crontab: system-wide crontab
# Unlike any other crontab y

tail

  • ファイル末尾
  • headで指定できることはだいたいできる
man tail
...
       -c, --bytes=[+]NUM
              output the last NUM bytes; or use -c +NUM  to  output
              starting with byte NUM of each file

       -f, --follow[={name|descriptor}]
              output appended data as the file grows;

              an absent option argument means 'descriptor'

       -F     same as --follow=name --retry

       -n, --lines=[+]NUM
              output the last NUM lines, instead of the last 10; or
              use -n +NUM to output starting with line NUM
...
  • リネームを伴うfollowオプションの話
    • -F使え
       With  --follow  (-f),  tail  defaults  to following the file
       descriptor, which means that  even  if  a  tail'ed  file  is
       renamed,  tail will continue to track its end.  This default
       behavior is not desirable when you really want to track  the
       actual  name of the file, not the file descriptor (e.g., log
       rotation).  Use --follow=name in  that  case.   That  causes
       tail  to  track  the  named  file in a way that accommodates
       renaming, removal and creation.
  • +<NUM>
    • 頭から<NUM>-1行除いたものを表示
    • head -n <NUM>等と相補的に見せかけてちょっと違う
  • -f, --follow
    • 末尾に追加された行を表示し続ける
    • Ctrl + Cで終了

cut

  • 射影演算的なやつ
  • 主なオプション
man cut
...
       -c, --characters=LIST
              select only these characters

       -d, --delimiter=DELIM
              use DELIM instead of TAB for field delimiter

       -f, --fields=LIST
              select  only  these fields;  also print any line that
              contains no delimiter character, unless the -s option
              is specified
...
  • <LIST>はカンマ区切りで複数の列または列区間(閉区間)を指定できる
       Use one, and only one of -b, -c or -f.  Each LIST is made up
       of one range, or many ranges separated by commas.   Selected
       input  is  written in the same order that it is read, and is
       written exactly once.  Each range is one of:

       N      N'th byte, character or field, counted from 1

       N-     from N'th byte, character or field, to end of line

       N-M    from N'th to M'th (included) byte, character or field

       -M     from first to  M'th  (included)  byte,  character  or
              field
  • -c <LIST>, --characters=<LIST>
    • 取り出す文字位置指定
  • -d<DELIM>, --delimiter=<DELIM>
    • MySQL-pよろしく、スペース不要
      • あってもいい
    • <DELIM>は1文字
  • -f <LIST>, --fields=<LIST>
    • 取り出すフィールド指定
cut -d:: -f 2 /etc/passwd
/usr/bin/cut: 区切り文字に指定できるのは 1 文字だけです
Try '/usr/bin/cut --help' for more information.
  • 各ユーザのユーザ名とホームディレクトリとデフォルトシェルを得る
cut -d: -f 1,6-7 /etc/passwd
root:/root:/bin/bash
daemon:/usr/sbin:/usr/sbin/nologin
bin:/bin:/usr/sbin/nologin
sys:/dev:/usr/sbin/nologin
sync:/bin:/bin/sync
games:/usr/games:/usr/sbin/nologin
man:/var/cache/man:/usr/sbin/nologin
lp:/var/spool/lpd:/usr/sbin/nologin
mail:/var/mail:/usr/sbin/nologin
news:/var/spool/news:/usr/sbin/nologin
uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:/bin:/usr/sbin/nologin
www-data:/var/www:/usr/sbin/nologin
backup:/var/backups:/usr/sbin/nologin
list:/var/list:/usr/sbin/nologin
irc:/var/run/ircd:/usr/sbin/nologin
gnats:/var/lib/gnats:/usr/sbin/nologin
nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:/run/systemd:/bin/false
systemd-network:/run/systemd/netif:/bin/false
systemd-resolve:/run/systemd/resolve:/bin/false
syslog:/home/syslog:/bin/false
messagebus:/var/run/dbus:/bin/false
_apt:/nonexistent:/bin/false
uuidd:/run/uuidd:/bin/false
rtkit:/proc:/bin/false
avahi-autoipd:/var/lib/avahi-autoipd:/bin/false
usbmux:/var/lib/usbmux:/bin/false
dnsmasq:/var/lib/misc:/bin/false
whoopsie:/nonexistent:/bin/false
kernoops:/:/bin/false
speech-dispatcher:/var/run/speech-dispatcher:/bin/false
avahi:/var/run/avahi-daemon:/bin/false
saned:/var/lib/saned:/bin/false
pulse:/var/run/pulse:/bin/false
colord:/var/lib/colord:/bin/false
hplip:/var/run/hplip:/bin/false
geoclue:/var/lib/geoclue:/bin/false
gdm:/var/lib/gdm3:/bin/false
wand:/home/wand:/bin/bash
cups-pk-helper:/home/cups-pk-helper:/usr/sbin/nologin
gnome-initial-setup:/run/gnome-initial-setup/:/bin/false
hoge:/home/hoge:/bin/sh
sshd:/run/sshd:/usr/sbin/nologin
  • ls -lの、タイムスタンプ以降の部分を表示
ls -l | cut -c 46

join

  • Ver.5でどっかいった
  • SQLでいうところのINNER JOIN

products.txt

1 orange
2 banana
3 apple
4 melon

quantities.txt

1 10
2 3
3 5
join products.txt quantities.txt 
1 orange 10
2 banana 3
3 apple 5

paste

  • ファイルを行ごとに水平連結
    • cf. catは追記
  • デフォルトタブ区切り
paste quantities.txt  products.txt 
1 10 1 orange
2 3 2 banana
3 5 3 apple
    4 melon
  • -d <LIST>, --delimiters=<LIST>
    • デリミタ指定
paste -d: quantities.txt  products.txt 
1 10:1 orange
2 3:2 banana
3 5:3 apple
:4 melon
  • デリミタ複数指定
paste -d:, quantities.txt  products.txt  quantities.txt
1 10:1 orange,1 10
2 3:2 banana,2 3
3 5:3 apple,3 5
:4 melon,

tr

  • Translateの意か
TR(1)                      User Commands                      TR(1)

NAME
       tr - translate or delete characters
  • 主なオプション
man tr
...
SYNOPSIS
       tr [OPTION]... SET1 [SET2]

DESCRIPTION
       Translate,  squeeze,  and/or delete characters from standard
       input, writing to standard output.
...

       -d, --delete
              delete characters in SET1, do not translate

       -s, --squeeze-repeats
              replace each sequence of a repeated character that is
              listed  in  the  last  specified  SET,  with a single
              occurrence of that character
...              
  • オプション無し
    • 文字列<SET1>でマッチした文字列を<SET2>で置換
      • <SET2>は空文字列NG
        • 削除を行いたい場合は-d
  • -d, --delete
    • 置換ではなく、文字列<SET1>でマッチした文字列の削除
  • -s, --squeeze-repeats
    • 連続するパターン文字列を1文字として処理
    • ls -lの空白をまとめるなど
  • 文字クラス
文字クラス 説明
[:alpha:] 英字
[:lower:] 英小文字
[:upper:] 英大文字
[:digit:] 数字
[:alnum:] 英数字
[:space:] スペース
  • ls -lのタイムスタンプ以降を表示する例
ls -l | tr -s [:space:] | cut -d ' ' -f 6-
2019-04-18 09:27 products.txt
2019-04-18 09:25 quantities.txt

sort

  • 行単位でソート
man sort
...
       -b, --ignore-leading-blanks
              ignore leading blanks
...
       -f, --ignore-case
              fold lower case to upper case characters
...
       -r, --reverse
              reverse the result of comparisons
...
       -n, --numeric-sort
              compare according to string numerical value
...
       -k, --key=KEYDEF
              sort via a key; KEYDEF gives location and type
  • 主なオプション
    • -b, --ignore-leading-blanks
      • 行頭の空白無視
    • -f, --ignore-case
      • 大文字小文字の区別無視
    • -r, --reverse
      • 逆順(降順)ソート
    • -n, --numeric-sort
      • 数字を数値扱いしてソート
      • cf. -h, --human-numeric-sort
        • 接頭辞を考慮(KとかGとか)
    • -k, --key=KEYDEF
      • ソートに用いるキー列を指定
    • -c, --check
      • ソートを行わない
      • ソート済かどうか確認する
    • -u, --unique
      • ソートのち重複行の最初の同一行のみ出力
      • -cを併用した場合、下記を確認する
        • ソート済であること
        • かつ、重複行のなきこと
cat sort_sample.txt
hoge
piyo
fuga
hoge
hoge
sort sort_sample.txt | tee sort_sample_sorted.txt
fuga
hoge
hoge
hoge
piyo
sort -u sort_sample.txt | tee sort_sample_uniqued.txt
fuga
hoge
piyo
  • ソート済確認
sort -c sort_sample.txt
/usr/bin/sort: sort_sample.txt:3: 順序が不規則: fuga
sort -c sort_sample_sorted.txt





sort -cu sort_sample_sorted.txt
/usr/bin/sort: sort_sample_sorted.txt:3: 順序が不規則: hoge
sort -cu sort_sample_uniqued.txt



split

  • 指定のサイズごとにファイル分割
SPLIT(1)                                                   User Commands                                                   SPLIT(1)

NAME
       split - split a file into pieces

SYNOPSIS
       split [OPTION]... [FILE [PREFIX]]
split -5 /etc/crontab crontab_
ls crontab_*
crontab_aa  crontab_ab  crontab_ac

uniq

  • デフォルト: 入力テキストストリームの重複行を1行にまとめる
    • ソート済であること
cat sort_sample.txt
hoge
piyo
fuga
hoge
hoge
uniq sort_sample.txt
  • 連続している部分のみ重複が取り除かれる
hoge
piyo
fuga
hoge
  • hogeを1つにまとめたい場合はsortしておくこと
  • sortしてからuniqするくらいならsort -uでいい説
sort sort_sample.txt | uniq
# sort -u sort_sample.txt # おなじ
fuga
hoge
piyo
  • オプション
man uniq
...
       -d, --repeated
              only print duplicate lines, one for each group
...
       -u, --unique
              only print unique lines
...              
  • -d, --repeated
    • 重複行(duplicate lines)のみ表示
  • -u, --unique
    • 重複していない行(unique lines)のみ表示
sort sort_sample.txt | uniq -d
hoge
sort sort_sample.txt | uniq -u
fuga
piyo

pr

  • LPIC ver.5.0でどっかいった
man pr
PR(1)                                                      User Commands                                                      PR(1)

NAME
       pr - convert text files for printing

SYNOPSIS
       pr [OPTION]... [FILE]...

DESCRIPTION
       Paginate or columnate FILE(s) for printing.

       With no FILE, or when FILE is -, read standard input.

       Mandatory arguments to long options are mandatory for short options too.

       +FIRST_PAGE[:LAST_PAGE], --pages=FIRST_PAGE[:LAST_PAGE]
              begin [stop] printing with page FIRST_[LAST_]PAGE
  • ファイルを印刷用に表示するやつ
  • prコマンドのオンラインマニュアルの1-2ページを印刷用に表示
man pr | pr +1:2

2019-04-18 18:51                                                1 ページ


PR(1)                            User Commands                           PR(1)

NAME
       pr - convert text files for printing

SYNOPSIS
       pr [OPTION]... [FILE]...

DESCRIPTION
       Paginate or columnate FILE(s) for printing.

       With no FILE, or when FILE is -, read standard input.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       +FIRST_PAGE[:LAST_PAGE], --pages=FIRST_PAGE[:LAST_PAGE]
              begin [stop] printing with page FIRST_[LAST_]PAGE

       -COLUMN, --columns=COLUMN
              output COLUMN columns and print columns down, unless -a is used.
              Balance number of lines in the columns on each page

       -a, --across
              print  columns across rather than down, used together with -COL‐
              UMN

       -c, --show-control-chars
              use hat notation (^G) and octal backslash notation

       -d, --double-space
              double space the output

       -D, --date-format=FORMAT
              use FORMAT for the header date

       -e[CHAR[WIDTH]], --expand-tabs[=CHAR[WIDTH]]
              expand input CHARs (TABs) to tab WIDTH (8)

       -F, -f, --form-feed
              use form feeds instead of  newlines  to  separate  pages  (by  a
              3-line  page header with -F or a 5-line header and trailer with‐
              out -F)

       -h, --header=HEADER
              use a centered HEADER instead of filename in page header, -h  ""
              prints a blank line, don't use -h""

       -i[CHAR[WIDTH]], --output-tabs[=CHAR[WIDTH]]
              replace spaces with CHARs (TABs) to tab WIDTH (8)

       -J, --join-lines
              merge full lines, turns off -W line truncation, no column align‐
              ment, --sep-string[=STRING] sets separators

       -l, --length=PAGE_LENGTH







2019-04-18 18:51                                                2 ページ


              set the page length to PAGE_LENGTH (66) lines (default number of
              lines of text 56, and with -F 63).  implies -t if PAGE_LENGTH <=
              10

       -m, --merge
              print all files in parallel, one in each column, truncate lines,
              but join lines of full length with -J

       -n[SEP[DIGITS]], --number-lines[=SEP[DIGITS]]
              number  lines,  use  DIGITS  (5) digits, then SEP (TAB), default
              counting starts with 1st line of input file

       -N, --first-line-number=NUMBER
              start counting with NUMBER at 1st line  of  first  page  printed
              (see +FIRST_PAGE)

       -o, --indent=MARGIN
              offset  each line with MARGIN (zero) spaces, do not affect -w or
              -W, MARGIN will be added to PAGE_WIDTH

       -r, --no-file-warnings
              omit warning when a file cannot be opened

       -s[CHAR], --separator[=CHAR]
              separate columns by a single character, default for CHAR is  the
              <TAB>  character  without  -w  and  'no char' with -w.  -s[CHAR]
              turns off line truncation of all 3  column  options  (-COLUMN|-a
              -COLUMN|-m) except -w is set

       -S[STRING], --sep-string[=STRING]
              separate  columns by STRING, without -S: Default separator <TAB>
              with -J and <space> otherwise (same as -S" "), no effect on col‐
              umn options

       -t, --omit-header
              omit page headers and trailers; implied if PAGE_LENGTH <= 10

       -T, --omit-pagination
              omit page headers and trailers, eliminate any pagination by form
              feeds set in input files

       -v, --show-nonprinting
              use octal backslash notation

       -w, --width=PAGE_WIDTH
              set page  width  to  PAGE_WIDTH  (72)  characters  for  multiple
              text-column output only, -s[char] turns off (72)

       -W, --page-width=PAGE_WIDTH
              set  page  width  to PAGE_WIDTH (72) characters always, truncate
              lines, except -J option is set, no interference with -S or -s

       --help display this help and exit

       --version
              output version information and exit




fmt

  • LPIC ver5.0でどっかいった
man fmt
FMT(1)                                                     User Commands                                                     FMT(1)

NAME
       fmt - simple optimal text formatter

SYNOPSIS
       fmt [-WIDTH] [OPTION]... [FILE]...

DESCRIPTION
       Reformat  each  paragraph  in  the  FILE(s),  writing  to  standard  output.   The  option  -WIDTH is an abbreviated form of
       --width=DIGITS.
  • テキストフォーマッタ
  • 行あたりの文字数制限等
cat /etc/crontab | fmt -20
# /etc/crontab:
system-wide crontab
# Unlike any other
crontab you don't
have to run the
`crontab' # command
to install the new
version when you
edit this file
...
  • 【備忘録】文書中の単語をリストアップするのに使えそう
    • プログラム中の識別子とか
cat /etc/crontab | fmt -1 | sort -u
#
&&
(
)
*
--report
-x
/
/etc/cron.d.
/etc/cron.daily
/etc/cron.hourly
/etc/cron.monthly
/etc/cron.weekly
/etc/crontab:
/usr/sbin/anacron
1
17
25
47
52
6
7
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
SHELL=/bin/sh
These
Unlike
`crontab'
also
and
any
cd
command
crontab
crontabs
do.
dom
don't
dow
edit
fields,
file
files
h
have
in
install
m
mon
new
none
of
other
root
run
run-parts
system-wide
test
that
the
this
to
user
username
version
when
you
||

expand

  • LPIC ver5.0でどっかいった
  • untabifyするやつ
man expand
EXPAND(1)                                                  User Commands                                                  EXPAND(1)

NAME
       expand - convert tabs to spaces

SYNOPSIS
       expand [OPTION]... [FILE]...
...
       -i, --initial
              do not convert tabs after non blanks

       -t, --tabs=N
              have tabs N characters apart, not 8
...
  • 主なオプション
    • -i, --initial
      • 行頭のタブのみ対象
    • -t N, --tabs=N
      • 何スペにするか指定
      • デフォルト8スペ
cat tab_sample.txt 
 hoge
    piyo    fuga
  • デフォルト、全タブが8スペ揃えになる
    • piyoが4文字なので、fugaの前のは(8-4)=4スペ
expand tab_sample.txt 
        hoge
        piyo    fuga
  • 2スペ
expand -t 2 tab_sample.txt 
  hoge
  piyo  fuga
  • 行頭のみ4スペ
expand -t 4 -i tab_sample.txt 
    hoge
    piyo    fuga

unexpand

  • LPIC ver5.0でどっかいった
  • tabifyするやつ
man unexpand
UNEXPAND(1)                   User Commands                   UNEXPAND(1)

NAME
       unexpand - convert spaces to tabs

SYNOPSIS
       unexpand [OPTION]... [FILE]...
...
...
       -a, --all
              convert all blanks, instead of just initial blanks
...
       -t, --tabs=N
              have tabs N characters apart instead of 8 (enables -a)
...
  • 主なオプション
    • -a, --all
      • 全てのスペースをタブに
      • cf. デフォルトでは行頭のみ
    • -t N, --tabs=N
      • 何スペをタブにするか
      • デフォルト8スペ

wc

  • word count
  • word以外にもいろいろ数えられる
    • 行数
    • 文字数
  • デフォルト:行数、単語数、文字数出力
wc /etc/crontab
 15 124 722 /etc/crontab
  • 個別
wc -l /etc/crontab
wc -w /etc/crontab
wc -c /etc/crontab
15 /etc/crontab
124 /etc/crontab
722 /etc/crontab

xargs

  • 標準入力から受け取った文字列を、与えられたコマンドに引数として流し込む
    • cf. 標準入力
  • サンプル

xargs_sample/sample.txt, xargs_sample/dir/sample.txt

abc
def
find xargs_sample -name 'sample.txt'
xargs_sample/sample.txt
xargs_sample/dir/sample.txt
  • パイプ = 標準入力に流し込む場合との比較
find xargs_sample -name 'sample.txt' | echo
  • echoは標準入力されてもなにもしない



  • catは標準入力をそのまま表示する
find xargs_sample -name 'sample.txt' | cat
xargs_sample/sample.txt
xargs_sample/dir/sample.txt
  • xargsで引数として流し込む場合
find xargs_sample -name 'sample.txt' | xargs echo
  • echoは引数をそのまま表示する
xargs_sample/sample.txt xargs_sample/dir/sample.txt
  • catは引数をファイル名として解釈し、中身を表示する
    • 2つ指定されているのでconcatenateされる
find xargs_sample -name 'sample.txt' | xargs cat
abc
def
abc
def
  • パイプはこれと同じことが起きている
cat << EOF
> xargs_sample/sample.txt
> xargs_sample/dir/sample.txt
> EOF
xargs_sample/sample.txt
xargs_sample/dir/sample.txt
  • xargsはこれと同じことが起きている
cat xargs_sample/sample.txt xargs_sample/dir/sample.txt
abc
def
abc
def

zcat, bzcat, xzcat

  • それぞれgzip, bzip2, xzで圧縮されたファイルを伸張・catするやつ

ファイルのチェックサム

  • ハッシュ関数
    • ファイルを一定の長さの文字列にする
      • 全単射ではない
      • 元のデータを復元することはできない
  • 種類いっぱい
    • md5sum
    • sha1sum
    • sha256sum
    • sha512sum
  • オプション