# HTML::Pipeline [![Build Status](https://secure.travis-ci.org/jch/html-pipeline.png)](http://travis-ci.org/jch/html-pipeline)

GitHub HTML processing filters and utilities. This module includes a small
framework for defining DOM-based content filters and applying them to user
provided content. Read an introduction about this project in
[this blog post](https://github.com/blog/1311-html-pipeline-chainable-content-filters).

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'html-pipeline'
```

And then execute:

```sh
$ bundle
```

Or install it yourself as:

```sh
$ gem install html-pipeline
```

## Usage

This library provides a handful of chainable HTML filters to transform user
content into markup. A filter takes an HTML string or
`Nokogiri::HTML::DocumentFragment`, optionally manipulates it, and then
outputs the result.

For example, to transform Markdown source into HTML:

```ruby
require 'html/pipeline'

filter = HTML::Pipeline::MarkdownFilter.new("Hi **world**!")
filter.call
```

Filters can be combined into a pipeline, which causes each filter to hand its
output to the next filter's input. For example, to render content as Markdown
and then syntax-highlight its code blocks, you can create the following
pipeline:

```ruby
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::SyntaxHighlightFilter
]
result = pipeline.call <<-CODE
This is *great*:

    some_code(:first)

CODE
result[:output].to_s
```

The resulting HTML is:

```html
<p>This is <em>great</em>:</p>

<div class="highlight">
<pre><span class="n">some_code</span><span class="p">(</span><span class="ss">:first</span><span class="p">)</span>
</pre>
</div>
```
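
Conceptually, calling a pipeline is a left fold over its filters: the first filter receives the input and each later filter receives its predecessor's output. The sketch below shows only that idea in plain Ruby; `emphasize`, `paragraph`, and `run_pipeline` are hypothetical stand-ins that work on strings, whereas real filters receive HTML strings or `Nokogiri::HTML::DocumentFragment` objects.

```ruby
# Hypothetical string-to-string "filters" (not html-pipeline classes).
emphasize = ->(text) { text.gsub(/\*(\w+)\*/, '<em>\1</em>') }
paragraph = ->(html) { "<p>#{html}</p>" }

# Runs each filter in order, handing its output to the next one.
def run_pipeline(filters, input)
  filters.reduce(input) { |doc, filter| filter.call(doc) }
end

run_pipeline([emphasize, paragraph], "This is *great*:")
# => "<p>This is <em>great</em>:</p>"
```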

Some filters take an optional **context** and/or **result** hash. These are
used to pass around arguments and metadata between filters in a pipeline. For
example, if you don't want to use GitHub Flavored Markdown (GFM), you can
pass an option in the context hash:

```ruby
filter = HTML::Pipeline::MarkdownFilter.new("Hi **world**!", :gfm => false)
filter.call
```
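
The context/result split can be sketched in plain Ruby: every filter reads options from one shared context and records metadata in one shared result hash that the caller (or a later filter) can inspect. The lambdas, the `:upcase` option, and the `run_with_context` helper are hypothetical illustrations of the idea, not html-pipeline's implementation.

```ruby
# Hypothetical filter: reads an option from the context and records
# what it did in the shared result hash.
upcase_filter = lambda do |doc, context, result|
  result[:upcased] = context.fetch(:upcase, false)
  result[:upcased] ? doc.upcase : doc
end

# Hypothetical filter: adds metadata without changing the document.
word_count_filter = lambda do |doc, _context, result|
  result[:word_count] = doc.split.size
  doc
end

# Threads one frozen context and one result hash through every filter,
# storing the final document under :output.
def run_with_context(filters, input, context = {})
  context = context.dup.freeze # in this sketch, filters can read but not modify it
  result = {}
  result[:output] = filters.reduce(input) { |doc, filter| filter.call(doc, context, result) }
  result
end

result = run_with_context([upcase_filter, word_count_filter], "hi world", :upcase => true)
result[:output]     # => "HI WORLD"
result[:word_count] # => 2
```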

## Filters

* `MentionFilter` - replace `@user` mentions with links
* `AbsoluteSourceFilter` - replace relative image URLs with fully qualified versions
* `AutolinkFilter` - automatically link URLs in HTML
* `CamoFilter` - replace http image URLs with [camo-fied](https://github.com/atmos/camo) https versions
* `EmailReplyFilter` - utility filter for working with emails
* `EmojiFilter` - everyone loves [emoji](http://www.emoji-cheat-sheet.com/)!
* `HttpsFilter` - replace http GitHub URLs with https versions
* `ImageMaxWidthFilter` - link to the full-size image for large images
* `MarkdownFilter` - convert Markdown to HTML
* `PlainTextInputFilter` - HTML-escape text and wrap the result in a div
* `SanitizationFilter` - sanitize user markup against a whitelist
* `SyntaxHighlightFilter` - [code syntax highlighter](#syntax-highlighting)
* `TextileFilter` - convert Textile to HTML
* `TableOfContentsFilter` - anchor headings with name attributes

## Syntax highlighting

`SyntaxHighlightFilter` uses [github-linguist](https://github.com/github/linguist)
to detect and highlight languages. It isn't included as a dependency by default
because it's large and
[difficult to build on Heroku](https://github.com/jch/html-pipeline/issues/33).
To use the filter, add the following to your Gemfile:

```ruby
gem 'github-linguist'
```

## Examples

We define different pipelines for different parts of our app. Here are a few
paraphrased snippets to get you started:

```ruby
# The context hash is how you pass options between different filters.
# See individual filter source for explanation of options.
context = {
  :asset_root => "http://your-domain.com/where/your/images/live/icons",
  :base_url   => "http://your-domain.com"
}

# Pipeline providing sanitization and image hijacking but no mention
# related features.
SimplePipeline = Pipeline.new [
  SanitizationFilter,
  TableOfContentsFilter, # add 'name' anchors to all headers
  CamoFilter,
  ImageMaxWidthFilter,
  SyntaxHighlightFilter,
  EmojiFilter,
  AutolinkFilter
], context

# Pipeline used for user provided content on the web
MarkdownPipeline = Pipeline.new [
  MarkdownFilter,
  SanitizationFilter,
  CamoFilter,
  ImageMaxWidthFilter,
  HttpsFilter,
  MentionFilter,
  EmojiFilter,
  SyntaxHighlightFilter
], context.merge(:gfm => true) # enable github formatted markdown

# Define a pipeline based on another pipeline's filters
NonGFMMarkdownPipeline = Pipeline.new(MarkdownPipeline.filters,
  context.merge(:gfm => false))

# Pipelines aren't limited to the web. You can use them for email
# processing also.
HtmlEmailPipeline = Pipeline.new [
  ImageMaxWidthFilter
], {}

# Just emoji.
EmojiPipeline = Pipeline.new [
  PlainTextInputFilter,
  EmojiFilter
], context
```
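
The `context.merge(...)` calls above rely on `Hash#merge` returning a new hash, so each derived pipeline gets its own settings while the shared `context` is left untouched. A quick plain-Ruby check, reusing the placeholder URLs from the example:

```ruby
context = {
  :asset_root => "http://your-domain.com/where/your/images/live/icons",
  :base_url   => "http://your-domain.com"
}

# Each merge returns a new Hash; the shared context is not modified.
gfm_context     = context.merge(:gfm => true)
non_gfm_context = context.merge(:gfm => false)

gfm_context[:gfm]          # => true
non_gfm_context[:gfm]      # => false
context.key?(:gfm)         # => false
non_gfm_context[:base_url] # => "http://your-domain.com"
```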

## Extending
To write a custom filter, you need a class that inherits from
`HTML::Pipeline::Filter` and defines a `call` method.

For example, this filter adds a base URL to images whose `src` is root-relative:

```ruby
require 'uri'
require 'html/pipeline'

class RootRelativeFilter < HTML::Pipeline::Filter

  # `doc` is the Nokogiri document fragment being filtered and `context`
  # is the hash the pipeline was configured with.
  def call
    doc.search("img").each do |img|
      next if img['src'].nil?
      src = img['src'].strip
      # Only rewrite root-relative paths such as "/images/logo.png".
      if src.start_with? '/'
        img["src"] = URI.join(context[:base_url], src).to_s
      end
    end
    # Hand the modified document to the next filter.
    doc
  end

end
```

Now this filter can be used in a pipeline:

```ruby
Pipeline.new [ RootRelativeFilter ], { :base_url => 'http://somehost.com' }
```
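
Because the URL rewriting in `RootRelativeFilter` is plain string logic, it can be pulled out into a helper and checked without Nokogiri. The `absolutize_src` name is hypothetical; its body mirrors the filter's per-image logic. Note that `URI.join` resolves a root-relative path against the host only, so any path on `base_url` is dropped, and that a protocol-relative `src` such as `//cdn.example.com/a.png` also starts with `/` and is rewritten with the base URL's scheme.

```ruby
require 'uri'

# Hypothetical helper mirroring RootRelativeFilter's per-image logic:
# returns the rewritten src, or the original value when it is left alone.
def absolutize_src(src, base_url)
  return src if src.nil?
  stripped = src.strip
  return src unless stripped.start_with?('/')
  URI.join(base_url, stripped).to_s
end

absolutize_src("/images/logo.png", "http://somehost.com")
# => "http://somehost.com/images/logo.png"
absolutize_src("/images/logo.png", "http://somehost.com/blog/")
# => "http://somehost.com/images/logo.png" (the base path is replaced)
absolutize_src("images/logo.png", "http://somehost.com")
# => "images/logo.png" (document-relative paths are not touched)
```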

## Instrumenting

Filters and Pipelines can be set up to be instrumented when called. The pipeline
must be set up with an
[ActiveSupport::Notifications](http://api.rubyonrails.org/classes/ActiveSupport/Notifications.html)
compatible service object and a name. New pipeline objects default to the
`HTML::Pipeline.default_instrumentation_service` object.

``` ruby
# the AS::Notifications-compatible service object
service = ActiveSupport::Notifications

# instrument a specific pipeline
pipeline = HTML::Pipeline.new [MarkdownFilter], context
pipeline.setup_instrumentation "MarkdownPipeline", service

# or set default instrumentation service for all new pipelines
HTML::Pipeline.default_instrumentation_service = service
pipeline = HTML::Pipeline.new [MarkdownFilter], context
pipeline.setup_instrumentation "MarkdownPipeline"
```
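
To exercise instrumentation without ActiveSupport (for example, in tests), a small stand-in service can record events. This is a sketch under an assumption: that the pipeline only calls `instrument(name, payload) { |payload| ... }` on the service, as it would on `ActiveSupport::Notifications`; check the gem's source for your version before relying on it. `RecordingService` is hypothetical, and its `subscribe` passes the same five arguments as the subscriber blocks in this section.

```ruby
# Hypothetical stand-in for an ActiveSupport::Notifications-compatible
# service. Assumes callers only use instrument(name, payload) { ... }
# and subscribe(name) { |event, start, ending, id, payload| ... }.
class RecordingService
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
    @next_id = 0
  end

  # Registers a block called with (event, start, ending, transaction_id, payload).
  def subscribe(name, &block)
    @subscribers[name] << block
  end

  # Times the block, notifies subscribers of `name`, and returns the block's value.
  def instrument(name, payload = {})
    start = Time.now
    value = yield(payload) if block_given?
    ending = Time.now
    @next_id += 1
    @subscribers[name].each do |subscriber|
      subscriber.call(name, start, ending, @next_id.to_s, payload)
    end
    value
  end
end

service = RecordingService.new
seen = []
service.subscribe("call_filter.html_pipeline") do |event, start, ending, transaction_id, payload|
  seen << [event, payload[:filter]]
end
output = service.instrument("call_filter.html_pipeline", :filter => "MarkdownFilter") do |payload|
  "<p>Hi <strong>world</strong>!</p>"
end
```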

Filters are instrumented when they are run through the pipeline. A
`call_filter.html_pipeline` event is published once the filter finishes. The
`payload` should include the `filter` name. Each filter will trigger its own
instrumentation call.

``` ruby
service.subscribe "call_filter.html_pipeline" do |event, start, ending, transaction_id, payload|
  payload[:pipeline] #=> "MarkdownPipeline", set with `setup_instrumentation`
  payload[:filter] #=> "MarkdownFilter"
  payload[:context] #=> context Hash
  payload[:result] #=> instance of result class
  payload[:result][:output] #=> output HTML String or Nokogiri::DocumentFragment
end
```

The full pipeline is also instrumented:

``` ruby
service.subscribe "call_pipeline.html_pipeline" do |event, start, ending, transaction_id, payload|
  payload[:pipeline] #=> "MarkdownPipeline", set with `setup_instrumentation`
  payload[:filters] #=> ["MarkdownFilter"]
  payload[:doc] #=> HTML String or Nokogiri::DocumentFragment
  payload[:context] #=> context Hash
  payload[:result] #=> instance of result class
  payload[:result][:output] #=> output HTML String or Nokogiri::DocumentFragment
end
```
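
Because every filter publishes its own `call_filter.html_pipeline` event, a subscriber can total the time spent in each filter. `FilterTimings` is a hypothetical sketch; it assumes only the five block arguments shown above, with `start` and `ending` values that support subtraction (ActiveSupport passes timestamps).

```ruby
# Hypothetical aggregator for call_filter.html_pipeline events:
# sums the elapsed seconds spent in each filter, keyed by filter name.
class FilterTimings
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
  end

  # Same argument order as the subscriber blocks above.
  def call(_event, start, ending, _transaction_id, payload)
    @totals[payload[:filter]] += ending - start
  end
end

timings = FilterTimings.new
# With ActiveSupport::Notifications (or another compatible service):
#   service.subscribe("call_filter.html_pipeline") { |*args| timings.call(*args) }
```

Subscribing only to the per-filter event keeps the totals from double-counting the time already covered by `call_pipeline.html_pipeline`.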

## Documentation

Full reference documentation can be [found here](http://rubydoc.info/gems/html-pipeline/frames).

## Development

To see what has changed in recent versions, see the [CHANGELOG](https://github.com/jch/html-pipeline/blob/master/CHANGELOG.md).

To install the dependencies and run the test suite:

```sh
bundle
rake test
```

## Contributing

1. [Fork it](https://help.github.com/articles/fork-a-repo)
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Added some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new [Pull Request](https://help.github.com/articles/using-pull-requests)

## Contributors

Thanks to all of [these contributors](https://github.com/jch/html-pipeline/graphs/contributors).

Project is a member of the [OSS Manifesto](http://ossmanifesto.org/).