1
00:00:00,080 --> 00:00:06,640
you know it's pretty easy to take words on your computer screen and put them on

2
00:00:04,000 --> 00:00:10,880
a physical sheet of paper just click print and unless you've forgotten to

3
00:00:08,240 --> 00:00:16,560
fork out an extortion level amount of money for a new cartridge you'll have

4
00:00:13,120 --> 00:00:18,560
fresh warm satisfying documents just a

5
00:00:16,560 --> 00:00:22,960
few moments later but going in the opposite direction scanning dead tree

6
00:00:20,800 --> 00:00:27,359
information into your pc is actually quite a bit trickier i mean sure flatbed

7
00:00:25,519 --> 00:00:30,720
scanners aren't all that difficult to operate per se

8
00:00:28,800 --> 00:00:35,280
but many of them are basically just taking a picture of the document and

9
00:00:32,880 --> 00:00:39,760
saving it onto your pc meaning not only will it probably not look very crisp due

10
00:00:37,760 --> 00:00:44,640
to file compression and little bits of dust in your scanner but you can't edit

11
00:00:42,480 --> 00:00:48,239
a clean copy of your document in your favorite word processor because the

12
00:00:46,160 --> 00:00:50,960
scanner won't recognize each individual character

13
00:00:49,600 --> 00:00:55,920
fortunately there are a number of devices out there that enable optical

14
00:00:52,960 --> 00:01:00,480
character recognition or ocr where each character on a page is scanned

15
00:00:57,920 --> 00:01:05,920
individually so your papers are uploaded as actual text documents instead of

16
00:01:02,800 --> 00:01:08,479
messy jpegs but how exactly does that

17
00:01:05,920 --> 00:01:12,479
work and is one kind of optical scanner better than another well because the

18
00:01:10,240 --> 00:01:16,560
whole concept of translating text into electronic signals is pretty broad there

19
00:01:15,439 --> 00:01:21,840
have been lots of different implementations of ocr over the years in

20
00:01:19,280 --> 00:01:27,360
fact one of the earliest electric ocr devices the optophone was invented all

21
00:01:24,880 --> 00:01:33,200
the way back in 1914 this bizarre looking contraption relied

22
00:01:29,840 --> 00:01:35,439
on the special behavior of selenium

23
00:01:33,200 --> 00:01:39,920
which conducts electricity differently in light and darkness

24
00:01:37,520 --> 00:01:45,600
as it scanned the words on a page the optophone distinguished between the dark

25
00:01:42,320 --> 00:01:47,759
ink of text and lighter blank spaces

26
00:01:45,600 --> 00:01:53,280
generating tones that corresponded to different letters making it possible for

27
00:01:49,600 --> 00:01:56,000
blind people to read with some practice

28
00:01:53,280 --> 00:01:59,920
later in 1931 a machine was developed that could convert printed text to

29
00:01:58,079 --> 00:02:04,399
telegraph code one of the first technologies to translate printed

30
00:02:01,759 --> 00:02:09,840
characters to electrical impulses rather than sounds but it wasn't until the

31
00:02:06,520 --> 00:02:12,480
1960s and 70s that ocr began to take a

32
00:02:09,840 --> 00:02:17,440
more familiar modern form with postal services using ocr to read addresses and

33
00:02:15,680 --> 00:02:21,440
software that could recognize many different fonts

34
00:02:18,800 --> 00:02:25,760
so back to present day when you scan a document how exactly does the software

35
00:02:23,599 --> 00:02:30,720
know what it's looking at well the first step is to cut out artifacts so your ocr

36
00:02:28,319 --> 00:02:35,680
program can concentrate on the text and nothing else so it attempts to remove

37
00:02:32,879 --> 00:02:41,200
dust and other various graphics align the text properly and convert any colors

38
00:02:38,239 --> 00:02:45,440
or shades of grey in the image to black and white only making the words

39
00:02:42,959 --> 00:02:50,000
themselves easier to recognize the next step is to figure out which characters

40
00:02:47,120 --> 00:02:54,800
are on the page simpler forms of ocr compare each scanned letter pixel by

41
00:02:52,080 --> 00:03:00,400
pixel to a known database of fonts and decide on the closest match smarter ocr

42
00:02:58,080 --> 00:03:05,360
however takes this a step further by breaking each character down to

43
00:03:02,640 --> 00:03:09,840
constituent elements like curves and corners and looking for matching

44
00:03:07,280 --> 00:03:13,120
physical features in actual letters you can think of the differences between

45
00:03:11,280 --> 00:03:17,040
these two approaches similarly to the difference between raster and vector

46
00:03:15,040 --> 00:03:22,080
images which you can learn more about up here ocr software can also make use of a

47
00:03:20,319 --> 00:03:26,319
dictionary so it won't accidentally spit out nonsense words due to inaccurate

48
00:03:24,400 --> 00:03:31,519
scanning for example if your scanner sees this but it can't quite tell

49
00:03:28,879 --> 00:03:36,560
whether the middle letter is an o or an a it can check its own dictionary to

50
00:03:33,760 --> 00:03:42,080
decide that the word is actually dog and not dag giving ocr software situational

51
00:03:40,000 --> 00:03:47,760
information can further cut down on errors such as telling it to only try to

52
00:03:44,959 --> 00:03:51,519
match numbers if it's reading zip codes on an envelope
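
As a rough sketch of the steps described above (converting the scan to black and white, matching pixel by pixel against known glyphs, checking a dictionary, and restricting the allowed characters), here is a toy Python example. The 3x3 templates, function names, threshold, and margin are all made up for illustration and are not taken from any real OCR product.

```python
from itertools import product

# Hypothetical 3x3 glyph templates (1 = ink, 0 = blank); a real font database is far larger.
TEMPLATES = {
    "d": [[0, 0, 1], [1, 1, 1], [1, 1, 1]],
    "o": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "a": [[0, 1, 1], [1, 0, 1], [1, 1, 1]],
    "g": [[1, 1, 1], [1, 1, 1], [0, 0, 1]],
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],  # deliberately identical to "o"
}

def binarize(gray, threshold=128):
    """Turn grayscale rows (0 = black, 255 = white) into 1 = ink, 0 = blank."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def match_score(glyph, template):
    """Fraction of pixels on which the scanned glyph and the template agree."""
    total = sum(len(row) for row in template)
    same = sum(g == t for grow, trow in zip(glyph, template) for g, t in zip(grow, trow))
    return same / total

def candidates(glyph, templates, allowed=None, margin=0.15):
    """Characters scoring within `margin` of the best match, best first."""
    scores = {ch: match_score(glyph, tpl) for ch, tpl in templates.items()
              if allowed is None or ch in allowed}
    best = max(scores.values())
    close = [ch for ch, s in scores.items() if best - s <= margin]
    return sorted(close, key=lambda ch: -scores[ch])

def read_word(glyphs, templates, dictionary, allowed=None):
    """Prefer a combination of close candidates that forms a dictionary word."""
    options = [candidates(g, templates, allowed) for g in glyphs]
    for combo in product(*options):
        word = "".join(combo)
        if word in dictionary:
            return word
    # No dictionary hit: fall back to the best pixel match for each glyph.
    return "".join(opts[0] for opts in options)

# A scanned "o" whose top-left pixel came out faded (200), so it looks like an "a".
SCAN_MIDDLE = [[200, 30, 30], [30, 240, 30], [30, 30, 30]]
glyphs = [TEMPLATES["d"], binarize(SCAN_MIDDLE), TEMPLATES["g"]]

print(read_word(glyphs, TEMPLATES, {"dog", "cat"}))  # dog
print(read_word(glyphs, TEMPLATES, set()))           # dag (no dictionary to correct it)
print(candidates(TEMPLATES["o"], TEMPLATES, allowed=set("0123456789")))  # ['0']
```

The last line mirrors the zip code case: once only digits are allowed, the ambiguous round shape can only be read as a zero.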

53
00:03:49,200 --> 00:03:56,239
even with these tricks however ocr obviously is not perfect which you've

54
00:03:54,159 --> 00:04:00,640
probably seen for yourself if you've ever used it but with greater

55
00:03:58,879 --> 00:04:04,799
processing power and machine learning techniques that allow software to

56
00:04:02,400 --> 00:04:09,439
recognize more subtle patterns over time ocr has become versatile enough to

57
00:04:07,200 --> 00:04:13,360
recognize harder to read typefaces inconsistently printed material and even

58
00:04:12,000 --> 00:04:17,919
handwriting and free ocr cloud processing services

59
00:04:15,760 --> 00:04:22,240
like google drive which has a lot more machine learning capability than your

60
00:04:19,519 --> 00:04:27,600
home pc for what i hope are fairly obvious reasons have made ocr more

61
00:04:24,639 --> 00:04:31,199
accessible than ever no word yet though on whether google will take it a step

62
00:04:29,199 --> 00:04:34,960
further and launch google interpretive dance translator

63
00:04:32,960 --> 00:04:38,720
i don't know what i'm doing ignore me

64
00:04:36,400 --> 00:04:42,000
are you racing against the clock as a freelancer trying to start your

65
00:04:40,320 --> 00:04:45,040
challenging but rewarding interpretive dance company with the growth of the

66
00:04:43,759 --> 00:04:49,120
internet there have never been more opportunities for the self-employed to

67
00:04:47,280 --> 00:04:52,400
meet this need check out freshbooks cloud accounting software designed for

68
00:04:50,960 --> 00:04:56,479
the way that you work it's the simplest and easiest way

69
00:04:54,400 --> 00:05:00,720
to be more productive organized and more importantly get paid quickly

70
00:04:59,360 --> 00:05:05,120
you can create and send professional looking invoices in less than 30 seconds

71
00:05:03,199 --> 00:05:08,639
which is super important and you can set up online payments with

72
00:05:06,800 --> 00:05:13,360
just a couple clicks and get paid up to four days faster you can even see when a

73
00:05:11,280 --> 00:05:17,759
client has seen your invoice so there's no more guessing games freshbooks is

74
00:05:15,440 --> 00:05:22,320
offering a 30-day unrestricted free trial to our viewers to claim it go to

75
00:05:20,759 --> 00:05:25,520
freshbooks.com/techquickie and enter techquickie in the how did you hear

76
00:05:23,759 --> 00:05:30,320
about us section thanks for watching this video don't forget to like it or

77
00:05:27,600 --> 00:05:33,759
dislike it uh get subscribed check out our other channels

78
00:05:31,680 --> 00:05:37,919
and don't forget that i'm the worst dancer that has

79
00:05:38,560 --> 00:05:44,759
wow dennis just roasted the crap out of

80
00:05:41,759 --> 00:05:44,759
me
