1
00:00:00,320 --> 00:00:08,200
Okay, do you remember that project I was working on where, for the better part of

2
00:00:03,639 --> 00:00:10,440
6 months, I built up this badass 36-core

3
00:00:08,200 --> 00:00:15,400
dual Xeon server machine to handle our video encoding and transcoding tasks

4
00:00:12,639 --> 00:00:21,560
over the network here? Well, fast-forward almost a year and many, many hours spent

5
00:00:19,359 --> 00:00:26,039
on diagnosis, not to mention a kick in the right direction from this post over

6
00:00:23,199 --> 00:00:32,759
on Puget Systems, I think I finally figured out why we never got quite the

7
00:00:29,279 --> 00:00:35,040
performance that I expected. Is it

8
00:00:32,759 --> 00:00:41,760
possible, then, that a $4,000 22-core CPU could be outperformed

9
00:00:39,480 --> 00:00:47,160
by one that costs only a few hundred bucks for video encoding? Is it possible

10
00:00:45,039 --> 00:00:52,640
that I made a mistake? Nothing to hold on

11
00:00:50,000 --> 00:00:56,480
to. Fails a lot. If they're reading the sign, I'm definitely getting their attention.

12
00:00:54,199 --> 00:01:00,680
So, does one of the recurring themes of these laptop or bus videos

13
00:00:59,239 --> 00:01:06,119
become Linus failure

14
00:01:02,399 --> 00:01:09,410
montages? I mean, aside from those

15
00:01:06,119 --> 00:01:12,470
ones. Let's find

16
00:01:16,759 --> 00:01:24,920
out. FreshBooks is the super simple invoicing solution that lets you get

17
00:01:21,320 --> 00:01:26,640
organized, save time, and get paid faster.

18
00:01:24,920 --> 00:01:30,960
Click now at the link in the video description to try it for

19
00:01:28,520 --> 00:01:36,040
free. Okay, so to open this video up, we need to take a closer-than-usual look at

20
00:01:33,560 --> 00:01:41,640
my test bench. I wanted to eliminate bottlenecks wherever possible so that

21
00:01:38,280 --> 00:01:44,040
the CPU is the only factor in my

22
00:01:41,640 --> 00:01:49,479
performance evaluation, so, for that reason, most of the performance testing

23
00:01:46,439 --> 00:01:54,479
was done on an Intel 750 Series 1.2 TB

24
00:01:49,479 --> 00:01:57,320
NVMe SSD, a GTX Titan X, 128 gigs of DDR4

25
00:01:54,479 --> 00:02:03,039
quad-channel memory on an X99-Deluxe II motherboard, and the CPUs tested are as

26
00:01:59,920 --> 00:02:07,560
follows: Intel's top-of-the-server-line

27
00:02:03,039 --> 00:02:10,640
2699 v4 22-core, then the top of the

28
00:02:07,560 --> 00:02:11,840
high-end desktop line, the 10-core Core i7

29
00:02:10,640 --> 00:02:19,519
Extreme 6950X, the 8-core and 6-core 6900K and

30
00:02:16,680 --> 00:02:25,120
6800K, and finally I decided to throw in their flagship mainstream 6700K quad

31
00:02:22,920 --> 00:02:31,080
core to give us the most complete picture possible at the end of the day.

32
00:02:28,239 --> 00:02:35,000
As for the video tests, I apologize in advance if the codec or encoder

33
00:02:33,400 --> 00:02:38,840
application that you personally prefer wasn't covered, but this was done as much

34
00:02:37,200 --> 00:02:43,640
to optimize the Linus Media Group workflow as it was for the purposes of

35
00:02:40,840 --> 00:02:49,400
creating a video. So I'm looking at four different scenarios that we encounter

36
00:02:45,920 --> 00:02:53,680
pretty much daily. One: transcoding a 4K

37
00:02:49,400 --> 00:02:55,800
MXF off of our Sony FS5 to 1080p CineForm,

38
00:02:53,680 --> 00:03:01,400
our mezzanine codec of choice for editing. Two: exporting a finished project,

39
00:02:59,319 --> 00:03:07,680
in this case a green-screened episode, as fast as possible directly to H.264 for

40
00:03:04,560 --> 00:03:10,440
publication to YouTube. Three: a quick

41
00:03:07,680 --> 00:03:14,920
export in CineForm, which is how we normally export, so that a network media encoder machine

42
00:03:12,480 --> 00:03:21,040
with a watch folder can transcode it to H.264 and automatically upload it to the

43
00:03:17,360 --> 00:03:24,879
channel. And four, finally: the performance

44
00:03:21,040 --> 00:03:27,560
of that CineForm to H.264 conversion with

45
00:03:24,879 --> 00:03:32,799
the 1080p to 4K upsampling that we perform for the reasons we covered more

46
00:03:29,360 --> 00:03:35,519
thoroughly in this video here. So, I ran

47
00:03:32,799 --> 00:03:39,920
every test with and without CUDA acceleration enabled in Adobe Media

48
00:03:37,360 --> 00:03:45,560
Encoder and used a second machine to capture the screen output with CPU and

49
00:03:42,640 --> 00:03:52,560
GPU usage displayed so I could review it later. Let's begin, then, with scenario one.
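The method described above (every test run once with CUDA acceleration and once without) boils down to timing the same job under two configurations. Here is a minimal Python sketch of such a harness, assuming each configuration can be wrapped in a callable; the `cpu_only` and `gpu_assisted` jobs are hypothetical stand-ins, not Adobe Media Encoder calls.

```python
import statistics
import time
from typing import Callable, Dict


def median_runtime(job: Callable[[], object], repeats: int = 3) -> float:
    """Run `job` several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_modes(jobs: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Time each named configuration (for example 'cpu_only' vs 'gpu_assisted')."""
    return {name: median_runtime(job, repeats) for name, job in jobs.items()}


if __name__ == "__main__":
    # Hypothetical placeholder workloads; a real harness would launch the encoder here.
    results = compare_modes({
        "cpu_only": lambda: sum(i * i for i in range(200_000)),
        "gpu_assisted": lambda: sum(i * i for i in range(50_000)),
    })
    for name, seconds in results.items():
        print(f"{name}: {seconds:.4f} s")
```

Taking the median of several runs keeps a single slow outlier, such as a background task or a cold disk cache, from skewing the comparison.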

50
00:03:49,439 --> 00:03:55,560
This is what most people probably expect

51
00:03:52,560 --> 00:03:57,840
from a multicore CPU in a video encoding

52
00:03:55,560 --> 00:04:02,920
benchmark. Traditionally, this is one of the easiest workloads to scale

53
00:03:59,760 --> 00:04:04,879
across more cores, and our CPU usage graph

54
00:04:02,920 --> 00:04:09,840
indicates that all is working beautifully. Throwing a GPU into the mix

55
00:04:07,879 --> 00:04:14,680
levels the playing field somewhat, but this won't surprise anyone who knows how

56
00:04:11,799 --> 00:04:20,919
GPU-dependent a video codec CineForm is and how that bastard law of diminishing

57
00:04:17,519 --> 00:04:24,080
returns works. Moving on to exporting a

58
00:04:20,919 --> 00:04:27,280
project directly from our CineForm timeline:

59
00:04:24,080 --> 00:04:30,680
in CPU-only mode, we see nice scaling

60
00:04:27,280 --> 00:04:32,800
with more cores but maybe not quite the

61
00:04:30,680 --> 00:04:37,240
dominance we'd expect from a chip with (and yes, I know it doesn't quite work

62
00:04:34,160 --> 00:04:40,440
this way) like 60 GHz of theoretical

63
00:04:37,240 --> 00:04:44,919
total performance. This is a hint of

64
00:04:40,440 --> 00:04:47,840
things to come, and bam, throwing a GPU

65
00:04:44,919 --> 00:04:54,320
into the mix paints a much more extreme picture. Here, the CUDA-accelerated code

66
00:04:50,880 --> 00:04:57,759
path not only reaps very little benefit

67
00:04:54,320 --> 00:05:00,280
from more than six cores, it punishes

68
00:04:57,759 --> 00:05:07,680
CPUs with lower clock speeds in a way that I really didn't expect. Observed GPU

69
00:05:04,199 --> 00:05:10,720
usage is much lower than with any other

70
00:05:07,680 --> 00:05:14,240
processor in this test for our $4,000

71
00:05:10,720 --> 00:05:17,840
chip, and the CPU usage we see, of about

72
00:05:14,240 --> 00:05:20,240
25% tells us this is not a heavily

73
00:05:17,840 --> 00:05:25,440
threaded workload. Oops. All right, so let's break that down

74
00:05:23,000 --> 00:05:31,880
then into the individual steps and find out where our heavy multi-thousand-dollar

75
00:05:28,400 --> 00:05:34,319
investment in an uber-Xeon falls apart.
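The two back-of-the-envelope figures above, the roughly 60 GHz of "theoretical total performance" and the roughly 25% CPU usage, can be reproduced with simple arithmetic. This Python sketch assumes the 2699 v4's 22 cores run at an all-core clock of about 2.8 GHz (an assumption, not a figure from the video) and that Hyper-Threading exposes two logical threads per core. As the video itself says, summed clock speed is not how real performance works; the sketch only shows where the number comes from.

```python
def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Naive 'total GHz': cores times clock. Ignores IPC, memory and serial bottlenecks."""
    return cores * clock_ghz


def busy_threads(cpu_usage: float, logical_threads: int) -> float:
    """Roughly how many logical threads are kept busy at a given overall CPU usage (0.0 to 1.0)."""
    return cpu_usage * logical_threads


# Assumed values: 22 cores at ~2.8 GHz all-core, 2 threads per core via Hyper-Threading.
CORES = 22
ALL_CORE_GHZ = 2.8
THREADS = CORES * 2

print(f"theoretical total: {aggregate_ghz(CORES, ALL_CORE_GHZ):.1f} GHz")
print(f"busy threads at 25% usage: {busy_threads(0.25, THREADS):.0f} of {THREADS}")
```

Around 11 busy threads out of 44 fits the reading that the CUDA-accelerated export is not heavily threaded: most of the chip sits idle, so the extra cores add cost without adding speed.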

76
00:05:31,880 --> 00:05:38,880
Exporting the project from a CineForm 1080p timeline to a CineForm 1080p file,

77
00:05:37,039 --> 00:05:43,280
theoretically elsewhere on the network (but I'm using my NVMe drive as a target

78
00:05:41,400 --> 00:05:48,880
for these benchmarks for consistency's sake), is pretty flat across the board, and

79
00:05:46,120 --> 00:05:53,880
curiously, this is true with or without CUDA acceleration enabled in Media

80
00:05:50,600 --> 00:05:56,360
Encoder. GPU usage is 85% regardless of

81
00:05:53,880 --> 00:06:02,759
which drop-down option is selected, so this is clearly nearly 100% GPU-dependent, which leads us

82
00:06:00,160 --> 00:06:10,319
then to the second step in the process: converting from CineForm 1080p to H.264 4K.

83
00:06:07,039 --> 00:06:14,080
In CPU-only mode, we see a similar trend

84
00:06:10,319 --> 00:06:16,599
to our initial ingest test: more horses

85
00:06:14,080 --> 00:06:22,880
is better, but only to a point. Then, in GPU-assisted mode, there it is. We are

86
00:06:20,120 --> 00:06:28,280
almost entirely bound by per-core performance, with a lowly quad-core

87
00:06:25,479 --> 00:06:33,680
costing one-tenth as much handily beating our Xeon.
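Why a cheap quad-core can beat a 22-core chip once a workload is bound by per-core performance can be illustrated with Amdahl's law, the formal version of the diminishing returns mentioned earlier. This is a simplified Python model, not a measurement: the parallel fractions and the 4.2 GHz and 2.8 GHz clocks are assumptions, and it treats both chips as doing the same work per clock.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core, per Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_runtime(parallel_fraction: float, cores: int, clock_ghz: float) -> float:
    """Runtime for one unit of work in arbitrary units.

    Assumes equal work per clock on both chips, so single-core speed scales with clock.
    """
    return 1.0 / (amdahl_speedup(parallel_fraction, cores) * clock_ghz)


# Assumed clocks, for illustration only.
CHIPS = {
    "quad-core @ 4.2 GHz": (4, 4.2),
    "22-core @ 2.8 GHz": (22, 2.8),
}

for p in (0.5, 0.95):  # lightly threaded vs. heavily threaded workload
    for name, (cores, ghz) in CHIPS.items():
        print(f"p={p:.2f}  {name}: {relative_runtime(p, cores, ghz):.3f}")
```

Under these assumptions, with a parallel fraction of 0.5 the quad-core finishes first, while at 0.95 the 22-core chip pulls well ahead. The same reasoning is why running several independent jobs at once, as with virtualization or batch-capable encoders, can put otherwise idle cores to work.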

88
00:06:31,039 --> 00:06:38,000
So then, did I horribly misconfigure our video encoding, ingest stations, and

89
00:06:35,520 --> 00:06:42,520
output server? Are our Xeons basically pointless for NVIDIA-accelerated

90
00:06:39,800 --> 00:06:48,280
work? Well, if you're looking simply at the graphs I just showed you, along with

91
00:06:44,759 --> 00:06:50,599
these charts of approximate CPU and GPU

92
00:06:48,280 --> 00:06:56,560
usage in all the different scenarios I tested, then it's pretty clear that these

93
00:06:53,199 --> 00:06:58,879
lower-clocked, many-core chips are being

94
00:06:56,560 --> 00:07:04,160
underutilized, and the money (though fortunately I didn't pay for them) would be

95
00:07:00,759 --> 00:07:06,280
better invested almost anywhere else. But

96
00:07:04,160 --> 00:07:10,400
as always, the real world isn't really that simple, and it's going to come down

97
00:07:07,879 --> 00:07:15,560
to the needs and workflow of each individual or organization.

98
00:07:12,960 --> 00:07:20,560
Virtualization can be used to get damn near 100% scaling out of as many cores

99
00:07:17,919 --> 00:07:25,199
as you please, and encoding software like Sorenson Squeeze can process many files at a

100
00:07:23,120 --> 00:07:31,440
time. And on the subject of different software, testing any given codec in any

101
00:07:28,960 --> 00:07:36,000
given software could yield very different results from what you're

102
00:07:32,599 --> 00:07:38,879
looking at here, so there's no way around

103
00:07:36,000 --> 00:07:43,759
testing. Just make sure that when you do so for yourself, you go in without any

104
00:07:41,960 --> 00:07:49,039
assumptions about what the right tool for the job will end up being so you can

105
00:07:46,479 --> 00:07:55,159
avoid pulling a Linus. Speaking of tools for the job, it's

106
00:07:52,319 --> 00:07:58,840
summer, apparently. Something something, boarding planes, trains, driving a car,

107
00:07:57,360 --> 00:08:05,039
leave your worries behind. Okay, I don't know what any of that stuff in my notes is, but today's sponsor is TunnelBear,

108
00:08:03,080 --> 00:08:11,080
and if today's lack of online privacy brings out your inner grizzly

109
00:08:07,720 --> 00:08:14,639
bear (rawr), then you can try TunnelBear.

110
00:08:11,080 --> 00:08:16,720
It's simple, and it is free to try at the

111
00:08:14,639 --> 00:08:21,560
link in the video description. It's the easy-to-use VPN that makes it so you can

112
00:08:19,120 --> 00:08:26,199
browse privately and enjoy a more open internet without all that hassle

113
00:08:23,800 --> 00:08:31,319
associated with more complex VPN solutions: any, you know, port forwarding

114
00:08:28,560 --> 00:08:37,120
or DNS or any nonsense like that. You just click the button and boom, you can

115
00:08:34,279 --> 00:08:41,200
tunnel into up to 20 different countries and it will appear to the websites and

116
00:08:39,159 --> 00:08:44,640
services that you are using as though you are coming from that country. And

117
00:08:43,039 --> 00:08:50,959
TunnelBear has a top-rated privacy policy and does not log your activity, so

118
00:08:48,240 --> 00:08:54,959
try it free with 500 megabytes and no credit card required, and if you decide

119
00:08:53,320 --> 00:08:59,519
you like it and you want to get a year of unlimited data, you can save 10% by

120
00:08:57,680 --> 00:09:03,920
going to tunnelbear.com, linked in the video description. So,

121
00:09:02,600 --> 00:09:07,760
thanks for watching, guys. If this video sucked, you know what to do, but if it was awesome, get subscribed, hit that like

122
00:09:06,440 --> 00:09:12,720
button or even check out the link to where to buy the stuff we featured at

123
00:09:10,200 --> 00:09:16,040
Amazon in the video description. Also linked in the description is our merch

124
00:09:14,120 --> 00:09:19,399
store, which has cool t-shirts just like this one, and our community forum, which

125
00:09:17,839 --> 00:09:22,880
you should totally join. Now that you're done doing all that stuff, you're probably wondering what to watch next, so

126
00:09:21,560 --> 00:09:28,640
check out that little button in the top right to check out our latest video over

127
00:09:24,880 --> 00:09:28,640
on Channel Super Fun.
